PCA

Principal Component Analysis (PCA) is a process that condenses multiple features into a smaller number of principal components that summarize, though not perfectly, the variation in the data. PCA is important because it picks out the directions along which the data vary the most. Jockers writes, “Generally speaking these methods involves ‘training’ a machine to recognize a particular author’s feature-usage patterns and then allowing the machine to classify a new text according to how well it ‘matches’ or is similar to the training data” (Jockers 67). This is significant because it helped me become more familiar with what I was looking at when viewing the table.

My table consisted of terms related to love and religion. One of the most frequently used words was God, which appears 5,218 times. That doesn’t surprise me, because Shakespeare was highly influenced by God and faith during this time period. In the table you can see the most variance in the terms God, gutenberg, tm, and works. I think they belong to Principal Component 2 because of their distance from the other terms, whereas the terms believe, soul, years, house, etc., belong to Principal Component 1. I was shocked to see that the terms lord, father, and king weren’t close to God in how many times they were used. I found it fascinating that the farthest distance in the table was between gutenberg and said, which suggests that these two terms are the most opposite from each other.
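To make the idea of condensing features into components more concrete, here is a minimal sketch of PCA in plain Python. Everything in it is an assumption for illustration: the counts are made up and are not taken from the corpus discussed here, the four "documents" are hypothetical, and the helper names (center, covariance, power_iteration, pca) are invented. It finds the components with power iteration, a simple textbook technique, which is not necessarily how the tool that produced the table computes them.

```python
import math

# Hypothetical counts of each term in four made-up documents (NOT the real corpus).
counts = {
    "god":       [30, 28, 35, 31],
    "soul":      [5, 9, 4, 8],
    "believe":   [4, 7, 3, 6],
    "gutenberg": [20, 2, 21, 1],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract each column's mean so every column averages to zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of the centered columns."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def power_iteration(matrix, iters=500):
    """Approximate the dominant eigenvector and eigenvalue of a symmetric matrix."""
    v = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue

def pca(rows, k=2):
    """Return the first k components, their share of variance, and the row scores."""
    centered = center(rows)
    cov = covariance(centered)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    components, ratios = [], []
    for _ in range(k):
        v, lam = power_iteration(cov)
        components.append(v)
        ratios.append(lam / total_variance)
        # Deflation: remove the component just found so the next one can surface.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[dot(row, c) for c in components] for row in centered]
    return components, ratios, scores

terms = list(counts)
components, ratios, scores = pca([counts[t] for t in terms])
for term, (pc1, pc2) in zip(terms, scores):
    print(f"{term:>10}  PC1={pc1:8.2f}  PC2={pc2:8.2f}")
print("share of variance per component:", [round(r, 3) for r in ratios])
```

Each term ends up with two coordinates, which is what a PCA scatter plot draws. The sign of a component is arbitrary, so only the distances between terms and the share of variance each axis explains carry meaning.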

One Reply to “PCA”

  1. I think you may have only done this with stopwords turned on? Don’t forget that this approach, while fine, is also excluding the actual most frequent words.

    I’m guessing the word “gutenberg” is so prominent since it’s part of the boilerplate language for most of your corpus? “TM” and work/works are also likely in the same group, and should be removed to help better reveal your results.

    Remember, we’re not necessarily seeing a word’s “uniqueness” so much as its ability to explain the variance among our corpus. That might be especially important for making sense of the y-axis.

    I was hoping to get more explanation of the relationship of these words to the texts in your corpus!
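The suggestion above to remove boilerplate terms before running the analysis could be sketched roughly as follows. This is only an illustration under stated assumptions: the BOILERPLATE set contains just the words named in the reply, the term_counts helper is a hypothetical name, the sample sentence is invented, and the tokenizer is deliberately crude (lowercase runs of letters).

```python
import re
from collections import Counter

# Words named in the reply as likely licence boilerplate; extend as needed.
BOILERPLATE = {"gutenberg", "tm", "work", "works"}

def term_counts(text, exclude=BOILERPLATE):
    """Count lowercase word tokens, skipping any token in the exclusion set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in exclude)

# Invented sample text, not a passage from the corpus.
sample = "Project Gutenberg-tm electronic works. God save the king; God is my hope."
filtered = term_counts(sample)
print(filtered.most_common(3))
```

Counts built this way can then be fed into the PCA step, so that the licence text no longer competes with the words that actually come from the plays.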
