My data set is composed of State of the Union addresses from 1989 to 2018. I felt that this would be tricky because certain topics will always be addressed: the president, or whoever is giving the speech, has to talk about things like the budget, war, education, and America. I was afraid that because of this the data would all look more or less the same. My prediction was not totally correct. The PCA showed me that the most frequently used words were American, people, new, year, work, make, and world. It also showed me the proximity of these words to other words and to each other. It was interesting to note that at 125 terms, the words people and health were on top of each other in PCA #1. I take this to mean that whenever the American people were being talked about, their health was usually being mentioned as well.
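
As a rough illustration of where a "most frequently used words" list comes from, here is a minimal Python sketch. It is not how Voyant itself is implemented; the stop-word list, the `top_terms` helper, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; Voyant ships its own, much longer one.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "our", "we", "is", "for", "must"}

def top_terms(text, n=5):
    """Return the n most frequent terms in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample text, not a quotation from any real address.
sample = ("The American people work hard. We must make work pay for "
          "the people, and make the new year a year of work.")

for term, count in top_terms(sample, 4):
    print(term, count)
```

With the stop words filtered out, the words that are left at the top are the ones that carry the topics, which is why the choice of stop-word list matters for everything that follows.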

The word “tonight” also kept appearing in the graphs. I was confused as to why this was a frequently used word, but then I realized that the word “congress” appears just as many times and is relatively close to “tonight” on the correspondence analysis #1 graph, using 160 terms. It made sense that these words would be so close to each other, considering that the speaker of the State of the Union address is speaking to the House of Representatives during the speech. This address is not only for the people watching at home but also for the representatives. This helped me realize that this tool in Voyant could be useful for finding the audience of a text. Until then I had been having trouble seeing the usefulness of Voyant and understanding how to read the data.

Next I looked at another PCA, this time at 122 terms. In PCA #2 the words government, social, security, freedom, and Iraq all clustered together. I assume that this cluster is about the war in Iraq; possibly it refers to the government needing to preserve our security and freedom in the war against Iraq. I found it interesting that these words clustered together, and that weapons and freedom had also clustered together in a prior graph.

The words work, make, together, time, and challenge also appear together in a cluster. This makes me think that the topic of discussion here is something related to coming together to make something or to work towards something. At first I thought it had something to do with jobs, but jobs does not cluster with these words. Instead, jobs is more towards the bottom of PCA #2, where it clusters with energy, businesses, and homes. This is the cluster that is concerned with jobs, and it looks to businesses and energy for those jobs. I do not know where homes comes in, but my guess for now is that it reflects a concern for both housing and jobs in the economy.
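
To make the idea of terms "clustering" in a PCA a little more concrete, here is a minimal sketch in plain Python. It assumes each term is described by its frequency in each document and computes only the first principal component by power iteration; Voyant's own computation may differ, and the frequencies below are invented for illustration, not taken from my corpus.

```python
import math

def first_component_scores(rows, iterations=200):
    """Project each row onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    # Center each column (one column per document) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix of the columns (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    vec = [1.0 / (k + 1) for k in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Score of each term = its centered row projected onto that eigenvector.
    return [sum(c[j] * vec[j] for j in range(d)) for c in centered]

# Invented counts of four terms in three hypothetical speeches.
terms = ["work", "make", "jobs", "energy"]
freqs = [
    [5, 4, 1],  # work
    [4, 5, 1],  # make
    [1, 1, 5],  # jobs
    [1, 2, 4],  # energy
]

scores = first_component_scores(freqs)
for term, score in zip(terms, scores):
    print(f"{term}: {score:.2f}")
```

Terms that rise and fall in the same speeches end up with similar scores, so in this toy data work and make land on one side of the axis and jobs and energy on the other. That is the sense in which I read a cluster on the graph: words that behave alike across the documents, not necessarily words that sit next to each other in a sentence.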

Something I found particularly interesting was that the term education was not among the big terms. It was used 163 times. I found it interesting that education wasn’t a more commonly used term, considering that children was used 307 times. It makes me think that maybe the State of the Union address doesn’t particularly concern itself with the education of children.

Lastly I looked at document similarity. Along the y-axis the documents were spread out everywhere. It looks like the y-axis pertains to the time period of these speeches. They were not perfectly organized by time period, but the older presidents were towards the bottom and the newer presidents were higher up, with Obama at the very top of the graph. On the x-axis the data grouped itself mostly to the right. With the exception of Obama’s 2010 speech, Bush’s 2007 and 2008 speeches, and Trump’s 2019 speech, they are all more towards the right of the graph. Even more interesting is that Obama’s 2010 speech and Trump’s 2019 speech are in the same quadrant. This leads me to think that there must be something similar in these speeches. As to what the component for the x-axis could be, I could not decipher that.

Overall I observed that the older presidents like Clinton, George W. Bush, and George H.W. Bush clustered close together in the document similarity graph. It could have to do with the fact that most of these speeches were written pretty close to each other in time and that the war on terror had not begun yet.
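
One simple way to check whether two speeches really are "similar" is to compare their word counts directly. The sketch below uses cosine similarity on raw term counts, which is a different and simpler technique than the PCA-based document similarity graph in Voyant. The three snippets are made up to stand in for speeches, and stop words are deliberately left in.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Count how often each word occurs in text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0 = nothing shared)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up snippets standing in for three speeches (not real quotations).
speeches = {
    "speech_a": "jobs and energy for american businesses and homes",
    "speech_b": "energy jobs for businesses and american homes",
    "speech_c": "freedom and security for the people of the world",
}
vectors = {name: term_vector(text) for name, text in speeches.items()}

for left, right in [("speech_a", "speech_b"), ("speech_a", "speech_c")]:
    sim = cosine_similarity(vectors[left], vectors[right])
    print(f"{left} vs {right}: {sim:.2f}")
```

Running something like this on the pairs of speeches that share a quadrant could show whether they are close because of shared topic words or only because of common function words.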

<iframe style='width: 100%; height: 800px;' src='//voyant-tools.org/?panels=cirrus%2Creader%2Ctrends%2Csummary%2Ccontexts&corpus=1ccac31e8c66f4cdbd202c19b224b7b2'></iframe>

One Reply to “PCA”

  1. I see what you mean about your texts perhaps being too homogenous, but that could be a strength if you do see difference, since you’re working in a fairly well-defined genre.

    I’m curious about how we can interpret your two principal components. Did you try adding stop words back in? It looks like a handful of Obama speeches really tilt the y-axis, and then only 4 other speeches pushing the x-axis. With those taken out I wonder what you would see?
