Overall, this project was frustrating because I honestly felt I learned more interesting facts from the other projects. In my Network Analysis, I noticed that the graph is really split up into three groups. On the right, I see Africa and Italy close together. That was not much of a surprise, because in my other analyses Italy and Africa were almost always close together. Going towards the middle, we see that Italy has one of the biggest circles. This means that it has the most relations with other works: it has 7 edges, which is the most out of all of them. Going towards the left, we see pretty much all of Canada together, along with a little of Australia. All of the Canada circles are pretty small in comparison to the other circles in the analysis. Both Italy and Africa have larger circles than the other countries, which I felt was interesting.
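To make the idea of edges and circle size concrete, here is a minimal Python sketch of counting the edges that touch each node. The edge list and the node names (Text A, Text B, and so on) are hypothetical placeholders, not the real network from this project, and I am assuming that the circle size in the graph follows the number of edges.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Hypothetical edge list, made up for illustration only.
edges = [
    ("Text A", "Text B"),
    ("Text A", "Text C"),
    ("Text A", "Text D"),
    ("Text B", "Text C"),
    ("Text E", "Text F"),
]

degrees = degree_counts(edges)
# The node with the most edges would be drawn with the biggest circle.
most_connected, edge_count = degrees.most_common(1)[0]
print(most_connected, edge_count)
```

In this toy list, a node that only connects to one other text ends up with a count of 1, which matches the small circles I saw for the texts that sit off by themselves.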
When I first enrolled in this class, I was super nervous. I did not come in with any basic computer skills, and most of the time I did not understand how to follow along during class. I became frustrated with myself, especially because I am due to graduate in May. What I noticed is that the only way someone can learn a new skill is to remain patient and honest with themselves. I think pretty much everyone in the class knew I did not know how to work a Mac, and most were ahead of me in that respect. But as I listened more, I realized I was not the only one getting upset and frustrated. A lot of other students felt the same way I did, and they had more computer skills than I had. So I became calm. I think it is really great that this class has tables as opposed to desks, because it is easier to follow along when you have a group with you. The professor is just one person, and it can be difficult for one person to answer every single question and also teach the class. I often read my classmates’ projects. We all had completely different topics, and I enjoyed reading about the other projects and what my classmates found during their analyses. The class began to spark my interest more and more as I kept learning. I did enjoy the class, and I felt it was a very nice way to end the semester.
In terms of the syllabi project, I thought the initial concept of the project was interesting. My group and I were supposed to do Oppose. We found it very difficult because, no matter what texts we put in the sets, whenever we added a test set the image would never come up. It was frustrating for me because the project honestly did seem interesting, since I am familiar with most of these classes or with classes that are similar. I felt it was a good project and that it should definitely be done in future classes, maybe with more time for students to meet with the classmates in their group to get the project done.
Completing this project was very difficult for me to grasp at first, and I ended up needing assistance from fellow classmates. What I ended up doing was comparing Africa and Europe, utilizing the countries Australia and Italy. Canada was utilized as my test set. However, I am not too sure why it came out in two colors (black and yellow); I tried multiple times afterwards to fix it on my laptop at home but couldn’t. What I noticed was exactly what I thought would happen! Canada was exactly in the middle of both Europe and Africa. I find that super interesting, and I think it was really cool how it worked out like that, because that is what I was really hoping for. My primary set consisted of African texts, so my preferred words were those of the African texts. The preferred list starts off with a word that has no meaning for the analysis, “although”. As I continue to read down the preferred list, I see more words that make sense. These include black, white, African, men, captain, and lieutenant, which all refer to people. We move into location with the words ship, deck, and boat. We get the theme of violence with the terms savages, gun, and spears. What I found interesting, and really wish I understood, is that a lot of the words on this list also have to do with numbers: for instance, hundred, yards, number, five, three, six, and minutes. What I understood from this is that numbers, and the measurement of things, could have been of importance: distance measured in yards and time measured in minutes. The secondary set consisted of the European texts, and its list started off with words like herself. The next few lines had a connection with one or more female characters. These words include Mrs, lady, madame, Emily, and girl. I found this interesting because, going back to my PCA, I remember a lot of adjectives coming up that had to do with females, and the name Emily came up there as well. I also noticed that emotions come up a lot in the avoided words.
This is pretty funny, because women are usually associated with emotions more than men are. These words include emotion, melancholy, kindness, and happiness. Location is mentioned as well, with the terms apartment, castle, Italy, Florence, and street. Another theme that I noticed in the European texts is art: we see the words art, music, and passion. In comparison, men are more prominent in the preferred list, which comes from the African texts, and women are more prominent in the European texts. With that come words that are more associated with either sex. And really, it shows what the texts from each continent value most. The African texts seem to value talking about the violence experienced historically over speaking about the emotions and beauty that are more associated with Europe. It’s not really such a coincidence if you think about it.
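A rough way to picture how a preferred and avoided list could be built is to score each word by how often it shows up in segments of the primary set versus the secondary set. The Python sketch below is only an illustration under that assumption: the function names, the toy segments, and the scoring rule (presence rate in the primary set minus presence rate in the secondary set) are my own simplification, not necessarily the exact calculation Oppose performs.

```python
def presence_rate(word, segments):
    """Fraction of segments (lists of words) that contain the word at least once."""
    if not segments:
        return 0.0
    return sum(1 for segment in segments if word in segment) / len(segments)


def contrast_scores(primary, secondary):
    """Score every word: presence rate in primary minus presence rate in secondary."""
    vocabulary = set().union(*primary, *secondary)
    return {
        word: presence_rate(word, primary) - presence_rate(word, secondary)
        for word in vocabulary
    }


# Toy segments, made up for illustration; they only borrow a few words
# from the lists discussed above and are not the project's real texts.
primary = [["ship", "deck", "gun"], ["ship", "boat", "said"]]
secondary = [["lady", "castle", "said"], ["lady", "music", "street"]]

scores = contrast_scores(primary, secondary)
# Highest scores first; ties are broken alphabetically so the order is stable.
ranked = sorted(scores, key=lambda word: (-scores[word], word))
preferred = ranked[:2]   # words favored by the primary set
avoided = ranked[-2:]    # words favored by the secondary set
print(preferred, avoided)
```

A word like “said” that appears equally often in both sets scores zero here, which is one possible reason a filler word can land in the middle of a list rather than at either end.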
English 391 w
Voyant Tools is used to analyze texts. It can analyze online texts or ones uploaded by users. Utilizing this website, I was able to get a much better understanding of the texts I have chosen for my corpus than I did in the last project. In Voyant Tools we utilized the Scatter Plot tool, which displays the results of a statistical analysis using a scatter plot visualization. The tool offers several types of analysis, including Principal Component Analysis, Correspondence Analysis, and Document Similarity. Principal Component Analysis reduces the data to a small number of dimensions so that it can be plotted. My corpus consists of fictional texts from different countries and continents. In the previous project I really was not able to understand exactly why certain topics were picked. By using the Correspondence Analysis on Voyant, I was able to better understand not only my texts but also my previous project. The scatter plot shows the relation of the words used in a corpus, based on a statistical analysis of each word’s frequency in each document, with each document treated as a dimension. Correspondence Analysis handles the data in such a way that both the rows and the columns are analyzed. This means that, given a table of word frequencies, both the words themselves and the document segments will be shown in the picture.
In this first picture we see the countries in light blue and the terms in dark blue. Towards the left of the x-axis we see Canada, Australia, Africa, and Italy. Italy is more towards the top of the y-axis. The only Italian author lower on the y-axis is Davis, who is close to Canada. The African texts all seem to be close together; however, they are also a little close to Australia. Towards the right of the y-axis there is one African text and one Canadian text that are pretty much the outsiders. In the second picture we are able to see the words “come” and “came” closer together towards the center of the visualization. The word “said,” all the way to the left, has a bigger circle than the other words displayed, which tells me that it is a very commonly used word.
I also used the same corpus and completed the Document Similarity offered on Voyant. Document Similarity is essentially the same as Correspondence Analysis, but terms aren’t shown in the graph.
In the Document Similarity I see only words, but no text names. Towards the left I see more speech words, showing me there is more dialogue. Towards the right I see more sight words. I see a similarity along the y-axis, but not along the x-axis.
At the bottom of the visualization I see more references to people. I see the name “anne,” which was also one of the words I saw in the last project with the Topic Modeling Tool. I also see Miss, Mrs, Mr, and men. Towards the top of the y-axis and the right of the x-axis, I see references to the body: words such as eyes, face, hand, and head. Close to those descriptive words I also see actions that can be completed with these body parts, for example heard, looked, saw, and think. In the middle I see references to location with the term place. Terms like day, night, moment, went, going, work, and came can all be associated with a location and with a person in relationship to a location. I also see regular terms that have a relationship with one another, such as right, good, and great, or think and know. I still see the word “said” all by itself on the left of the x-axis and towards the bottom of the y-axis, in the same dark blue as in the previous visualization. I was able to get an idea of what type of dialogue is being used in these texts and what the priorities of the texts were. What I think of as the priorities are descriptions of the characters, location, and setting. This helped me understand how similar all these countries actually are.
Project #3 – Topic Modeling and Visualization
March 28, 2019
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently. The “topics” produced by topic modeling techniques are clusters of similar words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document’s balance of topics is.
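To picture what “each document’s balance of topics” means, here is a small Python sketch. It does not discover topics the way a real topic model does; it assumes the topics are already given as word lists and only measures how a document’s words split between them. The topic names (“people” and “royalty”), the word lists, and the example sentence are all hypothetical and chosen for illustration.

```python
def topic_balance(tokens, topics):
    """Share of a document's topic-word hits that falls into each topic."""
    hits = {
        name: sum(1 for token in tokens if token in words)
        for name, words in topics.items()
    }
    total = sum(hits.values())
    if total == 0:
        # No topic words found at all, so every share is zero.
        return {name: 0.0 for name in topics}
    return {name: count / total for name, count in hits.items()}


# Hypothetical topics: each one is just a cluster of related words.
topics = {
    "people": {"anne", "mrs", "miss", "diana"},
    "royalty": {"king", "madame", "don"},
}

# Made-up sentence standing in for a document.
document = "anne and diana saw the king and mrs brown".split()

balance = topic_balance(document, topics)
print(balance)
```

A real topic modeling tool works the other way around: it starts from the word counts alone and estimates both the word clusters and each document’s balance at the same time.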
It was a little difficult for me to use this tool on my corpus. My corpus is Comparing Countries, and some of the texts are written in different languages. Therefore, some of the topics contained words in languages that I could not understand. I immediately thought to Google the translations. For some words I was able to get a translation, and for others I was not. Some of the translations I received were from languages that I would not have expected, such as Arabic. When I read the list of topics, I am able to understand how some of the topics and words may be grouped together, but other times I really am not. I’m honestly not sure if my corpus is the best example to use for Topic Modeling, since it involves other languages.
However, looking overall at the topics that were provided, I can understand why these were chosen, since the genre I chose was fiction for all countries. For example, some words that I would think are more common in fiction texts are Madame, Don, King, and love. Some words made sense grouped together. For instance, one group had words like Anne, Mrs., miss, Mr., Marilla (Spanish version of her), and Diana. These are all words that say to me we are talking about people, mostly women. But that exact same group also contains the word “good.” That was honestly driving me crazy. I did not understand how the word good could be in the same group as these other words.
I also tried to use the other graphs that were offered on the website. I stuck with the pie chart option because honestly it’s the easiest for me to understand. Also, on my laptop at home I had all the tools, but I was not able to make the graphs online, because when I logged on to the website not all the contributions or topics were popping up for me. I ended up having to wait until I was able to use the school’s laptop to complete the project. Other than those complications, the topic modeling tool is super easy to navigate, and making the graphs was an even easier task for me. To be honest, I just feel that my home laptop and my corpus, as stated before, make things a bit more complicated, but none of the steps themselves are hard for me to do. However, the other graph visualizations were difficult for me to understand.
Another thing that I really couldn’t figure out is why, in the Topic Modeling, some words kept repeating themselves. I went back to my corpus to see if maybe I did something wrong, and I even went to the IT desk at school, where they said everything looked fine. So I honestly was not sure if that was normal or not. Other than that, everything seemed pretty good. Compared to everything we learned previously, I know I grasped this topic faster than the ones introduced to us before. I thought the pie chart was in fact helpful, because I was able to see why certain texts were grouped together.
March 14, 2019
Exactly like the first project, this one was challenging for me as well. I did not have enough time to complete the project in class and therefore had to download RStudio and the other applications onto my computer. I did end up downloading the applications that I needed, but when I was home I was not able to complete the tasks and was not sure what to do. I deleted the apps because, since I am not 100 percent confident even in managing a laptop, I got nervous and assumed I had done something incorrectly. Professor helped me download the necessary apps so I could complete the assignment.
I first did the Cluster Analysis for my topic, Comparing Countries. I did the assignment using the most frequently used words as features, set to 100 out of 100 words. I found my results extremely interesting. Each country had its own color, which was good, except for Canada. Canada has the colors blue and black, but only because there was a typo in one of the text names; otherwise, its texts should all be black. What I found interesting to discover was that Italy and Africa were using very similar common words. All of the other countries stayed with their own country in the picture, but Africa and Italy are intertwined at the top of the picture and at the bottom of it too. At the top of the picture you are able to see that, for Africa, the Collingwood novels and the Kingston novels are together, and in the middle of them are Radcliffe from Italy and Capes from Italy. If you look at the bottom, you are able to see two novels written by the same author, Schreiner from Africa, with an author from Italy separating these two texts. I found that very interesting. Not only are Africa and Italy two different countries, but they are on two entirely different continents. They also have different languages and different cultures. At the bottom I also see Australia mixed in between Italy and Africa. It made me wonder how this would look if we added more countries from Africa and Europe. Canada is pretty much in the middle and just with itself. I wondered whether the United States would mix in with Canada if I added it. Another thing that I found interesting was that all the way to the left there was an Italian novel by itself. I confirmed that the novel is in fact a fiction novel, so I am not sure why that is.
When I look at the Consensus Tree I see a different perspective, since I changed the setting to 100 out of 1,000 words. It’s not a completely different perspective, but it is different. I noticed that the Italian author who was previously all the way to the left got closer to another Italian author. Previously Italy was in between Africa, and this time Africa is in between Italy. Previously, at the bottom, there was an Italian author in between Africa; this time Australia is in between Africa, and now Italy is in between Africa and Canada. Canada stayed by itself again, so nothing ended up changing with that.
Overall this was an informative project for me because I got to see the value in changing the features. To be honest, in class I did not see the point of changing the feature for how many words we are evaluating, but it is very interesting how much of a difference it honestly makes in your analysis. The analysis was very informative. It makes me want to look more into why the European countries and Africa intertwined, because to me they are extremely different. The project definitely opens up your curiosity to subjects beyond just making a diagram for the class to look at. I would like to add more novels to this moving forward, so I am definitely going to learn the best way to go about that so I can really expand my knowledge of this topic. At this point I am debating whether I want to stay with just fiction or add more genres to the analysis. I never really cared for history, geography, or any of those topics, but I am interested in books, and this tool does show me how I can get more engaged in other topics.
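To see why the number of words used as features can change the picture, here is a minimal Python sketch that compares texts by their most frequent words. It is a simplified stand-in and not the calculation the class tool performs: the function names, the three tiny made-up texts, and the use of a plain Manhattan distance on relative frequencies are all my own assumptions for illustration.

```python
from collections import Counter


def most_frequent_words(texts, n):
    """Return the n most frequent words across all texts (ties broken alphabetically)."""
    combined = Counter()
    for tokens in texts.values():
        combined.update(tokens)
    ordered = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered[:n]]


def relative_frequencies(tokens, vocabulary):
    """Frequency of each vocabulary word as a share of the text's length."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocabulary]


def distance_between(texts, name_a, name_b, n):
    """Manhattan distance between two texts using the n most frequent words."""
    vocabulary = most_frequent_words(texts, n)
    profile_a = relative_frequencies(texts[name_a], vocabulary)
    profile_b = relative_frequencies(texts[name_b], vocabulary)
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


# Made-up miniature "texts", not taken from the real corpus.
texts = {
    "text_1": "the ship and the deck and the gun".split(),
    "text_2": "the boat and the ship and the men".split(),
    "text_3": "the lady and her music in the castle".split(),
}

# The same pair of texts looks closer or farther apart
# depending on how many frequent words are counted.
print(distance_between(texts, "text_1", "text_3", 3))
print(distance_between(texts, "text_1", "text_3", 1))
```

A cluster diagram is built from distances like these, so when the number of words changes, the distances change, and texts can move to different branches.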
March 14, 2019
Overall this project was difficult for me in the beginning. I have no background in technology and honestly don’t even know how to do basic functions on a Mac. I was frustrated in the beginning, but after meeting with Professor I feel much better. I honestly did not know what I wanted to do for this project. At first, I was going to compare lyrics from songs. I would have loved to do that, just because I listen to music every day and love focusing on the lyrics over the beat. I did not end up doing that because it was going to be extremely tedious to find all of the lyrics I needed, and I did not want to put too much on my plate since I do not even have basic knowledge of using a laptop. I then moved on to the idea of fantasy books. I had a few series that I was going to use, such as Harry Potter, The Chronicles of Narnia, and The Lord of the Rings. Again, it became difficult to find these books, since they came out recently. I got to the point where I was just going to resort to Shakespeare. Even though it is such a basic topic, I honestly did not know what else to do.
Once Professor and I met, we looked together at the Gutenberg website. Under categories I saw “Comparing Countries.” I thought that would be a great idea. I decided to take fiction books from each country I chose. I chose to compare the countries Africa, Australia, Canada, and Italy. I chose 8 books from Africa, 4 books from Australia, 8 books from Canada, and 7 books from Italy, for a total of 27 books. Ideally, I would honestly have preferred to have the same number from each country. However, if I had taken the same number from each country, I would have had to use a mix of genres.
At this point in completing the project I felt stuck. I had honestly grown interested in comparing the countries, and I was excited to progress with this topic and analyze it in the further steps of the overall project. I did not know whether I should change my topic, since I did not have an equal number of texts for each country, or whether I should change the genre that I was comparing. I looked into changing the genre, but that was no help. On the Gutenberg site, out of all the countries that were offered, none had an equal number of texts for me to work with. Some countries did not even have the same genres as others, which automatically eliminated them from use. Also, my original goal was to take two books from each author that was provided on the website. The first country I started working with was Canada. That is why in the Canada portion there are two texts from each author. As I continued with the other countries, I was not able to take two texts from each author that was provided.
I continued to look at other sites and to learn how to download texts from them, to see if I could complete my overall goal. It became challenging, and certain websites were giving me issues. I did not want to make the experience more frustrating for myself, especially since I am trying to learn and that is the main priority. I decided to take what I could get from all the countries that were provided to me, and maybe it actually is better because I have more of a variety to work with. At first I thought that my analysis could only be accurate if there were the same number of texts from each author. I thought about it, and I do not think that matters, because I am not trying to compare only the authors. I am also trying to compare the writings of the same genre in different countries.
I became more and more interested in this topic as I kept looking for different authors. To me, fiction novels are interesting because they’re based on imagination. To me it is not only about comparing the styles of writing and the words, but more about comparing the mindset. It is fascinating to compare the mindsets of individuals from different countries, because we are comparing their education, language, culture, and so on. There is more that goes into it than just words on the paper; it makes you think, why those words? So to me this assignment is more geographical than anything else.