Network Analysis

I am still not entirely sure that I understand what exactly the network analysis graphs show us about the texts in our corpora. Every network analysis graph is made up of nodes and edges, nodes being (definition) and edges being (definition). I am also aware that when working with Palladio, you can size the nodes according to the sum of their edge weights. Here is where I begin to become a little confused. The weight of an edge refers either to how strong a connection between texts is or to how far you have to travel between two points. I believe that for our tests and our interpretation of the weight, it is the former rather than the latter, because in the accompanying cluster analyses the texts that are pictured closer together are the ones with the larger edge weights. For some of these edges and connections, that is a no-brainer.
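To make the node-sizing idea concrete, here is a minimal sketch in Python of what "the sum of the weights" means for each node. The text names and weights below are placeholders invented for illustration, not values from our corpus, and this is only an approximation of the idea, not a description of how Palladio computes it internally.

```python
from collections import defaultdict

# Hypothetical undirected edge list: (text_a, text_b, weight).
# Names and weights are made up for illustration only.
EDGES = [
    ("Text A", "Text B", 6),
    ("Text A", "Text C", 2),
    ("Text B", "Text C", 1),
    ("Text C", "Text D", 4),
]


def weighted_degree(edges):
    """Return {node: sum of the weights of every edge touching that node}."""
    totals = defaultdict(int)
    for a, b, weight in edges:
        # Undirected: the edge counts equally for both endpoints.
        totals[a] += weight
        totals[b] += weight
    return dict(totals)


sizes = weighted_degree(EDGES)
# List the nodes from largest to smallest weighted degree.
for node, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(node, size)
```

In this toy example, a node with one heavy edge and a node with several light edges can end up with similar totals, so the node size alone does not say which kind of connection a text has.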

Quite a few of the texts that were written by the same author have a large weight between them, denoting that those texts have been heavily influenced by one another. This is somewhat obvious: since the texts are written by the same author, it stands to reason that they would be influenced by one another, as the person writing and, more than likely, the style of writing are going to be very similar. There was a slight difference in the weights when looking at the data gathered from different numbers of most frequent words. I guess the differences there can also be explained in a somewhat “no-brainer” kind of way. The more words that were looked at, the larger the weights. For example, when looking at the 25 most frequent words, the largest weight was 6, while when looking at the 1,000 most frequent words, the largest weight was 60. This difference in weight can be connected to the number of words looked at: when we look at more words, more connections can be made.

It was refreshing to see, however, that regardless of how many words the program was asked to take into account, the texts with the biggest edge weights were the same. The only difference was the addition of text connections when looking at fewer words. This makes me wonder what the connections might look like at just the single most frequent word, as well as how many words we would have to look at in order to push the weight to something ridiculous like over 100.


A question that this visualization made me ask was what divides the Dickens novels into separate pairs. If you look at the 25 most frequent words (where I’m only looking at those with a weight of 6), you can see that David Copperfield and Great Expectations are linked, while Oliver Twist and A Tale of Two Cities are linked together separately. I find this particularly interesting because, since we are only taking into account the 25 most frequent words, I wouldn’t have thought that there were that many distinctive words that would skew the data. The words should’ve been simple enough that perhaps all of his works would be linked, but as we see here, this is not the case.
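The filtering step described above (keeping only the edges with a weight of 6) can be sketched in a few lines of Python. The two weight-6 pairs are the ones named in this post; the lower-weight edges are hypothetical filler added only so the filter has something to remove, and the function name is my own.

```python
# Edge list as (text_a, text_b, weight) for the 25 most frequent words.
# The two weight-6 pairs come from the post; the others are hypothetical.
EDGES_25_MFW = [
    ("David Copperfield", "Great Expectations", 6),
    ("Oliver Twist", "A Tale of Two Cities", 6),
    ("David Copperfield", "Oliver Twist", 3),           # hypothetical
    ("Great Expectations", "A Tale of Two Cities", 2),  # hypothetical
]


def edges_at_or_above(edges, threshold):
    """Keep only the edges whose weight is at least `threshold`."""
    return [(a, b, w) for a, b, w in edges if w >= threshold]


strongest = edges_at_or_above(EDGES_25_MFW, 6)
for a, b, w in strongest:
    print(f"{a} <-> {b} (weight {w})")
```

Lowering the threshold one step at a time would show at what point the two separate pairs join up into a single group.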

When we look at the 1,000 most frequent words, there are only four sets of texts that have a sum of weight that equals 60. Since none of our tests and runs in class went higher than about 10, I was pretty surprised to see 60 as the highest weight. I was also surprised to see the way that it counted up to 60. In previous preliminary tests, the weights were usually arranged 1-6 in standard numerical order. For this set, though, the weights are jumping all over the place, not bothering to go in order or even to hit every number between 1 and 60. I don’t know if this is because the texts are either very influenced or not very influenced, but I think it begins to come down to which words the texts have in common, since as you go farther down the list we find more words that are specific to certain authors or certain texts. What also surprises me about the second set of visuals is the one pair of texts, The Coral Island and Swiss Family Robinson. These texts are not by the same author, but I am guessing that the subject matter or perhaps the themes that the novels address might be similar enough to earn as high a weight as if they were written by the same person. It also makes me wonder how close together in time those two texts were published, and whether the authors knew of each other’s works, as a similarity that strong would lead us to believe that there is a strong connection between them.

One Reply to “Network Analysis”

  1. Lol, nodes are the points, edges are the connections. In our case, “undirected” (being equal in both directions).

    You don’t have to size the nodes, but it might help see the relationships. It’s sort of like in a dendrogram when you can measure the distance between items to see their distance . . . a larger-sized node means it has higher weighted edges (connections) to other texts. The example is the person who slightly knows ten people versus the person who knows 2 people really, really well. One’s not better; just different.

    Great strategy using filtering and reducing the MFW to see the limits of the author signal.

    I’m interested in the Ewing, since it’s central like Dickens. I wonder if we might use a network graph like this to get a better sense of why some texts are considered canonical and some not read as much.
