Quantitative Analysis

Quantitative analysis is a scientific approach to understanding literature that treats the words in a text as data. In contrast to the close reading we’ve been doing all year, Franco Moretti advocates “distant reading.” He argues that since no one person can read enough to identify patterns across vast amounts of literature, readers should employ computer programs. These programs analyze and compare texts and can sort them into categories such as genre by detecting patterns a human reader would never be able to track, such as the prevalence of specific words.

As an example of distant reading, I selected Robert Frost’s first book of poems, “A Boy’s Will”, published in 1913. I chose Frost not only because there are so many great examples of his work but also because of the way he recycles words throughout his poetry, giving them a variety of uses and meanings. I entered the entire book into Wordle, a website that counts the words in a piece of writing and artistically portrays how often each one is repeated.

http://www.wordle.net/show/wrdl/4503999/Robert_Frost%27s_A_Boy%27s_Will
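The counting behind a word cloud like this one can be sketched in a few lines of Python. This is only a rough illustration, not Wordle's actual method: the stop-word list is a tiny hypothetical one, the sample sentence is placeholder text rather than a quotation from Frost, and the pattern only recognizes straight apostrophes.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real tools use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word occurs, ignoring case and stop words."""
    # Lowercase, then keep runs of letters (allowing internal straight apostrophes).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Placeholder text, NOT a quotation from "A Boy's Will".
sample = "One love, one road. The road is long, and love is the one thing that lasts."

counts = word_frequencies(sample)
top_words = counts.most_common(3)  # the three most frequent words with their counts
print(top_words)
```

To try this on the real book, the `sample` string would be replaced with the full text of the poems; the most frequent words are the ones a word cloud draws largest.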

In the resulting word cloud, the most frequently used word is “one”. I then searched for “one” on Google Ngrams, a website that charts the popularity of words across the collected works in Google Books; I searched the years 1500 to 2000. The word “one” was a bit of a roller coaster up until 1800 but then steadily became more popular, peaking around the time this book was published.

http://books.google.com/ngrams/graph?content=one&year_start=1500&year_end=2000&corpus=0&smoothing=3
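For anyone who wants to repeat the search with other words, the link above can be assembled programmatically. The sketch below simply copies the parameter names visible in that link (`content`, `year_start`, `year_end`, `corpus`, `smoothing`); the helper name `ngram_url` is made up for illustration, and whether the site still accepts these exact parameters is an assumption.

```python
from urllib.parse import urlencode

# Base address taken from the link in this post.
NGRAM_BASE = "http://books.google.com/ngrams/graph"

def ngram_url(word, year_start=1500, year_end=2000, corpus=0, smoothing=3):
    """Build a graph link for one word, mirroring the parameters used in this post."""
    query = urlencode({
        "content": word,
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    })
    return f"{NGRAM_BASE}?{query}"

url_for_one = ngram_url("one")
print(url_for_one)
```

Swapping in another word, for example `ngram_url("love")`, produces the matching link for that word over the same years.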

The pitfall of “distant reading” is that the reader is not only limited by the amount of literature (data) available, but can also be led to inaccurate conclusions when relying on that data to identify a work. The next most commonly used word in our Wordle is “love”, a very popular word in the English language, so you would expect it to be consistently popular in Ngrams. But take a look:

http://books.google.com/ngrams/graph?content=love&year_start=1500&year_end=2000&corpus=0&smoothing=3

We see that not only did the word not really start to appear in the Google Books catalog until about 1570, but it has also fallen out of favor since the 1700s. Although the word “love” was certainly used during Frost’s time, going by this graph alone we would be inclined to say that the book was most likely published between 1580 and 1680. We could even go so far as to narrow the years down to 1671-1674, when the word peaked in popularity. So while I believe that the idea of “distant reading” is very interesting and should certainly be explored as a possible method, I also believe that readers may not yet be able to use this tool effectively in trying to understand a work of literature.
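The naive dating rule described above, guessing that a book was published when one of its favorite words was most popular, can be written out to show why it misleads. This is a sketch under stated assumptions: the numbers are invented for illustration and are not real Google Ngram data, and the function names are hypothetical.

```python
def peak_year(series):
    """Return the year with the highest relative frequency."""
    return max(series, key=series.get)

def years_above(series, fraction=0.8):
    """Return the years whose frequency is at least `fraction` of the peak value."""
    threshold = fraction * max(series.values())
    return sorted(year for year, freq in series.items() if freq >= threshold)

# Invented illustrative values, NOT real Google Ngram data.
love_series = {1600: 0.020, 1650: 0.035, 1700: 0.030, 1800: 0.015, 1900: 0.012}

guess = peak_year(love_series)     # the single "most likely" year under this rule
window = years_above(love_series)  # years within 80% of the peak
print(guess, window)
```

With these made-up numbers the rule points to the seventeenth century for any text that leans on the word “love”, whatever its real publication date. A book from 1913 would be misdated by centuries, which is exactly the inaccuracy the graph invites.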

12 thoughts on “Quantitative Analysis”

  1. I’ve ceased being someone who sits in the corner hoping for technology to solve our problems, but I greedily latch onto that “yet” in your last sentence. It seems we need more tools that can be used on still more books, and then suggestions on how to apply them. I would never have thought of “predicting” the publication date of a book from the words it uses, but it certainly is interesting. My question: could such search attributes be the sorts of things that replace the Foucauldian “author function”?

  2. Although I am interested in the concept of distant reading, I am very skeptical about adopting these tactics as a basis for literary interpretation. As you point out, relying on these more scientific methods can create significant gaps between the words on the page and what technology predicts. So the technology of today can and should be used in a supplemental form; however, I don’t think we are ever going to do away with close reading and other forms of analysis in favor of distant reading. I think most of us can (or might) agree on that. Besides, where would the fun be?

  3. I think the concept of distant reading is interesting, yet I feel unsure about adopting this method of literary interpretation. For me, the limitations and inaccuracies of distant reading are too large to allow it to be a primary form of analysis. To play devil’s advocate, this method does allow us to analyze and compare texts in a way that humans never could without this technology. Still, even with the profound abilities distant reading can offer, I prefer close reading as the primary way of understanding a text.

  4. So far we’ve looked at quantitative analysis across large numbers of literary works, but I think it can be applied in conjunction with close reading (a la Moretti’s Hamlet diagrams) to examine a given work at multiple levels. Imagine an English class in 2030 where we have the e-text of Mrs. Dalloway and concentrate on all possible levels of analysis in it. Using close reading, we apply racial, postcolonial, gender, feminist and a variety of other critical approaches. Simultaneously, as a semester-long project, we analyze meaning in the work quantitatively. We compile statistics not only on word frequency, but also on word positioning in sentences and on sentence structure. We can analyze the structures that create meaning on the sentence level and find those trends throughout the book: when does Clarissa get angry, and how does that look syntactically? Does it occur at certain times in the book? The potential is enormous, and doesn’t have to be divorced from the traditional reader experience to which we have grown attached. Like close reading and criticism in general, it can enhance what it means to read a work, deepen our knowledge and love for it, and doesn’t require a loss in beauty.

  5. The concluding paragraph of David L. Hoover’s article “Quantitative Analysis and Literary Studies” adds some insightful commentary on the relationship between traditional literary studies and quantitative analysis. Hoover argues that this relationship is delicate, but can be greatly beneficial to literary studies.

    As has often been noted, quantitative analysis has not had much impact on traditional literary studies. Its practitioners bear some of the responsibility for this lack of impact because all too often quantitative studies fail to address problems of real literary significance, ignore the subject-specific background, or concentrate too heavily on technology or software. The theoretical climate in literary studies over the past few decades is also partly responsible for the lack of impact, as literary theory has led critics to turn their attention away from the text and toward its social, cultural, economic, and political contexts, and to distrust any approach that suggests a scientific or “objective” methodology. There are, however, signs of progress on both these fronts. The recent increased interest in archives within literary criticism will almost necessarily lead to the introduction of quantitative methods to help critics cope with the huge amount of electronic text now becoming available. Some quantitative studies have also begun to appear in mainstream literary journals, a sure sign of their growing acceptance. The increasing frequency of collaborations between literary scholars and practitioners of quantitative methods of many kinds also promises to produce more research that strikes an appropriate balance between good methodology and significant results. Prospects for the emergence of quantitative approaches as a respected, if not central, branch of literary studies seem bright.

    The rest of the article can be found at: http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405148641/9781405148641.xml&chunk.id=ss1-6-9

    Hoover says that “all too often quantitative studies fail to address problems of real literary significance.” I believe this statement gets at the central problem of quantitative literary analysis. Many are worried that since this form of analysis is so different from traditional forms, it cannot be helpful. There is also a fear that it may become a substitute for these traditional methods. I believe that as technology progresses, literary analysis can benefit from the innovations just as physics, mathematics, and other fields of study have. The balance will work itself out. Literature will never be subjected to quantitative analysis alone; rather, more of its mysteries will be open to exploration. Questions such as “When did the word become popular?” and “What words are commonly found in different styles of writing?” can now be explored. Tools such as http://books.google.com/ngrams and http://www.wordle.net/ are innovative, but many flaws still exist in this technology (as with any new technology). Literary critics need to contemplate what questions of literary significance these tools can answer and what limitations they present.

  6. Like everyone else, I find distant reading an extremely interesting topic, and like everyone else, I have my doubts about it. Distant reading can definitely help us understand literature as a concept with trends and cultural implications, the way one might look at athletics or medicine throughout history, if that makes sense. I find it interesting that you decided to include the word “love” in your post, because the idea and theme of “love” in literature seems to me like something that could be lost through distant reading. I don’t agree with distant reading for the most part simply because of this: the personal connections we have to themes like love in certain works of literature, which could be why some of us have chosen to be English majors, are lost when the study of literature becomes computerized.

  7. I too was initially interested in the concepts of quantitative analysis, but saw my original curiosity die a quick and painful death once I tried putting it into practice. As you outline in this blog post, the process seems easy enough. You input an author’s collection of works into a website, and it vomits back to you a (very aesthetically pleasing, I’ll admit) list of the most used words. The problem I have is that I am then left with this collection of data, and I am free to do anything I want with it. I know that I would box myself into the trap of simply explaining away things that I don’t readily understand by saying “it must be a fluke or a kink in the system.” I would also feel that any conclusion I came to from that information was forced, since I had to mold my thought process around data that already existed. I would feel like I was manipulating my understanding of the human condition to simply fit the results I got from a mechanical source. It just all seems too cold, unattached, and indifferent for me, and if we ever actually did resort to this, I would derive virtually no pleasure from literature. So it’s a very interesting and amusing method to think about, but I cringe at the thought of actually letting myself use it for any real academic purpose.
