For the distant reading part of this project, I asked my computer science group to create a stylistic fingerprint of “Berenice,” one of Edgar Allan Poe’s horror stories, and of “The Purloined Letter,” one of his detective stories, to see which was more similar to “The Murders in the Rue Morgue.” At first glance, the answer to this question seems obvious. “The Murders in the Rue Morgue” and “The Purloined Letter” are both detective stories with the same main characters and related plots, and so are likely to be more alike in terms of length, language, and sentence structure than are “Berenice” and “The Murders in the Rue Morgue.” But “The Murders in the Rue Morgue” incorporates many elements of Poe’s horror fiction (which includes “Berenice”) while “The Purloined Letter” does not, so I wanted a stylistic fingerprint of all three to see whether any of the similarities in mood and tone between “The Murders in the Rue Morgue” and “Berenice” would be reflected in the results of a more quantitative analysis of the texts.
I presented this question to the computer science group, and they wrote a program to analyze the stylistic features of each work in terms of average word length, type-token ratio, hapax legomena ratio, average sentence length, and average sentence complexity. I did ask whether they could analyze any other features of the texts that might reflect mood or tone, but they told me that would be beyond their programming capabilities. They explained that the type-token ratio is the ratio of the number of different words in a text to the total number of words in that text. In other words, it measures vocabulary variation within a text: the more distinct words a text uses relative to its length, the higher its type-token ratio will be. The hapax legomena ratio is the ratio of words that occur exactly once in the text to the total number of words in that text, so it measures the proportion of words used only once. The computer science group defined average sentence complexity as the average number of punctuation marks in each sentence. Average sentence length and average word length are self-explanatory. To determine which text was more similar to “The Murders in the Rue Morgue,” they calculated the five stylistic features for each text and then calculated the percentage error between “Berenice” and “The Murders in the Rue Morgue,” and between “The Purloined Letter” and “The Murders in the Rue Morgue.” Because “The Purloined Letter” had a lower percentage error, they concluded that it was more similar to “The Murders in the Rue Morgue.”
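To make these five measures concrete, here is a minimal Python sketch of how such a fingerprint might be computed. It is not the computer science group's actual program, which I have not reproduced here. The function names `fingerprint` and `mean_percentage_error` are my own, the sentence and word splitting rules are rough assumptions, and averaging the per-feature percentage errors into one score is only one plausible reading of how the comparison was made.

```python
import re
import string
from collections import Counter


def fingerprint(text):
    """Compute five stylistic features for one text (illustrative sketch)."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    # Hapax legomena: words that occur exactly once in the text.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    # Counts ASCII punctuation only; curly quotes would need to be added.
    punctuation_marks = sum(1 for ch in text if ch in string.punctuation)
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(counts) / len(words),
        "hapax_ratio": hapaxes / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "avg_sentence_complexity": punctuation_marks / len(sentences),
    }


def mean_percentage_error(reference, candidate):
    """Average the per-feature percentage error of candidate against reference."""
    errors = [
        abs(candidate[key] - reference[key]) / abs(reference[key]) * 100
        for key in reference
    ]
    return sum(errors) / len(errors)


# Toy usage on a made-up two-sentence text, not on Poe's stories.
sample = fingerprint("The cat sat. The dog sat, too!")
print(sample)
```

With a fingerprint for each story, calling `mean_percentage_error` with “The Murders in the Rue Morgue” as the reference and each of the other two stories in turn as the candidate would give one score per story, and the lower score would indicate the closer match under these assumptions.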
It is not surprising that the two detective mysteries were more alike than “The Murders in the Rue Morgue” and the horror story, but I wanted a little more information from the group so that I could see the results for each of the stylistic features, and the magnitude of the differences between them, in case the individual results told a different story. The information they provided is summarized in this chart:
|Story|Average Word Length|Type-Token Ratio|Hapax Legomena Ratio|Average Sentence Length|Average Sentence Complexity|
|---|---|---|---|---|---|
|The Murders in the Rue Morgue|4.75|0.28|0.19|20.53|3.08|
|The Purloined Letter|4.78|0.34|0.34|27.87|4.81|
When I looked at these individual results, I was surprised to see that “Berenice” (the horror story) and “The Purloined Letter” (the detective story) were more similar to each other than either was to “The Murders in the Rue Morgue.” Before doing this analysis, I assumed the two detective stories would be more alike, though I thought it possible that the detective story with gothic elements would be more similar to the gothic horror story than to the other detective story. Based on this analysis, it appears to make no difference whether both stories are detective stories or both include gothic horror elements, since the horror story and the detective story without any gothic horror elements were most alike. The stylistic fingerprint developed for this project does not appear to be influenced by the genre of the work being analyzed (horror vs. detective fiction).