Distant Reading Analysis: Findings and Reflection

To briefly introduce my research question: the poet I am interested in, John Donne, wrote a wide range of poems, from romantic poems (about both sensual, erotic love and spiritual love) to deeply religious (Christian) lyrics. As the pioneer of the metaphysical poets, he used highly intellectual yet witty metaphors and conceits in his poetry to express his ideas. As a result, many of his romantic poems contain religious connotations. Likewise, in his religious poems he often depicts sensual or devotional love scenes (or both at once) to convey the diversity of God’s love and characteristics. I wanted to ask whether John Donne’s religious and romantic poetry can be distinguished by vocabulary and structure. Because the two are so interrelated, I expected that there might not be a clear difference. To answer this question, I collaborated with students from the Comp 130 course to perform a statistical analysis of two bodies of text representing the religious and romantic poems. The two corpora I selected for the stylometric analysis were Holy Sonnets (HS) for the religious poems and Songs and Sonnets (S&S) for the romantic poems.

Holy Sonnets (HS) is a sequenced collection of 19 religious sonnets. Most are thought to have been written between about 1609 and 1611, although a few are estimated to date from after 1615 (when Donne took holy orders as an Anglican priest). Songs and Sonnets (S&S), on the other hand, is a broad collection of 55 of Donne’s romantic poems thought to have been written between 1593 and 1617. The exact dates of the poems in Songs and Sonnets are not known and are inferred from the maturity of their subject matter and from Donne’s biography. Also, while the Holy Sonnets were grouped together in a well-sequenced manner when first printed in 1633 (although not all 19 sonnets were included initially, and some were added in later publications), Songs and Sonnets did not appear by name in the table of contents until the revised edition was printed in 1635. Although Donne wrote other great religious poems after his ordination in 1615, I chose Holy Sonnets alone to represent his religious poetry because it was written relatively close in time to the poems of Songs and Sonnets compared to his other religious poems. I assumed that if the poems being compared had been written too far apart, the difference might be too obvious and not very interesting to investigate.

The stylometric measures that the Comp 130 students could offer were average line length, average sentence length, and average sentence complexity for investigating the structure of the poems, and type-token ratio (TTR) and hapax legomena ratio (HLR), which indicate the size of the vocabulary the author used in a text. However, as we proceeded with the analysis, I found that average sentence length and average sentence complexity were not well suited to exploring the structure of poems, because poetic license can distort sentence-level features such as punctuation and grammar. The average line length results did show a noticeable difference between Holy Sonnets and Songs and Sonnets. Because Holy Sonnets consists of actual sonnets, its poems are all 14 lines long. In contrast, despite its name, Songs and Sonnets does not contain any sonnets, and the forms of its lyrics are very diverse.
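As a rough illustration of the two vocabulary measures, here is a minimal Python sketch. It is not the Comp 130 students' actual code. The tokenizer, the function names, and the choice to divide the hapax count by the total number of words (rather than by the number of distinct words) are all assumptions made for this example, and the sample sentence is invented rather than taken from Donne.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_legomena_ratio(tokens):
    """Words occurring exactly once, divided by total words (assumed definition)."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapax_count = sum(1 for n in counts.values() if n == 1)
    return hapax_count / len(tokens)


# Invented sample sentence, used only to exercise the functions.
sample = "the rose is red and the sky is blue"
tokens = tokenize(sample)
print(round(type_token_ratio(tokens), 2))      # 7 types / 9 tokens -> 0.78
print(round(hapax_legomena_ratio(tokens), 2))  # 5 hapaxes / 9 tokens -> 0.56
```

In the sample, "the" and "is" each appear twice, so they count once as types and not at all as hapax legomena. The other five words appear once each and count toward both measures.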

Below is the outcome from analyzing the HLR and TTR of the two corpora:

First, we observed a clear difference: Holy Sonnets generally had higher TTR and HLR values than Songs and Sonnets. However, there was also some overlap between the corpora that requires further investigation. One question to ask is whether the works in the overlapping zone were produced in a relatively similar period. If so, it might indicate that Donne used a vocabulary of a particular size during a particular period. Second, while the TTR and HLR values of Songs and Sonnets were widely dispersed (45% to 76% for TTR and 25% to 58% for HLR), the values of Holy Sonnets were relatively clustered.

However, it was unclear whether these differences in TTR and HLR between the two bodies of text originated from the difference between religious and romantic poems. So I examined the word lists of each corpus that were used to calculate the HLR and TTR values. Interestingly, because the poems in Holy Sonnets generally contain fewer words than those in Songs and Sonnets, words like ‘is’ and ‘she’, which would normally appear many times in a text, were used relatively few times or even only once. As a result, those common words were also included in the list of hapax legomena for Holy Sonnets, raising the TTR and HLR values of the texts. This suggests that the difference in TTR and HLR between Holy Sonnets and Songs and Sonnets is not caused by the difference between religious and romantic poems but depends on the length of the poems.
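The length effect described above can be reproduced with a toy example. The sketch below is only an illustration under stated assumptions: the two texts are invented sentences, not lines from Donne, and the list of common words is a hypothetical one chosen for this example.

```python
from collections import Counter

# Hypothetical list of common function words, chosen only for illustration.
COMMON_WORDS = {"is", "she", "the", "and", "of", "to"}


def hapaxes(tokens):
    """Return the set of words that occur exactly once."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n == 1}


def common_hapaxes(tokens):
    """Return the common function words that happen to occur only once."""
    return sorted(hapaxes(tokens) & COMMON_WORDS)


def ttr(tokens):
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens)


# Invented texts: the longer one repeats the same common words.
short_text = "she is gone".split()
long_text = "she is gone and she is here and she is far and she is near".split()

for name, tokens in [("short", short_text), ("long", long_text)]:
    print(name, len(tokens), round(ttr(tokens), 2), common_hapaxes(tokens))
```

In the three-word text, ‘is’ and ‘she’ count as hapax legomena and the TTR is 1.0. In the fifteen-word text they no longer do, and the TTR falls to about 0.47, even though the vocabulary and subject are essentially the same.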

Through the distant reading process, I was not able to clearly distinguish Holy Sonnets from Songs and Sonnets. This result aligned with my original expectation that Donne’s romantic and religious poems are so interrelated that they are difficult to differentiate. However, my procedure had several limitations and did not effectively control for the multiple variables that could influence the outcome. First, the length and the amount of text differed greatly between the two corpora. Also, Holy Sonnets was written over a narrower period than Songs and Sonnets and therefore might not be enough to effectively represent John Donne’s religious poems. These variables decreased the reliability of my result and alerted me to the need to set up more homogeneous bodies of text for future comparison.
