How far will core vocabulary get you?

One of the claims that scholars make about vocabulary acquisition in Latin and Greek is that a relatively small number of high frequency lemmas (dictionary headwords) accounts for a high percentage of word forms in a typical text. John Muccigrosso and Wilfred Major, for example, estimate that the number of lemmas that will generate 80% of a typical text is about 1500 in Latin and about 1100 in Greek (Muccigrosso, 2004, p. 416; Major, 2008, p. 7). Of course it stands to reason that this figure will differ between texts, and within texts, since some authors use relatively simple vocabulary (Nepos, Lysias), while some do not (Juvenal, Aeschylus), and some passages within an author have more unusual words than others.

I and others have long wanted a way to calculate the “core percentage” of a given piece of text, that is, the percentage of word forms in a section of a text that derive from high frequency lemmas. This would be both interesting from the point of view of literary criticism, and helpful pedagogically. Some data on this are now emerging in the case of Latin, thanks to the work of LASLA, of Bret Mulligan and his Bridge application, and the Excel skills of Derek Frymark (Dickinson ’12).

If we take the 1000-word DCC core Latin vocabulary as the definition of high frequency lemmas, then 78% of the word forms in Caesar’s Gallic War derive from core lemmas, excluding proper names. The core percentages by book in Caesar’s Gallic War look like this:

Book      Core percentage
1         80%
2         78%
3         77%
4         79%
5         77%
6         78%
7         75%

Individual chapters range from a high of 100% (7.61) to a low of 57% (7.72).

In the Aeneid (taking the chunks of the text as presented in Perseus) the average is 70% core, with a high of 88% (7.1–4), and a low of 46% (6.417–425).
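
For readers who want to try this on their own texts, here is a minimal Python sketch of how a core percentage could be computed. It is not the spreadsheet method used for the figures above. It assumes the text has already been lemmatized and that proper names have been flagged, and the function name `core_percentage`, the token format, and the tiny core list are all made up for illustration (the toy list is not the DCC core list).

```python
from typing import Iterable, Set, Tuple


def core_percentage(tokens: Iterable[Tuple[str, bool]], core_lemmas: Set[str]) -> float:
    """Return the share (0.0 to 1.0) of word forms that derive from core lemmas.

    `tokens` is a sequence of (lemma, is_proper_name) pairs, one per word form.
    Proper names are left out of both the numerator and the denominator.
    """
    counted = [lemma for lemma, is_proper in tokens if not is_proper]
    if not counted:
        return 0.0  # nothing to measure in an empty (or all-names) passage
    hits = sum(1 for lemma in counted if lemma in core_lemmas)
    return hits / len(counted)


# Toy data: "Gallia est omnis divisa in partes tres", lemmatized by hand.
# The core set below is hypothetical and deliberately omits "divido".
toy_core = {"sum", "omnis", "in", "pars", "tres"}
toy_tokens = [
    ("Gallia", True),   # proper name, excluded from the count
    ("sum", False),
    ("omnis", False),
    ("divido", False),
    ("in", False),
    ("pars", False),
    ("tres", False),
]

print(f"{core_percentage(toy_tokens, toy_core):.0%}")  # prints 83%
```

Running the same function over each book, chapter, or Perseus chunk in turn would give a table like the one above, provided the lemmatization and the treatment of proper names match.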

Two Dickinson students, Seth Levin and Connor Ford, are working on visualizing the core percentage data for the Aeneid and the Gallic War as part of Dickinson’s Mellon-funded Digital Boot Camp, led by Patrick Belk, starting this week. I look forward to sharing the results in the next few weeks, and hearing what you think of them!

References

Major, Wilfred E. (2008). “It’s Not the Size, It’s the Frequency: The Value of Using a Core Vocabulary in Beginning and Intermediate Greek.” CPL Online, 4.1, 1–24.

Muccigrosso, John (2004). “Frequent Vocabulary in Latin Instruction.” Classical World, 97, 409–433.
