{"id":1163,"date":"2016-01-12T12:44:30","date_gmt":"2016-01-12T12:44:30","guid":{"rendered":"http:\/\/blogs.dickinson.edu\/dcc\/?p=1163"},"modified":"2016-01-15T18:04:41","modified_gmt":"2016-01-15T18:04:41","slug":"how-far-will-core-vocabulary-get-you","status":"publish","type":"post","link":"https:\/\/blogs.dickinson.edu\/dcc\/2016\/01\/12\/how-far-will-core-vocabulary-get-you\/","title":{"rendered":"How far will core vocabulary get you?"},"content":{"rendered":"<p>One of the claims that scholars make about vocabulary acquisition in Latin and Greek is that a relatively small number of high frequency lemmas (dictionary headwords) accounts for a high percentage of word forms in a typical text. John Muccigrosso and Wilfred Major, for example, estimate that the number of lemmas needed to account for 80% of a typical text is 1500 in Latin and about 1100 in Greek (Muccigrosso, 2004, p. 416; Major, 2008, p. 7). It stands to reason that this figure will differ between texts and within texts, since some authors use relatively simple vocabulary (Nepos, Lysias) while others do not (Juvenal, Aeschylus), and some passages within an author have more unusual words than others. I and others have long wanted a way to calculate the \u201ccore percentage\u201d in a given piece of text, that is, the percentage of word forms in a section of text that derive from high frequency lemmas. This would be both interesting from the point of view of literary criticism and helpful pedagogically. 
Some data on this is now emerging in the case of Latin, thanks to the work of <a href=\"http:\/\/web.philo.ulg.ac.be\/lasla\/\" target=\"_blank\">LASLA<\/a>, of Bret Mulligan and his <a href=\"http:\/\/bridge.haverford.edu\/\">Bridge<\/a> application, and the Excel skills of Derek Frymark (Dickinson \u201912).<\/p>\n<p>If we take the <a href=\"http:\/\/dcc.dickinson.edu\/latin-vocabulary-list\" target=\"_blank\">1000-word DCC core Latin vocabulary<\/a> as the definition of high frequency lemmas, then 78% of Caesar\u2019s <em>Gallic War<\/em> consists of core lemmas, excluding proper names. The core percentages by book in Caesar\u2019s <em>Gallic War<\/em> (excluding Hirtius&#8217; Book 8, for which we have no LASLA data) look like this:<\/p>\n<table>\n<thead>\n<tr><th>Book<\/th><th>Core percentage<\/th><\/tr>\n<\/thead>\n<tbody>\n<tr><td>1<\/td><td>80%<\/td><\/tr>\n<tr><td>2<\/td><td>78%<\/td><\/tr>\n<tr><td>3<\/td><td>77%<\/td><\/tr>\n<tr><td>4<\/td><td>79%<\/td><\/tr>\n<tr><td>5<\/td><td>77%<\/td><\/tr>\n<tr><td>6<\/td><td>78%<\/td><\/tr>\n<tr><td>7<\/td><td>75%<\/td><\/tr>\n<\/tbody>\n<\/table>\n<p>Individual chapters range from a high of 91% (4.8) to a low of 57% (7.72). 44 sentences in the work consist of 100% core vocabulary (e.g. 1.8.3 and 1.10.4), while there are two sentences, 3.13.4 and 3.13.4, which tie for a low of 17%.<\/p>\n<p>In the <em>Aeneid<\/em> (taking the chunks of the text as presented in Perseus) the average chunk is 70% core, with a high of 88% (7.1\u20134) and a low of 46% (6.417\u2013425). 
The book by book totals are as follows:<\/p>\n<table>\n<thead>\n<tr><th>Book<\/th><th>Core percentage<\/th><\/tr>\n<\/thead>\n<tbody>\n<tr><td>1<\/td><td>72%<\/td><\/tr>\n<tr><td>2<\/td><td>73%<\/td><\/tr>\n<tr><td>3<\/td><td>70%<\/td><\/tr>\n<tr><td>4<\/td><td>72%<\/td><\/tr>\n<tr><td>5<\/td><td>70%<\/td><\/tr>\n<tr><td>6<\/td><td>71%<\/td><\/tr>\n<tr><td>7<\/td><td>69%<\/td><\/tr>\n<tr><td>8<\/td><td>69%<\/td><\/tr>\n<tr><td>9<\/td><td>71%<\/td><\/tr>\n<tr><td>10<\/td><td>70%<\/td><\/tr>\n<tr><td>11<\/td><td>72%<\/td><\/tr>\n<tr><td>12<\/td><td>70%<\/td><\/tr>\n<\/tbody>\n<\/table>\n<p>Two Dickinson students, Seth Levin and Connor Ford, are working on visualizing the core percentage data for the <em>Aeneid<\/em> and the <em>Gallic War<\/em> as part of Dickinson\u2019s Mellon-funded <a href=\"http:\/\/blogs.dickinson.edu\/dbcamp\/\" target=\"_blank\">Digital Boot Camp<\/a>, led by Patrick Belk, starting this week. I look forward to sharing the results in the next few weeks, and hearing what you think of them!<\/p>\n<p><strong>References<\/strong><\/p>\n<p>Major, Wilfred E. (2008). <a href=\"https:\/\/camws.org\/cpl\/cplonline\/files\/Majorcplonline.pdf\" target=\"_blank\">It\u2019s Not the Size, It\u2019s the Frequency: The Value of Using a Core Vocabulary in Beginning and Intermediate Greek<\/a>. <em>CPL Online,<\/em> 4.1, 1-24.<\/p>\n<p>Muccigrosso, John (2004). 
&#8220;Frequent Vocabulary in Latin Instruction.&#8221;\u00a0<em>Classical World,<\/em> 97, 409-433.<\/p>\n<p><em>Note<\/em>: this post was edited Jan. 15, 2016, to take into account some corrections in the data, and to add the book by book figures for the <em>Aeneid<\/em>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the claims that scholars make about vocabulary acquisition in Latin and Greek is that a relatively small number of high frequency lemmas (dictionary headwords) accounts for a high percentage of word forms in a typical text. John Muccigrosso &hellip; <a href=\"https:\/\/blogs.dickinson.edu\/dcc\/2016\/01\/12\/how-far-will-core-vocabulary-get-you\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":65,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1163","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/posts\/1163","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/users\/65"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/comments?post=1163"}],"version-history":[{"count":0,"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/posts\/1163\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/media?parent=1163"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/categories?post=1163"},{"taxonomy":"post_tag","embeddable":true,"hr
ef":"https:\/\/blogs.dickinson.edu\/dcc\/wp-json\/wp\/v2\/tags?post=1163"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}