Vocabulary Study with Mnemosyne

In an ideal world all vocabulary would be learned contextually, but when trying to learn Latin in a limited amount of time, we usually need flashcards. Guest writer Alex Lee (alexlee@uchicago.edu) describes how to study the DCC Latin Core Vocabulary using a nifty piece of software called Mnemosyne, and the electronic flashcards he made for it from the core list. Mnemosyne allows for targeted and adaptive use of the cards.

Learning any language involves acquiring a large amount of vocabulary. For this reason, I think it is very useful for Latin and Greek students to put time and effort into systematic vocabulary study.

One effective way to accomplish this is with flash cards. These days, however, we have the additional option of using special software that removes much of the tedium from the process. More importantly, such software can calculate the best time to present cards for review (using a spaced-repetition algorithm). In this way words can be committed to long-term memory as efficiently as possible.

The value of systematic vocabulary study?

One might reasonably question the benefits of systematic vocabulary study. Strong arguments have been made that vocabulary is better learned in context – that one really acquires new words through actual use. On this view, in which there is a clear distinction between the memorization of word definitions and the actual acquisition of those words, the memorization of vocabulary only helps insofar as it reduces the amount of time spent looking up words. The words thus memorized are not learned or acquired in the real sense, i.e., one is not able to understand and use these words directly and fluidly. Instead, one’s understanding of the word is mediated by the definition that has been memorized.

I’m actually very sympathetic to this view, and I think that any word that has been memorized must be reinforced by actual use, in a meaningful context. Indeed, in the post-beginner stages, new words should be acquired through extensive reading. At the beginner level, however, and when the words in question are core vocabulary words, the systematic study of these words will serve an important boot-strapping purpose. Students will expend less time and energy trying to figure out the meanings and forms of basic words, and they will be less overwhelmed in trying to understand the texts that they encounter. Because the memorized words appear so frequently, it shouldn’t take long before the initial “vocabulary-list understanding” of each word is converted into actual acquisition.

Mnemosyne

The software that I recommend to my students is called Mnemosyne. It is free, it runs on multiple platforms (Windows, Mac OS X, and Linux), and it has a fairly simple interface.

Mnemosyne keeps cards in a virtual deck. You can add new cards individually, or you can import them in bulk from some other source. Cards are organized according to tags. Each card can have multiple tags, and these tags can be hierarchical. For example, all DCC Latin Core Vocabulary cards begin with the tag CoreLatin, and under this grouping they are tagged according to frequency (CoreLatin::1-200, CoreLatin::201-500, and CoreLatin::501-1000) and semantic grouping (e.g. CoreLatin::Measurement).
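For those curious how hierarchical tags behave, here is a toy Python sketch of the idea that a parent tag like CoreLatin matches everything nested beneath it, while CoreLatin::1-200 selects only that subset. The particular cards and tag assignments are invented for illustration (apart from longus -a -um, which really is in the top 200), and this is not Mnemosyne's internal data model.

```python
# Toy sketch of how hierarchical tags select subsets of cards.
# Card/tag assignments below (other than longus -a -um) are invented;
# this is not Mnemosyne's internal data model.

cards = [
    {"front": "longus -a -um", "tags": ["CoreLatin::1-200"]},
    {"front": "example word A", "tags": ["CoreLatin::201-500", "CoreLatin::Measurement"]},
    {"front": "example word B", "tags": ["CoreLatin::501-1000"]},
]

def has_tag(card, wanted):
    """A tag matches if it equals the wanted tag or is nested beneath it."""
    return any(t == wanted or t.startswith(wanted + "::") for t in card["tags"])

# The parent tag activates every subgroup; a child tag activates only its own cards.
print([c["front"] for c in cards if has_tag(c, "CoreLatin")])         # all three
print([c["front"] for c in cards if has_tag(c, "CoreLatin::1-200")])  # only longus -a -um
```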

In the remainder of this post I will describe how to set up and use Mnemosyne to study the DCC Latin Core Vocabulary. (There are similar software packages out there, such as Anki, but I am not as familiar with them.)

This is meant as a sort of quick start guide. For more details and explanation of other features, take a look at the Mnemosyne documentation.

Installation and setup

Go to the download page and fetch the appropriate package for your platform. The installation procedures for Windows and Mac OS X are fairly typical. (Linux users, however, might need to do some additional work, but I assume they will be able to handle that.)

Settings

When you run the software for the first time, select the Configure Mnemosyne… item, which is located under the Settings or Preferences menu. The configuration options are divided among three tabs: General, Card appearance, and Sync server. For options under General, I use the following:

[screenshot: the General settings tab]

I also recommend looking at the options under Card appearance and setting a larger font.

Import cards

Now you want to import the DCC Latin Core Vocabulary cards into your deck. Download the file dcc_core_latin.cards here. In Mnemosyne, go to File → Import…, choose the file format “Mnemosyne 2.x *.cards files”, and for the file itself click on the Browse button and select the dcc_core_latin.cards file that you downloaded.

[screenshot: the Import dialog]

(The filename in the box will look different, depending on where the downloaded file is located on your system.) Leave the additional tags blank, and press the OK button. An additional information window will pop up; you can just click OK again.

Now that you have imported these cards, you can view them using the card browser. Go to Cards → Browse cards…. You should see something like this:

[screenshot: the card browser]

Usage

Activating cards

The tags that have been attached to the cards make it possible for you to mark only a subset of cards as “active” at any given time. For example, go to Cards → (De)activate cards…, and in the right-side pane unselect everything except for 1-200. Click OK.

Now the software will only present you with cards with the tag CoreLatin::1-200, which means that you are studying the cards for the words that fall in the top 200 in the frequency rankings. (There are actually more than 200 such cards, but that is because I have split a handful of entries from the list into multiple cards, e.g., longus -a -um and longē.)

In fact, there are twice as many cards as you might expect, because each word can be presented in two ways: for recognition (Latin to English) and for production (English to Latin). The relevant check-boxes are located in the upper left pane, within the item labeled Vocabulary. Most people probably want to start with recognition only, so uncheck the Production box for now.

Learning new cards

At this point the software will prompt you with a Latin entry in the upper box. Try to think of the correct answer and then click the “Show answer” button (you can also press spacebar or enter). The answer will be revealed in the lower box.

Now you need to grade your response (you can click on the button or press the corresponding number key):

  • If I had no idea about the answer, I typically select 0.
  • If I did not get it right but am getting some vague notion of the answer, I select 1.
  • If I think I knew it well enough to remember for a day or two, I select 2 or 3.
  • If I knew the word, I select 4.
  • If I knew the word immediately and with great ease, I select 5.

Cards that are graded with 0 or 1 will be presented to you again on the same day. If I am in the process of learning a new card, I usually have to grade it as a 1 several times, so that it keeps reappearing within the same session, until I have an initial knowledge of it.

Cards that are graded with 2–5 will be scheduled for subsequent days. The higher the grade, the longer it will be until you see that card again.
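To make the effect of the grades concrete, here is a minimal Python sketch of an SM-2-style scheduling rule, the family of spaced-repetition algorithms from which Mnemosyne's scheduler is derived. The constants are the classic SM-2 values and the handling of grades follows the description above; it is an illustration, not Mnemosyne's actual implementation.

```python
# A simplified SM-2-style scheduling rule, for illustration only.
# Mnemosyne uses its own variant; the constants here are the classic SM-2 values.

def next_interval(grade, interval_days, easiness):
    """Return (new_interval_days, new_easiness) after grading a card 0-5."""
    if grade < 2:
        # Following the post: grades 0-1 mean the card comes back the same day,
        # and its review interval starts over.
        return 0, easiness
    # Adjust the "easiness factor": high grades lengthen future intervals,
    # low passing grades shorten them (the 1.3 floor keeps intervals from collapsing).
    easiness = max(1.3, easiness + 0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02))
    if interval_days == 0:
        new_interval = 1      # first successful review: see it again tomorrow
    elif interval_days == 1:
        new_interval = 6      # second success: about a week later
    else:
        new_interval = round(interval_days * easiness)
    return new_interval, easiness

# Example: a new card graded 4 on three successive reviews.
interval, ef = 0, 2.5
for _ in range(3):
    interval, ef = next_interval(4, interval, ef)
    print(interval, round(ef, 2))   # 1 day, then 6 days, then about 15 days
```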

Reviewing cards

Cards that you have not yet learned sit in the “Not memorised” pile, while cards that you learned in previous sessions might appear in the “Scheduled” pile (see the status bar at the bottom of the main application window).

If you previously learned a card, the software might decide that you now need to review it. In this case the card will be “scheduled” for today. When you are presented with the card, you must once again grade your response:

  • If I forgot the card, I select 1 (sometimes 0 if I totally forgot it).
  • If I remembered the card, but just barely or with great difficulty, I select 2 or 3. This means the interval was probably a bit too long.
  • If I was able to remember the card correctly, though perhaps with some effort, I select 4. This means the interval was just right.
  • If I remembered the card very easily, I select 5. This means the interval was probably too short.

Mnemosyne will keep a record of your progress with each card. The goal is to show you a card just before you are going to forget it again, as this is supposed to be the best time to review a piece of information in order to promote long-term retention.

Try your best to set aside a chunk of time each day to (a) review previously-learned cards and (b) learn new cards (if you have any new cards pending). Mnemosyne will take care of all the prompting and scheduling; you just have to sit down and go through the cards!

Studying for quizzes (using the cramming scheduler)

Let’s say that you need to study for an upcoming quiz. In this case you want to see all of the active cards, regardless of when they are scheduled. And you don’t want your responses to each card to be recorded by Mnemosyne, because that would mess up the long-term learning schedule for those cards.

In these situations you want to use Mnemosyne’s “Cramming Scheduler”. Go to Manage plugins… under Settings or Preferences, and enable the “Cramming scheduler”. While this plugin is active, all cards will be shown, and no scheduling information will be saved. When you are done studying for the quiz, don’t forget to go back and disable the Cramming scheduler.

Long term memorization

At a little over one thousand words, the DCC Latin Core Vocabulary is a substantial yet manageable list. My hope is that with the aid of Mnemosyne, we can make it as easy as possible for students to start memorizing these words.

The use of tags allows subsets of the Core Vocabulary to be enabled incrementally. For example, students can start with the CoreLatin::1-200 group of highest-frequency words. Once those are learned, they can activate the CoreLatin::201-500 group, and after that the CoreLatin::501-1000 group.

After cards are learned for the first time, Mnemosyne will continue to present them for review, each at an appropriate interval. If students are diligent about taking a few minutes each day to review cards, they can easily make steady progress toward committing these words to long-term memory.

Alex Lee (alexlee@fastmail.net) is a PhD candidate in Classics at the University of Chicago. He has a strong interest in Latin and Greek language pedagogy – in particular, the implications of language acquisition theory and the use of technology as an aid to teaching. His dissertation examines the argumentative and rhetorical function of images in Plato’s Republic.

How principal are Greek principal parts?

I just finished adding the principal parts to the DCC ancient Greek core vocabulary list, something I meant to do last summer, but which got lost in the shuffle. So that’s done, and up. Phew. Anybody who has tried to learn ancient Greek knows what a big hurdle the principal parts are: absolutely essential, but a beastly task of brute memorization. I am here to say that, as one who focuses more on Latin than on Greek, I have to re-learn some of them on a regular basis if I want to read (or teach) Greek well. This is not the fun, life-affirming, profound, aesthetically enriching part of Greek. This is the boot camp, the weight-lifting one must do to get there.

The idea behind principal parts is to put in your hands, and hopefully in your brain, all the different stems of a verb, so that (theoretically) any inflected form can be derived from, or traced back to, one of them. But of course it’s not quite that simple.

On the one hand, some verb forms and related things are extremely common, but not really directly derivable from the principal parts as they are traditionally presented. εἰκός, for example, is a very common participial form meaning “likely, plausible” that is not immediately apparent from the principal parts of ἔοικα. It’s in the dictionary, of course, but somewhat buried in the entry on ἔοικα.

On the other hand, many Greek verbs have principal parts whose stems are only very rarely employed. πέφασμαι, for example, is a perfect tense principal part of a very common verb, φαίνω. But forms derived from it are rare. πέφαγκα, another perfect form listed by Smyth among the “principal” parts, is very rare indeed, with only seven attestations in the TLG, almost all of those from late antique grammarians and lexica. I guarantee you will never encounter it outside a grammar book.

Part of the problem here is that our apparatus for learning ancient Greek is largely derived from the big, comprehensive, scientific grammars of the 19th century, and thus has a tendency toward completism rather than toward conveying what is most essential. This is a general problem, and not one that affects only the issue of principal parts.

Enter into this picture the database, specifically the TLG and its lemmatizer tool. This is the tool that attempts to determine from what dictionary head word (or lemma), a given form derives. I have complained elsewhere about the impotence of existing lemmatizers when it comes to determining the meaning of homographs–forms that are spelled the same but derive from different lemmas, or forms derived from a single lemma, but which could have more than one grammatical function. This is a serious and as yet unsolved problem when it comes to asking a computer to analyze a given chunk of Greek or Latin. And the homograph problem also substantially compromises frequency data based on machine-analyzed large corpora of Greek and Latin.

But one thing at which the lemmatizers are extraordinarily good – theoretically flawless – is telling how many occurrences of a certain word form there are in a given corpus. And by examining that data you can get, in most cases, a very accurate picture of how common the forms derived from a particular stem or principal part of a Greek verb are. In other words, the TLG Lemma Search (which is what I have been working with in making the principal parts lists for our site) helps us see more clearly than has ever been possible which principal parts of each verb are the most important, and which very common forms lie slightly outside the traditional lists of principal parts. It has the potential to make principal parts lists far more informative and helpful to the language learner even than the information found in Smyth, LSJ, or any of the current textbooks.

I can think of a couple ways in which TLG lemmatizer data could be used to enhance the presentation of Greek principal parts. One could, for example, have a second list of, say, the five most statistically common forms of a given verb. In the case of πάρειμι, for example, that would be the following (with the total raw occurrences in TLG as of today):

παρόντος (8587), παρόν (5406), παρόντα (4920), παρόντων (4442), παρόντι (3451)

In fact the top 10 or so are all participial. παρών παροῦσα παρόν: that’s what I call a principal part!

Another way to do it would be to print in bold the principal part from which the most forms derive, or even to use a couple of different font sizes to reflect how commonly each principal part is used. For σῴζω, save, the figures are (roughly) as follows: σῴζω (8600), σώσω (1300), ἔσωσα (5500), σέσωκα (400), σέσωσμαι (700), ἐσώθην (8800). Interesting to see the aorist passive stem beat out the present stem. The top vote-getters in terms of individual forms are σωθῆναι, ἔσωθεν, σώζεται/σῴζεται, σῶσαι, and σῶσον.
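To make the ranking idea concrete, here is a small Python sketch that orders the principal parts of σῴζω by the rough TLG totals just quoted and maps each part's share of attestations onto a typographic weight. The counts are the approximate figures above; assigning each attested form to a stem is assumed to be done by the lemma search itself, and the size bands are arbitrary choices of mine.

```python
# Order the principal parts of a verb by how many attested forms derive from
# each stem, and map each part's share of attestations onto a display weight.
# Counts are the rough TLG totals for σῴζω quoted above.

sozo_parts = {
    "σῴζω": 8600,       # present
    "σώσω": 1300,       # future
    "ἔσωσα": 5500,      # aorist active
    "σέσωκα": 400,      # perfect active
    "σέσωσμαι": 700,    # perfect middle/passive
    "ἐσώθην": 8800,     # aorist passive
}

def emphasis(count, total):
    """Turn a part's share of all attestations into a rough typographic weight."""
    share = count / total
    if share >= 0.25:
        return "bold"
    if share >= 0.10:
        return "normal"
    return "small"

total = sum(sozo_parts.values())
for part, count in sorted(sozo_parts.items(), key=lambda kv: kv[1], reverse=True):
    print(part, count, emphasis(count, total))
# The aorist passive stem (ἐσώθην) comes out on top, just ahead of the present.
```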

People who are better at Greek and spend more time with large corpora and their analysis than I do have probably thought of all this long ago, and there may be some principal parts lists that incorporate some of this data. If so, I would love to hear about it.

Before closing I should give a huge thank you to Prof. Stephen Nimis from Miami University of Ohio and his collaborator Evan Hayes, whose principal parts list in their edition of Lucian’s A True Story (soon to be re-published on our site with extra features) was of great assistance as I was making our list. And I should mention here also the crucial help I have had all along with our Greek list from the great Wilfred Major, of Louisiana State University.

 

 

 

A Sight Reading Approach to Using the DCC

One of the key features of the DCC site is that each text comes equipped with hand-made running vocabulary lists, containing the main definitions for each word, but also the particular one relevant to the context. Very common words are excluded. These take a lot of effort to prepare, of course, so I thought it would be good to explain why we do this.

The point is not just to make it easier for readers to find the correct lemma behind a given form (something automated tools are still very bad at). It also allows for a way of teaching that focuses students’ out-of-class efforts on vocabulary acquisition and comprehension, rather than on the (much harder) task of translation. A vocabulary-focused sight reading approach can help fight the bane of Latin and Greek pedagogy: students writing down the “correct” translation in class, and giving it back on tests, which improves their ability to memorize English, but doesn’t do much for their Latin or Greek.

In essence this is what is now fashionably called a flipped classroom approach, where easier rote tasks are put outside class time, and the hardest tasks are done inside class, collaboratively. In my view the positive psychological effects of this are well worth the effort. Many classical teachers have used this kind of approach over the years. My own particular inspiration is Edwin Post, a professor at De Pauw around the turn of the 20th century, and author of the wonderful Latin at Sight (1895). I know many teachers out there are doing similar things, and I would love to hear suggestions and refinements, especially things that DCC could do to better enable this kind of pedagogy.

The routine as I have worked it out in my own classes (one which of course admits of many variations) is as follows:

Students’ class preparation consists of a mix of
• vocabulary memorization for passages to be read at sight in class, and
• comprehension/grammar worksheets on other passages (ones not dealt with normally in class).
Class itself consists mainly of
• sight translation, and
• review and discussion of previously sight-read passages
• grammar review as needed
Testing consists of
• sight passages with comprehension and grammar questions (like the worksheets), and
• vocabulary quizzes.

Textual analysis is done orally in class, through more interpretive worksheets on previously read passages, and in paper assignments.

The rationale behind doing things this way is that:
• students become good at reading Latin or Greek ex tempore. They lose their fear of it. They start to recognize word groupings and syntactical relationships, rather than isolated vocabulary items.
• students learn to guess at unknown words based on context rather than becoming stuck on the first unfamiliar word, or relying too much on the dictionary
• students have no incentive to memorize English translations; the incentive is to master high frequency vocabulary that is likely to be seen again in a new context. These items are learned contextually.
• students get used to identifying grammatical features that actually occur in the text, rather than working through isolated grammar lessons that don’t always have a clear relationship to reading. Grammar becomes less a burdensome extra and more a tool for extracting sense from a text.
• total quantity of text covered may be somewhat less in class, but worksheets allow at least as much reading total as in the traditional method, probably more

To implement this it is important to
• Have vocabulary lists made up ahead of time. If working toward a high frequency master list, separate the lists into high and non-high frequency portions. Otherwise, just have reasonably comprehensive lists made up. Put it all on a web site for them to study before class. Quiz these occasionally first thing in class. No need to do this every day. They have an incentive to learn vocab. so as not to look too clueless in class. Midterm and final involve comprehensive vocabulary review of words already seen.
• Have worksheets made up ahead of time. Comprehension questions can be written in Latin or Greek, and call for responses in Latin or Greek. This is very difficult at first, but helpful in the long run. Comprehension questions in English are somewhat easier, but make it possible at times for students to merely skim the text looking for key words. But one needs to be resigned to the fact that they will not glean every single nuance of these passages. This is ok. More exposure is better. For the grammar questions, have them spot several instances of a particular construction; or manipulate things, e.g., find several verbs in the imperfect and put in all six tenses and translate (this is a mini synopsis). Focus on pronouns, relative pronouns, reflexives, participles, transitive vs. intransitive verbs, finding word groupings like transitive verbs and their direct object. This kind of grammatical analysis powerfully reinforces sight reading skill.
• When sight reading in class it is essential to do “pre-reading.” Give a little talk about what the passage is about, and point out proper names, unusual vocabulary, and tricky constructions ahead of time. That way students go in knowing what the passage is basically about, and will not be fazed by knotty bits.
• Make a point of reviewing everything. This gives lots of confidence, reading fluency, vocabulary reinforcement.
• Progress to more sophisticated worksheets that include interpretive tasks, like picking out the most significant or emphatic words, judging the tone, finding literary and rhetorical techniques, and inferring what the author wants you to think about what is being said.
• Throughout it is important to communicate with the students what you are doing and why. The notions of high frequency vocabulary, guessing, getting the gist and not worrying so much about the details, these are things the students can get behind. With this good will you can do a lot of more detailed grammatical discussion and textual analysis.
• Grading should be low stakes on the worksheets, at least initially

The feedback from my students on this has been good. Certainly the relationship to grammar is transformed. They suddenly become rather curious about grammatical structures that will help them figure out what is going on. With the worksheets the assumption is that the text makes some kind of sense, rather than what used to be the default assumption, that it’s Latin (or Greek), so it’s not really supposed to make that much sense anyway, right?

–Chris Francese

Do the Flaws in the Perseus Word Study Tool Matter?

In a recent post I tried to categorize the problems of the Perseus Word Study Tool, as tested on a section of Vergil. More surprising to me than the overall rate of error (about one in three words was misidentified in some way) was the fact that many of the errors were not subject to correction by means of Perseus’ “voting” system; and that even when voting was in operation, it often did not correct the error. Sometimes the correct choice was not an available option; other times, unanimous correct votes were ignored, and unanimous incorrect votes were accepted. At Aen. 5.17, to add another example to those mentioned in the earlier post, the vocative magnanime was incorrectly called an adverb on the basis of six incorrect user votes.

The inadequacy of the LWST will not have been news to anyone who has used it. The question is: is the level of error pedagogically significant? Is the LWST good enough for the purposes of a typical Latin student? In other words, should the average Latinist care? I believe it is not good enough, and that the level of error and the specific types of errors in this flagship classical DH project are pedagogically significant and worthy of attention, for several reasons.

1. Words that give students the most trouble–relative pronouns, demonstratives, quam, ut, modo, Q-words in general–are exactly those least likely to be handled well by the LWST. The earlier post has some examples from my small sample, but I’ll add here that for (magis . . . ) quam in Aen. 5.30, when it comes to that quam, the LWST offered no fewer than seven possible quams to choose from (all numbered quam 1-7), none of which has the correct definition in the context (“than”).

2. The LWST is of course helpless when it comes to unusual or idiomatic expressions, of which there is a good example in my sample at 5.6, where notum must be translated “the knowledge that.”

3. The tool naturally can analyze only what is there. It cannot tell when something is left out or assumed.

4. A major structural problem is represented by bad short definitions, of the type (to choose again from examples offered by my sample) iubet = “imposed,” iam = “are you going so soon,” frustra = “in deception, in error,” or, more subtly, the fact that the common meaning of tendere, “direct one’s course,” does not appear in the short def. for that word. This is important because, even though one can click on and read the full Lewis & Short dictionary definition, intermediate students are very unlikely to click through and sift through long entries in search of the correct definition.

5. Moreover, the LWST obscures the relationships between words, which is key to learning to read Latin. This is why seemingly minor accidence mistakes are meaningful. Misled on a part of speech, or the gender of an adjective or the case of a noun, the student will likely not see the syntactical connection between words, and thus the tool reinforces the urge to produce the dreaded “word salad” translations.

6. More broadly, with its cryptic statistical data and jumbled pseudo-information, the LWST reinforces the impression that many students have: that Latin isn’t really supposed to make sense anyway, that it’s all some kind of fiendish crossword puzzle.

Gregory Crane, in an important article and apologia for Perseus, has said that the goal of the Perseus Project is to provide “machine-actionable knowledge”:

Reference materials, in particular, are structured to support automatic systems (e.g., the morphological analyzer learns Greek and Latin morphology from a machine actionable grammar) and to be decomposed into small chunks and then recombined to provide dynamic commentaries. If you retrieve a book in a language that you cannot read or on a topic that you cannot understand, the system can find translations where these already exist, machine translation and translation support systems, reference works, and general background information suited to the general background and immediate purposes of the reader. In knowledge bases, the boundaries between books begin to dissolve.

But clearly machines are spectacularly bad at understanding Latin at the moment. Crane thinks in terms of many decades, and is waiting for massive improvements in artificial intelligence, or for teams of graduate students to encode correct grammatical analysis in texts. But such a prospect seems increasingly far off, and given the size of the Perseus Digital Library (10.5 million words at the moment), it seems unlikely that the millions of errors can be corrected any time soon, if ever. Indeed, would it be worth the huge investment of time and money? In the meantime, we need to create a collaborative tool for generating reasonably correct and reliable vocabulary lists for Latin (and Greek) authors that will be helpful for students and teachers around the world. Why we should do this, and what kind of tool I have in mind, will be the subjects of future posts.

–Chris Francese

Types of Error in the Perseus Latin Word Study Tool

The Perseus Latin Word Study Tool (LWST) is intended to provide dictionary definitions and grammatical analysis of all words in the Latin texts available in the Perseus Digital Library, currently 10.5 million words.

A check of the definitions and grammatical analysis of an arbitrarily chosen chunk of Vergil’s Aeneid (5.1-34, 223 words) found that the tool was incorrect in 79 instances, or 35.4% of the time (and correct 64.6% of the time). The errors break down as follows:
• Mistakes of accidence were the most common type (21 instances, 26.6% of all errors, 9.4% of all words): for example, duri (5.5) was taken as genitive singular instead of nominative plural.
• In 17 cases (21.5% of errors, 7.6% of all words) words were assigned to the wrong lemma, as when quoque (“and whither”) was derived from quoque (“also, too”), or venti (“winds,” 5.20) was assigned to the verb venio, “come,” as if it were the perfect participle. This particular mistake occurred three times in this passage, and the correct lemma was not listed as a possible option.
• In 14 instances (17.7% of errors, 6.3% of all words) the dictionary definitions provided were wildly wrong. This was true of some very common words: iam was glossed as “are you going so soon,” nec as “and not yet,” ab as “all the way from.” Elissae (5.3) was glossed as “Hannibal.” In every case this type of error was seen to come from the pulling, seemingly at random, of a word or phrase from the dictionary of Lewis & Short on which the LWST is based.
• In 11 instances (13.9% of errors, 4.9% of all words) the relevant definition in the context at hand was not provided (though it could be found by clicking through to and reading the full Lewis & Short dictionary entry). For example, cerno was glossed as “separate, part, sift,” but not “perceive,” and infelicis (5.3) was glossed as “unfruitful, not fertile, barren,” rather than “unfortunate.” More seriously, all relative pronouns were glossed as interrogatives (“who? which? what? what kind of a?”), and described simply as “pron.” The word “relative” did not appear on the page.
• In 8 instances (10% of errors, 3.6% of all words) a word was assigned to the incorrect part of speech, as when medium (5.1) was called a noun rather than an adjective, or locutus (5.14) was assigned to the rare 4th-declension noun “a speaking” rather than to loquor.
• In 4 cases (5% of errors, 1.8% of all words) there was no definition available.
• And in all cases deponent verbs were incorrectly labeled passive (4 instances in this particular section, or 5% of errors, 1.8% of all words).
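For what it is worth, the percentages above can be recomputed directly from the raw counts; here is a small Python tabulation (the category labels are my own shorthand for the error types just described):

```python
# Recompute the percentages from the raw counts in the sample
# (Aeneid 5.1-34: 223 words checked, 79 misidentified in some way).

TOTAL_WORDS = 223

error_counts = {
    "accidence mistake": 21,
    "wrong lemma": 17,
    "wildly wrong definition": 14,
    "relevant definition missing": 11,
    "wrong part of speech": 8,
    "no definition available": 4,
    "deponent labeled passive": 4,
}

total_errors = sum(error_counts.values())                       # 79
print(f"overall error rate: {total_errors / TOTAL_WORDS:.1%}")  # 35.4%

for label, n in error_counts.items():
    print(f"{label:28} {n:2}  "
          f"{n / total_errors:5.1%} of errors  {n / TOTAL_WORDS:4.1%} of words")
```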

Now, the makers of Perseus are perfectly aware of the flaws in the LWST, and attempt to use the power of social media to help remedy the situation. Subjoined to the analysis of every ambiguous word, after an explanation of the methodology used, one finds a plea to help by voting:

The possible parses for this word have been evaluated by an experimental system that attempts to determine which parse is correct in this context. The system is composed of a number of “evaluators”–each of which uses different criteria to score the possibilities–whose votes are weighted to determine the best answer. The percentages in the table above show each evaluator’s score for each form, which are then combined to determine each form’s overall score.
This selection used the following evaluators:
• User-voting evaluator: Scores parses based on the number of votes each one has received from users. Weighted more heavily as more users vote for a given word in a text.
• Prior-form frequency evaluator: Evaluates forms based on the preceding word in the text; finds the most likely parse among this word’s possible morphological features and the preceding word’s possible features based on the frequency of each possible pair.
• Word-frequency evaluator: Scores parses based on how often the dictionary word appears in the Perseus corpus. Only used when a given form could be from more than one possible word.
• Tagger evaluator: Evaluator based on pre-computed automatic morphological tagging
• Form frequency evaluator: Scores parses based on how often their morphological features (first-person, indicative, plural, and so on) occur among all the words in the Perseus corpus.
User votes are weighted more heavily than the other methods, which are all treated equally.
Don’t agree with the results? Cast your vote for the correct form by clicking on the [vote] link to the right of the form above!
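As I understand the description above, the combination amounts to a weighted vote over the candidate parses. Here is a rough Python sketch of that idea; the particular weight given to user votes and the individual evaluator scores are invented placeholders, since Perseus does not publish the actual values.

```python
# Rough sketch of the weighted combination described above: each evaluator
# scores the candidate parses, and user votes are weighted more heavily.
# The weight and the scores below are invented placeholders.

def combine(evaluator_scores, user_vote_weight=3.0):
    """evaluator_scores: {evaluator_name: {parse_label: score}} -> (best parse, totals)."""
    totals = {}
    for evaluator, scores in evaluator_scores.items():
        weight = user_vote_weight if evaluator == "user votes" else 1.0
        for parse, score in scores.items():
            totals[parse] = totals.get(parse, 0.0) + weight * score
    return max(totals, key=totals.get), totals

# Hypothetical scores for two candidate parses of a single ambiguous form.
scores = {
    "user votes":           {"nominative": 1.0, "accusative": 0.0},
    "prior-form frequency": {"nominative": 0.3, "accusative": 0.7},
    "form frequency":       {"nominative": 0.4, "accusative": 0.6},
    "tagger":               {"nominative": 0.2, "accusative": 0.8},
}

best, totals = combine(scores)
print(best, totals)   # with these made-up numbers, the user votes carry the day
```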

But here too, some problems arose in my sample. First of all, only a handful of doubtful words had any votes. Second, many of the error types identified above do not admit of voting. And third, those that did have votes did not always benefit from having them. Here is the entry on the word rates in ut pelagus tenuere rates (5.8), showing a preference for the (incorrect) accusative, despite nine user votes for the (correct) nominative.

[screenshot: the LWST entry for rates at Aen. 5.8]

On the word pater in Quidve, pater Neptune, paras? (5.14), ten incorrect user votes for the nominative win out over the (obviously correct) vocative.

More common, however, is the lack of any user votes at all, as in this very confusing jumble of information on the word hoc (5.18). Note that the correct lemmatization (> hic) has a nonsensical definition; that the morphological analysis states it can only be a pronoun (“pron.”) whereas here, as often, it is a demonstrative adjective; and finally that the LWST incorrectly concludes that the form derives from the lemma huc.

Another odd and thankfully rare genre of error occurs in the case of deinde (5.14), which is correctly analyzed, but put beside a fictional alternative, the present imperative of a verb *deindo.

I would like to know if the same level of error and types of errors occur when LWST is unleashed on a prose text. Perhaps there the idea of a “prior-form frequency evaluator” would make more sense.

It is not my intent to denigrate the huge achievements of Perseus in our field. It is certainly better to have the LWST than not to have it. My purpose here is just to investigate the nature and extent of its errors. If this sample is at all representative, something along the lines of 3.5 million errors exist in the current database. I would also like to ask, is it realistic to think that qualified people can be found to correct the mistakes of the LWST? What is the incentive for professional Latinists to do so?

I also have a proposal for a different kind of tool, which I will save for another post, since this one is already too long. Your thoughts?

–Chris Francese