Do the Flaws in the Perseus Word Study Tool Matter?

In a recent post I tried to categorize the problems of the Perseus Word Study Tool, as tested on a section of Vergil. More surprising to me than the overall rate of error (about one in three words was misidentified in some way) was the fact that many of the errors were not subject to correction by means of Perseus’ “voting” system; and that even when voting was in operation, it often did not correct the error. Sometimes the correct choice was not an available option; other times, unanimous correct votes were ignored, and unanimous incorrect votes were accepted. At Aen. 5.17, to add another example to those mentioned in the earlier post, the vocative magnanime was incorrectly called an adverb on the basis of six incorrect user votes.

The inadequacy of the LWST will not have been news to anyone who has used it. The question is, is the level of error pedagogically significant? Is the LWST good enough for the purposes of a typical Latin student? In other words, should the average Latinist care? It is not good enough, and the level of error and the specific types of errors in this flagship classical DH project are pedagogically significant and worthy of attention, I believe, for several reasons.

1. Words that give students the most trouble–relative pronouns, demonstratives, quam, ut, modo, Q-words in general–are exactly those least likely to be handled well by the LWST. The earlier post has some examples from my small sample, but I’ll add here that in Aen. 5.30 (magis . . . ) quam, when it comes to that quam, the LWST offered no fewer than seven possible quams to choose from (all numbered quam 1-7), none of which has the correct definition in the context (“than”).

2. The LWST is of course helpless when it comes to unusual or idiomatic expressions, of which there is a good example in my sample at 5.6, where notum must be translated “the knowledge that.”

3. The tool naturally can analyze only what is there. It cannot tell when something is left out or assumed.

4. A major structural problem is represented by bad short definitions of the type (to choose again from examples offered by my sample) iubet = “imposed,” iam = “are you going so soon,” frustra = “in deception, in error,” or more subtly, the fact that the common meaning of tendere, “direct one’s course,” does not appear in the short def. for that word. This is important because, even though one can click on and read the full Lewis & Short dictionary definition, intermediate students are very unlikely to click through and sift through long entries in search of the correct definition.

5. Moreover, the LWST obscures the relationships between words, which is key to learning to read Latin. This is why seemingly minor accidence mistakes are meaningful. Misled on a part of speech, or the gender of an adjective or the case of a noun, the student will likely not see the syntactical connection between words, and thus the tool reinforces the urge to produce the dreaded “word salad” translations.

6. More broadly, with its cryptic statistical data and jumbled pseudo-information, the LWST reinforces the impression that many students have: that Latin isn’t really supposed to make sense anyway, that it’s all some kind of fiendish crossword puzzle.

Gregory Crane, in an important article and apologia for Perseus, has said that the goal of the Perseus Project is to provide “machine-actionable knowledge.”

Reference materials, in particular, are structured to support automatic systems (e.g., the morphological analyzer learns Greek and Latin morphology from a machine actionable grammar) and to be decomposed into small chunks and then recombined to provide dynamic commentaries. If you retrieve a book in a language that you cannot read or on a topic that you cannot understand, the system can find translations where these already exist, machine translation and translation support systems, reference works, and general background information suited to the general background and immediate purposes of the reader. In knowledge bases, the boundaries between books begin to dissolve.

But clearly machines are spectacularly bad at understanding Latin at the moment. Crane thinks in terms of many decades, and is waiting for massive improvements in artificial intelligence, or teams of graduate students to encode correct grammatical analysis in texts. But such a prospect seems increasingly far off, and given the size of the Perseus Digital Library (10.5 million words at the moment), it seems unlikely that the millions of errors can be corrected any time soon, if ever. Indeed, would it be worth the huge investment of time and money? In the meantime, we need to create a collaborative tool for generating reasonably correct and reliable vocabulary lists for Latin (and Greek) authors that will be helpful for students and teachers around the world. Why we should do this, and what kind of tool I have in mind, will be the subjects of future posts.

–Chris Francese

 

Types of Error in the Perseus Latin Word Study Tool

The Perseus Latin Word Study Tool (LWST) is intended to provide dictionary definitions and grammatical analysis of all words in the Latin texts available in the Perseus Digital Library, currently 10.5 million words.

A check of the definitions and grammatical analysis of an arbitrarily chosen chunk of Vergil’s Aeneid (5.1-34, 223 words) found that it was incorrect in 79 instances, or 35.4% of the time (and correct 64.6% of the time). The most common type of error (21 instances, 26.6% of all errors, 9.4% of all words) was a mistake of accidence; for example, duri (5.5) was taken as genitive singular instead of nominative plural. In 17 cases (21.5% of errors, 7.6% of all words) words were assigned to the wrong lemma, as when quoque (“and whither”) was derived from quoque (“also, too”), or venti (“winds,” 5.20) was assigned to the verb venio, “come,” as if it were the perfect participle. This particular mistake occurred three times in this passage, and the correct lemma was not listed as a possible option. In 14 instances (17.7% of errors, 6.3% of all words) the dictionary definitions provided were wildly wrong. This was true of some very common words. iam was glossed as “are you going so soon,” nec as “and not yet,” ab as “all the way from.” Elissae (5.3) was glossed as “Hannibal.” In every case this type of error was seen to come from the pulling, seemingly at random, of a word or phrase from the dictionary of Lewis & Short on which the LWST is based. In 11 instances (13.9% of errors, 4.9% of all words), the relevant definition in the context at hand was not provided (though it could be found by clicking to and reading through the full Lewis & Short dictionary entry). For example, cerno was glossed as “separate, part, sift,” but not “perceive,” or infelicis (5.3) glossed as “unfruitful, not fertile barren,” rather than “unfortunate.” More seriously, all relative pronouns were glossed as interrogatives (“who? which? what? what kind of a?”), and described simply as “pron.” The word “relative” did not appear on the page.
In 8 instances (10% of errors, 3.6% of all words) a word was assigned to the incorrect part of speech, as when medium (5.1) was called a noun rather than an adjective, or locutus (5.14) assigned to the rare 4th decl. noun “a speaking” rather than to loquor. In 4 cases (5% of errors, 1.8% of all words), there was no definition available. And in all cases deponent verbs were incorrectly labeled passive (4 instances in this particular section, or 5% of errors, 1.8% of all words).
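The tallies in this paragraph can be rechecked with a few lines of Python. This is only a sketch for verifying the arithmetic: the counts are the ones reported above, while the category labels and the function name `summarize` are my own shorthand, not anything produced by Perseus.

```python
# Error counts by category, as reported for the sample (Aen. 5.1-34).
# The category labels are shorthand chosen for this sketch.
TOTAL_WORDS = 223

error_counts = {
    "accidence": 21,
    "wrong lemma": 17,
    "wildly wrong definition": 14,
    "relevant definition missing": 11,
    "wrong part of speech": 8,
    "no definition": 4,
    "deponent labeled passive": 4,
}


def summarize(counts, total_words):
    """Return (total_errors, rows); each row is
    (category, count, % of all errors, % of all words)."""
    total_errors = sum(counts.values())
    rows = [
        (name, n,
         round(100 * n / total_errors, 1),
         round(100 * n / total_words, 1))
        for name, n in counts.items()
    ]
    return total_errors, rows


total_errors, rows = summarize(error_counts, TOTAL_WORDS)
print(f"{total_errors} errors in {TOTAL_WORDS} words "
      f"({100 * total_errors / TOTAL_WORDS:.1f}%)")
for name, n, pct_errors, pct_words in rows:
    print(f"{name:28s} {n:3d}  {pct_errors:5.1f}% of errors  "
          f"{pct_words:4.1f}% of words")
```

The seven categories add up to the 79 errors counted, so no word is tallied twice.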

Now, the makers of Perseus are perfectly aware of the flaws in LWST, and attempt to use the power of social media to help remedy the situation. Subjoined to the analysis of every ambiguous word, after an explanation of the methodology used, one finds a plea to help by voting.

The possible parses for this word have been evaluated by an experimental system that attempts to determine which parse is correct in this context. The system is composed of a number of “evaluators”–each of which uses different criteria to score the possibilities–whose votes are weighted to determine the best answer. The percentages in the table above show each evaluator’s score for each form, which are then combined to determine each form’s overall score.
This selection used the following evaluators:
• User-voting evaluator: Scores parses based on the number of votes each one has received from users. Weighted more heavily as more users vote for a given word in a text.
• Prior-form frequency evaluator: Evaluates forms based on the preceding word in the text; finds the most likely parse among this word’s possible morphological features and the preceding word’s possible features based on the frequency of each possible pair.
• Word-frequency evaluator: Scores parses based on how often the dictionary word appears in the Perseus corpus. Only used when a given form could be from more than one possible word.
• Tagger evaluator: Evaluator based on pre-computed automatic morphological tagging
• Form frequency evaluator: Scores parses based on how often their morphological features (first-person, indicative, plural, and so on) occur among all the words in the Perseus corpus.
User votes are weighted more heavily than the other methods, which are all treated equally.
Don’t agree with the results? Cast your vote for the correct form by clicking on the [vote] link to the right of the form above!
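To make that description concrete, here is a minimal Python sketch of how such a weighted combination might work. I do not know the actual Perseus formula: the rule that the user-vote weight grows linearly with the number of votes, the `vote_weight` parameter, the function names, and the scores in the example are all assumptions invented for illustration. The only things taken from the description are that the automatic evaluators are treated equally and that user votes count for more, increasingly so as more users vote.

```python
def combine_scores(evaluator_scores, user_votes, vote_weight=0.5):
    """Combine per-evaluator scores into one normalized score per parse.

    evaluator_scores: {evaluator name: {parse: score between 0 and 1}}
    user_votes:       {parse: number of user votes}
    vote_weight:      hypothetical growth rate of the user-vote weight
    """
    parses = set(user_votes)
    for scores in evaluator_scores.values():
        parses.update(scores)

    total_votes = sum(user_votes.values())
    # Assumed rule: each automatic evaluator has weight 1; the user-voting
    # evaluator has a weight above 1 that rises with every vote cast.
    user_weight = (1.0 + vote_weight * total_votes) if total_votes else 0.0

    combined = {}
    for parse in parses:
        score = sum(s.get(parse, 0.0) for s in evaluator_scores.values())
        if total_votes:
            score += user_weight * user_votes.get(parse, 0) / total_votes
        combined[parse] = score

    norm = sum(combined.values())
    return {p: (v / norm if norm else 0.0) for p, v in combined.items()}


def best_parse(evaluator_scores, user_votes):
    """Return the parse with the highest combined score."""
    combined = combine_scores(evaluator_scores, user_votes)
    return max(combined, key=combined.get)


# Invented scores: four automatic evaluators that all lean accusative.
automatic = {
    name: {"accusative": 0.7, "nominative": 0.3}
    for name in ("prior-form", "word-frequency", "tagger", "form-frequency")
}

print(best_parse(automatic, {}))                 # no user votes
print(best_parse(automatic, {"nominative": 9}))  # nine votes for nominative
```

Under these assumed weights, nine unanimous votes are enough to overturn four automatic evaluators that all prefer the other parse. A scheme of this kind is what the description leads one to expect.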

But here too, some problems arose in my sample. First of all, only a handful of doubtful words had any votes. Second, many of the error types identified above do not admit of voting. And third, those that did have votes did not always benefit from having them. Here is the entry on the word rates in ut pelagus tenuere rates (5.8), showing a preference for the (incorrect) accusative, despite nine user votes for the (correct) nominative.

 

On the word pater in Quidve, pater Neptune, paras? (5.14), ten incorrect user votes for the nominative win out over the (obviously correct) vocative.

More common, however, is the lack of any user votes at all, as in this very confusing jumble of information on the word hoc (5.18). Note that the correct lemmatization (> hic) has a nonsensical definition; that the morphological analysis states it can only be a pronoun (“pron.”) whereas here, as often, it is a demonstrative adjective; and finally that the LWST incorrectly concludes that the form derives from the lemma huc.

Another odd and thankfully rare genre of error occurs in the case of deinde (5.14), which is correctly analyzed, but put beside a fictional alternative, the present imperative of a verb *deindo.

I would like to know if the same level of error and types of errors occur when LWST is unleashed on a prose text. Perhaps there the idea of a “prior-form frequency evaluator” would make more sense.

It is not my intent to denigrate the huge achievements of Perseus in our field. It is certainly better to have the LWST than not to have it. My purpose here is just to investigate the nature and extent of its errors. If this sample is at all representative, something along the lines of 3.5 million errors exist in the current database. I would also like to ask, is it realistic to think that qualified people can be found to correct the mistakes of the LWST? What is the incentive for professional Latinists to do so?
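The extrapolation is simple proportion, and the following Python sketch shows the arithmetic along with a rough margin of error. The margin uses the textbook normal approximation for a sample proportion, which assumes that the 223 words behave like independent random draws from the whole corpus. That assumption is mine and is certainly too generous (one passage of Vergil is not a random sample), so the range should be read as illustrative only. The function name `extrapolate_errors` is likewise just a label for this sketch.

```python
import math


def extrapolate_errors(errors, sample_size, corpus_size, z=1.96):
    """Scale a sample error rate up to a corpus.

    Returns (point estimate, low, high) in numbers of words, where low and
    high come from a normal-approximation interval on the sample proportion.
    """
    p = errors / sample_size
    # Standard error of a proportion, assuming independent draws.
    se = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return p * corpus_size, low * corpus_size, high * corpus_size


# 79 errors in 223 words; 10.5 million words in the Perseus Latin corpus.
point, low, high = extrapolate_errors(79, 223, 10_500_000)
print(f"point estimate: {point / 1e6:.1f} million errors")
print(f"rough range:    {low / 1e6:.1f} to {high / 1e6:.1f} million errors")
```

The point estimate comes out at roughly 3.7 million, and the round figure of 3.5 million falls comfortably inside the range, which is all that “something along the lines of” is meant to claim.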

I also have a proposal for a different kind of tool, which I will save for another post, since this one is already too long. Your thoughts?

–Chris Francese

Getting social

NITLE’s Senior Fellow extraordinaire Bryan Alexander stopped by Dickinson on June 28 to help a working group of faculty, staff, and administrators crafting a digital humanities grant proposal. As part of his presentation he kindly gave us some expert feedback on the DCC site. He liked the design very much and appreciated the foregrounding of our editorial committee, but pointed out that our current design is not very mobile-friendly; that the search field is rather hidden and inadequate; and that the site itself is not interactive. And he asked what we were doing to promote awareness of the site, to which I didn’t have much of an answer.

Bryan’s main suggestion on that front was to create more of a presence in social media. I had been intending to do this, but hadn’t really focused on how important it is if the site is actually going to get used. There’s a lot going on in the interweb, and you can’t just sit back and assume the relevant audience will find you. I started up a new DCC Facebook page, and a Twitter feed (@DCComm) as well. I must say I am much more comfortable on FB than on Twitter. But the classical Twitterati are very energetic and supportive, and I’m starting to understand why people like the medium.

As for getting word out in a more traditional way, I will be speaking in several places in the fall on the DCC and the Digital Humanities principles that drive it: The University of Illinois Urbana-Champaign (Sept. 14), The University of Virginia (Sept. 22 and 23), The University of Pennsylvania (October 18), and in Amsterdam at the European Society for Textual Scholarship’s 2012 conference ‘Editing Fundamentals’ (November 22-24). I am very much looking forward to all the feedback and suggestions I will get, and am hoping to lure some more collaborators into the project as well.

–Chris Francese

2012 Summer Research Assistants

in basement of Bosler Hall

Summer research assistants (left to right) Merri Wilson (’13), Alice Ettling (’12), Jimmy Martin (’13), and Derek Frymark (’12) are hard at work improving the DCC, in space generously provided by Dickinson’s Media Center. The first order of business is refining and organizing the core vocabulary lists that will form the set of words not glossed in the running lists. Much of the rest of the summer will be taken up with creating the site for Prof. Turpin’s edition of Ovid’s Amores, Book 1. Funding for their work is provided by the Roberts Fund for Classical Studies at Dickinson.

Summer Plans

Summer 2012 will be our most active yet. Under the overall direction of Chris Francese, four Dickinson students and recently graduated alumni will be on hand for eight weeks: Alice Ettling (’12), Derek Frymark (’12), Meredith Wilson (’13) and Jimmy Martin (’13). Dickinson Adjunct Faculty member Joanne Miller will also be helping with the editing. The primary tasks will be:

  • the organization of the core vocabulary list into categories based on frequency, morphology, and meaning, to make them easier for students to use.
  • the creation of the Ovid, Amores 1 site on the basis of William Turpin’s excellent notes and introduction
  • the putting of some finishing touches on the Caesar and Sulpicius Severus sites.
  • the preparation of a print version of the Caesar site

Meanwhile, Bart Huelsenbeck, Postdoctoral Fellow in Digital Classics, will arrive in Carlisle in July with his family. Bart will spend the next academic year working on the DCC, teaching two courses, and working on his own exciting projects centered around the scribes at the monastery of Corbie, the renowned French scriptorium responsible for the preservation of a wealth of classical Latin texts.

 

Beyond the Blackboard

Very nice piece by Matt Getty in the Dickinson Magazine on digital scholarship at Dickinson prominently features the DCC. My favorite paragraphs:

One important lesson Willoughby Fellows learn is that when it comes to using the latest gadgets and electronic media in the classroom, what you leave out can be just as important as what you bring in. Take those limitless possibilities that the Web opened for the Dickinson College Commentaries. As Francese and his students considered what to include, the infinite margins yawned before them as both opportunities and challenges.

“The temptation is just to put everything in there because you can,” Francese explains. “But you have to resist the urge to add all the bells and whistles just for their own sake. If you put everything in there, it’s overwhelming. It’s not helpful to the reader. We took our cue from Steve Jobs. You have to keep the user experience in mind at every moment and ask yourself, what would be useful here? What would a reader really want to know right now?”

Chris Francese is working with Alice Ettling ’12 and other students to bring ancient Latin texts into the 21st century through the Dickinson College Commentaries. The online, peer-reviewed site is helping Latin scholars around the world, but the project also has had an impact on the students working on it. “It’s definitely deepened the learning experience for me,” says Ettling. “I did a lot of work structuring vocabulary lists, which called my attention to how students learn vocabulary. That helped me refine my own approach to how I learn.”

Laura Gibbs on the DCC Latin list

Indomitable force for digital classics good Laura Gibbs has adopted the DCC Latin core vocabulary list as the basis for her work presenting vocabulary for her enormous collection of neo-Latin distichs. Her blog post on the subject is full of very kind words about our project. What I most like about the way she presents the distichs is her introductions, just enough to give you some orientation. Thanks, Laura, for spreading the word about the DCC and the vocabulary list.


Postdoctoral Fellow in Digital Classics Hired

Great news: Bart Huelsenbeck will join the DCC project as Postdoctoral Fellow in Digital Classics at Dickinson for the 2012-13 academic year. He will teach two courses during the year, work on his own research projects (a large-scale investigation into manuscripts copied at ninth-century Corbie, a French scriptorium responsible for the preservation of a wealth of classical Latin texts), and contribute to the Dickinson Commentaries Project.

We are grateful to the College for making this exciting position possible. Bart holds the PhD from Duke University (2009), and brings a wide and deep set of skills to the job. He has taught for thirteen years in many contexts, from high school and junior high schools (inner city and suburban) to graduate level Latin courses at Cornell. His recent course “A History of Reading” at Cornell addressed, among other things, the question of what digital technologies are doing to us–to how we read, behave, and think.

As a longtime contributor to the Center for Hellenic Studies Homer & the Papyri database, a significant component of the Homer Multitext, Bart has worked for years to connect classical antiquity to the present revolution in digital technologies. We are delighted that Bart will be joining us, and look forward to the substantial contribution he will make in shaping the future direction of the project.