Improving Perseus

The flagship digital classics site Perseus is preparing to re-design its interface, amidst a whirlwind of infrastructure upgrades, tool development, and ambitious plans for multilingual support. It’s a daunting task, and in acknowledgment of the difficulty, project director Gregory Crane has floated a draft RFP, with a tentative list of desiderata, for public comment. It is extraordinary and wonderful to invite the whole user community, at such an early stage of the design process, to comment on the development of a site that is so central to digital classics, and indeed to digital humanities itself. So . . . here are my thoughts, offered with the utmost respect for the revolutionary impact of Perseus on our field and on digital humanities, and the massive contribution Perseus makes to global learning about the Greek and Roman classics.

It’s no secret that many users have been unhappy with the existing Perseus interface for a long time. Old concerns with speed seem to have been addressed. But navigation issues remain. The Word Study Tool continues to be inadequate. Translation and commentary content continue to be outdated. Aesthetics leave a lot to be desired. And the glut of information on the page that is often of unclear value and relevance to readers continues to be a major concern. How to proceed?

Users

It’s crucial to let an awareness of the audience drive the design discussion. Crane defines three types of users: a) advanced researchers; b) somewhat knowledgeable students; and c) readers who have no knowledge of a language at all but want to study a text as deeply as possible. Which pieces of Perseus content will each be most interested in?

Professional scholars have historically had little interest in, and even hostility towards, Perseus. It was not originally conceived with them in mind and has little to offer them, and they often perceive it as a way for their students to avoid learning morphology, and as a source of misinformation about morphology and of poor translations. The plans articulated in the RFP, with their focus on treebanking, linked data infrastructure, continued reliance on automatic parsing tools, and no discussion about updating text and translation content, don’t seem set to change that. The professional audience also has access to research libraries and high quality, edited databases like TLG, LLT, TLL, LCL, and Brill’s New Pauly, which far surpass Perseus in terms of accuracy and completeness.

Somewhat knowledgeable students are the core constituency. They typically need accurate texts, translations, and word-level definitions and parsing. A huge boon to this group is Perseus’ digitization of older but still very valuable encyclopedias, such as Smith’s Dictionaries, and the various lexica. The total neophytes would also value word-by-word definitions and parsing, analogous to the interlinear trots of an earlier age, but badly need concise and consistently accurate dictionary entries, which the Word Study Tool does not yet provide.

An audience implicit in Crane’s whole discussion is the global, non-English speaking audience who would like to encounter classical texts with helps in their native languages, and not have to go through English. Serving them is a massive undertaking, given the lack of legacy reference works of the kind on which English Perseus is based. It would involve Russian 5-year-plan style mobilization of scholarly time and effort, and will be the work of many decades. So it seems unwise to make design decisions now for an audience for whom you don’t yet have much in the way of content. Another implicit audience is corpus linguists. But this is a very small audience and not worth catering to in terms of design decisions.

So from a design perspective it seems imperative to focus on the needs of the intermediate student or self-taught learner who wants to encounter texts in historical languages. What resources does Perseus provide to that audience?

Content:

  • Original language texts: a major service provided by Perseus, the crown jewel.
  • English Translations: often seriously outdated or even (in cases such as the translation of Ovid’s Amores by Christopher Marlowe) downright archaic. There are also many gaps (see below). Sometimes good contemporary translators have contributed their work (Vincent Katz for Propertius and Anne Mahoney for Sulpicia). 
  • Commentaries: seriously outdated, except in cases where good scholars have contributed material, such as Jim O’Donnell’s notes for Boethius’ Consolation of Philosophy. Some of the older material is still valuable for specialists, e.g. T. Rice Holmes on Caesar.
  • Grammars: very valuable, but not easy to navigate, and not effectively tied to individual passages that might need elucidation.
  • Encyclopedias: very valuable, but not easy to navigate, and not effectively tied to individual passages that might need elucidation. The navigation and searching in Smith’s invaluable mythological and biographical dictionary is particularly bad (try searching, for example, for Ajax or Helen).
  • Lexica: supremely valuable, but not easy to navigate. Perseus’ digitization of lexica has been one of its most significant contributions. Logeion has in essence fixed the navigation and interface problems of the Perseus lexica (adding new content, too) and become a fundamental part of the field for all the above-mentioned core audience, and for specialists as well.
  • Textbooks, such as Benner’s selections from the Iliad and Allen & Greenough on Caesar.

Tools:

  • Word Study Tool (pop-up dictionary and parsing tool that activates on clicking a word). This is perhaps the most controversial item, the heart of the digital services Perseus provides, but the source of much of the distrust from professionals and love but also frustration from students. The new way forward is going to be via Alpheios and treebank data, with which I am not familiar enough to comment. In my opinion, though, we’re still many years away from a reliable automatic parser, even though some texts, like Homer, are fully parsed by humans and ready to go. One current issue is that the Word Study Tool sometimes directly contradicts definitions and parses in handmade notes like those of O’Donnell.

So, prima facie, if I were setting out to improve Perseus, I would try to serve that core audience of students and autodidacts by a) finding or commissioning competent, up-to-date translations of classical works; b) commissioning commentary content that explains the texts for learners and connects them thoughtfully to the various reference works; c) improving the accuracy of the Word Study Tool; d) improving the interfaces of the grammars and encyclopedias, to do for them what Logeion did for the lexica; e) digitizing better, author-specific lexica so learners have just the information they need to read, say, Xenophon or Cicero, not the firehose of a large lexicon or the very unreliable scattershot of “short defs” (a world in which the Latin scribo [“write”] means “to scratch, grave, engrave, draw”).

Improving the interface, not the content, is the focus of the RFP, so I’ll take some of the issues raised there, in order.

Chunking and Browsing

Perseus confronts an important problem: how do we divide up and tag classical texts so as to allow individual passages to be located easily in a digital environment? This key infrastructure and navigation issue is also being worked on by Harvard University Press and the Loeb series. Perseus is focused on the emerging standard CITE architecture, which will create a new, machine readable reference system for classical texts. But there is also the existing “system”: chaotic, not readily machine readable, but very widely used. Ideally, readers should be able to take a citation they find in their reading (e.g. “Tertullian, On the Shows 22”), plug it into a search box, and find the relevant primary text in the original and in translation, so as to check the accuracy of the use of the primary text in the scholarly literature (or for that matter Wikipedia or elsewhere on the internet). It is hard to overstate the existing barriers to this basic, crucial scholarly and intellectual process on the internet. Students without specialized knowledge cannot readily do it. I recently charged a class of 35 undergraduates in an introductory course taught in English to look up and check a single scholarly reference of their choice from an article (one which didn’t use that many abbreviations and was written for a general audience). I asked them simply to find the original source, read it, and say whether the primary source backed up the point the scholarly author was making. Only 6 of the 35 were able to find what they were looking for successfully on the first try. One of the main obstacles is that, for a mainline classical text, you can’t simply go even to the Loeb Digital Library (much less the open internet), put in a citation, and find a translation. If you have specialized knowledge of classical texts, or unusual tenacity, you can do it, but that is not the way things should be in the age of Perseus.
So I would prioritize this, and work if possible with Harvard UP to develop standard tags that reflect traditional reference systems, in addition to working on the CITE URN system for the long term.
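To make the idea concrete, here is a minimal sketch of what a citation search box might do behind the scenes: parse a traditional “Author, Work passage” citation and map it to a CTS-style identifier. The lookup table, the function name resolve_citation, and the citation pattern are my own illustrative assumptions, not part of Perseus or the Loeb site. A real resolver would need a far larger table and would have to cope with abbreviations and variant titles.

```python
import re

# Illustrative lookup table (an assumption, not a Perseus API): maps a
# traditional "Author, Work" label to a CTS work identifier.
WORK_URNS = {
    ("homer", "iliad"): "urn:cts:greekLit:tlg0012.tlg001",
    ("vergil", "aeneid"): "urn:cts:latinLit:phi0690.phi003",
}

# "Author, Work 1.2.3": author up to the comma, passage as dotted numbers.
CITATION = re.compile(
    r"^\s*(?P<author>[^,]+),\s*(?P<work>.+?)\s+(?P<passage>\d+(?:\.\d+)*)\s*$"
)


def resolve_citation(citation):
    """Return a CTS-style URN for a traditional citation, or None if unknown."""
    match = CITATION.match(citation)
    if match is None:
        return None
    key = (match["author"].strip().lower(), match["work"].strip().lower())
    work_urn = WORK_URNS.get(key)
    if work_urn is None:
        return None  # work not in the table; do not guess
    return f"{work_urn}:{match['passage']}"


if __name__ == "__main__":
    print(resolve_citation("Homer, Iliad 1.1"))
    print(resolve_citation("Tertullian, On the Shows 22"))
```

The second call returns None because the table has no entry for Tertullian. That is precisely the gap a shared set of tags for traditional references would fill.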

I would also like to put in a plug for the virtues of the traditional “card” breaks of Perseus. In the proposal this is treated as something of a holdover from primitive versions of Perseus, but in fact such medium-size chunking, though somewhat arbitrary and not as precise as sentence by sentence or line by line systems of reference, carries distinct advantages. One unsolved problem in digital classics is the aggregation of commentary traditions. Notes in the existing classical commentary tradition are often, but not always, tied to particular words via a lemma (specific words from the source text repeated at the beginning of the comment). So ideally you would want to see all the comments on a particular word. But the fact is that editors used no standard system of lemmas, and often commented on ranges of lines, not specific words. So an agreed-upon card chunking would be immensely useful for aggregating notes in a sensible way that really catches all the relevant material. DCC has adopted Perseus card chunks as standard, and I think they should not lightly be abandoned.
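A rough sketch of how card chunks could serve as the unit of aggregation: each note is attached to every card its line range touches, regardless of how the original editor wrote the lemma. The data layout (a sorted list of card starting lines, and notes as tuples of first line, last line, and text) and the function names are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def card_index(card_starts, line):
    """Index of the card containing `line`; card_starts is sorted ascending."""
    return max(bisect_right(card_starts, line) - 1, 0)


def group_notes_by_card(card_starts, notes):
    """Attach each note to every card that its line range touches.

    notes: iterable of (first_line, last_line, text) tuples.
    Returns a dict keyed by the starting line of each card.
    """
    grouped = defaultdict(list)
    for first, last, text in notes:
        start = card_index(card_starts, first)
        end = card_index(card_starts, last)
        for idx in range(start, end + 1):
            grouped[card_starts[idx]].append(text)
    return dict(grouped)


if __name__ == "__main__":
    cards = [1, 33, 53]  # hypothetical card breaks, by starting line
    notes = [
        (1, 1, "note on a single word in line 1"),
        (30, 35, "note on a range that straddles a card break"),
        (60, 60, "note on line 60"),
    ]
    for card, texts in group_notes_by_card(cards, notes).items():
        print(card, texts)
```

A note on a range that straddles a card break shows up under both cards, so a reader looking at either card catches it.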

Word frequencies

“We need information about word frequencies—this is a very important function for critical reading.” (p.10) Important for corpus linguists perhaps, but not for most readers. The main thing the core audience would want to know is: is this word common (one that I should memorize or write down) or is it unusual (and hence not worth the time focusing on now)? The focus of Perseus on statistical word frequencies (themselves based on the often faulty parses of the WST), and the devotion of screen space to this, is an example of catering to the vanishingly small corpus linguist audience. The Max/Min figures are confusing rather than illuminating for most people. I did not understand them fully until I read the explanation in this RFP. I would remove all this information to some secluded spot where the interested can find it.
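What most readers need could be as simple as a binary label. The sketch below assumes a list of lemmas that has already been reliably lemmatized (ideally hand-checked). It treats the top N most frequent lemmas as “common” and everything else as “unusual”. The function names and the cutoff are illustrative assumptions, not a description of how Perseus computes anything.

```python
from collections import Counter


def common_lemmas(lemmas, top_n):
    """Return the set of the top_n most frequent lemmas in a lemmatized text."""
    return {lemma for lemma, _count in Counter(lemmas).most_common(top_n)}


def label_word(lemma, core):
    """Label a lemma for the reader: worth memorizing now, or not."""
    return "common" if lemma in core else "unusual"


if __name__ == "__main__":
    # Toy data standing in for a hand-checked lemmatized corpus.
    lemmas = ["et", "et", "et", "sum", "sum", "scribo"]
    core = common_lemmas(lemmas, top_n=2)
    for word in ("et", "scribo"):
        print(word, label_word(word, core))
```

A single label of this kind takes up almost no screen space, and the full statistics can live elsewhere for those who want them.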

Left hand workspace

Metadata at the top: “Do we even need this? Does it deserve this screen real estate?” (p. 13) No, definitely not.

Canned searches: “Do we need this on the left hand side?” No.

Table of contents: left nav like this seems to be standard web design. Removing all the stuff above will put it in its rightful, prominent place. I would also remove the browsing bar navigation above, which is not standard web design, is not terribly attractive, and which I personally rarely use. Left nav is sufficient.

Right hand work space

Focus/Load: “this is a very attractive feature.” Agreed.

“Provide an index of relevant works that cite the focus text” (p. 18) This References panel seems like information glut to me. Actually using this information to interpret a given passage requires time, skill, and courage beyond what most users will possess. I consider this to be clutter, to be removed to some more discreet location.

Alignment with manuscripts: this seems too ambitious, and beyond what the core users of Perseus need to have. It makes sense as a separate project, like the Homer Multitext, which really is for specialists.

Here now is my personal list of desiderata, chosen based on what is not there now. I realize some of this may be in the works.

Texts lacking (e.g.):

Archimedes, Augustine (except for a few letters), Galen (only one treatise), Lactantius, Libanius, Orosius, Arnobius

Texts with no translations (e.g.):

Apuleius, Aelius Aristeides, Arrian, Augustus RG, Marcus Aurelius Meditations, Ausonius, Bede, Cicero De Oratore, De Re Publica; Cassius Dio, Dionysius of Halicarnassus, Eusebius, Greek Anthology, Juvenal, Lucian, Martial, Scriptores Historiae Augustae, Seneca the Younger (except Apocol.), Valerius Maximus

Outdated things, e.g.:

Aristophanes trans. (1907); Allen-Sykes commentary on the Homeric Hymns; Catullus translation; Horace Odes trans. 1882 by Conington; Lucretius trans. Leonard (1916)

Reference Desiderata

Good English-Greek and English-Latin dictionaries

These reflections are based on an admittedly rather hasty survey of what’s there now, and I am sure the Perseus team is working hard on many of these problems. But this is the direction I would take to simultaneously streamline the interface and enrich the content.

The lesson of Logeion is that we can help. Take some Perseus content and improve the navigation issues and whatever else you see that needs fixing. We did that with the Latin grammar of Allen & Greenough, and it has become one of the most popular parts of our site. If I had time and money, I would do that to every grammar that Perseus has digitized, and add Monro’s Homeric Grammar for good measure. Perseus has shown us how to build the future of classical studies. Let’s all contribute to making that future serve our scholarly communities.