Johan Winge’s New Latin Macronizer

[Image: Latin inscription with apices (excerpt)]

image credit: Vincent Ramos via Wikimedia Commons

A new Latin macronizer has come on the scene, and it is superb. It should become an essential tool for Latin teachers and editors of Latin texts. The author is Johan Winge, who just completed his undergraduate studies in the Language Technology Programme at Uppsala University, supervised by Joakim Nivre. The macronizer is the result of his thesis work for the degree. I had the opportunity to give it a good test run recently, as I read the Ilias Latina along with about twenty Latin teachers at the Dickinson Summer Latin Workshop. I took the PHI text (Vollmer’s Teubner from 1913) of this 1070-line condensation of the Iliad into Latin hexameters, put it in a Word document, and ran it through Winge’s macronizer. We read the text together and spotted the cases where corrections were needed.

The claim on the site that “The expected accuracy on an average classical text is estimated to be about 98% to 99%” seems like no exaggeration. What makes Winge’s macronizer more effective than other tools such as Kevin Ryan’s Macron Helper or Felipe Vogel’s māccer is that it does not work on the basis of a database of previously macronized forms. Rather, it uses a part-of-speech tagger (RFTagger) trained on the Latin Dependency Treebank to decide how each word is being used in context, and then takes the macrons for that analysis from a customized version of the Morpheus morphological analyzer.
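To make the difference concrete, here is a minimal sketch in Python. It is not Winge’s code and calls neither RFTagger nor Morpheus: the tiny lexicon, the tag names, and both function names are my own assumptions for illustration. It only shows why knowing the tag of a word lets you choose between two macronizations of the same spelling, where a lookup by form alone has to guess.

```python
# Toy lexicon: (word form, morphological tag) -> macronized form.
# Entries and tag names are illustrative assumptions, not RFTagger or Morpheus output.
LEXICON = {
    ("artis", "noun.gen.sg"): "artis",  # genitive singular of ars
    ("artis", "adj.abl.pl"): "artīs",   # ablative plural of artus, "narrow"
    ("rebus", "noun.abl.pl"): "rēbus",
    ("in", "prep"): "in",
}


def macronize_with_tags(tagged_tokens):
    """Look up each (form, tag) pair; leave unknown pairs unmarked."""
    return [LEXICON.get((form, tag), form) for form, tag in tagged_tokens]


def macronize_by_form_only(forms):
    """Baseline: take the first entry listed for each form, ignoring context."""
    first_seen = {}
    for (form, _tag), marked in LEXICON.items():
        first_seen.setdefault(form, marked)
    return [first_seen.get(form, form) for form in forms]


# The tags here are supplied by hand; in a real pipeline a tagger would assign them.
phrase = [("rebus", "noun.abl.pl"), ("in", "prep"), ("artis", "adj.abl.pl")]

tagged_result = macronize_with_tags(phrase)
form_only_result = macronize_by_form_only([form for form, _ in phrase])

print(" ".join(tagged_result))     # rēbus in artīs
print(" ".join(form_only_result))  # rēbus in artis
```

The form-only baseline returns whichever entry happens to come first, which is the wrong one for this phrase. The tagged lookup gets it right, but only because the tag was right, so the quality of the tagger is what matters most.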

You’ll have to read Johan’s thesis, Automatic Annotation of Latin Vowel Length, to get all the technical details. I’ll just say that it performed splendidly on the Ilias Latina. Here is a typical stretch, lines 344-374, with the errors highlighted:

dumque inter sēsē procerēs certāmen habērent,
concilium omnipotēns habuit rēgnātor Olympī 345
foederaque intentō turbāvit Pandarus arcū,
tē, Menelāe, petēns; latērīque volātile tēlum
incīdit et tunicam ferrō squāmīsque rigentem
dissecat: excēdit pugna gemebundus Atrīdēs
castraque tūta petit; quem doctus ab arte paternā 350
Paeōniīs cūrat iuvenis Podalīrius herbīs
itque iterum in caedēs horrendaque proelia victor.
armāvit fortēs Agamemnonis īra Pelasgōs
et dolor in pugnam cūnctōs commūnīs agēbat.
bellum ingēns oritur multumque utrimque cruōris 355
funditur et tōtīs sternuntur corpora campīs;
inque vicem Trōumque cadunt Danaumque catervae.
nec requiēs datur ūlla virīs; sonat undique Mavors
tēlōrumque volant cūnctīs ē partibus imbrēs.
occīdit Antilochī rigidō dēmersus in umbrās 360
ēnse Thalysiadēs optātaque lūmina linquit.
inde manū fortī Grāiōrum terga prementem
occupat Anthemiōne satum Telamōnius Aiāx
et praedūrātō trānsfīxit pectora tēlō:
purpureō vomit ille animam cum sanguine mixtam, 365
ōra rigat moriēns. tum magnīs Antiphus hastam
vīribus adversum cōnātūs corpore tōtō
torquet in Aeacidēn: tēlumque errāvit ab hoste
inque hostem cecidit, trānsfīxit et inguine Leucōn:
concīdit īnfēlīx prōstrātus vulnere fortī 370
et carpit viridēs moribundus dentibus herbās.
†impiger †Atrīdēs cāsū concussūs amīcī
Democoonta petit tēlōque adversā trabālī
tempora trānsadigit …

You will note that of the 11 “mistakes” on this page, only one (Mavors) is a genuine error. All the others are simply ambiguous forms, issues that need to be decided by a human. Across the whole text, virtually all of the cases that were not ambiguous forms of this kind were Greek proper names, in which this text abounds. For some reason the form Achillis consistently came out with a long mark on the final vowel. Paris came out with a final macron twice, but without it three times. There were quantity issues with Nereus and his daughters. The strange form mēō emerged at line 851. But virtually all the time, with all ordinary Latin words, the macronizer performed brilliantly. The greatest delight was seeing it correctly macronize the phrase rēbus in artīs (line 968), where the final word almost always has a short “i”, but not here. That will have been the result of the Treebank data, I am guessing.

Mr. Winge, I salute you!

Postscript 7/21/15: Johan writes that his source code is now available. Also, the picture I posted originally is not of him but of his friend Francesco Veneziano. Apologies to both Johan and Francesco for that one!