A New Latin Macronizer

Felipe Vogel has released a new Latin macronizer, Maccer, and I thought I would take it for a spin and share the results. It draws on a database of previously macronized Latin texts (some provided by DCC) and is still in development.

For my test I figured I would use an unusual text I have been working on lately, Historiarum Indicarum Libri XVI, about the Portuguese exploration of the Far East in the 16th century. It was published by the Jesuit humanist Pietro Maffei in 1588, and the Latin is excellent and full of interest. Book 6 is a fascinating ethnography of China, informed by reports from Jesuit missionaries who visited and lived in China over a number of years. The last print edition appeared in 1751: Joannis Petri Maffeii Bergomatis E Societate Jesu Historiarum Indicarum Libri XVI (Vienna: Bernardi, 1751), and thanks to a tip from Terence Tunberg (who introduced me to this text) I tracked it down on the site of the Dresden Library. Since there is no fully digitized text, my students and I transcribed Book 6 this past fall. Here is an excerpt, with no macrons:

E Sinarum provinciis maxime occidua est Cantonia. Eo priusquam pervenias, multae occurrunt insulae; quas praefecti regii praesidiis et classibus tenent: neque ipsorum iniussu progredi advenas Cantonem est fas. Fernandus Andradius, ut exponere coeperam, cum ad Tamum insulam pervenisset, post diuturnam moram, transitu aegre tandem impetrato, cum duobus expeditis et egregie ornatis navigiis, cetera classe ad Tamum relicta, Cantonis portum invehitur, ac magistratuum permissu Thomam legatum exponit, cui aedes et lautia de more attributa. Ibi Fernandus, mira lenitate ac iustitia contrahendo cum incolis, haud ita difficili negotio aditum ad ea commercia nostris aperuit.

With Vogel’s macronizer this becomes:

Ē ✖Sinarum prōvinciīs maximē ✖occidua ✪est ✖Cantonia. Eō priusquam perveniās, multae occurrunt īnsulae; quās ✖praefecti ✖regii praesidiīs et classibus tenent: neque ipsōrum ❡iniussū prōgredī ✖advenas ✖Cantonem ✪est fās. ✖Fernandus ✖Andradius, ut expōnere ✖coeperam, cum ad ✖Tamum īnsulam pervēnisset, post diūturnam moram, trānsitū aegrē tandem ✖impetrato, cum duōbus expedītīs et ēgregiē ✖ornatis nāvigiīs, cētera classe ad ✖Tamum ✪relictā, ✖Cantonis portum invehitur, ac magistrātuum ❡permissū ✖Thomam lēgātum expōnit, cui aedēs et ✖lautia dē mōre ❡attribūta. Ibi ✖Fernandus, ✒mīrã ✖lenitate ac iūstitia ✖contrahendo cum incolīs, haud ita ✖difficili negōtiō aditum ad ✒eã commercia nostrīs aperuit.

The symbols mean this:

✖ unknown word, i.e. not yet in Vogel’s database.
✒ ambiguous: uncertain vowels marked with a tilde (~).
✪ guessed based on frequency.
❡ prefix or enclitic detected attached to a known word.
invalid characters detected.
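
To give a feel for how flags like these could arise from a database of macronized texts, here is a minimal Python sketch. It is not Maccer's actual code: the function names, the three-phrase corpus, and the flag labels ("known", "guessed", "unknown") are all assumptions made for illustration, and capitalization is simply discarded.

```python
import unicodedata
from collections import Counter, defaultdict

MACRON = "\u0304"  # Unicode combining macron


def strip_macrons(word):
    """Lowercase a word and remove every macron from it."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


def build_database(macronized_text):
    """Map each plain form to a Counter of the macronized forms seen for it."""
    db = defaultdict(Counter)
    for word in macronized_text.split():
        word = word.strip(".,;:")
        if word:
            db[strip_macrons(word)][word.lower()] += 1
    return db


def macronize(word, db):
    """Return (form, flag) for one word; the flag labels are hypothetical."""
    forms = db.get(strip_macrons(word))
    if not forms:
        return word, "unknown"          # never seen in the database
    best, _ = forms.most_common(1)[0]   # most frequent macronized form
    flag = "guessed" if len(forms) > 1 else "known"
    return best, flag


# Toy corpus (an assumption): two ablatives and one neuter plural of "cetera".
corpus = "cēterā classe relictā. cētera nāvigia. cēterā classe."
db = build_database(corpus)

print(macronize("cetera", db))   # ('cēterā', 'guessed')
print(macronize("classe", db))   # ('classe', 'known')
print(macronize("lautia", db))   # ('lautia', 'unknown')
```

A pure lookup like this has no way to tell an ablative -ā from a nominative -a except by frequency, which is the kind of case where hand correction remains necessary.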

I made sixteen corrections (counting each vowel separately), spread over thirteen of the 92 words.

21 words were flagged as unknown, 10 of them proper names: Sinārum, occidua, Cantonia, praefectī, regiī, advenās, Cantonem, Fernandus, Andradius, coeperam, Tamum, impetrātō, ornātīs, Tamum, Cantonis, Thomam, lautia, Fernandus, lēnitāte, contrahendō, difficilī. I corrected 9 words in that group, leaving alone most of the proper names for now.

3 words were guessed based on frequency, all correctly (est, est, relictā).

3 words were marked as “prefix detected,” all correctly macronized (iniussū, permissū, attribūta).

2 words were flagged as ambiguous (mīrā, ea): each came back with a tilde over the uncertain vowel and had to be corrected by hand.

Only two words were wrong without being flagged at all (cēterā, iūstitiā). In both cases the problem was an ambiguous final -a, which is long here because the word is ablative. The other vowels in those words were correct.
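
The flag counts above are easy to check mechanically. The following Python sketch tallies the symbols in the macronizer's output; the symbol-to-label mapping is an inference from the counts reported in this post, not something taken from Maccer's documentation, and the names `FLAGS` and `tally_flags` are made up for the example.

```python
from collections import Counter

# Assumed mapping, inferred from the tallies in this post.
FLAGS = {
    "✖": "unknown",
    "✪": "guessed by frequency",
    "❡": "prefix or enclitic",
    "✒": "tilde on uncertain vowel",
}

OUTPUT = """
Ē ✖Sinarum prōvinciīs maximē ✖occidua ✪est ✖Cantonia. Eō priusquam perveniās,
multae occurrunt īnsulae; quās ✖praefecti ✖regii praesidiīs et classibus tenent:
neque ipsōrum ❡iniussū prōgredī ✖advenas ✖Cantonem ✪est fās. ✖Fernandus
✖Andradius, ut expōnere ✖coeperam, cum ad ✖Tamum īnsulam pervēnisset, post
diūturnam moram, trānsitū aegrē tandem ✖impetrato, cum duōbus expedītīs et
ēgregiē ✖ornatis nāvigiīs, cētera classe ad ✖Tamum ✪relictā, ✖Cantonis portum
invehitur, ac magistrātuum ❡permissū ✖Thomam lēgātum expōnit, cui aedēs et
✖lautia dē mōre ❡attribūta. Ibi ✖Fernandus, ✒mīrã ✖lenitate ac iūstitia
✖contrahendo cum incolīs, haud ita ✖difficili negōtiō aditum ad ✒eã commercia
nostrīs aperuit.
"""


def tally_flags(text):
    """Count words by the flag symbol (if any) prefixed to them."""
    counts = Counter()
    for token in text.split():
        if token[0] in FLAGS:
            counts[FLAGS[token[0]]] += 1
    return counts


counts = tally_flags(OUTPUT)
total_words = len(OUTPUT.split())
for label, n in counts.most_common():
    print(f"{label}: {n}")
print(f"flagged {sum(counts.values())} of {total_words} words")
```

Run on the passage above, this reports 21 unknown words, 3 guessed by frequency, 3 with a prefix or enclitic, and 2 with a tilde, for 29 flagged words out of 92.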

The hand-corrected result is as follows:

Ē Sinārum prōvinciīs maximē occidua est Cantonia. Eō priusquam perveniās, multae occurrunt īnsulae; quās praefectī regiī praesidiīs et classibus tenent: neque ipsōrum iniussū prōgredī advenās Cantonem est fās. Fernandus Andradius, ut expōnere coeperam, cum ad Tamum īnsulam pervēnisset, post diūturnam moram, trānsitū aegrē tandem impetrātō, cum duōbus expedītīs et ēgregiē ornātīs nāvigiīs, cēterā classe ad Tamum relictā, Cantonis portum invehitur, ac magistrātuum permissū Thomam lēgātum expōnit, cui aedēs et lautia dē mōre attribūta. Ibi Fernandus, mīrā lēnitāte ac iūstitiā contrahendō cum incolīs, haud ita difficilī negōtiō aditum ad ea commercia nostrīs aperuit.
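
Corrections can be counted either per word or per vowel, and the two numbers differ. As a rough sketch, the Python below compares a short stretch of the machine output with the hand-corrected version. It assumes both versions have the same number of words and that each word pair has the same number of characters once the text is in NFC form, which holds for this passage; `count_corrections` and `FLAG_SYMBOLS` are names invented for the example.

```python
import unicodedata

FLAG_SYMBOLS = "✖✪❡✒"  # flag symbols prefixed to words in the machine output


def count_corrections(machine, corrected):
    """Return (words changed, vowels changed) between the two versions."""
    words_changed = 0
    vowels_changed = 0
    for m, c in zip(machine.split(), corrected.split()):
        # Drop the leading flag and compare precomposed characters one by one.
        m = unicodedata.normalize("NFC", m.lstrip(FLAG_SYMBOLS))
        c = unicodedata.normalize("NFC", c)
        diff = sum(a != b for a, b in zip(m, c))
        if diff:
            words_changed += 1
            vowels_changed += diff
    return words_changed, vowels_changed


machine = "✒mīrã ✖lenitate ac iūstitia ✖contrahendo cum incolīs"
corrected = "mīrā lēnitāte ac iūstitiā contrahendō cum incolīs"

print(count_corrections(machine, corrected))  # (4, 5)
```

Here four words needed attention but five vowels changed, because lēnitāte required two macrons.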

I would call these very good results, and it should be possible to do even better given a larger database. In theory we could do better still by marrying a parser and a dictionary like LaNe that has quantities accurately marked. If all goes well I hope to embark on such a project this fall with the help of a senior Computer Science student at Dickinson. The other thing I would like to see is an editing environment that would make inserting macrons as easy as clicking on the vowel. This would really help in the inevitable process of hand correction.

Thank you, Felipe, for this amazing tool!