Desmond Schmidt’s recent article in the Journal of TEI about how to create truly portable and interoperable digital scholarly editions came at an opportune time for me. DCC is entering into a relationship with Open Book Publishers in Cambridge to exchange our (Creative Commons licensed) content. They will publish some of our commentaries as books and eBooks, and we will publish some of their book commentaries as multimedia, web-based editions. But how do we actually make the transfer?
We are starting by delivering Bret Mulligan’s commentary on Nepos’ Life of Hannibal. OBP needs it in a format they can set in InDesign and publish in EPUB. But how should the transfer happen? How can we share the openly licensed scholarly content of DCC so that it can be re-purposed and re-published in different formats? Not easily, it turns out. Our commentaries are just HTML pages in Drupal, not XML-based, TEI-tagged documents, and thus, in the view of one early critic of the project, “not truly digital.” XML-TEI is intended as a universal standard for editing and tagging documents of all kinds, and our decision not to adopt it for the project was, at the time, based on cost. Anyway, after various investigations on the OBP side, it turned out that the best way for us to get our commentaries to OBP is to deliver them via . . . wait for it . . . Microsoft Word, with all the labor and possibilities for error that that involves.
Wouldn’t things be better if our texts were marked up in XML-TEI? No, according to Schmidt. He argues, in effect, that TEI is actually hindering the sharing of digital scholarly editions. The problem is the subjectivity of TEI tagging and the diversity of the tags themselves, which in Schmidt’s view makes true interoperability of scholarly editions in TEI a pipe dream. The solution he proposes, as I understand it, is to get all the tags and metadata out completely and into separate files, preserving the text as plain text (in multiple versions if we are dealing with revisions or variants). He is evidently developing an editing environment which ends up creating zipped files that completely separate the text itself, annotation data that points back to the text, and metadata. A few choice quotes:
Syd Bauman (2011), one of the original editors of TEI P5, has since observed that interoperability of TEI-encoded texts today—that is, the exchange of unmodified TEI files between different programs—is “impossible.” (9)
One obvious remedy to this problem is to remove the main source of non-interoperability, namely the embedded markup itself, from the text. By removing it, the part which contains all the significant interpretation can later be added or substituted at will. (21)
What remains when the markup is removed is a residue of plain text that is highly interoperable, which can be exchanged with other researchers, just as the files on Gutenberg.org are downloaded by the tens of thousands every day (Leibert 2008). However, if one suggests this to someone who regularly uses TEI-XML, the immediate objection is made that this will solve nothing, because even plain ASCII texts are still an interpretation of what the transcriber sees on the page (e.g. Sperberg-McQueen 1991, 35). This point, although valid to a degree, misses an important distinction. (22)
And it goes on in this interesting vein. I would love to hear from people who are wiser and more experienced than I am about Schmidt’s critique of embedded TEI annotation and his proposed solution. In the meantime, I need to go format some stuff in Microsoft Word.
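To make the separation Schmidt proposes a little more concrete, here is a minimal Python sketch of the idea as I understand it. It is not Schmidt's actual software or file format. The function name `to_standoff`, the tag names `name` and `place`, and the shape of the annotation records are all my own assumptions for illustration, and the toy parser only handles simple, properly nested tags without attributes.

```python
import re

# Matches simple tags such as <name> or </name> (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def to_standoff(marked_up):
    """Split inline-tagged text into plain text plus standoff annotations.

    Returns (plain_text, annotations). Each annotation is a dict with a
    tag name and start/end character offsets into the plain text.
    """
    plain_parts = []
    annotations = []
    open_tags = []   # stack of (tag name, start offset in the plain text)
    pos = 0          # current position in the marked-up input
    offset = 0       # length of the plain text collected so far

    for match in TAG.finditer(marked_up):
        chunk = marked_up[pos:match.start()]
        plain_parts.append(chunk)
        offset += len(chunk)
        is_closing, name = match.group(1), match.group(2)
        if is_closing:
            if not open_tags or open_tags[-1][0] != name:
                raise ValueError(f"unexpected closing tag: {name}")
            _, start = open_tags.pop()
            annotations.append({"tag": name, "start": start, "end": offset})
        else:
            open_tags.append((name, offset))
        pos = match.end()

    if open_tags:
        raise ValueError(f"unclosed tag: {open_tags[-1][0]}")
    plain_parts.append(marked_up[pos:])
    return "".join(plain_parts), annotations


# Hypothetical tagging of the opening words of Nepos' Life of Hannibal.
tagged = "Hannibal, <name>Hamilcaris</name> filius, <place>Karthaginiensis</place>."
text, notes = to_standoff(tagged)

# Metadata lives in its own record, separate from both text and annotations.
metadata = {"author": "Cornelius Nepos", "work": "Hannibal"}

print(text)
for note in notes:
    print(note["tag"], repr(text[note["start"]:note["end"]]))
```

The plain text can then be shared on its own, while anyone who disagrees with my tagging can swap in a different set of annotations that point at the same character offsets.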
Why not share the materials as PDF? I have just searched the Drupal community, and there are tools to export Drupal pages to PDF. This would be a pretty suitable format.
Thanks for the comment! PDF is actually part of the plan with Open Book Publishers. We’ve always been proud that our stuff is not “just a PDF,” and that it has lots of links and multimedia. But I recognize that people would sometimes like something static, especially to print out. We’ve been much more focused on developing dynamic digital content, and don’t have the skills to typeset well, so the partnership with OBP is really attractive for this reason. They can do the PDFs and do them well.
Have you seen Pandoc? It can read HTML and DOCX (among other things) very well, and can spit out wonderfully clean files in EPUB, InDesign format, you name it. Absolutely brilliant (and designed by a philosophy professor).
The Schmidt article is interesting, but I feel that he is offering a technological solution to a political problem. We make editions the way we do not because it’s the best way of doing it, but because there is almost nobody who publishes peer-reviewed digital editions, and we haven’t built up enough of a critical mass to make it worthwhile to go beyond our print-oriented software. Similarly, TEI isn’t interoperable so much because of the technology (though there are some issues there) as because we simply haven’t agreed on what information needs to be included in a typical digital edition, or come up with a standard way of modelling that information.
Thank you very much, Andrew, I had not seen Pandoc. I will give it a spin! And you are definitely singing my song when it comes to the _human_ aspects of this problem. It’s one thing to design the house and fix the plumbing. Who is going to move in? Peer review is the essential piece, the sine qua non for long-term flourishing, and that’s a big part of what DCC is trying to provide.
I second Pandoc. Have you heard of Markdown? You may find it useful if you want to store texts in plain text format but still need to keep some formatting information. Pandoc can convert to just about anything, including HTML, Word, and PDF via LaTeX.
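For anyone who wants to try the Pandoc route suggested above, here is a minimal Python sketch that wraps the `pandoc` command line. It assumes Pandoc is installed and on the PATH. The file names and the helper names `build_pandoc_command` and `convert` are hypothetical. The format names (`html`, `epub`, `icml` for InDesign, `docx`) are ones Pandoc supports, but how clean the output is for a given Drupal page would need to be checked by hand.

```python
import shutil
import subprocess


def build_pandoc_command(source, target, to_format, from_format="html"):
    """Return the pandoc argument list for one conversion (nothing is run)."""
    return [
        "pandoc",
        "--from", from_format,
        "--to", to_format,
        "--output", target,
        source,
    ]


def convert(source, target, to_format, from_format="html"):
    """Run pandoc for one file; raises if pandoc is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    command = build_pandoc_command(source, target, to_format, from_format)
    subprocess.run(command, check=True)


# Example calls (hypothetical file names), left commented out so that
# nothing runs unless you point them at real files:
# convert("hannibal.html", "hannibal.epub", "epub")   # EPUB for e-readers
# convert("hannibal.html", "hannibal.icml", "icml")   # ICML for InDesign
# convert("hannibal.html", "hannibal.docx", "docx")   # Word, if all else fails
```

Keeping the command builder separate from the function that runs it makes it easy to inspect exactly what will be passed to Pandoc before converting a whole batch of commentaries.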