Monthly Archives: April 2015

Mellon Grant Interim Report 2015

Dickinson College received a $700,000 grant in December 2012 from The Andrew W. Mellon Foundation for use over approximately four years to support faculty and curricular development in the digital humanities. The Mellon Foundation provided project funding to support the following: 1) a one-course reassigned time for the faculty chair of a digital humanities advisory board to guide the initiative; 2) a postdoctoral teaching fellowship to help introduce the latest digital technologies, link Dickinson’s efforts to a larger community of scholars, and assist our Library and Information Systems (LIS) staff in defining needed future capabilities; 3) competitive internal grants for faculty to incubate significant expansion of existing digital projects and/or pilot the use of new tools in teaching and research, including providing student-faculty research opportunities; 4) an intensive program to better train undergraduate students for robust collaboration with faculty on complex digital projects; 5) a virtual “digital studio” to provide accessibility, visibility, and outreach for the best work being done at Dickinson in this field; 6) workshops with representatives of all humanities departments and with key all-college committees to enhance their capacity to support and evaluate digital work in the humanities and across the curriculum; and 7) work toward defining learning outcomes expected for Dickinson students with regard to digital humanities skills.

Here are some excerpts of the report prepared for the Mellon Foundation on activities completed in the second year of the grant, prepared by Chris Francese, Patrick Belk, and Cheryl Kremer:

Digital Humanities Advisory Committee

Over the past year our Digital Humanities Advisory Committee (DHAC), which is the key planning committee for this initiative, continued to meet regularly to guide and oversee all aspects of the project. The committee currently comprises seven faculty members: Chris Francese, Asbury J. Clarke Professor of Classical Studies (Chair); Susan Rose, Director of Community Studies Center and the Charles A. Dana Professor of Sociology; Matthew Pinsker, Associate Professor of History and Pohanka Chair in American Civil War History; Lynn E. Helding, Associate Professor of Music; Gregory Steirer, Assistant Professor of English and Film Studies; Sarah Kersh, Visiting Assistant Professor of English; and Patrick Belk, Postdoctoral Fellow in Digital Humanities. Also serving on the committee are six administrators who have a strong interest in and/or connection to digital humanities work and to this grant project: Patricia Pehlman, Director of Academic Computing; Jim Gerencser, College Archivist; Todd Bryant, Language Technology Specialist; Ryan Burke, Web Development Specialist; Sarah Sheriff, Director of Online Marketing; and Cheryl Kremer, Director of Academic and Foundation Relations.

The chair of DHAC (Professor Chris Francese) receives one course reassigned time each academic year through this grant to coordinate this multi-faceted initiative. Over the past year, he has used his reassigned time to organize and lead the committee’s work, to regularly update a robust portal for Dickinson digital humanities efforts, and to maintain an active blog with news and notes about ongoing DH projects and events.

Postdoctoral Teaching Fellow

Our first Mellon Postdoctoral Fellow in Digital Humanities, Matthew Kochis, accepted a tenure-track position at another institution last fall. We conducted a successful search for his replacement and hired Patrick Belk (Ph.D. in English from the University of Tulsa, 2012) who began at Dickinson in August 2014. Professor Belk organized and executed our second successful Digital Boot Camp in January 2015, established a Textual Studies Lab, collaborated on a number of faculty-led projects, and taught a course called Early Science Fiction in the Magazines (ENGL 101) for 22 students in the spring semester of 2015.

Professor Belk has maintained an active scholarly career during his time at Dickinson.  In February 2015 he delivered the manuscript for his first monograph, Empires of Print: Adventure Fiction in the Magazines 1899-1919, which will appear in print from Ashgate. He also continues to enhance the award-winning digital archive of early twentieth-century pulp magazines, The Pulp Magazines Project, using funding from our Mellon Foundation digital humanities grant to hire students to help him tag and create metadata for his magazine scans in TEI-compliant XML. In November 2014 he delivered a paper at the Modernist Studies Association Conference entitled “Baroness Orczy’s Eldorado (1913) in Africa.”

Our postdoctoral fellow also has been busy helping Dickinson faculty develop their own digital humanities projects and ideas. An excellent example of this work is a project with Jacob Sider Jost, Assistant Professor of English, who is developing a web site entitled “18th Century Poets Connect,” which documents patronage, printing, and literary affiliation networks using data compiled over many years. Professor Belk helped Professor Jost and his student research assistant Mary Naydan ’15 reimagine the possibilities of his data, and he built an elegant interface in Drupal. He also helped faculty in the Departments of History and English, and he forged a new and productive partnership with Associate Professor of Computer Science Grant Braught to help steer senior computer science majors towards collaborative digital humanities projects for their senior theses. On the strength of his work with the Boot Camp and with faculty development, he has been invited to serve as a consultant for Guilford College as they plan to make the most of their own Mellon grant.

Professor Belk also helped to organize a two-day visit by Cliff Wulfman to Dickinson in April 2015. Wulfman is the co-founder of the Center for Digital Humanities at Princeton, which has attracted an extraordinary amount of interest. During his visit to Dickinson he presented on the topic “Thinking Big: Five Steps to Successful Digital Project Development.” Approximately 60 people attended, including representatives from other colleges in our surrounding area (among them Gettysburg, Bucknell, Lafayette, and Franklin and Marshall Colleges).  

Finally, Professor Belk also worked with Professor Francese to submit a proposal to Dickinson’s Space Utilization Committee advocating for a physical space for textual editing distinct from Dickinson’s existing Media Center. As a result, the college was able to commit space until the end of summer 2015 for a Textual Studies Lab, which is currently located in a room within our Waidner-Spahr Library. This space contains three work stations equipped with software for XML editing and OCR processing, and a digitization cradle manufactured by Professor Belk himself.

Digital Humanities Fund

Another major component of our digital humanities grant has been to review and award internal grants to support Dickinson faculty members interested in beginning or advancing their digital humanities efforts. The following is a list of grants awarded to Dickinson faculty through our Digital Humanities Fund since our last report, the vast majority of which have involved students in substantive ways.

Art and Art History

Melinda Schlitt

  • reassigned time to support work on curating images for a multimedia edition of Vergil’s Aeneid in development with Dickinson College Commentaries
  • student assistant to support work on curating images for a multimedia edition of Vergil’s Aeneid

Anthropology/Archaeology

Christofilis Maggidis

  • student summer assistant and rental of 3-D scanning equipment for documentation of finds and architectural remains at Lower Town in Mycenae, Greece

Classical Studies

Chris Francese

  • student summer assistant to gather notes and images for a multimedia digital edition of Vergil’s Aeneid
  • two academic year student assistants, one creating descriptions and metadata for images from an important illustrated Aeneid edition of 1502 in support of the multimedia Aeneid digital edition, and a second for the creation of a Database of Latin Grammar in Caesar’s De Bello Gallico
  • consultant Megan Ayer (Ph.D. Classical Studies, University of Buffalo) to edit and complete the digital version of T.D. Goodell’s School Grammar of Attic Greek (1902)
  • consultant Derek Frymark to complete running vocabulary lists for the whole of Vergil’s Aeneid based on Henry Frieze’s Virgilian Dictionary

English

Jacob Sider Jost

  • student assistant to support “Patronage, Print, and the Economics of Eighteenth-Century Poetry.” The resulting web site is called “18th Century Poets Connect.”

Wendy Moffat

  • student assistant to help create a searchable database for images and documents for her book project, A Disbelief in Obstacles: Three Prophetic Americans and the Great War [Site here.]

Greg Steirer

  • Atlas.ti software licenses for a student and an educational instructor to assist with a book chapter entitled “Bioware and the Politics of Video Game Authorship” (in progress)

French and Italian

Nicoletta Marini-Maio

  • student assistants to help develop the online, open-access peer-reviewed journal project entitled “gender/sexuality/italy”

German

Sarah McGaughey, Sarah Bair (Education), and Todd Bryant

  • three student assistants to help create online language lessons for blended learning to be used with The Mixxer, the Dickinson-based social networking website for connecting students in foreign language courses with native speakers abroad who are studying English
  • travel to CALICO, the Computer Assisted Language Instruction Consortium, on May 6-10, 2014 at Ohio University in Athens, Ohio.

History

Crystal Moten

Emily Pawley

  • reassigned time to help oversee work on the Dickinson history web project [Site here.]

Matthew Pinsker

  • faculty consultant John Osborne to help supervise the development of the main House Divided research engine
  • two student assistants to help develop the multi-media projects and video tutorials for the new Lincoln’s Writings website
  • support for an E-book publication series, videotaped panels/exhibits, and Voice of Lincoln podcasts
  • stipends for advisory board members (David Blight, Catherine Clinton, Eric Foner, Harold Holzer, James Oakes, and Anne Sarah Rubin)

Karl Qualls

  • student summer assistant to support research on Russian immigration to the US in Prince Gagarin, the website: Russian Americans

Political Science/International Studies

Ed Webb and Todd Bryant (Academic Technology)

  • three student summer assistants to work on the creation of two historical simulations in Minecraft, one covering Europe and the Americas in 1492 and the other covering Europe and Africa beginning in 1876.

Sociology

Susan Rose and James Gerencser (College Archives)

  • three summer student assistants to work on digitizing student files for the Carlisle Indian Industrial School project
  • two student assistants for the academic year
  • three student assistants in spring for one week of work in the National Archives
  • consultant Krista Gray to develop Drupal site
  • three student assistants in fall
  • consultant Blair Williams to process ledgers and other bound materials

Digital Boot Camp Program

Our second successful “Digital Boot Camp” was held from January 5 through 16, 2015 to provide training for students interested in working with faculty on digital humanities projects. Eleven students participated: Victoria DeLaney ’17 (English/Spanish), Jackie Goodwin ’17 (Environmental Studies/Sociology), Wesley Lickus ’17 (Environmental Science), Nick Bailey ’16 (International Business & Management), Andrew McGowan ’16 (Biochemistry & Molecular Biology), Harris Risell ’16 (English), Anna Leistikow ’15 (International Studies), Melissa Pesantes ’15 (Italian Studies/Anthropology), Katherine Purington ’15 (Classical Studies), Olivia Wilkins ’15 (Chemistry/Mathematics), and Maurice Royce ’16 (Computer Science).

These students completed online tutorials at home during the week of January 5, 2015 and convened on campus for further instruction and to work on their own projects. Training included ArcGIS, Drupal, XML, and discussions of metadata and other DH principles. Other instructors included Michael D’Aprix, Daniel Plehkov, Leah Orr, and Don Sailer. Most of the student projects represented collaborations with faculty members, academic departments, or student organizations on campus. The projects can be viewed at http://dh.dickinson.edu/belkp/. There also was a well-attended showcase of this work in the HUB Social Hall on campus on January 27, 2015 at which these students had a chance to present and explain their projects.

Digital Studio

The Digital Studio that highlights the many digital projects headed by Dickinson faculty has been expanded to accommodate new projects as they become established. The Dickinson Digital Humanities blog, maintained by Professor Francese, is also very active—with 25 posts since the last report. The majority of these are based on reports submitted by faculty of work carried out with the support of the grant. There have been essays by Dickinson faculty members discussing various aspects of their work and announcements of DH-related campus events. The website also displays guidelines for faculty interested in applying for funding and a definition and discussion of the concepts behind the digital humanities.

Workshops and Defining Learning Outcomes

As explained in our last interim report, Jeffrey McClurken conducted a workshop for our faculty in January 2014 as a first step toward defining learning outcomes for Dickinson students with regard to digital humanities skills. At the conclusion of that workshop, faculty participants were encouraged by the Provost to return to their departments to discuss the possibility of convening smaller departmental workshops to work more intensively on specific learning outcomes relevant to their disciplines. We agreed to provide internal funds (as cost sharing) for several of these follow-up workshops.

Two departments have conducted workshops since our last report (History and Spanish & Portuguese). The Department of History met twice over the summer of 2014 and had very good discussions. They plan to bring in an outside consultant to campus this spring and will provide a final report on their progress later in the year. The Department of Spanish & Portuguese also met twice, in the summer of 2014 and again briefly during winter break. They have developed proposed learning outcomes for two courses in their major. Students taking Spanish 231 will “develop their ability to locate and assess the quality of a range of written and digital sources” and students in Spanish 305 “will learn to annotate a text digitally in closed and collaborative formats. Students will write for various digital platforms with an awareness of audience and scope.”

Plans and Goals for Upcoming Year

Next year we plan to conduct our third Digital Humanities Boot Camp in January 2016, and the Digital Humanities Advisory Committee will continue to solicit proposals and award internal grants to our faculty for digital humanities scholarly projects, professional development activities, and summer and academic-year student collaborators and assistants. 

With regard to technology, we hope to implement an XML database, with the Fedora platform and the Apache Solr search application, for faculty projects that involve the creation of TEI-encoded texts for digital scholarship and research. Professor Francese also plans to guide the establishment of the Textual Studies Lab in its new form under the auspices of the Archives and Special Collections of the college (not using Mellon funding). Finally, the Digital Humanities Advisory Committee also hopes to sponsor a “DHAC-a-thon” modeled on the NEH-funded “Digging into Data Challenge” and Pennsylvania State University’s “HackPSU.” The goal will be to invite teams of two to three undergraduate humanities majors to explore and create visualizations using data spreadsheets provided by our Archives and Special Collections and from other Dickinson projects. A small cash prize will be offered for the best work as judged by DHAC members.

Dickinson remains profoundly grateful to the Mellon Foundation for support of this comprehensive initiative in the digital humanities. As this interim report indicates, the grant continues to harness the creativity of our faculty and students and to open many new opportunities for them to build useful digital humanities resources. We expect to draw on the Mellon Foundation’s generous support to pursue exciting new projects and collaborations in the year ahead.

Cliff Wulfman on Skunks, Shmoos, and the Future of DH

[The following slides and presentation notes are from Cliff Wulfman’s talk, “Thinking Big,” which took place Thursday, April 2, 2015 in Stafford Auditorium on the campus of Dickinson College. The Digital Humanities Advisory Committee thanks Dr. Wulfman for his permission to share them–PSB].

Slide01

I want to thank Chris and Patrick for inviting me to speak with you this afternoon.  I’m a close reader by training and inclination, so I can’t start a talk like this without “problematizing” our terms:

“Successful Digital Humanities Project Development”

Indeed, I’m going to use those terms as the framework for exploring these five steps, though not in syntactic order.

1. DIGITAL: Let’s begin with the term digital, and its verbal derivation, digitize.

Slide03

The term digital is, of course, treacherously polysemous.  It has become a metonym for the discrete values modern computers use to represent information, and so to digitize is to represent information by means of discrete values.  Digital data is simply information stored as ordered sequences of discrete states.  These ordered sequences are often called files or streams, and they come in many varieties, but at the most basic level they are all the same: audio files, image files, text files are all just sequences of bits.
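To make the point concrete, here is a minimal Python sketch (not part of the original talk) showing that a word of text and a few image pixel values reduce to the same kind of thing once stored. The helper name `to_bits` and the sample values are chosen purely for illustration.

```python
def to_bits(data: bytes) -> str:
    """Render a byte sequence as a string of 0s and 1s, eight bits per byte."""
    return " ".join(format(byte, "08b") for byte in data)


# A word of text, stored as UTF-8 bytes.
text_bytes = "Homer".encode("utf-8")

# Three grayscale pixel values (black, mid-gray, white), one byte each.
pixel_bytes = bytes([0, 128, 255])

# Both are printed by the same function: at this level there is no
# difference in kind between "text" and "image", only in interpretation.
print(to_bits(text_bytes))
print(to_bits(pixel_bytes))
```

What distinguishes a text file from an image file is the convention a program applies when it reads the bits back, not the bits themselves.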

So the digital in digital humanities refers to the binary representation of information as bits.  It does not, in other words, connote numerical or mathematical so much as it does symbolic, or semiotic.

Slide4

It is about representability.

So digital humanities is not equivalent to statistical humanities, although the showiest face of digital humanities is the visualization of maps, graphs, and trees derived from the application of social-science methods to texts and to phenomena of interest to historians of various types, literary and otherwise. The rhetorical impact of these visualizations is undeniable, but at bottom they are simply a way of displaying quantitative information, and computation is not equivalent to quantification. Computation also entails the application of procedural logic and heuristics: using an encoded knowledge base and a reasoning algorithm, for example, to diagnose an illness from a set of symptoms.
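As a rough illustration of computation that is logical rather than quantitative, the following Python sketch pairs a tiny encoded knowledge base with a simple reasoning step. The rules are invented for this example and have no medical validity; the names `KNOWLEDGE_BASE` and `diagnose` are likewise hypothetical.

```python
# Toy knowledge base: each condition maps to the symptoms that must all be
# present. These rules are invented for illustration only.
KNOWLEDGE_BASE = {
    "common cold": {"cough", "runny nose"},
    "influenza": {"cough", "fever", "body aches"},
}


def diagnose(symptoms: set[str]) -> list[str]:
    """Return, in alphabetical order, every condition whose required
    symptoms are all among the observed symptoms."""
    return sorted(
        condition
        for condition, required in KNOWLEDGE_BASE.items()
        if required <= symptoms  # subset test: all required symptoms observed
    )


print(diagnose({"cough", "fever", "body aches"}))
```

Nothing here is counted, averaged, or plotted: the program reaches a conclusion by applying rules to symbols, which is the sense of computation the paragraph above has in mind.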

Nor is digital humanities equivalent to making web pages.

Slide5

For scholars in the humanities, in most cases, web sites are akin to publications: they constitute the presentation of research, not the research itself.  So in almost all cases, creating a web site does not constitute a digital humanities project.

At the same time, the World Wide Web has evolved, from a collection of lightly encoded text files linked together by the HTTP data-transfer protocol, into a network of data and services. So creating a trove of carefully prepared data in machine-readable format — a digital edition encoded in the schema of the Text Encoding Initiative, for example, or a biographical dictionary encoded using the standards of linked open data — does constitute a digital humanities project.
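A small sketch may help show what “machine-readable” buys you. The fragment below is a simplified, TEI-style encoding of the first two lines of Vergil’s Aeneid (a complete TEI document would also need a header), and the Python function reads the verse lines back out by line number. The sample fragment and the function name `verse_lines` are assumptions made for illustration, not part of any project described here.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, bound to a prefix for use in search paths.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified fragment: a line group (<lg>) containing numbered lines (<l>).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <lg type="verse">
        <l n="1">Arma virumque cano, Troiae qui primus ab oris</l>
        <l n="2">Italiam fato profugus Laviniaque venit</l>
      </lg>
    </body>
  </text>
</TEI>"""


def verse_lines(tei_xml: str) -> dict[int, str]:
    """Map each verse line number to its text."""
    root = ET.fromstring(tei_xml)
    return {
        int(line.get("n")): (line.text or "").strip()
        for line in root.iterfind(".//tei:l", TEI_NS)
    }


for number, text in verse_lines(SAMPLE).items():
    print(number, text)
```

Because the line structure is explicit in the encoding, the same data could feed a reading interface, a concordance, or a metrical analysis without being re-keyed.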

So the first step to successful digital humanities project development is understanding what it means for something to be digital.

Slide06

2. PROJECT: Next: Defining a project.

Slide07

As a researcher, you may already have disciplinary knowledge and traditional practice guiding and constraining your conception and realization of a project. What makes a scholarly or academic project a digital humanities project?

Defining a project isn’t always straightforward in the humanities.

Slide8

These endeavors are not always product-oriented; even when they are, the product is frequently intangible: an idea; an argument; an analysis; a method; a critique; etc. I’m leaving aside articles and monographs as direct products of research: they are secondary instruments of dissemination.

Sometimes there is a tangible product, though: editions; transcriptions; databases; instruments for research and analysis.

When thinking in terms of a project, then, it is important to learn to think strategically:

Slide09

Think about the outcomes you want to achieve, and why they are important: what will the consequences of this work be?

Think about the resources your work will require. Particular materials, in particular forms? Tools for accomplishing specific tasks?  Whose time and attention will you be drawing upon, and for how long?

How difficult is your project? What are the risk factors: what sorts of things might go wrong, what sorts of events might interfere with the successful completion of your project? What are your contingency plans? Can your project produce partial successes, or is it all or nothing? (Not a good idea.)

Try to organize your project into phases, each of which has its own success criteria, and each of which builds on the preceding phases.

If it sounds like I’m telling you to learn to think like an engineer, I am.

3. HUMANITIES:

Slide10

Earlier, I talked about what it means for something to be digital. Chiseling a definition of the term digital is easy; sharpening the meaning of the term humanities is much, much more difficult – so difficult and contentious, in fact, that I’m not going to address it directly at all, other than to suggest it has more to do with subject-matter than method.  Instead, just as I have tried to complicate the popular conflation of digital humanities with social science, I want to take this opportunity to distinguish digital humanities from digital librarianship.  Once again, these endeavors often overlap significantly, but they are different.

From one perspective, a library is a hoard of physical artifacts whose principal function is to be looked at. Seen from that perspective, digitization is an image-making activity: rendering surfaces on which drawings and inscriptions appear into sequences of bits that a computer can use to produce a reflection of that surface. From another perspective, a library is a gathering of texts whose principal function is to be read. From this perspective digitization is a linguistic activity: rendering words or other symbols into sequences of bits that a computer can use to create linguistic symbols that can be analyzed and compared.

It is the scholar’s privilege to regard the library from the latter perspective; it is the librarian’s burden to view it from the former, and in large measure the job of libraries is conservative digital photo-duplication: not creating a digital library so much as digitizing an existing one.

Thus the work of the digital scholar depends on that of the digital librarian, and in some aspects overlaps considerably with it, but it is not the same work. Likewise the work of the information scientist; the software engineer; the computer scientist (all different sorts of work, often done by different people).

This is part of the reason the digital humanities are so often hyped as being collaborative: quite often, work in DH requires knowledge and expertise from a variety of fields.  By bringing in many different perspectives you necessarily get many different priorities, points of view, cutting across different traditional academic disciplines, but focusing on humanities questions.

So, step three in developing a successful digital humanities project is to conceptualize your work in the context of an interdisciplinary framework of humanistic endeavor.

Slide11

4. SUCCESSFUL: Defining success isn’t always straightforward in the humanities, and in research in general.

Slide12

I’m going to hazard the following measure of a good DH Project:

“a good DH project uses domain knowledge and intellectual labor to create digital objects that can be curated and shared with others through standard formats and services.”

That last criterion (accessibility) strongly implicates the World Wide Web, but it needn’t always. And it certainly doesn’t necessitate a whizzy web site.

Slide13

But defining success is a useful discipline nonetheless. For one thing, it can help you focus your work by articulating specific outcomes you want to achieve.

What specific goals do you expect to meet with this work?  A full and compelling argument?  An insightful biography?  A meticulous accounting of an event, or an object, or an archive?  If there are products of your work, what are they?  On what basis can you or others evaluate their quality, their success or failure?

Of course, this kind of outcome-orientation isn’t appropriate at all stages of research, but the point at which you can articulate goals and deliverables is the point at which research becomes a project.

Slide14

Defining successful outcomes also helps to organize time and effort.  Most of us know the value of setting intermediate goals and deadlines; organizing these around success criteria can help make them realistic.

Let me give you some examples (this is a highly opinionated list) of “Bad (or Meh) DH Projects”:

Slide16

Slide17

Slide18

Slide19

Now another, equally opinionated, list of “Good (or Exemplary) DH Projects”:

Slide20

The Text Creation Partnership to improve the OCR of 18th century typography is a good DH project.  Good DH projects are those whose products or outcomes can be used in multiple ways by others.

EXEMPLARY PROJECTS

The Valley of the Shadow is one of the first digital humanities projects.

Slide22

Begun in 1993 by Ed Ayers and Will Thomas at UVA, it is an electronic archive of two communities in the American Civil War–Augusta County, Virginia, and Franklin County, Pennsylvania. The Valley Web site includes encoded, searchable newspapers, population census data, agricultural census data, manufacturing census data, slave-owner census data, and tax records. The Valley Web site also contains letters and diaries, images, maps, church records, and military rosters.

What makes it particularly important, to my mind, is that it was designed not as a showcase but as a working research tool.

Ayers and Thomas published a web-based hypertext article that explicitly uses hypertext and full-text encoded archival material to make an argument.

The Shelley-Godwin Archive is another exemplary archival project.

Slide23

It features transcriptions of manuscripts that are deeply encoded to allow users to study the composition history of the materials.

Mapping the Republic of Letters is another.

Slide24

Based at Stanford, this project gathers metadata about the networks of correspondence among the luminaries of the Age of Enlightenment and uses it to produce wonderful visualizations of them.

5. DEVELOPMENT: So how do you go about doing this? How do you develop a DH project?

Slide25

Talk with people.

We’ve already talked about the almost inherently collaborative nature of the digital humanities.  There simply is not (not yet, anyway) a strong, documented track record of digital humanities methods and approaches; they are in any case highly interdisciplinary and under rapid evolution.

The proliferation of DH centers at universities testifies to the anxiety on the part of researchers to acquire new competencies as part of their academic work.  So seek out others in your field who have already had some experience, and ask them how they did it; seek out colleagues in other fields to talk with you about methodologies and approaches.

Climb the steep hill.

Slide27

This is really important. Ask yourself if you are willing to take the time to learn something new, different, and possibly outside your comfort zone.

Be prepared to acquire a more than superficial understanding of computational practices and methods.  Not that you have to become a master programmer; but you should understand the fundamentals of programming and computer science: data structures and algorithms; inputs and outputs.

Just as you would not undertake a professional study of Homer without learning Greek, learn the language of computer engineering: how could I represent the objects of my study in machine-readable forms? Can I develop models of things and events? How might I manipulate those representations? Could I describe procedures, techniques, tricks for analyzing them, generating them, enhancing them, expressing them in different forms?

Deploy project-oriented thinking.

Slide28

In developing your project, employ the project-oriented strategic thinking we discussed earlier:  Try to lay out your project as a series of incremental steps and accomplishments.

Be flexible.

Unless your project is very straightforward and extremely well defined, it is likely to change in response to external events (funding, personnel) and internal evolution (discoveries made in the course of the project).

But, don’t just go chasing rabbits down the rabbit-hole. It’s very tempting to let the scope of your project expand over time as you learn about new things, see someone’s nifty tool, and so on.

Scope creep founders projects.

At the same time, though, don’t hobble your imagination or your ambition based on what you can see from here, today.

Don’t be afraid to think big.

Slide29

Let me share with you a little thought experiment.  A few months ago I was asked to speak on a conference panel entitled “Modernism and Big Data.”

The so-called “digital humanities” are at this early stage of engagement as much a series of considered poses, or deliberative positions, as anything else.  So to hold a panel on “Modernism and Big Data” was to propose a consideration of “Humanism as Big Science,” to position ourselves, to imagine ourselves, as big scientists asking big questions, knowing all the while that we were “playing pretend”.

In what follows, I am going to pretend that the collective textual remnants of the late 19th and early 20th centuries have all been processed into a machine-readable textual corpus. We don’t have it now, but it is not so far-fetched to imagine that we will be able to capture a significant portion of the written record, at least that portion already under institutional control in libraries and archives. It wasn’t all that long ago that the Google Books project seemed absolutely preposterous.

And besides, we’re just playing.

Slide30

Big Science asks big questions, such as “what is the nature of matter?”  The enormity of the question and the value of obtaining an answer (both practical value and intellectual value) drive research, collaboration, funding — they provide the energy that turns the wheels of research.

Perhaps, in this big-science fantasy we’re indulging ourselves in for the moment, we can imagine what such a Big Question might be, and speculate on what sort of engine posing it might awaken.  In our context I can imagine no bigger question than Raymond Williams’ question, “When was Modernism?”

This seems a reasonable — and somewhat preposterous — Big Question to start with.  But we could just as easily ask something just as grandiose, like “WHAT was Modernism?”  — answering which is a precondition to answering the “When?” question, or “WHERE was Modernism?”

Slide31

These questions share the playful, tantalizing precision of Virginia Woolf’s famous aphorism from “Mr. Bennett and Mrs. Brown.”

Less often quoted is her qualification. Nevertheless, let’s succumb to temptation and take Woolf’s assertion at face value.  How would we go about proving or disproving her hypothesis? Could the immensity of Big Data help us, and if so, how?

So, in Woolf’s spirit, and since one must be arbitrary, let us call our Big Science endeavor…

Slide32

We’re talking Big Science here – REALLY BIG – like the Manhattan Project, or the search for the Higgs boson. So let’s keep playing dress-up and imagine an alternative reality where the Institutions of Power actually thought these questions were as important as finding out whether a subatomic particle actually exists or not, or how to blow up the planet. That is, we would have access to REALLY BIG RESOURCES, with really big expectations.

What would it mean for us, institutionally and professionally, to address ourselves collectively to answering such a question?  What would happen to the current models of promotion and tenure, department composition, teaching, publication? Who would have to be involved?

We would inevitably want some Theorists.

Slide33

We want to describe a state change: for some definition of human character, we want to be able to say that before some point (the “December 1910 Moment”), human character was in state H and after that point it was in state H′.

We might then call Modernism a function which, when applied to human character H, transforms it into H′.
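Written out, purely as illustrative notation (the symbols $C$, $M$, and $t_0$ are assumptions introduced here, not a settled formalism), the state change and the function might look like this, with $t_0$ standing for the December 1910 Moment:

```latex
\[
C(t) =
\begin{cases}
H,  & t < t_0 \\
H', & t \ge t_0
\end{cases}
\qquad\text{where}\qquad H' = M(H)
\]
```

Here $C(t)$ is human character at time $t$ and $M$ is the Modernism function. Nothing in this notation says what $M$ actually does; it only names the thing to be explained.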

As with so much theory, the discussion quickly becomes highly arcane.  So I’m going to leave the theorists to do their thing for the moment and turn to the Empiricists.

Slide34

They’re the ones who get to play with the big toys, the big machines, the big data. Sometimes they get to play pirate, or skunks – more about that in a minute.  Theirs is the linear accelerator model: you build a ginormous machine that produces humungous amounts of data, which you then search for traces. The ginormous machine is history, which has left a humungous data trail of artifacts and documents in its wake.

How might the Empiricists use that Big Data to locate the December 1910 Moment?

Well, statistical topic modeling seems pretty tantalizing. If Woolf’s hypothesis is correct, we should expect to find topics after the December 1910 moment that do not exist before that moment. But the simple existence of the moment doesn’t explain what caused the change: that is, it doesn’t explain what the Modernism function is.  That’s the problem with History: it isn’t testable. You can’t change the factors in some equation and re-run events to see how the factors affect them.
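To make the test concrete, here is a minimal sketch in Python. It assumes the hard part is already done: some topic-modeling step has labeled each dated document with a set of topics. The function name `topics_new_after`, the cutoff date, and the toy data are all hypothetical, invented for illustration rather than drawn from any real corpus.

```python
from datetime import date

def topics_new_after(docs, cutoff):
    """Return topic labels seen on or after `cutoff` but never before it.

    `docs` is an iterable of (publication_date, set_of_topic_labels) pairs.
    The labels are assumed to come from an earlier topic-modeling step.
    """
    before, after = set(), set()
    for published, topics in docs:
        # Sort each document's topics into the pre- or post-cutoff pool.
        (before if published < cutoff else after).update(topics)
    return after - before

# Toy, invented data: not findings about any actual archive.
docs = [
    (date(1909, 5, 1), {"empire", "manners"}),
    (date(1910, 11, 20), {"manners", "suffrage"}),
    (date(1911, 3, 2), {"suffrage", "post-impressionism"}),
    (date(1912, 7, 9), {"manners", "machine"}),
]
new_topics = topics_new_after(docs, date(1910, 12, 1))
print(sorted(new_topics))
```

Even in the toy version the limitation is visible: the set difference shows that something is new after the cutoff, not why.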

Slide35

The Empiricists include scholars like Greg Crane, who asks what you do with a million books; Brewster Kahle of the Internet Archive, who asks us to imagine capturing the entire human record in digital form; and Stephen Ramsay, who articulates the Screwmeneutical Imperative to subvert the academic orthodoxies and ideologies of method and form an anarchic version of The December 1910 Project, a “community of practice” that valorizes Roland Barthes’s playful writerly text.

Now, right about now you’re maybe getting a little tired of playing dress-up. But before we pooh-pooh these visionary questions, let’s recall the remarkable thing Google did with its Google Books project. Sure: it isn’t perfect, and it leaves lots of things out, and its texts are really, really dirty.

But this is how *big* works.  It isn’t small acts of perfection: perfectly crafted editions, for example.  Big works through iterative refinement, each iteration changing the state of things in such a way as to open opportunities for further refinement.  Unattended OCR, the holy grail: a machine that can read printed text as well as a trained human being.  We don’t have it yet, so today the results of unattended OCR are dirty.

But OCR algorithms continue to improve (need citations). In fact, the principal value of generation X digitization projects like the Google Books project is the *page capture*.  If those pages were photographed well, the OCR can always be re-run, and over time the cost of processing and re-processing will decline.

So we must develop research methods that tolerate noise, while at the same time anticipating improvements in the accuracy of text recognition.
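As one small illustration of what tolerating noise can mean in practice, here is a sketch that searches OCR output for a term by similarity rather than exact match, using `difflib` from Python's standard library. The function name `fuzzy_hits`, the 0.8 threshold, and the sample tokens are assumptions chosen for the example, not a recommended setting.

```python
from difflib import SequenceMatcher

def fuzzy_hits(term, tokens, threshold=0.8):
    """Return tokens whose similarity to `term` meets the threshold.

    Similarity is SequenceMatcher's ratio, a value between 0 and 1.
    """
    term = term.lower()
    return [
        t for t in tokens
        if SequenceMatcher(None, term, t.lower()).ratio() >= threshold
    ]

# Invented tokens imitating common OCR confusions (rn for m, 5 for s).
ocr_tokens = ["rnodernism", "moderni5m", "modest", "mountain"]
hits = fuzzy_hits("modernism", ocr_tokens)
print(hits)
```

A looser threshold catches more garbled forms but also more false positives, so the setting has to be tuned against the corpus at hand.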

Slide36

The larger message I’m trying to convey is this one.  The most valuable part of the December 1910 Project is the social and institutional infrastructure that supports, promotes, protects, and preserves human effort.  Put your emphasis on the stuff that machines need but can’t do. The most expensive, most valuable part of digital humanities work is the work done by trained human beings.  That’s the work that can’t be re-processed cheaply, no matter how little you pay graduate students.  Don’t treat it lightly! Don’t stick it in a Word document and forget about it.  Spend some time thinking about the best ways to capture that intellectual work so that it can be re-used in today’s scholarly world: that may not be a verbal argument published in a scholarly monograph, but a data set – a formal marshalling of evidence – represented in a way that can be taken up by reasoning machines as well as reasoning people.

Don’t become slaves to the machine: hack the machine, or partner with people who can. Make the machine work for you by giving it information it can use.

Give it highly crafted, machine-actionable metadata: not just the usual library metadata – names, titles, dates of publication and so on.

Slide37

We will need granular structured analyses of complex pages, like those in newspapers and magazines.  Not slabs of undifferentiated text, but pages that have been decomposed into their structural regions, and multi-page articles that have been joined together into discrete wholes. Much of this work can now be automated, but it still needs human assistance.
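A rough sketch of what such a decomposition might look like as data, in Python. The `Region` class, its field names, and the sample text are hypothetical; real page-analysis formats are considerably richer. The point is only that once each region carries an article identifier, joining a multi-page article is mechanical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    page: int        # page number within the issue
    kind: str        # e.g. "headline", "body", "advertisement"
    article_id: str  # hypothetical identifier assigned during analysis
    text: str

def join_articles(regions):
    """Group regions by article, in page order, and join their text."""
    articles = {}
    # sorted() is stable, so regions on the same page keep their given order.
    for r in sorted(regions, key=lambda r: r.page):
        articles.setdefault(r.article_id, []).append(r.text)
    return {aid: " ".join(parts) for aid, parts in articles.items()}

# Invented sample: one article split across pages 1 and 3, plus an advertisement.
regions = [
    Region(3, "body", "a1", "continued from the first page."),
    Region(1, "headline", "a1", "An Example Headline"),
    Region(1, "body", "a1", "The article begins here and is"),
    Region(1, "advertisement", "ad1", "Buy soap."),
]
articles = join_articles(regions)
print(articles["a1"])
```

Assigning the `article_id` values is the part that still needs human assistance.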

Give the machine descriptions of nuanced relations and assertions that it can read.

Slide38

Statements in first-order predicate logic are a start.  Here is a portion of a graph describing the publication of Bayard Boyesen’s “Lake” in the first issue of Broom, a description that captures the complex relationships among abstract entities (“the magazine Broom”, “a poem called ‘Lake’”) and concrete realities – a copy of the first issue of Broom, housed in Firestone Library, and a set of electronic files that embody various representations of it. These sorts of assertions – encoded in some sort of standard schema, like RDF – are the raw material of the knowledge base the so-called “semantic web” promises to become. There are lots of problems with the semantic web, just as there are problems with Google Books, but it is for now by far the best place to start putting our scholarly effort.
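For readers who have not seen such assertions written down, here is a minimal sketch in plain Python of the subject-predicate-object shape that RDF uses. It does not reproduce the graph on the slide: every identifier and predicate name below (the `ex:` prefix, `ex:publishedIn`, `ex:exemplarOf`, and so on) is a hypothetical placeholder, and a real project would use an RDF library and an established vocabulary.

```python
# Each assertion is a (subject, predicate, object) triple.
# All identifiers and predicate names are hypothetical placeholders.
triples = {
    ("ex:Lake", "ex:type", "ex:Poem"),
    ("ex:Lake", "ex:publishedIn", "ex:Broom_issue1"),
    ("ex:Broom_issue1", "ex:issueOf", "ex:Broom"),
    ("ex:copy_Firestone", "ex:exemplarOf", "ex:Broom_issue1"),
    ("ex:copy_Firestone", "ex:heldBy", "ex:FirestoneLibrary"),
    ("ex:scan_files", "ex:reproduces", "ex:copy_Firestone"),
}

def match(pattern, store):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

# What is asserted about the poem?  Which physical copies embody the issue?
about_lake = match(("ex:Lake", None, None), triples)
copies = match((None, "ex:exemplarOf", "ex:Broom_issue1"), triples)
print(about_lake)
print(copies)
```

The abstract work, the issue, the physical copy, and the digital files are separate nodes, so a machine can follow the chain from poem to scan without confusing one with another.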

Slide39

I want to conclude with a nod to three pioneers of computer science, Vannevar Bush, Douglas Engelbart, and J. C. R. Licklider. At the dawn of the computer age, these men, all three engineers and administrators, each had a vision of the computer that was profoundly humanistic.  Bush’s Memex, often cited as the precursor to the World Wide Web, was a machine that enabled people to link and track the vastness of human knowledge more efficiently.

Doug Engelbart, inventor of the mouse and a variety of other ground-breaking technologies, saw in computers the possibility of augmenting the human intellect.

J. C. R. Licklider, who directed the Information Processing Techniques Office at the Defense Department’s Advanced Research Projects Agency, from which the Internet sprang, envisioned a “man-computer symbiosis” in which humans and machines partner to extend the reach of human thinking and decision-making.

For each of them, the computer was not an enormous calculating machine, but an empowering system that people could engage to increase the store of human knowledge. If you can develop projects that participate in, extend, and augment this vision, they will indeed be successful digital humanities projects.

Which brings us to skunks.

Slide40

I read with great pleasure and sympathy Bethany Nowviskie’s blog post entitled ‘a skunk in the library’.  Nowviskie traces the term to Lockheed (now Lockheed Martin) in the 1940s, where it was used to describe a “rogue team” of engineers who functioned outside the usual corporate culture in order to accomplish special things, and she applies it to the Scholars’ Lab at UVa, which she directs.

Nowviskie mentions parenthetically that the engineers took the term “skunkworks” from Al Capp’s Li’l Abner, but she doesn’t pursue the allusion, staying with the meaning that has evolved from the Lockheed Martin appropriation: a group of elite creatives who get special license to do wonderful, innovative things.  Following this etymology, those creative people are the skunks.  And who wouldn’t want to be a skunk?  These skunks are like the kids in the Gifted and Talented program: they may be misfits, some of them, but they’re precious and special, and they smell bad only to Department Chairs, who don’t savor liberty and innovation.

The thing is, that’s not how things were in the hillbilly hamlet of Dogpatch, and I want to conclude with that.  (I also want to claim the right to use the term “hillbilly”, as I was born and bred in West Virginia and am proud to be called one.)

In the world of Li’l Abner, the “Skonk Works” was a toxic chemical factory on the outskirts of Dogpatch, where the lone operator, “Big Barnsmell,” crafted a mysterious concoction called ‘skonk oil’ by brewing dead skunks and old shoes in a still.  Dozens of Dogpatch residents died every year of the toxic fumes.

According to Ben Rich, the second director of the Lockheed Martin skunk works, the group got its name because the original facility was located next to a toxic-smelling plastics factory and one of the engineers likened their own secretive operation to the factory in the Al Capp cartoon.

Slide41

So there are several things to think about here.  First, the skunks aren’t in charge.  They aren’t the workers in the “Skonk Works”; they are the raw material.  Second, the work of the skunk works isn’t benign “creative innovation”; it is industrial pollution.  Nowviskie acknowledges the unease occasioned by use of the term “skunkworks”: “there’s a level of honesty and self-awareness involved in not calling them snuggly bunnies.”

There’s a larger story here about papering over the toxic effects of the digital revolution, literally, as in the waste byproducts of microchip manufacture, and figuratively in the effects of automation on an underclass of workers (the denizens of Dogpatch) and the fact that the Lockheed Martin operation designed war planes.  These bunnies are not snuggly at all, and they aren’t even amusingly off-beat: they are fodder for a noxious process of commodification.

I’m afraid that to expect academia to work like Lockheed Martin, or like Silicon Valley start-ups, or even like a forward-looking library, is naïve. From what I’ve seen, the skunks are the graduate students, the adjuncts, and the alt-acs who do the work but don’t get the credit; who build the intellectual playgrounds Steve Ramsay describes but aren’t allowed inside.  To call them skunks is to give them a roguish tang; in fact, they risk becoming that other legendary Al Capp creature …

Slide41

The Shmoo, which exists to be a commodity: delicious to eat, and eager to be eaten.

The Digital Humanities, Big Data: these highfalutin terms promise much, and we can fantasize about the opportunities they open up, the roles they may let us play, the discoveries they may enable. But let’s not allow our dress-up fantasies to become wish-fulfillment. Higher Education is in crisis; intellectualism is in decline; graduate education is in a death spiral. Let’s not pretend that DH is going to solve all these problems: even more, let’s not let DH become part of the problem.

Thank you.