Curating with EarlyPrint
For the fourth week, we had a chance to partner with Joe Lowenstein and Doug Knox, who are working on a project to correct errors in the digitized images and transcriptions of a massive library of English-language books published between 1500 and 1700. The goal of the TCP, or Text Creation Partnership, is to produce a digital corpus that algorithms can search in order to examine changes in English writing and publishing over long periods of time.
Our part in the project was to curate the errors in a single text, correcting passages that the original transcribers had marked as particularly difficult to read. As I was doing this, I began to think about how my impulses as a historian interacted with my impulses as a digital humanist.
One correction in particular sparked this tension: a single currency symbol.

It’s clearly a denomination of money, but whether it is a P for pence or an L for pounds is unclear. My historian brain wants to research the letter forms and make a historically informed guess. My digital-humanist brain, on the other hand, acknowledges that this approach is slow, and that exactly which symbol it is will not have a major effect on how this text is read from a distance.
The digital humanist in me acknowledges that having non-English speakers transcribe what they see (which is how the TCP was originally curated) and then having an algorithm regularize the spellings is likely the most efficient way to get down to brass tacks and begin the ‘real’ work of corpus analysis. But is there no benefit in the doing? Is the process of correction not an interesting or useful endeavor in itself? With a body of work this large, I’m tempted to say no. Learning more about the print literature of the period is an interesting and historically important practice, but preparing texts for distant reading doesn’t necessarily require it. In most cases, even 98-99% accuracy is enough for the data to do what it needs to do, and to make the most effective use of the person-power available, close reading of texts is best left for other times and places.