Reading the Unreadable

When considering Digital Humanities, the "Digital" side of things is often the focus. Adam Hammond spends a good deal of time defining the discipline (circa fifteen years ago), and most of the words he expends on it surround a justification of computerized tools to examine literature. Everyone from Franco Moretti (distant reading) to Matthew K. Gold (general) to Katherine Hayles (posthumanism) to Nicholas Carr (hyper reading) to modern AI scholarship attempts both to define DH and ultimately to justify its application.

In short, the case for DH usually takes a three-step approach as it battles against established traditions regarding what constitutes close reading and humanities scholarship. Step one defines an issue that occurs within the space of the humanities (topic or content), with an accompanying hypothesis. Step two establishes a digital toolset to examine that hypothesis, leading to a conclusion. Step three - the unusual step - then attempts to justify why the digital tool was necessary in the first place, which always seems to put DH in the unfortunate position of confirming, at all times, why it is a relevant discipline.

I've noted in previous work that the schism between "humanities" and "digital humanities" occurred for seemingly no real reason other than to appeal to traditional notions of what humanities scholarship "should be" about fifty years ago. Most, if not all, other disciplines had no such schism. Math didn't bifurcate into "Math" and "Digital Math" at the invention of the calculator or the computer, for example. If humanities scholarship had embraced the tools of the computational linguist, the data scientist, and the AI engineer from the outset, then we wouldn't be in the situation where we must necessarily both define and justify our discipline while performing quality research.

Alas, we are here.

To combat this, rather than throw another justification for DH at the wall, it might behoove us to see how DH operates through a short example project - something that combines the digital (AI, cybersecurity, image recognition) with the humanities (history, library sciences) and allows us to take a glimpse into the culture of the past by reading a text to which we've had access for a very long time, but which we could never before read.

Let's talk about ciphers.

Crime of the Century

June 28th, 1916. The Lima, Ohio Western Ohio Ticketing Office is robbed. Harvey Shaw, the ticket agent, professes to journalists that the robbery was conducted so calmly and meticulously that it must have been either an inside job or the work of a professional. The thief "Walk[ed] calmly out of [the] office and disappear[ed] on Elizabeth street." Other witnesses claim to have seen the thief, but despite setting up a canvass and perimeter, Lima PD were unable to apprehend the suspect.

"Bold Robber Makes Get-A-Way With W.O. Cash Box"
Armed robber steals cash box

The thief, it seems, made a clean getaway, or, as the article title suggests in fantastic early twentieth century lingo, a clean "get-a-way." The robbery was reported in several nearby newspapers, including the Marion Daily Star (6/28/16), The Sandusky Star Journal (6/29/16), and the Lancaster Daily Gazette (6/29/16).

"W.O. Robber Still Eludes the Police"

After a few follow-up articles in the intervening days updating readers on the still-elusive thief, something interesting happens in the Times-Democrat. Ten days later, on 8 July 1916, the paper publishes a strange proclamation accompanied by a mysterious ciphertext. The appearance of a ciphertext in and of itself is not altogether strange, as basic ciphers often appear alongside crossword puzzles and (more recently) Sudoku puzzles to give newspaper readers a sense of engagement and a bit of a mind bender. The strange proclamation, however, identifies this as more than a mere reader diversion.

"The police department of Lima, O., is greatly puzzled over a cryptic message received in connection with the robbery of a Western Ohio ticket agent." This statement, found in the Times-Democrat when the cipher is published, and which then goes on to ask the public for assistance in decrypting the strange text, makes clear the connection between the cipher and an element of crime - which, given the temporal and proximal significance, in all likelihood is the W.O. Robbery.

"Mysterious Cypher is Puzzle to Everybody"

"At the request of a citizen of Lima," the article states, "we publish a note written in cypher. As it is of the utmost importance that the contents of the note be ascertained. Any suggestions by readers of the paper which will in any way assist in learning the contents of the note will be greately [sic] appreciated."

"Mysterious Cipher is Still Unsolved"

As some follow-up articles make clear, including one published the next day with a clearer rendering of the cipher text itself, a number of readers tried their hands at cracking the cipher text for the police. One article even includes a "substitution matrix" worked out by a reader, which, unfortunately, doesn't reveal the correct translation of the text.

From here, appearances in the historical record vanish. Aside from a minor mention elsewhere in the paper, which is far more concerned with the topical news of the day, the cipher and the event fade from memory less than a month after occurring.

One potential reason for this rapid disappearance may be the perception that the entire event was an elaborate hoax. The 4 July 1916 edition of the Coshocton Morning news notes: "The Eastern Puzzlers League, organized in 1883 for the construction, solving, and exploitation of enigmas, met [in Warren, OH] today for its semi-annual convention." A mere 200 miles, or a three-hour drive, from Lima - in other words, far enough to not make the Lima news, but not so far as to be unreachable - a Puzzle Maker/Solver National Semi-Annual Convention was held. It doesn't seem to be common for such conventions to publish their findings in newspapers, but the temporal relevance can't be ignored.

Or can it?

The only way to know for sure is to solve the cryptogram itself, which, perhaps owing to its obscurity, hasn't been solved in over one hundred years.

Enter Digital Humanities

According to the Times-Democrat, here is the full text of the cryptogram:

WAS NVLVLAFT BY AAKAT TXPXSCK UPBK TXPHN OHAY YBTX CPT MXHG WAE SXFP ZAVFZ ACK THERE FIRST TXLK WEEK WAYX AZ WITH THX

At first examination, there seem to be snippets of plain text (PT) mixed with cipher text (CT). Words like "was," "by," "there," "first," "week," and "with" all readily jump off the page. Solving the cryptogram should, therefore, be fairly straightforward. As you'll soon see, it is anything but.

We begin our cryptographic analysis by first using artificial intelligence - in this case, a neural cipher identifier, which looks for patterns within text (digrams and trigrams, as well as letter frequency, total length of text, and so forth) to determine the cipher type. For those who've never dabbled in this kind of research before, there are many cipher applications, usually in cybersecurity and military spheres. However, for the context of our cipher - a newspaper print from 1916 Ohio - such applications are an impossibility. I used three disparate neural cipher identifiers and compared their outputs to find a consensus cipher type - in this case, a monoalphabetic substitution cipher known as either "Patristocrat" or "Aristocrat," an old, fairly common, and fairly low security cipher.
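As a rough illustration of the kind of statistic such identifiers weigh, consider the index of coincidence: the chance that two letters picked at random from a text are the same. Ordinary English, and therefore any monoalphabetic substitution of it, tends to score near 0.067, while uniformly random letters score near 0.038. The Python sketch below is a toy under stated assumptions, not the neural identifiers used for this project; the function name is hypothetical, and the ciphertext is the one printed above.

```python
from collections import Counter

# The cryptogram as transcribed from the Times-Democrat.
CIPHERTEXT = (
    "WAS NVLVLAFT BY AAKAT TXPXSCK UPBK TXPHN OHAY YBTX CPT MXHG WAE "
    "SXFP ZAVFZ ACK THERE FIRST TXLK WEEK WAYX AZ WITH THX"
)


def index_of_coincidence(text):
    """Probability that two letters drawn at random from text match."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    # Matching ordered pairs divided by all ordered pairs.
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Letter frequencies and the index of coincidence for the cryptogram.
frequencies = Counter(ch for ch in CIPHERTEXT if ch.isalpha())
print(frequencies.most_common(5))
print(round(index_of_coincidence(CIPHERTEXT), 4))
```

A score much closer to 0.067 than to 0.038 is consistent with a simple substitution cipher, though a text this short gives only a noisy estimate.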

The neural identifiers suggested other cipher types as well, but I was able to eliminate them from the list of potentials thanks to their age - all were created after the cipher was published. Already we begin to see the impact of humanities research (history) in sharpening the focus of the digital, and shortly we'll see the opposite.

Armed with the potential cipher types - but having no crib or key - I turned to our next DH tool, a brute-force program called CryptoCrack. Over many hours of GPU time, I ran dictionary attacks, hill climbing, and ultimately brute-force searches across K1-K4 keys to see if the cipher yielded results.
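For readers unfamiliar with hill climbing, the idea can be sketched in a few lines of Python: start from a random key, swap two letters, keep the swap if the decrypted text looks at least as English-like as before, and repeat. This is a minimal sketch and not CryptoCrack's actual algorithm. The scoring table (a handful of common English trigrams), the function names, and the step count are all assumptions chosen for illustration; real solvers score against large n-gram tables and restart many times.

```python
import random
import string

ALPHABET = string.ascii_uppercase

# The cryptogram as transcribed from the Times-Democrat.
CIPHERTEXT = (
    "WAS NVLVLAFT BY AAKAT TXPXSCK UPBK TXPHN OHAY YBTX CPT MXHG WAE "
    "SXFP ZAVFZ ACK THERE FIRST TXLK WEEK WAYX AZ WITH THX"
)

# Assumption: a tiny, illustrative list of common English trigrams.
COMMON_TRIGRAMS = (
    "THE", "AND", "ING", "ENT", "ION", "HER", "FOR", "THA",
    "NTH", "INT", "ERE", "TIO", "TER", "EST", "ERS",
)


def decrypt(ciphertext, key):
    """Map each cipher letter key[i] back to the i-th plaintext letter."""
    return ciphertext.translate(str.maketrans(key, ALPHABET))


def score(text):
    """Count occurrences of the common trigrams, ignoring word breaks."""
    letters = "".join(ch for ch in text if ch.isalpha())
    return sum(letters.count(tri) for tri in COMMON_TRIGRAMS)


def hill_climb(ciphertext, steps=2000, seed=0):
    """Swap two key letters at a time, keeping swaps that do not lower the score."""
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)
    best = score(decrypt(ciphertext, "".join(key)))
    for _ in range(steps):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]
        trial = score(decrypt(ciphertext, "".join(key)))
        if trial >= best:
            best = trial  # keep the swap (ties allowed so the search can drift)
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return "".join(key), best


key, best = hill_climb(CIPHERTEXT)
print(key, best)
print(decrypt(CIPHERTEXT, key))
```

With so crude a scoring table and so short a text, this toy will usually stall on gibberish that merely contains a few common trigrams, which is one reason a serious attempt takes many restarts and far better language statistics.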

Finally, I got a hit. I was able to read at least part of the cipher. Using the key (K1) EWSLABGHVMNYRFXUQCPTIZOJKD, the text decrypted to:

bec kidident fl eeyet tosocry psfy toshk whel lfto rst johg bea cons veinv ery thama numct tody baay belo ve buth tho

Do you see it?

Beck I didn't flee yet

It seems as if we're on the right track, after all, though more questions were raised than answered. A rendezvous note for our W.O. robber? Was he trying to reach someone in his gang - a common practice for the criminal element at the time in the region (John Dillinger, for example, a decade later)? Who was "Beck?"

Further in the cipher, more clues are revealed. Near the center lies the text:

rst johg bea cons

Interestingly, if we assume the "r" is part of another word (like above), then we're left with a slight misspelling of "St. John" - which just so happens to be an unincorporated community only a half hour south of Lima. What about "bea cons?" If you read beacons, then we're on the same page. But what beacon? There is no St. John's beacon, nor is there one in Lima - no church, monument, or other landmark extant in 1916. There is, however, the Akron Beacon-Journal - another Ohio newspaper which might contain the clue to the rest of the code. As of this article, I have been unable to obtain scans of that paper, but perhaps another clue in a temporally-relevant scan might be located therein?

Using "Beck I didn't flee yet" as my primary clue, I continued attempting to solve the rest of the cipher by assuming that the PT letters from the original message ("there," "week," etc.) were actually PT and not CT. I substituted these into the message and got to the following:

Beck I didn't flee yet to so cry psfy toshk wh el l ft or st johg beacon's veinv ery there first tody week belo ve with tho

At this point, mind you, I was over 10 billion keys and more than 20 different cipher styles into cracking the message, and still K1 Patristocrat, as the AI recommended, yielded the closest to PT I had seen.

I decided that perhaps our anonymous reader from 1916, the same one who suggested a crib, might actually be on to an idea when he suggested that "the text was made of multiple different codes." I isolated the still jumbled parts of speech and continued using K1 Patristocrat, ultimately using another billion plus keys and arriving at the following:

Beck I didn't flee yet stature male stand info from e st jogn bea cons going ove there first rthe week most go with rat

This translation still has errors (I dispute the word "rat," there are misspellings in "ove" and "rthe," and the message itself makes little sense), but we're creeping closer to a solution. "Ove," when combined with the apparently rogue letter R in front of "rthe," clearly spells "over." The PT inserted also makes sense. Finally, I made a few modifications based on intuition from this solution, and here is the ultimate translation - or as close as I've been able to get thus far:

Beck I didn't flee yet. Stature (statue?) male (mail?) stand info from e (East?) St. John Beacon's. Going over there first the week most go with XXX.

Using the historical analysis as evidence, this very clearly seems to be a rendezvous.

For whom, we will perhaps never know. Further historical research on connections to organized crime yielded only a single figure that was temporally relevant and active in the Ohio region - John Dillinger. The problem there, of course, is that Dillinger wasn't active until long after 1916, and would have been too young to pull off the job. A few members and associates of his perhaps may fit the bill, but without a more thorough description of the suspect, this remains entirely speculation. It should also be noted that no historical reference to "Beck" seems to exist, and this might well be a nickname or alias.

A Research Symphony

Digital Humanities is more than "The Humanities, but digital," as Hammond cheekily describes. It's a symphony of humanities research coupled with, and sharpening the focus of, digital toolsets to discover untold secrets. It's solving puzzles from history using modern technology. It's pointing the 21st century at the rest and seeing what happens. It's equal parts experimentation, data collection, data curation, and synthesis. It's a multimodal discipline that blends computer science, math, and literature to, quite literally, read the unreadable.

I've also, not coincidentally, turned this project into a pedagogical one in my DH classes. The possibilities are truly endless.

I wouldn't have it any other way.
