Scientists claim to crack an elusive centuries-old code -- and it's Hebrew

Canadian computer scientists claim to have cracked a centuries-old coded manuscript. After using artificial intelligence, sophisticated algorithms and a little help from Google Translate, the University of Alberta researchers hypothesized that the long-elusive 600-year-old text is written in encrypted Hebrew.

In their article, “Decoding Anagrammed Texts Written in an Unknown Language and Script,” University of Alberta Prof. Grzegorz Kondrak and doctoral student Bradley Hauer outline their methodology in decoding ciphered texts such as the mysterious Voynich Manuscript.

The 240-page heavily illustrated, vellum-bound volume is now kept at the Yale University Beinecke Rare Books and Manuscripts Library. For hundreds of years, the meaning of its unique script, lengthy texts and esoteric drawings of flora, fauna, the cosmos and plenty of naked women has entirely eluded scholars.

Asked why he has found the Voynich manuscript compelling, Kondrak told The Times of Israel this week that “It’s like asking why people try to climb Mount Everest. People love to try to solve puzzles, and do things other people were not able to do.”

Lisa Fagin Davis, executive director of the Medieval Academy of America, reading the Voynich manuscript in 2016. (Raymond Clemens, Beinecke Rare Book and Manuscript Library)

The mystery of the Voynich Manuscript has spawned dozens of attempts to break the code, including by the team of cryptographers from Britain’s Bletchley Park that broke the Nazis’ Enigma codes. It has also birthed a whole genre, with hundreds of fictional works; an eerily layered a cappella vocal octet written by Yale School of Music graduate Stephen Gorbo; an online memory game using selected illustrations; and a citation in the comic book, “Marvel Adventures: Black Widow and The Avengers, #18,” which features a scene in which the manuscript goes missing from the Yale library.

This week, international press pounced on the publication of a fresh theory that may explain the manuscript’s puzzling provenance, brought to the world by the university labs which developed the software that beat professional players at the Texas hold ’em poker game.

Paul Tobin and Ig Guara’s ‘Marvel Adventures: Black Widow and The Avengers, #18,’ featuring the Beinecke Rare Book and Manuscript Library and Beinecke MS 408 (the Voynich Manuscript). (Beinecke Library)

But with all the media hype, preliminary results appear less than sensational. Kondrak and Hauer’s computer-driven translation only turned up a few individual Hebrew words such as “farmer,” “light,” “air” and “fire,” and a single, long-awaited translated full sentence: “She made recommendations to the priest, man of the house and me and people.”

As noted by the authors in their article, “According to a native speaker of the language, this is not quite a coherent sentence.”

This week, The Times of Israel spoke with Hebrew-language, medieval and computer science scholars who broke down the so-called decryption breakthrough.

An abiding fascination with the Voynich Manuscript

Deep inside the British Museum, there is a shelf dedicated to antiquarian book dealer Wilfred Voynich. He is honored for having sold the institution an unheard-of collection of 150 “unique” old books of which these were the only known copies. It is therefore not surprising that Voynich is the man who rediscovered the sole copy of the manuscript that now bears his name. It was found while conducting a somewhat suspicious purchase of a lot of 20-30 books from an Italian Jesuit college in 1912.

According to René Zandbergen, a noted Voynich manuscript devotee who has crafted a comprehensive website detailing the intricate ins and outs of its scholarship, “the secrecy of the sale is evident from the fact that Jesuit ownership traces were erased from these books, as can been seen clearly in modern digital scans of them.”

With origins shrouded in uncertainty, the Voynich Manuscript is accepted as an elaborate hoax by some and a font of knowledge by others. Its authorship is unknown, which has led to a 360-degree array of speculation as to its creator, with theorists pointing in turn to sex-mad Egyptians, Italian witches, aliens, and Leonardo da Vinci, and even a Polish-born, Moscow-trained former pharmacist — the polyglot book dealer Voynich himself.

An undated image of antiquarian book dealer Wilfred Voynich, a former Polish revolutionary who is said to have proficiency in 18 languages. (public domain)

The earliest known citation of the manuscript is found in a circa-1665 letter, which depicts an earlier purchase (for 600 gold ducats) by Habsburg Emperor Rudolph II (1576–1612). Rudolph apparently gave the book to his physician, Jacobus Sinapius, and it later found its way to Prague alchemist Georg Baresch.

Circa 1665, Baresch’s heir, Joannes Marcus Marci, gave the book to renowned Jesuit scholar Athanasius Kircher, along with a frustrated letter in which he stated, “Such Sphinxes as these obey no one but their master.”

The next high-profile master was Voynich, who purchased it in 1912 but only presented it to the public in 1915.

The volume is written in an unknown script, in an unknown language. It appears to be divided into sections, which are grouped according to its illustrations. In herbal and pharmaceutical sections, many of the drawings depict plants that scholars cannot place. Its astronomical and cosmological sections include indecipherable geometric configurations. And there are also anatomical drawings with mostly feminine figures standing naked, sometimes in fantastical tubes.

An ‘anatomical’ illustration from Beinecke MS 408, aka the Voynich Manuscript. (Yale University’s Beinecke Rare Book and Manuscript Library)

What is presumed fact about the 22.5 x 16 centimeter bound calf parchment codex is that it originally contained 116 numbered leaves, of which 14 are now missing. There are several foldouts of different dimensions among its pages, containing large vivid illustrations. The text consists of left-to-right script, which is written in a clear hand — or maybe hands — with well-spaced words and sentences that are divided into paragraphs.

Dr. Lisa Fagin Davis, the executive director of the Medieval Academy of America, received her PhD from Yale University and worked on the Voynich Manuscipt for years. She describes the text in a much-read blog as being written in 25 distinct letterforms, some of which appear to have capitalization.

“There are no cross-outs or corrections, and each grapheme is always written using the same ductus, or sequence of pen strokes. Ligatures between particular graphemic pairs are consistent and even. This suggests that the codex was written by scribes who had written the script before or were copying an exemplar,” writes Fagin Davis.

In 2009, the parchment was radiocarbon dated by the University of Arizona. The Arizona scientists took four samples from the codex and came up with 95 percent certainty of a date of circa 1404-1438. Later, the ink and paint pigment were analyzed using forensic testing by Yale University in 2014. There were no signs of modern ingredients and the detected ingredients were available and existed in recipes since the Middle Ages, according to Zandbergen’s website. He added, “Despite some suggestions to the contrary, all materials were in use in Europe.”

A foldout illustration from page 88v of the Beinecke MS 408, aka the Voynich Manuscript. (Yale University’s Beinecke Rare Book and Manuscript Library)

There are few instances of “non-coded” language found in the book, written in Latin lettering. In one instance, the word rot, German for “red,” was discovered written inside a plant illustration during high-resolution imaging.

According to Zandbergen, “A comparison with another 15th-century herbal: MS 362 of the ‘Biblioteca Civica Bertoliana’ in Vicenza (one of the so-called alchemical herbals), makes it plausible that these are color annotations, and the mother tongue of the Voynich MS author was German.” But that, he would admit, is speculation.

After decades of analysis, it turns out the Voynich Manuscript is utterly unique.

As far as medieval manuscripts go, there’s nothing like the Voynich

“As far as medieval manuscripts go, there’s nothing like the Voynich,” Fagin Davis told The Times of Israel this week. She has been “fascinated” by the manuscript since first encountering it in her 1988 Introduction to Latin Paleography class at the Beinecke Library at Yale.

“The idea of a manuscript that can’t be read and that is illustrated by images that defy interpretation is fascinating enough. Add to that the manuscript’s history and the scientific evidence that demonstrates it was definitely written in the early fifteenth century and then mix in dozens, if not hundreds, of proposed translations and theories, each more bizarre and fantastic than the next, and you have an object that is truly irresistible,” said Fagin Davis.

While attending Yale graduate school Fagin Davis worked as the assistant to the curator of pre-1600 manuscripts, where she was put in charge of Voynich-related correspondence for several years and responded to requests for images. She also corresponded with innumerable would-be codebreakers.

Fagin Davis, who keeps a copy of the manuscript PDF on her digital devices, said her current interest is in how it captures popular imagination.

“I collect Voynich-ephemera such as novels and other works, and find myself reviewing solutions fairly regularly,” she said. Some of those “solutions,” she noted, have been generated through artificial intelligence.

Nuts and bolts of the Alberta study

Could the Hauer and Kondrak article’s conclusion — that the manuscript is written in coded Hebrew — be a possibility?

In their article, the University of Alberta computer scientists describe the Voynich manuscript as “the most challenging type of a decipherment problem.”

An illustration from Beinecke MS 408, aka the Voynich Manuscript. (Yale University’s Beinecke Rare Book and Manuscript Library)

“Inspired by the mystery of both the Voynich Manuscript and the undeciphered ancient scripts, we develop a series of algorithms for the purpose of decrypting unknown alphabetic scripts representing unknown languages,” write Hauer and Kondrak.

The scientists performed their study on 43 pages of the manuscript, which contain 17,597 words and 95,465 characters, transcribed into 35 characters of the Currier alphabet — one of the semi-accepted systematic transliteration attempts of the Voynich script.

Observing the relative frequencies of the symbols, the scientists aim to rank them according to how often they appear, then “normalize” the frequencies to illustrate a probable distribution. That normalization is then matched with the frequency and distribution of letters from a set of almost 400 candidate languages.

The actual decipherment, they write in their article, is performed with the wonderfully named “fast greedy-swap algorithm.” The algorithm basically allows the scientists to change word and letter order until they arrive at a more logical outcome.

They use three systems for identifying the text’s language: the letter frequency method, the decomposition pattern method and the trial decipherment method. Two out of three ranked Hebrew (and other Semitic languages) near the top. The third method earmarked the southern Mexican language of Mazatec. Interestingly, in the trial decipherment method, the man-made language Esperanto was also consistently highly ranked.

A typical astronomical illustration from Beinecke MS 408, aka the Voynich Manuscript. (Yale University’s Beinecke Rare Book and Manuscript Library)

According to the authors, “While there is no complete agreement between the three methods about the most likely underlying source language, there appears to be a strong statistical support for Hebrew from the two most accurate methods… The language is a plausible candidate on historical grounds, being widely used for writing in the Middle Ages. In fact, a number of cipher techniques, including anagramming, can be traced to the Jewish Cabala.”

Kabbalah, or Jewish mysticism, developed in circa 12th- to 13th-century southern Europe, including Spain and Italy. It makes one wonder, could the manuscript have been some sort of cryptography for the growing population of cryto-Jews leading up to the Spanish Expulsion of 1492?

The evidence doesn’t exactly bear out.

Promising research, but not for Voynich

Shlomo Engelson Argamon, a computer science professor and director of the Master of Data Science Program at the Illinois Institute of Technology, explained the Canadians’ study in layman’s terms to The Times of Israel.

Among his fields of interest, Argamon researches computational methods for style-based analysis of natural language using machine learning and explores applications in intelligence analysis and forensic linguistics. He said that overall, the methodology and evaluation of the Canadian scientists’ work is “interesting.”

Illinois Institute of Technology Computer Science Prof. Shlomo Argamon (Illinois Institute of Technology)

The new algorithm generated by the team “looks quite good for modern problems, such as code breaking, but in its application to the Voynich Manuscript, it becomes much more speculative,” said Argamon.

The application to the Voynich Manuscript is based on the premise that the person who wrote it encoded by both substituting letters for one another, and mixing up their order as in an anagram. The scientists are “assuming that was the process used, but we don’t know whether that is true or not,” said Argamon.

A further problem is that the algorithm is “only based on the letter statistics” — such as frequency of a letter or double lettering in a word — “and is not looking at the grammar and language-based issues,” said Argamon, who is a Hebrew-speaker himself and has held a Fulbright Postdoctoral Fellowship at Bar-Ilan University (1994-1996) and an Israel Science Foundation Immigrant Scientist Fellowship (1999-2002).

The scientists’ algorithm, he explained, juxtaposes the text with other languages. Of the available languages tested against the text, the score for Hebrew was highest.

“One of the languages had to be on top, but is that significant? There may be some other languages not in the mix that may score much higher,” said Argamon. “They are saying it looks more like Hebrew than other languages. In my opinion, that’s not necessarily saying all that much.”

A typical two-page astronomical illustration from Beinecke MS 408, aka the Voynich Manuscript. (Yale University’s Beinecke Rare Book and Manuscript Library)

Likewise, said Argamon, when decoding the text into Hebrew, “the fact that they could rearrange the letters any way they like gives a lot of room for making ‘good Hebrew.'” The “greedy-swap technique” allows for ad hoc rearranging of letters and word order.

“If they were to use the same switching of letters and word order with other languages, I would assume they could find quasi sentences in other languages as well,” he said.

Finally, the use of Google Translate gives Argamon pause. “It will give you an answer for almost anything: If you type in the letter ‘A’ 17 times, Google Translate will give you something that looks like a sentence if you squint hard enough,” he said.

If you type in the letter ‘A’ 17 times, Google Translate will give you something that looks like a sentence if you squint hard enough

Scholar Fagin Davis added to Argamon’s concerns a secondary issue that Google Translate would be searching for Modern Hebrew patterns, not Medieval.

“If the manuscript is encoding Hebrew, it’s encoding 15th-century Hebrew, not the Modern Hebrew that Google’s translation algorithm works with,” she said.

To test Google Translate’s ability, Fagin Davis sampled a portion of a Roman-alphabet transcription of the manuscript, which she explained is customarily done using a substitution system called the European Voynich Alphabet. She asked Google Translate to identify the language and translate it into English.

“The algorithm identified the language as transcribed Hindi and gave me a translation that was somewhat intelligible. I’m certainly not suggesting that the manuscript is encoded Hindi (although someone somewhere certainly has), but I think depending on Google Translate is certainly problematic and weakens their argument,” said Fagin Davis.

Too quick to publish?

Why the Canadians didn’t tap a Hebrew linguist to shore up their claims is confounding to many in academia.

Prof. Matthew Morgenstern, head of the Department of Hebrew Language and Semitic Linguistics at Tel Aviv University. (Hodayah Sundick-Morgenstern)

“We would usually expect that when somebody has deciphered a language, the text that the deciphering produces would be coherent. In this case, the ‘deciphering’ produces an ungrammatical and incoherent text that appears to be composed of a series of unconnected words,” said Prof. Matthew Morgenstern, head of the Department of Hebrew Language and Semitic Linguistics at Tel Aviv University.

“By their own admission, the authors of the study don’t know any Hebrew. Knowing the original language from which a text was encoded would appear an essential prerequisite for any process of deciphering,” said Morgenstern, whose expertise includes reading inscriptions and manuscripts, including in decoding Arabic texts written in the Mandaic (Aramaic) script.

In an email exchange this week, Fagin Davis wrote, “Like others before them, I think the authors have gone public too early. You can’t declare victory when your proposal, one, isn’t reproducible and, two, doesn’t result in a decryption that makes sense.

“An acceptable decryption has to follow logical rules so that others can reproduce your results and it has to result in a defensible, legible, intelligible text. As far as I can tell, this proposal doesn’t meet that standard,” said Fagin Davis.

In response, University of Alberta Prof. Kondrak clarified for The Times of Israel this week, “We do not claim to have deciphered the manuscript.” Likewise, he said, “The results reported in our TACL article are fully reproducible; they do not ‘depend on Google Translate,’ which is only mentioned in passing to illustrate how an incoherent sequence of Hebrew letters can be ‘translated’ into a grammatical sentence in English.”

Elsewhere in the Canadian press, Kondrak has complained that their paper’s reception by Voynich scholars was less than “friendly.”

For his part, Kondrak says he is merely motivated by deciphering ancient scripts — and there are plenty more out there.

“Undeciphered ancient scripts like Linear A are even more interesting [than the Voynich Manuscript]. The problem is that there’s not enough data for a computer program. For example, the longest inscription in the Indus script known from archaeology contains only 14 symbols,” Kondrak told The Times of Israel this week.

As far as the lingering question of why the team didn’t initially enlist the help of a Hebrew expert, Kondrak had this to say: “We could not find a Medieval Hebrew expert and cryptologist at our university. It’s not simply a text in Hebrew that you can give to any Hebrew speaker and ask what it means — it’s a very noisy decipherment, which is difficult to make sense of. Now that the research is receiving such wide attention, many people are reaching out to help.”

“Somebody with very good knowledge of Hebrew and who’s a historian at the same time could take this evidence and follow this kind of clue,” he said. “Can we look at these texts closely and do some kind of detective work and decipher what can be the message?”