Potentials and challenges in studying ancient texts
Assyriologists spend their entire careers translating cuneiform texts, yet today, hundreds of thousands of inscriptions written in the languages of this ancient script—the earliest in the world—remain untranslated. With the rise of artificial intelligence (AI), that may not be a problem for much longer. In the journal PNAS Nexus, researchers at Tel Aviv University and Ariel University present a new program capable of automatically translating cuneiform texts into English.
While applications like Google Translate make translating modern languages seamless, translating ancient languages is far more complex. Since most translation programs are designed around the Latin script, researchers working to translate ancient languages need to start from scratch and create new tools based around their target language.
As such, the authors of the study had to build a new program capable of translating Akkadian, one of the most prevalent languages to use cuneiform, into English. The program, which takes either Unicode or transliterated versions of the cuneiform text, compares the inscription against a training set of over 8,000 texts already translated by experts. “This will transform the way we produce editions and sources,” Shai Gordin, senior lecturer in Assyriology and Digital Humanities at Ariel University and co-lead on the project, told Bible History Daily.
Assyriologists spend a great deal of time reading and translating the cuneiform signs used to write Akkadian, Sumerian, and other ancient Near Eastern languages. While proper translation requires an intimate knowledge of the original language, it also requires a great deal of cultural knowledge regarding the meaning of phrases, idioms, and metaphors. Although auto-translating cuneiform will not replace expert Assyriologists anytime soon, it does have the potential to drastically speed up their work by providing preliminary textual readings. This is incredibly important given the hundreds of thousands of cuneiform documents that have been excavated but not translated.
“It will make more things accessible for students who want to read translations of texts, and that, just for itself, democratizes access to really opaque and obscure materials,” said Gordin. “Maybe it will create an interest in even a few more people to dig into these sources for their own, or researchers interested in comparative studies.”
“We hope that at some point AI can assist Assyriologists as well as non-Assyriologists in understanding cuneiform texts,” added Luis Saenz, a Ph.D. student at the University of Heidelberg and a co-author of the article. “AI offers the possibility for non-Assyriologists to understand to some extent the content of the tablet.”
Akkadian, the lingua franca of the ancient Near East for roughly 2,000 years during the second and first millennia BCE, is one of the best-attested languages in antiquity and was in use until the first century CE. Recording the history, culture, and religion of large empires like the Assyrians and Babylonians, Akkadian texts provide one of the largest sources of knowledge of the ancient world available to scholars today. The importance of the language is even evident in the Amarna Letters, which record the cuneiform correspondence between the Egyptian pharaoh and Canaanite kings, none of whom spoke Akkadian as their primary language.
Although Akkadian was one of the primary languages to use cuneiform, other ancient Near Eastern languages used the script as well, including Sumerian, Elamite, Hittite, Luwian, Hurrian, Amorite, and Ugaritic. Likely developed for the Sumerian language, cuneiform was in use from at least the mid-third millennium BCE into the Common Era. While the program developed by the researchers is primarily intended to translate Akkadian texts, the team hopes that new tools will allow it to translate other cuneiform languages as well. According to Saenz, another feature the team hopes to add is a web-based platform for a more user-friendly translator. Gordin points to the potential of using the program, in collaboration with other teams, as a tool for creating new text editions.
While the program is still in its early stages, Gordin stresses, “What we are trying to do is to create an infrastructure and tools for others to more easily get into it and produce new materials and research that builds on our work.”
Currently, the pioneering program is only capable of working with Unicode (a character encoding standard that can represent non-Latin scripts and signs, including cuneiform) or transliterations (Latin script conversions of the cuneiform). However, other programs already exist or are in development that can take hand copies of texts or even 3D models and convert them into Unicode, which can then be used for auto-translating.
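To make the difference between the two input formats concrete, here is a minimal Python sketch using only the standard library. It is not part of the researchers' program; it simply inspects one Unicode cuneiform sign (the sign "UD") and sets it beside a Latin-script transliteration. The transliteration string "ud" is an illustrative assumption, since the same sign can be transliterated in other ways depending on context.

```python
import unicodedata

# One cuneiform sign as a Unicode character (the sign "UD").
sign = "𒌓"

# Unicode input: the sign is a single code point in the Cuneiform block.
code_point = ord(sign)
name = unicodedata.name(sign)
utf8_length = len(sign.encode("utf-8"))

# Transliterated input: a Latin-script rendering of the same sign.
# "ud" is only one possible reading, chosen here for illustration.
transliteration = "ud"

print(f"Unicode input:        {sign} (U+{code_point:04X}, {name})")
print(f"UTF-8 bytes:          {utf8_length}")
print(f"Transliterated input: {transliteration}")
```

The Unicode form records which sign is on the tablet, while the transliteration already reflects a scholar's decision about how that sign should be read.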
Many of these programs have been developed by individual research teams, but more and more collaboration is occurring in digital Assyriology. This is allowing previously disparate projects to work together. One such collaborative effort is the Digital Ancient Near Eastern Studies Network (DANES), which brings together Assyriologists and computer scientists from Israel, Europe, and the United States. Groups like DANES allow researchers to pool resources to solve issues like translating cuneiform.
Translating cuneiform comes with some unique challenges, including the fragmentary nature of many texts. One of the biggest difficulties researchers face, however, is the logo-phonetic nature of the cuneiform script. Within the logo-phonetic system, a sign can serve as a phoneme (a distinct unit of sound), a determinative (a marker of type), or a logogram (a symbol intended to represent a whole word). As such, individual signs can be read in many ways. The cuneiform sign 𒌓 “UD,” for example, can be read in over 20 different ways. Furthermore, of the nearly 1,000 cuneiform signs, many share phonetic readings; roughly a dozen signs, for example, have the phonetic value “bu” alone. Because of this, many cuneiform signs can only be understood in relation to the signs that come before or after them.
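The ambiguity described above can be sketched in a few lines of Python. The sketch below is not the researchers' method. It uses a small, deliberately incomplete sample of readings for two signs, AN and UD, to show how quickly the candidate readings multiply when signs are combined. The `Reading` class, the `SIGN_READINGS` table, and the `candidate_readings` function are hypothetical names introduced for illustration, and the readings listed should be checked against a proper sign list before any real use.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Reading:
    value: str     # how the sign is transliterated in this reading
    function: str  # "phoneme", "logogram", or "determinative"
    gloss: str = ""


# Illustrative, incomplete sample of readings (an assumption for this sketch).
SIGN_READINGS = {
    "AN": [
        Reading("an", "phoneme"),
        Reading("AN", "logogram", "sky"),
        Reading("DINGIR", "logogram", "god"),
        Reading("d", "determinative", "marks a divine name"),
    ],
    "UD": [
        Reading("ud", "phoneme"),
        Reading("tam", "phoneme"),
        Reading("par", "phoneme"),
        Reading("UD", "logogram", "day"),
        Reading("UTU", "logogram", "sun, sun god"),
        Reading("BABBAR", "logogram", "white"),
    ],
}


def candidate_readings(signs):
    """Return every combination of readings for a sequence of signs."""
    return list(product(*(SIGN_READINGS[s] for s in signs)))


candidates = candidate_readings(["AN", "UD"])
print(f"{len(candidates)} candidate readings for AN + UD")

# Context picks one: AN as divine determinative plus UD as the logogram UTU
# gives the name of the sun god.
sun_god = [
    combo for combo in candidates
    if combo[0].function == "determinative" and combo[1].value == "UTU"
]
print([f"{a.value}.{b.value}" for a, b in sun_god])
```

Even with this reduced sample, two signs yield 24 combinations, of which only one is the divine name. With the full set of readings and longer sequences the space grows much faster, which is why a reading usually has to be settled from the neighboring signs.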
Another issue in translating cuneiform is that all of the languages that used it are considered “low resource” languages, meaning that only a limited amount of data is available for training the AI system. While hundreds of thousands of cuneiform documents are known, many of them have never been translated, and many of those that have been translated have never been digitized, making them largely unavailable to researchers.