Editor's words 2020-4

Artificial intelligence and innovation in the research of ancient Chinese scripts

As the key technology leading the development of the Industry 4.0 era, the combination of big data and artificial intelligence (AI) has set off revolutionary waves in many fields of research. The long-established study of ancient Chinese characters is thus confronted with the new problem of how to apply this cutting-edge technology.

The long history of ancient Chinese character studies supposedly dates back to the Western Han Dynasty, when Gongwang of Lu 鲁恭王 discovered many ancient documents in Confucius’ house, thus initiating a new round of academic enthusiasm for ancient Chinese characters. In later centuries, more forms of ancient inscriptions were found, including those on bronzeware, stone tablets, and oracle bones, as well as Dunhuang manuscripts, Qin and Han dynasty bamboo slips and silk scripts, and Chu bamboo manuscripts of the Warring States Period. Boosted by these discoveries and excavations, research on ancient Chinese characters has lasted for over 2,000 years. For a very long time, the research methodology was predominantly evidential and textual, until Wang Guowei 王国维 proposed his “dual attestation” method, which combines textual analysis with newly obtained archaeological artefacts. Wang’s approach marked a new stage in the study of ancient Chinese characters. In modern times, however, while documents and artefacts continued to increase, the number of people proficient in ancient Chinese scripts decreased drastically. While professionals worked studiously on the interpretation of ancient characters, the discipline itself gradually became estranged from the rest of academia, although there have been signs of change in the new century.

The rapid growth and popularity of computer technology since the end of the last century gave rise to a new discipline, computational linguistics, whose cross-disciplinarity kindled new interest in ancient Chinese character studies. Academics from Hong Kong and Taiwan took the lead in applying digital technologies to the study of ancient inscriptions. In mainland China, a significant breakthrough was made by scholars at the Center for the Study and Application of Chinese Character (CSACC), East China Normal University, which is listed by the Ministry of Education as a key research base in humanities and social sciences. Through 20 years of devoted work, researchers at the CSACC brought about a paradigm shift in paleography by combining grammatology and computer science, building a series of ancient script databases, and exploring ways to make their research results relevant to contemporary life. Under the leadership of the CSACC, the World Association of Chinese Characters Studies (WACCS) and the Ideographical Big Data R&D Center were established, bringing together sinologists from across the world. However, because ancient characters are difficult to encode, their display and retrieval in digital environments remained a problem, seriously restricting the accuracy and applicability of the digitization projects.

AI technology brought new hope to the digitization of ancient scripts. Image recognition, in particular, has developed very quickly in recent years, and the use of two-dimensional barcodes (QR codes) has become a daily practice. Chinese characters are basically two-dimensional images; as Zang Kehe 臧克和 argues, they naturally correspond to and therefore are isomorphic with the objects or concepts they signify. However, the complex systems of ancient scripts cannot be easily reduced to “two-dimensional” structures. Twenty years of hard work at the CSACC yielded no significant result on this problem until artificial neural networks were successfully applied to image and object recognition. The CSACC recently developed the first AI recognition tool for ancient scripts, the “Shang and Zhou Bronze Script Smart Mirror.” Using the latest neural network technology, the software can recognize ancient bronze characters of the Shang and Zhou Dynasties with an accuracy rate of 90%, which is expected to reach 95% soon.

This ground-breaking achievement will have a “new” impact on the discipline of ancient characters in the following three respects.

Firstly, the “new tool” will enable new ways of applying and learning ancient scripts and thus enhance their contemporary relevance.

With the “Smart Mirror,” ancient characters can be read and interpreted within seconds and become understandable to the general public. People interested in ancient texts can easily access systematic and professional etymological knowledge. The cultural heritage long sealed in these texts can be gradually uncovered, and new uses can be explored in all industries, while ancient characters can enter public life and hopefully become part of contemporary culture.

Secondly, the “Smart Mirror” can launch a “new mode” of learning and studying ancient scripts and initiate a new stage of big data-supported research.

The various digital platforms from which glyphs, texts, scripts, and interpretations could be retrieved used to work quite independently of each other and could not be efficiently connected or coordinated. Successful retrieval of information from those platforms depends in the first place on recognizing the character to be searched. Faced with an image of a character that we cannot recognize, we cannot retrieve any information from a single database, let alone perform a cross-database search. AI technologies such as machine learning could not work to their full capacity for lack of a “link” among the databases. Now, the “Smart Mirror” becomes that link. With intelligent recognition of ancient scripts, the various databases can be connected and coordinated, and the whole system can be activated in a way that may mark a new era in the discipline.
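The role of recognition as the “link” can be illustrated with a minimal sketch. It is not the implementation of the “Smart Mirror” or of any CSACC database: the database contents, the identifier `char_001`, and the functions `cross_database_search` and `toy_recognizer` are all hypothetical, and the recognizer is a placeholder where a trained neural network would be used. The sketch assumes only that each database can be keyed by a shared character identifier.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for separate glyph, text, and interpretation
# databases, each keyed by a shared character identifier (an assumption).
DATABASES: Dict[str, Dict[str, List[str]]] = {
    "glyphs": {"char_001": ["glyph variant A", "glyph variant B"]},
    "texts": {"char_001": ["inscription 12, line 3"]},
    "interpretations": {"char_001": ["reading proposed in a reference work"]},
}


def cross_database_search(
    image: bytes, recognize: Callable[[bytes], str]
) -> Dict[str, List[str]]:
    """Recognize the character in `image`, then query every database with its identifier."""
    char_id = recognize(image)
    # An unrecognized character yields an empty result list from each database.
    return {name: db.get(char_id, []) for name, db in DATABASES.items()}


def toy_recognizer(image: bytes) -> str:
    """Placeholder recognizer; a real one would be a trained image-recognition model."""
    return "char_001" if image == b"rubbing-scan" else "unknown"


results = cross_database_search(b"rubbing-scan", toy_recognizer)
for name, hits in results.items():
    print(name, hits)
```

Without the recognition step there is no identifier to search with, so every database returns nothing; with it, one image query reaches all databases at once.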

Thirdly, with “AI + paleography,” cross-disciplinary “new areas of study” can be opened up and new approaches explored in ideographic studies.

Successful experience in intelligent recognition and big data handling for Shang and Zhou bronze scripts could be extended to the study of other types of ancient scripts, including oracle bone inscriptions, bamboo slip and silk scripts, and stone inscriptions, among others, and even to the study of the ancient ideographs of ethnic minorities in China as well as those overseas.

Articles in this Special Issue demonstrate this new perspective of studying ancient scripts with the help of big data and AI. As the latest results of cross-disciplinary research in paleography and computational science, they cover a wide range of topics, including the building and application of databases, image and character recognition, the development of the “Shang and Zhou Bronze Script Smart Mirror,” intelligent recognition of oracle bone inscriptions, deep learning and bronze inscription image retrieval, and the recognition of Japanese quasi-characters. We hope that, with the help of cutting-edge technology, ancient scripts, along with the cultural heritage they carry, can find new life and new relevance, both for academia and for the general public.

Executive Editors Guo Rui, Liu Zhiji

