ISBN: 978-2-87457-138-1

Encoding inscriptions as sets of lexical features. A case study on the Epigraphic collection of the Catacombs in Chiusi

 = Paper = 

by Giuseppe SAMO, Giuliano CARACCIOLO, in Res Antiquae 19, 2022.

The goal of this paper is to offer a model for automatically classifying Latin inscriptions on the basis of text-internal criteria. To this end, 41 Latin inscriptions dating to early Christianity from the archaeological site of Chiusi (Italy) are encoded as vector representations, making them readable in a supervised machine learning environment. The features represent lexical entries that convey the nature (fully Christian, mostly Christian or mostly Pagan) of the inscription. Three studies are carried out using machine learning software. The first study demonstrates that a model based on the (independent) presence of specific lexical entries achieves better accuracy than a model that treats the inscription as a single construction. The second study shows that the model still performs in the presence of noisy elements such as unseen inscriptions. Finally, the third study shows that, owing to the independent nature of the variables taken into account, our model is able to predict a class of origin relying only on the weight of a specific lexical entry or formulaic entity. The model discussed in this paper shows reliable accuracy, providing further evidence for a two-way dialogue between archaeology and computational linguistics.

Keywords: Epigraphy, Catacombs, Text Classification, Lexical features, Digital Humanities
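
As an illustration of the approach summarised in the abstract, the sketch below encodes inscriptions as binary vectors of lexical features and classifies them with a classifier that treats each feature independently. It is only a sketch under stated assumptions: the abstract does not name the software or the algorithm used, so a Bernoulli naive Bayes classifier is swapped in here as one common way to model independent features. The lexicon (`LEXICON`), the toy texts (`TOY`) and the function names are hypothetical and are not taken from the Chiusi corpus or from the paper's feature set.

```python
import math
from collections import defaultdict

# Hypothetical lexicon of formulaic entries (illustrative only,
# NOT the feature set used in the paper).
LEXICON = ["in pace", "dis manibus", "depositus", "bene merenti"]


def encode(text):
    """Encode an inscription as a binary vector: 1 if the entry occurs."""
    lowered = text.lower()
    return [1 if entry in lowered else 0 for entry in LEXICON]


def train(samples):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing.

    samples: list of (inscription_text, class_label) pairs.
    Returns {label: (log_prior, [P(feature present | label), ...])}.
    """
    counts = defaultdict(lambda: [0] * len(LEXICON))
    totals = defaultdict(int)
    for text, label in samples:
        totals[label] += 1
        for i, present in enumerate(encode(text)):
            counts[label][i] += present
    model = {}
    for label, total in totals.items():
        log_prior = math.log(total / len(samples))
        # Add-one smoothing keeps every probability strictly in (0, 1).
        probs = [(counts[label][i] + 1) / (total + 2)
                 for i in range(len(LEXICON))]
        model[label] = (log_prior, probs)
    return model


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    vector = encode(text)
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        # Each feature contributes independently to the score.
        for present, p in zip(vector, probs):
            score += math.log(p if present else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy inscriptions (assumption: not real data from the corpus).
TOY = [
    ("hic requiescit in pace depositus", "fully Christian"),
    ("depositus in pace", "fully Christian"),
    ("dis manibus bene merenti in pace", "mostly Christian"),
    ("dis manibus bene merenti", "mostly Pagan"),
    ("dis manibus", "mostly Pagan"),
]

model = train(TOY)
```

Calling `predict(model, "in pace")` on a text that does not appear in `TOY` shows how a single formulaic entry can drive the predicted class, in the spirit of the third study. With so few invented examples the output says nothing about real inscriptions.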