
MeThAL: Towards a macroanalysis of theater in Alsatian

Goals and challenges

The Alsatian dialect theater tradition is based predominantly on popular and humorous genres. What are the major trends in this tradition regarding dramatic technique and character types? What are its main geographic locations? To what extent do Alsatian dialect plays document the sociolinguistic situation of the period in which they were written?

Answering these questions through quantitative analysis requires a large corpus that is representative of the tradition and annotated for the relevant variables: the geographical origin of plays and authors, the places where the plays are set, and their period and genre. For the characters, attributes such as profession, social status, origin, gender and age must be made available. The plays' structure must also be formalized by identifying act and scene divisions, characters' speeches and stage directions.

Our project's first goal is to create such a corpus, encoded in the TEI (Text Encoding Initiative) format, whose Performance Texts module covers the types of annotation we are interested in. We are working on a representative collection of plays recently digitized by the Bibliothèque Nationale et Universitaire (Bnu) in Strasbourg, and we are currently running OCR on these plays and encoding them in TEI.
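To illustrate the kind of structure that TEI encoding makes explicit, here is a minimal Python sketch, using only the standard library, that counts acts, scenes, speeches per character and stage directions in a TEI drama file. It assumes the usual TEI conventions for drama (`<div type="act">`, `<div type="scene">`, `<sp who="#id">`, `<stage>`); the sample play, its character ids and the `summarize` function are hypothetical and not taken from the MeThAL corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI P5 namespace, mapped to a prefix for ElementTree path queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical toy play: character ids and text are placeholders, not corpus data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="act" n="1">
        <div type="scene" n="1">
          <stage>A village inn.</stage>
          <sp who="#anna"><speaker>Anna</speaker><p>Good evening.</p></sp>
          <sp who="#peter"><speaker>Peter</speaker><p>Evening.</p></sp>
          <sp who="#anna"><speaker>Anna</speaker><p>Any news?</p></sp>
        </div>
      </div>
    </body>
  </text>
</TEI>"""


def summarize(tei_xml: str) -> dict:
    """Count structural units in a TEI drama document."""
    root = ET.fromstring(tei_xml)
    acts = root.findall(".//tei:div[@type='act']", TEI_NS)
    scenes = root.findall(".//tei:div[@type='scene']", TEI_NS)
    # Simplification: assumes each @who holds a single "#id" reference.
    speeches = Counter(
        sp.get("who", "").lstrip("#")
        for sp in root.iterfind(".//tei:sp", TEI_NS)
    )
    stage = root.findall(".//tei:stage", TEI_NS)
    return {
        "acts": len(acts),
        "scenes": len(scenes),
        "speeches": speeches,
        "stage_directions": len(stage),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the toy sample, `summarize(SAMPLE)` reports one act, one scene, two speeches for `anna`, one for `peter` and one stage direction. Counts like these are the raw material for the character-level analyses described above.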

The resulting corpus will enable a distant reading, or macroanalysis, approach to Alsatian theater. Such approaches have been applied successfully to the major European dramatic traditions, as shown in a 2017 special issue of the Revue d'Historiographie du Théâtre. However, such analyses are still impossible for Alsatian, given the lack of an appropriate digital corpus. The MeThAL project seeks to fill this gap.

To that end, we will apply natural language processing and document representation techniques, as well as web technologies that will make the corpus easier to navigate.


The considerable orthographic variety of Alsatian presents specific challenges for Natural Language Processing (NLP), as low-resource languages generally do. These challenges point to needs that are only partially addressed by existing text analysis tools, which are mainly geared towards majority languages. The project will draw on and contribute to the resources created by the RESTAURE project on NLP for France's regional languages.
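One simple way to cope with spelling variation, shown here purely as an illustration and not as the project's own method, is to map each token to the closest entry of a reference lexicon by edit (Levenshtein) distance, leaving it unchanged when no entry is close enough. The lexicon, the threshold `max_dist` and the helper names below are hypothetical toy choices, not real Alsatian resources.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if same char)
            ))
        prev = curr
    return prev[-1]


def normalize(token: str, lexicon: set, max_dist: int = 1) -> str:
    """Return the closest lexicon entry within max_dist, else the token itself."""
    if token in lexicon:
        return token
    # Tie-break on the word itself so the result is deterministic.
    best = min(lexicon, key=lambda w: (levenshtein(token.lower(), w.lower()), w))
    if levenshtein(token.lower(), best.lower()) <= max_dist:
        return best
    return token


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"Candidat", "Theater"}

if __name__ == "__main__":
    for tok in ["Kandidat", "Theatr", "Pfingstmontag"]:
        print(tok, "->", normalize(tok, LEXICON))
```

Here "Kandidat" and "Theatr" are each one edit away from a lexicon entry and get normalized, while "Pfingstmontag" has no close match and is kept as-is. Real historical spelling normalization usually needs richer models than a single distance threshold.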


Publications and contributions to conferences

  • Pablo Ruiz, Carole Werner, Delphine Bernhard, Pascale Erhart, Dominique Huck. (2021). MeThAL : Ressources numériques pour une relecture du théâtre en alsacien. Poster presented at 10 ans avec CAHIER : Des corpus d'auteurs pour les humanités numériques à leur exploitation numérique, June 2021, Bordeaux, France. ⟨10.5281/zenodo.4908212⟩ ⟨hal-03255403⟩

  • Pablo Ruiz, Carole Werner. (2021). Exploration du théâtre alsacien à travers ses listes de personnages pendant la période 1870-1940. Humanistica 2021, pp. 27-29, Rennes, France. ⟨10.5281/zenodo.4762732⟩ ⟨hal-03226579⟩ [slides]

  • Pablo Ruiz, Delphine Bernhard, Carole Werner. (2020). Création d'un corpus FAIR de théâtre en alsacien et normalisation de variétés non-contemporaines. 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT), pp. 32-43, Montrouge, France. ⟨10.5281/zenodo.4323301⟩ ⟨hal-03047152⟩ [slides]

  • Pablo Ruiz, Delphine Bernhard, Pascale Erhart, Dominique Huck, Carole Werner. (2020). MeThAL : Vers une macroanalyse du théâtre en alsacien. Humanistica 2020, Bordeaux, France. ⟨10.5281/zenodo.3788019⟩ ⟨hal-02564694⟩


  • LiLPa Lab seminar, December 2019: [pdf]


Read the plays

  • See the Read section for the plays encoded so far (25 plays at this point)

Web presence

  • The Bnu’s research blog talks about:

  • The DraCor (Drama Corpora) platform has agreed to host the encoded plays, making some initial analyses possible:

    • Digital edition browsing, character networks and character-relation networks

    • Character interaction metrics, for instance the interaction matrix below for the characters in Der Pfingstmontag (Arnold, 1816).
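Character networks of this kind are commonly built from co-presence: two characters are linked when both appear in the same segment (typically a scene), and the edge weight counts their shared segments. To our understanding this is also the basis of DraCor's networks, but the following is only a minimal sketch under that assumption. It takes scenes already reduced to lists of speaker ids (for example, from the `who` attributes of TEI `<sp>` elements); the function names and the toy scenes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def copresence_edges(scenes: list) -> Counter:
    """Weighted co-presence edges: (a, b) -> number of scenes shared by a and b."""
    weights = Counter()
    for speakers in scenes:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(speakers)), 2):
            weights[(a, b)] += 1
    return weights


def degree(edges: Counter) -> Counter:
    """Number of distinct characters each character shares a scene with."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical toy play: each inner list holds the speakers of one scene.
SCENES = [
    ["anna", "peter", "anna"],
    ["anna", "peter", "mayor"],
    ["mayor"],
]

if __name__ == "__main__":
    edges = copresence_edges(SCENES)
    for (a, b), w in sorted(edges.items()):
        print(f"{a} -- {b}: {w}")
    print(dict(degree(edges)))
```

With the toy scenes, `anna` and `peter` share two scenes, while each of them shares one scene with `mayor`; a scene with a single speaker contributes no edge. Arranging these weights in a character-by-character table yields an interaction matrix.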



Project participants are members of the LiLPa lab: Pablo Ruiz (lead), Delphine Bernhard, Pascale Erhart, Dominique Huck and Carole Werner.

We are also in contact with the Bnu’s Datalab and the Bnu’s special interest group on corpora (SIG Corpus).

Special thanks to the many interns we have been fortunate to work with on the project, from several fields and programs (Language Technologies and Linguistics at Master's level; Modern Languages and Computer Science at Bachelor's level). From the University of Strasbourg: Nathanaël Beiner, Lena Camillone, Hoda Chouaib, Audrey Deck, Valentine Jung, Salomé Klein, Audrey Li-Thiao-Te, Kévin Michoud and Vedisha Toory. From other institutions: Andrew Briand (University of Washington, via IFE Strasbourg) and Barbara Hoff (University of Edinburgh).

Get in touch

Interested in OCR and TEI encoding, applying language technology to Alsatian, digital editing, or Alsatian linguistics or literature? Interested in an internship on these topics?

Do you have questions about the project?

Do contact us!

Cover page of the play D'r Candidat. Source: Internet Archive