
FreEMnorm

FreEMnorm is a parallel corpus covering a range of text genres across the decades of the 17th century. It is the new version of PARALLEL17, which is now deprecated.

Publications

Data (GitHub and Zenodo):

Conference papers:

  • Simon Gabay, Marine Riguet, Loïc Barrault, "A Workflow For On The Fly Normalisation Of 17th c. French", DH2019, ADHO, Jul 2019, Utrecht, Netherlands. ⟨hal-02276150⟩.
  • Simon Gabay, Loïc Barrault, "Traduction automatique pour la normalisation du français du XVIIe siècle", TALN 2020, ATALA, Jun 2020, Nancy, France. ⟨hal-02596669⟩.
  • Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot, Simon Gabay, "Automatic Normalisation of Early Modern French", Proceedings of the 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France. ⟨hal-03540226⟩.

Content

Normalisation description
Fig.2 - Example of linguistic normalisation.

FreEMnorm is a parallel corpus of 17th-century texts, written in prose or verse, covering a range of genres and decades. The texts have been semi-automatically normalised (Gabay 2019) and manually corrected. Most are French texts belonging to the belles-lettres (i.e. literature in its broadest sense), which is the type of source we want to normalise; a small number of texts from other traditions (science, law...) are also present in the corpus.

While some of the transcriptions were produced specifically for this corpus, others were borrowed from other projects. Transcription rules are therefore not strictly equivalent from one text to another regarding, for instance, old characters (e.g. the long s, ſ) or abbreviations (e.g. õon). "Normalisation" is understood here as a partial alignment with contemporary French: in some cases, original spellings are maintained to keep the meter of the verse intact (e.g. the adverbial -s: jusques+vowel → jusques and not jusqu' to have three syllables).
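Because transcription conventions differ between texts, it can help to harmonise the source side before comparing or processing them. The following is a minimal sketch of such a step, not the procedure used to build the corpus: the function name `harmonise` and the mapping `CHAR_MAP` are hypothetical, and reading õ as an abbreviation of "on" is an assumption made for illustration.

```python
import unicodedata

# Hypothetical mapping, for illustration only; these are NOT the
# corpus's actual transcription rules.
CHAR_MAP = {
    "ſ": "s",   # long s
    "õ": "on",  # o with tilde, assumed here to abbreviate "on"
}


def harmonise(text: str) -> str:
    """Replace a few old characters and abbreviations with modern forms."""
    # NFC composes "o" + combining tilde into the single character "õ",
    # so both encodings are handled by the same mapping entry.
    text = unicodedata.normalize("NFC", text)
    for old, new in CHAR_MAP.items():
        text = text.replace(old, new)
    return text
```

A step like this only touches characters; spellings kept for metrical reasons, such as jusques, pass through unchanged.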

The detailed content of the repo is available on GitHub.