Normalisation models

Use the models

Machine Translation Models

Coming soon...

ABA

ABA is an alignment-based approach to normalising 17th-century text. It is available on GitHub, and a demo is provided here; the demo's word transformation list was obtained from the train subcorpus of FreEMnorm.
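To make the idea of a word transformation list concrete, here is a minimal sketch in Python. It is not the ABA implementation: it assumes the training data is already aligned token by token, and the function names (`build_transformation_list`, `normalise`) and the toy word pairs are invented for illustration only.

```python
from collections import Counter, defaultdict


def build_transformation_list(aligned_pairs):
    """Map each source token to its most frequent normalised form.

    aligned_pairs: iterable of (source_token, normalised_token) tuples.
    Assumption: a one-to-one token alignment is already available.
    """
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {source: forms.most_common(1)[0][0] for source, forms in counts.items()}


def normalise(tokens, transformations):
    """Replace known tokens; leave unseen (out-of-vocabulary) tokens unchanged."""
    return [transformations.get(token, token) for token in tokens]


# Toy training pairs, invented for illustration (not taken from FreEMnorm).
train_pairs = [
    ("estoit", "était"),
    ("estoit", "était"),
    ("estoit", "étoit"),
    ("luy", "lui"),
    ("sçavoir", "savoir"),
]
transformations = build_transformation_list(train_pairs)
result = normalise(["il", "estoit", "auec", "luy"], transformations)
print(result)  # ['il', 'était', 'auec', 'lui']
```

In this sketch "auec" is left untouched because it never appears in the toy training pairs, which illustrates why a lookup-based method is expected to be weaker on out-of-vocabulary tokens.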

Publication

Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot, Simon Gabay. Automatic Normalisation of Early Modern French. LREC 2022.

Results

Model                         Precision (%)    Precision OOV (%)
Identity function             72.73            43.00
ABA                           95.14            69.50
SMT                           97.10±0.02       75.64±0.18
LSTM                          96.14±0.08       76.69±0.70
Transformer                   95.89±0.07       75.73±0.38
Identity function + Lefff     86.12            64.84
ABA + Lefff                   95.44            73.54
SMT + Lefff                   97.24±0.02       78.37±0.20
LSTM + Lefff                  96.25±0.10       78.35±0.79
Transformer + Lefff           96.01±0.09       77.51±1.00
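As a rough illustration of how two scores of this kind could be computed, the sketch below assumes they are token-level: the share of tokens whose predicted normalisation matches the reference, computed over all tokens and then only over source tokens absent from the training vocabulary (OOV). That reading, the function name `token_precision` and the toy data are assumptions, not the evaluation script behind the table.

```python
def token_precision(sources, predictions, references, train_vocab=None):
    """Percentage of tokens whose prediction equals the reference.

    If train_vocab is given, only source tokens absent from it (OOV) are scored.
    Assumption: the three sequences are aligned token by token.
    """
    scored = [
        pred == ref
        for src, pred, ref in zip(sources, predictions, references)
        if train_vocab is None or src not in train_vocab
    ]
    return 100.0 * sum(scored) / len(scored) if scored else float("nan")


# Toy example, invented for illustration.
sources = ["il", "estoit", "auec", "luy", "Paris"]
predictions = ["il", "était", "auec", "lui", "Paris"]
references = ["il", "était", "avec", "lui", "Paris"]
train_vocab = {"il", "estoit", "luy"}

overall = token_precision(sources, predictions, references)
oov_only = token_precision(sources, predictions, references, train_vocab)
print(overall, oov_only)  # 80.0 50.0
```

Under these assumptions, the identity function baseline corresponds to passing the source tokens themselves as predictions.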

Qualitative comparison of results

Using MEDITE, we aligned two automatically normalised versions of the dev subcorpus of FreEMnorm: the first produced by the best statistical model (SMT + Lefff), the second by the best non-statistical approach (ABA + Lefff). See the result of the comparison.
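A comparison of this kind can be approximated with Python's standard `difflib` module. The sketch below is a stand-in for illustration and does not reproduce MEDITE's alignment algorithm; the function name `differing_spans` and the two toy outputs are invented.

```python
from difflib import SequenceMatcher


def differing_spans(tokens_a, tokens_b):
    """Return (operation, span_in_a, span_in_b) for every non-matching region."""
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    return [
        (tag, tokens_a[i1:i2], tokens_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Toy outputs standing in for two normalised versions of the same sentence.
first_output = "il était avec lui à Paris".split()
second_output = "il était auec lui à Paris".split()

diffs = differing_spans(first_output, second_output)
print(diffs)  # [('replace', ['avec'], ['auec'])]
```

Listing only the regions where the two versions disagree keeps attention on the tokens that one system normalised and the other did not.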