Description
Short description
If you write a rule in a mapping stored as CSV and leave a trailing space at the end of the line, that space becomes part of the last field of the rule and is semantically meaningful. But our pre-commit filters delete that space, which means they are not semantically safe.
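To illustrate why the space matters, here is a minimal sketch using Python's standard csv module. It is an assumption that g2p's own loader treats the line the same way; parse_rule is a hypothetical helper, not part of g2p.

```python
import csv
import io


def parse_rule(line):
    """Parse one CSV line into its fields (hypothetical helper, not g2p code)."""
    return next(csv.reader(io.StringIO(line)))


# The rule from fra_to_ipa.csv, without and with a trailing space.
clean = parse_rule("e,,\\S,\\b")
dirty = parse_rule("e,,\\S,\\b ")

# repr() makes the otherwise invisible space visible in the last field.
print(repr(clean[-1]))
print(repr(dirty[-1]))
```

The two last fields differ only by the trailing space, so a rule that looks identical in an editor ends up with a different final field.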
To reproduce
1. Baseline: run g2p convert dame fra fra-ipa and get the (correct) output dam.
2. Edit line 93 of g2p/mappings/langs/fra/fra_to_ipa.csv to add a trailing space, so it reads e,,\S,\b with a space after the \b.
3. Run g2p update; g2p convert dame fra fra-ipa: this yields the incorrect output damʌ.
4. Run pre-commit run --all; g2p update; g2p convert dame fra fra-ipa: the output changes back to the correct dam.
Why it's an issue
- You can't see that space, so you don't realize it's there, but it changes the semantics of your rule.
- Pre-commit filters are supposed to make only safe changes.
Suggested solution
Deliberately ending a context_after field with a literal space seems like a non-existent use case, especially since we tokenize on spaces by default, so I recommend making a trailing space on a line in a .csv mapping file an actual error when you run g2p update.
The error message could be something like:
Trailing space found on line {n} of file {file}: this space will change the semantics of your rule; please delete it or, if you really meant to have a space at the end of your "context_after" field, add a comma at the end of the line to create a comment field.
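A minimal sketch of what such a check could look like, assuming the mapping file is scanned line by line before parsing. The function name find_trailing_spaces and the way it would be wired into g2p update are hypothetical, not existing g2p code.

```python
def find_trailing_spaces(lines, filename):
    """Return one error message per line ending in a space (hypothetical helper)."""
    errors = []
    for n, line in enumerate(lines, start=1):
        # Ignore the line terminator; only a space before it counts.
        content = line.rstrip("\r\n")
        if content.endswith(" "):
            errors.append(
                f"Trailing space found on line {n} of file {filename}: "
                "this space will change the semantics of your rule; "
                "please delete it or, if you really meant to have a space "
                'at the end of your "context_after" field, add a comma at '
                "the end of the line to create a comment field."
            )
    return errors


sample = [
    "a,b\n",            # fine
    "e,,\\S,\\b \n",    # trailing space: reported
    "e,,\\S,\\b ,\n",   # space kept on purpose, closed by a trailing comma
]
errors = find_trailing_spaces(sample, "fra_to_ipa.csv")
for message in errors:
    print(message)
```

Only the second sample line is reported. The third keeps its space but ends in a comma, which is the escape hatch the error message describes.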