A trailing space at the end of a csv map line is semantically meaningful, but would be deleted by pre-commit filters #457

@joanise

Short description

If you write a rule in a mapping stored as CSV and leave a trailing space at the end of the line, that space becomes part of the last field of the rule and is semantically meaningful. Our pre-commit filters, however, delete that space, which means they are not semantically safe.
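As a small illustration of why the space ends up in the last field, here is how Python's standard csv module parses the two variants of the rule. This is only a sketch: it assumes the mapping is read with default CSV dialect settings, and parse_rule is a hypothetical helper, not part of g2p.

```python
import csv
import io


def parse_rule(line):
    """Parse one CSV line into its fields (hypothetical helper, not g2p's loader)."""
    return next(csv.reader(io.StringIO(line)))


clean = parse_rule("e,,\\S,\\b")   # no trailing space
dirty = parse_rule("e,,\\S,\\b ")  # trailing space after \b

# The default dialect keeps the trailing space inside the last field.
print(repr(clean[-1]))
print(repr(dirty[-1]))
```

With the default dialect, the last field of the second line keeps the space, so the two rules are not the same rule.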

To reproduce

Baseline: run g2p convert dame fra fra-ipa and get the (correct) output dam.

Edit g2p/mappings/langs/fra/fra_to_ipa.csv line 93 to add a trailing space on it, so it reads e,,\S,\b with a space after that \b.

Run g2p update; g2p convert dame fra fra-ipa and get the incorrect output damʌ.

Run pre-commit run --all; g2p update; g2p convert dame fra fra-ipa and the output changes back to the correct dam.

Why it's an issue

  • you can't see that space, so you don't realize it's there, but it changes the semantics of your rule
  • pre-commit filters are supposed to make only safe changes

Suggested solution

Deliberately writing a context_after that ends in a literal space seems like a non-existent use case, especially since we tokenize on spaces by default. I therefore recommend making a trailing space on a line in a .csv map file an actual error when you run g2p update.

The error message could be something like:

Trailing space found on line {n} of file {file}: this space will change the semantics of your rule; please delete it or, if you really meant to have a space at the end of your "context_after" field, add a comma at the end of the line to create a comment field.
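A minimal sketch of what such a check could look like, assuming it runs over the raw text of the map file before CSV parsing. The names find_trailing_space_lines and trailing_space_error are hypothetical and do not exist in g2p; where the check would hook into g2p update is left open.

```python
def find_trailing_space_lines(csv_text: str) -> list[int]:
    """Return the 1-based numbers of lines that end with a space."""
    return [
        n
        for n, line in enumerate(csv_text.splitlines(), start=1)
        if line.endswith(" ")
    ]


def trailing_space_error(n: int, file: str) -> str:
    """Build the error message proposed above (hypothetical helper)."""
    return (
        f"Trailing space found on line {n} of file {file}: this space will "
        "change the semantics of your rule; please delete it or, if you "
        'really meant to have a space at the end of your "context_after" '
        "field, add a comma at the end of the line to create a comment field."
    )


sample = "a,b,c,d\ne,,\\S,\\b \ne,,\\S,\\b ,\n"
bad_lines = find_trailing_space_lines(sample)
messages = [trailing_space_error(n, "fra_to_ipa.csv") for n in bad_lines]
```

In the sample, only the second line is flagged. The third line keeps its space inside context_after but ends with a comma, which is the escape hatch the message suggests.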
