feat: language-orthography plugin — enforce diacritical marks for non-English languages #32894

Open
alissonlinneker wants to merge 1 commit into anthropics:main from alissonlinneker:fix/language-diacritics-enforcement

@alissonlinneker

Summary

Adds a new language-orthography plugin that enforces full orthographic correctness (accents, cedillas, umlauts, etc.) when the configured language is one whose spelling relies on non-ASCII characters.

  • SessionStart hook reads the user's language from settings and injects an orthographic enforcement instruction
  • No-op for English or when no language is configured
  • Follows the same pattern as explanatory-output-style and learning-output-style plugins

Closes #32886

Context

The built-in language instruction template says "Always respond in pt-BR" but never mentions diacritical marks. The model interprets this loosely and frequently produces accent-less text — e.g., informacao instead of informação, voce instead of você. This affects every language with diacritics: Portuguese, French, Vietnamese, Czech, Turkish, Spanish, German, etc.
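To make the failure mode concrete, here is a minimal Python sketch (not part of the plugin) showing that the degraded forms quoted above are what remains when the combining marks are removed from the correct spelling. The helper name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose each character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The accent-less output reported in the PR matches the stripped form.
print(strip_diacritics("informação"))  # informacao
print(strip_diacritics("você"))        # voce
```

In Portuguese these are misspellings rather than stylistic variants, which is the framing the plugin's instruction relies on.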

The problem gets worse after context compaction because:

  1. The compaction/summarization step doesn't receive language rules, so the summary can lose proper diacritics
  2. CLAUDE.md instructions that reinforce accents are wrapped in a "may or may not be relevant" disclaimer, which the model uses to deprioritize them

This plugin works around the issue at the prompt level. The long-term fix would involve strengthening the core r1z() language template and passing language rules to the compaction step — details in #32886.

What the plugin does

The SessionStart hook:

  1. Reads language from ~/.claude/settings.json or settings.local.json
  2. Skips if no language is set or if the language is English
  3. Injects an instruction that frames diacritic omission as an orthographic error (equivalent to a typo in English), not a style preference
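The three steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the plugin's actual script: the file lookup order, the English check, the instruction wording, and the `hookSpecificOutput` / `additionalContext` output shape are all assumptions, since the PR only says the hook emits "enforcement JSON".

```python
import json
import sys
from pathlib import Path
from typing import Optional

# Assumed lookup order; the PR only names the two files.
SETTINGS_FILES = ("settings.local.json", "settings.json")


def read_language(config_dir: Path) -> Optional[str]:
    """Return the first non-empty `language` value found, or None."""
    for name in SETTINGS_FILES:
        try:
            data = json.loads((config_dir / name).read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # missing or malformed file: try the next one
        language = data.get("language") if isinstance(data, dict) else None
        if isinstance(language, str) and language.strip():
            return language.strip()
    return None


def is_english(language: str) -> bool:
    """Assumed rule: "en", "en-US", "en_GB" and "English" count as English."""
    tag = language.lower().replace("_", "-")
    return tag == "english" or tag.split("-")[0] == "en"


def build_hook_output(language: Optional[str]) -> Optional[dict]:
    """Return the JSON payload to emit, or None for the no-op cases."""
    if language is None or is_english(language):
        return None
    # Illustrative wording only; the plugin's real instruction text may differ.
    instruction = (
        f"The user's language is {language}. Write it with full orthographic "
        "correctness: every accent, cedilla, umlaut, and other diacritical "
        "mark is required. Omitting one is a spelling error, equivalent to "
        "a typo in English, not a style preference."
    )
    return {
        "hookSpecificOutput": {  # assumed output shape
            "hookEventName": "SessionStart",
            "additionalContext": instruction,
        }
    }


def main() -> None:
    output = build_hook_output(read_language(Path.home() / ".claude"))
    if output is not None:
        # ensure_ascii=False keeps any accented characters literal.
        json.dump(output, sys.stdout, ensure_ascii=False)


if __name__ == "__main__":
    main()
```

Keeping the decision logic in `build_hook_output`, separate from file and stdout handling, makes the no-op cases (no language, English) easy to check without touching a real settings file.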

Test plan

  • Tested with language: "pt-BR" — outputs correct enforcement JSON with accented examples
  • Tested with language: "fr" — outputs enforcement for French
  • Tested with language: "en-US" — silent no-op (exit 0, no output)
  • Tested with no settings file — silent no-op
  • JSON output validated — proper structure, accented characters preserved
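The last check, that accented characters survive JSON serialization, could be reproduced with a sketch like the one below. The payload key and sentence are hypothetical; the point is that Python's `json.dumps` escapes non-ASCII characters by default, that `ensure_ascii=False` keeps them literal, and that both forms decode to the same text.

```python
import json

# Hypothetical payload with accented Portuguese examples.
payload = {"additionalContext": "Escreva 'informação' e 'você' com todos os acentos."}

escaped = json.dumps(payload)                      # non-ASCII as \uXXXX escapes
literal = json.dumps(payload, ensure_ascii=False)  # characters kept as-is

assert "\\u00e7" in escaped  # ç is escaped in the default form
assert "ção" in literal      # and literal with ensure_ascii=False
assert json.loads(escaped) == json.loads(literal) == payload
```

Either form is valid JSON; the literal form is simply easier to inspect by eye when reviewing hook output.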

The built-in language instruction ("Always respond in X") doesn't mention
diacritical marks, so the model frequently drops accents, cedillas, and
other characters required by non-ASCII languages like Portuguese, French,
Vietnamese, Czech, etc.

This plugin adds a SessionStart hook that reads the user's language setting
and injects an explicit orthographic enforcement instruction, framing
diacritic omission as an error rather than a style choice.

No-op for English or when no language is configured.

Closes anthropics#32886

Development

Successfully merging this pull request may close these issues.

[BUG] Language setting does not enforce diacritical marks — accents/cedillas dropped in non-English output