
enhancement: improve english text normalization in english.py #76

Open
SH20RAJ wants to merge 3 commits into mofa-org:main from SH20RAJ:improve-text-normalization

Conversation

@SH20RAJ

@SH20RAJ SH20RAJ commented Mar 29, 2026

Improve the text_normalize function in english.py with better punctuation handling, error resilience, and TTS stability.

Changes:

  • Add support for smart quotes, dashes, ellipsis, and CJK punctuation
  • Include digits in allowed characters for better compatibility
  • Add word boundary checks for abbreviation expansion ("i.e.", "e.g.")
  • Add error handling for normalize_numbers and replace_consecutive_punctuation
  • Normalize whitespace and ensure terminal punctuation for TTS stability
  • Handle empty input gracefully
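The diff itself is not shown in this conversation, so the following is only a minimal sketch of how the listed steps might fit together, not the PR's actual code. The names `normalize_sketch`, `safe_apply`, `collapse_punctuation`, and `PUNCT_MAP`, the abbreviation expansions, and the allowed-character set are all assumptions made for illustration. `collapse_punctuation` is a stand-in for `replace_consecutive_punctuation`, and `number_step` is a placeholder for `normalize_numbers`.

```python
import re

# Hypothetical mapping; the real table in english.py may differ.
PUNCT_MAP = {
    "\u2018": "'", "\u2019": "'",    # smart single quotes
    "\u201c": '"', "\u201d": '"',    # smart double quotes
    "\u2013": "-", "\u2014": "-",    # en dash, em dash
    "\u2026": "...",                 # ellipsis
    "\uff0c": ",", "\u3002": ".",    # CJK comma, full stop
    "\uff01": "!", "\uff1f": "?",    # CJK exclamation, question mark
}
_PUNCT_TABLE = str.maketrans(PUNCT_MAP)

# Lookarounds act as word-boundary checks so "node.g." is not expanded.
_IE = re.compile(r"(?<!\w)i\.e\.(?!\w)", re.IGNORECASE)
_EG = re.compile(r"(?<!\w)e\.g\.(?!\w)", re.IGNORECASE)


def safe_apply(step, text):
    """Run one normalization step; fall back to the input if it raises."""
    try:
        return step(text)
    except Exception:
        return text


def collapse_punctuation(text):
    """Stand-in: collapse a run of punctuation marks to its first mark."""
    return re.sub(r"([,.!?;:])[,.!?;:]+", r"\1", text)


def normalize_sketch(text, number_step=None):
    """Apply the steps listed above in one pass (illustrative only)."""
    if not text or not text.strip():
        return ""                                   # empty input
    text = text.translate(_PUNCT_TABLE)             # smart/CJK punctuation
    text = _IE.sub("that is", text)                 # abbreviation expansion
    text = _EG.sub("for example", text)
    if number_step is not None:
        text = safe_apply(number_step, text)        # guarded number step
    text = safe_apply(collapse_punctuation, text)   # guarded punctuation step
    # Assumed allowed set: letters, digits, space, basic punctuation.
    text = re.sub(r"[^A-Za-z0-9 ,.!?;:'\"-]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()        # normalize whitespace
    if text and text[-1] not in ".!?":
        text += "."                                 # terminal punctuation
    return text
```

For example, `normalize_sketch("Use 2 cups, e.g. flour")` returns `"Use 2 cups, for example flour."`, and a `number_step` that raises leaves the text unchanged instead of aborting the whole call.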

Benefits:

  • More robust text processing for TTS
  • Better handling of various input formats
  • Improved stability with error recovery

Addresses issue #53.

SH20RAJ added 3 commits March 28, 2026 20:25
Fix typo 'setings' -> 'settings' in the benefits section of TabId enum documentation.

Replace panic-based error handling with Result-based error propagation
and recovery mechanisms to prevent runtime crashes.

Changes:
- node-hub/dora-funasr-nano-mlx/src/main.rs: Handle engine None case gracefully
- apps/mofa-asr/src/screen/mod.rs: Recover from poisoned mutexes in ChatController
- mofa-dora-bridge/src/parser.rs: Use expect with descriptive message in test
- node-hub/dora-gpt-sovits-mlx/src/ssml.rs: Use expect in test

This addresses issue mofa-org#41.

- Add support for smart quotes, dashes, ellipsis, and CJK punctuation
- Include digits in allowed characters
- Add word boundary checks for abbreviation expansion
- Add error handling for normalize_numbers and replace_consecutive_punctuation
- Normalize whitespace and ensure terminal punctuation for TTS stability
- Handle empty input gracefully

Addresses issue mofa-org#53.
@SH20RAJ
Author

SH20RAJ commented Mar 29, 2026

Status check: this PR is awaiting review and merge by the upstream maintainers. I'm available to address feedback quickly.

