Text length warning does not consider language, flags valid English sentences #24

@JackyHe398

Description

Currently, LangDetector._preprocess_text() in infer.py raises a "text too long" warning whenever the text exceeds 100 characters.

In English or Korean, however, 100 characters often correspond to only a short or medium-length sentence. The warning is therefore misleading, because many valid sentences trigger it.

Suggestion:
Perform language detection first, then apply length thresholds appropriate to each language. This would ensure that the "text too long" warning is triggered only when the text is genuinely too long for the detected language.
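
A minimal sketch of what this could look like, assuming the detected language is available as a short code such as "en" or "ko" before the length check runs. The names MAX_CHARS_BY_LANG, max_chars_for, and warn_if_too_long are hypothetical and not taken from infer.py, and every threshold other than the existing 100-character default is an illustrative placeholder that would need tuning:

```python
import warnings

# The 100-character default mirrors the current behaviour described above.
DEFAULT_MAX_CHARS = 100

# Hypothetical per-language limits; these numbers are placeholders,
# not measured values, and would need tuning per language.
MAX_CHARS_BY_LANG = {
    "en": 300,
    "ko": 200,
}


def max_chars_for(lang: str) -> int:
    """Return the length limit for a language code, falling back to the default."""
    return MAX_CHARS_BY_LANG.get(lang, DEFAULT_MAX_CHARS)


def is_too_long(text: str, lang: str) -> bool:
    """Check the text length against the limit for the detected language."""
    return len(text) > max_chars_for(lang)


def warn_if_too_long(text: str, lang: str) -> bool:
    """Emit the 'text too long' warning only when the per-language limit is exceeded."""
    if is_too_long(text, lang):
        warnings.warn(
            f"text too long for language '{lang}': "
            f"{len(text)} > {max_chars_for(lang)} characters"
        )
        return True
    return False
```

With these placeholder values, a 150-character English sentence would pass silently, while the same length in a language without its own entry would still trigger the warning, as it does today.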
