Conversation
|
NB this added 1-3 dependencies for iirc ~7MB. Wanted to just query the IANA reference data file and do the checks manually but this was slightly too much work. I will do penance in removing other dependencies later to offset this. |
jmartin-tech
left a comment
There was a problem hiding this comment.
This looks good to me.
One item I think needs a little documentation is that detectors changed bcp47 -> lang_spec while probes moved to lang. I believe I understand the reasoning for this however it would be helpful to have some documentation to reference for reasoning.
| and self.lang != "*" | ||
| and lang != "*" | ||
| and self.bcp47 != lang | ||
| and self.lang != lang |
There was a problem hiding this comment.
Note in current usage lang passed here is now the detector lang_spec. I don't think we need to change anything for this PR however in a future iteration the callers will need to ensure to only pass a single lang from the spec list.
There was a problem hiding this comment.
yeah, this is the phenomenon described in bullet 3
|
reasoning present in:
expected behavior expressed in:
reasoning added to:
|
bcp47is a type; it describes format not functionrenamed to
langorlang_specfor more descriptive, precise, intuitive codelearnings & positions taken here:
bcp47->lang.bcp47->lang_specAttemptlanguage semantics are unclear.Attempts should be in one language, especially after unanimous decisions made during implementation of multilingual, but attempt bcp47 is occasionally populated from detector or probe bcp47. This PR takes the position that attempts are monolingual. This will be unravelled precisely when Turn+Conversation lands Migrate string output/input toTurnobjects #1089 , but it's something to watch for. there are a few assignments left, e.g. indetectors.base.Detector.detect, that violate this.langcodesprovides some normalisation functions that might be useful in the language code format mappingResolves #1139