Integrate Copilot for Automated Language Translation #55917
jason810496 wants to merge 2 commits into apache:main
shahar1 left a comment
Amazing job! Definitely my #PROTM :)
Got some comments - not something too drastic though.
It would be great if you could add high-level instructions for using this feature to the tools section of the i18n policy.
@@ -21,6 +21,8 @@
# dependencies = [
I think that we should start thinking about a new name for the script, as it does more than just "checking completeness" at this point (also, it's quite a long one) :)
Not urgent for now though - if it is acceptable, I'd prefer to do something about it after the upcoming Airflow Summit as I refer to this script in my talk.
I was thinking about renaming it to just "tool.py" (or something simpler and more universal); doing the rename in a follow-up PR will make the change easier to review.
Not a fan of tool.py. It's as broad as utils.py. We should avoid it whenever possible.
complete_translations might be a bit better (?)
from jinja2 import Template
COPILOT_CLIENT_ID = "Iv1.b507a08c87ecfe98"
Where is this client ID taken from? I've managed to find references in Google, but not official ones.
It's worth documenting it here.
"max_tokens": 2000,
"temperature": 0.1,
We might want to make these parameters configurable.
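As a minimal sketch of what "configurable" could look like: the current values stay as defaults and can be overridden at runtime. The class name and the environment variables COPILOT_MAX_TOKENS and COPILOT_TEMPERATURE are hypothetical, not part of the PR; CLI flags would work just as well.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CompletionParams:
    """Hypothetical container for the Copilot request parameters."""

    max_tokens: int = 2000
    temperature: float = 0.1

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "CompletionParams":
        # Read optional overrides (hypothetical variable names); fall back to the defaults.
        env = os.environ if env is None else env
        params = cls(
            max_tokens=int(env.get("COPILOT_MAX_TOKENS", cls.max_tokens)),
            temperature=float(env.get("COPILOT_TEMPERATURE", cls.temperature)),
        )
        if params.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if params.temperature < 0:
            raise ValueError("temperature must not be negative")
        return params

    def as_payload(self) -> dict:
        # Same keys as the request fields currently hard-coded in the script.
        return {"max_tokens": self.max_tokens, "temperature": self.temperature}
```

Keeping the defaults in one dataclass means the request payload and the documentation of the defaults cannot drift apart.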
No problem! Thanks @shahar1 for the review 🙌
Thanks for the feedback, Jarek! In that case, I will replace the "CopilotTranslator" class with a simple subprocess call to the Copilot CLI. And I agree: the previous implementation is essentially a workaround from before the Copilot CLI was released. IMO, the prompt structure should still be useful even if we switch to the Copilot CLI, because it lets us standardize the prompt and customize prompts for each language in a structured way.
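A minimal sketch of the subprocess approach. The command is a parameter because the exact invocation is an assumption here: the default ("copilot", "-p") assumes the Copilot CLI's non-interactive prompt flag, so check copilot --help before relying on it. The function name is hypothetical.

```python
import subprocess
from typing import Sequence


def translate_with_cli(
    prompt: str,
    command: Sequence[str] = ("copilot", "-p"),  # assumed CLI invocation
    timeout: float = 300.0,
) -> str:
    """Run a CLI agent with the prompt as its last argument and return its stdout."""
    result = subprocess.run(
        [*command, prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        # Surface the CLI's own error output instead of failing silently.
        raise RuntimeError(f"command failed ({result.returncode}): {result.stderr.strip()}")
    return result.stdout.strip()
```

Authentication, retries, and token handling then become the CLI's responsibility, which is most of what the custom client had to implement.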
Yeah. I think that's the most important part - it's simply way simpler to get the same result.
I am not so sure we need anything more than "translate the TODO: entries, following the translations already present". I don't think we have to provide a lot of manual, per-language context on how to translate each language; it's not needed IMHO, simply because AI is pretty good at inferring the rules from context. Pretty much all the translations we do are incremental, based on hundreds of translations already present in the .json files. There are basically two stages:
This is quite a special case where the "solution space" is very limited and the task is very simple. To be honest, if we need to add any more context or prompting here than "follow the translations already done", it means the AI is not doing its job well: it should figure out all the rules that were already applied on its own and apply them well. This is precisely what AI models are supposed to do, and they excel at it. Of course, for the initial translation it may be a good idea to add some prompts like "use as little space as possible in the UI", so it might make sense for the first run (though I am not sure it should differ per language, or that people adding a language will know what to put in their per-language prompt). But this can also be done interactively during the first translation: once we have translated a few hundred messages, the AI should pick up the style and approach from the already translated messages without the need for additional context. Or so I think, at least :)
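The "follow the translations already done" approach can be sketched as a prompt builder that feeds existing translations in as examples. The function name and prompt wording are illustrative assumptions, and it assumes flat key-to-string dicts (the real locale files are nested) with pending entries marked by the TODO: translate: prefix.

```python
import json

TODO_PREFIX = "TODO: translate:"


def build_prompt(english: dict, target: dict, language: str, max_examples: int = 50) -> str:
    """Build a prompt asking a model to fill TODO entries by following existing ones."""
    examples = {}
    pending = {}
    for key, source in english.items():
        current = target.get(key)
        if current is None:
            continue  # Missing keys are added by --add-missing, not here.
        if current.startswith(TODO_PREFIX):
            pending[key] = source
        elif len(examples) < max_examples:
            examples[key] = {"en": source, language: current}
    return (
        f"Translate the English strings below into {language}.\n"
        "Follow the terminology and style of these existing translations:\n"
        f"{json.dumps(examples, ensure_ascii=False, indent=2)}\n"
        "Strings to translate (reply with a JSON object mapping each key to its translation):\n"
        f"{json.dumps(pending, ensure_ascii=False, indent=2)}"
    )
```

No per-language instructions are needed in this shape; the style comes entirely from the examples.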
One potential issue I see is that while this is fine for bulk translation, it's not really good for incremental translations, especially when we skip the "TODO:" phase. When you compare the two files (en and the target language), you are comparing two files with completed translations, and you don't really see what has changed. You have to mentally do two comparisons:
I don't think current IDEs or manual review can easily help with doing both comparisons at the same time. The Copilot interactive "accept" view does exactly this: it shows you what changed and what the English phrase was (with TODO:), and lets you approve/reject (or even correct) it with a single click and move on to the next change. Note that each incremental change will often contain several changes. Just to summarize: I am not against doing this, but I doubt it will make things easier :). But maybe it's just me; we can always add this auto-translate option and ask translators whether it works for them.
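To illustrate the two comparisons mentioned above, a small helper could list, for each key whose translation changed, the English source next to the old and new target text. This is only a sketch of a possible review aid (hypothetical names, flat dicts assumed), not something the PR implements.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, str]


def changed_translations(english: Dict[str, str], old: Dict[str, str], new: Dict[str, str]) -> List[Row]:
    """Return (key, english, old_target, new_target) for every key whose target text changed."""
    rows = []
    for key in sorted(new):
        before = old.get(key, "")
        after = new[key]
        if before != after:
            rows.append((key, english.get(key, ""), before, after))
    return rows


def format_review(rows: List[Row]) -> str:
    # One block per change, so both comparisons are visible together.
    blocks = []
    for key, en, before, after in rows:
        blocks.append(f"{key}\n  en:  {en}\n  old: {before}\n  new: {after}")
    return "\n".join(blocks)
```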
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Now that I have some more "hands-on" experience with AI agents, I would like to re-iterate on this one -
Yes, I very much agree with that after the recent agentic evolution! I also feel that having "skills" is a more appropriate approach than having an additional full script.
Feel free to try :)



closes: #51975
related: #55604
Why
As noted in #55604 (review), integrating Copilot can help us automatically translate from the source language (English).
What
This change integrates Copilot with the dev/i18n/copilot_translations.py script:
- --translate-with-copilot flag: This will translate only the TODO entries. Please ensure you run --add-missing beforehand.
- --with-copilot flag, which can be used together with the existing --add-missing flag: This will add TODO entries and translate them in the same CLI run.
How
I used B00TK1D/copilot-api as a reference and refactored it to be class-based and more robust, including retry logic for API calls and improved error handling.
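Retry logic for API calls like this is commonly written as exponential backoff. A minimal sketch, where the attempt count, delays, and the retried exception types are assumptions rather than the PR's actual values; sleep is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: propagate the last error.
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise AssertionError("unreachable")
```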
The authentication flow for the Copilot Translator is as follows:
.copilot_token.
.copilot_token.
The prompts are structured as follows:
The translation flow for a given language path is:
context: str variable as an empty string.
context=f"{context}.{key}").
TODO: translate:, call Copilot to translate the value:
language_name, value, key, and context.
Demo
Screen.Recording.2025-09-20.at.2.44.12.PM.mov
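The translation flow described above (a recursive walk that builds a dotted context with context=f"{context}.{key}" and only translates values starting with TODO: translate:) can be sketched as follows. The translate callback stands in for the Copilot call and is hypothetical; the sketch also assumes the text after the prefix is the English source and passes the child's full context path.

```python
from typing import Callable, Dict, Union

TODO_PREFIX = "TODO: translate:"
Tree = Dict[str, Union[str, "Tree"]]


def translate_todos(
    node: Tree,
    language_name: str,
    translate: Callable[[str, str, str, str], str],
    context: str = "",
) -> int:
    """Walk a nested translation dict in place; return how many entries were translated."""
    count = 0
    for key, value in node.items():
        # Dotted path such as ".dag.run" gives the model extra context.
        child_context = f"{context}.{key}"
        if isinstance(value, dict):
            count += translate_todos(value, language_name, translate, child_context)
        elif isinstance(value, str) and value.startswith(TODO_PREFIX):
            source = value[len(TODO_PREFIX):].strip()
            # Replacing a value (not adding keys) is safe while iterating.
            node[key] = translate(language_name, source, key, child_context)
            count += 1
    return count
```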
Future Work
--language all flag to create PRs for all languages at once.
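For the --language all idea, a first step could be expanding "all" into the list of available locales. A sketch, assuming one sub-directory per language under a locales directory with English as the source; the layout and function name are hypothetical.

```python
from pathlib import Path
from typing import List


def resolve_languages(requested: str, locales_dir: Path, source_language: str = "en") -> List[str]:
    """Expand "all" into every locale directory except the source language."""
    available = sorted(p.name for p in locales_dir.iterdir() if p.is_dir())
    if requested == "all":
        return [lang for lang in available if lang != source_language]
    if requested not in available:
        raise ValueError(f"unknown language {requested!r}; available: {', '.join(available)}")
    return [requested]
```

Each resolved language could then run through the same per-language flow and get its own branch and PR.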