Add LLM, AI and tool usage section to contribution guide.#1730
Conversation
CONTRIBUTING.md
Outdated
> large amounts of code that will be embedded in the Janet runtime, which includes
> all C source files as well as boot.janet. All submitted code should be both
> inspected and understood by a human author, including test cases. Large and
> obviously AI driven changes will be rejected. Be mindful and transparent on the
> copyright implications of any submitted code. We will use discretion when
I think it should be more specific than "large" here, preferably "non-trivial". Since copyright law related to LLMs et al. varies between jurisdictions, I'd appreciate it if Janet plays on the safe side. (TBH I feel Janet could even outright ban LLMs and it would not affect the contribution throughput, but it's not my place to decide so 🤷)
BTW @sogaiu, the 15 LoC thing is taken from the Legally Significant section of GNU's information for maintainers, but it's an arbitrary number and intended for C formatted this way:

```c
if (foo)
{
    bar();
}
else
{
    baz();
}
```
I agree - the specific case I had in mind is LLM-based autocomplete in editors like VSCode or Cursor. I'm not sure where that really falls ethically, and I doubt one could recognize it reliably anyway.
I've suggested about 15 lines of C and 5 lines of Janet as our "Large" chunk of code here, somewhat arbitrarily.
CONTRIBUTING.md
Outdated
> All usage of Large Language Models (LLMs), Neural Networks, "AI" tools, and
> other tools such as software fuzzers or static analyzers must be disclosed.
> This applies both pull requests, email patches, bug reports, and any meaningful
Recommend "This applies to" instead of "This applies both."
First off, I'd prefer a blanket ban and my heart sank.
One may use any tooling to assist but should write everything oneself to ensure comprehension. If e.g. some AI detects a bug or suggests adding a test case, one can take heed, but the AI should never commit or directly work on the codebase. If one feels it useful for AI to e.g. generate tests, that implies an issue with how tests are done; we ought instead to strive to improve the API/approach to decrease the (time, cognitive) load required. AI usage would only ossify bad practices and lock us in.
On the other hand, I have not directly contributed yet, and power and authority rest in the hands of those that wield them. If @bakpakin et al. feel they can do things better (or with less time investment) with such tools, I can only continue to bathe in their largess. I believe a directional ban may work - where AI shall not contribute, humans shall not contribute AI code, but humans may work and learn in dialogue with LLMs etc., much like with static analysis tools (also capable of errors, false positives, etc.), and apply that to their contributions. But this may be a distinction without a difference.
Part of the reason for the vague language is that there is a proliferation of code editors with AI-based autocomplete baked in - does that count as "AI" generated? What about Copilot finding/generating a one-line bug fix? I do think that is a little different than saying "ChatGPT, generate a new module for me that does xyz". I personally have no real interest right now in using LLMs to generate functions in the main Janet repository at all, since any interesting work that I would like to delegate to an LLM is usually one-off scripts for personal use, and things of that nature.
That I can agree with completely. No bot submissions, no openclaw, etc. I will add that. Also up for debate, but I'm pretty confident that LLM-generated code works and can be of good quality even right now, not to mention in the future. The Claude C Compiler is actually able to compile Janet and pass all of Janet's tests (although shell.c has some issues with thread-local variables, so it doesn't quite work). The issue is more about preserving the spirit of the project and respecting the explicit and implicit promises that were made to current users.
- No bot PRs
- Define "Large" code contribution

Also try to dissuade users from using AI for one-line or simple changes, instead preferring to treat that as "feedback" and rewrite it instead.
I'd just like to add that it seems to me like the approach of being conservative initially doesn't lose us much. Loosening later (if warranted) might be practical?
Still leave open the possibility of AI/tool usage for static analysis and bug reports. However, the 5-15 lines of code limitation is fuzzy and arbitrary. We can just say no.
Fair enough, I think it is pretty valuable to not have LLM-generated code in the runtime. I have been doing more thinking on this, and I think it is clearer to simply remove any AI-generated code from distributed runtime code. For now, though, I will leave open the possibility of AI-generated test cases and static analysis.