
Add LLM, AI and tool usage section to contribution guide.#1730

Merged
bakpakin merged 5 commits into master from contributing_llms_and_tools
Mar 15, 2026

Conversation

@bakpakin
Member

No description provided.

@bakpakin bakpakin self-assigned this Mar 13, 2026
CONTRIBUTING.md Outdated
Comment on lines +100 to +104
large amounts of code that will be embedded in the Janet runtime, which include
all C source files as well as boot.janet. All submitted code should be both
inspected and understood by a human author, including test cases. Large and
obviously AI driven changes will be rejected. Be mindful and transparent on the
copyright implications of any submitted code. We will use discretion when
Contributor

I think it should be more specific than "large" here, preferably "non-trivial". Since copyright related to LLMs et al. varies between jurisdictions, I'd appreciate it if Janet plays on the safe side. (TBH I feel Janet could even outright ban LLMs and it would not affect the contribution throughput, but it's not my place to decide so 🤷)

BTW @sogaiu, the 15 LoC thing is taken from the Legally Significant section of GNU's information for maintainers, but it's an arbitrary number and intended for C formatted this way:

```c
if (foo)
  {
    bar();
  }
else
  {
    baz();
  }
```

Member Author


I agree - the specific case I had in mind is LLM-based autocomplete in editors like VSCode or cursor. I'm not sure where that really falls ethically and I doubt one could recognize that reliably anyway.

Member Author


I've suggested about 15 lines of C and 5 lines of Janet as our "Large" chunk of code here, somewhat arbitrarily.

CONTRIBUTING.md Outdated

All usage of Large Language Models (LLMs), Neural Networks, "AI" tools, and
other tools such as software fuzzers or static analyzers must be disclosed.
This applies both pull requests, email patches, bug reports, and any meaningful

Recommend "This applies to" instead of "This applies both."

Member Author

Fixed in 9880475

@veqqq

veqqq commented Mar 13, 2026

First off, I'd prefer a blanket ban and my heart sank.

> All submitted code should be both inspected and understood by a human author, including test cases.

One may use any tooling to assist but should write everything oneself to ensure comprehension. If e.g. some AI detects a bug or suggests adding a test case, one can take heed but the AI should never commit or directly work on the codebase. If one feels it useful for AI to e.g. generate tests, that implies an issue with how tests are done; we ought instead to strive and improve the API/approach to decrease the (time, cognitive) load required. AI usage would only seek to ossify bad practices and lock us in.

> If every user of an application needs to write the same code, your API is wrong. If you abstract away the boilerplate and reuse that abstraction, you’ve made the world better. If you create a tool that generates the boilerplate, you’ve made the world more fragile. - David Chisnall

On the other hand, I have not directly contributed yet, and power and authority rest in the hands of those who wield it. If @bakpakin et al. feel they can do things better (or with less time investment) with so-and-so tools, I can only continue to bathe in their largess.

I believe a directional ban may work - where AI shall not contribute, humans shall not contribute AI code, but humans may work and learn in dialogue with LLMs etc. like with static analysis tools (also capable of errors, false positives etc.) and apply that to their contributions. But this may be a distinction without difference.

@bakpakin
Member Author

bakpakin commented Mar 13, 2026

Part of the reason for the vague language is that there is a proliferation of code editors with AI-based autocomplete baked in - does that count as "AI" generated? What about copilot finding/generating a one-line bug-fix? I do think that is a little different than saying "ChatGPT, generate a new module for me that does xyz".

I personally have really no interest right now in using LLMs to generate functions on the main Janet repository at all, since any interesting work that I would like to delegate to an LLM is usually one-off scripts for personal use, and things of that nature.

> one can take heed but the AI should never commit or directly work on the codebase

That I can agree with completely. No bot submissions, no openclaw, etc. I will add that.

Also up for debate, but I'm pretty confident that LLM generated code works and can be of good quality even right now, not to mention in the future. The Claude C Compiler is actually able to compile and pass all of Janet's tests (although shell.c has some issues with thread local variables so doesn't quite work). The issue is more about preserving the spirit of the project and respecting the explicit and implicit promises that were made to current users.

- No bot PRs
- Define "Large" code contribution

Also try to dissuade users from using AI for one-line or simple changes, instead
preferring to treat that as "feedback" and rewrite by hand.
@sogaiu
Contributor

sogaiu commented Mar 14, 2026

I'd just like to add that it seems to me like the approach of being conservative initially doesn't lose us much.

Loosening later (if warranted) might be practical?

Still leave open the possibility of AI / tool usage for static analysis
and bug reports. However, the 5-15 lines of code limitation is fuzzy and
arbitrary. We can just say no.
@bakpakin
Member Author

Fair enough, I think it is pretty valuable to not have LLM-generated code in the runtime. I have been doing more thinking on this and I think it is clearer to simply remove any AI generated code from distributed runtime code. For now though I will leave open the possibility of AI generated test cases and static analysis.

@bakpakin bakpakin merged commit 6129715 into master Mar 15, 2026
31 checks passed

5 participants