Use a short SPDX license header for LLM-centered files#1489
Use a short SPDX license header for LLM-centered files#1489Dev-iL wants to merge 3 commits intoapache:mainfrom
Conversation
| - name: Check for missing Apache 2 license headers | ||
| run: python3 scripts/check_license_headers.py | ||
| working-directory: ${{ github.workspace }} |
There was a problem hiding this comment.
This answers all similar comments:
After the proposed change, license headers are being enforced by pre-commit hooks. This approach has several benefits:
- Contributors can tell there's an issue before getting to ci
- Coverage isn't lost since hooks should run on ci anyway as part of static checks
- No need to maintain license enforcement scripts
- Hooks were more thorough and detected missing licenses that the ci missed
Whereas the main downside is it's somewhat harder to customize if a specific file requires special treatment.
There was a problem hiding this comment.
issue is people submit without running pre-commit hooks :(
There was a problem hiding this comment.
The assumption was that the hooks run on CI too. It's just that instead of a custom script we use a hook - which is more standard.
There was a problem hiding this comment.
Can you add that to CI then?
There was a problem hiding this comment.
It already runs as part of the "Unit Tests" job's "check linting with prek" stage. I'll move this to a separate "Static checks" job for better visibility.
8f6cdd6 to
c1f2647
Compare
NOTICE
Outdated
| This product includes test code vendored from pygls | ||
| (https://github.com/openlawlibrary/pygls): | ||
| Copyright (c) Open Law Library. All rights reserved. | ||
| Licensed under the Apache License, Version 2.0. | ||
|
|
||
| Original work in conftest.py: | ||
| Copyright 2017 Palantir Technologies, Inc. | ||
| Licensed under the MIT License. |
There was a problem hiding this comment.
should this be here? or under the LSP package notice? I think the LSP package, since this is related to the source distribution of that, no?
There was a problem hiding this comment.
According to the ASF guidelines,
LICENSE and NOTICE files belong at the top level of the source tree.
I'm not sure how this applies to monorepos though. Will investigate...
- Mark .github as export-ignore in .gitattributes - Add short and long license templates for pre-commit hooks - Add pre-commit hooks to enforce license headers (replaces CI scripts) - Delete scripts/add_license_headers.py and scripts/check_license_headers.py - Remove CI license check step from hamilton-lsp workflow - Fix inconsistent license header indentation in several files - Add missing license headers to PR templates - Add vendored code attributions (Open Law Library, Palantir) to NOTICE file - Exclude contrib/docs/ from markdown license hook (Docusaurus frontmatter)
| # distributed under the License is distributed on an "AS IS" BASIS, # | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # | ||
| # See the License for the specific language governing permissions and # | ||
| # limitations under the License. # |
There was a problem hiding this comment.
You can't remove license headers from 3rd party origin files.
There was a problem hiding this comment.
This should be reverted and we should still update https://github.com/SummitSG-LLC/hamilton/blob/8fa698904c772a6738f7531c7fa56f62753fa325/dev_tools/language_server/LICENSE to mention this conftest.py file
There was a problem hiding this comment.
It's unclear to me how this file should end up - do we keep only the original copyright notice or have both one under the other?
There was a problem hiding this comment.
don't remove any 3rd party code header - it's basically a rule of the ASF - you need to come up with a strong justification for it - there is an open issue for this
And that issue can't continue to be ignored. We need to at least update our LICENSE file to mention any 3rd party code.
There was a problem hiding this comment.
For now I kept the old header and excluded this file from the hook.
There was a problem hiding this comment.
ok - if we agree the code does not contain anything that requires the old Palantir copyright then it is ok to remove it
| --- | ||
| <!-- SPDX-License-Identifier: Apache-2.0 --> |
There was a problem hiding this comment.
@potiuk I configured the hooks to accept the license being below the frontmatter using:
# .pre-commit-config.yaml
- id: insert-license
...
- --detect-license-in-X-top-lines
- '8'In case this comes in handy in Airflow too...

Following the approach from apache/airflow#62073 and apache/airflow#62145, files intended for LLM/agent consumption (not distributed in releases) now use a minimal SPDX license identifier instead of the full Apache 2.0 header - for LLM token efficiency.
See also:
https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53
Changes
.githubfolder asexport-ignore.How I tested this
Notes
Checklist