
Ignore copyright symbols inside URLs during copyright detection #4752

Open
dikshaa2909 wants to merge 5 commits into aboutcode-org:develop from dikshaa2909:restore-copyright


@dikshaa2909

Fixes #4724

Summary

This PR fixes a false positive where ScanCode reports a copyright
statement when the copyright marker "(c)" appears inside a URL.

For example:

http://example.com/(c)/path

was incorrectly treated as a copyright candidate, even though it is part of a URL
and not an actual copyright statement.

Problem

The copyright candidate detection logic treated "(c)" inside URLs as a valid
copyright marker. This resulted in incorrect detections when scanning text files
containing URLs with "(c)" in their path.
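The failure mode can be reproduced with a minimal sketch. The pattern and the `naive_has_marker` helper below are hypothetical and only illustrate the idea; they are not ScanCode's actual candidate detection code.

```python
import re

# Hypothetical naive marker pattern (an assumption for illustration,
# not ScanCode's real detection logic).
NAIVE_MARKER = re.compile(r"\(c\)", re.IGNORECASE)


def naive_has_marker(line):
    """Return True if "(c)" appears anywhere on the line, URL or not."""
    return NAIVE_MARKER.search(line) is not None


# Both lines are flagged, although only the second is a copyright statement.
print(naive_has_marker("See http://example.com/(c)/path for details"))
print(naive_has_marker("(c) 2024 Example Corp"))
```

Because the check has no notion of where the marker sits, the URL line becomes a candidate just like the genuine statement.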

Solution

  • Updated the copyright candidate detection logic to ignore copyright markers
    that appear inside URLs.
  • Added a regression test covering this case.
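
One way to express this idea, shown here as a rough sketch rather than the code in this PR, is to blank out URL spans before looking for the marker. The `URL` and `MARKER` patterns and the `has_copyright_marker` helper are assumptions made for illustration. The sketch also ignores "(c)" anywhere in a URL, which is a simplification of restricting the check to URL paths.

```python
import re

# Hypothetical patterns (assumptions, not ScanCode's real ones).
# A URL is taken to be http(s):// followed by non-whitespace characters.
URL = re.compile(r"https?://\S+", re.IGNORECASE)
MARKER = re.compile(r"\(c\)", re.IGNORECASE)


def has_copyright_marker(line):
    """Return True if "(c)" occurs on the line outside of any URL."""
    # Replace each URL with a space so markers inside it cannot match.
    without_urls = URL.sub(" ", line)
    return MARKER.search(without_urls) is not None


print(has_copyright_marker("http://example.com/(c)/path"))
print(has_copyright_marker("(c) 2024 Example Corp, see http://example.com/(c)/path"))
```

A line that contains both a genuine statement and a URL is still detected, since only the URL span is removed before matching.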

Tests

  • Added test_copyright_symbol_inside_url_is_ignored
  • All existing tests pass successfully.
  • Verified locally before submission.

Checklist

  • Reviewed contribution guidelines
  • PR is descriptively titled and links the original issue
  • Tests pass locally and in CI
  • No merge conflicts
  • Documentation update not required
  • CHANGELOG update not required

Signed-off-by: dikshadeware@gmail.com

Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>
Refactor copyright symbol detection to ignore (c) only in URL paths.

Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>
Add test for copyright detection with URL

Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>
Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>
Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>

Development

Successfully merging this pull request may close these issues.

Copyright detection sees URLs containing copyright symbols as copyright statements
