
Add SingNER dataset-driven pytest coverage and refresh existing tests#27

Open
NHLOCAL wants to merge 1 commit into dev from codex/create-comprehensive-tests-for-singner-dataset

Conversation

@NHLOCAL (Owner) commented on Feb 15, 2026

Motivation

  • Improve confidence in the singer-name matching logic by validating it against a large, real-world labeled dataset.
  • Make dataset-driven tests reproducible and runnable offline by saving a deterministic sample locally.
  • Migrate existing unittest-style checks to a native pytest layout for easier parametrization and integration.

Description

  • Converted the legacy unittest test in src/tests/test_check_name_similarity.py to a parameterized pytest test using @pytest.mark.parametrize for clearer, pytest-native cases.
  • Added a deterministic 5,000-row sample of NHLOCAL/SingNER at src/tests/data/singner_sample_5000.jsonl and a helper script src/tests/data/download_singner_sample.py to (re)download the same sample from the Hugging Face dataset.
  • Introduced a new dataset-driven test module src/tests/test_singner_dataset_matching.py that loads the sample, extracts SINGER entities and runs comprehensive validations (direct matches, vav-prefix variants, prefix/suffix negatives, no-answer checks), assembling 120+ edge cases derived from the sample.
  • Small hygiene edits to remove unused imports and random seed remnants so tests are deterministic and self-contained.
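As a rough illustration of the unittest-to-pytest conversion described above, a minimal sketch of the parametrized layout (the matcher stub and case values here are made up for illustration; the real function under test is check_exact_name from the project):

```python
import pytest


def check_exact_name(text: str, artist: str) -> bool:
    """Hypothetical stand-in for the project's matcher, used only for this sketch."""
    return artist in text.split()


@pytest.mark.parametrize(
    ("text", "artist", "expected"),
    [
        ("a live song by somebody", "somebody", True),
        ("no artist mentioned here", "someoneelse", False),
    ],
)
def test_check_exact_name(text: str, artist: str, expected: bool) -> None:
    # pytest-native replacement for self.assertTrue / self.assertFalse
    assert check_exact_name(text, artist) is expected
```

Each tuple becomes its own collected test case, so failures report the exact inputs rather than a single monolithic method.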

Testing

  • Confirmed the sample file contains exactly 5,000 rows with wc -l src/tests/data/singner_sample_5000.jsonl.
  • Verified the Hugging Face dataset can be loaded and a sample written via load_dataset('NHLOCAL/SingNER') and the included download_singner_sample.py script.
  • Ran the full test suite with pytest -q and observed all tests passing (17 passed).
  • Installed datasets as part of validation and exercised the dataset loading flow used to generate the committed sample.

Codex Task

@gemini-code-assist

Summary of Changes

Hello @NHLOCAL, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and maintainability of the singer-name matching logic by modernizing its testing infrastructure. It transitions from a legacy unittest framework to pytest and introduces a comprehensive, dataset-driven validation approach using a real-world labeled dataset. These changes aim to improve confidence in the matching algorithm's accuracy and provide a more scalable and reproducible testing environment.

Highlights

  • Test Framework Migration: Migrated existing unittest-style checks in test_check_name_similarity.py to a native pytest layout using @pytest.mark.parametrize for enhanced parametrization and integration.
  • Dataset Integration: Incorporated a deterministic 5,000-row sample of the NHLOCAL/SingNER dataset, along with a helper script to download and refresh this sample, ensuring reproducible offline testing.
  • New Dataset-Driven Tests: Introduced a new test module, test_singner_dataset_matching.py, which leverages the SingNER sample to validate the singer-name matching logic against over 120 comprehensive edge cases, including direct matches, prefix variants, and negative scenarios.
  • Code Hygiene: Performed minor hygiene edits to remove unused imports and random seed remnants, contributing to more deterministic and self-contained tests.
Changelog
  • src/tests/data/download_singner_sample.py
    • Added a new Python script to download a 5,000-row sample from the NHLOCAL/SingNER Hugging Face dataset.
    • Implemented logic to save the sample as singner_sample_5000.jsonl for offline use.
  • src/tests/test_check_name_similarity.py
    • Removed unittest imports and class structure.
    • Converted all individual test methods into a single parameterized pytest function using @pytest.mark.parametrize.
    • Updated assertions from self.assertTrue/self.assertFalse to assert ... is expected.
  • src/tests/test_singner_dataset_matching.py
    • Added a new pytest test module.
    • Included fixtures (singner_rows, singer_mentions) to load and process the SingNER dataset sample.
    • Implemented tests to verify dataset row count, singer entity matching ratio, and comprehensive edge cases derived from the dataset.
    • Added a test to ensure no matches occur when the artist is not present in the text.
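The JSONL-writing step of the download script described in the Changelog can be sketched as follows. The helper name and the sample rows below are illustrative (the real script fills `rows` via datasets.load_dataset before writing); only the text/entities field names come from the changelog above:

```python
import json
from pathlib import Path


def write_sample_jsonl(rows: list[dict], path: Path) -> None:
    """Write one JSON object per line, keeping only the text/entities fields."""
    with path.open("w", encoding="utf-8") as output_file:
        for row in rows:
            output_file.write(
                json.dumps(
                    {"text": row["text"], "entities": row["entities"]},
                    ensure_ascii=False,  # preserve Hebrew text as-is
                )
                + "\n"
            )


# Illustrative usage with fabricated rows; extra fields are dropped on write.
sample = [{"text": "demo", "entities": [], "split": "train"}]
write_sample_jsonl(sample, Path("sample.jsonl"))
```

ensure_ascii=False keeps the Hebrew entity strings human-readable in the committed file instead of escaping them to \uXXXX sequences.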
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76b222577d




def main() -> None:
    dataset = load_dataset("NHLOCAL/SingNER", split="train")


P2: Pin SingNER dataset revision before sampling

The sampling script currently calls load_dataset("NHLOCAL/SingNER", split="train") without a pinned revision, so rerunning it after any upstream dataset update can produce a different 5,000-row file even though the script/docstring describes the sample as deterministic. This undermines reproducibility for anyone regenerating singner_sample_5000.jsonl in a later environment; pinning a specific dataset revision (or commit hash) avoids silent drift.
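A minimal sketch of the suggested fix. The revision value below is a placeholder, not an actual SingNER commit hash, and the helper name is invented for this sketch; load_dataset's revision parameter accepts a branch, tag, or commit hash on the Hugging Face Hub:

```python
def load_pinned_singner(revision: str = "<dataset-commit-sha>"):
    """Load the SingNER train split pinned to a fixed dataset revision.

    Pinning `revision` keeps regenerated samples identical even after
    upstream updates to the dataset repository.
    """
    # Imported lazily so this module can be inspected without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("NHLOCAL/SingNER", split="train", revision=revision)
```

Recording the pinned hash in the script (or its docstring) also documents exactly which dataset state the committed sample was drawn from.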



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a great pull request that significantly improves the testing strategy for the singer-name matching logic. Migrating the existing tests to use pytest.mark.parametrize is a solid improvement for readability, and the introduction of dataset-driven tests using a real-world sample from SingNER will greatly increase confidence in the algorithm's correctness. The new download script is also a welcome addition for reproducibility.

I've added a few comments with suggestions for improvement, mainly around enhancing test correctness and maintainability by removing magic numbers and improving code conciseness. Overall, these are excellent changes.

@pytest.fixture(scope="module")
def singner_rows() -> list[dict]:
    rows = _load_dataset_sample()
    assert len(rows) >= 5000, "Expected at least 5000 rows in sample"


high

The assertion len(rows) >= 5000 is too lenient. The dataset sample is expected to have exactly 5000 rows, as specified in the download script (SAMPLE_SIZE = 5000) and confirmed in test_dataset_row_count_is_significant. Using == ensures the fixture provides the exact data the tests expect, preventing potential issues if the sample file changes unexpectedly.

Suggested change:
- assert len(rows) >= 5000, "Expected at least 5000 rows in sample"
+ assert len(rows) == 5000, "Expected exactly 5000 rows in sample"

Comment on lines +20 to +27
for row in sample:
    output_file.write(
        json.dumps(
            {"text": row["text"], "entities": row["entities"]},
            ensure_ascii=False,
        )
        + "\n"
    )


medium

This loop can be written more concisely and potentially more performantly using a generator expression inside str.join(). This avoids repeated write() calls within the loop and is generally more Pythonic.

Suggested change:
- for row in sample:
-     output_file.write(
-         json.dumps(
-             {"text": row["text"], "entities": row["entities"]},
-             ensure_ascii=False,
-         )
-         + "\n"
-     )
+ output_file.write(
+     "\n".join(
+         json.dumps(
+             {"text": row["text"], "entities": row["entities"]},
+             ensure_ascii=False,
+         )
+         for row in sample
+     )
+     + "\n"
+ )

Comment on lines +65 to +115
def test_singer_entities_match_in_original_text(singer_mentions: list[tuple[str, str]]) -> None:
    sampled_mentions = singer_mentions[:1500]
    matched = sum(1 for text, singer in sampled_mentions if check_exact_name(text, singer))
    ratio = matched / len(sampled_mentions)

    assert ratio >= 0.98, f"Unexpectedly low match ratio on dataset sample: {ratio:.3f}"


def test_comprehensive_edge_cases_from_dataset(singer_mentions: list[tuple[str, str]]) -> None:
    unique_singers = _pick_unique_singers(singer_mentions, limit=300)
    assert len(unique_singers) >= 120

    edge_cases: list[tuple[str, str, bool]] = []

    # 1) Positive direct cases from dataset (40)
    for text, singer in singer_mentions[:40]:
        edge_cases.append((text, singer, True))

    # 2) Positive with optional Hebrew vav-prefix (20)
    vav_candidates = [s for s in unique_singers if not s.startswith("ו")]
    for singer in vav_candidates[:20]:
        edge_cases.append((f"שיר חי עם ו{singer}", singer, True))

    # 3) Negative: prefix expansion should not match (20)
    for singer in unique_singers[20:40]:
        edge_cases.append((f"ל{singer} בהופעה", singer, False))

    # 4) Negative: suffix expansion should not match (20)
    for singer in unique_singers[40:60]:
        edge_cases.append((f"{singer}ל בהופעה", singer, False))

    # 5) Negative: text with no realistic artist mention (20)
    for singer in unique_singers[60:80]:
        edge_cases.append(("zzzzq xxyy 12345 ללא זמר מזוהה", singer, False))

    assert len(edge_cases) >= 120

    failed_cases = [
        (filename, artist, expected)
        for filename, artist, expected in edge_cases
        if check_exact_name(filename, artist) is not expected
    ]

    assert not failed_cases, f"Found {len(failed_cases)} failing edge cases"


def test_no_answer_when_artist_not_present(singer_mentions: list[tuple[str, str]]) -> None:
    unique_singers = _pick_unique_singers(singer_mentions, limit=100)
    text_without_singer = "גרסת בדיקה ללא התאמה כלל 987654321"

    assert all(not check_exact_name(text_without_singer, singer) for singer in unique_singers)


medium

This test file uses several "magic numbers" which make the tests harder to read and maintain. Please extract these numbers into named constants at the module level. This will make the intent of the code clearer and future modifications easier.

Examples of magic numbers to replace:

  • In test_singer_entities_match_in_original_text: 1500 and 0.98.
  • In test_comprehensive_edge_cases_from_dataset: 300, 120, 40, 20, and the slicing indices.
  • In test_no_answer_when_artist_not_present: 100.

For example:

# At module level
SAMPLED_MENTIONS_COUNT = 1500
MIN_MATCH_RATIO = 0.98
UNIQUE_SINGER_LIMIT = 300
# ... etc.

# In test_singer_entities_match_in_original_text
def test_singer_entities_match_in_original_text(singer_mentions: list[tuple[str, str]]) -> None:
    sampled_mentions = singer_mentions[:SAMPLED_MENTIONS_COUNT]
    # ...
    assert ratio >= MIN_MATCH_RATIO, f"..."

