Add SingNER dataset-driven pytest coverage and refresh existing tests#27
Conversation
Summary of Changes

Hello @NHLOCAL, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the robustness and maintainability of the singer-name matching logic by modernizing its testing infrastructure. It transitions from a legacy `unittest`-style layout to a native `pytest` layout, adds a committed SingNER dataset sample, and introduces dataset-driven tests.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76b222577d
```python
def main() -> None:
    dataset = load_dataset("NHLOCAL/SingNER", split="train")
```
Pin SingNER dataset revision before sampling
The sampling script currently calls `load_dataset("NHLOCAL/SingNER", split="train")` without a pinned revision, so rerunning it after any upstream dataset update can produce a different 5,000-row file even though the script/docstring describes the sample as deterministic. This undermines reproducibility for anyone regenerating `singner_sample_5000.jsonl` in a later environment; pinning a specific dataset revision (or commit hash) avoids silent drift.
Code Review
This is a great pull request that significantly improves the testing strategy for the singer-name matching logic. Migrating the existing tests to use pytest.mark.parametrize is a solid improvement for readability, and the introduction of dataset-driven tests using a real-world sample from SingNER will greatly increase confidence in the algorithm's correctness. The new download script is also a welcome addition for reproducibility.
I've added a few comments with suggestions for improvement, mainly around enhancing test correctness and maintainability by removing magic numbers and improving code conciseness. Overall, these are excellent changes.
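The `@pytest.mark.parametrize` migration praised above could look roughly like this sketch. The `check_exact_name` body here is a simplified whole-word stand-in (not the project's real matcher), and the case data is invented for illustration:

```python
import re

import pytest


def check_exact_name(text: str, singer: str) -> bool:
    # Simplified stand-in for the project's matcher: whole-word match only,
    # rejecting hits that sit inside a longer word on either side.
    return re.search(rf"(?<!\w){re.escape(singer)}(?!\w)", text) is not None


@pytest.mark.parametrize(
    ("text", "singer", "expected"),
    [
        ("live show with Omer Adam", "Omer Adam", True),
        ("no artist mentioned here", "Omer Adam", False),
    ],
)
def test_check_exact_name_cases(text: str, singer: str, expected: bool) -> None:
    assert check_exact_name(text, singer) is expected
```

Each tuple becomes its own reported test case, which is the readability win the review refers to.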
```python
@pytest.fixture(scope="module")
def singner_rows() -> list[dict]:
    rows = _load_dataset_sample()
    assert len(rows) >= 5000, "Expected at least 5000 rows in sample"
```
The assertion `len(rows) >= 5000` is too lenient. The dataset sample is expected to have exactly 5000 rows, as specified in the download script (`SAMPLE_SIZE = 5000`) and confirmed in `test_dataset_row_count_is_significant`. Using `==` ensures the fixture provides the exact data the tests expect, preventing potential issues if the sample file changes unexpectedly.
```diff
-    assert len(rows) >= 5000, "Expected at least 5000 rows in sample"
+    assert len(rows) == 5000, "Expected exactly 5000 rows in sample"
```
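If the exact-count check is adopted, sharing the size as a named constant keeps the download script and the fixture in agreement. The names below are illustrative assumptions, not identifiers from the PR:

```python
# Hypothetical shared constant; the download script's SAMPLE_SIZE should match.
SAMPLE_SIZE = 5000


def validate_sample(rows: list[dict]) -> None:
    # Fail fast if the committed sample drifts from the expected size.
    if len(rows) != SAMPLE_SIZE:
        raise ValueError(f"Expected exactly {SAMPLE_SIZE} rows, got {len(rows)}")
```

Importing the constant from one place means a future resize only needs a single edit.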
```python
for row in sample:
    output_file.write(
        json.dumps(
            {"text": row["text"], "entities": row["entities"]},
            ensure_ascii=False,
        )
        + "\n"
    )
```
This loop can be written more concisely, and potentially faster, using a generator expression inside `str.join()`. This avoids repeated `write()` calls within the loop and is generally more Pythonic.
```diff
-    for row in sample:
-        output_file.write(
-            json.dumps(
-                {"text": row["text"], "entities": row["entities"]},
-                ensure_ascii=False,
-            )
-            + "\n"
-        )
+    output_file.write(
+        "\n".join(
+            json.dumps(
+                {"text": row["text"], "entities": row["entities"]},
+                ensure_ascii=False,
+            )
+            for row in sample
+        )
+        + "\n"
+    )
```
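Whichever write style is used, the reader side stays the same. A minimal JSON Lines round-trip sketch under the same `{"text": ..., "entities": ...}` row shape (the helper names are illustrative, not from the PR):

```python
import json
from pathlib import Path


def write_jsonl(path: Path, rows: list[dict]) -> None:
    # One JSON object per line, keeping non-ASCII (e.g. Hebrew) text readable.
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    # Skip blank lines so a trailing newline does not yield an empty record.
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

The trailing `"\n"` in either write variant matters: without it, `wc -l` undercounts the sample by one.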
```python
def test_singer_entities_match_in_original_text(singer_mentions: list[tuple[str, str]]) -> None:
    sampled_mentions = singer_mentions[:1500]
    matched = sum(1 for text, singer in sampled_mentions if check_exact_name(text, singer))
    ratio = matched / len(sampled_mentions)

    assert ratio >= 0.98, f"Unexpectedly low match ratio on dataset sample: {ratio:.3f}"


def test_comprehensive_edge_cases_from_dataset(singer_mentions: list[tuple[str, str]]) -> None:
    unique_singers = _pick_unique_singers(singer_mentions, limit=300)
    assert len(unique_singers) >= 120

    edge_cases: list[tuple[str, str, bool]] = []

    # 1) Positive direct cases from dataset (40)
    for text, singer in singer_mentions[:40]:
        edge_cases.append((text, singer, True))

    # 2) Positive with optional Hebrew vav-prefix (20)
    vav_candidates = [s for s in unique_singers if not s.startswith("ו")]
    for singer in vav_candidates[:20]:
        edge_cases.append((f"שיר חי עם ו{singer}", singer, True))

    # 3) Negative: prefix expansion should not match (20)
    for singer in unique_singers[20:40]:
        edge_cases.append((f"ל{singer} בהופעה", singer, False))

    # 4) Negative: suffix expansion should not match (20)
    for singer in unique_singers[40:60]:
        edge_cases.append((f"{singer}ל בהופעה", singer, False))

    # 5) Negative: text with no realistic artist mention (20)
    for singer in unique_singers[60:80]:
        edge_cases.append(("zzzzq xxyy 12345 ללא זמר מזוהה", singer, False))

    assert len(edge_cases) >= 120

    failed_cases = [
        (filename, artist, expected)
        for filename, artist, expected in edge_cases
        if check_exact_name(filename, artist) is not expected
    ]

    assert not failed_cases, f"Found {len(failed_cases)} failing edge cases"


def test_no_answer_when_artist_not_present(singer_mentions: list[tuple[str, str]]) -> None:
    unique_singers = _pick_unique_singers(singer_mentions, limit=100)
    text_without_singer = "גרסת בדיקה ללא התאמה כלל 987654321"

    assert all(not check_exact_name(text_without_singer, singer) for singer in unique_singers)
```
This test file uses several "magic numbers" which make the tests harder to read and maintain. Please extract these numbers into named constants at the module level. This will make the intent of the code clearer and future modifications easier.
Examples of magic numbers to replace:

- In `test_singer_entities_match_in_original_text`: `1500` and `0.98`.
- In `test_comprehensive_edge_cases_from_dataset`: `300`, `120`, `40`, `20`, and the slicing indices.
- In `test_no_answer_when_artist_not_present`: `100`.

For example:

```python
# At module level
SAMPLED_MENTIONS_COUNT = 1500
MIN_MATCH_RATIO = 0.98
UNIQUE_SINGER_LIMIT = 300
# ... etc.


# In test_singer_entities_match_in_original_text
def test_singer_entities_match_in_original_text(singer_mentions: list[tuple[str, str]]) -> None:
    sampled_mentions = singer_mentions[:SAMPLED_MENTIONS_COUNT]
    # ...
    assert ratio >= MIN_MATCH_RATIO, f"..."
```
Motivation

- Transition the legacy `unittest`-style checks to a native `pytest` layout for easier parametrization and integration.

Description

- Converted the legacy `unittest` test in `src/tests/test_check_name_similarity.py` to a parameterized `pytest` test using `@pytest.mark.parametrize` for clearer, pytest-native cases.
- Committed a deterministic 5,000-row sample of `NHLOCAL/SingNER` at `src/tests/data/singner_sample_5000.jsonl`, along with a helper script `src/tests/data/download_singner_sample.py` to (re)download the same sample from the Hugging Face dataset.
- Added `src/tests/test_singner_dataset_matching.py`, which loads the sample, extracts `SINGER` entities, and runs comprehensive validations (direct matches, vav-prefix variants, prefix/suffix negatives, no-answer checks), assembling 120+ edge cases derived from the sample.

Testing

- Verified the sample size with `wc -l src/tests/data/singner_sample_5000.jsonl`.
- Regenerated the sample via `load_dataset('NHLOCAL/SingNER')` and the included `download_singner_sample.py` script.
- Ran `pytest -q` and observed all tests passing (17 passed).
- Installed `datasets` as part of validation and exercised the dataset loading flow used to generate the committed sample.

Codex Task