Conversation
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Add two new tables and ORM models for Topic Modeling versioning system:
1. topic_model_meta table (Migration 35):
- Stores metadata for each trained topic model
- 21 fields including model_id (UUID PK), repo_id (FK), training parameters,
quality metrics (coherence_score, topic_diversity), and visualization data
- Enables model versioning, comparison, and intelligent retraining
2. topic_model_event table (Migration 36):
- Audit log for topic modeling events
- Tracks training lifecycle: started, completed, retrain triggered, etc.
- Provides observability for automated and manual training operations
3. TopicModelMeta ORM model:
- SQLAlchemy model definition for topic_model_meta table
- Relationships and field mappings for application layer
These schema changes support the Topic Modeling feature that enables:
- Automated NMF-based topic extraction from repository messages
- Model version management and comparison
- Intelligent retraining based on data/quality changes
- Storage optimization via REPLACE strategy for automatic runs
Related: #3207
Signed-off-by: Xiaoha <blairjade183@gmail.com>
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
- All JSON/JSONB fields in Augur have NO indexes
- Verified: repo_badging.data (JSONB), chaoss_metric_status.cm_info (JSON), etc.
- payload is used for display, not filtering
- Query performance relies on ix_tme_repo_ts and ix_tme_event indexes
Signed-off-by: Xiaoha <blairjade183@gmail.com>
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
…ockers log stream Signed-off-by: Adrian Edwards <adredwar@redhat.com>
- set training_start_time/end_time/data_collection_date to TIMESTAMPTZ
- update TopicModelMeta ORM to use timezone-aware columns
- align topic_model_event ts column with TIMESTAMPTZ requirement
- satisfies maintainer request for timezone data storage
Signed-off-by: Xiaoha <blairjade183@gmail.com>
- switch Alembic migrations to use sa.TIMESTAMP(timezone=True)
- keeps timezone support while avoiding Postgres-specific type import
Signed-off-by: Xiaoha <blairjade183@gmail.com>
…typos Fix typos in test directory names (fixes #3398)
feat: Add Topic Modeling database schema tables
- add csv_utils.py with intelligent header detection
- refactor add-repos and add-repo-groups commands to use new CSV parser
- support both header and headerless CSV formats
- add automatic column detection for headerless CSVs
- add 10MB file size limit with clear error message
- update sample CSV files to include headers
Fixes #3310
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
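The header detection and size limit described in this commit could work roughly as follows. This is only a minimal sketch using the standard library: the function name `parse_repo_csv`, the column names `repo_url` and `repo_group_id`, and the content-based guessing rules are assumptions for illustration, not the actual contents of csv_utils.py.

```python
import csv
import io

MAX_CSV_BYTES = 10 * 1024 * 1024  # the 10MB limit named in the commit message

# Hypothetical header names; the real parser may recognize different ones.
KNOWN_HEADERS = {"repo_url", "repo_group_id"}


def parse_repo_csv(text):
    """Parse CSV text, with or without a header row, into a list of dicts."""
    if len(text.encode("utf-8")) > MAX_CSV_BYTES:
        raise ValueError("CSV file exceeds the 10MB size limit")

    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return []

    first = [cell.strip().lower() for cell in rows[0]]
    if set(first) & KNOWN_HEADERS:
        # Header row present: map each column by its header name.
        return [dict(zip(first, (c.strip() for c in row))) for row in rows[1:]]

    # Headerless: guess each column from its content (an assumed heuristic).
    # A cell that looks like a URL is the repo URL; an all-digit cell is the
    # repo group ID.
    parsed = []
    for row in rows:
        record = {}
        for cell in (c.strip() for c in row):
            if cell.startswith(("http://", "https://")):
                record["repo_url"] = cell
            elif cell.isdigit():
                record["repo_group_id"] = cell
        parsed.append(record)
    return parsed
```

Because the headerless branch inspects cell content rather than position, it accepts the two columns in either order, which is one plausible reading of "automatic column detection".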
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
…in cli Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
…error handling Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
Migration 36 created the topic_model_event table in the database, but the corresponding SQLAlchemy model was not added to augur_data.py. This caused ORM-level access to the event table to fail.
This commit adds the TopicModelEvent class with:
- All table columns (event_id, ts, repo_id, model_id, event, level, payload)
- Index definitions for ix_tme_repo_ts and ix_tme_event
- Foreign key constraints to repo and topic_model_meta tables
- Relationship mappings to Repo and TopicModelMeta models
This enables the application to query and manipulate topic modeling events through the ORM layer.
Related: augur/application/schema/alembic/versions/36_add_topic_model_event.py
Signed-off-by: Xiaoha <blairjade183@gmail.com>
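To make the table shape above concrete, here is a rough stand-in built with Python's sqlite3 module rather than the PostgreSQL/SQLAlchemy setup the PR actually uses. The column names and index names come from the commit messages; the column types, nullability, the columns each index covers, the helper names, and the omission of foreign keys are all assumptions made for this sketch.

```python
import sqlite3


def create_event_table(conn):
    """Create a SQLite approximation of topic_model_event (not the real DDL)."""
    conn.executescript("""
    CREATE TABLE topic_model_event (
        event_id INTEGER PRIMARY KEY,
        ts       TEXT NOT NULL,  -- TIMESTAMPTZ in the real schema
        repo_id  INTEGER,        -- sa.Integer per the migration
        model_id TEXT,           -- UUID key of topic_model_meta
        event    TEXT NOT NULL,
        level    TEXT,
        payload  TEXT            -- JSON payload, deliberately left unindexed
    );
    -- Assumed index columns; only the index names appear in the PR text.
    CREATE INDEX ix_tme_repo_ts ON topic_model_event (repo_id, ts);
    CREATE INDEX ix_tme_event ON topic_model_event (event);
    """)


def events_for_repo(conn, repo_id):
    """Return (ts, event, level) rows for one repo, oldest first."""
    cur = conn.execute(
        "SELECT ts, event, level FROM topic_model_event "
        "WHERE repo_id = ? ORDER BY ts",
        (repo_id,),
    )
    return cur.fetchall()
```

A lookup like `events_for_repo` filters on repo_id and sorts on ts, which is the kind of query the commit notes say is served by the two indexes instead of an index on payload.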
Ensure repo_id column type matches migration definition (sa.Integer) for complete schema consistency between ORM and database. Signed-off-by: Xiaoha <blairjade183@gmail.com>
Add TopicModelEvent ORM model to augur_data.py
…clients Fix database connection leak when deleting repos via web UI: Fixes #3392 Discussed in maintainers meeting. Solution validated and ready for testing.
Ensure Docker logging facilities can see gunicorn errors. Small change with well-understood effects, so merging.
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: PredictiveManish <manish.tiwari.09@zohomail.in>
… clarity Signed-off-by: Adeeba Nizam <adeebanizam63@gmail.com>
Create a migration to synchronize the topic model tables. This fix was discussed in Maintainers meetings and is being merged.
… config file Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com>
…ed-processing [tasks/github] Batch processing for PR review comments collection
Fix config hierarchy
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
using pytest avoids two layers of python environment
"the gap between task runners like tox and test runners like pytest is narrower now" - Gemini
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: pushpit kamboj <pushpitkamboj@gmail.com>
Signed-off-by: iGufrankhan <gufrankhankab123@gmail.com>
(fix): remove no-else-raise and no-else-return rules from .pylintrc
Signed-off-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com>
Signed-off-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com>
Signed-off-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com>
…tview Remove stale explorer_libyear_detail refresh from matview script
Signed-off-by: Noaman-Akhtar <akhtarnoaman@gmail.com>
Signed-off-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com>
add CI job for running the unit tests
…sql.schema Deleted the augur-retired-sql.schema file
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…iases table Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Fixes for repo url update on move detection
Revert database url retrieval so manual install works
Signed-off-by: Sean P. Goggins <s@goggins.com>
from sqlalchemy import and_, update
import json
import copy
from typing import List, Any, Optional
[pylint] reported by reviewdog 🐶
W0611: Unused List imported from typing (unused-import)
max_rev = max(max_rev, int(m.group(1)))
return str(max_rev + 1)


def process_revision_directives(context, revision, directives):
[pylint] reported by reviewdog 🐶
W0621: Redefining name 'context' from outer scope (line 4) (redefined-outer-name)
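The two revision-numbering lines quoted in this diff suggest that the next migration number is derived by scanning for the highest existing numeric prefix. A minimal sketch of that idea follows; the function name `next_revision_id`, the regular expression, and the assumption that filenames look like `36_add_topic_model_event.py` are illustrative and not taken from the actual env.py.

```python
import re

# Assumed naming scheme: a numeric revision prefix followed by an underscore.
REVISION_PREFIX = re.compile(r"^(\d+)_")


def next_revision_id(filenames):
    """Return the next revision number as a string, one above the current max."""
    max_rev = 0
    for name in filenames:
        m = REVISION_PREFIX.match(name)
        if m:
            max_rev = max(max_rev, int(m.group(1)))
    return str(max_rev + 1)
```

For example, given a directory listing that contains `36_add_topic_model_event.py` as its highest-numbered file, this returns `"37"`; files without a numeric prefix are ignored.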
@@ -1,7 +1,7 @@
import pytest
[pylint] reported by reviewdog 🐶
W0611: Unused import pytest (unused-import)
@@ -5,7 +5,7 @@

from augur.application.db.session import DatabaseSession
from augur.tasks.github.util.github_task_session import GithubTaskSession
[pylint] reported by reviewdog 🐶
W0611: Unused GithubTaskSession imported from augur.tasks.github.util.github_task_session (unused-import)
@@ -1,7 +1,7 @@
import pytest
[pylint] reported by reviewdog 🐶
W0611: Unused import pytest (unused-import)
dummyPersistent.enrich_data_primary_keys(sample_source_data_enriched, tableDict['contributors_table'], gh_merge_fields, augur_merge_fields)

#now test each record to make sure that they have an avatar_url
avatar_url_sql = s.sql.text("""
[pylint] reported by reviewdog 🐶
E0602: Undefined variable 's' (undefined-variable)
assert dummyPersistent.enrich_data_primary_keys(None, "contributors_table", gh_merge_fields, augur_merge_fields) == None


def test_enrich_data_primary_keys_redundant_enrich(database_connection,sample_source_data_enriched, sample_source_data_unenriched):
[pylint] reported by reviewdog 🐶
W0621: Redefining name 'database_connection' from outer scope (line 3) (redefined-outer-name)
assert dummyPersistent.enrich_data_primary_keys(None, "contributors_table", gh_merge_fields, augur_merge_fields) == None


def test_enrich_data_primary_keys_redundant_enrich(database_connection,sample_source_data_enriched, sample_source_data_unenriched):
[pylint] reported by reviewdog 🐶
W0621: Redefining name 'sample_source_data_enriched' from outer scope (line 3) (redefined-outer-name)
assert dummyPersistent.enrich_data_primary_keys(None, "contributors_table", gh_merge_fields, augur_merge_fields) == None


def test_enrich_data_primary_keys_redundant_enrich(database_connection,sample_source_data_enriched, sample_source_data_unenriched):
[pylint] reported by reviewdog 🐶
W0621: Redefining name 'sample_source_data_unenriched' from outer scope (line 3) (redefined-outer-name)
dummyPersistent.enrich_data_primary_keys(sample_source_data_enriched, tableDict['contributors_table'], gh_merge_fields, augur_merge_fields)

#now test each record to make sure that they have an avatar_url
avatar_url_sql = s.sql.text("""
[pylint] reported by reviewdog 🐶
E0602: Undefined variable 's' (undefined-variable)
Description
/ Goggins