Skip to content

Anomaly tools changes#133

Draft
kiihne-noaa wants to merge 21 commits intomainfrom
anomaly_tools_changes
Draft

Anomaly tools changes#133
kiihne-noaa wants to merge 21 commits intomainfrom
anomaly_tools_changes

Conversation

@kiihne-noaa
Copy link
Copy Markdown
Collaborator

Describe your changes

Work-in-progress changes to epmt_query designed to enable anomaly detection. Focus on allowing saving of models separate from creation, active editing of already created models, and management of models in DB.

Issue ticket number, link (if applicable)

Checklist

A loose guide to provide structure for contributions

  • the code runs
  • the code is readable
  • the code is commented
  • there are no additional failures in GUARDED pipeline tasks
  • a new test was written (if applicable)
  • new instructions/doc was written (if applicable)
  • I ran pylint and attempted to implement some of it's feedback

@kiihne-noaa kiihne-noaa linked an issue Mar 10, 2026 that may be closed by this pull request
@ilaflott ilaflott force-pushed the anomaly_tools_changes branch from 67e059c to 8442a35 Compare March 12, 2026 15:45
@ilaflott ilaflott requested a review from Copilot March 12, 2026 15:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Work-in-progress updates to epmt_query to support anomaly detection workflows by enabling saving reference models separately from creation and improving DB model management.

Changes:

  • Added a new save_refmodel() helper for persisting reference models.
  • Updated create_refmodel() to call save_refmodel() instead of directly creating ORM objects.
  • Added a new test_save_refmodel.py script intended to exercise refmodel creation/saving.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
test_save_refmodel.py Adds a script-like “test” that creates/deletes a refmodel and calls save_refmodel().
src/epmt/epmt_query.py Alters refmodel creation flow and introduces save_refmodel() for DB persistence.

Comment on lines +1353 to +1364
@@ -1357,9 +1361,18 @@ def create_refmodel(jobs, name=None, tag={}, op_tags=[],
return r.id
r_dict = orm_to_dict(r, with_collections=True)
return pd.Series(r_dict) if fmt == 'pandas' else r_dict

'''
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_refmodel() no longer returns the created reference model (the prior return logic is now inside a triple-quoted string). This is a behavior-breaking change: callers will now receive None instead of the created model/id/dict. Make create_refmodel() return the result of save_refmodel(...) (and remove the triple-quoted block rather than leaving it as a stray string literal statement).

Copilot uses AI. Check for mistakes.
Comment on lines +1367 to +1375
def save_refmodel(ReferenceModel, jobs, computed, info_dict, enabled, name=None, tags={}, op_tags=[], fmt='dict'):
r = orm_create(ReferenceModel, jobs=jobs, computed=computed, info_dict = info_dict, enabled=enabled, name=name, tags=tags, op_tags=op_tags, fmt=fmt)
orm_commit()
if fmt == 'orm':
return r
elif fmt == 'terse':
return r.id
r_dict = orm_to_dict(r, with_collections=True)
return pd.Series(r_dict) if fmt == 'pandas' else r_dict
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few API/implementation issues in save_refmodel():\n- It uses mutable default arguments (tags={}, op_tags=[]), which can leak state between calls. Use None defaults and assign inside the function.\n- The first parameter is named ReferenceModel, which reads like the ORM class rather than an argument; it’s clearer as model_cls (or omit it and reference ReferenceModel directly if it’s always the same).\n- Consider decorating with @db_session (like create_refmodel) so save_refmodel() is safe to call directly outside an existing session.\n- orm_create(..., fmt=fmt) is potentially problematic if orm_create doesn’t accept fmt (the prior code handled formatting after orm_create). If orm_create doesn’t support it, remove fmt=fmt and keep formatting in save_refmodel().

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +23

feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes']
random_jobs = eq.get_jobs(limit = 100, fmt = 'dict', trigger_post_process=False)

try:
r = eq.get_refmodels("bronx_test_model")
eq.delete_refmodels(r[0]['id'])
finally:
print("ready for new bronx model")
r = eq.create_refmodel(random_jobs, methods=es.mvod_classifiers(), features = feature_list, name = "bronx_test_model")

eq.save_refmodel(ReferenceModel, r['jobs'], r['computed'], r['info_dict'], r['enabled'], name='test_name', tag={}, op_tags=[])
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is named like a unit test but behaves like a destructive integration script: it performs DB reads/writes, deletes existing models, and has no assertions. This will be flaky and potentially harmful in CI/dev environments.\n\nSuggested direction (pick one):\n- Convert it to a real automated test (e.g., pytest) with assertions and isolated DB/fixtures (and avoid deleting arbitrary existing models), or\n- Move it to a scripts/ or examples/ location and keep it out of the test discovery path.\n\nAlso, the call uses tag={} but save_refmodel() expects tags=... (per its signature).

Suggested change
feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes']
random_jobs = eq.get_jobs(limit = 100, fmt = 'dict', trigger_post_process=False)
try:
r = eq.get_refmodels("bronx_test_model")
eq.delete_refmodels(r[0]['id'])
finally:
print("ready for new bronx model")
r = eq.create_refmodel(random_jobs, methods=es.mvod_classifiers(), features = feature_list, name = "bronx_test_model")
eq.save_refmodel(ReferenceModel, r['jobs'], r['computed'], r['info_dict'], r['enabled'], name='test_name', tag={}, op_tags=[])
import uuid
feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes']
def test_save_refmodel_persists_refmodel():
"""
Integration-style test that creates a reference model, saves it,
verifies it can be retrieved, and then cleans it up.
"""
model_name = "bronx_test_model_" + uuid.uuid4().hex
random_jobs = eq.get_jobs(limit=100, fmt='dict', trigger_post_process=False)
r = eq.create_refmodel(
random_jobs,
methods=es.mvod_classifiers(),
features=feature_list,
name=model_name,
)
try:
eq.save_refmodel(
ReferenceModel,
r['jobs'],
r['computed'],
r['info_dict'],
r['enabled'],
name=model_name,
tags={},
op_tags=[],
)
saved_models = eq.get_refmodels(model_name)
assert saved_models, "Expected at least one saved reference model"
finally:
# Clean up only the models created for this test
try:
models_to_delete = eq.get_refmodels(model_name)
except Exception:
models_to_delete = []
for m in models_to_delete:
if 'id' in m:
eq.delete_refmodels(m['id'])

Copilot uses AI. Check for mistakes.
# epmt_stat contains statistical functions
from epmt import epmt_stat as es
from epmt import epmt_query as eq
from epmt.orm import *
@@ -0,0 +1,23 @@
import numpy as np
@@ -0,0 +1,23 @@
import numpy as np
import epmt
@@ -0,0 +1,23 @@
import numpy as np
import epmt
from epmt import epmt_query
import epmt
from epmt import epmt_query
# epmt_outliers contains the EPMT Outlier Detection API
from epmt import epmt_outliers as eod
from epmt import epmt_stat as es
from epmt import epmt_query as eq
from epmt.orm import *
import pandas
from epmt import epmt_query as eq
from epmt.orm import *
import pandas
from pandas import DataFrame
@@ -0,0 +1,23 @@
import numpy as np
import epmt
@kiihne-noaa kiihne-noaa force-pushed the anomaly_tools_changes branch from 035b6ea to 824b69b Compare April 2, 2026 13:00
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 16.66667% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.51%. Comparing base (9c80daf) to head (d6e0eae).

Files with missing lines Patch % Lines
src/epmt/epmt_query.py 16.66% 15 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
- Coverage   76.67%   76.51%   -0.17%     
==========================================
  Files          34       34              
  Lines        6636     6654      +18     
==========================================
+ Hits         5088     5091       +3     
- Misses       1548     1563      +15     
Flag Coverage Δ
unittests 76.51% <16.66%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants