Bugfix/backend fixes and test suite stabilization #107
Conversation
… done so far and provide feedback for Jules to continue.
This commit introduces the 'scientific' branch, a significantly streamlined version of the application tailored for simpler deployments and focused on core functionalities.
Key changes include:
1. **Branch Creation:**
* Created the `scientific` branch from the main development line.
2. **Codebase Slimming & Refactoring:**
* **Removed Enterprise & Docker Configurations:** Deleted the `deployment/` directory and root Docker Compose files.
* **Simplified Backend Data Storage:**
* Refactored `server/python_backend/database.py` to use JSON files (`emails.json`, `categories.json`, `users.json`) instead of PostgreSQL.
* Removed PostgreSQL dependencies (`psycopg2-binary`, `asyncpg`) from Python requirements.
* Removed Node.js PostgreSQL dependencies (`pg`, `drizzle-orm`, `connect-pg-simple`, `drizzle-kit`) and related files (`server/db.ts`, `shared/schema.ts`).
* Simplified `smart_filters.db` (SQLite) schema by removing the unused `google_scripts` table.
* **Simplified Frontend (UI):**
* Removed `StatsCards`, `RecentActivity`, and `CategoryOverview` components from the dashboard.
* Simplified the AI Control Panel and header elements on the dashboard.
* Integrated `AIAnalysisPanel` to display when an email is selected.
* Removed `recharts` (charting library) from client dependencies.
* **Streamlined Python Backend & NLTK Pipeline:**
* Removed `dashboard_routes.py`, `gradio_app.py`, performance monitoring (`performance_monitor.py`, `metrics.py`), action item extraction features (`action_routes.py`, `action_item_extractor.py`), and AI training (`ai_training.py`).
* Removed unused NLP utilities (`data_strategy.py`, `retrieval_monitor.py`).
* Updated `NLPEngine` and `AdvancedAIEngine` to remove dependencies on deleted modules.
* Removed associated test files for many of these components.
3. **Styling Updates:**
* Adjusted global CSS (`client/src/index.css`) for a more compact appearance (reduced corner radius, smaller base font size) inspired by functional UIs.
4. **Environment & Setup Simplification:**
* Removed `gradio`, `pyngrok` from Python requirements.
* Significantly simplified `launch.py` by removing Gradio UI, ngrok/share, PyTorch/CUDA specifics, and extension/model management features.
* Created a new `README.md` tailored for the `scientific` branch, detailing the simplified setup process.
This branch is intended for users who need the core email analysis and smart filtering capabilities with a minimal setup footprint; it suits local development, research, or scientific use cases.
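The JSON-file storage described under "Simplified Backend Data Storage" can be sketched as follows (the `JsonStore` class and its method names are illustrative, not the actual `DatabaseManager` API):

```python
import json
from pathlib import Path
from typing import Any, Dict, List


class JsonStore:
    """Minimal JSON-file-backed store; names are illustrative."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # e.g. "emails" -> backend/data/emails.json
        return self.data_dir / f"{name}.json"

    def load(self, name: str) -> List[Dict[str, Any]]:
        path = self._path(name)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, name: str, records: List[Dict[str, Any]]) -> None:
        path = self._path(name)
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(records, indent=2), encoding="utf-8")
        tmp.replace(path)  # atomic rename so a crash mid-write cannot corrupt the file
```

Writing to a temp file and renaming keeps a crash from leaving a half-written `emails.json` behind.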
…ientific_2' into scientific
…ix-tests Fix/refactor email routes and fix tests
This commit refactors the application to be a pure Python server, removing the Node.js/TypeScript backend and all associated dependencies.

Changes:
- All Python source code from `server/python_backend` and `server/python_nlp` has been consolidated into a new, single `backend` directory.
- The `extensions` directory and database files have also been moved into the `backend` directory.
- All Python import statements and hardcoded file paths have been updated to reflect the new directory structure.
- The FastAPI server has been modified to serve the frontend assets.
- A new `run.py` script has been created at the project root to provide a simple entrypoint for the application.

Known Issues and Next Steps:
- Due to persistent environment errors (`TMP RAM FS is not large enough`), I was unable to build the frontend assets or remove the leftover Node.js files. The application is configured to serve the raw frontend files from the `client` directory as a temporary measure. The next step is to build the frontend and update the server to serve the built assets from the `dist` directory.
- The static file paths in `backend/python_backend/main.py` are likely incorrect and need to be adjusted to be relative to the `main.py` file.
- The `run.py` entrypoint could be improved by moving it into the `backend` directory and adjusting the run command accordingly.
- The application requires downloading large machine learning models, which may cause timeouts in some environments. Running the `download_hf_models.py` script before starting the server is recommended.
Introduces configuration files for linting (.flake8, .pylintrc), ignore rules (.gitignore), and project templates (.continue/ and codebuff.json). Adds project knowledge documentation (knowledge.md) and initial rule, model, and prompt YAMLs for the EmailIntelligence project.
commit 94375f0
Author: MasumRab <8943353+MasumRab@users.noreply.github.com>
Date: Mon Jun 16 17:07:39 2025 +1000

Create diagnosis_message.txt
- Add dependabot-auto-merge.yml workflow that automatically merges Dependabot PRs when tests pass
- Add ci.yml workflow for comprehensive testing on all PRs and pushes
- Include safety checks: test execution, linting, formatting, and merge readiness verification
- Add pytest-cov dependency for coverage reporting
- Add documentation for workflow setup and customization

Co-authored-by: openhands <openhands@all-hands.dev>
CRITICAL FIXES:
- Replace fragile bash JSON parsing with GitHub's native PR status checks
- Consolidate auto-merge steps into single action with comprehensive error handling
- Remove unnecessary matrix strategy from single-version CI
- Add proper error handling for GitHub CLI operations with graceful degradation
- Eliminate workflow duplication by trusting CI results instead of re-running tests

IMPROVEMENTS:
- Use GitHub context variables (mergeable_state, draft) instead of API calls
- Implement wait-for-check action to properly depend on CI completion
- Add `set -e` for proper error propagation in bash scripts
- Fix mypy configuration to show meaningful errors
- Update documentation to reflect architectural improvements

This addresses all fundamental reliability and complexity issues identified in code review.

Co-authored-by: openhands <openhands@all-hands.dev>
- Updated all dependencies to latest versions (64 packages upgraded)
  * FastAPI 0.115.12 → 0.117.1
  * Pydantic 2.11.5 → 2.11.9 (with v2 migration)
  * PyTorch 2.7.1 → 2.8.0
  * Transformers 4.52.4 → 4.56.2
  * And many more core dependencies
- Fixed Pydantic v2 compatibility issues:
  * Migrated @validator to @field_validator
  * Updated Config to ConfigDict
  * Fixed min_items → min_length
  * Resolved syntax errors in models
- Modernized launcher system:
  * Replaced deprecated pkg_resources with importlib.metadata
  * Extended Python support to 3.11-3.12 range
  * Fixed module import paths (server → backend)
  * Improved async database initialization
- Code quality improvements:
  * Removed unused imports using unimport
  * Fixed async/await patterns
  * Enhanced error handling
- Added comprehensive repository documentation:
  * Created .openhands/microagents/repo.md
  * Documented project structure and setup
  * Included development guidelines
- Verified functionality:
  * All tests passing (category routes: 4/4)
  * API server running correctly
  * Launcher system working properly
  * Dependencies properly updated and locked

Co-authored-by: openhands <openhands@all-hands.dev>
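The Pydantic v2 migration steps listed above follow a well-known pattern; a sketch on a hypothetical model (not one of the project's actual models) with the v1 forms noted in comments:

```python
from typing import List

from pydantic import BaseModel, ConfigDict, Field, field_validator


class EmailIn(BaseModel):
    # v1: `class Config: anystr_strip_whitespace = True`  ->  v2: model_config = ConfigDict(...)
    model_config = ConfigDict(str_strip_whitespace=True)

    subject: str
    # v1: Field(..., min_items=...) on list fields  ->  v2: min_length
    labels: List[str] = Field(default_factory=list, min_length=0)

    # v1: @validator("subject")  ->  v2: @field_validator("subject")
    @field_validator("subject")
    @classmethod
    def subject_not_empty(cls, v: str) -> str:
        if not v:
            raise ValueError("subject must not be empty")
        return v
```

The validator runs after whitespace stripping, so an all-whitespace subject is rejected too.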
Combines latest repository updates with the improved GitHub Actions workflows:
- Maintains all critical workflow fixes (native GitHub API usage, error handling)
- Preserves pytest-cov dependency for coverage reporting
- Integrates new backend improvements and test updates

Co-authored-by: openhands <openhands@all-hands.dev>
The `get_emails` endpoint did not previously support searching within a specific category. This change adds the ability to filter emails by both a search term and a category ID simultaneously. A new `search_emails_by_category` method has been added to the `DatabaseManager` to handle the combined query. The `get_emails` route in `email_routes.py` has been updated to use this new method when both `search` and `category_id` are provided. A new test case has been added to verify the new functionality, and existing tests have been refactored for clarity and maintainability.
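The resulting dispatch can be sketched as follows (the `DatabaseManager` method names are taken from the description above; the standalone function signature is illustrative):

```python
from typing import Any, Dict, List, Optional


async def get_emails(db, search: Optional[str] = None,
                     category_id: Optional[int] = None) -> List[Dict[str, Any]]:
    """Route each filter combination to the narrowest DatabaseManager query."""
    if search and category_id is not None:
        # Both filters: the new combined query
        return await db.search_emails_by_category(search, category_id)
    if search:
        return await db.search_emails(search)
    if category_id is not None:  # explicit None check so category_id=0 still filters
        return await db.get_emails_by_category(category_id)
    return await db.get_all_emails()
```

Checking `category_id is not None` rather than truthiness matters: a category ID of 0 must still route to the category query.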
This commit addresses several bugs in the Python backend and improves the reliability of the test suite.

- **ai_engine.py:**
  - Fixed a bug where a database call was made unnecessarily when the AI analysis returned no categories.
  - Added a check to ensure `db.get_all_categories()` is only called when there are categories to match.
- **filter_routes.py:**
  - Added missing `await` keywords to `async` function calls in the `generate_intelligent_filters` and `prune_filters` routes.
  - Fixed a bug in the `create_filter` route where it was not correctly serializing the `actions` object.
  - Corrected the `description` attribute access in the `create_filter` route.
- **gmail_routes.py:**
  - Improved error handling for `GoogleApiHttpError` to prevent crashes when the error response has an unexpected format.
- **smart_retrieval.py:**
  - Fixed a command-line argument parsing error by adding `--strategies` as an alias for `--strategy-names`.
- **Test Suite:**
  - Stabilized the test suite by fixing test isolation issues, correcting mock setups, and updating test payloads to match Pydantic models.
  - All 28 tests in the backend test suite now pass.
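The missing-`await` class of bug fixed in `filter_routes.py` is easy to reproduce in isolation (the function names here are illustrative, not the project's actual routes):

```python
import asyncio


async def generate_intelligent_filters() -> list:
    await asyncio.sleep(0)  # stand-in for real async work
    return ["filter-a", "filter-b"]


async def buggy_route():
    # BUG: missing `await` -- this binds a coroutine object, not the result,
    # so callers receive an unusable coroutine instead of the filter list
    filters = generate_intelligent_filters()
    return filters


async def fixed_route():
    filters = await generate_intelligent_filters()
    return filters
```

The bug is quiet at runtime: Python only emits a "coroutine was never awaited" RuntimeWarning, which is easy to miss without tests asserting on the returned value.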
Walkthrough

This PR transitions the app from a mixed Node/Express + Python stack to a Python-first FastAPI setup. It removes the Node server and TS toolchain, adds CI workflows, introduces SPA static serving, refactors Python imports and data handling, removes action_items and training from NLP paths, and adds new configs/data.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant API as FastAPI email_routes
    participant DB as DatabaseManager
    C->>API: GET /api/emails?search=&category_id=
    alt search and category_id provided
        API->>DB: search_emails_by_category(search, category_id)
    else search only
        API->>DB: search_emails(search)
    else category_id only (not None)
        API->>DB: get_emails_by_category(category_id)
    else none
        API->>DB: get_all_emails()
    end
    DB-->>API: emails[]
    API-->>C: 200 emails[]
```
```mermaid
sequenceDiagram
    autonumber
    participant GH as GitHub
    participant WF as Dependabot Auto-Merge Workflow
    participant CI as CI Workflow
    GH-->>WF: PR event (opened/sync) by dependabot[bot]
    WF->>CI: Wait for check "test" to complete
    alt CI success
        WF->>GH: gh pr review --approve
        WF->>GH: gh pr merge --auto --merge
        alt Auto-merge already enabled
            WF-->>GH: log "already enabled"
        else Enabled now
            WF-->>GH: log "auto-merge enabled"
        end
    else CI failed/timeout
        WF-->>GH: Exit with error
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Sorry @MasumRab, your pull request is larger than the review limit of 150000 diff characters
Actionable comments posted: 22
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (14)
backend/python_nlp/gmail_service.py (4)
120-129: Success path returns 'error' field for non-JSON output

Returning {"success": True, "error": "..."} on success is misleading. Use a warning field.

```diff
-                return {
-                    "success": True,
-                    "output": stdout_decoded,
-                    "error": f"Non-JSON output: {str(e)}",
-                }  # Script success, but output not JSON
+                return {
+                    "success": True,
+                    "output": stdout_decoded,
+                    "warning": f"Non-JSON output: {str(e)}",
+                }
```
338-456: Robustness: guard optional fields in metadata

encryption_info/attachments/label structures may be None; .get() would raise.

```diff
-        analysis_metadata_payload.update(
+        enc_info = gmail_metadata.encryption_info or {}
+        thread_info = gmail_metadata.thread_info or {}
+        attachments = gmail_metadata.attachments or []
+        analysis_metadata_payload.update(
             {
-                "importance_markers": gmail_metadata.importance_markers,
-                "thread_info": gmail_metadata.thread_info,
-                "custom_headers": gmail_metadata.custom_headers,
+                "importance_markers": gmail_metadata.importance_markers,
+                "thread_info": thread_info,
+                "custom_headers": gmail_metadata.custom_headers,
                 "attachments_summary": [
-                    {"filename": att.get("filename"), "size": att.get("size")}
-                    for att in gmail_metadata.attachments
+                    {"filename": (att or {}).get("filename"), "size": (att or {}).get("size")}
+                    for att in attachments
                 ],
             }
         )
@@
-            "isEncrypted": gmail_metadata.encryption_info.get("tls_encrypted", False)
-            or gmail_metadata.encryption_info.get("end_to_end_encrypted", False),
-            "isSigned": gmail_metadata.encryption_info.get("signed", False),
+            "isEncrypted": enc_info.get("tls_encrypted", False) or enc_info.get("end_to_end_encrypted", False),
+            "isSigned": enc_info.get("signed", False),
```
566-634: Potential None dereferences when reading subject/labels

.subject and .label_ids may be None; .lower() and iteration would fail.

```diff
-        if metadata.category == "primary":
+        subject = metadata.subject or ""
+        subject_lower = subject.lower()
+        label_ids = metadata.label_ids or []
+        if metadata.category == "primary":
@@
-            if any(label in ["CATEGORY_PERSONAL"] for label in metadata.label_ids):
+            if any(label in ["CATEGORY_PERSONAL"] for label in label_ids):
@@
-        elif metadata.mailing_list or any(
-            word in metadata.subject.lower() for word in ["newsletter", "promotion", "offer"]
+        elif metadata.mailing_list or any(
+            word in subject_lower for word in ["newsletter", "promotion", "offer"]
         ):
             return "promotions"
@@
-        subject_lower = metadata.subject.lower()
+        subject_lower = (metadata.subject or "").lower()
```
460-565: Broken method: undefined attributes (self.model_trainer, self.prompt_engineer)

Calling train_models_from_gmail_data will raise AttributeError. Either wire required dependencies or deprecate/remove this method.

```diff
-    async def train_models_from_gmail_data(
-        self, training_query: str = "newer_than:30d", max_training_emails: int = 5000
-    ) -> Dict[str, Any]:
-        self.logger.info(
-            f"Starting model training from Gmail data. Query: {training_query}, Max emails: {max_training_emails}"
-        )
-        try:
-            ...
-            return {
-                "success": True,
-                "training_samples_count": len(training_samples),
-                "models_trained": training_results,
-                "training_timestamp": datetime.now().isoformat(),
-            }
-        except Exception as e:
-            self.logger.error(f"Model training failed: {e}", exc_info=True)
-            return {"success": False, "error": str(e), "training_samples_count": 0}
+    async def train_models_from_gmail_data(self, *args, **kwargs) -> Dict[str, Any]:
+        # Temporarily disabled until model_trainer/prompt_engineer are reintroduced
+        self.logger.warning("train_models_from_gmail_data is not available in this refactor.")
+        raise NotImplementedError("Training pipeline is not part of the current service.")
```

backend/python_nlp/tests/analysis_components/test_sentiment_model.py (1)
41-44: Fix the patched module path after the import relocation

Line 8 now imports `SentimentModel` from `backend...`, but the three `patch()` blocks (Line 41 onward) still target `server...`. `unittest.mock.patch` will raise `ModuleNotFoundError`, so these tests will crash before their assertions run. Point the patches at the relocated module path.

```diff
     with patch(
-        "server.python_nlp.analysis_components.sentiment_model.TextBlob",
+        "backend.python_nlp.analysis_components.sentiment_model.TextBlob",
         return_value=mock_textblob_instance,
     ) as mock_textblob_class:
     ...
     with patch(
-        "server.python_nlp.analysis_components.sentiment_model.TextBlob",
+        "backend.python_nlp.analysis_components.sentiment_model.TextBlob",
         side_effect=Exception("TextBlob error"),
     ):
     ...
     with patch(
-        "server.python_nlp.analysis_components.sentiment_model.TextBlob",
+        "backend.python_nlp.analysis_components.sentiment_model.TextBlob",
         return_value=mock_textblob_instance,
     ) as mock_textblob_class:
```

Also applies to: 57-59, 73-75
backend/python_backend/run_server.py (1)
44-52: Multi-process + JSON files = corruption risk

With a file-backed JSON "DB", multiple workers can interleave writes and corrupt data. Run a single worker until a real DB or file locking is introduced.

```diff
-    config = {
+    is_dev = os.getenv("NODE_ENV") == "development"
+    use_json_db = os.getenv("USE_JSON_DB", "1") == "1"  # default: file-backed storage
+    config = {
         "host": host,
         "port": port,
         "log_level": "info",
         "access_log": True,
-        "reload": os.getenv("NODE_ENV") == "development",
-        "workers": 1 if os.getenv("NODE_ENV") == "development" else 4,
+        "reload": is_dev,
+        # Keep single worker when using JSON files to avoid race conditions/corruption
+        "workers": 1 if (use_json_db or is_dev) else 4,
     }
```

backend/python_backend/gmail_routes.py (2)
100-108: Redact Gmail error payloads to avoid logging PII and oversized blobs

full_gmail_error can include message bodies and addresses. Log only minimal fields.

```diff
-        log_data = {
+        log_data = {
             "message": "Gmail API operation failed during sync",
             "endpoint": str(req.url),
             "error_type": type(gmail_err).__name__,
-            "error_detail": error_detail_message,
+            "error_detail": error_detail_message[:512],  # cap length
             "gmail_status_code": getattr(gmail_err.resp, "status", None),
-            "full_gmail_error": error_details_dict,
+            "gmail_error_summary": {
+                "code": (error_content.get("code") if isinstance(error_content, dict) else None),
+                "message_present": bool(error_detail_message),
+            },
         }
```

```diff
-        log_data = {
+        log_data = {
             "message": "Gmail API operation failed during smart retrieval",
             "endpoint": str(req.url),
             "error_type": type(gmail_err).__name__,
-            "error_detail": error_detail_message,
+            "error_detail": error_detail_message[:512],
             "gmail_status_code": getattr(gmail_err.resp, "status", None),
-            "full_gmail_error": error_details_dict,
+            "gmail_error_summary": {
+                "code": (error_content.get("code") if isinstance(error_content, dict) else None),
+                "message_present": bool(error_detail_message),
+            },
         }
```

Also applies to: 162-170
21-27: Remove module-level instantiation; inject GmailAIService via Depends

Module-level `DatabaseManager()` bypasses the async `initialize()` in `get_db` (database.py lines 623-629), leading to uninitialized state and flaky tests. In `backend/python_backend/gmail_routes.py` (21-27, 30-36, 137-146), replace:

```diff
-db_manager_for_gmail_service = DatabaseManager()
-ai_engine_for_gmail_service = AdvancedAIEngine()
-gmail_service = GmailAIService(
-    db_manager=db_manager_for_gmail_service,
-    advanced_ai_engine=ai_engine_for_gmail_service,
-)
```

with per-request wiring, e.g.:

```python
from fastapi import Depends

from .database import get_db


async def get_gmail_service(db=Depends(get_db)):
    return GmailAIService(db_manager=db, advanced_ai_engine=AdvancedAIEngine())


@router.post("/sync")
async def sync_gmail(
    req: Request,
    request_model: GmailSyncRequest,
    background_tasks: BackgroundTasks,
    gmail_service: GmailAIService = Depends(get_gmail_service),
):
    ...
```

This ensures `initialize()` is awaited and each request gets an isolated, fully initialized service.
5-12: Guard psycopg2 import to keep tests/envs without Postgres from crashing

Importing psycopg2 at module import-time will fail in lightweight test runners. Guard it and catch a local alias.

```diff
-import psycopg2
+try:
+    import psycopg2
+    PsycopgError = psycopg2.Error
+except Exception:  # psycopg2 unavailable
+    class PsycopgError(Exception):
+        pass
```

Then, in each handler (the same change applies four times):

```diff
-    except psycopg2.Error as db_err:
+    except PsycopgError as db_err:
```

Also applies to: 50-60, 88-98, 154-164, 200-208
launch.py (2)
334-336: Venv Python check contradicts widened support (forces 3.11.x)

You allow 3.11–3.12 globally but here you treat any venv not exactly 3.11 as "incompatible" and prompt to recreate with 3.11.x. This will incorrectly flag perfectly fine 3.12 venvs and lead to avoidable churn.
Apply this diff to accept any interpreter within [PYTHON_MIN_VERSION, PYTHON_MAX_VERSION] and align prompts:
```diff
-    target_major, target_minor = PYTHON_MIN_VERSION
-    if not (venv_major == target_major and venv_minor == target_minor):
+    min_major, min_minor = PYTHON_MIN_VERSION
+    max_major, max_minor = PYTHON_MAX_VERSION
+    if (venv_major, venv_minor) < (min_major, min_minor) or (venv_major, venv_minor) > (max_major, max_minor):
         logger.warning(
-            f"WARNING: The existing virtual environment at './{VENV_DIR}' was created with Python {venv_major}.{venv_minor}. "
-            f"This project requires Python {target_major}.{target_minor}."
+            f"WARNING: The existing virtual environment at './{VENV_DIR}' was created with Python {venv_major}.{venv_minor}. "
+            f"This project supports Python {min_major}.{min_minor}–{max_major}.{max_minor}."
         )
         ...
-            "Do you want to delete and recreate the virtual environment with "
-            f"Python {target_major}.{target_minor}? (yes/no): "
+            "Do you want to delete and recreate the virtual environment with a supported Python version "
+            f"({min_major}.{min_minor}–{max_major}.{max_minor})? (yes/no): "
         )
```
And fix the earlier corrupted-venv prompt:

```diff
-    f"It might be corrupted. Do you want to delete and recreate it with Python 3.11.x? (yes/no): "
+    f"It might be corrupted. Do you want to delete and recreate it with a supported Python ({PYTHON_MIN_VERSION[0]}.{PYTHON_MIN_VERSION[1]}–{PYTHON_MAX_VERSION[0]}.{PYTHON_MAX_VERSION[1]})? (yes/no): "
```

Also applies to: 393-449
745-748: Respect --api-url when starting the frontend

VITE_API_URL always points to host:port even if --api-url is provided.
Apply this diff:
```diff
-    env = os.environ.copy()
-    env["VITE_API_URL"] = f"http://{args.host}:{args.port}"  # Backend URL for Vite
+    env = os.environ.copy()
+    env["VITE_API_URL"] = args.api_url or f"http://{args.host}:{args.port}"  # Backend URL for Vite
```

backend/python_backend/ai_engine.py (2)
215-220: Map status to "healthy" (not "ok") to satisfy ServiceHealth model

ServiceHealth.status only allows healthy|degraded|unhealthy; returning ok will fail validation downstream.
Apply this diff:
```diff
-        status = "ok"
+        status = "healthy"
         if not all_models_loaded:
             status = "degraded"
         if not nltk_available or not sklearn_available:
             status = "degraded"  # Or "unhealthy" depending on severity
```
91-101: Harden category matching against non-string entries

Guard against None or non-str items from NLPEngine to avoid AttributeError on lower().
Apply this diff:
```diff
-        for ai_cat_str in ai_categories:
-            for db_cat in all_db_categories:
-                name_lower = db_cat["name"].lower()
-                ai_cat_lower = ai_cat_str.lower()
+        for ai_cat_str in ai_categories:
+            if not isinstance(ai_cat_str, str) or not ai_cat_str:
+                continue
+            ai_cat_lower = ai_cat_str.lower()
+            for db_cat in all_db_categories:
+                name = db_cat.get("name")
+                if not isinstance(name, str):
+                    continue
+                name_lower = name.lower()
                 if name_lower in ai_cat_lower or ai_cat_lower in name_lower:
                     log_msg = (
                         f"Matched AI category '{ai_cat_str}' to DB "
-                        f"category '{db_cat['name']}' (ID: {db_cat['id']})"
+                        f"category '{name}' (ID: {db_cat.get('id')})"
                     )
                     logger.info(log_msg)
-                    return db_cat["id"]
+                    return db_cat.get("id")
```

backend/python_backend/models.py (1)
81-98: Fix EmailResponse parsing from DB (snake_case → camelCase)

Current model cannot parse DB records (message_id, category_id, etc.), causing ValidationError in email_routes.create_email. Map validation aliases to snake_case keys.
Apply this diff:
```diff
 class EmailResponse(EmailBase):
     id: int
-    messageId: Optional[str]
-    threadId: Optional[str]
+    messageId: Optional[str] = Field(validation_alias="message_id")
+    threadId: Optional[str] = Field(validation_alias="thread_id")
     preview: str
     category: Optional[str]
-    categoryId: Optional[int]
+    categoryId: Optional[int] = Field(validation_alias="category_id")
     labels: List[str]
     confidence: int = Field(ge=0, le=100)
-    isImportant: bool
-    isStarred: bool
-    isUnread: bool
-    hasAttachments: bool
-    attachmentCount: int
-    sizeEstimate: int
-    aiAnalysis: Dict[str, Any] = Field(default_factory=dict)
+    isImportant: bool = Field(validation_alias="is_important")
+    isStarred: bool = Field(validation_alias="is_starred")
+    isUnread: bool = Field(validation_alias="is_unread")
+    hasAttachments: bool = Field(validation_alias="has_attachments")
+    attachmentCount: int = Field(validation_alias="attachment_count")
+    sizeEstimate: int = Field(validation_alias="size_estimate")
+    aiAnalysis: Dict[str, Any] = Field(default_factory=dict, validation_alias="analysis_metadata")
     filterResults: Dict[str, Any] = Field(default_factory=dict)
```
🧹 Nitpick comments (42)
client/src/index.css (2)
25-25: Token change: check component rounding consistency

Changing --radius to 0.375rem subtly alters all components using this token. Verify buttons, inputs, and menus still match the design system and Tailwind rounded-* utilities if mapped to this var.
64-68: Global 14px body font can hurt readability/accessibility

A 14px base is small; prefer 16px (1rem) or a responsive clamp. Example:

```diff
-  font-size: 14px;
+  font-size: 1rem; /* or: clamp(0.9375rem, 0.9vw + 0.6rem, 1rem) */
```

pyproject.toml (3)
10-10: psycopg2-binary in prod: confirm suitability

psycopg2-binary is convenient but often discouraged for long-lived production. Consider plain psycopg2 or document why binary is acceptable for your deployment.
14-15: Align and harden uvicorn dependency
- Avoid duplicating uvicorn in both runtime and dev groups; keep one source of truth.
- Consider uvicorn[standard] for production and align to >=0.35.0 if compatible with Python 3.11.
```diff
-    "uvicorn>=0.34.3",
+    "uvicorn[standard]>=0.35.0",
```

Based on learnings
Also applies to: 41-42
6-15: Prefer pinning/constraints for reproducible builds

Wide >= ranges can cause unexpected CI drift. Add a constraints/lock (e.g., requirements.lock/uv pip compile) or pin critical infra deps (fastapi, uvicorn, httpx).
backend/python_nlp/gmail_service.py (3)
53-61: DB manager constructed but never initialized

DatabaseManager often needs initialize() before use. Consider an explicit async initializer for the service.

```diff
 class GmailAIService:
@@
-        self.db_manager = db_manager
+        self.db_manager = db_manager
@@
-            self.db_manager = DatabaseManager()
+            self.db_manager = DatabaseManager()
+
+    async def initialize(self) -> None:
+        # Call this after constructing the service
+        try:
+            if hasattr(self.advanced_ai_engine, "initialize"):
+                self.advanced_ai_engine.initialize()  # sync per engine summary
+            if hasattr(self.db_manager, "initialize"):
+                await self.db_manager.initialize()
+        except Exception:
+            self.logger.exception("Service initialization failed")
```
85-95: Consider subprocess timeout to avoid hangs

Wrap communicate() with asyncio.wait_for and surface timeout errors.

```diff
-        stdout, stderr = await process.communicate()
+        try:
+            stdout, stderr = await asyncio.wait_for(process.communicate(), timeout=300)
+        except asyncio.TimeoutError:
+            process.kill()
+            await process.communicate()
+            return {"success": False, "error": "Command timed out", "return_code": None}
```
401-404: Optional: extract clean sender email

senderEmail currently mirrors the raw From header; consider parsing the address.

```diff
-            "senderEmail": gmail_metadata.from_address,
+            "senderEmail": (gmail_metadata.from_address.split("<")[-1].rstrip(">") if "<" in gmail_metadata.from_address else gmail_metadata.from_address),
```

.pylintrc (1)
1-20: Reasonable baseline; keep an eye on disabled checks

Good starting point. Consider re-enabling R0913 (too-many-arguments) later to curb API bloat as the FastAPI surface grows.
.continue/models/new-model.yaml (1)
5-11: Tooling config: keep it out of runtime packaging

Ensure this Continue config isn't included in production builds/containers and secrets are injected only via CI. Confirm anthropic client is not an app dependency.
backend/python_nlp/ai_training.py (1)
6-20: Prefer `default_factory` over manual post-init dict

We can drop the custom `__post_init__` and let the dataclass build a fresh dict for each instance with `field(default_factory=dict)`, which is the idiomatic pattern and removes the branch entirely.

```diff
-from dataclasses import dataclass
+from dataclasses import dataclass, field
@@
-    parameters: Dict[str, Any] = None
+    parameters: Dict[str, Any] = field(default_factory=dict)
@@
-
-    def __post_init__(self):
-        if self.parameters is None:
-            self.parameters = {}
```

run.py (2)
7-7: Avoid sys.path mutation or at least prepend safely

Appending can cause shadowing/duplication. Prefer insert(0) with a guard, or remove entirely by relying on proper packaging.

```diff
-# Add the current directory to the path to ensure modules can be found
-sys.path.append(str(Path(__file__).parent))
+# Add this file's directory to sys.path (prepend) only if missing
+p = str(Path(__file__).resolve().parent)
+if p not in sys.path:
+    sys.path.insert(0, p)
```
11-12: Avoid drift with run_server.py

run.py omits startup initialization used in backend/python_backend/run_server.py (database init, logging). Consider deleting run.py or delegating to run_server.py for consistency.
backend/data/categories.json (1)
1-37: Static seed looks good; consider adding stable slugs

Names work, but downstream matching currently relies on substring comparisons. Adding a stable slug per category can prevent ambiguous matches and ease i18n.
backend/python_backend/tests/test_ai_engine.py (1)
21-40: Tighten patching and remove redundant rebinding

Use autospec to keep the method signature honest and avoid reassigning the instance attribute; the class patch already covers the instance.

```diff
-    with patch.object(NLPEngine, "analyze_email") as mock_nlp_analyze:
+    with patch.object(NLPEngine, "analyze_email", autospec=True) as mock_nlp_analyze:
         # Configure the mock for NLPEngine().analyze_email
         mock_nlp_analyze.return_value = {
@@
         }
         engine = AdvancedAIEngine()
-        # Store the mock for assertions if needed directly on nlp_engine's mock
-        engine.nlp_engine.analyze_email = mock_nlp_analyze
         yield engine
```

backend/python_backend/run_server.py (3)
41-43: Allow HOST override

Minor: make host configurable via HOST env for container friendliness.

```diff
-    host = "0.0.0.0"
+    host = os.getenv("HOST", "0.0.0.0")
```
23-37: Confirm db instance wiring

Startup creates a DatabaseManager but doesn't store it on app.state. If routes rely on a different/global instance, this is fine; otherwise wire it: app.state.db = db.
59-59: Production extras for Uvicorn

When deploying, prefer installing uvicorn[standard] for better performance (uvloop, httptools). Based on learnings.
backend/python_backend/tests/test_category_routes.py (2)
29-35: Use an async override for get_db to match the dependency's async signature

Prevents subtle sync/async mismatches and mirrors the real dependency behavior.

```diff
-    app.dependency_overrides[get_db] = lambda: mock_db_manager_cat
+    async def _override_db():
+        return mock_db_manager_cat
+    app.dependency_overrides[get_db] = _override_db
```
17-19: Remove unused mock_performance_monitor_cat_instance

It's never referenced.

```diff
-# Mock PerformanceMonitor
-mock_performance_monitor_cat_instance = MagicMock()
```
127-136: Make analysisMetadata extraction resilient to different AI result shapes

Avoid attribute errors if analyze_email returns a dict/Pydantic model.

```diff
-    email_data.update(
-        {
-            "confidence": int(ai_analysis.confidence * 100),
-            "categoryId": ai_analysis.category_id,
-            "labels": ai_analysis.suggested_labels,
-            "analysisMetadata": ai_analysis.to_dict(),  # Assuming AIAnalysisResult has to_dict, or use model_dump if Pydantic
-        }
-    )
+    if hasattr(ai_analysis, "to_dict"):
+        analysis_metadata = ai_analysis.to_dict()
+    elif hasattr(ai_analysis, "model_dump"):
+        analysis_metadata = ai_analysis.model_dump()
+    else:
+        analysis_metadata = ai_analysis if isinstance(ai_analysis, dict) else {}
+    email_data.update(
+        {
+            "confidence": int(getattr(ai_analysis, "confidence", analysis_metadata.get("confidence", 0.5)) * 100),
+            "categoryId": getattr(ai_analysis, "category_id", analysis_metadata.get("category_id")),
+            "labels": getattr(ai_analysis, "suggested_labels", analysis_metadata.get("suggested_labels", [])),
+            "analysisMetadata": analysis_metadata,
+        }
+    )
```
48-57: Fix CORS for wildcard subdomains. `allow_origins` doesn't support patterns; use `allow_origin_regex` for `*.replit.dev`.

```diff
 app.add_middleware(
     CORSMiddleware,
     allow_origins=[
         "http://localhost:5000",
         "http://localhost:5173",
-        "https://*.replit.dev",
     ],
+    allow_origin_regex=r"^https://.*\.replit\.dev$",
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
```
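A quick sanity check on the proposed pattern, as a standalone sketch (Starlette matches the compiled `allow_origin_regex` against the `Origin` header, so the `^...$` anchors matter):

```python
import re

# The regex proposed above for allow_origin_regex.
pattern = re.compile(r"^https://.*\.replit\.dev$")

# Subdomains over HTTPS are accepted.
assert pattern.match("https://my-app.replit.dev")

# Wrong scheme, missing subdomain, or a lookalike suffix are rejected.
assert not pattern.match("http://my-app.replit.dev")
assert not pattern.match("https://replit.dev")
assert not pattern.match("https://foo.replit.dev.evil.com")
```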
89-94: Silence linter: unused full_path. Rename the parameter since it's not used.

```diff
 @app.get("/{full_path:path}")
-async def catch_all(full_path: str):
+async def catch_all(_: str):
```
135-136: Prefer uvicorn.run(app, …) or a correct dotted path. "main:app" may fail when run outside the module root; running the object avoids import path issues. Note that `reload=True` requires an import string, so it is dropped when passing the app object.

```diff
-    uvicorn.run("main:app", host="0.0.0.0", port=port, reload=True, log_level="info")
+    uvicorn.run(app, host="0.0.0.0", port=port, log_level="info")
```

backend/python_nlp/nlp_engine.py (2)
21-21: Use a relative import for package resilience. Prevents failures when the project isn't installed as the "backend" package.

```diff
-from backend.python_nlp.text_utils import clean_text
+from .text_utils import clean_text
```
719-721: Mark unused parameter to satisfy linters. Keep the signature but underscore the arg.

```diff
-    def _analyze_action_items(self, text: str) -> List[Dict[str, Any]]:
+    def _analyze_action_items(self, _: str) -> List[Dict[str, Any]]:
```

launch.py (5)
1182-1187: Remove the stale --gradio-ui argument (feature removed; help text is misleading). The flag is kept but does nothing, and its help duplicates --api-only. Remove it to avoid confusion.

Apply this diff:

```diff
-    parser.add_argument(
-        "--gradio-ui",
-        action="store_true",
-        help="Run only the API server without the frontend",  # Description kept, but --gradio-ui removed
-    )
-    # Gradio UI argument removed
+    # --gradio-ui removed
```
694-706: Bail out early if npm is missing to avoid noisy failures. You log the absence of npm but proceed to run npm commands that will fail later.

Apply this diff:

```diff
     if npm_executable_path is None:
         logger.error(
             f"The 'npm' command was not found in your system's PATH. "
             f"Please ensure Node.js and npm are correctly installed and that the npm installation directory is added to your PATH environment variable. "
             f"Attempted to find 'npm' for the client in: {client_dir}"
         )
-        # Potentially return None here if npm is essential and not found,
-        # or let it proceed to fail at the npm install line, which will now be more informed.
-        # For now, let's log and let it try, as the original code attempts to continue.
-        # If we want to stop it here, uncomment the next line:
-        # return None
+        return None
     else:
         logger.info(f"Found 'npm' executable at: {npm_executable_path}")
```

Also applies to: 709-729, 755-759
19-20: Update the usage doc to match supported stages. The docstring advertises {dev,test,staging,prod} but argparse only allows ["dev","test"].

Apply this diff:

```diff
-    --stage {dev,test,staging,prod}  Specify the application stage to run
+    --stage {dev,test}               Specify the application stage to run
```
1269-1279: Align interpreter-discovery comments/logs with the supported range. The comment still says "Ensure 3.11.x"; the log now reflects 3.11-3.12. Make the intent consistent.

Apply this diff:

```diff
-    # Goal: Ensure launch.py runs with Python 3.11.x
+    # Goal: Ensure launch.py runs with a supported Python in [PYTHON_MIN_VERSION, PYTHON_MAX_VERSION]
```
690-733: Optional: Skip npm install when package.json is present but the lockfile is unchanged. Consider a fast path: run `npm ci` when a lockfile exists, or skip the install entirely if the node_modules cache is valid. This speeds up local/dev runs.
17-20: Make metrics thread-safe and accumulate values; remove unused start_times. The current dict overwrites on repeated measurements and is not concurrency-safe under ASGI. Use a lock and store lists of samples; drop the unused start_times.

Apply this diff:

```diff
-from typing import Dict, Any
+from typing import Dict, Any, List
+from threading import RLock
+from copy import deepcopy
@@
 class PerformanceMonitor:
     """Monitor and log performance metrics for the application."""

     def __init__(self):
-        self.metrics: Dict[str, Any] = {}
-        self.start_times: Dict[str, float] = {}
+        self.metrics: Dict[str, List[Any]] = {}
+        self._lock = RLock()
@@
     def record_metric(self, name: str, value: Any):
         """Record a performance metric."""
-        self.metrics[name] = value
-        logger.debug(f"Performance metric recorded: {name} = {value}")
+        with self._lock:
+            self.metrics.setdefault(name, []).append(value)
+        logger.debug(f"Performance metric recorded: {name} += {value}")
@@
     def get_metrics(self) -> Dict[str, Any]:
         """Get all recorded metrics."""
-        return self.metrics.copy()
+        with self._lock:
+            return deepcopy(self.metrics)
@@
     def clear_metrics(self):
         """Clear all recorded metrics."""
-        self.metrics.clear()
-        self.start_times.clear()
+        with self._lock:
+            self.metrics.clear()
```

Also applies to: 21-35, 36-43
21-30: Optional: Expose summary helpers (count/avg/p95). If these metrics feed endpoints, consider computed summaries to avoid large arrays in responses.
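A minimal sketch of such a summary helper, assuming metrics are stored as lists of numeric samples as in the diff above (the nearest-rank p95 definition is one choice among several):

```python
from statistics import mean

def summarize(samples):
    """Reduce a list of numeric samples to count/avg/p95."""
    if not samples:
        return {"count": 0, "avg": None, "p95": None}
    ordered = sorted(samples)
    # Nearest-rank percentile: the ceil(0.95 * n)-th smallest sample.
    idx = -(-len(ordered) * 95 // 100) - 1  # ceiling division, then 0-based
    return {"count": len(ordered), "avg": mean(ordered), "p95": ordered[idx]}

summary = summarize([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
assert summary == {"count": 10, "avg": 55, "p95": 100}
```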
backend/python_backend/tests/test_gmail_routes.py (1)
51-80: Assert parameter mapping for sync_gmail to catch regressions. Validate that camelCase request fields map to service kwargs.

Apply this diff:

```diff
     response = client_gmail.post("/api/gmail/sync", json=request_payload)
@@
     mock_gmail_service_instance.sync_gmail_emails.assert_called_once()
+    # Verify arg mapping
+    args, kwargs = mock_gmail_service_instance.sync_gmail_emails.call_args
+    assert kwargs.get("query_filter") == "test-query"
+    assert kwargs.get("max_emails") == 100
```

backend/python_backend/ai_engine.py (3)
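For illustration, the `call_args` unpacking used in the suggested assertions works like this on a plain `MagicMock` (names here mirror the test, not the real service):

```python
from unittest.mock import MagicMock

service = MagicMock()
# Simulate the route calling the service with keyword arguments.
service.sync_gmail_emails(query_filter="test-query", max_emails=100)

# call_args unpacks to (positional args, keyword args) of the last call.
args, kwargs = service.sync_gmail_emails.call_args
assert args == ()
assert kwargs == {"query_filter": "test-query", "max_emails": 100}
```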
123-125: Normalize AI categories before matching. Pre-filter to non-empty strings to reduce noise and exceptions.

Apply this diff:

```diff
-        ai_categories = analysis_data.get("categories")
-        if db and ai_categories:
+        ai_categories = [
+            c for c in analysis_data.get("categories", [])
+            if isinstance(c, str) and c.strip()
+        ]
+        if db and ai_categories:
```
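The effect of the pre-filter, on a contrived payload (the junk values are hypothetical examples of what the guard rejects):

```python
# Hypothetical payload with the kinds of junk values the filter guards against.
analysis_data = {"categories": ["Work", "", "   ", None, 42, "Personal"]}

ai_categories = [
    c for c in analysis_data.get("categories", [])
    if isinstance(c, str) and c.strip()
]
assert ai_categories == ["Work", "Personal"]
```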
107-107: Use logger.exception for caught exceptions. Improves traceback visibility; aligns with TRY400.

Based on static analysis hints

Apply this diff:

```diff
-        logger.error(f"Error during category matching: {e}", exc_info=True)
+        logger.exception(f"Error during category matching: {e}")
@@
-        logger.error(f"An unexpected error occurred during AI analysis: {e}", exc_info=True)
+        logger.exception(f"An unexpected error occurred during AI analysis: {e}")
@@
-        logger.error(f"AI health check failed during direct inspection: {e}", exc_info=True)
+        logger.exception(f"AI health check failed during direct inspection: {e}")
@@
         except OSError as e:
-            err_msg = f"Error removing temp file {temp_file} " f"during cleanup: {e}"
-            logger.error(err_msg)
+            logger.exception(f"Error removing temp file during cleanup: {temp_file}")
@@
         except Exception as e:
-            logger.error(f"AI Engine cleanup failed: {e}")
+            logger.exception("AI Engine cleanup failed")
@@
-        logger.error(f"Error generating fallback analysis itself: {e}", exc_info=True)
+        logger.exception(f"Error generating fallback analysis itself: {e}")
```

Also applies to: 140-140, 229-229, 254-254, 259-259, 305-305
278-303: Private API usage for fallback. Calling `NLPEngine._get_simple_fallback_analysis` uses a private method; low risk, but brittle to internal changes. Consider exposing a public `simple_fallback(...)` API in NLPEngine.
backend/python_backend/models.py (1)
10-10: Type-hint the EmailCreate preview validator for clarity. Minor hygiene; also import `ValidationInfo` (the `FieldValidationInfo` alias is deprecated in recent Pydantic v2).

Apply this diff:

```diff
-from pydantic import BaseModel, Field, field_validator, ConfigDict
+from pydantic import BaseModel, Field, field_validator, ConfigDict, ValidationInfo
@@
-    def set_preview(cls, v, info):
+    def set_preview(cls, v: Optional[str], info: ValidationInfo) -> Optional[str]:
         if not v and info.data and "content" in info.data:
             content = info.data["content"]
             return (
                 content[:200] + "..." if len(content) > 200 else content
             )
         return v
```

Also applies to: 57-67
backend/python_backend/database.py (4)
84-85: Specify UTF-8 encoding for JSON I/O. Prevents locale-dependent behavior.

Apply this diff:

```diff
-        with open(file_path, 'r') as f:
+        with open(file_path, 'r', encoding='utf-8') as f:
             data = await asyncio.to_thread(json.load, f)
@@
-        with open(file_path, 'w') as f:
+        with open(file_path, 'w', encoding='utf-8') as f:
             await asyncio.to_thread(json.dump, data_to_save, f, indent=4)
```

Also applies to: 116-117
193-195: Replace the unnecessary dict comprehension. Minor cleanup flagged by Pylint R1721.

Based on static analysis hints

Apply this diff:

```diff
-        update_payload = {k: v for k, v in email_data.items()}
+        update_payload = dict(email_data)
```
331-347: Sorting on ISO strings is okay; add a defensive parse if mixed formats appear. If you see heterogeneous time formats, consider parsing to datetime for consistent ordering; keep the current fallback for performance.
Also applies to: 453-467, 500-514
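A defensive-parse sketch along those lines (the format list and sentinel are illustrative; the real code may prefer `datetime.fromisoformat`):

```python
from datetime import datetime, timezone

FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S")

def sort_key(raw):
    """Parse a timestamp string for ordering; unparseable values sort first."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
            # Normalize naive values to UTC so aware/naive rows stay comparable.
            return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return datetime.min.replace(tzinfo=timezone.utc)  # sentinel for bad rows

rows = ["2024-01-02T00:00:00+00:00", "not-a-date", "2023-12-31 23:59:59"]
assert sorted(rows, key=sort_key) == [
    "not-a-date", "2023-12-31 23:59:59", "2024-01-02T00:00:00+00:00"
]
```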
623-629: Singleton init: ensure idempotence under concurrent startup. Low risk, but two coroutines could hit `_db_manager_instance is None` before assignment. Consider an asyncio.Lock if startup races are observed.
⛔ Files ignored due to path filters (9)
- `backend/email_cache.db` is excluded by `!**/*.db`
- `backend/python_nlp/intent_model.pkl` is excluded by `!**/*.pkl`
- `backend/python_nlp/sentiment_model.pkl` is excluded by `!**/*.pkl`
- `backend/python_nlp/sync_checkpoints.db` is excluded by `!**/*.db`
- `backend/python_nlp/topic_model.pkl` is excluded by `!**/*.pkl`
- `backend/python_nlp/urgency_model.pkl` is excluded by `!**/*.pkl`
- `backend/smart_filters.db` is excluded by `!**/*.db`
- `package-lock.json` is excluded by `!**/package-lock.json`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (77)
`.continue/models/new-model.yaml` (1 hunks), `.continue/prompts/new-prompt.yaml` (1 hunks), `.continue/rules/new-rule.yaml` (1 hunks), `.flake8` (1 hunks), `.github/workflows/README.md` (1 hunks), `.github/workflows/ci.yml` (1 hunks), `.github/workflows/dependabot-auto-merge.yml` (1 hunks), `.gitignore` (1 hunks), `.openhands/microagents/repo.md` (1 hunks), `.pylintrc` (1 hunks), `README.md` (1 hunks), `backend/__init__.py` (1 hunks), `backend/data/categories.json` (1 hunks), `backend/data/emails.json` (1 hunks), `backend/data/users.json` (1 hunks), `backend/extensions/example/example.py` (3 hunks), `backend/python_backend/__init__.py` (1 hunks), `backend/python_backend/ai_engine.py` (7 hunks), `backend/python_backend/database.py` (13 hunks), `backend/python_backend/email_routes.py` (2 hunks), `backend/python_backend/filter_routes.py` (4 hunks), `backend/python_backend/gmail_routes.py` (3 hunks), `backend/python_backend/main.py` (2 hunks), `backend/python_backend/models.py` (16 hunks), `backend/python_backend/performance_monitor.py` (1 hunks), `backend/python_backend/run_server.py` (1 hunks), `backend/python_backend/tests/test_ai_engine.py` (2 hunks), `backend/python_backend/tests/test_category_routes.py` (2 hunks), `backend/python_backend/tests/test_email_routes.py` (9 hunks), `backend/python_backend/tests/test_filter_routes.py` (4 hunks), `backend/python_backend/tests/test_gmail_routes.py` (1 hunks), `backend/python_nlp/ai_training.py` (1 hunks), `backend/python_nlp/gmail_service.py` (2 hunks), `backend/python_nlp/nlp_engine.py` (7 hunks), `backend/python_nlp/smart_retrieval.py` (2 hunks), `backend/python_nlp/tests/analysis_components/test_intent_model.py` (1 hunks), `backend/python_nlp/tests/analysis_components/test_sentiment_model.py` (1 hunks), `backend/python_nlp/tests/analysis_components/test_topic_model.py` (1 hunks), `backend/python_nlp/tests/analysis_components/test_urgency_model.py` (1 hunks), `client/package.json` (0 hunks), `client/src/index.css` (2 hunks), `codebuff.json` (1 hunks), `diagnosis_message.txt` (1 hunks), `drizzle.config.ts` (0 hunks), `knowledge.md` (1 hunks), `launch.py` (17 hunks), `package.json` (0 hunks), `postcss.config.js` (0 hunks), `pyproject.toml` (2 hunks), `run.py` (1 hunks), `server/README.md` (0 hunks), `server/activityRoutes.test.ts` (0 hunks), `server/activityRoutes.ts` (0 hunks), `server/ai-engine.ts` (0 hunks), `server/aiRoutes.test.ts` (0 hunks), `server/aiRoutes.ts` (0 hunks), `server/categoryRoutes.test.ts` (0 hunks), `server/categoryRoutes.ts` (0 hunks), `server/dashboardRoutes.test.ts` (0 hunks), `server/dashboardRoutes.ts` (0 hunks), `server/emailRoutes.test.ts` (0 hunks), `server/emailRoutes.ts` (0 hunks), `server/gmail-ai-service.ts` (0 hunks), `server/gmailRoutes.test.ts` (0 hunks), `server/gmailRoutes.ts` (0 hunks), `server/index.ts` (0 hunks), `server/init-db.ts` (0 hunks), `server/performanceRoutes.ts` (0 hunks), `server/python-bridge.ts` (0 hunks), `server/python_backend/tests/test_gmail_routes.py` (0 hunks), `server/routes.ts` (0 hunks), `server/storage.ts` (0 hunks), `server/vite.ts` (0 hunks), `setup.js` (0 hunks), `tailwind.config.ts` (0 hunks), `tsconfig.json` (0 hunks), `vite.config.ts` (0 hunks)
💤 Files with no reviewable changes (31)
- postcss.config.js
- server/categoryRoutes.ts
- server/dashboardRoutes.ts
- package.json
- server/performanceRoutes.ts
- server/gmail-ai-service.ts
- server/gmailRoutes.ts
- server/README.md
- server/vite.ts
- server/init-db.ts
- tsconfig.json
- server/aiRoutes.ts
- server/dashboardRoutes.test.ts
- server/storage.ts
- server/activityRoutes.test.ts
- server/routes.ts
- client/package.json
- tailwind.config.ts
- server/emailRoutes.ts
- server/activityRoutes.ts
- server/python-bridge.ts
- server/categoryRoutes.test.ts
- server/ai-engine.ts
- server/python_backend/tests/test_gmail_routes.py
- server/aiRoutes.test.ts
- setup.js
- server/gmailRoutes.test.ts
- server/index.ts
- vite.config.ts
- drizzle.config.ts
- server/emailRoutes.test.ts
🧰 Additional context used
🧬 Code graph analysis (20)
backend/python_nlp/tests/analysis_components/test_topic_model.py (1)
backend/python_nlp/analysis_components/topic_model.py (1)
TopicModel(7-132)
backend/python_backend/gmail_routes.py (1)
backend/python_nlp/gmail_service.py (1)
GmailAIService(30-770)
backend/python_nlp/gmail_service.py (2)
backend/python_backend/ai_engine.py (1)
AdvancedAIEngine (55-321)
backend/python_backend/database.py (1)
DatabaseManager(50-618)
backend/python_backend/run_server.py (1)
backend/python_backend/database.py (1)
DatabaseManager(50-618)
backend/python_nlp/nlp_engine.py (1)
backend/python_nlp/text_utils.py (1)
clean_text(4-16)
backend/python_backend/tests/test_gmail_routes.py (2)
backend/python_nlp/gmail_service.py (4)
sync_gmail_emails (151-207), execute_smart_retrieval (649-706), get_retrieval_strategies (708-735), get_performance_metrics (737-770)
backend/python_backend/gmail_routes.py (1)
get_retrieval_strategies(201-214)
backend/extensions/example/example.py (1)
backend/python_nlp/nlp_engine.py (1)
NLPEngine(59-883)
backend/python_backend/tests/test_category_routes.py (1)
backend/python_backend/database.py (1)
get_db(623-629)
backend/python_backend/__init__.py (2)
backend/python_nlp/gmail_service.py (1)
GmailAIService (30-770)
backend/python_nlp/smart_filters.py (2)
EmailFilter (17-31), SmartFilterManager (50-1530)
backend/python_backend/tests/test_ai_engine.py (3)
backend/python_backend/ai_engine.py (2)
AdvancedAIEngine (55-321), AIAnalysisResult (21-52)
backend/python_nlp/nlp_engine.py (1)
NLPEngine (59-883)
backend/python_backend/database.py (1)
get_all_categories(273-276)
backend/python_nlp/tests/analysis_components/test_sentiment_model.py (1)
backend/python_nlp/analysis_components/sentiment_model.py (1)
SentimentModel(18-156)
backend/python_backend/tests/test_filter_routes.py (2)
backend/python_nlp/smart_filters.py (6)
main (1533-1570), EmailFilter (17-31), get_active_filters_sorted (1405-1427), add_custom_filter (707-735), create_intelligent_filters (385-403), prune_ineffective_filters (737-853)
backend/python_backend/database.py (2)
get_recent_emails (524-526), get_db (623-629)
backend/python_backend/database.py (2)
backend/python_backend/email_routes.py (1)
create_email (111-172)
backend/python_backend/category_routes.py (1)
create_category(53-88)
backend/python_nlp/tests/analysis_components/test_urgency_model.py (1)
backend/python_nlp/analysis_components/urgency_model.py (1)
UrgencyModel(8-76)
backend/python_nlp/tests/analysis_components/test_intent_model.py (1)
backend/python_nlp/analysis_components/intent_model.py (1)
IntentModel(8-83)
backend/python_backend/filter_routes.py (1)
backend/python_nlp/smart_filters.py (3)
add_custom_filter (707-735), create_intelligent_filters (385-403), prune_ineffective_filters (737-853)
backend/python_backend/email_routes.py (1)
backend/python_backend/database.py (2)
search_emails_by_category (477-521), search_emails (436-474)
backend/python_backend/main.py (2)
backend/python_nlp/gmail_service.py (1)
GmailAIService (30-770)
backend/python_nlp/smart_filters.py (1)
SmartFilterManager(50-1530)
backend/python_backend/tests/test_email_routes.py (3)
backend/python_backend/database.py (4)
search_emails_by_category (477-521), search_emails (436-474), create_email (185-264), get_email_by_id (266-271)
backend/python_backend/ai_engine.py (2)
to_dict (38-52), analyze_email (110-141)
backend/python_backend/email_routes.py (1)
create_email(111-172)
backend/python_backend/ai_engine.py (2)
backend/python_backend/database.py (1)
DatabaseManager (50-618)
backend/python_nlp/nlp_engine.py (1)
NLPEngine(59-883)
🪛 Pylint (3.3.8)
backend/python_nlp/gmail_service.py
[error] 19-19: Attempted relative import beyond top-level package
(E0402)
[error] 20-20: Attempted relative import beyond top-level package
(E0402)
backend/python_nlp/ai_training.py
[refactor] 1-1: Useless option value for '--disable', 'C0330' was removed from pylint, see pylint-dev/pylint#3571.
(R0022)
backend/python_backend/tests/test_gmail_routes.py
[refactor] 1-1: Useless option value for '--disable', 'C0330' was removed from pylint, see pylint-dev/pylint#3571.
(R0022)
backend/python_backend/tests/test_category_routes.py
[refactor] 1-1: Useless option value for '--disable', 'C0330' was removed from pylint, see pylint-dev/pylint#3571.
(R0022)
backend/python_backend/performance_monitor.py
[refactor] 1-1: Useless option value for '--disable', 'C0330' was removed from pylint, see pylint-dev/pylint#3571.
(R0022)
run.py
[refactor] 1-1: Useless option value for '--disable', 'C0330' was removed from pylint, see pylint-dev/pylint#3571.
(R0022)
backend/python_backend/database.py
[refactor] 193-193: Unnecessary use of a comprehension, use dict(email_data.items()) instead.
(R1721)
backend/__init__.py
[refactor] 1-1: Useless option value for '--disable', 'C0330' was removed from pylint, see pylint-dev/pylint#3571.
(R0022)
🪛 markdownlint-cli2 (0.18.1)
.openhands/microagents/repo.md
40-40: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.13.1)
backend/python_nlp/nlp_engine.py
715-715: Unused method argument: text
(ARG002)
run.py
12-12: Possible binding to all interfaces
(S104)
backend/python_backend/database.py
93-93: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/python_backend/main.py
90-90: Unused function argument: full_path
(ARG001)
🪛 actionlint (1.7.7)
.github/workflows/ci.yml
18-18: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🔇 Additional comments (24)
pyproject.toml (1)
8-10: All specified package versions are available on PyPI. fastapi ≥ 0.115.12 (latest 0.117.1), httpx ≥ 0.28.1 (latest 0.28.1), and psycopg2-binary ≥ 2.9.10 (latest 2.9.10) are published and installable.
backend/python_nlp/gmail_service.py (1)
660-668: Subcommands supported. Verified that `smart_retrieval.py` defines the `execute-strategies` and `list-strategies` positional subcommands via `add_subparsers` and handles them in `main_cli()`; no changes needed.

backend/python_nlp/tests/analysis_components/test_intent_model.py (1)
5-5: Import path migration: LGTM. The updated import aligns with the new package layout.
backend/python_backend/tests/test_ai_engine.py (1)
15-18: Good test isolation on AsyncMock reset. Resetting side effects between tests prevents cross-test leakage.
backend/python_backend/tests/test_category_routes.py (1)
6-6: Import path update LGTM. The app import now correctly points to backend.python_backend.main.
backend/python_backend/gmail_routes.py (1)
93-99: Error detail extraction improvement LGTM. Handling dict/str shapes from Gmail errors is robust and avoids noisy logs.
Also applies to: 155-161, 166-166
backend/python_backend/email_routes.py (2)
35-41: Search + category fan-in logic LGTM. Explicit None checks avoid false negatives for category_id=0.
118-123: Confirm AIAnalysisResult attribute mapping. `ai_engine.analyze_email` returns an `AIAnalysisResult` instance (not a dict), so attribute access won't fail on the return type; but verify that `AIAnalysisResult` actually sets or proxies all of the fields you read on `ai_analysis` (e.g. `sentiment`, `categories`) in `email_routes.py`.

.github/workflows/ci.yml (2)
3-7: LGTM! Well-configured CI triggers. The workflow correctly triggers on pushes and pull requests to both `main` and `scientific` branches, which aligns with the project's branching strategy.
31-43: Comprehensive testing and quality checks. The CI workflow includes all essential quality gates: testing with coverage, linting (flake8), formatting checks (black, isort), and type checking (mypy). The configuration is appropriate for the Python-first transition mentioned in the PR objectives.
.github/workflows/README.md (2)
1-15: Excellent documentation structure and coverage. The README provides comprehensive documentation for the CI workflows, including triggers, purposes, and features. The structure is clear and informative.
17-27: Documentation reference verified. The `dependabot-auto-merge.yml` file exists at `.github/workflows/dependabot-auto-merge.yml`, so no changes needed.

backend/python_backend/tests/test_email_routes.py (4)
10-38: Excellent refactoring with a helper function. The `create_mock_email` helper centralizes mock email creation and ensures a consistent structure across tests. This reduces code duplication and makes tests more maintainable.
136-145: Good test coverage for combined search and category filtering. The new test `test_search_emails_in_category` properly validates the combined search and category functionality, ensuring the correct database method is called with the right parameters.
215-237: Comprehensive error handling test with fallback. The test properly handles the case where psycopg2 might not be available in the test environment by creating a mock error class. The side effect reset ensures test isolation.
182-186: Test mocks correctly match the API model. `create_mock_email` intentionally returns camelCase keys to satisfy `EmailResponse`; ignore the database's snake_case output here. Likely an incorrect or invalid review comment.
backend/python_backend/tests/test_filter_routes.py (3)
11-15: Update mock method name to match implementation. The test correctly updates the mock to use `get_active_filters_sorted` instead of the old `get_all_filters` method name, aligning with the actual smart filters implementation shown in the relevant code snippets.
62-98: Comprehensive test payload with proper validation. The test now includes a complete filter payload with all required fields (description, criteria, actions) that matches the `EmailFilter` structure from the smart filters module.
33-40: Proper test isolation with mock resets. The fixture correctly resets all mocks before each test to ensure proper test isolation and prevent state leakage between tests.
backend/python_backend/tests/test_gmail_routes.py (4)
97-116: LGTM: smart retrieval route contract and arg mapping covered.
118-141: LGTM: strategies endpoint happy-path covered; logs on error in route are appropriate.
143-157: LGTM: performance endpoint happy-path covered.
83-90: Verified google-api-python-client presence. The package is listed in both requirements.txt and requirements_versions.txt, so importing `HttpError` is supported.

backend/python_backend/models.py (1)
351-356: Align health status vocabulary with the AI engine. `ServiceHealth` restricts status; after fixing ai_engine to use healthy/degraded/unhealthy, this model is fine. If you intend to accept "ok", extend the regex.
```yaml
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
```
Update GitHub Actions to use the latest version.
The static analysis tool correctly identified that actions/setup-python@v4 is deprecated. The latest version is v5, which provides better performance and security updates.
Apply this diff to update the action:
```diff
-      - name: Set up Python
-        uses: actions/setup-python@v4
-        with:
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
```
🧰 Tools
🪛 actionlint (1.7.7)
18-18: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
```yaml
    if: github.actor == 'dependabot[bot]' && github.event.pull_request.mergeable_state == 'clean' && !github.event.pull_request.draft
```
Workflow guard skips all Dependabot merges
github.event.pull_request.mergeable_state is usually unknown/blocked on opened and synchronize events (especially while CI is pending), so this job never runs and auto-merge never activates. Let the job execute for Dependabot PRs and rely on the explicit wait + gh checks instead.
```diff
-    if: github.actor == 'dependabot[bot]' && github.event.pull_request.mergeable_state == 'clean' && !github.event.pull_request.draft
+    if: github.actor == 'dependabot[bot]' && !github.event.pull_request.draft
```
+ if: github.actor == 'dependabot[bot]' && !github.event.pull_request.draft🤖 Prompt for AI Agents
.github/workflows/dependabot-auto-merge.yml lines 15-16: the current workflow
guard includes github.event.pull_request.mergeable_state == 'clean', which
prevents the job from running for Dependabot PRs because mergeable_state is
often 'unknown'/'blocked' on opened/synchronize events; remove the
mergeable_state check so the if condition only checks the actor is
dependabot[bot] and that the PR is not a draft, allowing the job to run and rely
on the existing explicit wait and gh checks to determine mergeability.
- Edit the conditions in the workflow `if` statement (line 15)
- Adjust the CI check name in the `wait-for-check` action (line 23)
- Modify the merge strategy (currently uses `--merge`, could use `--squash` or `--rebase`)
- Change timeout values for CI wait (currently 600 seconds)
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Update line references to match actual CI workflow.
The customization section references line numbers that may not align with the actual CI workflow structure. The wait-for-check action and specific line references should be verified against the actual workflow files.
🏁 Script executed:

```shell
#!/bin/bash
# Description: Check the actual structure of CI workflow files
find .github/workflows -name "*.yml" -exec echo "=== {} ===" \; -exec cat {} \;
```

Length of output: 3038
Update README.md workflow customization section to use correct references
In .github/workflows/README.md, replace the hard-coded line numbers for the job-level if clause, the checkName in the wait-for-check step, merge strategy flag, and timeout settings so they align with the actual .github/workflows/dependabot-auto-merge.yml.
```text
pnpm-lock.yaml
package-lock.json
uv.lock
```
Do not ignore package manager lockfiles.
Filtering out pnpm-lock.yaml, package-lock.json, and uv.lock prevents us from committing the dependency locks that guarantee reproducible installs for both the remaining JS toolchain and the new Python uv workflow. Dropping these files from version control will cause CI/local environments to drift from each other, undermining the PR’s stated goal of stabilizing the backend and the test suite. Please keep these lockfiles tracked (remove the ignore patterns) so installs stay deterministic.
Apply this diff to fix the issue:
```diff
-pnpm-lock.yaml
-package-lock.json
-uv.lock
```
## CI/CD Information

**Note**: No GitHub Actions workflows are currently configured in this repository. The project relies on local development tools and manual testing processes.
CI note contradicts the new workflows
We now ship .github/workflows/ci.yml and Dependabot auto-merge; documenting “no workflows” is misleading for anyone auditing automation. Please align the note with reality.
```diff
-**Note**: No GitHub Actions workflows are currently configured in this repository. The project relies on local development tools and manual testing processes.
+**Note**: GitHub Actions workflows (e.g., `.github/workflows/ci.yml`, `.github/workflows/dependabot-auto-merge.yml`) handle automated testing and Dependabot merges alongside the local tooling described above.
```
🤖 Prompt for AI Agents
.openhands/microagents/repo.md around lines 110 to 113: the CI note incorrectly
claims there are no GitHub Actions workflows while the repo now ships
.github/workflows/ci.yml and Dependabot auto-merge; update the CI/CD section to
reflect current automation by removing or replacing the “No GitHub Actions
workflows” sentence with a brief accurate statement that the repository uses a
CI workflow at .github/workflows/ci.yml and Dependabot auto-merge for dependency
updates, optionally adding a one-line pointer to where to find workflow and
Dependabot configuration files.
## Architecture
- **Frontend**: React (client/) with TypeScript, TailwindCSS, and Radix UI components
- **Backend**: Express.js (server/) with Python AI/ML integration
- **AI Engine**: Python-based NLP models for sentiment, intent, topic, and urgency analysis
- **Database**: SQLite for local storage and caching

## Development Setup
- Run `npm run dev` to start the Node.js server in development mode
- Run `npm run test:py` for Python tests
- Run `npm run test:ts` for TypeScript tests
- Client development server runs on separate port via Vite

## Project Structure
- `/client/` - React frontend application
- `/server/` - Express.js backend and API routes
- `/server/python_nlp/` - Python NLP models and analysis components
- `/server/python_backend/` - Python backend services
- `/extensions/` - Extensible plugin system
- `/shared/` - Shared TypeScript schemas

## Key Features
- Gmail integration with OAuth
- AI-powered email analysis (sentiment, intent, topic, urgency)
- Smart filtering and categorization
- Performance metrics and analytics
- Dashboard with email insights

## Technology Stack
- **Frontend**: React 18, TypeScript, TailwindCSS, Radix UI, Wouter (routing), React Query
- **Backend**: Express.js, TypeScript, Python (FastAPI/Flask)
- **AI/ML**: Python NLP models, scikit-learn
- **Database**: SQLite, Drizzle ORM
- **Build Tools**: Vite, esbuild
Update knowledge doc to match the refactored backend
This file still describes an Express/TypeScript backend under /server, but the repository now serves APIs from the Python FastAPI modules in backend/python_backend/ (see this PR’s changes). New contributors will follow the wrong stack/setup instructions unless this section is corrected.
🤖 Prompt for AI Agents
knowledge.md lines 6-38: Update the "Architecture", "Development Setup",
"Project Structure", and "Backend" references to reflect the refactor: replace
mentions of Express.js/TypeScript backend under /server with the FastAPI Python
backend located at backend/python_backend/, update development commands to show
how to run the Python FastAPI server (e.g., python -m uvicorn ... or the repo's
specific start script) and any changed test commands, adjust project structure
paths to remove or de-emphasize /server and point to backend/python_backend and
backend/python_nlp as appropriate, and ensure the Technology Stack lists
FastAPI/Python instead of Express.js/TypeScript where applicable so new
contributors have accurate setup and run instructions.
```toml
    "httpx>=0.28.1",
    "psycopg2-binary>=2.9.10",
    "pydantic>=2.11.5",
    "pytest-asyncio>=1.2.0",
```
🛠️ Refactor suggestion | 🟠 Major
Move test-only deps out of runtime
pytest-asyncio should be in the dev group, not production deps.
```diff
-    "pytest-asyncio>=1.2.0",
```

And add under `[dependency-groups].dev`:

```diff
+    "pytest-asyncio>=1.2.0",
```

🤖 Prompt for AI Agents
In pyproject.toml around lines 12 to 12, pytest-asyncio is listed as a
runtime/production dependency; move it to the dev dependency group by removing
"pytest-asyncio>=1.2.0" from the main dependencies table and add it under
[dependency-groups].dev (e.g., dependency-groups.dev.dependencies =
["pytest-asyncio>=1.2.0"] or equivalent format used in this file) so test-only
deps are not installed in production.
1. **Clone the Repository and Switch to the `scientific` Branch:**
   ```bash
   ./launch.sh --gradio-ui
   ```
   - On Windows:
   ```bash
   launch.bat --gradio-ui
   # Replace <repo_url> and <repo_name> with actual values
   git clone <repo_url>
   cd <repo_name>
   git checkout scientific
   ```
Checkout instructions reference the wrong branch
Telling readers to switch to a scientific branch prevents them from running the code on main, which is where this README ships. Please point to the default branch (or drop the extra checkout step).
```diff
-   git checkout scientific
+   git checkout main
```

📝 Committable suggestion
🤖 Prompt for AI Agents
In README.md around lines 13 to 19, the checkout instructions wrongly tell users
to switch to a non-existent "scientific" branch; update this to reference the
repository's default branch or remove the checkout step entirely. Replace the
"git checkout scientific" step with either "git checkout main" (or the actual
default branch name) or delete the checkout command and add a brief note saying
the README assumes the default branch is already checked out; ensure the clone
example uses placeholder values and that the instructions remain accurate for
users cloning the repo.
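The corrected flow can be simulated locally with a toy repository standing in for `<repo_url>` (assumes git ≥ 2.28 for `--initial-branch`; `main` plays the role of the default branch):

```shell
tmp="$(mktemp -d)" && cd "$tmp"
# Toy "remote" whose default branch is main
git init -q --initial-branch=main origin-repo
git -C origin-repo -c user.email=a@b.c -c user.name=demo commit -q --allow-empty -m init
# A fresh clone already checks out the default branch -- no extra checkout step needed
git clone -q origin-repo work
git -C work rev-parse --abbrev-ref HEAD
```

The final command prints `main`, which is why the README can simply drop the `git checkout scientific` step.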
* **Main Application Data:** Email data, categories, and user information are stored as JSON files in the `server/python_backend/data/` directory.
* **Smart Filter Rules:** Configuration for smart filters is stored in an SQLite database file named `smart_filters.db` located in the project root.
* **Email Cache:** A local cache for fetched email content is stored in `email_cache.db` in the project root.

## Database

These files will be created automatically when the application runs if they don't already exist.

The application uses a PostgreSQL database.
- Configure `DATABASE_URL` environment variable (see [Configuration](#configuration) and [Database Setup for Development](#database-setup-for-development)).
- Schema migrations are handled by Drizzle ORM:
  - `npm run db:push`: Applies schema changes to the database.
  - `npm run db:generate`: Generates new migration files if you change Drizzle schema definitions (typically in `shared/schema.ts` or similar).
  (Or via `python deployment/deploy.py <env> migrate` for Dockerized environments as part of a deployment workflow).

## Stopping the Application

## Extension System

To stop both the backend and frontend servers, press `Ctrl+C` in the terminal window where `launch.sh` or `python launch.py` is running. The launcher script is designed to shut down all started processes gracefully.

EmailIntelligence features an extension system for adding custom functionality.
- Manage extensions using `launch.py` (e.g., `--list-extensions`, `--install-extension`).
- For developing extensions and more details, see the [Extensions Guide](docs/extensions_guide.md) and the [Environment Management Guide](docs/env_management.md#extension-system).

## Development Notes

## Debugging Hangs

### Debugging Pytest Hangs
* Use `pytest -vvv` or `pytest --capture=no`.
* Isolate tests: `pytest path/to/test_file.py::test_name`.
* Use `breakpoint()` or `import pdb; pdb.set_trace()`.
* Check for timeouts logged by `deployment/run_tests.py`.

### Debugging NPM/Build Hangs
* Examine verbose output (e.g., Vite's `--debug`, esbuild's `--log-level=verbose`).
* Use `node --inspect-brk your_script.js`.
* Check resource limits (memory, CPU).
* Try cleaning cache/modules: `npm cache clean --force`, remove `node_modules` & `package-lock.json`, then `npm install`.

### General Debugging on Linux
* Monitor resources: `top`, `htop`, `vmstat`.
* Trace system calls: `strace -p <PID>`.
* Check kernel messages: `dmesg -T`.
* Ensure adequate disk space.

For more detailed guides and specific component documentation, please refer to the [Documentation](#documentation) section.

## Known Vulnerabilities

- Four moderate severity vulnerabilities related to `esbuild` persist as of the last audit.
- These vulnerabilities are due to `drizzle-kit` (and its transitive dependencies like `@esbuild-kit/core-utils`) requiring older, vulnerable versions of `esbuild`. Specifically, `drizzle-kit`'s dependency tree pulls in `esbuild@0.18.20` and `esbuild@0.19.12`, both of which are vulnerable (<=0.24.2).
- Attempts to override these nested `esbuild` versions to a non-vulnerable version (e.g., `^0.25.5`, which is used by other parts of this project like Vite) using npm's `overrides` feature in `package.json` were made. However, these overrides were not fully effective, with `npm list` indicating version incompatibilities for the overridden packages. `npm audit` continued to report the vulnerabilities.
- These `esbuild` vulnerabilities cannot be fully remediated without an update to `drizzle-kit` itself that addresses its `esbuild` dependency requirements, particularly for the deprecated `@esbuild-kit/*` packages.
- On a related note, `vite` and `@vitejs/plugin-react` were successfully updated to their latest compatible versions (`vite@6.3.5` and `@vitejs/plugin-react@4.5.2` respectively) during the audit process to address other potential issues and ensure compatibility.

## Testing

This project includes unit tests for the Python backend components, primarily focusing on the NLP functionalities.

### Python Test Setup

1. **Install Python Development Dependencies:**
   Ensure you have Python installed (as per `pyproject.toml`, e.g., Python 3.11+). The development dependencies, including `pytest` and libraries like `textblob` and `nltk`, are listed in `pyproject.toml` under the `[project.group.dev.dependencies]` section. Install them using pip:
   ```bash
   pip install .[dev]
   ```
   (If you encounter issues with this, ensure your pip is up to date (`pip install --upgrade pip`) as support for `project.group` is relatively new. Alternatively, you might need to manually install the packages listed in the `dev` group.)

2. **NLTK Data (for NLP tests):**
   The NLP tests require certain NLTK data packages. Download the 'punkt' tokenizer data:
   ```bash
   python -m nltk.downloader punkt
   ```

### Running Python Tests

To run all available Python unit tests, use the following npm script:

```bash
npm test
```

* The Python backend is located in `server/python_backend/`.
* The NLP processing logic is in `server/python_nlp/`.
* The frontend client (React/Vite) is in `client/`.
Paths still refer to the removed server/ hierarchy
The backend now lives under backend/.... These outdated paths steer users into dead directories and break setup. Update the docs to match the current layout.
```diff
-* **Main Application Data:** Email data, categories, and user information are stored as JSON files in the `server/python_backend/data/` directory.
+* **Main Application Data:** Email data, categories, and user information are stored as JSON files in the `backend/data/` directory.
```

```diff
-* The Python backend is located in `server/python_backend/`.
-* The NLP processing logic is in `server/python_nlp/`.
+* The Python backend is located in `backend/python_backend/`.
+* The NLP processing logic is in `backend/python_nlp/`.
```

📝 Committable suggestion
🤖 Prompt for AI Agents
In README.md around lines 49 to 63, the documentation still references the
removed server/ hierarchy (e.g., server/python_backend/, server/python_nlp/)
which is now backend/; update all paths and examples in this section to the
current repo layout (replace server/python_backend/ → backend/python_backend/ or
the exact new path if different, server/python_nlp/ → backend/python_nlp/ or new
path, and adjust any file locations mentioned such as data/ or
smart_filters.db/email_cache.db to their current locations), ensure the
statements about automatic file creation and stopping the application still
reflect the launcher script location (launch.sh / python launch.py) in the new
structure, and verify the frontend path (client/) is correct — make the minimal
path edits so links and instructions point to existing directories.
```python
    port = int(os.getenv("PORT", 8000))
    # We use a string to specify the app location to allow for reloading.
    uvicorn.run("backend.python_backend.main:app", host="0.0.0.0", port=port, reload=True)
```
Don’t bind to 0.0.0.0 with reload enabled by default
Binding to all interfaces (S104) plus unconditional reload=True is risky; gate both by env (dev vs prod) and allow HOST override.
```diff
 if __name__ == "__main__":
-    port = int(os.getenv("PORT", 8000))
-    # We use a string to specify the app location to allow for reloading.
-    uvicorn.run("backend.python_backend.main:app", host="0.0.0.0", port=port, reload=True)
+    port = int(os.getenv("PORT", 8000))
+    env = os.getenv("NODE_ENV", "development")
+    host = os.getenv("HOST", "127.0.0.1" if env == "development" else "0.0.0.0")
+    reload = env == "development"
+    # Use string app path to support reload
+    uvicorn.run("backend.python_backend.main:app", host=host, port=port, reload=reload)
```

📝 Committable suggestion
🧰 Tools
🪛 Ruff (0.13.1)
12-12: Possible binding to all interfaces (S104)
🤖 Prompt for AI Agents
In run.py around lines 10-12, currently the server always binds to 0.0.0.0 and
sets reload=True; change this to read HOST and a dev/reload flag from
environment so you don't unconditionally bind to all interfaces or enable
auto-reload in production. Use os.getenv("HOST", "127.0.0.1") to allow override
of the bind address, and gate reload via something like os.getenv("ENV",
"production") == "development" or a dedicated RELOAD env var; pass the evaluated
host and reload values into uvicorn.run instead of hardcoding "0.0.0.0" and
True. Ensure port still falls back to 8000.
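The same gating can be expressed at the command-line level. This sketch only echoes the uvicorn invocation it would run for each mode; the `NODE_ENV`/`HOST`/`PORT` names mirror the suggestion above and are assumptions, not an existing launcher contract:

```shell
app="backend.python_backend.main:app"
# Dev: loopback + auto-reload. Prod: explicit 0.0.0.0, no reload.
for env_name in development production; do
  if [ "$env_name" = "development" ]; then
    host=127.0.0.1
    reload_flag="--reload"
  else
    host=0.0.0.0
    reload_flag=""
  fi
  # A real launcher would let HOST/PORT env vars override these defaults
  echo "[$env_name] uvicorn $app --host $host --port 8000 $reload_flag"
done
```

With defaults this prints a loopback `--reload` command for development and a `0.0.0.0` command without `--reload` for production, matching the intent of the review comment.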
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores
Style