
feat: Refactor NLP pipeline and add Action Item Extraction #29

Merged
MasumRab merged 1 commit into main from feature/email-ai-enhancements-p1
Jun 14, 2025
Conversation

@MasumRab
Owner

@MasumRab MasumRab commented Jun 14, 2025

This commit introduces significant enhancements to the Gmail AI Email Management system.

Refactoring:

  • Centralized NLP analysis logic within AdvancedAIEngine and NLPEngine, removing redundancy from GmailAIService and DataCollectionStrategy.
  • Streamlined fallback mechanisms in the NLP pipeline for more robust error handling.
  • Standardized basic text preprocessing with a new text_utils.py module.
  • Simplified complex functions, particularly GmailAIService.sync_gmail_emails, by breaking them into smaller, more focused methods.
  • Improved error handling and logging across the NLP components for better traceability.
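The preprocessing standardization above centers on a single shared helper. Based on the `clean_text` body shown later in Sourcery's Comment 5, a minimal sketch looks like this (the actual `text_utils.py` may differ in detail):

```python
import re


def clean_text(text: str) -> str:
    """Normalize email text: lowercase, trim, drop odd characters, collapse whitespace."""
    if not text:
        return ""
    text = text.lower().strip()
    # Keep letters, digits, whitespace, and basic punctuation only
    text = re.sub(r"[^a-z0-9\s.,?!']", "", text)
    # Collapse runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text)
    return text
```

Because every component calls the same function, model training and inference see identically normalized input.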

New Feature: Action Item Extraction

  • Added a new ActionItemExtractor module (server/python_nlp/action_item_extractor.py) using rule-based methods and optional NLTK POS tagging to identify and extract action items from email text.
  • Integrated action item extraction into the NLPEngine and AdvancedAIEngine.
  • Action items are now included in the analysisMetadata stored for emails.
  • Exposed this functionality via a new API endpoint: POST /api/actions/extract-from-text.
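As a quick illustration of the new endpoint's contract, the sketch below shows the request and response shapes implied by the `ActionExtractionRequest` and `ActionItem` models; all field values here are hypothetical:

```python
import json

# Request body for POST /api/actions/extract-from-text
# (subject is optional, content is required, per ActionExtractionRequest)
request_body = {
    "subject": "Q3 planning",
    "content": "Please review the draft and send feedback by Friday.",
}

# The endpoint returns a JSON list of ActionItem objects; one plausible item:
example_response = [
    {
        "action_phrase": "review the draft",
        "verb": "review",
        "object": "the draft",
        "raw_due_date_text": "by Friday",
        "context": "Please review the draft and send feedback by Friday.",
    }
]

# Both sides round-trip cleanly through JSON
payload = json.dumps(request_body)
```

Any HTTP client can then POST `payload` to `/api/actions/extract-from-text` and parse the returned list.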

Unit Tests:

  • Added comprehensive unit tests for the ActionItemExtractor.
  • Included integration tests for NLPEngine, AdvancedAIEngine, and GmailAIService to ensure correct handling of action items.
  • Added API tests for the new /api/actions/extract-from-text endpoint.

Additional Feature Ideas Proposed:

  1. Automated Email Summarization: To provide concise summaries of email content.
  2. Smart Reply Suggestions: To offer context-aware reply options based on NLP analysis.

Summary by Sourcery

Refactor the NLP pipeline by centralizing analysis and fallback logic, introduce a rule-based and POS-augmented action item extraction feature across engines and the API, standardize text preprocessing, enhance logging and error handling, and add comprehensive tests.

New Features:

  • Introduce ActionItemExtractor module to identify and extract action items from email text.
  • Add POST /api/actions/extract-from-text API endpoint for on-demand action item extraction.

Enhancements:

  • Refactor GmailAIService to delegate AI analysis to AdvancedAIEngine, splitting syncing logic into helper methods.
  • Centralize NLP analysis and fallback logic in NLPEngine and AdvancedAIEngine with improved logging and error handling.
  • Standardize text preprocessing via the new text_utils.clean_text utility and update training pipeline to use it.
  • Integrate extracted action items into analysisMetadata and annotate emails using external analysis results in DataCollectionStrategy.

Tests:

  • Add unit tests for ActionItemExtractor and integration tests for NLP engines and Gmail service handling action items.
  • Add API tests for the new action extraction endpoint to verify request validation and error handling.

Summary by CodeRabbit

  • New Features

    • Introduced extraction of action items from email text, accessible via a new API endpoint for structured action item retrieval.
    • Integrated action item extraction into AI and NLP analysis flows, with results now included in analysis responses and Gmail service metadata.
    • Added utility for text cleaning and normalization.
  • Bug Fixes

    • Improved error handling and logging throughout AI analysis and fallback processes to ensure robust responses.
  • Refactor

    • Modularized AI analysis pipeline for clearer separation of concerns and easier integration of external engines.
  • Tests

    • Added comprehensive unit and integration tests for action item extraction, API endpoint behavior, Gmail service integration, and NLP engine outputs.

@sourcery-ai
Contributor

sourcery-ai bot commented Jun 14, 2025

Reviewer's Guide

This PR refactors the core NLP pipeline by centralizing analysis logic in AdvancedAIEngine and NLPEngine, breaking down GmailAIService.sync_gmail_emails into smaller helpers, standardizing text preprocessing via text_utils, and improving error handling and logging. It also introduces a rule-based ActionItemExtractor, integrated end-to-end (NLPEngine, AdvancedAIEngine, GmailAIService, and DB payloads) and exposed through a new FastAPI endpoint, with comprehensive unit and integration tests covering action-item extraction and fallback behaviors.

Sequence Diagram for Refactored Email Sync and AI Analysis

sequenceDiagram
    participant C as Caller
    participant GAS as GmailAIService
    participant GDC as GmailDataCollector
    participant GME as GmailMetadataExtractor
    participant AAE as AdvancedAIEngine

    C->>GAS: sync_gmail_emails(query, max, include_ai)
    activate GAS
    GAS->>GAS: _fetch_emails_from_gmail(query, max)
    GAS->>GDC: collect_emails_incremental(query, max)
    activate GDC
    GDC-->>GAS: email_batch
    deactivate GDC
    GAS->>GAS: _process_and_analyze_batch(email_batch, include_ai)
    loop for each email in batch
        GAS->>GME: extract_complete_metadata(gmail_msg)
        activate GME
        GME-->>GAS: gmail_metadata
        deactivate GME
        alt include_ai_analysis is true
            GAS->>GAS: _perform_ai_analysis(email_data)
            GAS->>AAE: analyze_email(subject, content)
            activate AAE
            AAE-->>GAS: ai_analysis_result (with action_items)
            deactivate AAE
        end
        GAS->>GAS: _convert_to_db_format(gmail_metadata, ai_analysis_result)
        GAS-->>GAS: db_email (metadata includes action_items)
    end
    GAS-->>C: sync_result (with processed_emails)
    deactivate GAS

Sequence Diagram for Action Item Extraction API (/api/actions/extract-from-text)

sequenceDiagram
    actor Client as APIClient
    participant FastAPI as FastAPIApp
    participant AAE as AdvancedAIEngine
    participant NLP_Script as NLPEngine (script)

    Client->>FastAPI: POST /api/actions/extract-from-text (subject, content)
    activate FastAPI
    FastAPI->>AAE: analyze_email(subject, content)
    activate AAE
    AAE->>NLP_Script: execute_async_command([... --analyze-email ...])
    activate NLP_Script
    NLP_Script-->>AAE: analysis_json_str (includes action_items)
    deactivate NLP_Script
    AAE->>AAE: Parse JSON, create AIAnalysisResult
    AAE-->>FastAPI: ai_analysis_result (with action_items)
    deactivate AAE
    FastAPI-->>Client: List[ActionItem]
    deactivate FastAPI

Sequence Diagram for NLPEngine Action Item Analysis

sequenceDiagram
    participant NLPE as NLPEngine
    participant AIE as ActionItemExtractor

    NLPE->>NLPE: analyze_email(subject, content)
    activate NLPE
    NLPE->>NLPE: _preprocess_text(full_text) // using text_utils.clean_text
    NLPE->>NLPE: ... other analysis steps ...
    NLPE->>NLPE: _analyze_action_items(full_text)
    NLPE->>AIE: extract_actions(full_text)
    activate AIE
    AIE-->>NLPE: extracted_actions_list
    deactivate AIE
    NLPE->>NLPE: _build_final_analysis_response(..., action_items=extracted_actions_list)
    NLPE-->>NLPE: final_analysis_result (with action_items)
    deactivate NLPE

Entity Relationship Diagram for Action Items and Analysis Data

erDiagram
    AIAnalysisResult {
        string topic
        string sentiment
        string intent
        string urgency
        float confidence
        string_array categories
        string_array keywords
        string reasoning
        string_array suggested_labels
        string_array risk_flags
        string category_id
        ActionItem_array action_items "New: list of action items"
    }

    ActionItem {
        string action_phrase "New"
        string verb "New, Optional"
        string object "New, Optional"
        string raw_due_date_text "New, Optional"
        string context "New"
    }

    EmailDBRecord {
        string messageId
        string analysisMetadata "JSON string, now includes action_items"
        string other_fields
    }

    AIAnalysisResult ||--o{ ActionItem : "has"
    EmailDBRecord ||--|| AIAnalysisResult : "embeds (via analysisMetadata JSON)"

Class Diagram: GmailAIService Changes

classDiagram
    class GmailAIService {
        +advanced_ai_engine: AdvancedAIEngine
        -model_trainer: ModelTrainer // Removed / Not directly used
        -prompt_engineer: PromptEngineer // Removed / Not directly used
        +__init__(rate_config: Optional[RateLimitConfig], advanced_ai_engine: Optional[Any])
        +sync_gmail_emails(query_filter: str, max_emails: int, include_ai_analysis: bool) List[Dict]
        #_fetch_emails_from_gmail(query_filter: str, max_emails: int) Optional[EmailBatch]
        #_process_and_analyze_batch(email_batch: EmailBatch, include_ai_analysis: bool) List[Dict]
        #_perform_ai_analysis(email_data: Dict) Optional[Dict]
        #_convert_to_db_format(gmail_metadata: GmailMessage, ai_analysis_result: Optional[Dict]) Dict
        #_get_basic_fallback_analysis_structure(error_message: str) Dict
    }
    GmailAIService ..> AdvancedAIEngine : uses

Class Diagram: NLPEngine Changes

classDiagram
    class NLPEngine {
        +action_item_extractor: ActionItemExtractor
        +__init__()
        #_preprocess_text(text: str) str
        #_get_fallback_analysis(error_msg: str) Dict
        #_get_simple_fallback_analysis(subject: str, content: str) Dict
        #_analyze_action_items(text: str) List[Dict]
        +analyze_email(subject: str, content: str) Dict
        #_build_final_analysis_response(sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis, categories, keywords, risk_analysis_flags, action_items: List[Dict]) Dict
    }
    NLPEngine ..> ActionItemExtractor : uses
    NLPEngine ..> text_utils : uses clean_text()

Class Diagram: AdvancedAIEngine and AIAnalysisResult Changes

classDiagram
    class AdvancedAIEngine {
        +fallback_nlp_engine: NLPEngine
        +async initialize()
        +async analyze_email(subject: str, content: str) AIAnalysisResult
        #_get_fallback_analysis(subject: str, content: str, error_context: Optional[str]) AIAnalysisResult
    }
    class AIAnalysisResult {
        +action_items: List[Dict]
        +topic: str
        +sentiment: str
        +intent: str
        +urgency: str
        +confidence: float
        +categories: List[str]
        +keywords: List[str]
        +reasoning: str
        +suggested_labels: List[str]
        +risk_flags: List[str]
        +category_id: str
        +to_dict() Dict
    }
    AdvancedAIEngine ..> NLPEngine : uses (for fallback_nlp_engine)
    AdvancedAIEngine ..> AIAnalysisResult : creates

Class Diagram: New ActionItemExtractor and Pydantic Models

classDiagram
    class ActionItemExtractor {
        +action_keywords_regex: Pattern
        +due_date_regex: Pattern
        +sentence_splitter_regex: Pattern
        +__init__()
        #_extract_verb_object_with_nltk(text: str) Tuple[Optional[str], Optional[str]]
        +extract_actions(text: str) List[Dict[str, Any]]
    }
    class ActionExtractionRequest {
        <<PydanticModel>>
        +subject: Optional[str]
        +content: str
    }
    class ActionItem {
        <<PydanticModel>>
        +action_phrase: str
        +verb: Optional[str]
        +object: Optional[str]
        +raw_due_date_text: Optional[str]
        +context: str
    }
    %% FastAPI endpoint uses ActionExtractionRequest and ActionItem
    %% NLPEngine uses ActionItemExtractor
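The class diagram above implies a purely rule-based core: sentence splitting, an action-keyword regex, and a due-date regex, with NLTK POS tagging as an optional refinement. A minimal sketch of that approach follows; the keyword and date patterns here are invented for illustration and the NLTK step is omitted, so the real `action_item_extractor.py` will differ:

```python
import re
from typing import Any, Dict, List


class ActionItemExtractor:
    """Rule-based sketch: flag sentences containing action keywords,
    and pull out any due-date phrase found in the same sentence."""

    def __init__(self) -> None:
        # Illustrative patterns; the real module's regexes may differ
        self.action_keywords_regex = re.compile(
            r"\b(please|send|review|schedule|complete|follow up)\b", re.IGNORECASE
        )
        self.due_date_regex = re.compile(
            r"\b(by|before|due)\s+\w+\b", re.IGNORECASE
        )
        self.sentence_splitter_regex = re.compile(r"(?<=[.!?])\s+")

    def extract_actions(self, text: str) -> List[Dict[str, Any]]:
        actions: List[Dict[str, Any]] = []
        for sentence in self.sentence_splitter_regex.split(text.strip()):
            if not self.action_keywords_regex.search(sentence):
                continue
            due = self.due_date_regex.search(sentence)
            actions.append({
                "action_phrase": sentence.strip(),
                "verb": None,   # filled in by the optional NLTK POS step
                "object": None,
                "raw_due_date_text": due.group(0) if due else None,
                "context": sentence.strip(),
            })
        return actions
```

When NLTK is available, `_extract_verb_object_with_nltk` would populate the `verb` and `object` fields from POS tags; when it is not, the extractor degrades gracefully to the keyword-only output shown here.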

Class Diagram: DataCollectionStrategy Changes

classDiagram
    class DataCollectionStrategy {
        +annotate_email(email: EmailSample, annotator_id: str, external_analysis_results: Optional[Dict]) AnnotationSchema
        #_predict_topic(text: str, is_fallback: bool) str
        #_predict_sentiment(text: str, is_fallback: bool) str
        #_predict_intent(text: str, is_fallback: bool) str
        #_predict_urgency(text: str, is_fallback: bool) str
    }

Class Diagram: New text_utils Module

classDiagram
  class text_utils {
    <<Module/Utility>>
    +static clean_text(text: str) str
  }

Class Diagram: ModelTrainer Changes

classDiagram
    class ModelTrainer {
        +feature_extractor: FeatureExtractor
        +prepare_training_data(samples: List[Dict], target_field: str) Tuple[List, List]
    }
    ModelTrainer ..> text_utils : uses clean_text()

File-Level Changes

Change Details Files
Refactor and modularize NLP pipeline
  • Centralized AI analysis in AdvancedAIEngine/NLPEngine and removed duplicate logic from GmailAIService/DataCollectionStrategy
  • Decomposed sync_gmail_emails into _fetch_emails_from_gmail, _process_and_analyze_batch and refined error paths
  • Replaced inline text cleaning with clean_text in text_utils and updated ai_training preprocessing
  • Enhanced logging calls and standardized fallback structures across async helpers
server/python_nlp/gmail_service.py
server/python_nlp/data_strategy.py
server/python_nlp/nlp_engine.py
server/python_nlp/ai_training.py
server/python_nlp/text_utils.py
Implement Action Item Extraction end-to-end
  • Added ActionItemExtractor module with regex and optional NLTK tagging
  • Integrated extraction into NLPEngine.analyze_email, AdvancedAIEngine.analyze_email and sync flow
  • Extended AIAnalysisResult and to_dict/DB formatter to carry action_items in analysisMetadata
  • Exposed extraction via POST /api/actions/extract-from-text with Pydantic request/response models
server/python_nlp/action_item_extractor.py
server/python_nlp/nlp_engine.py
server/python_backend/ai_engine.py
server/python_backend/main.py
server/python_nlp/gmail_service.py
Expand test coverage for new and refactored features
  • Added unit tests for ActionItemExtractor edge cases and its NLTK fallback
  • Created integration tests for GmailAIService and NLPEngine/AdvancedAIEngine to verify action_items in analysis
  • Introduced API tests for the new extraction endpoint handling success, validation, and error scenarios
tests/test_action_item_extractor.py
tests/test_gmail_service_integration.py
tests/test_nlp_engine_integration.py
tests/test_api_actions.py


@coderabbitai
Contributor

coderabbitai bot commented Jun 14, 2025

Warning

Rate limit exceeded

@MasumRab has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between d26f74f and 30a6d15.

📒 Files selected for processing (12)
  • server/python_backend/ai_engine.py (5 hunks)
  • server/python_backend/main.py (2 hunks)
  • server/python_nlp/action_item_extractor.py (1 hunks)
  • server/python_nlp/ai_training.py (2 hunks)
  • server/python_nlp/data_strategy.py (1 hunks)
  • server/python_nlp/gmail_service.py (5 hunks)
  • server/python_nlp/nlp_engine.py (7 hunks)
  • server/python_nlp/text_utils.py (1 hunks)
  • tests/test_action_item_extractor.py (1 hunks)
  • tests/test_api_actions.py (1 hunks)
  • tests/test_gmail_service_integration.py (1 hunks)
  • tests/test_nlp_engine_integration.py (1 hunks)

Walkthrough

This update introduces action item extraction to the email AI analysis pipeline. It adds a new action item extractor module, integrates action item extraction into both the main NLP and advanced AI engines, and exposes a FastAPI endpoint for extracting action items from text. The Gmail AI service is refactored to use an external AI engine, and fallback mechanisms are improved. Comprehensive unit and integration tests are included for action extraction and its API.

Changes

File(s) Change Summary
server/python_backend/ai_engine.py Extended AIAnalysisResult with action_items; improved error handling, logging, and fallback logic in AdvancedAIEngine.
server/python_backend/main.py Added Pydantic models and a new POST endpoint /api/actions/extract-from-text for extracting action items from email text.
server/python_nlp/action_item_extractor.py New module implementing ActionItemExtractor for extracting action items using regex and optional NLTK POS tagging.
server/python_nlp/ai_training.py Updated ModelTrainer.prepare_training_data to clean text samples using a shared utility before feature extraction.
server/python_nlp/data_strategy.py Enhanced annotate_email to support external analysis results and improved fallback logic and defaults.
server/python_nlp/gmail_service.py Refactored GmailAIService to use an external AdvancedAIEngine, modularized email processing, and improved AI analysis integration and error handling.
server/python_nlp/nlp_engine.py Integrated action item extraction into NLPEngine, updated analysis and fallback methods, and improved logging.
server/python_nlp/text_utils.py Added clean_text utility function for consistent text normalization and cleaning.
tests/test_action_item_extractor.py New unit tests for ActionItemExtractor, covering extraction logic, edge cases, and NLTK/no-NLTK scenarios.
tests/test_api_actions.py New test suite for the /api/actions/extract-from-text API endpoint, covering success, validation, and error cases.
tests/test_gmail_service_integration.py New integration tests for GmailAIService, focusing on AI analysis and metadata serialization with action items.
tests/test_nlp_engine_integration.py New integration tests for NLPEngine and AdvancedAIEngine, verifying action item extraction and fallback behaviors.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant FastAPI
    participant AIEngine
    participant NLPExtractor

    Client->>FastAPI: POST /api/actions/extract-from-text (subject, content)
    FastAPI->>AIEngine: analyze_email(subject, content)
    AIEngine->>NLPExtractor: extract_actions(content)
    NLPExtractor-->>AIEngine: List of action items
    AIEngine-->>FastAPI: AIAnalysisResult (includes action_items)
    FastAPI-->>Client: List of ActionItem (as JSON)
sequenceDiagram
    participant GmailService
    participant AdvancedAIEngine
    participant NLPExtractor

    GmailService->>AdvancedAIEngine: analyze_email(subject, content)
    AdvancedAIEngine->>NLPExtractor: extract_actions(content)
    NLPExtractor-->>AdvancedAIEngine: action_items
    AdvancedAIEngine-->>GmailService: AIAnalysisResult (with action_items)
    GmailService->>DB: Store email metadata + AI analysis (including action_items)

Poem

In the warren of code, a new feature appears,
Action items hop forth—no need for long ears!
With regex and NLTK, they’re found in a flash,
Through endpoints and engines, they leap and they dash.
Now rabbits and humans can clearly detect,
What needs to be done—no tasks to neglect!
🐇✨


@MasumRab MasumRab self-assigned this Jun 14, 2025
@MasumRab MasumRab added the `enhancement` (New feature or request) label Jun 14, 2025
@MasumRab
Owner Author

@coderabbitai summary

@coderabbitai
Contributor

coderabbitai bot commented Jun 14, 2025

✅ Actions performed

Summary regeneration triggered.

@MasumRab
Owner Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Jun 14, 2025

✅ Actions performed

Full review triggered.

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @MasumRab - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `server/python_nlp/gmail_service.py:22` </location>
<code_context>
+# if all AI analysis is delegated to AdvancedAIEngine.
+# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig
+
+# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
+# Adjust the import path as necessary based on your project structure.
+# This assumes AdvancedAIEngine is in a module that can be imported.
</code_context>

<issue_to_address>
Cleanup placeholder commented imports

Remove unused commented imports and placeholder comments to keep the code clean. Finalize or delete the AdvancedAIEngine import as appropriate.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
# AI Training and PromptEngineer might not be directly used by GmailAIService after refactoring
# if all AI analysis is delegated to AdvancedAIEngine.
# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig

# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
# Adjust the import path as necessary based on your project structure.
# This assumes AdvancedAIEngine is in a module that can be imported.
# If it's in python_backend, the path needs to be correct.
# For now, let's assume a placeholder for where AdvancedAIEngine would be imported from.
# from server.python_backend.ai_engine import AdvancedAIEngine
# Placeholder: If AdvancedAIEngine is not easily importable due to circular deps or structure,
# this part of the refactoring (direct usage of AdvancedAIEngine object) might need adjustment.
# For this step, we'll mock its presence or assume it's passed in.

class GmailAIService:
=======
class GmailAIService:
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `server/python_nlp/gmail_service.py:35` </location>
<code_context>
     """Complete Gmail integration with AI processing and metadata extraction"""

-    def __init__(self, rate_config: Optional[RateLimitConfig] = None):
+    def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None): # Allow passing AdvancedAIEngine
         self.collector = GmailDataCollector(rate_config)
         self.metadata_extractor = GmailMetadataExtractor()
</code_context>

<issue_to_address>
Use explicit type for advanced_ai_engine

Use the specific AdvancedAIEngine type or a protocol instead of Optional[Any] for better type safety and IDE support. Consider requiring this parameter or providing a default value.

Suggested implementation:

```python
from typing import Optional
from server.python_backend.ai_engine import AdvancedAIEngine  # Use the actual import path

class GmailAIService:
    """Complete Gmail integration with AI processing and metadata extraction"""

    def __init__(
        self,
        rate_config: Optional[RateLimitConfig] = None,
        advanced_ai_engine: Optional[AdvancedAIEngine] = None  # Use explicit type for better type safety
    ):
        self.collector = GmailDataCollector(rate_config)
        self.metadata_extractor = GmailMetadataExtractor()
        self.data_strategy = DataCollectionStrategy()
        # self.model_trainer = ModelTrainer() # Potentially remove if not used directly
        # self.prompt_engineer = PromptEngineer() # Potentially remove
        self.logger = logging.getLogger(f"{__name__}.{self.__class__.__name__}")

        # This is key for the refactoring: GmailAIService will use AdvancedAIEngine

```

- If `AdvancedAIEngine` cannot be imported directly due to circular dependencies, consider defining a Protocol in a shared location and using that as the type instead.
- If you want to require `advanced_ai_engine`, remove the `Optional` and the default value (`= None`).
- Make sure to update any code that instantiates `GmailAIService` to pass an `AdvancedAIEngine` instance as needed.
</issue_to_address>

### Comment 3
<location> `server/python_nlp/nlp_engine.py:952` </location>
<code_context>

             # Basic preprocessing (primarily for non-model based methods or as initial step)
             # Model pipelines should ideally handle their own specific preprocessing.
+            logger.info("Preprocessing email text...")
             cleaned_text = self._preprocess_text(full_text)
+            logger.info("Email text preprocessed successfully.")
</code_context>

<issue_to_address>
Verbose info-level logs with potential PII exposure

Consider changing these logs to `debug` level and avoid logging sensitive message content to reduce PII exposure.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
            logger.info("Preprocessing email text...")
            cleaned_text = self._preprocess_text(full_text)
            logger.info("Email text preprocessed successfully.")
=======
            logger.debug("Preprocessing email text...")
            cleaned_text = self._preprocess_text(full_text)
            logger.debug("Email text preprocessed successfully.")
>>>>>>> REPLACE

</suggested_fix>

### Comment 4
<location> `server/python_backend/ai_engine.py:83` </location>
<code_context>
             ]

-            result = await _execute_async_command(cmd, cwd=self.python_nlp_path)
+            logger.debug(f"Executing NLPEngine script with command: {' '.join(cmd)}")
+            result_json_str = await _execute_async_command(cmd, cwd=self.python_nlp_path)

</code_context>

<issue_to_address>
Sensitive content in debug logs

Redact or truncate user-provided parameters in logs to prevent exposure of sensitive data.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
            logger.debug(f"Executing NLPEngine script with command: {' '.join(cmd)}")
=======
            # Redact or truncate user-provided parameters in logs to prevent exposure of sensitive data
            redacted_cmd = [
                arg if i == 0 else ("<REDACTED>" if arg not in ('--output-format', 'json') else arg)
                for i, arg in enumerate(cmd)
            ]
            logger.debug(f"Executing NLPEngine script with command: {' '.join(redacted_cmd)}")
>>>>>>> REPLACE

</suggested_fix>
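The redaction pattern in the suggested fix can also be factored into a small helper so it isn't repeated at each call site. A sketch, assuming the program name is always safe to log and that `--output-format json` is the only allow-listed flag pair (both assumptions, not taken from the PR):

```python
from typing import Iterable, List

# Flags that are safe to log verbatim; everything after the program name
# that is not allow-listed is treated as potentially user-provided.
SAFE_ARGS = {"--output-format", "json"}

def redact_command(cmd: List[str], safe_args: Iterable[str] = SAFE_ARGS) -> List[str]:
    """Return a copy of cmd with non-allow-listed arguments masked for logging."""
    safe = set(safe_args)
    return [cmd[0]] + [arg if arg in safe else "<REDACTED>" for arg in cmd[1:]]
```

For example, `redact_command(["python", "nlp_engine.py", "--output-format", "json", "email body"])` yields `["python", "<REDACTED>", "--output-format", "json", "<REDACTED>"]`.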

### Comment 5
<location> `server/python_nlp/text_utils.py:13` </location>
<code_context>
+    text = text.lower()
+    text = text.strip()
+    # Normalize punctuation by removing or replacing unwanted characters
+    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
+    text = re.sub(r"\s+", " ", text) # Normalize whitespace
+    return text
</code_context>

<issue_to_address>
Cleaning regex may strip useful punctuation

Hyphens and colons may be important for dates or context. Consider allowing these characters or making the regex customizable.
</issue_to_address>
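One way to resolve this is to make the allowed punctuation a parameter whose default keeps hyphens and colons. The sketch below is illustrative, not the actual `text_utils.clean_text` implementation:

```python
import re

# Default keeps hyphens and colons so dates ("2023-01-01") and keyword
# prefixes ("task:") survive cleaning; callers can override per use case.
DEFAULT_PUNCT = ".,?!':;-"

def clean_text(text: str, allowed_punct: str = DEFAULT_PUNCT) -> str:
    """Lowercase, strip, drop disallowed characters, and normalize whitespace."""
    text = text.lower().strip()
    text = re.sub(rf"[^a-z0-9\s{re.escape(allowed_punct)}]", "", text)
    return re.sub(r"\s+", " ", text)
```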

### Comment 6
<location> `tests/test_action_item_extractor.py:5` </location>
<code_context>
+from unittest.mock import patch
+from server.python_nlp.action_item_extractor import ActionItemExtractor, HAS_NLTK
+
+class TestActionItemExtractor(unittest.TestCase):
+
+    def setUp(self):
</code_context>

<issue_to_address>
Expand keyword coverage for action item extraction tests.

Please add tests for keywords such as "must", "will you", and "required to" to fully cover the regex logic in `action_keywords_regex`.
</issue_to_address>
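For reference, the keyword logic under test can be approximated with a standalone pattern. The regex below is an assumption about `action_keywords_regex`, shown only to make the requested coverage concrete ("must", "will you", "required to", plus the case-insensitivity raised in Comment 8):

```python
import re

# Illustrative stand-in for ActionItemExtractor.action_keywords_regex;
# the real pattern lives in server/python_nlp/action_item_extractor.py.
ACTION_KEYWORDS = re.compile(
    r"\b(?:please|need to|should|must|will you|required to)\b|\b(?:task|action)\s*:",
    re.IGNORECASE,
)

def has_action_keyword(sentence: str) -> bool:
    """Return True if the sentence contains an action-item trigger keyword."""
    return ACTION_KEYWORDS.search(sentence) is not None
```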

### Comment 7
<location> `tests/test_action_item_extractor.py:76` </location>
<code_context>
+            self.assertEqual(actions[1]['verb'], "send")
+
+
+    def test_extract_actions_simple_due_date_tomorrow(self):
+        text = "Submit the expenses by tomorrow."
+        actions = self.extractor.extract_actions(text)
+        self.assertEqual(len(actions), 1) # Assuming "Submit" isn't a keyword, but "by tomorrow" is part of sentence
+                                          # The current extractor triggers on keywords like "please", "need to" etc.
+                                          # Let's adjust the text to include a keyword.
+        text_with_keyword = "You should Submit the expenses by tomorrow."
+        actions = self.extractor.extract_actions(text_with_keyword)
+        self.assertEqual(len(actions), 1)
+        self.assertEqual(actions[0]['raw_due_date_text'], "by tomorrow")
+
+    def test_extract_actions_due_date_on_monday(self):
+        text = "We need to finish this on Monday."
+        actions = self.extractor.extract_actions(text)
</code_context>

<issue_to_address>
Increase test coverage for diverse due date patterns.

Please add tests for additional due date patterns, such as "by next Monday", "end of day"/"eod", "on 1st Jan 2023", "in 2 days"/"in 3 weeks", and "next month"/"next year" to further validate the regex.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    def test_extract_actions_due_date_on_monday(self):
        text = "We need to finish this on Monday."
        actions = self.extractor.extract_actions(text)
=======
    def test_extract_actions_due_date_on_monday(self):
        text = "We need to finish this on Monday."
        actions = self.extractor.extract_actions(text)

    def test_extract_actions_due_date_by_next_monday(self):
        text = "Please submit the report by next Monday."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertEqual(actions[0]['raw_due_date_text'], "by next Monday")

    def test_extract_actions_due_date_eod(self):
        text = "Can you send the file by end of day?"
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertIn(actions[0]['raw_due_date_text'].lower(), ["by end of day", "by eod"])

        text_eod = "Please finish this by EOD."
        actions_eod = self.extractor.extract_actions(text_eod)
        self.assertEqual(len(actions_eod), 1)
        self.assertEqual(actions_eod[0]['raw_due_date_text'], "by EOD")

    def test_extract_actions_due_date_on_specific_date(self):
        text = "You need to complete the task on 1st Jan 2023."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertEqual(actions[0]['raw_due_date_text'], "on 1st Jan 2023")

    def test_extract_actions_due_date_in_days_weeks(self):
        text_days = "Please send the update in 2 days."
        actions_days = self.extractor.extract_actions(text_days)
        self.assertEqual(len(actions_days), 1)
        self.assertEqual(actions_days[0]['raw_due_date_text'], "in 2 days")

        text_weeks = "You should finish the project in 3 weeks."
        actions_weeks = self.extractor.extract_actions(text_weeks)
        self.assertEqual(len(actions_weeks), 1)
        self.assertEqual(actions_weeks[0]['raw_due_date_text'], "in 3 weeks")

    def test_extract_actions_due_date_next_month_year(self):
        text_month = "Let's finalize the plan next month."
        actions_month = self.extractor.extract_actions(text_month)
        self.assertEqual(len(actions_month), 1)
        self.assertEqual(actions_month[0]['raw_due_date_text'], "next month")

        text_year = "We need to launch the product next year."
        actions_year = self.extractor.extract_actions(text_year)
        self.assertEqual(len(actions_year), 1)
        self.assertEqual(actions_year[0]['raw_due_date_text'], "next year")
>>>>>>> REPLACE

</suggested_fix>
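The due-date phrases these tests exercise can be approximated with a single pattern; the regex below is illustrative and may differ from the extractor's actual `raw_due_date_text` matching:

```python
import re

# Illustrative due-date pattern covering the cases the suggested tests
# exercise ("by next Monday", "by EOD", "on 1st Jan 2023", "in 2 days",
# "next month", ...); the extractor's real regex may differ.
DUE_DATE = re.compile(
    r"\b(?:by|on|in)\s+"
    r"(?:next\s+\w+|end of day|eod|tomorrow|today|"
    r"\d+\s+(?:days?|weeks?|months?)|"
    r"\d{1,2}(?:st|nd|rd|th)?\s+\w+\s+\d{4}|"
    r"(?:mon|tues|wednes|thurs|fri|satur|sun)day)"
    r"|\bnext\s+(?:week|month|year)\b",
    re.IGNORECASE,
)
```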

### Comment 8
<location> `tests/test_action_item_extractor.py:22` </location>
<code_context>
+            self.assertIsNotNone(action['verb']) # NLTK should find 'review'
+            # self.assertEqual(action['object'], "document") # Object extraction can be tricky
+
+    def test_extract_actions_keyword_task(self):
+        text = "Task: John to complete the slides by tomorrow."
+        actions = self.extractor.extract_actions(text)
</code_context>

<issue_to_address>
Add explicit test for keyword case-insensitivity.

Consider adding a test with an uppercase or mixed-case keyword (e.g., "TASK:") to explicitly verify case-insensitivity.

Suggested implementation:

```python
    def test_extract_actions_keyword_task(self):

```

```python
    def test_extract_actions_keyword_task(self):
        text = "Task: John to complete the slides by tomorrow."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        action = actions[0]
        self.assertIn("complete the slides", action['action_phrase'])
        self.assertIn("tomorrow", action['raw_due_date_text'])

    def test_extract_actions_keyword_task_case_insensitive(self):
        # Explicitly test case-insensitivity of the keyword
        text = "TASK: John to update the budget by next week."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        action = actions[0]
        self.assertIn("update the budget", action['action_phrase'])
        self.assertIn("next week", action['raw_due_date_text'])

```
</issue_to_address>

### Comment 9
<location> `tests/test_api_actions.py:17` </location>
<code_context>
+        self.client = TestClient(app)
+
+    @patch('server.python_backend.main.ai_engine.analyze_email', new_callable=AsyncMock)
+    def test_extract_actions_from_text_success(self, mock_analyze_email):
+        # Mock the return value of ai_engine.analyze_email
+        mock_action_item_data = [
</code_context>

<issue_to_address>
Consider testing API with optional 'subject' field omitted or null.

Add a test case where 'subject' is omitted or set to null to verify the endpoint handles these scenarios as expected.
</issue_to_address>


Comment on lines +18 to 32
# AI Training and PromptEngineer might not be directly used by GmailAIService after refactoring
# if all AI analysis is delegated to AdvancedAIEngine.
# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig

# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
# Adjust the import path as necessary based on your project structure.
# This assumes AdvancedAIEngine is in a module that can be imported.
# If it's in python_backend, the path needs to be correct.
# For now, let's assume a placeholder for where AdvancedAIEngine would be imported from.
# from server.python_backend.ai_engine import AdvancedAIEngine
# Placeholder: If AdvancedAIEngine is not easily importable due to circular deps or structure,
# this part of the refactoring (direct usage of AdvancedAIEngine object) might need adjustment.
# For this step, we'll mock its presence or assume it's passed in.

class GmailAIService:

suggestion: Cleanup placeholder commented imports

Remove unused commented imports and placeholder comments to keep the code clean. Finalize or delete the AdvancedAIEngine import as appropriate.

Suggested change

```diff
-# AI Training and PromptEngineer might not be directly used by GmailAIService after refactoring
-# if all AI analysis is delegated to AdvancedAIEngine.
-# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig
-# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
-# Adjust the import path as necessary based on your project structure.
-# This assumes AdvancedAIEngine is in a module that can be imported.
-# If it's in python_backend, the path needs to be correct.
-# For now, let's assume a placeholder for where AdvancedAIEngine would be imported from.
-# from server.python_backend.ai_engine import AdvancedAIEngine
-# Placeholder: If AdvancedAIEngine is not easily importable due to circular deps or structure,
-# this part of the refactoring (direct usage of AdvancedAIEngine object) might need adjustment.
-# For this step, we'll mock its presence or assume it's passed in.
 class GmailAIService:
```
```diff
 class GmailAIService:

     """Complete Gmail integration with AI processing and metadata extraction"""

-    def __init__(self, rate_config: Optional[RateLimitConfig] = None):
+    def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None):  # Allow passing AdvancedAIEngine
```

suggestion: Use explicit type for advanced_ai_engine

Use the specific AdvancedAIEngine type or a protocol instead of Optional[Any] for better type safety and IDE support. Consider requiring this parameter or providing a default value.





Comment on lines +102 to +103
match = self.action_keywords_regex.search(sentence)
if match:

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

```diff
-match = self.action_keywords_regex.search(sentence)
-if match:
+if match := self.action_keywords_regex.search(sentence):
```

Comment on lines +291 to +293

```python
topic = external_analysis_results.get('topic', self._predict_topic(email.content + " " + email.subject, is_fallback=True))
sentiment = external_analysis_results.get('sentiment', self._predict_sentiment(email.content + " " + email.subject, is_fallback=True))
intent = external_analysis_results.get('intent', self._predict_intent(email.content + " " + email.subject, is_fallback=True))
```

issue (code-quality): Use f-string instead of string concatenation [×5] (use-fstring-for-concatenation)

```python
}

def _convert_to_db_format(self, gmail_metadata: GmailMessage, ai_analysis_result: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert Gmail metadata and AI analysis (if available) to database format."""
```

issue (code-quality): We've found these issues:

Comment on lines +77 to +84

```python
text = "Submit the expenses by tomorrow."
actions = self.extractor.extract_actions(text)
self.assertEqual(len(actions), 1)  # Assuming "Submit" isn't a keyword, but "by tomorrow" is part of sentence
# The current extractor triggers on keywords like "please", "need to" etc.
# Let's adjust the text to include a keyword.
text_with_keyword = "You should Submit the expenses by tomorrow."
actions = self.extractor.extract_actions(text_with_keyword)
self.assertEqual(len(actions), 1)
```

issue (code-quality): Extract duplicate code into method (extract-duplicate-method)

Comment on lines +29 to +39

```python
action_item = analysis['action_items'][0]
self.assertIn('action_phrase', action_item)
self.assertIn('verb', action_item)
self.assertIn('object', action_item)
self.assertIn('raw_due_date_text', action_item)
self.assertIn('context', action_item)

# Check if one of the expected actions is present
phrases = [item['action_phrase'] for item in analysis['action_items']]
self.assertTrue(any("Please complete the task by Monday." in phrase for phrase in phrases))
self.assertTrue(any("need to also review the report." in phrase for phrase in phrases))
```

issue (code-quality): Extract code out into method (extract-method)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (1)
tests/test_action_item_extractor.py (1)

34-41: Test expectation will fail until regex supports “Action required:”

The test correctly asserts one action item is found, but the extractor misses this phrase with the current action_keywords_regex. Once the regex is broadened (see extractor comment), re-run the suite to confirm it passes.

🧹 Nitpick comments (7)
server/python_nlp/action_item_extractor.py (1)

52-60: Guard return path but avoid silent failure

The blanket except Exception in POS-tagging swallows all errors, returning (None, None) and only logging. That hides genuine bugs (e.g., malformed tokens). Catch LookupError and ValueError instead, or re-raise after logging.

🧰 Tools
🪛 Ruff (0.11.9)

52-52: Undefined name Tuple

(F821)

🪛 Pylint (3.3.7)

[error] 52-52: Undefined variable 'Tuple'

(E0602)

tests/test_api_actions.py (1)

16-17: Patch target could break if ai_engine is re-imported elsewhere

@patch('server.python_backend.main.ai_engine.analyze_email') assumes ai_engine is a module attribute, not re-assigned after startup. If main.py performs from ... import ai_engine later, the patch will miss. Consider patching the object actually used inside the route handler (e.g., server.python_backend.main.AdvancedAIEngine.analyze_email) to make the test more stable.

server/python_backend/main.py (1)

689-730: Improve exception chaining for better debugging context.

The endpoint implementation is solid with good error handling and logging. However, the exception re-raising could be improved for better debugging.

Apply this diff to improve exception chaining:

```diff
-        raise HTTPException(status_code=500, detail=f"Failed to extract action items: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"Failed to extract action items: {str(e)}") from e
```
🧰 Tools
🪛 Ruff (0.11.9)

729-729: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

tests/test_nlp_engine_integration.py (2)

3-3: Remove unused import.

MagicMock is imported but not used in this test file.

```diff
-from unittest.mock import patch, MagicMock # Import MagicMock for AdvancedAIEngine test
+from unittest.mock import patch
```
🧰 Tools
🪛 Ruff (0.11.9)

3-3: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


53-97: Consider using pytest-asyncio for proper async test support.

The async test method is well-written, but unittest doesn't natively support async tests. The commented code shows awareness of this limitation.

Would you like me to help convert this to use pytest with pytest-asyncio for proper async test support? This would make the tests more maintainable and eliminate the need for the workaround code.

Also, remove the unused asyncio import at line 112:

```diff
-    import asyncio
```

Also applies to: 108-130

server/python_nlp/nlp_engine.py (1)

952-973: Too-chatty INFO logs – switch to DEBUG

Emitting INFO for every sub-analysis (preprocess, sentiment, topic, etc.) will explode log volume in production. Change these to logger.debug() (keep the start / end high-level INFO if needed).

server/python_nlp/gmail_service.py (1)

232-239: Maintain key symmetry in fallback analysis

_get_basic_fallback_analysis_structure omits action_items, whereas downstream code expects it (accessed via .get('action_items', [])). Add an empty list for consistency and to avoid accidental None propagation.

```diff
-    'risk_flags': ['analysis_failed']
+    'risk_flags': ['analysis_failed'],
+    'action_items': []
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d26f74f and 30a6d15.

📒 Files selected for processing (12)
  • server/python_backend/ai_engine.py (5 hunks)
  • server/python_backend/main.py (2 hunks)
  • server/python_nlp/action_item_extractor.py (1 hunks)
  • server/python_nlp/ai_training.py (2 hunks)
  • server/python_nlp/data_strategy.py (1 hunks)
  • server/python_nlp/gmail_service.py (5 hunks)
  • server/python_nlp/nlp_engine.py (7 hunks)
  • server/python_nlp/text_utils.py (1 hunks)
  • tests/test_action_item_extractor.py (1 hunks)
  • tests/test_api_actions.py (1 hunks)
  • tests/test_gmail_service_integration.py (1 hunks)
  • tests/test_nlp_engine_integration.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
server/python_nlp/ai_training.py (1)
server/python_nlp/text_utils.py (1)
  • clean_text (3-15)
server/python_backend/ai_engine.py (2)
server/python_nlp/gmail_service.py (1)
  • _execute_async_command (62-109)
server/python_nlp/nlp_engine.py (2)
  • _get_fallback_analysis (840-860)
  • _get_simple_fallback_analysis (862-922)
server/python_nlp/data_strategy.py (1)
server/python_nlp/nlp_engine.py (1)
  • _extract_keywords (567-615)
server/python_nlp/gmail_service.py (4)
server/python_nlp/gmail_integration.py (3)
  • GmailDataCollector (227-724)
  • EmailBatch (59-65)
  • collect_emails_incremental (360-446)
server/python_nlp/gmail_metadata.py (3)
  • GmailMetadataExtractor (86-660)
  • extract_complete_metadata (125-245)
  • GmailMessage (16-84)
server/python_nlp/data_strategy.py (2)
  • DataCollectionStrategy (40-483)
  • to_dict (25-26)
server/python_backend/ai_engine.py (2)
  • analyze_email (70-113)
  • to_dict (35-49)
🪛 Ruff (0.11.9)
tests/test_nlp_engine_integration.py

3-3: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


112-112: asyncio imported but unused

Remove unused import: asyncio

(F401)

server/python_nlp/action_item_extractor.py

52-52: Undefined name Tuple

(F821)

tests/test_gmail_service_integration.py

122-122: Undefined name json

(F821)


150-150: Undefined name json

(F821)

server/python_backend/main.py

729-729: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 Pylint (3.3.7)
server/python_nlp/action_item_extractor.py

[error] 52-52: Undefined variable 'Tuple'

(E0602)


[refactor] 24-24: Too few public methods (1/2)

(R0903)

tests/test_gmail_service_integration.py

[error] 122-122: Undefined variable 'json'

(E0602)


[error] 150-150: Undefined variable 'json'

(E0602)

server/python_backend/main.py

[refactor] 49-49: Too few public methods (0/2)

(R0903)


[refactor] 53-53: Too few public methods (0/2)

(R0903)

server/python_nlp/nlp_engine.py

[refactor] 1006-1006: Too many arguments (9/5)

(R0913)


[refactor] 1006-1006: Too many positional arguments (9/5)

(R0917)


[refactor] 1006-1006: Too many local variables (20/15)

(R0914)

server/python_nlp/gmail_service.py

[refactor] 32-32: Too many instance attributes (8/7)

(R0902)

🔇 Additional comments (8)
server/python_nlp/ai_training.py (1)

232-234: Re-use of clean_text inherits punctuation-loss issue

Because clean_text currently strips :, any training samples containing “task:” / “action:” will be altered, potentially degrading model performance or mismatching production flow. Ensure the earlier fix to clean_text is applied or bypass cleaning for labels that depend on such punctuation.

server/python_backend/main.py (1)

48-59: LGTM! Well-structured Pydantic models for action item extraction.

The models are properly defined with appropriate field types and optional parameters. The static analysis warnings about too few public methods can be safely ignored as these are Pydantic data models that don't require additional methods.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 49-49: Too few public methods (0/2)

(R0903)


[refactor] 53-53: Too few public methods (0/2)

(R0903)

server/python_backend/ai_engine.py (3)

33-33: LGTM! Proper integration of action_items field.

The action_items field is correctly added to both the initialization and serialization methods with appropriate default values.

Also applies to: 47-48


71-114: Excellent error handling and logging improvements.

The enhanced analyze_email method now provides:

  • Detailed logging for debugging without being verbose
  • Specific handling for different failure scenarios (empty output, JSON errors, file not found, timeout)
  • Proper error context propagation to the fallback method
  • Consistent return of AIAnalysisResult in all code paths

202-255: Robust fallback implementation with proper error context handling.

The fallback method provides excellent resilience:

  • Uses the fallback NLP engine for basic analysis when possible
  • Includes a secondary fallback for critical failures
  • Properly incorporates error context into the reasoning
  • Ensures action_items is always present in the response for API consistency
tests/test_gmail_service_integration.py (1)

11-63: Well-structured integration tests with comprehensive coverage.

The tests effectively validate:

  • AI analysis integration with action items
  • Database format conversion with and without AI analysis results
  • Proper serialization of analysis metadata

Note: Like the previous test file, the async test method will need a proper async test runner.

Also applies to: 64-128, 129-156

server/python_nlp/data_strategy.py (2)

280-320: Excellent implementation of dual-path annotation with external analysis support.

The updated annotate_email method elegantly handles both external AI analysis results and internal fallback predictions:

  • Clear separation of logic paths with appropriate logging
  • Proper confidence scoring based on the analysis source
  • Meaningful annotator ID suffixes for tracking
  • Graceful fallback for missing fields in external results

324-338: Good enhancement of prediction methods with fallback tracking.

The addition of the is_fallback parameter to all prediction methods provides better observability without changing the core logic. The debug logging with conditional prefixes will help with troubleshooting and understanding the annotation flow.

Also applies to: 339-359, 361-374, 376-389

Comment on lines +13 to +15
```python
text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
text = re.sub(r"\s+", " ", text) # Normalize whitespace
return text
```

🛠️ Refactor suggestion

⚠️ Potential issue

Colon & other critical punctuation removed – breaks Action-Item detection

clean_text strips : and ;.
ActionItemExtractor relies on task: / action: / action required: keywords that contain a colon, so the current normalisation will silently prevent those patterns from ever matching when text is pre-processed (e.g. in NLPEngine).

```diff
-    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
+    # Keep punctuation that is semantically relevant for downstream NLP
+    # (e.g. ':' needed for "task:" / "action:" keywords).
+    text = _PUNCT_CLEAN_RE.sub("", text)
```

And hoist the compiled regexes to module scope to avoid recompilation on every call:

```diff
+_PUNCT_CLEAN_RE = re.compile(r"[^a-zA-Z0-9\s.,?!':]")
+_WHITESPACE_RE  = re.compile(r"\s+")
+
 def clean_text(text: str) -> str:
@@
-    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
-    text = re.sub(r"\s+", " ", text) # Normalize whitespace
+    text = _PUNCT_CLEAN_RE.sub("", text)
+    text = _WHITESPACE_RE.sub(" ", text)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
-    text = re.sub(r"\s+", " ", text) # Normalize whitespace
-    return text
+_PUNCT_CLEAN_RE = re.compile(r"[^a-zA-Z0-9\s.,?!':]")
+_WHITESPACE_RE = re.compile(r"\s+")
+
+def clean_text(text: str) -> str:
+    # Keep punctuation that is semantically relevant for downstream NLP
+    # (e.g. ':' needed for "task:" / "action:" keywords).
+    text = _PUNCT_CLEAN_RE.sub("", text)
+    text = _WHITESPACE_RE.sub(" ", text)
+    return text
```
🤖 Prompt for AI Agents
In server/python_nlp/text_utils.py around lines 13 to 15, the regex used in
clean_text removes colons and semicolons, which breaks detection of action-item
keywords like "task:" or "action:". Update the regex to keep colons and
semicolons by including them in the allowed characters. Additionally, move the
compiled regex patterns to module-level variables outside the function to avoid
recompiling them on every call, improving performance.

Comment on lines +8 to +18
```python
# Ensure necessary NLTK data is available, if not, download it.
# This is more for a local setup; in a container, it should be pre-installed.
try:
    nltk.data.find('taggers/averaged_perceptron_tagger')
except nltk.downloader.DownloadError:
    nltk.download('averaged_perceptron_tagger', quiet=True)
try:
    nltk.data.find('tokenizers/punkt')
except nltk.downloader.DownloadError:
    nltk.download('punkt', quiet=True)
HAS_NLTK = True
```

🛠️ Refactor suggestion

Auto-downloading NLTK resources at import time is risky

Triggering network downloads inside a library import can hang or fail in production containers without internet access, and it slows cold-starts. Note also that nltk.data.find raises LookupError when a resource is missing, not nltk.downloader.DownloadError, so the except clauses above will never catch the miss.

Move the download logic behind an explicit helper or setup script instead of executing it automatically at import.

🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py around lines 8 to 18, the current
code automatically downloads NLTK resources during import, which can cause hangs
or failures in production without internet access. Refactor by removing the
automatic download logic from the import time and instead create a separate
explicit helper function or setup script that checks for and downloads the
required NLTK resources. This helper should be called manually during setup or
initialization, not during module import.
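One way to follow that advice is an explicit setup helper along these lines (ensure_nltk_data is a hypothetical name; the injectable find/download parameters exist only so the logic can be exercised without network access):

```python
def ensure_nltk_data(find=None, download=None) -> None:
    """Check for required NLTK resources and download any that are missing.

    Call this once during application setup, never at module import time,
    so offline containers fail fast and explicitly instead of hanging.
    """
    if find is None or download is None:
        import nltk  # lazy import: only needed when real NLTK is used
        find = find or nltk.data.find
        download = download or nltk.download

    for path, package in [
        ("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger"),
        ("tokenizers/punkt", "punkt"),
    ]:
        try:
            find(path)
        except LookupError:  # nltk.data.find raises LookupError on a miss
            download(package, quiet=True)
```

A container build can then run ensure_nltk_data() once in a setup step, while ActionItemExtractor simply assumes the data is present at runtime.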

Comment on lines +1 to +4
import re
import logging
from typing import List, Dict, Any, Optional


⚠️ Potential issue

Missing Tuple import => runtime NameError

_extract_verb_object_with_nltk declares Tuple[...] in the return annotation but Tuple is not imported. The file will raise at import time.

-from typing import List, Dict, Any, Optional
+from typing import List, Dict, Any, Optional, Tuple
📝 Committable suggestion


Suggested change
 import re
 import logging
-from typing import List, Dict, Any, Optional
+from typing import List, Dict, Any, Optional, Tuple
🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py at the top import section (lines 1
to 4), the Tuple type is used in the return annotation of the function
_extract_verb_object_with_nltk but Tuple is not imported from typing. Add Tuple
to the import statement from typing alongside List, Dict, Any, and Optional to
prevent a runtime NameError.

Comment on lines +32 to +44
self.action_keywords_regex = re.compile(
r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
re.IGNORECASE
)
# Regex for simple due date patterns
# This is a basic version and can be expanded significantly
self.due_date_regex = re.compile(
r'\b(by (next )?(monday|tuesday|wednesday|thursday|friday|saturday|sunday|tomorrow|end of day|eod)|'
r'on \d{1,2}(st|nd|rd|th)? (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\w*)?(\s\d{4})?|'
r'in \d+ (days?|weeks?|months?)|'
r'next (week|month|year))\b',
re.IGNORECASE
)

⚠️ Potential issue

Keyword pattern too narrow – test suite fails on “Action required:”

action_keywords_regex lacks action required: and the plain task form; unit-tests (tests/test_action_item_extractor.py) expect those to be recognised.

-            r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
+            r'\b(please|task:?|action:?|action required:|need to|required to|must|should|can you|could you|will you)\b',

(Added optional colon for task / action, plus explicit action required:).
Update tests accordingly if you choose a different wording.

📝 Committable suggestion


Suggested change
-        self.action_keywords_regex = re.compile(
-            r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
-            re.IGNORECASE
-        )
+        self.action_keywords_regex = re.compile(
+            r'\b(please|task:?|action:?|action required:|need to|required to|must|should|can you|could you|will you)\b',
+            re.IGNORECASE
+        )
         # Regex for simple due date patterns
         # This is a basic version and can be expanded significantly
         self.due_date_regex = re.compile(
             r'\b(by (next )?(monday|tuesday|wednesday|thursday|friday|saturday|sunday|tomorrow|end of day|eod)|'
             r'on \d{1,2}(st|nd|rd|th)? (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\w*)?(\s\d{4})?|'
             r'in \d+ (days?|weeks?|months?)|'
             r'next (week|month|year))\b',
             re.IGNORECASE
         )
🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py around lines 32 to 44, the
action_keywords_regex pattern is too narrow and misses matching "action
required:" and the plain "task" keyword without a colon, causing test failures.
Update the regex to include optional colons after "task" and "action" and
explicitly add "action required:" as a keyword. After updating the regex, ensure
the unit tests in tests/test_action_item_extractor.py are aligned with these
changes.
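To see the effect of the broadened pattern, here is a small self-contained check (ACTION_KEYWORDS_RE and looks_like_action_item are illustrative names for this sketch, not identifiers from the PR):

```python
import re

# Broadened keyword pattern, mirroring the suggestion: optional colon after
# "task"/"action", plus an explicit "action required:" alternative.
ACTION_KEYWORDS_RE = re.compile(
    r"\b(please|task:?|action:?|action required:|need to|required to|"
    r"must|should|can you|could you|will you)\b",
    re.IGNORECASE,
)


def looks_like_action_item(sentence: str) -> bool:
    """Return True if the sentence contains any action-item trigger keyword."""
    return ACTION_KEYWORDS_RE.search(sentence) is not None
```

With the original pattern, "Action required: update the docs" slips through because only the exact tokens "action:" and "task:" were listed; the optional-colon form matches both the bare and the colon-suffixed variants.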

Comment on lines +1 to +8
import unittest
from unittest.mock import MagicMock, AsyncMock # AsyncMock for async methods

from server.python_nlp.gmail_service import GmailAIService
# Assuming AdvancedAIEngine and AIAnalysisResult are importable for type hinting or mocking structure
# from server.python_backend.ai_engine import AdvancedAIEngine, AIAnalysisResult
from server.python_nlp.gmail_metadata import GmailMessage # For structuring metadata input


⚠️ Potential issue

Fix missing json import.

The json module is used at lines 122 and 150 but not imported.

Add the missing import at the top of the file:

 import unittest
 from unittest.mock import MagicMock, AsyncMock # AsyncMock for async methods
+import json
 
 from server.python_nlp.gmail_service import GmailAIService

Also applies to: 122-122, 150-150

🤖 Prompt for AI Agents
In tests/test_gmail_service_integration.py at the top of the file (lines 1 to
8), the json module is used later in the file around lines 122 and 150 but is
not imported. Add an import statement for the json module alongside the existing
imports to fix the missing import error.

Comment on lines 1006 to 1009
 def _build_final_analysis_response(self, sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis,
-                                   categories, keywords, risk_analysis_flags) -> Dict[str, Any]:
+                                   categories, keywords, risk_analysis_flags, action_items) -> Dict[str, Any]:
     """Helper function to consolidate analysis results and build the final response dictionary."""


🛠️ Refactor suggestion

Method signature is getting unwieldy

_build_final_analysis_response now takes 9 positional parameters (plus self) and exceeds the 5-arg guideline flagged by pylint (R0913/R0917). Pass a single dataclass or dict instead to improve readability and future maintainability.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 1006-1006: Too many arguments (9/5)

(R0913)


[refactor] 1006-1006: Too many positional arguments (9/5)

(R0917)


[refactor] 1006-1006: Too many local variables (20/15)

(R0914)

🤖 Prompt for AI Agents
In server/python_nlp/nlp_engine.py around lines 1006 to 1009, the method
_build_final_analysis_response has too many positional parameters, making it
hard to read and maintain. Refactor this method to accept a single dataclass or
dictionary that encapsulates all these parameters instead of passing them
individually. Define a suitable dataclass or use a dict to group
sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis,
categories, keywords, risk_analysis_flags, and action_items, then update the
method signature and its internal usage accordingly.
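The dataclass approach might be sketched like this (AnalysisParts and the standalone build_final_analysis_response are hypothetical names; the field names mirror the current parameters):

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


# Groups the nine analysis results into one value, so the builder takes a
# single parameter instead of nine positionals (satisfying R0913/R0917).
@dataclass
class AnalysisParts:
    sentiment_analysis: Dict[str, Any]
    topic_analysis: Dict[str, Any]
    intent_analysis: Dict[str, Any]
    urgency_analysis: Dict[str, Any]
    categories: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    risk_analysis_flags: List[str] = field(default_factory=list)
    action_items: List[Dict[str, Any]] = field(default_factory=list)


def build_final_analysis_response(parts: AnalysisParts) -> Dict[str, Any]:
    # The real method would consolidate further; asdict shows the minimal case.
    return asdict(parts)
```

Call sites then construct AnalysisParts with keyword arguments, which also makes it impossible to transpose two of the dict-typed arguments silently.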

Comment on lines +35 to 41
def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None): # Allow passing AdvancedAIEngine
self.collector = GmailDataCollector(rate_config)
self.metadata_extractor = GmailMetadataExtractor()
self.data_strategy = DataCollectionStrategy()
self.model_trainer = ModelTrainer()
self.prompt_engineer = PromptEngineer()
# Changed for more specific logger name
# self.model_trainer = ModelTrainer() # Potentially remove if not used directly
# self.prompt_engineer = PromptEngineer() # Potentially remove
self.logger = logging.getLogger(f"{__name__}.{self.__class__.__name__}")

⚠️ Potential issue

model_trainer / prompt_engineer now undefined – will crash

__init__ comments out creation of ModelTrainer and PromptEngineer, yet train_models_from_gmail_data() still calls self.model_trainer and self.prompt_engineer. Any invocation of training will raise AttributeError.

Add the attributes back or gate the training code behind a feature flag.

Also applies to: 397-423

🤖 Prompt for AI Agents
In server/python_nlp/gmail_service.py around lines 35 to 41 and also lines 397
to 423, the attributes model_trainer and prompt_engineer are commented out in
the __init__ method but are still used in train_models_from_gmail_data(),
causing AttributeError. To fix this, either uncomment and initialize
self.model_trainer and self.prompt_engineer in __init__ or add a feature flag to
conditionally create and use these attributes, ensuring any training code checks
the flag before accessing them.
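The feature-flag variant might look like this sketch (GmailAIServiceSketch, enable_training, and the ENABLE_MODEL_TRAINING variable are all hypothetical; object() stands in for the real ModelTrainer/PromptEngineer):

```python
import os


class GmailAIServiceSketch:
    """Sketch: construct training helpers only when training is enabled."""

    def __init__(self, enable_training: bool = bool(os.getenv("ENABLE_MODEL_TRAINING"))):
        self.model_trainer = None
        self.prompt_engineer = None
        if enable_training:
            # The real service would build ModelTrainer() / PromptEngineer() here.
            self.model_trainer = object()
            self.prompt_engineer = object()

    def train_models_from_gmail_data(self) -> str:
        # Fail loudly instead of raising a confusing AttributeError later.
        if self.model_trainer is None or self.prompt_engineer is None:
            raise RuntimeError(
                "Training components are disabled; construct the service "
                "with enable_training=True to use model training."
            )
        return "training started"
```

Either fix works; the flag keeps the lighter default construction the commented-out lines were aiming for while making the unavailable path explicit.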

@MasumRab MasumRab merged commit 53baeec into main Jun 14, 2025
4 checks passed
@MasumRab MasumRab deleted the feature/email-ai-enhancements-p1 branch June 14, 2025 08:41
MasumRab added a commit that referenced this pull request Oct 29, 2025
feat: Refactor NLP pipeline and add Action Item Extraction
MasumRab added a commit that referenced this pull request Nov 6, 2025
feat: Refactor NLP pipeline and add Action Item Extraction
Labels: enhancement