
feat: Refactor NLP pipeline and add Action Item Extraction #29

Merged
MasumRab merged 1 commit into main from feature/email-ai-enhancements-p1
Jun 14, 2025
Conversation

@MasumRab
Owner

@MasumRab MasumRab commented Jun 14, 2025

This commit introduces significant enhancements to the Gmail AI Email Management system.

Refactoring:

  • Centralized NLP analysis logic within AdvancedAIEngine and NLPEngine, removing redundancy from GmailAIService and DataCollectionStrategy.
  • Streamlined fallback mechanisms in the NLP pipeline for more robust error handling.
  • Standardized basic text preprocessing with a new text_utils.py module.
  • Simplified complex functions, particularly GmailAIService.sync_gmail_emails, by breaking them into smaller, more focused methods.
  • Improved error handling and logging across the NLP components for better traceability.
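The preprocessing standardization above centers on a single shared helper. Based on the `clean_text` body shown later in Sourcery's Comment 5, a minimal sketch looks like this (the actual `text_utils.py` may differ in detail):

```python
import re


def clean_text(text: str) -> str:
    """Normalize email text: lowercase, trim, drop odd characters, collapse whitespace."""
    if not text:
        return ""
    text = text.lower().strip()
    # Keep letters, digits, whitespace, and basic punctuation only
    text = re.sub(r"[^a-z0-9\s.,?!']", "", text)
    # Collapse runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text)
    return text
```

Because every component calls the same function, model training and inference see identically normalized input.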

New Feature: Action Item Extraction

  • Added a new ActionItemExtractor module (server/python_nlp/action_item_extractor.py) using rule-based methods and optional NLTK POS tagging to identify and extract action items from email text.
  • Integrated action item extraction into the NLPEngine and AdvancedAIEngine.
  • Action items are now included in the analysisMetadata stored for emails.
  • Exposed this functionality via a new API endpoint: POST /api/actions/extract-from-text.
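As a quick illustration of the new endpoint's contract, the sketch below shows the request and response shapes implied by the `ActionExtractionRequest` and `ActionItem` models; all field values here are hypothetical:

```python
import json

# Request body for POST /api/actions/extract-from-text
# (subject is optional, content is required, per ActionExtractionRequest)
request_body = {
    "subject": "Q3 planning",
    "content": "Please review the draft and send feedback by Friday.",
}

# The endpoint returns a JSON list of ActionItem objects; one plausible item:
example_response = [
    {
        "action_phrase": "review the draft",
        "verb": "review",
        "object": "the draft",
        "raw_due_date_text": "by Friday",
        "context": "Please review the draft and send feedback by Friday.",
    }
]

# Both sides round-trip cleanly through JSON
payload = json.dumps(request_body)
```

Any HTTP client can then POST `payload` to `/api/actions/extract-from-text` and parse the returned list.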

Unit Tests:

  • Added comprehensive unit tests for the ActionItemExtractor.
  • Included integration tests for NLPEngine, AdvancedAIEngine, and GmailAIService to ensure correct handling of action items.
  • Added API tests for the new /api/actions/extract-from-text endpoint.

Additional Feature Ideas Proposed:

  1. Automated Email Summarization: To provide concise summaries of email content.
  2. Smart Reply Suggestions: To offer context-aware reply options based on NLP analysis.

Summary by Sourcery

Refactor the NLP pipeline by centralizing analysis and fallback logic, introduce a rule-based and POS-augmented action item extraction feature across engines and the API, standardize text preprocessing, enhance logging and error handling, and add comprehensive tests.

New Features:

  • Introduce ActionItemExtractor module to identify and extract action items from email text.
  • Add POST /api/actions/extract-from-text API endpoint for on-demand action item extraction.

Enhancements:

  • Refactor GmailAIService to delegate AI analysis to AdvancedAIEngine, splitting syncing logic into helper methods.
  • Centralize NLP analysis and fallback logic in NLPEngine and AdvancedAIEngine with improved logging and error handling.
  • Standardize text preprocessing via the new text_utils.clean_text utility and update training pipeline to use it.
  • Integrate extracted action items into analysisMetadata and annotate emails using external analysis results in DataCollectionStrategy.

Tests:

  • Add unit tests for ActionItemExtractor and integration tests for NLP engines and Gmail service handling action items.
  • Add API tests for the new action extraction endpoint to verify request validation and error handling.

Summary by CodeRabbit

  • New Features

    • Introduced extraction of action items from email text, accessible via a new API endpoint for structured action item retrieval.
    • Integrated action item extraction into AI and NLP analysis flows, with results now included in analysis responses and Gmail service metadata.
    • Added utility for text cleaning and normalization.
  • Bug Fixes

    • Improved error handling and logging throughout AI analysis and fallback processes to ensure robust responses.
  • Refactor

    • Modularized AI analysis pipeline for clearer separation of concerns and easier integration of external engines.
  • Tests

    • Added comprehensive unit and integration tests for action item extraction, API endpoint behavior, Gmail service integration, and NLP engine outputs.

@sourcery-ai
Contributor

sourcery-ai bot commented Jun 14, 2025

Reviewer's Guide

This PR refactors the core NLP pipeline by centralizing analysis logic in AdvancedAIEngine and NLPEngine, breaking down GmailAIService.sync_gmail_emails into smaller helpers, standardizing text preprocessing via text_utils, and improving error handling and logging. It also introduces a rule-based ActionItemExtractor, integrated end-to-end (NLPEngine, AdvancedAIEngine, GmailAIService, and DB payloads) and exposed through a new FastAPI endpoint, with comprehensive unit and integration tests covering action-item extraction and fallback behaviors.

Sequence Diagram for Refactored Email Sync and AI Analysis

sequenceDiagram
    participant C as Caller
    participant GAS as GmailAIService
    participant GDC as GmailDataCollector
    participant GME as GmailMetadataExtractor
    participant AAE as AdvancedAIEngine

    C->>GAS: sync_gmail_emails(query, max, include_ai)
    activate GAS
    GAS->>GAS: _fetch_emails_from_gmail(query, max)
    GAS->>GDC: collect_emails_incremental(query, max)
    activate GDC
    GDC-->>GAS: email_batch
    deactivate GDC
    GAS->>GAS: _process_and_analyze_batch(email_batch, include_ai)
    loop for each email in batch
        GAS->>GME: extract_complete_metadata(gmail_msg)
        activate GME
        GME-->>GAS: gmail_metadata
        deactivate GME
        alt include_ai_analysis is true
            GAS->>GAS: _perform_ai_analysis(email_data)
            GAS->>AAE: analyze_email(subject, content)
            activate AAE
            AAE-->>GAS: ai_analysis_result (with action_items)
            deactivate AAE
        end
        GAS->>GAS: _convert_to_db_format(gmail_metadata, ai_analysis_result)
        GAS-->>GAS: db_email (metadata includes action_items)
    end
    GAS-->>C: sync_result (with processed_emails)
    deactivate GAS

Sequence Diagram for Action Item Extraction API (/api/actions/extract-from-text)

sequenceDiagram
    actor Client as APIClient
    participant FastAPI as FastAPIApp
    participant AAE as AdvancedAIEngine
    participant NLP_Script as NLPEngine (script)

    Client->>FastAPI: POST /api/actions/extract-from-text (subject, content)
    activate FastAPI
    FastAPI->>AAE: analyze_email(subject, content)
    activate AAE
    AAE->>NLP_Script: execute_async_command([... --analyze-email ...])
    activate NLP_Script
    NLP_Script-->>AAE: analysis_json_str (includes action_items)
    deactivate NLP_Script
    AAE->>AAE: Parse JSON, create AIAnalysisResult
    AAE-->>FastAPI: ai_analysis_result (with action_items)
    deactivate AAE
    FastAPI-->>Client: List[ActionItem]
    deactivate FastAPI

Sequence Diagram for NLPEngine Action Item Analysis

sequenceDiagram
    participant NLPE as NLPEngine
    participant AIE as ActionItemExtractor

    NLPE->>NLPE: analyze_email(subject, content)
    activate NLPE
    NLPE->>NLPE: _preprocess_text(full_text) // using text_utils.clean_text
    NLPE->>NLPE: ... other analysis steps ...
    NLPE->>NLPE: _analyze_action_items(full_text)
    NLPE->>AIE: extract_actions(full_text)
    activate AIE
    AIE-->>NLPE: extracted_actions_list
    deactivate AIE
    NLPE->>NLPE: _build_final_analysis_response(..., action_items=extracted_actions_list)
    NLPE-->>NLPE: final_analysis_result (with action_items)
    deactivate NLPE

Entity Relationship Diagram for Action Items and Analysis Data

erDiagram
    AIAnalysisResult {
        string topic
        string sentiment
        string intent
        string urgency
        float confidence
        string_array categories
        string_array keywords
        string reasoning
        string_array suggested_labels
        string_array risk_flags
        string category_id
        ActionItem_array action_items "New: list of action items"
    }

    ActionItem {
        string action_phrase "New"
        string verb "New, Optional"
        string object "New, Optional"
        string raw_due_date_text "New, Optional"
        string context "New"
    }

    EmailDBRecord {
        string messageId
        string analysisMetadata "JSON string, now includes action_items"
        string other_fields
    }

    AIAnalysisResult ||--o{ ActionItem : "has"
    EmailDBRecord ||--|| AIAnalysisResult : "embeds (via analysisMetadata JSON)"

Class Diagram: GmailAIService Changes

classDiagram
    class GmailAIService {
        +advanced_ai_engine: AdvancedAIEngine
        -model_trainer: ModelTrainer // Removed / Not directly used
        -prompt_engineer: PromptEngineer // Removed / Not directly used
        +__init__(rate_config: Optional[RateLimitConfig], advanced_ai_engine: Optional[Any])
        +sync_gmail_emails(query_filter: str, max_emails: int, include_ai_analysis: bool) List[Dict]
        #_fetch_emails_from_gmail(query_filter: str, max_emails: int) Optional[EmailBatch]
        #_process_and_analyze_batch(email_batch: EmailBatch, include_ai_analysis: bool) List[Dict]
        #_perform_ai_analysis(email_data: Dict) Optional[Dict]
        #_convert_to_db_format(gmail_metadata: GmailMessage, ai_analysis_result: Optional[Dict]) Dict
        #_get_basic_fallback_analysis_structure(error_message: str) Dict
    }
    GmailAIService ..> AdvancedAIEngine : uses

Class Diagram: NLPEngine Changes

classDiagram
    class NLPEngine {
        +action_item_extractor: ActionItemExtractor
        +__init__()
        #_preprocess_text(text: str) str
        #_get_fallback_analysis(error_msg: str) Dict
        #_get_simple_fallback_analysis(subject: str, content: str) Dict
        #_analyze_action_items(text: str) List[Dict]
        +analyze_email(subject: str, content: str) Dict
        #_build_final_analysis_response(sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis, categories, keywords, risk_analysis_flags, action_items: List[Dict]) Dict
    }
    NLPEngine ..> ActionItemExtractor : uses
    NLPEngine ..> text_utils : uses clean_text()

Class Diagram: AdvancedAIEngine and AIAnalysisResult Changes

classDiagram
    class AdvancedAIEngine {
        +fallback_nlp_engine: NLPEngine
        +async initialize()
        +async analyze_email(subject: str, content: str) AIAnalysisResult
        #_get_fallback_analysis(subject: str, content: str, error_context: Optional[str]) AIAnalysisResult
    }
    class AIAnalysisResult {
        +action_items: List[Dict]
        +topic: str
        +sentiment: str
        +intent: str
        +urgency: str
        +confidence: float
        +categories: List[str]
        +keywords: List[str]
        +reasoning: str
        +suggested_labels: List[str]
        +risk_flags: List[str]
        +category_id: str
        +to_dict() Dict
    }
    AdvancedAIEngine ..> NLPEngine : uses (for fallback_nlp_engine)
    AdvancedAIEngine ..> AIAnalysisResult : creates

Class Diagram: New ActionItemExtractor and Pydantic Models

classDiagram
    class ActionItemExtractor {
        +action_keywords_regex: Pattern
        +due_date_regex: Pattern
        +sentence_splitter_regex: Pattern
        +__init__()
        #_extract_verb_object_with_nltk(text: str) Tuple[Optional[str], Optional[str]]
        +extract_actions(text: str) List[Dict[str, Any]]
    }
    class ActionExtractionRequest {
        <<PydanticModel>>
        +subject: Optional[str]
        +content: str
    }
    class ActionItem {
        <<PydanticModel>>
        +action_phrase: str
        +verb: Optional[str]
        +object: Optional[str]
        +raw_due_date_text: Optional[str]
        +context: str
    }
    %% FastAPI endpoint uses ActionExtractionRequest and ActionItem
    %% NLPEngine uses ActionItemExtractor
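The class diagram above implies a purely rule-based core: sentence splitting, an action-keyword regex, and a due-date regex, with NLTK POS tagging as an optional refinement. A minimal sketch of that approach follows; the keyword and date patterns here are invented for illustration and the NLTK step is omitted, so the real `action_item_extractor.py` will differ:

```python
import re
from typing import Any, Dict, List


class ActionItemExtractor:
    """Rule-based sketch: flag sentences containing action keywords,
    and pull out any due-date phrase found in the same sentence."""

    def __init__(self) -> None:
        # Illustrative patterns; the real module's regexes may differ
        self.action_keywords_regex = re.compile(
            r"\b(please|send|review|schedule|complete|follow up)\b", re.IGNORECASE
        )
        self.due_date_regex = re.compile(
            r"\b(by|before|due)\s+\w+\b", re.IGNORECASE
        )
        self.sentence_splitter_regex = re.compile(r"(?<=[.!?])\s+")

    def extract_actions(self, text: str) -> List[Dict[str, Any]]:
        actions: List[Dict[str, Any]] = []
        for sentence in self.sentence_splitter_regex.split(text.strip()):
            if not self.action_keywords_regex.search(sentence):
                continue
            due = self.due_date_regex.search(sentence)
            actions.append({
                "action_phrase": sentence.strip(),
                "verb": None,   # filled in by the optional NLTK POS step
                "object": None,
                "raw_due_date_text": due.group(0) if due else None,
                "context": sentence.strip(),
            })
        return actions
```

When NLTK is available, `_extract_verb_object_with_nltk` would populate the `verb` and `object` fields from POS tags; when it is not, the extractor degrades gracefully to the keyword-only output shown here.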

Class Diagram: DataCollectionStrategy Changes

classDiagram
    class DataCollectionStrategy {
        +annotate_email(email: EmailSample, annotator_id: str, external_analysis_results: Optional[Dict]) AnnotationSchema
        #_predict_topic(text: str, is_fallback: bool) str
        #_predict_sentiment(text: str, is_fallback: bool) str
        #_predict_intent(text: str, is_fallback: bool) str
        #_predict_urgency(text: str, is_fallback: bool) str
    }

Class Diagram: New text_utils Module

classDiagram
  class text_utils {
    <<Module/Utility>>
    +static clean_text(text: str) str
  }

Class Diagram: ModelTrainer Changes

classDiagram
    class ModelTrainer {
        +feature_extractor: FeatureExtractor
        +prepare_training_data(samples: List[Dict], target_field: str) Tuple[List, List]
    }
    ModelTrainer ..> text_utils : uses clean_text()

File-Level Changes

Change Details Files
Refactor and modularize NLP pipeline
  • Centralized AI analysis in AdvancedAIEngine/NLPEngine and removed duplicate logic from GmailAIService/DataCollectionStrategy
  • Decomposed sync_gmail_emails into _fetch_emails_from_gmail, _process_and_analyze_batch and refined error paths
  • Replaced inline text cleaning with clean_text in text_utils and updated ai_training preprocessing
  • Enhanced logging calls and standardized fallback structures across async helpers
server/python_nlp/gmail_service.py
server/python_nlp/data_strategy.py
server/python_nlp/nlp_engine.py
server/python_nlp/ai_training.py
server/python_nlp/text_utils.py
Implement Action Item Extraction end-to-end
  • Added ActionItemExtractor module with regex and optional NLTK tagging
  • Integrated extraction into NLPEngine.analyze_email, AdvancedAIEngine.analyze_email and sync flow
  • Extended AIAnalysisResult and to_dict/DB formatter to carry action_items in analysisMetadata
  • Exposed extraction via POST /api/actions/extract-from-text with Pydantic request/response models
server/python_nlp/action_item_extractor.py
server/python_nlp/nlp_engine.py
server/python_backend/ai_engine.py
server/python_backend/main.py
server/python_nlp/gmail_service.py
Expand test coverage for new and refactored features
  • Added unit tests for ActionItemExtractor edge cases and its NLTK fallback
  • Created integration tests for GmailAIService and NLPEngine/AdvancedAIEngine to verify action_items in analysis
  • Introduced API tests for the new extraction endpoint handling success, validation, and error scenarios
tests/test_action_item_extractor.py
tests/test_gmail_service_integration.py
tests/test_nlp_engine_integration.py
tests/test_api_actions.py


@coderabbitai
Contributor

coderabbitai bot commented Jun 14, 2025

Warning

Rate limit exceeded

@MasumRab has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between d26f74f and 30a6d15.

📒 Files selected for processing (12)
  • server/python_backend/ai_engine.py (5 hunks)
  • server/python_backend/main.py (2 hunks)
  • server/python_nlp/action_item_extractor.py (1 hunks)
  • server/python_nlp/ai_training.py (2 hunks)
  • server/python_nlp/data_strategy.py (1 hunks)
  • server/python_nlp/gmail_service.py (5 hunks)
  • server/python_nlp/nlp_engine.py (7 hunks)
  • server/python_nlp/text_utils.py (1 hunks)
  • tests/test_action_item_extractor.py (1 hunks)
  • tests/test_api_actions.py (1 hunks)
  • tests/test_gmail_service_integration.py (1 hunks)
  • tests/test_nlp_engine_integration.py (1 hunks)

Walkthrough

This update introduces action item extraction to the email AI analysis pipeline. It adds a new action item extractor module, integrates action item extraction into both the main NLP and advanced AI engines, and exposes a FastAPI endpoint for extracting action items from text. The Gmail AI service is refactored to use an external AI engine, and fallback mechanisms are improved. Comprehensive unit and integration tests are included for action extraction and its API.

Changes

File(s) Change Summary
server/python_backend/ai_engine.py Extended AIAnalysisResult with action_items; improved error handling, logging, and fallback logic in AdvancedAIEngine.
server/python_backend/main.py Added Pydantic models and a new POST endpoint /api/actions/extract-from-text for extracting action items from email text.
server/python_nlp/action_item_extractor.py New module implementing ActionItemExtractor for extracting action items using regex and optional NLTK POS tagging.
server/python_nlp/ai_training.py Updated ModelTrainer.prepare_training_data to clean text samples using a shared utility before feature extraction.
server/python_nlp/data_strategy.py Enhanced annotate_email to support external analysis results and improved fallback logic and defaults.
server/python_nlp/gmail_service.py Refactored GmailAIService to use an external AdvancedAIEngine, modularized email processing, and improved AI analysis integration and error handling.
server/python_nlp/nlp_engine.py Integrated action item extraction into NLPEngine, updated analysis and fallback methods, and improved logging.
server/python_nlp/text_utils.py Added clean_text utility function for consistent text normalization and cleaning.
tests/test_action_item_extractor.py New unit tests for ActionItemExtractor, covering extraction logic, edge cases, and NLTK/no-NLTK scenarios.
tests/test_api_actions.py New test suite for the /api/actions/extract-from-text API endpoint, covering success, validation, and error cases.
tests/test_gmail_service_integration.py New integration tests for GmailAIService, focusing on AI analysis and metadata serialization with action items.
tests/test_nlp_engine_integration.py New integration tests for NLPEngine and AdvancedAIEngine, verifying action item extraction and fallback behaviors.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant FastAPI
    participant AIEngine
    participant NLPExtractor

    Client->>FastAPI: POST /api/actions/extract-from-text (subject, content)
    FastAPI->>AIEngine: analyze_email(subject, content)
    AIEngine->>NLPExtractor: extract_actions(content)
    NLPExtractor-->>AIEngine: List of action items
    AIEngine-->>FastAPI: AIAnalysisResult (includes action_items)
    FastAPI-->>Client: List of ActionItem (as JSON)
sequenceDiagram
    participant GmailService
    participant AdvancedAIEngine
    participant NLPExtractor

    GmailService->>AdvancedAIEngine: analyze_email(subject, content)
    AdvancedAIEngine->>NLPExtractor: extract_actions(content)
    NLPExtractor-->>AdvancedAIEngine: action_items
    AdvancedAIEngine-->>GmailService: AIAnalysisResult (with action_items)
    GmailService->>DB: Store email metadata + AI analysis (including action_items)

Poem

In the warren of code, a new feature appears,
Action items hop forth—no need for long ears!
With regex and NLTK, they’re found in a flash,
Through endpoints and engines, they leap and they dash.
Now rabbits and humans can clearly detect,
What needs to be done—no tasks to neglect!
🐇✨


@MasumRab MasumRab self-assigned this Jun 14, 2025
@MasumRab MasumRab added the `enhancement` (New feature or request) label Jun 14, 2025
@MasumRab
Owner Author

@coderabbitai summary

@coderabbitai
Contributor

coderabbitai bot commented Jun 14, 2025

✅ Actions performed

Summary regeneration triggered.

@MasumRab
Owner Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Jun 14, 2025

✅ Actions performed

Full review triggered.

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @MasumRab - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `server/python_nlp/gmail_service.py:22` </location>
<code_context>
+# if all AI analysis is delegated to AdvancedAIEngine.
+# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig
+
+# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
+# Adjust the import path as necessary based on your project structure.
+# This assumes AdvancedAIEngine is in a module that can be imported.
</code_context>

<issue_to_address>
Cleanup placeholder commented imports

Remove unused commented imports and placeholder comments to keep the code clean. Finalize or delete the AdvancedAIEngine import as appropriate.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
# AI Training and PromptEngineer might not be directly used by GmailAIService after refactoring
# if all AI analysis is delegated to AdvancedAIEngine.
# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig

# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
# Adjust the import path as necessary based on your project structure.
# This assumes AdvancedAIEngine is in a module that can be imported.
# If it's in python_backend, the path needs to be correct.
# For now, let's assume a placeholder for where AdvancedAIEngine would be imported from.
# from server.python_backend.ai_engine import AdvancedAIEngine
# Placeholder: If AdvancedAIEngine is not easily importable due to circular deps or structure,
# this part of the refactoring (direct usage of AdvancedAIEngine object) might need adjustment.
# For this step, we'll mock its presence or assume it's passed in.

class GmailAIService:
=======
class GmailAIService:
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `server/python_nlp/gmail_service.py:35` </location>
<code_context>
     """Complete Gmail integration with AI processing and metadata extraction"""

-    def __init__(self, rate_config: Optional[RateLimitConfig] = None):
+    def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None): # Allow passing AdvancedAIEngine
         self.collector = GmailDataCollector(rate_config)
         self.metadata_extractor = GmailMetadataExtractor()
</code_context>

<issue_to_address>
Use explicit type for advanced_ai_engine

Use the specific AdvancedAIEngine type or a protocol instead of Optional[Any] for better type safety and IDE support. Consider requiring this parameter or providing a default value.

Suggested implementation:

```python
from typing import Optional
from server.python_backend.ai_engine import AdvancedAIEngine  # Use the actual import path

class GmailAIService:
    """Complete Gmail integration with AI processing and metadata extraction"""

    def __init__(
        self,
        rate_config: Optional[RateLimitConfig] = None,
        advanced_ai_engine: Optional[AdvancedAIEngine] = None  # Use explicit type for better type safety
    ):
        self.collector = GmailDataCollector(rate_config)
        self.metadata_extractor = GmailMetadataExtractor()
        self.data_strategy = DataCollectionStrategy()
        # self.model_trainer = ModelTrainer() # Potentially remove if not used directly
        # self.prompt_engineer = PromptEngineer() # Potentially remove
        self.logger = logging.getLogger(f"{__name__}.{self.__class__.__name__}")

        # This is key for the refactoring: GmailAIService will use AdvancedAIEngine

```

- If `AdvancedAIEngine` cannot be imported directly due to circular dependencies, consider defining a Protocol in a shared location and using that as the type instead.
- If you want to require `advanced_ai_engine`, remove the `Optional` and the default value (`= None`).
- Make sure to update any code that instantiates `GmailAIService` to pass an `AdvancedAIEngine` instance as needed.
</issue_to_address>

### Comment 3
<location> `server/python_nlp/nlp_engine.py:952` </location>
<code_context>

             # Basic preprocessing (primarily for non-model based methods or as initial step)
             # Model pipelines should ideally handle their own specific preprocessing.
+            logger.info("Preprocessing email text...")
             cleaned_text = self._preprocess_text(full_text)
+            logger.info("Email text preprocessed successfully.")
</code_context>

<issue_to_address>
Verbose info-level logs with potential PII exposure

Consider changing these logs to `debug` level and avoid logging sensitive message content to reduce PII exposure.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
            logger.info("Preprocessing email text...")
            cleaned_text = self._preprocess_text(full_text)
            logger.info("Email text preprocessed successfully.")
=======
            logger.debug("Preprocessing email text...")
            cleaned_text = self._preprocess_text(full_text)
            logger.debug("Email text preprocessed successfully.")
>>>>>>> REPLACE

</suggested_fix>

### Comment 4
<location> `server/python_backend/ai_engine.py:83` </location>
<code_context>
             ]

-            result = await _execute_async_command(cmd, cwd=self.python_nlp_path)
+            logger.debug(f"Executing NLPEngine script with command: {' '.join(cmd)}")
+            result_json_str = await _execute_async_command(cmd, cwd=self.python_nlp_path)

</code_context>

<issue_to_address>
Sensitive content in debug logs

Redact or truncate user-provided parameters in logs to prevent exposure of sensitive data.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
            logger.debug(f"Executing NLPEngine script with command: {' '.join(cmd)}")
=======
            # Redact or truncate user-provided parameters in logs to prevent exposure of sensitive data
            redacted_cmd = [
                arg if i == 0 else ("<REDACTED>" if arg not in ('--output-format', 'json') else arg)
                for i, arg in enumerate(cmd)
            ]
            logger.debug(f"Executing NLPEngine script with command: {' '.join(redacted_cmd)}")
>>>>>>> REPLACE

</suggested_fix>
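The redaction pattern in the suggested fix can also be factored into a small helper so it isn't repeated at each call site. A sketch, assuming the program name is always safe to log and that `--output-format json` is the only allow-listed flag pair (both assumptions, not taken from the PR):

```python
from typing import Iterable, List

# Flags that are safe to log verbatim; everything after the program name
# that is not allow-listed is treated as potentially user-provided.
SAFE_ARGS = {"--output-format", "json"}

def redact_command(cmd: List[str], safe_args: Iterable[str] = SAFE_ARGS) -> List[str]:
    """Return a copy of cmd with non-allow-listed arguments masked for logging."""
    safe = set(safe_args)
    return [cmd[0]] + [arg if arg in safe else "<REDACTED>" for arg in cmd[1:]]
```

For example, `redact_command(["python", "nlp_engine.py", "--output-format", "json", "email body"])` yields `["python", "<REDACTED>", "--output-format", "json", "<REDACTED>"]`.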

### Comment 5
<location> `server/python_nlp/text_utils.py:13` </location>
<code_context>
+    text = text.lower()
+    text = text.strip()
+    # Normalize punctuation by removing or replacing unwanted characters
+    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
+    text = re.sub(r"\s+", " ", text) # Normalize whitespace
+    return text
</code_context>

<issue_to_address>
Cleaning regex may strip useful punctuation

Hyphens and colons may be important for dates or context. Consider allowing these characters or making the regex customizable.
</issue_to_address>
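One way to resolve this is to make the allowed punctuation a parameter whose default keeps hyphens and colons. The sketch below is illustrative, not the actual `text_utils.clean_text` implementation:

```python
import re

# Default keeps hyphens and colons so dates ("2023-01-01") and keyword
# prefixes ("task:") survive cleaning; callers can override per use case.
DEFAULT_PUNCT = ".,?!':;-"

def clean_text(text: str, allowed_punct: str = DEFAULT_PUNCT) -> str:
    """Lowercase, strip, drop disallowed characters, and normalize whitespace."""
    text = text.lower().strip()
    text = re.sub(rf"[^a-z0-9\s{re.escape(allowed_punct)}]", "", text)
    return re.sub(r"\s+", " ", text)
```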

### Comment 6
<location> `tests/test_action_item_extractor.py:5` </location>
<code_context>
+from unittest.mock import patch
+from server.python_nlp.action_item_extractor import ActionItemExtractor, HAS_NLTK
+
+class TestActionItemExtractor(unittest.TestCase):
+
+    def setUp(self):
</code_context>

<issue_to_address>
Expand keyword coverage for action item extraction tests.

Please add tests for keywords such as "must", "will you", and "required to" to fully cover the regex logic in `action_keywords_regex`.
</issue_to_address>
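For reference, the keyword logic under test can be approximated with a standalone pattern. The regex below is an assumption about `action_keywords_regex`, shown only to make the requested coverage concrete ("must", "will you", "required to", plus the case-insensitivity raised in Comment 8):

```python
import re

# Illustrative stand-in for ActionItemExtractor.action_keywords_regex;
# the real pattern lives in server/python_nlp/action_item_extractor.py.
ACTION_KEYWORDS = re.compile(
    r"\b(?:please|need to|should|must|will you|required to)\b|\b(?:task|action)\s*:",
    re.IGNORECASE,
)

def has_action_keyword(sentence: str) -> bool:
    """Return True if the sentence contains an action-item trigger keyword."""
    return ACTION_KEYWORDS.search(sentence) is not None
```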

### Comment 7
<location> `tests/test_action_item_extractor.py:76` </location>
<code_context>
+            self.assertEqual(actions[1]['verb'], "send")
+
+
+    def test_extract_actions_simple_due_date_tomorrow(self):
+        text = "Submit the expenses by tomorrow."
+        actions = self.extractor.extract_actions(text)
+        self.assertEqual(len(actions), 1) # Assuming "Submit" isn't a keyword, but "by tomorrow" is part of sentence
+                                          # The current extractor triggers on keywords like "please", "need to" etc.
+                                          # Let's adjust the text to include a keyword.
+        text_with_keyword = "You should Submit the expenses by tomorrow."
+        actions = self.extractor.extract_actions(text_with_keyword)
+        self.assertEqual(len(actions), 1)
+        self.assertEqual(actions[0]['raw_due_date_text'], "by tomorrow")
+
+    def test_extract_actions_due_date_on_monday(self):
+        text = "We need to finish this on Monday."
+        actions = self.extractor.extract_actions(text)
</code_context>

<issue_to_address>
Increase test coverage for diverse due date patterns.

Please add tests for additional due date patterns, such as "by next Monday", "end of day"/"eod", "on 1st Jan 2023", "in 2 days"/"in 3 weeks", and "next month"/"next year" to further validate the regex.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    def test_extract_actions_due_date_on_monday(self):
        text = "We need to finish this on Monday."
        actions = self.extractor.extract_actions(text)
=======
    def test_extract_actions_due_date_on_monday(self):
        text = "We need to finish this on Monday."
        actions = self.extractor.extract_actions(text)

    def test_extract_actions_due_date_by_next_monday(self):
        text = "Please submit the report by next Monday."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertEqual(actions[0]['raw_due_date_text'], "by next Monday")

    def test_extract_actions_due_date_eod(self):
        text = "Can you send the file by end of day?"
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertIn(actions[0]['raw_due_date_text'].lower(), ["by end of day", "by eod"])

        text_eod = "Please finish this by EOD."
        actions_eod = self.extractor.extract_actions(text_eod)
        self.assertEqual(len(actions_eod), 1)
        self.assertEqual(actions_eod[0]['raw_due_date_text'], "by EOD")

    def test_extract_actions_due_date_on_specific_date(self):
        text = "You need to complete the task on 1st Jan 2023."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertEqual(actions[0]['raw_due_date_text'], "on 1st Jan 2023")

    def test_extract_actions_due_date_in_days_weeks(self):
        text_days = "Please send the update in 2 days."
        actions_days = self.extractor.extract_actions(text_days)
        self.assertEqual(len(actions_days), 1)
        self.assertEqual(actions_days[0]['raw_due_date_text'], "in 2 days")

        text_weeks = "You should finish the project in 3 weeks."
        actions_weeks = self.extractor.extract_actions(text_weeks)
        self.assertEqual(len(actions_weeks), 1)
        self.assertEqual(actions_weeks[0]['raw_due_date_text'], "in 3 weeks")

    def test_extract_actions_due_date_next_month_year(self):
        text_month = "Let's finalize the plan next month."
        actions_month = self.extractor.extract_actions(text_month)
        self.assertEqual(len(actions_month), 1)
        self.assertEqual(actions_month[0]['raw_due_date_text'], "next month")

        text_year = "We need to launch the product next year."
        actions_year = self.extractor.extract_actions(text_year)
        self.assertEqual(len(actions_year), 1)
        self.assertEqual(actions_year[0]['raw_due_date_text'], "next year")
>>>>>>> REPLACE

</suggested_fix>
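The due-date phrases these tests exercise can be approximated with a single pattern; the regex below is illustrative and may differ from the extractor's actual `raw_due_date_text` matching:

```python
import re

# Illustrative due-date pattern covering the cases the suggested tests
# exercise ("by next Monday", "by EOD", "on 1st Jan 2023", "in 2 days",
# "next month", ...); the extractor's real regex may differ.
DUE_DATE = re.compile(
    r"\b(?:by|on|in)\s+"
    r"(?:next\s+\w+|end of day|eod|tomorrow|today|"
    r"\d+\s+(?:days?|weeks?|months?)|"
    r"\d{1,2}(?:st|nd|rd|th)?\s+\w+\s+\d{4}|"
    r"(?:mon|tues|wednes|thurs|fri|satur|sun)day)"
    r"|\bnext\s+(?:week|month|year)\b",
    re.IGNORECASE,
)
```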

### Comment 8
<location> `tests/test_action_item_extractor.py:22` </location>
<code_context>
+            self.assertIsNotNone(action['verb']) # NLTK should find 'review'
+            # self.assertEqual(action['object'], "document") # Object extraction can be tricky
+
+    def test_extract_actions_keyword_task(self):
+        text = "Task: John to complete the slides by tomorrow."
+        actions = self.extractor.extract_actions(text)
</code_context>

<issue_to_address>
Add explicit test for keyword case-insensitivity.

Consider adding a test with an uppercase or mixed-case keyword (e.g., "TASK:") to explicitly verify case-insensitivity.

Suggested implementation:

```python
    def test_extract_actions_keyword_task(self):

```

```python
    def test_extract_actions_keyword_task(self):
        text = "Task: John to complete the slides by tomorrow."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        action = actions[0]
        self.assertIn("complete the slides", action['action_phrase'])
        self.assertIn("tomorrow", action['raw_due_date_text'])

    def test_extract_actions_keyword_task_case_insensitive(self):
        # Explicitly test case-insensitivity of the keyword
        text = "TASK: John to update the budget by next week."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        action = actions[0]
        self.assertIn("update the budget", action['action_phrase'])
        self.assertIn("next week", action['raw_due_date_text'])

```
</issue_to_address>

### Comment 9
<location> `tests/test_api_actions.py:17` </location>
<code_context>
+        self.client = TestClient(app)
+
+    @patch('server.python_backend.main.ai_engine.analyze_email', new_callable=AsyncMock)
+    def test_extract_actions_from_text_success(self, mock_analyze_email):
+        # Mock the return value of ai_engine.analyze_email
+        mock_action_item_data = [
</code_context>

<issue_to_address>
Consider testing API with optional 'subject' field omitted or null.

Add a test case where 'subject' is omitted or set to null to verify the endpoint handles these scenarios as expected.
</issue_to_address>


Comment on lines +18 to 32
# AI Training and PromptEngineer might not be directly used by GmailAIService after refactoring
# if all AI analysis is delegated to AdvancedAIEngine.
# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig

# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
# Adjust the import path as necessary based on your project structure.
# This assumes AdvancedAIEngine is in a module that can be imported.
# If it's in python_backend, the path needs to be correct.
# For now, let's assume a placeholder for where AdvancedAIEngine would be imported from.
# from server.python_backend.ai_engine import AdvancedAIEngine
# Placeholder: If AdvancedAIEngine is not easily importable due to circular deps or structure,
# this part of the refactoring (direct usage of AdvancedAIEngine object) might need adjustment.
# For this step, we'll mock its presence or assume it's passed in.

class GmailAIService:

suggestion: Cleanup placeholder commented imports

Remove unused commented imports and placeholder comments to keep the code clean. Finalize or delete the AdvancedAIEngine import as appropriate.

Suggested change

```diff
-# AI Training and PromptEngineer might not be directly used by GmailAIService after refactoring
-# if all AI analysis is delegated to AdvancedAIEngine.
-# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig
-# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
-# Adjust the import path as necessary based on your project structure.
-# This assumes AdvancedAIEngine is in a module that can be imported.
-# If it's in python_backend, the path needs to be correct.
-# For now, let's assume a placeholder for where AdvancedAIEngine would be imported from.
-# from server.python_backend.ai_engine import AdvancedAIEngine
-# Placeholder: If AdvancedAIEngine is not easily importable due to circular deps or structure,
-# this part of the refactoring (direct usage of AdvancedAIEngine object) might need adjustment.
-# For this step, we'll mock its presence or assume it's passed in.
 class GmailAIService:
```
```diff
 class GmailAIService:

     """Complete Gmail integration with AI processing and metadata extraction"""

-    def __init__(self, rate_config: Optional[RateLimitConfig] = None):
+    def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None):  # Allow passing AdvancedAIEngine
```

suggestion: Use explicit type for advanced_ai_engine

Use the specific AdvancedAIEngine type or a protocol instead of Optional[Any] for better type safety and IDE support. Consider requiring this parameter or providing a default value.





Comment on lines +102 to +103
match = self.action_keywords_regex.search(sentence)
if match:

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

```diff
-match = self.action_keywords_regex.search(sentence)
-if match:
+if match := self.action_keywords_regex.search(sentence):
```

Comment on lines +291 to +293

```python
topic = external_analysis_results.get('topic', self._predict_topic(email.content + " " + email.subject, is_fallback=True))
sentiment = external_analysis_results.get('sentiment', self._predict_sentiment(email.content + " " + email.subject, is_fallback=True))
intent = external_analysis_results.get('intent', self._predict_intent(email.content + " " + email.subject, is_fallback=True))
```

issue (code-quality): Use f-string instead of string concatenation [×5] (use-fstring-for-concatenation)

```python
}

def _convert_to_db_format(self, gmail_metadata: GmailMessage, ai_analysis_result: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert Gmail metadata and AI analysis (if available) to database format."""
```

issue (code-quality): We've found these issues:

Comment on lines +77 to +84

```python
text = "Submit the expenses by tomorrow."
actions = self.extractor.extract_actions(text)
self.assertEqual(len(actions), 1)  # Assuming "Submit" isn't a keyword, but "by tomorrow" is part of sentence
# The current extractor triggers on keywords like "please", "need to" etc.
# Let's adjust the text to include a keyword.
text_with_keyword = "You should Submit the expenses by tomorrow."
actions = self.extractor.extract_actions(text_with_keyword)
self.assertEqual(len(actions), 1)
```

issue (code-quality): Extract duplicate code into method (extract-duplicate-method)

Comment on lines +29 to +39

```python
action_item = analysis['action_items'][0]
self.assertIn('action_phrase', action_item)
self.assertIn('verb', action_item)
self.assertIn('object', action_item)
self.assertIn('raw_due_date_text', action_item)
self.assertIn('context', action_item)

# Check if one of the expected actions is present
phrases = [item['action_phrase'] for item in analysis['action_items']]
self.assertTrue(any("Please complete the task by Monday." in phrase for phrase in phrases))
self.assertTrue(any("need to also review the report." in phrase for phrase in phrases))
```

issue (code-quality): Extract code out into method (extract-method)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (1)
tests/test_action_item_extractor.py (1)

34-41: Test expectation will fail until regex supports “Action required:”

The test correctly asserts one action item is found, but the extractor misses this phrase with the current action_keywords_regex. Once the regex is broadened (see extractor comment), re-run the suite to confirm it passes.

🧹 Nitpick comments (7)
server/python_nlp/action_item_extractor.py (1)

52-60: Guard return path but avoid silent failure

The blanket except Exception in POS-tagging swallows all errors, returning (None, None) and only logging. That hides genuine bugs (e.g., malformed tokens). Catch LookupError and ValueError instead, or re-raise after logging.

🧰 Tools
🪛 Ruff (0.11.9)

52-52: Undefined name Tuple

(F821)

🪛 Pylint (3.3.7)

[error] 52-52: Undefined variable 'Tuple'

(E0602)

tests/test_api_actions.py (1)

16-17: Patch target could break if ai_engine is re-imported elsewhere

@patch('server.python_backend.main.ai_engine.analyze_email') assumes ai_engine is a module attribute, not re-assigned after startup. If main.py performs from ... import ai_engine later, the patch will miss. Consider patching the object actually used inside the route handler (e.g., server.python_backend.main.AdvancedAIEngine.analyze_email) to make the test more stable.

server/python_backend/main.py (1)

689-730: Improve exception chaining for better debugging context.

The endpoint implementation is solid with good error handling and logging. However, the exception re-raising could be improved for better debugging.

Apply this diff to improve exception chaining:

```diff
-        raise HTTPException(status_code=500, detail=f"Failed to extract action items: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"Failed to extract action items: {str(e)}") from e
```
🧰 Tools
🪛 Ruff (0.11.9)

729-729: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

tests/test_nlp_engine_integration.py (2)

3-3: Remove unused import.

MagicMock is imported but not used in this test file.

```diff
-from unittest.mock import patch, MagicMock # Import MagicMock for AdvancedAIEngine test
+from unittest.mock import patch
```
🧰 Tools
🪛 Ruff (0.11.9)

3-3: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


53-97: Consider using pytest-asyncio for proper async test support.

The async test method is well-written, but unittest doesn't natively support async tests. The commented code shows awareness of this limitation.

Would you like me to help convert this to use pytest with pytest-asyncio for proper async test support? This would make the tests more maintainable and eliminate the need for the workaround code.

Also, remove the unused asyncio import at line 112:

```diff
-    import asyncio
```

Also applies to: 108-130

server/python_nlp/nlp_engine.py (1)

952-973: Too-chatty INFO logs – switch to DEBUG

Emitting INFO for every sub-analysis (preprocess, sentiment, topic, etc.) will explode log volume in production. Change these to logger.debug() (keep the start / end high-level INFO if needed).

server/python_nlp/gmail_service.py (1)

232-239: Maintain key symmetry in fallback analysis

_get_basic_fallback_analysis_structure omits action_items, whereas downstream code expects it (accessed via .get('action_items', [])). Add an empty list for consistency and to avoid accidental None propagation.

```diff
-    'risk_flags': ['analysis_failed']
+    'risk_flags': ['analysis_failed'],
+    'action_items': []
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d26f74f and 30a6d15.

📒 Files selected for processing (12)
  • server/python_backend/ai_engine.py (5 hunks)
  • server/python_backend/main.py (2 hunks)
  • server/python_nlp/action_item_extractor.py (1 hunks)
  • server/python_nlp/ai_training.py (2 hunks)
  • server/python_nlp/data_strategy.py (1 hunks)
  • server/python_nlp/gmail_service.py (5 hunks)
  • server/python_nlp/nlp_engine.py (7 hunks)
  • server/python_nlp/text_utils.py (1 hunks)
  • tests/test_action_item_extractor.py (1 hunks)
  • tests/test_api_actions.py (1 hunks)
  • tests/test_gmail_service_integration.py (1 hunks)
  • tests/test_nlp_engine_integration.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
server/python_nlp/ai_training.py (1)
server/python_nlp/text_utils.py (1)
  • clean_text (3-15)
server/python_backend/ai_engine.py (2)
server/python_nlp/gmail_service.py (1)
  • _execute_async_command (62-109)
server/python_nlp/nlp_engine.py (2)
  • _get_fallback_analysis (840-860)
  • _get_simple_fallback_analysis (862-922)
server/python_nlp/data_strategy.py (1)
server/python_nlp/nlp_engine.py (1)
  • _extract_keywords (567-615)
server/python_nlp/gmail_service.py (4)
server/python_nlp/gmail_integration.py (3)
  • GmailDataCollector (227-724)
  • EmailBatch (59-65)
  • collect_emails_incremental (360-446)
server/python_nlp/gmail_metadata.py (3)
  • GmailMetadataExtractor (86-660)
  • extract_complete_metadata (125-245)
  • GmailMessage (16-84)
server/python_nlp/data_strategy.py (2)
  • DataCollectionStrategy (40-483)
  • to_dict (25-26)
server/python_backend/ai_engine.py (2)
  • analyze_email (70-113)
  • to_dict (35-49)
🪛 Ruff (0.11.9)
tests/test_nlp_engine_integration.py

3-3: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


112-112: asyncio imported but unused

Remove unused import: asyncio

(F401)

server/python_nlp/action_item_extractor.py

52-52: Undefined name Tuple

(F821)

tests/test_gmail_service_integration.py

122-122: Undefined name json

(F821)


150-150: Undefined name json

(F821)

server/python_backend/main.py

729-729: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 Pylint (3.3.7)
server/python_nlp/action_item_extractor.py

[error] 52-52: Undefined variable 'Tuple'

(E0602)


[refactor] 24-24: Too few public methods (1/2)

(R0903)

tests/test_gmail_service_integration.py

[error] 122-122: Undefined variable 'json'

(E0602)


[error] 150-150: Undefined variable 'json'

(E0602)

server/python_backend/main.py

[refactor] 49-49: Too few public methods (0/2)

(R0903)


[refactor] 53-53: Too few public methods (0/2)

(R0903)

server/python_nlp/nlp_engine.py

[refactor] 1006-1006: Too many arguments (9/5)

(R0913)


[refactor] 1006-1006: Too many positional arguments (9/5)

(R0917)


[refactor] 1006-1006: Too many local variables (20/15)

(R0914)

server/python_nlp/gmail_service.py

[refactor] 32-32: Too many instance attributes (8/7)

(R0902)

🔇 Additional comments (8)
server/python_nlp/ai_training.py (1)

232-234: Re-use of clean_text inherits punctuation-loss issue

Because clean_text currently strips :, any training samples containing “task:” / “action:” will be altered, potentially degrading model performance or mismatching production flow. Ensure the earlier fix to clean_text is applied or bypass cleaning for labels that depend on such punctuation.

server/python_backend/main.py (1)

48-59: LGTM! Well-structured Pydantic models for action item extraction.

The models are properly defined with appropriate field types and optional parameters. The static analysis warnings about too few public methods can be safely ignored as these are Pydantic data models that don't require additional methods.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 49-49: Too few public methods (0/2)

(R0903)


[refactor] 53-53: Too few public methods (0/2)

(R0903)

server/python_backend/ai_engine.py (3)

33-33: LGTM! Proper integration of action_items field.

The action_items field is correctly added to both the initialization and serialization methods with appropriate default values.

Also applies to: 47-48


71-114: Excellent error handling and logging improvements.

The enhanced analyze_email method now provides:

  • Detailed logging for debugging without being verbose
  • Specific handling for different failure scenarios (empty output, JSON errors, file not found, timeout)
  • Proper error context propagation to the fallback method
  • Consistent return of AIAnalysisResult in all code paths

202-255: Robust fallback implementation with proper error context handling.

The fallback method provides excellent resilience:

  • Uses the fallback NLP engine for basic analysis when possible
  • Includes a secondary fallback for critical failures
  • Properly incorporates error context into the reasoning
  • Ensures action_items is always present in the response for API consistency
tests/test_gmail_service_integration.py (1)

11-63: Well-structured integration tests with comprehensive coverage.

The tests effectively validate:

  • AI analysis integration with action items
  • Database format conversion with and without AI analysis results
  • Proper serialization of analysis metadata

Note: Like the previous test file, the async test method will need a proper async test runner.

Also applies to: 64-128, 129-156

server/python_nlp/data_strategy.py (2)

280-320: Excellent implementation of dual-path annotation with external analysis support.

The updated annotate_email method elegantly handles both external AI analysis results and internal fallback predictions:

  • Clear separation of logic paths with appropriate logging
  • Proper confidence scoring based on the analysis source
  • Meaningful annotator ID suffixes for tracking
  • Graceful fallback for missing fields in external results

324-338: Good enhancement of prediction methods with fallback tracking.

The addition of the is_fallback parameter to all prediction methods provides better observability without changing the core logic. The debug logging with conditional prefixes will help with troubleshooting and understanding the annotation flow.

Also applies to: 339-359, 361-374, 376-389

Comment on lines +13 to +15
```python
text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
text = re.sub(r"\s+", " ", text) # Normalize whitespace
return text
```

🛠️ Refactor suggestion

⚠️ Potential issue

Colon & other critical punctuation removed – breaks Action-Item detection

clean_text strips : and ;.
ActionItemExtractor relies on task: / action: / action required: keywords that contain a colon, so the current normalisation will silently prevent those patterns from ever matching when text is pre-processed (e.g. in NLPEngine).

```diff
-    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
+    # Keep punctuation that is semantically relevant for downstream NLP
+    # (e.g. ':' needed for "task:" / "action:" keywords).
+    text = _PUNCT_CLEAN_RE.sub("", text)
```

And hoist the compiled regexes to module scope to avoid recompilation on every call:

```diff
+_PUNCT_CLEAN_RE = re.compile(r"[^a-zA-Z0-9\s.,?!':]")
+_WHITESPACE_RE  = re.compile(r"\s+")
+
 def clean_text(text: str) -> str:
@@
-    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
-    text = re.sub(r"\s+", " ", text) # Normalize whitespace
+    text = _PUNCT_CLEAN_RE.sub("", text)
+    text = _WHITESPACE_RE.sub(" ", text)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
-    text = re.sub(r"\s+", " ", text) # Normalize whitespace
-    return text
+_PUNCT_CLEAN_RE = re.compile(r"[^a-zA-Z0-9\s.,?!':]")
+_WHITESPACE_RE = re.compile(r"\s+")
+
+def clean_text(text: str) -> str:
+    # Keep punctuation that is semantically relevant for downstream NLP
+    # (e.g. ':' needed for "task:" / "action:" keywords).
+    text = _PUNCT_CLEAN_RE.sub("", text)
+    text = _WHITESPACE_RE.sub(" ", text)
+    return text
```
🤖 Prompt for AI Agents
In server/python_nlp/text_utils.py around lines 13 to 15, the regex used in
clean_text removes colons and semicolons, which breaks detection of action-item
keywords like "task:" or "action:". Update the regex to keep colons and
semicolons by including them in the allowed characters. Additionally, move the
compiled regex patterns to module-level variables outside the function to avoid
recompiling them on every call, improving performance.

Comment on lines +8 to +18
```python
# Ensure necessary NLTK data is available, if not, download it.
# This is more for a local setup; in a container, it should be pre-installed.
try:
    nltk.data.find('taggers/averaged_perceptron_tagger')
except nltk.downloader.DownloadError:
    nltk.download('averaged_perceptron_tagger', quiet=True)
try:
    nltk.data.find('tokenizers/punkt')
except nltk.downloader.DownloadError:
    nltk.download('punkt', quiet=True)
HAS_NLTK = True
```

🛠️ Refactor suggestion

Auto-downloading NLTK resources at import time is risky

Triggering network downloads inside a library import can hang or fail in production containers without internet access, and it slows cold-starts. Note also that nltk.data.find raises LookupError when a resource is missing, not nltk.downloader.DownloadError, so the except clauses above will never catch the miss.

Move the download logic behind an explicit helper or setup script instead of executing it automatically at import.

🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py around lines 8 to 18, the current
code automatically downloads NLTK resources during import, which can cause hangs
or failures in production without internet access. Refactor by removing the
automatic download logic from the import time and instead create a separate
explicit helper function or setup script that checks for and downloads the
required NLTK resources. This helper should be called manually during setup or
initialization, not during module import.
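One way to follow that advice is an explicit setup helper along these lines (ensure_nltk_data is a hypothetical name; the injectable find/download parameters exist only so the logic can be exercised without network access):

```python
def ensure_nltk_data(find=None, download=None) -> None:
    """Check for required NLTK resources and download any that are missing.

    Call this once during application setup, never at module import time,
    so offline containers fail fast and explicitly instead of hanging.
    """
    if find is None or download is None:
        import nltk  # lazy import: only needed when real NLTK is used
        find = find or nltk.data.find
        download = download or nltk.download

    for path, package in [
        ("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger"),
        ("tokenizers/punkt", "punkt"),
    ]:
        try:
            find(path)
        except LookupError:  # nltk.data.find raises LookupError on a miss
            download(package, quiet=True)
```

A container build can then run ensure_nltk_data() once in a setup step, while ActionItemExtractor simply assumes the data is present at runtime.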

Comment on lines +1 to +4
import re
import logging
from typing import List, Dict, Any, Optional


⚠️ Potential issue

Missing Tuple import => runtime NameError

_extract_verb_object_with_nltk declares Tuple[...] in the return annotation but Tuple is not imported. The file will raise at import time.

-from typing import List, Dict, Any, Optional
+from typing import List, Dict, Any, Optional, Tuple
📝 Committable suggestion


Suggested change
 import re
 import logging
-from typing import List, Dict, Any, Optional
+from typing import List, Dict, Any, Optional, Tuple
🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py at the top import section (lines 1
to 4), the Tuple type is used in the return annotation of the function
_extract_verb_object_with_nltk but Tuple is not imported from typing. Add Tuple
to the import statement from typing alongside List, Dict, Any, and Optional to
prevent a runtime NameError.

Comment on lines +32 to +44
self.action_keywords_regex = re.compile(
r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
re.IGNORECASE
)
# Regex for simple due date patterns
# This is a basic version and can be expanded significantly
self.due_date_regex = re.compile(
r'\b(by (next )?(monday|tuesday|wednesday|thursday|friday|saturday|sunday|tomorrow|end of day|eod)|'
r'on \d{1,2}(st|nd|rd|th)? (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\w*)?(\s\d{4})?|'
r'in \d+ (days?|weeks?|months?)|'
r'next (week|month|year))\b',
re.IGNORECASE
)

⚠️ Potential issue

Keyword pattern too narrow – test suite fails on “Action required:”

action_keywords_regex lacks action required: and the plain task form; unit-tests (tests/test_action_item_extractor.py) expect those to be recognised.

-            r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
+            r'\b(please|task:?|action:?|action required:|need to|required to|must|should|can you|could you|will you)\b',

(Added optional colon for task / action, plus explicit action required:).
Update tests accordingly if you choose a different wording.

📝 Committable suggestion


Suggested change
-        self.action_keywords_regex = re.compile(
-            r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
-            re.IGNORECASE
-        )
+        self.action_keywords_regex = re.compile(
+            r'\b(please|task:?|action:?|action required:|need to|required to|must|should|can you|could you|will you)\b',
+            re.IGNORECASE
+        )
         # Regex for simple due date patterns
         # This is a basic version and can be expanded significantly
         self.due_date_regex = re.compile(
             r'\b(by (next )?(monday|tuesday|wednesday|thursday|friday|saturday|sunday|tomorrow|end of day|eod)|'
             r'on \d{1,2}(st|nd|rd|th)? (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\w*)?(\s\d{4})?|'
             r'in \d+ (days?|weeks?|months?)|'
             r'next (week|month|year))\b',
             re.IGNORECASE
         )
🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py around lines 32 to 44, the
action_keywords_regex pattern is too narrow and misses matching "action
required:" and the plain "task" keyword without a colon, causing test failures.
Update the regex to include optional colons after "task" and "action" and
explicitly add "action required:" as a keyword. After updating the regex, ensure
the unit tests in tests/test_action_item_extractor.py are aligned with these
changes.
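To see the effect of the broadened pattern, here is a small self-contained check (ACTION_KEYWORDS_RE and looks_like_action_item are illustrative names for this sketch, not identifiers from the PR):

```python
import re

# Broadened keyword pattern, mirroring the suggestion: optional colon after
# "task"/"action", plus an explicit "action required:" alternative.
ACTION_KEYWORDS_RE = re.compile(
    r"\b(please|task:?|action:?|action required:|need to|required to|"
    r"must|should|can you|could you|will you)\b",
    re.IGNORECASE,
)


def looks_like_action_item(sentence: str) -> bool:
    """Return True if the sentence contains any action-item trigger keyword."""
    return ACTION_KEYWORDS_RE.search(sentence) is not None
```

With the original pattern, "Action required: update the docs" slips through because only the exact tokens "action:" and "task:" were listed; the optional-colon form matches both the bare and the colon-suffixed variants.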

Comment on lines +1 to +8
import unittest
from unittest.mock import MagicMock, AsyncMock # AsyncMock for async methods

from server.python_nlp.gmail_service import GmailAIService
# Assuming AdvancedAIEngine and AIAnalysisResult are importable for type hinting or mocking structure
# from server.python_backend.ai_engine import AdvancedAIEngine, AIAnalysisResult
from server.python_nlp.gmail_metadata import GmailMessage # For structuring metadata input


⚠️ Potential issue

Fix missing json import.

The json module is used at lines 122 and 150 but not imported.

Add the missing import at the top of the file:

 import unittest
 from unittest.mock import MagicMock, AsyncMock # AsyncMock for async methods
+import json
 
 from server.python_nlp.gmail_service import GmailAIService

Also applies to: 122-122, 150-150

🤖 Prompt for AI Agents
In tests/test_gmail_service_integration.py at the top of the file (lines 1 to
8), the json module is used later in the file around lines 122 and 150 but is
not imported. Add an import statement for the json module alongside the existing
imports to fix the missing import error.

Comment on lines 1006 to 1009
 def _build_final_analysis_response(self, sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis,
-                                   categories, keywords, risk_analysis_flags) -> Dict[str, Any]:
+                                   categories, keywords, risk_analysis_flags, action_items) -> Dict[str, Any]:
     """Helper function to consolidate analysis results and build the final response dictionary."""


🛠️ Refactor suggestion

Method signature is getting unwieldy

_build_final_analysis_response now takes 9 positional parameters (plus self) and exceeds the 5-arg guideline flagged by pylint (R0913/R0917). Pass a single dataclass or dict instead to improve readability and future maintainability.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 1006-1006: Too many arguments (9/5)

(R0913)


[refactor] 1006-1006: Too many positional arguments (9/5)

(R0917)


[refactor] 1006-1006: Too many local variables (20/15)

(R0914)

🤖 Prompt for AI Agents
In server/python_nlp/nlp_engine.py around lines 1006 to 1009, the method
_build_final_analysis_response has too many positional parameters, making it
hard to read and maintain. Refactor this method to accept a single dataclass or
dictionary that encapsulates all these parameters instead of passing them
individually. Define a suitable dataclass or use a dict to group
sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis,
categories, keywords, risk_analysis_flags, and action_items, then update the
method signature and its internal usage accordingly.
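The dataclass approach might be sketched like this (AnalysisParts and the standalone build_final_analysis_response are hypothetical names; the field names mirror the current parameters):

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


# Groups the nine analysis results into one value, so the builder takes a
# single parameter instead of nine positionals (satisfying R0913/R0917).
@dataclass
class AnalysisParts:
    sentiment_analysis: Dict[str, Any]
    topic_analysis: Dict[str, Any]
    intent_analysis: Dict[str, Any]
    urgency_analysis: Dict[str, Any]
    categories: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    risk_analysis_flags: List[str] = field(default_factory=list)
    action_items: List[Dict[str, Any]] = field(default_factory=list)


def build_final_analysis_response(parts: AnalysisParts) -> Dict[str, Any]:
    # The real method would consolidate further; asdict shows the minimal case.
    return asdict(parts)
```

Call sites then construct AnalysisParts with keyword arguments, which also makes it impossible to transpose two of the dict-typed arguments silently.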

Comment on lines +35 to 41
def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None): # Allow passing AdvancedAIEngine
self.collector = GmailDataCollector(rate_config)
self.metadata_extractor = GmailMetadataExtractor()
self.data_strategy = DataCollectionStrategy()
self.model_trainer = ModelTrainer()
self.prompt_engineer = PromptEngineer()
# Changed for more specific logger name
# self.model_trainer = ModelTrainer() # Potentially remove if not used directly
# self.prompt_engineer = PromptEngineer() # Potentially remove
self.logger = logging.getLogger(f"{__name__}.{self.__class__.__name__}")

⚠️ Potential issue

model_trainer / prompt_engineer now undefined – will crash

__init__ comments out creation of ModelTrainer and PromptEngineer, yet train_models_from_gmail_data() still calls self.model_trainer and self.prompt_engineer. Any invocation of training will raise AttributeError.

Add the attributes back or gate the training code behind a feature flag.

Also applies to: 397-423

🤖 Prompt for AI Agents
In server/python_nlp/gmail_service.py around lines 35 to 41 and also lines 397
to 423, the attributes model_trainer and prompt_engineer are commented out in
the __init__ method but are still used in train_models_from_gmail_data(),
causing AttributeError. To fix this, either uncomment and initialize
self.model_trainer and self.prompt_engineer in __init__ or add a feature flag to
conditionally create and use these attributes, ensuring any training code checks
the flag before accessing them.
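The feature-flag variant might look like this sketch (GmailAIServiceSketch, enable_training, and the ENABLE_MODEL_TRAINING variable are all hypothetical; object() stands in for the real ModelTrainer/PromptEngineer):

```python
import os


class GmailAIServiceSketch:
    """Sketch: construct training helpers only when training is enabled."""

    def __init__(self, enable_training: bool = bool(os.getenv("ENABLE_MODEL_TRAINING"))):
        self.model_trainer = None
        self.prompt_engineer = None
        if enable_training:
            # The real service would build ModelTrainer() / PromptEngineer() here.
            self.model_trainer = object()
            self.prompt_engineer = object()

    def train_models_from_gmail_data(self) -> str:
        # Fail loudly instead of raising a confusing AttributeError later.
        if self.model_trainer is None or self.prompt_engineer is None:
            raise RuntimeError(
                "Training components are disabled; construct the service "
                "with enable_training=True to use model training."
            )
        return "training started"
```

Either fix works; the flag keeps the lighter default construction the commented-out lines were aiming for while making the unavailable path explicit.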

@MasumRab MasumRab merged commit 53baeec into main Jun 14, 2025
4 checks passed
@MasumRab MasumRab deleted the feature/email-ai-enhancements-p1 branch June 14, 2025 08:41
MasumRab added a commit that referenced this pull request Oct 29, 2025
feat: Refactor NLP pipeline and add Action Item Extraction
MasumRab added a commit that referenced this pull request Nov 6, 2025
feat: Refactor NLP pipeline and add Action Item Extraction
Labels: enhancement