feat: Refactor NLP pipeline and add Action Item Extraction (#29)
This commit introduces significant enhancements to the Gmail AI Email Management system.

Refactoring:
- Centralized NLP analysis logic within AdvancedAIEngine and NLPEngine, removing redundancy from GmailAIService and DataCollectionStrategy.
- Streamlined fallback mechanisms in the NLP pipeline for more robust error handling.
- Standardized basic text preprocessing with a new `text_utils.py` module.
- Simplified complex functions, particularly `GmailAIService.sync_gmail_emails`, by breaking them into smaller, more focused methods.
- Improved error handling and logging across the NLP components for better traceability.

New Feature: Action Item Extraction
- Added a new `ActionItemExtractor` module (`server/python_nlp/action_item_extractor.py`) that uses rule-based methods and optional NLTK POS tagging to identify and extract action items from email text.
- Integrated action item extraction into the `NLPEngine` and `AdvancedAIEngine`.
- Action items are now included in the `analysisMetadata` stored for emails.
- Exposed this functionality via a new API endpoint: `POST /api/actions/extract-from-text`.

Unit Tests:
- Added comprehensive unit tests for the `ActionItemExtractor`.
- Included integration tests for `NLPEngine`, `AdvancedAIEngine`, and `GmailAIService` to ensure correct handling of action items.
- Added API tests for the new `/api/actions/extract-from-text` endpoint.

Additional Feature Ideas Proposed:
1. Automated Email Summarization: provide concise summaries of email content.
2. Smart Reply Suggestions: offer context-aware reply options based on NLP analysis.
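The commit message above describes a rule-based extractor with optional NLTK POS tagging. A minimal sketch of the rule-based half is shown below; the keyword and due-date patterns are hypothetical stand-ins, since the actual regexes in `action_item_extractor.py` are not reproduced in this PR summary.

```python
import re
from typing import Any, Dict, List

# Hypothetical patterns; the real ActionItemExtractor's regexes may differ.
ACTION_KEYWORDS = re.compile(
    r"\b(please|need to|must|should|action required)\b", re.IGNORECASE
)
DUE_DATE = re.compile(
    r"\b(by|on|before)\s+"
    r"(tomorrow|today|monday|tuesday|wednesday|thursday|friday|eod)\b",
    re.IGNORECASE,
)
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def extract_actions(text: str) -> List[Dict[str, Any]]:
    """Return one dict per sentence that matches an action keyword."""
    actions = []
    for sentence in SENTENCE_SPLIT.split(text):
        if ACTION_KEYWORDS.search(sentence):
            due = DUE_DATE.search(sentence)
            actions.append({
                "action_phrase": sentence.strip(),
                "raw_due_date_text": due.group(0) if due else None,
                "context": sentence.strip(),
            })
    return actions
```

The sentence is both the action phrase and the context here; the real extractor additionally attaches a verb/object pair via NLTK when available.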
Reviewer's Guide

This PR refactors the core NLP pipeline by centralizing analysis logic in AdvancedAIEngine and NLPEngine, breaking down GmailAIService.sync_gmail_emails into smaller helpers, standardizing text preprocessing via text_utils, and improving error handling and logging. It also introduces a rule-based ActionItemExtractor integrated end-to-end (NLPEngine, AdvancedAIEngine, GmailAIService, and DB payloads) and exposed through a new FastAPI endpoint, with comprehensive unit and integration tests ensuring correct action-item extraction and fallback behaviors.

Sequence Diagram for Refactored Email Sync and AI Analysis

```mermaid
sequenceDiagram
    participant C as Caller
    participant GAS as GmailAIService
    participant GDC as GmailDataCollector
    participant GME as GmailMetadataExtractor
    participant AAE as AdvancedAIEngine
    C->>GAS: sync_gmail_emails(query, max, include_ai)
    activate GAS
    GAS->>GAS: _fetch_emails_from_gmail(query, max)
    GAS->>GDC: collect_emails_incremental(query, max)
    activate GDC
    GDC-->>GAS: email_batch
    deactivate GDC
    GAS->>GAS: _process_and_analyze_batch(email_batch, include_ai)
    loop for each email in batch
        GAS->>GME: extract_complete_metadata(gmail_msg)
        activate GME
        GME-->>GAS: gmail_metadata
        deactivate GME
        alt include_ai_analysis is true
            GAS->>GAS: _perform_ai_analysis(email_data)
            GAS->>AAE: analyze_email(subject, content)
            activate AAE
            AAE-->>GAS: ai_analysis_result (with action_items)
            deactivate AAE
        end
        GAS->>GAS: _convert_to_db_format(gmail_metadata, ai_analysis_result)
        GAS-->>GAS: db_email (metadata includes action_items)
    end
    GAS-->>C: sync_result (with processed_emails)
    deactivate GAS
```
Sequence Diagram for Action Item Extraction API (/api/actions/extract-from-text)

```mermaid
sequenceDiagram
    actor Client as APIClient
    participant FastAPI as FastAPIApp
    participant AAE as AdvancedAIEngine
    participant NLP_Script as NLPEngine (script)
    Client->>FastAPI: POST /api/actions/extract-from-text (subject, content)
    activate FastAPI
    FastAPI->>AAE: analyze_email(subject, content)
    activate AAE
    AAE->>NLP_Script: execute_async_command([... --analyze-email ...])
    activate NLP_Script
    NLP_Script-->>AAE: analysis_json_str (includes action_items)
    deactivate NLP_Script
    AAE->>AAE: Parse JSON, create AIAnalysisResult
    AAE-->>FastAPI: ai_analysis_result (with action_items)
    deactivate AAE
    FastAPI-->>Client: List[ActionItem]
    deactivate FastAPI
```
Sequence Diagram for NLPEngine Action Item Analysis

```mermaid
sequenceDiagram
    participant NLPE as NLPEngine
    participant AIE as ActionItemExtractor
    NLPE->>NLPE: analyze_email(subject, content)
    activate NLPE
    NLPE->>NLPE: _preprocess_text(full_text) // using text_utils.clean_text
    NLPE->>NLPE: ... other analysis steps ...
    NLPE->>NLPE: _analyze_action_items(full_text)
    NLPE->>AIE: extract_actions(full_text)
    activate AIE
    AIE-->>NLPE: extracted_actions_list
    deactivate AIE
    NLPE->>NLPE: _build_final_analysis_response(..., action_items=extracted_actions_list)
    NLPE-->>NLPE: final_analysis_result (with action_items)
    deactivate NLPE
```
Entity Relationship Diagram for Action Items and Analysis Data

```mermaid
erDiagram
    AIAnalysisResult {
        string topic
        string sentiment
        string intent
        string urgency
        float confidence
        string_array categories
        string_array keywords
        string reasoning
        string_array suggested_labels
        string_array risk_flags
        string category_id
        ActionItem_array action_items "New: list of action items"
    }
    ActionItem {
        string action_phrase "New"
        string verb "New, Optional"
        string object "New, Optional"
        string raw_due_date_text "New, Optional"
        string context "New"
    }
    EmailDBRecord {
        string messageId
        string analysisMetadata "JSON string, now includes action_items"
        string other_fields
    }
    AIAnalysisResult ||--o{ ActionItem : "has"
    EmailDBRecord ||--|| AIAnalysisResult : "embeds (via analysisMetadata JSON)"
```
Class Diagram: GmailAIService Changes

```mermaid
classDiagram
    class GmailAIService {
        +advanced_ai_engine: AdvancedAIEngine
        -model_trainer: ModelTrainer // Removed / Not directly used
        -prompt_engineer: PromptEngineer // Removed / Not directly used
        +__init__(rate_config: Optional[RateLimitConfig], advanced_ai_engine: Optional[Any])
        +sync_gmail_emails(query_filter: str, max_emails: int, include_ai_analysis: bool) List[Dict]
        #_fetch_emails_from_gmail(query_filter: str, max_emails: int) Optional[EmailBatch]
        #_process_and_analyze_batch(email_batch: EmailBatch, include_ai_analysis: bool) List[Dict]
        #_perform_ai_analysis(email_data: Dict) Optional[Dict]
        #_convert_to_db_format(gmail_metadata: GmailMessage, ai_analysis_result: Optional[Dict]) Dict
        #_get_basic_fallback_analysis_structure(error_message: str) Dict
    }
    GmailAIService ..> AdvancedAIEngine : uses
```
Class Diagram: NLPEngine Changes

```mermaid
classDiagram
    class NLPEngine {
        +action_item_extractor: ActionItemExtractor
        +__init__()
        #_preprocess_text(text: str) str
        #_get_fallback_analysis(error_msg: str) Dict
        #_get_simple_fallback_analysis(subject: str, content: str) Dict
        #_analyze_action_items(text: str) List[Dict]
        +analyze_email(subject: str, content: str) Dict
        #_build_final_analysis_response(sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis, categories, keywords, risk_analysis_flags, action_items: List[Dict]) Dict
    }
    NLPEngine ..> ActionItemExtractor : uses
    NLPEngine ..> text_utils : uses clean_text()
```
Class Diagram: AdvancedAIEngine and AIAnalysisResult Changes

```mermaid
classDiagram
    class AdvancedAIEngine {
        +fallback_nlp_engine: NLPEngine
        +async initialize()
        +async analyze_email(subject: str, content: str) AIAnalysisResult
        #_get_fallback_analysis(subject: str, content: str, error_context: Optional[str]) AIAnalysisResult
    }
    class AIAnalysisResult {
        +action_items: List[Dict]
        +topic: str
        +sentiment: str
        +intent: str
        +urgency: str
        +confidence: float
        +categories: List[str]
        +keywords: List[str]
        +reasoning: str
        +suggested_labels: List[str]
        +risk_flags: List[str]
        +category_id: str
        +to_dict() Dict
    }
    AdvancedAIEngine ..> NLPEngine : uses (for fallback_nlp_engine)
    AdvancedAIEngine ..> AIAnalysisResult : creates
```
Class Diagram: New ActionItemExtractor and Pydantic Models

```mermaid
classDiagram
    class ActionItemExtractor {
        +action_keywords_regex: Pattern
        +due_date_regex: Pattern
        +sentence_splitter_regex: Pattern
        +__init__()
        #_extract_verb_object_with_nltk(text: str) Tuple[Optional[str], Optional[str]]
        +extract_actions(text: str) List[Dict[str, Any]]
    }
    class ActionExtractionRequest {
        <<PydanticModel>>
        +subject: Optional[str]
        +content: str
    }
    class ActionItem {
        <<PydanticModel>>
        +action_phrase: str
        +verb: Optional[str]
        +object: Optional[str]
        +raw_due_date_text: Optional[str]
        +context: str
    }
    %% FastAPI endpoint uses ActionExtractionRequest and ActionItem
    %% NLPEngine uses ActionItemExtractor
```
Class Diagram: DataCollectionStrategy Changes

```mermaid
classDiagram
    class DataCollectionStrategy {
        +annotate_email(email: EmailSample, annotator_id: str, external_analysis_results: Optional[Dict]) AnnotationSchema
        #_predict_topic(text: str, is_fallback: bool) str
        #_predict_sentiment(text: str, is_fallback: bool) str
        #_predict_intent(text: str, is_fallback: bool) str
        #_predict_urgency(text: str, is_fallback: bool) str
    }
```
Class Diagram: New text_utils Module

```mermaid
classDiagram
    class text_utils {
        <<Module/Utility>>
        +static clean_text(text: str) str
    }
```
Class Diagram: ModelTrainer Changes

```mermaid
classDiagram
    class ModelTrainer {
        +feature_extractor: FeatureExtractor
        +prepare_training_data(samples: List[Dict], target_field: str) Tuple[List, List]
    }
    ModelTrainer ..> text_utils : uses clean_text()
```
Walkthrough

This update introduces action item extraction to the email AI analysis pipeline. It adds a new action item extractor module, integrates action item extraction into both the main NLP and advanced AI engines, and exposes a FastAPI endpoint for extracting action items from text. The Gmail AI service is refactored to use an external AI engine, and fallback mechanisms are improved. Comprehensive unit and integration tests are included for action extraction and its API.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant FastAPI
    participant AIEngine
    participant NLPExtractor
    Client->>FastAPI: POST /api/actions/extract-from-text (subject, content)
    FastAPI->>AIEngine: analyze_email(subject, content)
    AIEngine->>NLPExtractor: extract_actions(content)
    NLPExtractor-->>AIEngine: List of action items
    AIEngine-->>FastAPI: AIAnalysisResult (includes action_items)
    FastAPI-->>Client: List of ActionItem (as JSON)
```
```mermaid
sequenceDiagram
    participant GmailService
    participant AdvancedAIEngine
    participant NLPExtractor
    GmailService->>AdvancedAIEngine: analyze_email(subject, content)
    AdvancedAIEngine->>NLPExtractor: extract_actions(content)
    NLPExtractor-->>AdvancedAIEngine: action_items
    AdvancedAIEngine-->>GmailService: AIAnalysisResult (with action_items)
    GmailService->>DB: Store email metadata + AI analysis (including action_items)
```
Hey @MasumRab - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `server/python_nlp/gmail_service.py:22` </location>
<code_context>
+# if all AI analysis is delegated to AdvancedAIEngine.
+# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig
+
+# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
+# Adjust the import path as necessary based on your project structure.
+# This assumes AdvancedAIEngine is in a module that can be imported.
</code_context>
<issue_to_address>
Cleanup placeholder commented imports
Remove unused commented imports and placeholder comments to keep the code clean. Finalize or delete the AdvancedAIEngine import as appropriate.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
# AI Training and PromptEngineer might not be directly used by GmailAIService after refactoring
# if all AI analysis is delegated to AdvancedAIEngine.
# from .ai_training import ModelTrainer, PromptEngineer, ModelConfig
# Import AdvancedAIEngine (assuming it's now the primary way to get AI analysis)
# Adjust the import path as necessary based on your project structure.
# This assumes AdvancedAIEngine is in a module that can be imported.
# If it's in python_backend, the path needs to be correct.
# For now, let's assume a placeholder for where AdvancedAIEngine would be imported from.
# from server.python_backend.ai_engine import AdvancedAIEngine
# Placeholder: If AdvancedAIEngine is not easily importable due to circular deps or structure,
# this part of the refactoring (direct usage of AdvancedAIEngine object) might need adjustment.
# For this step, we'll mock its presence or assume it's passed in.
class GmailAIService:
=======
class GmailAIService:
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `server/python_nlp/gmail_service.py:35` </location>
<code_context>
"""Complete Gmail integration with AI processing and metadata extraction"""
- def __init__(self, rate_config: Optional[RateLimitConfig] = None):
+ def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None): # Allow passing AdvancedAIEngine
self.collector = GmailDataCollector(rate_config)
self.metadata_extractor = GmailMetadataExtractor()
</code_context>
<issue_to_address>
Use explicit type for advanced_ai_engine
Use the specific AdvancedAIEngine type or a protocol instead of Optional[Any] for better type safety and IDE support. Consider requiring this parameter or providing a default value.
Suggested implementation:
```python
from typing import Optional

from server.python_backend.ai_engine import AdvancedAIEngine  # Use the actual import path


class GmailAIService:
    """Complete Gmail integration with AI processing and metadata extraction"""

    def __init__(
        self,
        rate_config: Optional[RateLimitConfig] = None,
        advanced_ai_engine: Optional[AdvancedAIEngine] = None,  # Explicit type for better type safety
    ):
        self.collector = GmailDataCollector(rate_config)
        self.metadata_extractor = GmailMetadataExtractor()
        self.data_strategy = DataCollectionStrategy()
        # self.model_trainer = ModelTrainer()  # Potentially remove if not used directly
        # self.prompt_engineer = PromptEngineer()  # Potentially remove
        self.logger = logging.getLogger(f"{__name__}.{self.__class__.__name__}")
        # This is key for the refactoring: GmailAIService will use AdvancedAIEngine
```
- If `AdvancedAIEngine` cannot be imported directly due to circular dependencies, consider defining a Protocol in a shared location and using that as the type instead.
- If you want to require `advanced_ai_engine`, remove the `Optional` and the default value (`= None`).
- Make sure to update any code that instantiates `GmailAIService` to pass an `AdvancedAIEngine` instance as needed.
</issue_to_address>
### Comment 3
<location> `server/python_nlp/nlp_engine.py:952` </location>
<code_context>
# Basic preprocessing (primarily for non-model based methods or as initial step)
# Model pipelines should ideally handle their own specific preprocessing.
+ logger.info("Preprocessing email text...")
cleaned_text = self._preprocess_text(full_text)
+ logger.info("Email text preprocessed successfully.")
</code_context>
<issue_to_address>
Verbose info-level logs with potential PII exposure
Consider changing these logs to `debug` level and avoid logging sensitive message content to reduce PII exposure.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
logger.info("Preprocessing email text...")
cleaned_text = self._preprocess_text(full_text)
logger.info("Email text preprocessed successfully.")
=======
logger.debug("Preprocessing email text...")
cleaned_text = self._preprocess_text(full_text)
logger.debug("Email text preprocessed successfully.")
>>>>>>> REPLACE
</suggested_fix>
### Comment 4
<location> `server/python_backend/ai_engine.py:83` </location>
<code_context>
]
- result = await _execute_async_command(cmd, cwd=self.python_nlp_path)
+ logger.debug(f"Executing NLPEngine script with command: {' '.join(cmd)}")
+ result_json_str = await _execute_async_command(cmd, cwd=self.python_nlp_path)
</code_context>
<issue_to_address>
Sensitive content in debug logs
Redact or truncate user-provided parameters in logs to prevent exposure of sensitive data.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
logger.debug(f"Executing NLPEngine script with command: {' '.join(cmd)}")
=======
# Redact or truncate user-provided parameters in logs to prevent exposure of sensitive data
redacted_cmd = [
arg if i == 0 else ("<REDACTED>" if arg not in ('--output-format', 'json') else arg)
for i, arg in enumerate(cmd)
]
logger.debug(f"Executing NLPEngine script with command: {' '.join(redacted_cmd)}")
>>>>>>> REPLACE
</suggested_fix>
### Comment 5
<location> `server/python_nlp/text_utils.py:13` </location>
<code_context>
+ text = text.lower()
+ text = text.strip()
+ # Normalize punctuation by removing or replacing unwanted characters
+ text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text) # Keep basic punctuation
+ text = re.sub(r"\s+", " ", text) # Normalize whitespace
+ return text
</code_context>
<issue_to_address>
Cleaning regex may strip useful punctuation
Hyphens and colons may be important for dates or context. Consider allowing these characters or making the regex customizable.
</issue_to_address>
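One way to address this would be to expose the kept punctuation as a parameter, as in the sketch below; the real `text_utils.clean_text` signature may differ.

```python
import re


def clean_text(text: str, keep: str = ".,?!'-:") -> str:
    """Lowercase, trim, drop characters outside the keep-set, squeeze whitespace.

    Hyphens and colons are kept by default so date-like tokens such as
    "2023-01-15" or "Due:" survive cleaning.
    """
    text = text.lower().strip()
    # re.escape makes the keep-set safe to embed inside a character class.
    text = re.sub(rf"[^a-z0-9\s{re.escape(keep)}]", "", text)
    return re.sub(r"\s+", " ", text)
```

Callers that genuinely want aggressive cleaning can pass a narrower `keep` set instead of hard-coding one behavior.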
### Comment 6
<location> `tests/test_action_item_extractor.py:5` </location>
<code_context>
+from unittest.mock import patch
+from server.python_nlp.action_item_extractor import ActionItemExtractor, HAS_NLTK
+
+class TestActionItemExtractor(unittest.TestCase):
+
+ def setUp(self):
</code_context>
<issue_to_address>
Expand keyword coverage for action item extraction tests.
Please add tests for keywords such as "must", "will you", and "required to" to fully cover the regex logic in `action_keywords_regex`.
</issue_to_address>
### Comment 7
<location> `tests/test_action_item_extractor.py:76` </location>
<code_context>
+ self.assertEqual(actions[1]['verb'], "send")
+
+
+ def test_extract_actions_simple_due_date_tomorrow(self):
+ text = "Submit the expenses by tomorrow."
+ actions = self.extractor.extract_actions(text)
+ self.assertEqual(len(actions), 1) # Assuming "Submit" isn't a keyword, but "by tomorrow" is part of sentence
+ # The current extractor triggers on keywords like "please", "need to" etc.
+ # Let's adjust the text to include a keyword.
+ text_with_keyword = "You should Submit the expenses by tomorrow."
+ actions = self.extractor.extract_actions(text_with_keyword)
+ self.assertEqual(len(actions), 1)
+ self.assertEqual(actions[0]['raw_due_date_text'], "by tomorrow")
+
+ def test_extract_actions_due_date_on_monday(self):
+ text = "We need to finish this on Monday."
+ actions = self.extractor.extract_actions(text)
</code_context>
<issue_to_address>
Increase test coverage for diverse due date patterns.
Please add tests for additional due date patterns, such as "by next Monday", "end of day"/"eod", "on 1st Jan 2023", "in 2 days"/"in 3 weeks", and "next month"/"next year" to further validate the regex.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
    def test_extract_actions_due_date_on_monday(self):
        text = "We need to finish this on Monday."
        actions = self.extractor.extract_actions(text)
=======
    def test_extract_actions_due_date_on_monday(self):
        text = "We need to finish this on Monday."
        actions = self.extractor.extract_actions(text)

    def test_extract_actions_due_date_by_next_monday(self):
        text = "Please submit the report by next Monday."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertEqual(actions[0]['raw_due_date_text'], "by next Monday")

    def test_extract_actions_due_date_eod(self):
        text = "Can you send the file by end of day?"
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertIn(actions[0]['raw_due_date_text'].lower(), ["by end of day", "by eod"])
        text_eod = "Please finish this by EOD."
        actions_eod = self.extractor.extract_actions(text_eod)
        self.assertEqual(len(actions_eod), 1)
        self.assertEqual(actions_eod[0]['raw_due_date_text'], "by EOD")

    def test_extract_actions_due_date_on_specific_date(self):
        text = "You need to complete the task on 1st Jan 2023."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        self.assertEqual(actions[0]['raw_due_date_text'], "on 1st Jan 2023")

    def test_extract_actions_due_date_in_days_weeks(self):
        text_days = "Please send the update in 2 days."
        actions_days = self.extractor.extract_actions(text_days)
        self.assertEqual(len(actions_days), 1)
        self.assertEqual(actions_days[0]['raw_due_date_text'], "in 2 days")
        text_weeks = "You should finish the project in 3 weeks."
        actions_weeks = self.extractor.extract_actions(text_weeks)
        self.assertEqual(len(actions_weeks), 1)
        self.assertEqual(actions_weeks[0]['raw_due_date_text'], "in 3 weeks")

    def test_extract_actions_due_date_next_month_year(self):
        text_month = "Let's finalize the plan next month."
        actions_month = self.extractor.extract_actions(text_month)
        self.assertEqual(len(actions_month), 1)
        self.assertEqual(actions_month[0]['raw_due_date_text'], "next month")
        text_year = "We need to launch the product next year."
        actions_year = self.extractor.extract_actions(text_year)
        self.assertEqual(len(actions_year), 1)
        self.assertEqual(actions_year[0]['raw_due_date_text'], "next year")
>>>>>>> REPLACE
</suggested_fix>
### Comment 8
<location> `tests/test_action_item_extractor.py:22` </location>
<code_context>
+ self.assertIsNotNone(action['verb']) # NLTK should find 'review'
+ # self.assertEqual(action['object'], "document") # Object extraction can be tricky
+
+ def test_extract_actions_keyword_task(self):
+ text = "Task: John to complete the slides by tomorrow."
+ actions = self.extractor.extract_actions(text)
</code_context>
<issue_to_address>
Add explicit test for keyword case-insensitivity.
Consider adding a test with an uppercase or mixed-case keyword (e.g., "TASK:") to explicitly verify case-insensitivity.
Suggested implementation:
```python
    def test_extract_actions_keyword_task(self):
```
```python
    def test_extract_actions_keyword_task(self):
        text = "Task: John to complete the slides by tomorrow."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        action = actions[0]
        self.assertIn("complete the slides", action['action_phrase'])
        self.assertIn("tomorrow", action['raw_due_date_text'])

    def test_extract_actions_keyword_task_case_insensitive(self):
        # Explicitly test case-insensitivity of the keyword
        text = "TASK: John to update the budget by next week."
        actions = self.extractor.extract_actions(text)
        self.assertEqual(len(actions), 1)
        action = actions[0]
        self.assertIn("update the budget", action['action_phrase'])
        self.assertIn("next week", action['raw_due_date_text'])
```
</issue_to_address>
### Comment 9
<location> `tests/test_api_actions.py:17` </location>
<code_context>
+ self.client = TestClient(app)
+
+ @patch('server.python_backend.main.ai_engine.analyze_email', new_callable=AsyncMock)
+ def test_extract_actions_from_text_success(self, mock_analyze_email):
+ # Mock the return value of ai_engine.analyze_email
+ mock_action_item_data = [
</code_context>
<issue_to_address>
Consider testing API with optional 'subject' field omitted or null.
Add a test case where 'subject' is omitted or set to null to verify the endpoint handles these scenarios as expected.
</issue_to_address>
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Before:

    match = self.action_keywords_regex.search(sentence)
    if match:

After:

    if match := self.action_keywords_regex.search(sentence):
issue (code-quality): Use f-string instead of string concatenation [×5] (use-fstring-for-concatenation)

    topic = external_analysis_results.get('topic', self._predict_topic(email.content + " " + email.subject, is_fallback=True))
    sentiment = external_analysis_results.get('sentiment', self._predict_sentiment(email.content + " " + email.subject, is_fallback=True))
    intent = external_analysis_results.get('intent', self._predict_intent(email.content + " " + email.subject, is_fallback=True))
issue (code-quality): We've found these issues:

- Inline variable that is immediately returned (inline-immediately-returned-variable)
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
- Merge dictionary updates via the union operator (dict-assign-update-to-union)

    def _convert_to_db_format(self, gmail_metadata: GmailMessage, ai_analysis_result: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        """Convert Gmail metadata and AI analysis (if available) to database format."""
| text = "Submit the expenses by tomorrow." | ||
| actions = self.extractor.extract_actions(text) | ||
| self.assertEqual(len(actions), 1) # Assuming "Submit" isn't a keyword, but "by tomorrow" is part of sentence | ||
| # The current extractor triggers on keywords like "please", "need to" etc. | ||
| # Let's adjust the text to include a keyword. | ||
| text_with_keyword = "You should Submit the expenses by tomorrow." | ||
| actions = self.extractor.extract_actions(text_with_keyword) | ||
| self.assertEqual(len(actions), 1) |
issue (code-quality): Extract duplicate code into method (extract-duplicate-method)
```python
action_item = analysis['action_items'][0]
self.assertIn('action_phrase', action_item)
self.assertIn('verb', action_item)
self.assertIn('object', action_item)
self.assertIn('raw_due_date_text', action_item)
self.assertIn('context', action_item)

# Check if one of the expected actions is present
phrases = [item['action_phrase'] for item in analysis['action_items']]
self.assertTrue(any("Please complete the task by Monday." in phrase for phrase in phrases))
self.assertTrue(any("need to also review the report." in phrase for phrase in phrases))
```
issue (code-quality): Extract code out into method (extract-method)
Actionable comments posted: 7
♻️ Duplicate comments (1)
tests/test_action_item_extractor.py (1)
34-41: Test expectation will fail until the regex supports “Action required:”

The test correctly asserts that one action item is found, but the extractor misses this phrase with the current `action_keywords_regex`. Once the regex is broadened (see the extractor comment), re-run the suite to confirm it passes.
🧹 Nitpick comments (7)
server/python_nlp/action_item_extractor.py (1)
52-60: Guard return path but avoid silent failure

The blanket `except Exception` in POS-tagging swallows all errors, returning `(None, None)` and only logging. That hides genuine bugs (e.g., malformed tokens). Catch `LookupError` and `ValueError` instead, or re-raise after logging.

🧰 Tools
🪛 Ruff (0.11.9)
52-52: Undefined name `Tuple` (F821)
🪛 Pylint (3.3.7)
[error] 52-52: Undefined variable 'Tuple' (E0602)
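A minimal sketch of the narrower handling. The tagger is injected here so the example is self-contained; the real extractor calls NLTK directly, and the function name is illustrative:

```python
import logging
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)

TaggedTokens = List[Tuple[str, str]]

def extract_verb_object(
    tokens: List[str],
    tagger: Callable[[List[str]], TaggedTokens],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (verb, object) from POS tags, tolerating only expected failures."""
    try:
        tagged = tagger(tokens)
    except (LookupError, ValueError) as exc:  # missing corpora / bad input only
        logger.warning("POS tagging unavailable: %s", exc)
        return None, None
    verb = next((word for word, tag in tagged if tag.startswith("VB")), None)
    obj = next((word for word, tag in tagged if tag.startswith("NN")), None)
    return verb, obj
```

Any other exception now propagates, so genuine bugs surface in tests instead of being logged away.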
tests/test_api_actions.py (1)
16-17: Patch target could break if `ai_engine` is re-imported elsewhere

`@patch('server.python_backend.main.ai_engine.analyze_email')` assumes `ai_engine` is a module attribute that is not re-assigned after startup. If `main.py` performs `from ... import ai_engine` later, the patch will miss. Consider patching the object actually used inside the route handler (e.g., `server.python_backend.main.AdvancedAIEngine.analyze_email`) to make the test more stable.

server/python_backend/main.py (1)
689-730: Improve exception chaining for better debugging context

The endpoint implementation is solid, with good error handling and logging, but the exception re-raising could be improved for better debugging. Apply this diff:

```diff
-        raise HTTPException(status_code=500, detail=f"Failed to extract action items: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"Failed to extract action items: {str(e)}") from e
```

🧰 Tools
🪛 Ruff (0.11.9)
729-729: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
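A standalone illustration of what `from e` buys you; `HTTPError` here is a stand-in class, not FastAPI's `HTTPException`:

```python
class HTTPError(Exception):
    """Stand-in for FastAPI's HTTPException in this sketch."""

def extract_endpoint(text: str) -> dict:
    try:
        raise ValueError("extractor failed")  # simulate an internal failure
    except ValueError as e:
        # `from e` records the original error on __cause__, so tracebacks
        # show the real root cause instead of a bare re-raise.
        raise HTTPError(f"Failed to extract action items: {e}") from e

try:
    extract_endpoint("hello")
except HTTPError as err:
    assert isinstance(err.__cause__, ValueError)
```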
tests/test_nlp_engine_integration.py (2)
3-3: Remove unused import

`MagicMock` is imported but not used in this test file.

```diff
-from unittest.mock import patch, MagicMock  # Import MagicMock for AdvancedAIEngine test
+from unittest.mock import patch
```

🧰 Tools
🪛 Ruff (0.11.9)
3-3: `unittest.mock.MagicMock` imported but unused. Remove unused import: `unittest.mock.MagicMock` (F401)

53-97: Consider using pytest-asyncio for proper async test support

The async test method is well-written, but unittest doesn't natively support async tests. The commented code shows awareness of this limitation. Would you like me to help convert this to use pytest with pytest-asyncio? This would make the tests more maintainable and eliminate the need for the workaround code.

Also, remove the unused `asyncio` import at line 112:

```diff
- import asyncio
```

Also applies to: 108-130
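A sketch of the pytest-asyncio shape being suggested; the test and stub names are invented for illustration, not taken from the actual suite:

```python
import asyncio
import pytest

async def analyze_email_stub(text: str) -> dict:
    """Stand-in for the async engine call under test."""
    await asyncio.sleep(0)
    return {"action_items": [{"action_phrase": text}]}

# With pytest-asyncio installed, the plugin drives the event loop;
# no manual loop-management workaround is needed in the test body.
@pytest.mark.asyncio
async def test_analyze_email_returns_action_items():
    result = await analyze_email_stub("Please review the doc")
    assert result["action_items"][0]["action_phrase"] == "Please review the doc"
```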
server/python_nlp/nlp_engine.py (1)
952-973: Too-chatty INFO logs – switch to DEBUG

Emitting INFO for every sub-analysis (`preprocess`, `sentiment`, `topic`, etc.) will explode log volume in production. Change these to `logger.debug()` (keep the start/end high-level INFO if needed).

server/python_nlp/gmail_service.py (1)

232-239: Maintain key symmetry in fallback analysis

`_get_basic_fallback_analysis_structure` omits `action_items`, whereas downstream code expects it (accessed via `.get('action_items', [])`). Add an empty list for consistency and to avoid accidental `None` propagation:

```diff
  'risk_flags': ['analysis_failed']
+ , 'action_items': []
```
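The shape the comment is asking for, sketched with placeholder field values:

```python
def basic_fallback_analysis() -> dict:
    """Fallback payload that mirrors the full analysis's keys."""
    return {
        "topic": "unknown",
        "sentiment": "neutral",
        "intent": "unknown",
        "risk_flags": ["analysis_failed"],
        "action_items": [],  # present even on failure, so .get() never sees None
    }

payload = basic_fallback_analysis()
# Downstream access stays uniform across success and fallback paths.
assert payload.get("action_items", []) == []
```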
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- server/python_backend/ai_engine.py (5 hunks)
- server/python_backend/main.py (2 hunks)
- server/python_nlp/action_item_extractor.py (1 hunks)
- server/python_nlp/ai_training.py (2 hunks)
- server/python_nlp/data_strategy.py (1 hunks)
- server/python_nlp/gmail_service.py (5 hunks)
- server/python_nlp/nlp_engine.py (7 hunks)
- server/python_nlp/text_utils.py (1 hunks)
- tests/test_action_item_extractor.py (1 hunks)
- tests/test_api_actions.py (1 hunks)
- tests/test_gmail_service_integration.py (1 hunks)
- tests/test_nlp_engine_integration.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
server/python_nlp/ai_training.py (1)
- server/python_nlp/text_utils.py (1): `clean_text` (3-15)

server/python_backend/ai_engine.py (2)
- server/python_nlp/gmail_service.py (1): `_execute_async_command` (62-109)
- server/python_nlp/nlp_engine.py (2): `_get_fallback_analysis` (840-860), `_get_simple_fallback_analysis` (862-922)

server/python_nlp/data_strategy.py (1)
- server/python_nlp/nlp_engine.py (1): `_extract_keywords` (567-615)

server/python_nlp/gmail_service.py (4)
- server/python_nlp/gmail_integration.py (3): `GmailDataCollector` (227-724), `EmailBatch` (59-65), `collect_emails_incremental` (360-446)
- server/python_nlp/gmail_metadata.py (3): `GmailMetadataExtractor` (86-660), `extract_complete_metadata` (125-245), `GmailMessage` (16-84)
- server/python_nlp/data_strategy.py (2): `DataCollectionStrategy` (40-483), `to_dict` (25-26)
- server/python_backend/ai_engine.py (2): `analyze_email` (70-113), `to_dict` (35-49)
🪛 Ruff (0.11.9)
tests/test_nlp_engine_integration.py
3-3: unittest.mock.MagicMock imported but unused
Remove unused import: unittest.mock.MagicMock
(F401)
112-112: asyncio imported but unused
Remove unused import: asyncio
(F401)
server/python_nlp/action_item_extractor.py
52-52: Undefined name Tuple
(F821)
tests/test_gmail_service_integration.py
122-122: Undefined name json
(F821)
150-150: Undefined name json
(F821)
server/python_backend/main.py
729-729: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🪛 Pylint (3.3.7)
server/python_nlp/action_item_extractor.py
[error] 52-52: Undefined variable 'Tuple'
(E0602)
[refactor] 24-24: Too few public methods (1/2)
(R0903)
tests/test_gmail_service_integration.py
[error] 122-122: Undefined variable 'json'
(E0602)
[error] 150-150: Undefined variable 'json'
(E0602)
server/python_backend/main.py
[refactor] 49-49: Too few public methods (0/2)
(R0903)
[refactor] 53-53: Too few public methods (0/2)
(R0903)
server/python_nlp/nlp_engine.py
[refactor] 1006-1006: Too many arguments (9/5)
(R0913)
[refactor] 1006-1006: Too many positional arguments (9/5)
(R0917)
[refactor] 1006-1006: Too many local variables (20/15)
(R0914)
server/python_nlp/gmail_service.py
[refactor] 32-32: Too many instance attributes (8/7)
(R0902)
🔇 Additional comments (8)
server/python_nlp/ai_training.py (1)
232-234: Re-use of `clean_text` inherits the punctuation-loss issue

Because `clean_text` currently strips `:`, any training samples containing “task:” / “action:” will be altered, potentially degrading model performance or mismatching the production flow. Ensure the earlier fix to `clean_text` is applied, or bypass cleaning for labels that depend on such punctuation.

server/python_backend/main.py (1)
48-59: LGTM! Well-structured Pydantic models for action item extraction

The models are properly defined with appropriate field types and optional parameters. The static analysis warnings about too few public methods can be safely ignored, as these are Pydantic data models that don't require additional methods.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 49-49: Too few public methods (0/2)
(R0903)
[refactor] 53-53: Too few public methods (0/2)
(R0903)
server/python_backend/ai_engine.py (3)
33-33: LGTM! Proper integration of the action_items field

The `action_items` field is correctly added to both the initialization and serialization methods with appropriate default values.

Also applies to: 47-48
71-114: Excellent error handling and logging improvements

The enhanced `analyze_email` method now provides:

- Detailed logging for debugging without being verbose
- Specific handling for different failure scenarios (empty output, JSON errors, file not found, timeout)
- Proper error context propagation to the fallback method
- Consistent return of `AIAnalysisResult` in all code paths
202-255: Robust fallback implementation with proper error context handling

The fallback method provides excellent resilience:

- Uses the fallback NLP engine for basic analysis when possible
- Includes a secondary fallback for critical failures
- Properly incorporates error context into the reasoning
- Ensures `action_items` is always present in the response for API consistency

tests/test_gmail_service_integration.py (1)
11-63: Well-structured integration tests with comprehensive coverage

The tests effectively validate:
- AI analysis integration with action items
- Database format conversion with and without AI analysis results
- Proper serialization of analysis metadata
Note: Like the previous test file, the async test method will need a proper async test runner.
Also applies to: 64-128, 129-156
server/python_nlp/data_strategy.py (2)
280-320: Excellent implementation of dual-path annotation with external analysis support

The updated `annotate_email` method elegantly handles both external AI analysis results and internal fallback predictions:
- Clear separation of logic paths with appropriate logging
- Proper confidence scoring based on the analysis source
- Meaningful annotator ID suffixes for tracking
- Graceful fallback for missing fields in external results
324-338: Good enhancement of prediction methods with fallback tracking

The addition of the `is_fallback` parameter to all prediction methods provides better observability without changing the core logic. The debug logging with conditional prefixes will help with troubleshooting and understanding the annotation flow.

Also applies to: 339-359, 361-374, 376-389
```python
text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text)  # Keep basic punctuation
text = re.sub(r"\s+", " ", text)  # Normalize whitespace
return text
```
🛠️ Refactor suggestion
Colon & other critical punctuation removed – breaks Action-Item detection
clean_text strips : and ;.
ActionItemExtractor relies on task: / action: / action required: keywords that contain a colon, so the current normalisation will silently prevent those patterns from ever matching when text is pre-processed (e.g. in NLPEngine).
```diff
- text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text)  # Keep basic punctuation
+ # Keep punctuation that is semantically relevant for downstream NLP
+ # (e.g. ':' needed for “task:” / “action:” keywords).
+ text = _PUNCT_CLEAN_RE.sub("", text)
```

And hoist the compiled regexes to module scope to avoid recompilation on every call:

```diff
+_PUNCT_CLEAN_RE = re.compile(r"[^a-zA-Z0-9\s.,?!':]")
+_WHITESPACE_RE = re.compile(r"\s+")
+
 def clean_text(text: str) -> str:
@@
-    text = re.sub(r"[^a-zA-Z0-9\s.,?!']", "", text)  # Keep basic punctuation
-    text = re.sub(r"\s+", " ", text)  # Normalize whitespace
+    text = _PUNCT_CLEAN_RE.sub("", text)
+    text = _WHITESPACE_RE.sub(" ", text)
```
+ text = _WHITESPACE_RE.sub(" ", text)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
_PUNCT_CLEAN_RE = re.compile(r"[^a-zA-Z0-9\s.,?!':]")
_WHITESPACE_RE = re.compile(r"\s+")

def clean_text(text: str) -> str:
    # Keep punctuation that is semantically relevant for downstream NLP
    # (e.g. ':' needed for “task:” / “action:” keywords).
    text = _PUNCT_CLEAN_RE.sub("", text)
    text = _WHITESPACE_RE.sub(" ", text)
    return text
```
🤖 Prompt for AI Agents
In server/python_nlp/text_utils.py around lines 13 to 15, the regex used in
clean_text removes colons and semicolons, which breaks detection of action-item
keywords like "task:" or "action:". Update the regex to keep colons and
semicolons by including them in the allowed characters. Additionally, move the
compiled regex patterns to module-level variables outside the function to avoid
recompiling them on every call, improving performance.
```python
# Ensure necessary NLTK data is available, if not, download it.
# This is more for a local setup; in a container, it should be pre-installed.
try:
    nltk.data.find('taggers/averaged_perceptron_tagger')
except nltk.downloader.DownloadError:
    nltk.download('averaged_perceptron_tagger', quiet=True)
try:
    nltk.data.find('tokenizers/punkt')
except nltk.downloader.DownloadError:
    nltk.download('punkt', quiet=True)
HAS_NLTK = True
```
🛠️ Refactor suggestion
Auto-downloading NLTK resources at import time is risky
Triggering network downloads inside a library import can hang or fail in production containers without internet access and slows cold-starts.
Move the download logic behind an explicit helper or setup script instead of executing automatically.
🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py around lines 8 to 18, the current
code automatically downloads NLTK resources during import, which can cause hangs
or failures in production without internet access. Refactor by removing the
automatic download logic from the import time and instead create a separate
explicit helper function or setup script that checks for and downloads the
required NLTK resources. This helper should be called manually during setup or
initialization, not during module import.
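A hedged sketch of the explicit-setup approach; resource names are taken from the snippet above, and note that `nltk.data.find` raises `LookupError` on a missing resource (the original `except nltk.downloader.DownloadError` would not catch it):

```python
import logging

logger = logging.getLogger(__name__)

_REQUIRED_RESOURCES = [
    ("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger"),
    ("tokenizers/punkt", "punkt"),
]

def ensure_nltk_data(download: bool = False) -> bool:
    """Verify NLTK resources are present; optionally fetch missing ones.

    Call from a setup script or an app-startup hook, never at import time.
    Returns True only when NLTK is installed and no resource was missing.
    """
    try:
        import nltk  # imported lazily so the module itself has no hard dependency
    except ImportError:
        logger.warning("NLTK not installed; POS-based extraction disabled")
        return False
    ok = True
    for find_path, download_name in _REQUIRED_RESOURCES:
        try:
            nltk.data.find(find_path)
        except LookupError:  # raised by nltk.data.find for missing resources
            if download:
                nltk.download(download_name, quiet=True)
            else:
                ok = False
    return ok
```

Import of the extractor module then stays side-effect-free, and containers can run `ensure_nltk_data(download=True)` once at build time.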
```python
import re
import logging
from typing import List, Dict, Any, Optional
```
Missing Tuple import => runtime NameError
_extract_verb_object_with_nltk declares Tuple[...] in the return annotation but Tuple is not imported. The file will raise at import time.
```diff
-from typing import List, Dict, Any, Optional
+from typing import List, Dict, Any, Optional, Tuple
```

📝 Committable suggestion
```python
import re
import logging
from typing import List, Dict, Any, Optional, Tuple
```
🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py at the top import section (lines 1
to 4), the Tuple type is used in the return annotation of the function
_extract_verb_object_with_nltk but Tuple is not imported from typing. Add Tuple
to the import statement from typing alongside List, Dict, Any, and Optional to
prevent a runtime NameError.
```python
self.action_keywords_regex = re.compile(
    r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
    re.IGNORECASE
)
# Regex for simple due date patterns
# This is a basic version and can be expanded significantly
self.due_date_regex = re.compile(
    r'\b(by (next )?(monday|tuesday|wednesday|thursday|friday|saturday|sunday|tomorrow|end of day|eod)|'
    r'on \d{1,2}(st|nd|rd|th)? (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\w*)?(\s\d{4})?|'
    r'in \d+ (days?|weeks?|months?)|'
    r'next (week|month|year))\b',
    re.IGNORECASE
)
```
Keyword pattern too narrow – test suite fails on “Action required:”
`action_keywords_regex` lacks `action required:` and the plain `task` form; the unit tests (`tests/test_action_item_extractor.py`) expect those to be recognised.

```diff
- r'\b(please|task:|action:|need to|required to|must|should|can you|could you|will you)\b',
+ r'\b(please|task:?|action:?|action required:|need to|required to|must|should|can you|could you|will you)\b',
```

(Added an optional colon for `task` / `action`, plus an explicit `action required:`.) Update the tests accordingly if you choose a different wording.
📝 Committable suggestion
```python
self.action_keywords_regex = re.compile(
    r'\b(please|task:?|action:?|action required:|need to|required to|must|should|can you|could you|will you)\b',
    re.IGNORECASE
)
# Regex for simple due date patterns
# This is a basic version and can be expanded significantly
self.due_date_regex = re.compile(
    r'\b(by (next )?(monday|tuesday|wednesday|thursday|friday|saturday|sunday|tomorrow|end of day|eod)|'
    r'on \d{1,2}(st|nd|rd|th)? (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\w*)?(\s\d{4})?|'
    r'in \d+ (days?|weeks?|months?)|'
    r'next (week|month|year))\b',
    re.IGNORECASE
)
```
🤖 Prompt for AI Agents
In server/python_nlp/action_item_extractor.py around lines 32 to 44, the
action_keywords_regex pattern is too narrow and misses matching "action
required:" and the plain "task" keyword without a colon, causing test failures.
Update the regex to include optional colons after "task" and "action" and
explicitly add "action required:" as a keyword. After updating the regex, ensure
the unit tests in tests/test_action_item_extractor.py are aligned with these
changes.
```python
import unittest
from unittest.mock import MagicMock, AsyncMock  # AsyncMock for async methods

from server.python_nlp.gmail_service import GmailAIService
# Assuming AdvancedAIEngine and AIAnalysisResult are importable for type hinting or mocking structure
# from server.python_backend.ai_engine import AdvancedAIEngine, AIAnalysisResult
from server.python_nlp.gmail_metadata import GmailMessage  # For structuring metadata input
```
Fix missing json import.
The json module is used at lines 122 and 150 but not imported.
Add the missing import at the top of the file:
```diff
 import unittest
 from unittest.mock import MagicMock, AsyncMock  # AsyncMock for async methods
+import json
 from server.python_nlp.gmail_service import GmailAIService
```

Also applies to: 122-122, 150-150
🤖 Prompt for AI Agents
In tests/test_gmail_service_integration.py at the top of the file (lines 1 to
8), the json module is used later in the file around lines 122 and 150 but is
not imported. Add an import statement for the json module alongside the existing
imports to fix the missing import error.
```diff
     def _build_final_analysis_response(self, sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis,
-                                       categories, keywords, risk_analysis_flags) -> Dict[str, Any]:
+                                       categories, keywords, risk_analysis_flags, action_items) -> Dict[str, Any]:
         """Helper function to consolidate analysis results and build the final response dictionary."""
```
🛠️ Refactor suggestion
Method signature is getting unwieldy
_build_final_analysis_response now takes 9 positional parameters (plus self) and exceeds the 5-arg guideline flagged by pylint (R0913/R0917). Pass a single dataclass or dict instead to improve readability and future maintainability.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 1006-1006: Too many arguments (9/5)
(R0913)
[refactor] 1006-1006: Too many positional arguments (9/5)
(R0917)
[refactor] 1006-1006: Too many local variables (20/15)
(R0914)
🤖 Prompt for AI Agents
In server/python_nlp/nlp_engine.py around lines 1006 to 1009, the method
_build_final_analysis_response has too many positional parameters, making it
hard to read and maintain. Refactor this method to accept a single dataclass or
dictionary that encapsulates all these parameters instead of passing them
individually. Define a suitable dataclass or use a dict to group
sentiment_analysis, topic_analysis, intent_analysis, urgency_analysis,
categories, keywords, risk_analysis_flags, and action_items, then update the
method signature and its internal usage accordingly.
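One way to realize the parameter-object suggestion, sketched with assumed field types (the real result shapes may differ):

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List

# Hypothetical parameter object: bundles the nine analysis results so the
# builder takes a single argument instead of a long positional list.
@dataclass
class AnalysisParts:
    sentiment: Dict[str, Any] = field(default_factory=dict)
    topic: Dict[str, Any] = field(default_factory=dict)
    intent: Dict[str, Any] = field(default_factory=dict)
    urgency: Dict[str, Any] = field(default_factory=dict)
    categories: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    risk_flags: List[str] = field(default_factory=list)
    action_items: List[Dict[str, Any]] = field(default_factory=list)

def build_final_analysis_response(parts: AnalysisParts) -> Dict[str, Any]:
    # The real builder would merge and post-process; asdict stands in here.
    return asdict(parts)

response = build_final_analysis_response(AnalysisParts(categories=["work"]))
assert response["categories"] == ["work"]
```

Adding a tenth analysis later then means adding one field, not threading another positional argument through every call site.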
```diff
 def __init__(self, rate_config: Optional[RateLimitConfig] = None, advanced_ai_engine: Optional[Any] = None):  # Allow passing AdvancedAIEngine
     self.collector = GmailDataCollector(rate_config)
     self.metadata_extractor = GmailMetadataExtractor()
     self.data_strategy = DataCollectionStrategy()
-    self.model_trainer = ModelTrainer()
-    self.prompt_engineer = PromptEngineer()
+    # Changed for more specific logger name
+    # self.model_trainer = ModelTrainer()  # Potentially remove if not used directly
+    # self.prompt_engineer = PromptEngineer()  # Potentially remove
     self.logger = logging.getLogger(f"{__name__}.{self.__class__.__name__}")
```
model_trainer / prompt_engineer now undefined – will crash
__init__ comments out creation of ModelTrainer and PromptEngineer, yet train_models_from_gmail_data() still calls self.model_trainer and self.prompt_engineer. Any invocation of training will raise AttributeError.
Add the attributes back or gate the training code behind a feature flag.
Also applies to: 397-423
🤖 Prompt for AI Agents
In server/python_nlp/gmail_service.py around lines 35 to 41 and also lines 397
to 423, the attributes model_trainer and prompt_engineer are commented out in
the __init__ method but are still used in train_models_from_gmail_data(),
causing AttributeError. To fix this, either uncomment and initialize
self.model_trainer and self.prompt_engineer in __init__ or add a feature flag to
conditionally create and use these attributes, ensuring any training code checks
the flag before accessing them.
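A minimal sketch of the feature-flag option; the classes here are stand-ins (actual `ModelTrainer` / `PromptEngineer` construction is simulated):

```python
class _StubTrainer:
    """Stand-in for ModelTrainer / PromptEngineer in this sketch."""

class GmailAIServiceSketch:
    def __init__(self, enable_training: bool = False):
        # The attributes always exist, so access never raises AttributeError...
        self.model_trainer = _StubTrainer() if enable_training else None
        self.prompt_engineer = _StubTrainer() if enable_training else None

    def train_models_from_gmail_data(self) -> str:
        # ...and the guard gives callers a clear, intentional error instead.
        if self.model_trainer is None:
            raise RuntimeError("Training disabled: pass enable_training=True")
        return "training started"
```

With this shape, the default construction stays lightweight while the training path fails loudly and explicitly rather than with an accidental `AttributeError`.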
feat: Refactor NLP pipeline and add Action Item Extraction
This commit introduces significant enhancements to the Gmail AI Email Management system.
Refactoring:
- Centralized NLP analysis logic within `AdvancedAIEngine` and `NLPEngine`, removing redundancy from `GmailAIService` and `DataCollectionStrategy`.
- Streamlined fallback mechanisms in the NLP pipeline for more robust error handling.
- Standardized basic text preprocessing with a new `text_utils.py` module.
- Simplified complex functions, particularly `GmailAIService.sync_gmail_emails`, by breaking them into smaller, more focused methods.
- Improved error handling and logging across the NLP components for better traceability.

New Feature: Action Item Extraction

- Added a new `ActionItemExtractor` module (`server/python_nlp/action_item_extractor.py`) using rule-based methods and optional NLTK POS tagging to identify and extract action items from email text.
- Integrated action item extraction into the `NLPEngine` and `AdvancedAIEngine`.
- Action items are now included in the `analysisMetadata` stored for emails.
- Exposed this functionality via a new API endpoint: `POST /api/actions/extract-from-text`.

Unit Tests:

- Added comprehensive unit tests for the `ActionItemExtractor`.
- Included integration tests for `NLPEngine`, `AdvancedAIEngine`, and `GmailAIService` to ensure correct handling of action items.
- Added API tests for the new `/api/actions/extract-from-text` endpoint.

Additional Feature Ideas Proposed:

1. Automated Email Summarization: To provide concise summaries of email content.
2. Smart Reply Suggestions: To offer context-aware reply options based on NLP analysis.
Summary by Sourcery
Refactor the NLP pipeline by centralizing analysis and fallback logic, introduce a rule-based and POS-augmented action item extraction feature across engines and the API, standardize text preprocessing, enhance logging and error handling, and add comprehensive tests.
New Features:
- `ActionItemExtractor` module to identify and extract action items from email text.
- `POST /api/actions/extract-from-text` API endpoint for on-demand action item extraction.

Enhancements:

- Refactor `GmailAIService` to delegate AI analysis to `AdvancedAIEngine`, splitting syncing logic into helper methods.
- Centralize analysis and fallback logic in `NLPEngine` and `AdvancedAIEngine` with improved logging and error handling.
- Add a `text_utils.clean_text` utility and update the training pipeline to use it.
- Include action items in `analysisMetadata` and annotate emails using external analysis results in `DataCollectionStrategy`.

Tests:

- Unit tests for `ActionItemExtractor` and integration tests for NLP engines and Gmail service handling action items.

Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests