
Conversation

@FJiangArthur

No description provided.

This commit adds full support for monitoring Western media sources,
especially U.S. political news from left, right, and center
perspectives, plus social media platforms and technical news sources.

New Features:
- Western news RSS collection (CNN, Fox News, Reuters, NYT, WaPo, etc.; collection loop sketched below)
- Reddit crawler with API support (political, tech, news subreddits)
- Twitter/X crawler (scraper-based, no API needed but high IP ban risk)
- YouTube crawler with API support (political channels, tech content)
- HackerNews crawler (free API, technical news)
- Google News integration (various topics)
- Comprehensive rate limiting and IP protection for home use
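
A minimal sketch of the RSS collection loop, assuming feedparser and a fixed inter-request delay; the feed URLs, helper name, and delay value are illustrative, not the PR's actual code:

```python
import time
import feedparser

# Illustrative subset of the feeds listed above.
FEEDS = {
    "cnn": "http://rss.cnn.com/rss/cnn_topstories.rss",
    "nyt": "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
}

MIN_DELAY_SECONDS = 5  # conservative default for home-IP safety

def collect_western_news():
    articles = []
    for source, url in FEEDS.items():
        feed = feedparser.parse(url)
        for entry in feed.entries:
            articles.append({
                "source": source,
                "title": entry.get("title"),
                "link": entry.get("link"),
                "published": entry.get("published"),
            })
        time.sleep(MIN_DELAY_SECONDS)  # minimum delay between feed fetches
    return articles
```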

Platform Coverage:
Left-leaning news: CNN, MSNBC, NYT, Washington Post, NPR
Right-leaning news: Fox News, Breitbart, Daily Wire, NY Post
Center/Balanced: Reuters, AP, BBC, WSJ
Tech news: TechCrunch, The Verge, Wired, HackerNews
Social: Reddit, Twitter/X, YouTube

IP Protection Features:
- Per-platform rate limiting (configurable requests/hour; see the limiter sketch after this list)
- Minimum delay enforcement between requests
- Request counting and quota management
- User agent rotation
- Conservative defaults to prevent home IP bans
- Special protection for high-risk platforms (Twitter)
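
A minimal sketch of the protections just listed, assuming fake-useragent for rotation; the class name, quota, and delay defaults are illustrative:

```python
import time
from fake_useragent import UserAgent

class PlatformLimiter:
    """Per-platform request throttling with user agent rotation."""

    def __init__(self, max_per_hour=30, min_delay=10.0):
        self.max_per_hour = max_per_hour  # configurable requests/hour
        self.min_delay = min_delay        # minimum delay between requests
        self.timestamps = []              # request times in the last hour
        self.ua = UserAgent()

    def wait_and_headers(self):
        now = time.time()
        # Drop requests older than an hour, then enforce the hourly quota.
        self.timestamps = [t for t in self.timestamps if now - t < 3600]
        if len(self.timestamps) >= self.max_per_hour:
            time.sleep(3600 - (now - self.timestamps[0]))
        # Enforce the minimum inter-request delay.
        if self.timestamps and time.time() - self.timestamps[-1] < self.min_delay:
            time.sleep(self.min_delay - (time.time() - self.timestamps[-1]))
        self.timestamps.append(time.time())
        return {"User-Agent": self.ua.random}  # rotate user agent per request
```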

Database Changes:
- New tables for reddit_post, reddit_comment (reddit_post is sketched below)
- New tables for twitter_tweet
- New tables for youtube_video, youtube_comment
- New tables for hackernews_post, hackernews_comment
- New table for western_news_article
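
For illustration, here is reddit_post sketched as a SQLAlchemy model; the PR may declare these tables differently, and the column set is a guess at typical crawler fields:

```python
from sqlalchemy import Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class RedditPost(Base):
    __tablename__ = "reddit_post"

    id = Column(Integer, primary_key=True)
    post_id = Column(String(32), unique=True)   # Reddit's base36 post id
    subreddit = Column(String(64), index=True)
    title = Column(Text)
    score = Column(Integer)
    created_utc = Column(DateTime)
```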

Configuration:
- Updated .env.example with Western platform API credentials
- Added rate limiting configuration options
- Platform-specific delay and quota settings
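
A hedged sketch of how the new settings might be consumed; every variable name here is a hypothetical stand-in for whatever .env.example actually defines:

```python
import os

# Hypothetical credential names for the Western platform APIs.
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
YOUTUBE_API_KEY = os.getenv("YOUTUBE_API_KEY")

# Platform-specific delay and quota settings, with conservative defaults.
RATE_LIMITS = {
    "reddit":  {"requests_per_hour": int(os.getenv("REDDIT_RPH", "60")),  "min_delay": 5},
    "twitter": {"requests_per_hour": int(os.getenv("TWITTER_RPH", "20")), "min_delay": 30},
}
```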

Documentation:
- Comprehensive WESTERN_MEDIA_SETUP.md guide
- API setup instructions (Reddit, YouTube)
- IP protection best practices
- Usage examples for all platforms
- Troubleshooting guide

Dependencies:
- praw (Reddit API)
- feedparser (RSS feeds)
- google-api-python-client (YouTube API)
- tweepy (Twitter API, optional)
- ntscraper (Twitter scraper)
- ratelimit (rate limiting)
- fake-useragent (user agent rotation)

Testing:
- test_western_crawlers.py for quick validation
- Individual crawler test functions

Note: TikTok US support is planned but not fully implemented yet
due to the complexity of TikTok's anti-scraping measures.

This implementation prioritizes IP safety for home use, with
aggressive rate limiting to prevent bans.

Critical Fix:
- ntscraper is non-functional (the Nitter instances it relied on were shut down by Twitter)
- Replaced it with twikit (free) and Apify API (paid) options

New Implementation:
- twitter_crawler_v2.py with dual backend support
  * twikit: free scraping (requires a Twitter account and ongoing maintenance)
  * Apify API: paid scraping (~$0.30 per 1,000 tweets, reliable)
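
A hedged sketch of the dual-backend idea; the real twitter_crawler_v2.py interface may differ, and the Apify actor id below is a placeholder rather than a specific recommendation:

```python
import asyncio
import os

async def fetch_tweets_twikit(query, limit=50):
    from twikit import Client  # free backend: needs a Twitter account login
    client = Client("en-US")
    await client.login(
        auth_info_1=os.getenv("TWITTER_USERNAME"),
        auth_info_2=os.getenv("TWITTER_EMAIL"),
        password=os.getenv("TWITTER_PASSWORD"),
    )
    tweets = await client.search_tweet(query, "Latest")
    return [t.text for t in tweets][:limit]

def fetch_tweets_apify(query, limit=50):
    from apify_client import ApifyClient  # paid backend: ~$0.30/1,000 tweets
    client = ApifyClient(os.getenv("APIFY_TOKEN"))
    run = client.actor("example/tweet-scraper").call(  # placeholder actor id
        run_input={"searchTerms": [query], "maxItems": limit}
    )
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

def fetch_tweets(query, backend="twikit"):
    if backend == "twikit":
        return asyncio.run(fetch_tweets_twikit(query))
    return fetch_tweets_apify(query)
```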

Files Added:
- twitter_crawler_v2.py: New working Twitter crawler
- TWITTER_MIGRATION_GUIDE.md: Comprehensive migration guide

Updated:
- requirements.txt: replaced ntscraper with twikit
- Marked ntscraper as deprecated

Recommended solutions for monitoring 100 publishers daily:
1. Brand24 ($49-99/month) - Best overall, multi-platform
2. Apify API (~$10/month) - Good value, reliable
3. twikit (free) - Budget option, requires maintenance
4. Twitter API ($200/month) - Official but expensive

See TWITTER_MIGRATION_GUIDE.md for detailed setup instructions.
Implements an industry-standard multi-agent system based on Anthropic's
best practices for coordinating specialized agents in Western media
monitoring.

Documentation (docs/):
- PROJECT_STRUCTURE.md: Complete system architecture and directory structure
- PROJECT_PLAN.md: 12-week implementation plan with phases and milestones
- AGENT_PERSONAS.md: Detailed personas for 20+ specialized agents
- INTER_AGENT_COMMUNICATION.md: Message protocols and communication patterns
- MULTI_AGENT_SYSTEM_README.md: Comprehensive getting started guide

Agent Framework (agents/):
- shared/base_agent.py: Base class template for all agents
  * Standard lifecycle management (IDLE → READY → WORKING → COMPLETED)
  * Message bus communication (pub/sub pattern)
  * Health monitoring and metrics
  * Error handling and reporting
  * Logging and observability
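
A hedged sketch of that lifecycle and metrics surface; class and method names are illustrative, not necessarily those in agents/shared/base_agent.py:

```python
import logging
from enum import Enum

class AgentState(Enum):
    IDLE = "idle"
    READY = "ready"
    WORKING = "working"
    COMPLETED = "completed"

class BaseAgent:
    def __init__(self, name, message_bus):
        self.name = name
        self.bus = message_bus            # pub/sub client, e.g. Redis
        self.state = AgentState.IDLE
        self.metrics = {"tasks_done": 0, "errors": 0}
        self.log = logging.getLogger(name)

    def run_task(self, task):
        self.state = AgentState.WORKING
        try:
            result = self.execute(task)   # subclasses implement execute()
            self.metrics["tasks_done"] += 1
            self.state = AgentState.COMPLETED
            return result
        except Exception:
            self.metrics["errors"] += 1
            self.log.exception("task failed")
            raise

    def execute(self, task):
        raise NotImplementedError

    def health(self):
        return {"agent": self.name, "state": self.state.value, **self.metrics}
```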

- platform_agents/reddit_agent.py: Example concrete implementation
  * Monitors political and tech subreddits
  * Integrates with reddit_crawler.py
  * Demonstrates agent personality and expertise
  * Shows task execution patterns
  * Includes health checks

Agent Categories:
1. Coordinator Agents (3): Project Manager, Task Dispatcher, Status Tracker
2. Platform Agents (6): Reddit, Twitter, YouTube, HackerNews, TikTok, RSS News
3. Data Agents (4): Pipeline, Storage, Validation, Deduplication
4. Analysis Agents (3): Sentiment, Topic, Bias
5. Protection Agents (3): Rate Limiter, Health Monitor, Error Recovery
6. QA Agents (2): Test, Monitoring

Key Features:
- Message Bus Architecture: Redis-based pub/sub for agent communication (sketched below)
- Standard Message Formats: JSON schemas for all message types
- Priority System: 1-5 priority levels for message processing
- Circuit Breaker Pattern: Prevents cascading failures
- Rate Limiting: Per-platform IP protection
- Health Monitoring: Real-time agent health checks
- Error Recovery: Automatic retry and recovery strategies
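
A minimal sketch of the message bus with a JSON envelope carrying the priority field; the channel name and schema fields are assumptions drawn from the docs above, not the actual message format:

```python
import json
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def publish(sender, channel, msg_type, payload, priority=3):
    message = {
        "sender": sender,
        "type": msg_type,
        "priority": priority,   # 1 (highest) to 5 (lowest)
        "timestamp": time.time(),
        "payload": payload,
    }
    r.publish(channel, json.dumps(message))

# Example: a platform agent announcing new data to subscribers.
publish("reddit_agent", "agents.events", "new_posts", {"count": 42}, priority=2)
```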

Communication Patterns:
- Request-Response: Synchronous operations with timeout
- Pub-Sub: Event broadcasting to multiple agents
- Task Queue: Distributed work distribution
- Circuit Breaker: Error isolation and recovery
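
A minimal sketch of the circuit-breaker pattern named above; the failure threshold and reset window are illustrative defaults:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping call")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # open the circuit
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```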

Project Timeline:
- Phase 1 (Weeks 1-3): Foundation and core crawlers ✓ (mostly done)
- Phase 2 (Weeks 4-6): Agent implementation
- Phase 3 (Weeks 7-9): Integration and testing
- Phase 4 (Weeks 10-12): Production deployment

Benefits:
- Scalability: Agents can be scaled independently
- Modularity: Easy to add new platforms/features
- Reliability: Isolated failures, automatic recovery
- Observability: Full visibility into system state
- Maintainability: Clear separation of concerns
- Best Practices: Based on Anthropic's agent guidelines

Next Steps:
1. Implement remaining platform agents (Twitter, YouTube, etc.)
2. Implement data processing agents
3. Set up Redis message bus
4. Create agent coordinator
5. Write integration tests

See docs/MULTI_AGENT_SYSTEM_README.md for getting started guide.
@sonarqubecloud

Quality Gate failed

Failed conditions
5 Security Hotspots

See analysis details on SonarQube Cloud

