English Document | Chinese Document (中文文档)
Automatically track the latest AI research papers on arXiv each day, use LLMs for intelligent summarization, and generate research trend analysis reports.
- Intelligent Crawling: Daily automatic fetching of the newest papers from arXiv in specified fields
  - Supports multiple research areas (cs.AI, cs.LG, cs.CV, etc.)
  - Keyword filtering
  - TF-IDF based smart selection
- Multi-Model Summarization: Use LLMs to generate concise paper summaries
  - Supports 5 LLM providers: OpenAI, Gemini, Claude, DeepSeek, vLLM
  - Bilingual (Chinese & English) summaries
  - Concurrent processing for higher efficiency
- Trend Analysis: In-depth analysis of research hot topics and technological trends
  - TF-IDF keyword extraction
  - LDA topic modeling
  - Word-cloud visualization
  - LLM deep analysis (research hotspots, technology trends, future directions)
- Web Interface: Modern responsive web UI
  - Built with Bootstrap 5
  - Real-time data display
  - Detailed paper view
  - Pagination and filtering
- Scheduled Execution: Various scheduling options
  - APScheduler (recommended)
  - Linux cron jobs
  - Systemd service
- Email Notifications: Execution status via email
  - Elegant HTML email templates
  - Separate success/failure notices
  - Detailed statistics
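
The "TF-IDF based smart selection" and keyword filtering features listed above can be illustrated with a small, dependency-free sketch. The function names (`tokenize`, `tfidf_scores`, `select_papers`) and the ranking rule (sum of the TF-IDF weights of the keyword terms) are assumptions made for illustration; the project's own selection logic may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_scores(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: TF-IDF weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            # Smoothed IDF so that terms present in every document keep a weight.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return scores


def select_papers(abstracts: list[str], keywords: list[str], top_k: int) -> list[int]:
    """Return indices of the top_k abstracts, ranked by keyword TF-IDF weight."""
    terms = {t for kw in keywords for t in tokenize(kw)}
    scores = tfidf_scores(abstracts)
    ranked = sorted(
        range(len(abstracts)),
        key=lambda i: sum(scores[i].get(t, 0.0) for t in terms),
        reverse=True,
    )
    return ranked[:top_k]
```

Calling `select_papers(abstracts, ["large language model", "transformer"], top_k=20)` would return the positions of the abstracts that best match the configured keywords.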
Requirements:
- Python 3.12+
- Conda (recommended) or virtualenv
- LLM API keys (OpenAI / Gemini / Claude / DeepSeek / vLLM)
git clone https://github.com/yourusername/daily-arxiv.git
cd daily-arxiv

# Using Conda (recommended)
conda create -n daily-arxiv python=3.12 -y
conda activate daily-arxiv
# Or using venv
python -m venv venv
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows

pip install uv
uv pip install -r requirements.txt

# Copy the example file
cp .env.example .env
# Edit the .env file
nano .env

Add your API keys:
# OpenAI
OPENAI_API_KEY=sk-...
# Google Gemini
GEMINI_API_KEY=...
# Anthropic Claude
ANTHROPIC_API_KEY=...
# DeepSeek
DEEPSEEK_API_KEY=...
# vLLM (local deployment)
VLLM_API_KEY=EMPTY
# Email notifications (optional)
EMAIL_PASSWORD=your-app-password

Edit config/config.yaml:
# Research fields
arxiv:
  categories:
    - "cs.AI" # Artificial Intelligence
    - "cs.LG" # Machine Learning
  keywords:
    - "large language model"
    - "transformer"
  max_results: 20
# LLM provider
llm:
  provider: "vllm" # openai, gemini, claude, deepseek, vllm
# Scheduler settings
scheduler:
  enabled: true
  run_time: "09:00"
  timezone: "Asia/Shanghai"

# Test paper fetching
python test/test_fetcher.py
# Test LLM summarization
python test/test_summarizer.py
# Test trend analysis
python test/test_analyzer.py
# Test web service
python test/test_web.py
# Test scheduler
python test/test_scheduler.py

# Manual single run
python main.py

# Development mode
python src/web/app.py
# Open http://localhost:5000

# Recommended: use the start script
./deploy/start.sh
# Or run directly
python scheduler.py

Visit http://localhost:5000 to view results.
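
To make the scheduler settings concrete (`run_time: "09:00"`, `timezone: "Asia/Shanghai"`), here is a minimal sketch of how the next daily run could be computed using only the standard library. The `next_run` helper is hypothetical; the project itself schedules through APScheduler, whose API is not shown here.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def next_run(now: datetime, run_time: str, timezone: str) -> datetime:
    """Return the next occurrence of run_time ("HH:MM") in the given timezone.

    `now` is expected to be a timezone-aware datetime.
    """
    tz = ZoneInfo(timezone)
    local_now = now.astimezone(tz)
    hour, minute = (int(part) for part in run_time.split(":"))
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= local_now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, `next_run(datetime.now(ZoneInfo("UTC")), "09:00", "Asia/Shanghai")` gives the upcoming 09:00 Shanghai time, whatever timezone the host machine uses.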
daily-arxiv/
├── config/
│   └── config.yaml                  # Main configuration file
├── src/
│   ├── crawler/
│   │   └── arxiv_fetcher.py         # arXiv paper crawler
│   ├── summarizer/
│   │   ├── base_llm_client.py       # Base LLM class
│   │   ├── openai_client.py         # OpenAI client
│   │   ├── gemini_client.py         # Gemini client
│   │   ├── claude_client.py         # Claude client
│   │   ├── deepseek_client.py       # DeepSeek client
│   │   ├── vllm_client.py           # vLLM client
│   │   ├── llm_factory.py           # LLM factory
│   │   └── paper_summarizer.py      # Paper summarizer
│   ├── analyzer/
│   │   └── trend_analyzer.py        # Trend analysis
│   ├── web/
│   │   ├── app.py                   # Flask web app
│   │   └── templates/
│   │       └── index.html           # Web UI page
│   ├── notifier/
│   │   └── email_notifier.py        # Email notifier
│   └── utils.py                     # Utility functions
├── static/
│   └── js/
│       └── main.js                  # Front-end JavaScript
├── data/                            # Data storage
│   ├── papers/                      # Paper JSON files
│   ├── summaries/                   # Summary JSON files
│   └── …/                           # Word-cloud images
├── logs/                            # Log files
├── deploy/                          # Deployment scripts
│   ├── start.sh                     # Start script
│   ├── daily-arxiv.service          # Systemd service
│   └── crontab.example              # Cron example
├── docs/                            # Documentation
│   ├── arxiv_fetcher_guide.md
│   ├── trend_analyzer_guide.md
│   ├── web_interface_guide.md
│   └── scheduler_guide.md
├── main.py                          # Main entry point
├── scheduler.py                     # APScheduler dispatcher
├── test_*.py                        # Test scripts
├── requirements.txt                 # Python dependencies
├── .env.example                     # Example env file
└── README.md                        # Project overview
Common Computer Science categories:
- cs.AI: Artificial Intelligence
- cs.LG: Machine Learning
- cs.CV: Computer Vision
- cs.CL: Computation and Language (NLP)
- cs.NE: Neural and Evolutionary Computing
- stat.ML: Machine Learning (Statistics)
See the full list at: https://arxiv.org/category_taxonomy
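
As an illustration of how the category codes and keywords above might be turned into a request, the sketch below builds a query URL for the public arXiv API (`search_query`, `sortBy`, `sortOrder`, `start`, and `max_results` are documented arXiv API parameters). The `build_query_url` helper and the way categories and keywords are combined are assumptions, not necessarily what `arxiv_fetcher.py` does.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(categories: list[str], keywords: list[str], max_results: int = 20) -> str:
    """Build an arXiv API URL: any listed category AND (if given) any keyword."""
    cat_clause = " OR ".join(f"cat:{c}" for c in categories)
    parts = [f"({cat_clause})"]
    if keywords:
        # Quote each keyword so multi-word phrases are matched as phrases.
        kw_clause = " OR ".join(f'all:"{k}"' for k in keywords)
        parts.append(f"({kw_clause})")
    params = {
        "search_query": " AND ".join(parts),
        "sortBy": "submittedDate",      # newest submissions first
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

The resulting URL returns an Atom feed, which a crawler would then parse for titles, abstracts, and authors.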
Supported providers:
- OpenAI: GPT-4, GPT-3.5-turbo
- Gemini: Gemini models
- Anthropic: Claude
- DeepSeek: DeepSeek models
- vLLM: Locally run open-source models (OpenAI-compatible API)
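
Switching between these providers through a single `provider` setting suggests a factory pattern. The following is a rough sketch of that idea under stated assumptions: the class names, the `summarize` method, and the registry are hypothetical, and the clients are placeholders that do not call any real API.

```python
from abc import ABC, abstractmethod


class BaseLLMClient(ABC):
    """Hypothetical common interface shared by all providers."""

    @abstractmethod
    def summarize(self, text: str) -> str:
        """Return a short summary of the given text."""


_REGISTRY: dict[str, type[BaseLLMClient]] = {}


def register(name: str):
    """Class decorator that records a client class under a provider name."""
    def decorator(cls: type[BaseLLMClient]) -> type[BaseLLMClient]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("openai")
class OpenAIClient(BaseLLMClient):
    def summarize(self, text: str) -> str:
        # Placeholder: a real client would call the provider's API here.
        return f"[openai] {text[:40]}"


@register("vllm")
class VLLMClient(OpenAIClient):
    # vLLM serves an OpenAI-compatible API, so it can build on the same client.
    def summarize(self, text: str) -> str:
        return f"[vllm] {text[:40]}"


def create_client(provider: str) -> BaseLLMClient:
    """Look up the provider name (case-insensitive) and build its client."""
    try:
        return _REGISTRY[provider.lower()]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {provider!r}") from None
```

With this layout, supporting another provider means adding one decorated class; the calling code keeps using `create_client(config_value)`.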
Roadmap:
- Project scaffolding (done)
- arXiv crawling (done)
- LLM summarization (done)
  - Support OpenAI, Gemini, Claude, DeepSeek, vLLM
- Trend analysis (done)
  - Keyword extraction, topic modeling, word-cloud generation
  - LLM-driven deep analysis (hotspots, trends, innovations)
- Web UI development
- Scheduling functionality
- Testing & optimization
- UI beautification
- Add WeChat public account integration
# Test paper crawler
python test/test_fetcher.py
# Test summarizer
python test/test_summarizer.py
# Test trend analyzer
python test/test_analyzer.py
# Run full pipeline
python main.py

data/
├── papers/
│   ├── papers_YYYY-MM-DD.json       # Daily paper data
│   └── latest.json                  # Latest paper data
├── summaries/
│   ├── summaries_YYYY-MM-DD.json    # Daily summaries
│   └── latest.json                  # Latest summaries
└── analysis/
    ├── wordcloud_YYYY-MM-DD.png     # Word-cloud image
    ├── analysis_YYYY-MM-DD.json     # Analysis results
    ├── report_YYYY-MM-DD.md         # Markdown report
    └── latest.json                  # Latest analysis data
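
The dated-file-plus-`latest.json` layout above can be produced by a small helper like the one below. This is a sketch only: `save_daily` and its parameters are hypothetical names, and the project's real writer may store additional metadata.

```python
import json
from datetime import date
from pathlib import Path


def save_daily(records: list[dict], data_dir: str, kind: str, day: date) -> Path:
    """Write <kind>/<kind>_YYYY-MM-DD.json and refresh <kind>/latest.json."""
    target_dir = Path(data_dir) / kind
    target_dir.mkdir(parents=True, exist_ok=True)
    dated = target_dir / f"{kind}_{day.isoformat()}.json"
    # ensure_ascii=False keeps Chinese summaries readable in the JSON file.
    payload = json.dumps(records, ensure_ascii=False, indent=2)
    dated.write_text(payload, encoding="utf-8")
    (target_dir / "latest.json").write_text(payload, encoding="utf-8")
    return dated
```

For instance, `save_daily(papers, "data", "papers", date.today())` would write today's `papers_YYYY-MM-DD.json` and overwrite `data/papers/latest.json` with the same content.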
Feel free to open Issues and submit Pull Requests!
MIT License


