Daily arXiv – AI Research Tracker 📚🤖

English Document | 中文文档

Automatically track the latest AI research papers on arXiv each day, use LLMs for intelligent summarization, and generate research trend analysis reports.

✨ Features

Core Functions

🔍 Intelligent Crawling: Daily automatic fetching of the newest papers from arXiv in specified fields
- Supports multiple research areas (cs.AI, cs.LG, cs.CV, etc.)
- Keyword filtering
- TF‑IDF based smart selection
🤖 Multi‑Model Summarization: Use LLMs to generate concise paper summaries
- Supports 5 LLM providers: OpenAI, Gemini, Claude, DeepSeek, vLLM
- Bilingual (Chinese & English) summaries
- Concurrent processing for higher efficiency
📊 Trend Analysis: In‑depth analysis of research hot topics and technological trends
- TF‑IDF keyword extraction
- LDA topic modeling
- Word‑cloud visualization
- LLM deep analysis (research hotspots, technology trends, future directions)
🌐 Web Interface: Modern responsive web UI
- Built with Bootstrap 5
- Real‑time data display
- Detailed paper view
- Pagination and filtering
⏰ Scheduled Execution: Various scheduling options
- APScheduler (recommended)
- Linux cron jobs
- Systemd service
📧 Email Notifications: Execution status via email
- Elegant HTML email templates
- Separate success/failure notices
- Detailed statistics

📸 Interface Preview

🚀 Quick Start

Prerequisites

Python 3.12+
Conda (recommended) or virtualenv
LLM API keys (OpenAI / Gemini / Claude / DeepSeek / vLLM)

1. Clone the repository

git clone https://github.com/yourusername/daily-arxiv.git
cd daily-arxiv

2. Create a virtual environment

# Using Conda (recommended)
conda create -n daily-arxiv python=3.12 -y
conda activate daily-arxiv

# Or using venv
python -m venv venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

3. Install dependencies

pip install uv
uv pip install -r requirements.txt

4. Configure environment variables

# Copy the example file
cp .env.example .env

# Edit the .env file
nano .env

Add your API keys:

# OpenAI
OPENAI_API_KEY=sk-...

# Google Gemini
GEMINI_API_KEY=...

# Anthropic Claude
ANTHROPIC_API_KEY=...

# DeepSeek
DEEPSEEK_API_KEY=...

# vLLM (local deployment)
VLLM_API_KEY=EMPTY

# Email notifications (optional)
EMAIL_PASSWORD=your-app-password

5. Configure `config.yaml`

Edit config/config.yaml:

# Research fields
arxiv:
  categories:
    - "cs.AI"  # Artificial Intelligence
    - "cs.LG"  # Machine Learning
  
  keywords:
    - "large language model"
    - "transformer"
  
  max_results: 20

# LLM provider
llm:
  provider: "vllm"  # openai, gemini, claude, deepseek, vllm

# Scheduler settings
scheduler:
  enabled: true
  run_time: "09:00"
  timezone: "Asia/Shanghai"

6. Run tests

# Test paper fetching
python test/test_fetcher.py

# Test LLM summarization
python test/test_summarizer.py

# Test trend analysis
python test/test_analyzer.py

# Test web service
python test/test_web.py

# Test scheduler
python test/test_scheduler.py

7. Execute the full workflow

# Manual single run
python main.py

8. Start the web service

# Development mode
python src/web/app.py

# Open http://localhost:5000

9. Launch scheduled execution

# Recommended: use the start script
./deploy/start.sh

# Or run directly
python scheduler.py

Visit http://localhost:5000 to view results.

📂 Project Structure

daily-arxiv/
├── config/
│   └── config.yaml              # Main configuration file
├── src/
│   ├── crawler/
│   │   └── arxiv_fetcher.py    # arXiv paper crawler
│   ├── summarizer/
│   │   ├── base_llm_client.py  # Base LLM class
│   │   ├── openai_client.py    # OpenAI client
│   │   ├── gemini_client.py    # Gemini client
│   │   ├── claude_client.py    # Claude client
│   │   ├── deepseek_client.py  # DeepSeek client
│   │   ├── vllm_client.py      # vLLM client
│   │   ├── llm_factory.py      # LLM factory
│   │   └── paper_summarizer.py # Paper summarizer
│   ├── analyzer/
│   │   └── trend_analyzer.py   # Trend analysis
│   ├── web/
│   │   ├── app.py             # Flask web app
│   │   └── templates/
│   │       └── index.html     # Web UI page
│   ├── notifier/
│   │   └── email_notifier.py  # Email notifier
│   └── utils.py               # Utility functions
├── static/
│   └── js/
│       └── main.js            # Front‑end JavaScript
├── data/                      # Data storage
│   ├── papers/               # Paper JSON files
│   ├── summaries/            # Summary JSON files
│  ──/ # word‑cloud images
├── logs/                     # Log files
├── deploy/                   # Deployment scripts
│   ├── start.sh             # Start script
│   ├── daily-arxiv.service  # Systemd service
│   └── crontab.example      # Cron example
├── docs/                     # Documentation
│   ├── arxiv_fetcher_guide.md
│   ├── trend_analyzer_guide.md
│   ├── web_interface_guide.md
│   └── scheduler_guide.md
├── main.py                   # Main entry point
├── scheduler.py              # APScheduler dispatcher
├── test_*.py                # Test scripts
├── requirements.txt         # Python dependencies
├── .env.example            # Example env file
└── README.md               # Project overview

⚙️ Configuration Details

arXiv Category Codes

Common Computer Science categories:

cs.AI – Artificial Intelligence
cs.LG – Machine Learning
cs.CV – Computer Vision
cs.CL – Computation and Language (NLP)
cs.NE – Neural and Evolutionary Computing
stat.ML – Machine Learning (Statistics)

See the full list at: https://arxiv.org/category_taxonomy

LLM Providers

Supported providers:

OpenAI: GPT‑4, GPT‑3.5‑turbo
Gemini: Gemini models
Anthropic: Claude
DeepSeek: DeepSeek models
vLLM: Locally run open‑source models (OpenAI‑compatible API)

📝 Development Roadmap

Project scaffolding ✅
arXiv crawling ✅
LLM summarization ✅
- Support OpenAI, Gemini, Claude, DeepSeek, vLLM
Trend analysis ✅
- Keyword extraction, topic modeling, word‑cloud generation
- LLM‑driven deep analysis (hotspots, trends, innovations)
Web UI development
Scheduling functionality
Testing & optimization
UI beautification
Add WeChat public account integration

🧪 Testing

# Test paper crawler
python test/test_fetcher.py

# Test summarizer
python test/test_summarizer.py

# Test trend analyzer
python test/test_analyzer.py

# Run full pipeline
python main.py

📊 Generated Files

data/
├── papers/
│   ├── papers_YYYY-MM-DD.json   # Daily paper data
│   └── latest.json              # Latest paper data
├── summaries/
│   ├── summaries_YYYY-MM-DD.json# Daily summaries
│   └── latest.json              # Latest summaries
└── analysis/
    ├── wordcloud_YYYY-MM-DD.png # Word‑cloud image
    ├── analysis_YYYY-MM-DD.json # Analysis results
    ├── report_YYYY-MM-DD.md     # Markdown report
    └── latest.json              # Latest analysis data

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Daily arXiv – AI Research Tracker 📚🤖

✨ Features

Core Functions

📸 Interface Preview

🚀 Quick Start

Prerequisites

1. Clone the repository

2. Create a virtual environment

3. Install dependencies

4. Configure environment variables

5. Configure `config.yaml`

6. Run tests

7. Execute the full workflow

8. Start the web service

9. Launch scheduled execution

📂 Project Structure

⚙️ Configuration Details

arXiv Category Codes

LLM Providers

📝 Development Roadmap

🧪 Testing

📊 Generated Files

📖 Documentation

🤝 Contributing

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
config		config
data		data
deploy		deploy
docs		docs
resources		resources
src		src
static		static
test		test
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
main.py		main.py
requirements.txt		requirements.txt
scheduler.py		scheduler.py

Folders and files

Latest commit

History

Repository files navigation

Daily arXiv – AI Research Tracker 📚🤖

✨ Features

Core Functions

📸 Interface Preview

🚀 Quick Start

Prerequisites

1. Clone the repository

2. Create a virtual environment

3. Install dependencies

4. Configure environment variables

5. Configure config.yaml

6. Run tests

7. Execute the full workflow

8. Start the web service

9. Launch scheduled execution

📂 Project Structure

⚙️ Configuration Details

arXiv Category Codes

LLM Providers

📝 Development Roadmap

🧪 Testing

📊 Generated Files

📖 Documentation

🤝 Contributing

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

5. Configure `config.yaml`

Packages