A Python-based tool that aggregates data from multiple cyber threat intelligence sources, stores that data over time in a SQLite database, and generates executive summaries of key findings to help human analysts track and analyze threat actor campaigns. This is a proof-of-concept tool, meant to serve as an example and a starting point for analysts solving problems with AI assistance.
- Scrapes data from various cyber threat intelligence blogs, articles, and RSS feeds
- Extracts Indicators of Compromise (IOCs) such as IP addresses, domains, hashes, and more
- Stores raw data and IOCs in a SQLite database for historical tracking
- Generates AI-powered summaries for each article using Claude AI
- Creates executive summaries highlighting critical findings for non-technical stakeholders
- Supports multiple output formats (HTML, Markdown, JSON)
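IOC extraction of the kind listed above is typically regex-driven. The sketch below is illustrative only — the pattern set and function name are assumptions, not the tool's actual implementation:

```python
import re

# Illustrative patterns for a few common IOC types (not exhaustive,
# and not the tool's real pattern set).
IOC_PATTERNS = {
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|io|ru|cn)\b"),
}

def extract_iocs(text):
    """Return a list of (type, value) tuples for every pattern match in text."""
    found = []
    for ioc_type, pattern in IOC_PATTERNS.items():
        for match in pattern.findall(text):
            found.append((ioc_type, match))
    return found
```

Real-world extractors also handle "defanged" indicators (e.g. `hxxp://` or `1.2.3[.]4`), which the sketch above does not.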
- Python 3.8 or higher
- pip (Python package installer)
- Clone the repository or download the source code:
  ```bash
  git clone https://github.com/deruke/prism
  cd prism
  ```
- Create a virtual environment (recommended):

  With Python Virtual Environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

  With conda:

  ```bash
  conda create -n "prism" python=3.12
  conda activate prism
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Create a configuration file:

  ```bash
  cp config.yaml.example config.yaml
  ```
- Edit the configuration file with your settings:
- Add your Anthropic API key
- Configure your preferred intelligence sources
- Adjust database and reporting settings as needed
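Once `config.yaml` is filled in, loading and sanity-checking it with pyyaml (one of the tool's listed dependencies) might look like this. The section names mirror the configuration examples later in this README; the function itself is a sketch, not the tool's actual code:

```python
import yaml  # provided by the pyyaml dependency

def load_config(path="config.yaml"):
    """Load config.yaml and check that the expected top-level sections exist."""
    with open(path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    # Section names assumed from the configuration examples in this README.
    for section in ("database", "sources", "ai", "reporting"):
        if section not in config:
            raise KeyError("config.yaml is missing the '%s' section" % section)
    return config
```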
The Cyber Threat Intelligence Aggregator can be run in several modes:
To collect new threat intelligence from the configured sources:
```bash
python prism.py --scrape
```

To generate AI summaries for articles that don't have summaries yet:

```bash
python prism.py --analyze
```

To create an executive summary report from recent intelligence:

```bash
python prism.py --report
```

To run the entire workflow (scrape, analyze, and report) in a single command:

```bash
python prism.py --full-run
```

To see all available options:
```bash
python prism.py --help
```

The config.yaml file contains all the settings for the CTI Aggregator:
```yaml
database:
  path: data/cti.db  # Path to SQLite database file
```

You can configure multiple sources of different types:

```yaml
sources:
  - name: Krebs on Security
    url: https://krebsonsecurity.com
    type: rss
    feed_url: https://krebsonsecurity.com/feed/
```

```yaml
sources:
  - name: CISA Alerts
    url: https://www.cisa.gov/news-events/cybersecurity-advisories
    type: web
    article_selector: a.usa-link
    content_selector: div.usa-prose
    url_include_patterns:
      - /cybersecurity-advisories/
```

```yaml
ai:
  api_key: YOUR_ANTHROPIC_API_KEY_HERE
  model: claude-3-opus-20240229
```

```yaml
reporting:
  output_directory: reports
  time_window_days: 30  # How many days of data to include in reports
```

The tool uses a SQLite database with the following structure:
Stores the raw articles and their summaries:
- id: Unique identifier
- source: Source name
- title: Article title
- url: Article URL
- author: Author name
- published_date: Publication date
- content: Article content
- summary: AI-generated summary
- scraped_date: Date when article was scraped
- analyzed_date: Date when article was analyzed
Stores Indicators of Compromise extracted from articles:
- id: Unique identifier
- article_id: Reference to the article
- type: IOC type (ip, domain, hash, etc.)
- value: IOC value
- context: Context around the IOC
Stores additional metadata tags for articles:
- id: Unique identifier
- article_id: Reference to the article
- tag: Tag value
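Under assumed column types (the README does not specify them), the three tables could be created with the standard library's sqlite3 module. Table and column names follow the field lists above; everything else is a sketch, not the tool's actual DDL:

```python
import sqlite3

# Assumed schema: names follow the field lists above, types are TEXT/INTEGER guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    title TEXT,
    url TEXT UNIQUE,
    author TEXT,
    published_date TEXT,
    content TEXT,
    summary TEXT,
    scraped_date TEXT,
    analyzed_date TEXT
);
CREATE TABLE IF NOT EXISTS iocs (
    id INTEGER PRIMARY KEY,
    article_id INTEGER REFERENCES articles(id),
    type TEXT NOT NULL,
    value TEXT NOT NULL,
    context TEXT
);
CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY,
    article_id INTEGER REFERENCES articles(id),
    tag TEXT NOT NULL
);
"""

def init_db(path="data/cti.db"):
    """Open (or create) the SQLite database and ensure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```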
The tool can generate reports in multiple formats:
A formatted HTML report suitable for viewing in a browser with structured sections for:
- Executive Summary
- Key Threat Actors
- Critical IOCs
- Strategic Recommendations
- Recent Threat Intelligence
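An HTML report along these lines could be rendered with jinja2, which is already in the dependency list. The inline template and variable names below are purely illustrative; the actual tool's templates will differ:

```python
from jinja2 import Environment

# Illustrative inline template; a real tool would load templates from disk.
TEMPLATE = """\
<h1>Executive Summary</h1>
<p>{{ summary }}</p>
<h2>Critical IOCs</h2>
<ul>
{% for ioc in iocs %}  <li>{{ ioc.type }}: {{ ioc.value }}</li>
{% endfor %}</ul>
"""

def render_html_report(summary, iocs):
    """Render the executive summary and IOC list as an HTML fragment."""
    template = Environment(autoescape=True).from_string(TEMPLATE)
    return template.render(summary=summary, iocs=iocs)
```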
A Markdown-formatted report that can be viewed on GitHub or converted to other formats.
A structured JSON format that can be ingested by other tools or applications.
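A minimal JSON writer might look like the following; the report fields shown are hypothetical, since the actual JSON schema is defined by the tool:

```python
import json
from datetime import date

def write_json_report(articles, path):
    """Serialize a simple report structure to JSON for downstream tools."""
    # Field names here are illustrative, not the tool's actual schema.
    report = {
        "generated": date.today().isoformat(),
        "article_count": len(articles),
        "articles": articles,
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, default=str)
```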
The following Python libraries are required:
- requests
- beautifulsoup4
- feedparser
- pyyaml
- anthropic
- markdown
- jinja2
See requirements.txt for specific versions.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.