Streaming data comes in many forms. We've looked at routing data in motion using Kafka pipelines. This project turns to analytics: a historical review of streaming data that has already been captured. It addresses a common concern: once real-time signals are captured, how do analysts detect patterns in the data?
Batch and streaming analysis use much the same detection logic, just applied in a different context. Many of our earlier tools will be useful in this analysis as well.
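To illustrate how one piece of detection logic can serve both batch and streaming work, here is a minimal sketch. `detect_bursts` is a hypothetical helper, not code from this project's notebook, and the window size and threshold factor are arbitrary illustrative values. Because it consumes any iterable one value at a time, the same function can scan a finished historical list or a live generator.

```python
from collections import deque
from typing import Iterable, Iterator


def detect_bursts(counts: Iterable[int], window: int = 3, factor: float = 2.0) -> Iterator[int]:
    """Yield the index of each count that exceeds `factor` times the mean of
    the preceding `window` counts (hypothetical rule, for illustration)."""
    recent: deque[int] = deque(maxlen=window)
    for i, count in enumerate(counts):
        # Only judge a value once a full window of history is available.
        if len(recent) == window and count > factor * (sum(recent) / window):
            yield i
        recent.append(count)


# Batch: a complete, historical list of per-minute post counts (invented data).
batch = [5, 6, 5, 4, 30, 6, 5]
print(list(detect_bursts(batch)))  # [4]


# Streaming: the same function consumes a generator one value at a time.
def stream():
    yield from batch


print(list(detect_bursts(stream())))  # [4]
```

The design choice here is to keep state in a bounded `deque`, so memory use stays constant no matter how long the stream runs.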
Two independent worlds are represented in separate DuckDB files (civic_world_a.duckdb and civic_world_b.duckdb).
Each world provides information about social media posting behavior.
To protect privacy, the platform provider doesn't expose the account holder or the exact content they posted.
Instead, they implement a "privacy-preserving" API that shares aggregate information.
Members of the public and analytics researchers can query this proposed API to look for indications of possible coordination or manipulation.
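The kind of aggregation such an API might perform can be sketched as follows. This is not the platform's actual API: the post fields (`account`, `topic`, `minute`, `text`), the 10-minute bucket size, and the function name `aggregate_posts` are all hypothetical. The point is that the output carries only counts, never account IDs or post text.

```python
import hashlib
from collections import Counter, defaultdict


def aggregate_posts(posts, bucket_minutes=10):
    """Summarize raw posts into per-(topic, time bucket) aggregates.

    Each post is a dict with hypothetical keys 'account', 'topic',
    'minute', and 'text'. The output contains no account IDs or text:
    only post counts, distinct-account counts, and the size of the
    largest group of identical texts (found by comparing hashes)."""
    groups = defaultdict(list)
    for post in posts:
        bucket = post["minute"] // bucket_minutes
        groups[(post["topic"], bucket)].append(post)

    summary = {}
    for key, group in groups.items():
        # Hashes are used internally to spot duplicates; they are not returned.
        text_hashes = Counter(
            hashlib.sha256(p["text"].encode("utf-8")).hexdigest() for p in group
        )
        summary[key] = {
            "posts": len(group),
            "accounts": len({p["account"] for p in group}),
            "max_identical": max(text_hashes.values()),
        }
    return summary


posts = [
    {"account": "u1", "topic": "transit", "minute": 1, "text": "Vote yes!"},
    {"account": "u2", "topic": "transit", "minute": 3, "text": "Vote yes!"},
    {"account": "u3", "topic": "transit", "minute": 12, "text": "Bus times?"},
]
for key, stats in aggregate_posts(posts).items():
    print(key, stats)
# ('transit', 0) {'posts': 2, 'accounts': 2, 'max_identical': 2}
# ('transit', 1) {'posts': 1, 'accounts': 1, 'max_identical': 1}
```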
Purpose: Data analysts explore the two worlds of simulated behavior to see if they can detect possible coordination.
- Analyze civic_world_a.duckdb and civic_world_b.duckdb.
- Determine which world shows organic civic discourse and which world shows coordinated manipulation.
- Use the provided Jupyter Notebook (analysis.ipynb) for analysis.
Interactive Charts:
- The notebook uses interactive Plotly charts which do not render on GitHub.
- Recommended: See Getting Started below to get the analysis working locally on your machine.
- For an interactive preview, see MyBinder Analysis Notebook (it's free, please be patient).
/data/
/worlds/ # DuckDB files for analysis
civic_world_a.duckdb
civic_world_b.duckdb
/docs/ # Background information
/notebooks/
analysis.ipynb # Partially implemented analysis
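When working from the notebook, it can help to locate the world files programmatically instead of hard-coding paths. The sketch below assumes the DuckDB files live in `data/worlds/` under the repository root, as the layout above suggests; `find_world_files` is a hypothetical helper, not part of the project.

```python
from __future__ import annotations

from pathlib import Path


def find_world_files(repo_root: str | Path = ".") -> dict[str, Path]:
    """Map each world name (e.g. 'civic_world_a') to its DuckDB file,
    assuming the data/worlds/ layout (hypothetical helper)."""
    worlds_dir = Path(repo_root) / "data" / "worlds"
    # Only *.duckdb files count as worlds; other files are ignored.
    return {path.stem: path for path in sorted(worlds_dir.glob("*.duckdb"))}
```

Each resulting path could then be opened with the `duckdb` Python package, for example with `duckdb.connect(str(path), read_only=True)`, so the analysis cannot accidentally modify the prepared data.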
These six prepared views help compare the two worlds:
- View 1: Compares burst and synchrony across topics
- View 2: Examines temporal posting patterns (the "when")
- View 3: Analyzes the correlation between account age and automation
- View 4: Measures content coordination (duplication patterns)
- View 5: Identifies highest-scoring suspicious events
- View 6: Synthesizes all signals for an overall assessment
Each view tests a hypothesis about coordinated vs organic behavior.
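The views are already prepared in the notebook, but the kinds of signals they compare can be sketched with the standard library. The functions below are hypothetical illustrations of measures similar in spirit to Views 1, 2, and 4; the metric definitions, the night window, and the sample data are invented for this example and are not taken from the notebook or the DuckDB files.

```python
import statistics
from collections import Counter


def burst_score(counts):
    """Peak count divided by the median count across one topic's time
    buckets. A large ratio flags a sudden spike rather than a gradual rise."""
    median = statistics.median(counts)
    return max(counts) / median if median else float("inf")


def synchrony(first_post_minutes, tolerance=1):
    """Share of accounts whose first post on a topic lands within
    `tolerance` minutes of the most common start minute."""
    mode = Counter(first_post_minutes).most_common(1)[0][0]
    close = sum(1 for m in first_post_minutes if abs(m - mode) <= tolerance)
    return close / len(first_post_minutes)


def duplication_ratio(text_hashes):
    """Fraction of posts whose content hash appears more than once."""
    counts = Counter(text_hashes)
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(text_hashes)


def night_share(post_hours, night=range(0, 6)):
    """Fraction of posts made between hours 0 and 5 (assumed local time);
    sustained night activity can hint at automation."""
    return sum(1 for h in post_hours if h in night) / len(post_hours)


# Hypothetical per-topic samples for two worlds (invented, not project data).
organic = {"counts": [4, 5, 6, 5, 7], "starts": [0, 7, 15, 22, 40], "hashes": list("abcde")}
coordinated = {"counts": [2, 3, 40, 3, 2], "starts": [10, 10, 11, 10, 30], "hashes": list("xxxyz")}

for name, world in [("organic", organic), ("coordinated", coordinated)]:
    print(
        name,
        round(burst_score(world["counts"]), 2),
        synchrony(world["starts"]),
        duplication_ratio(world["hashes"]),
    )
# organic 1.4 0.2 0.0
# coordinated 13.33 0.8 0.6
```

No single signal proves coordination; the idea behind a synthesis view is that several signals rising together are harder to explain as organic behavior.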
- Copy this template repo into your GitHub account.
- Clone your new buzzline-06-world repository down to your machine.
- Create and activate your local project virtual environment (.venv) and install key tools.
Follow the standard project setup described at pro-analytics-01 for more detailed instructions.
These commands:
- Create a local project virtual environment in a folder named .venv.
- Activate the virtual environment.
- Install and upgrade key tools in .venv.
- Install and upgrade required project dependencies.
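Before opening the notebook, it can be worth confirming that the interpreter in use really belongs to the activated environment. The check below relies on standard Python behavior: inside a virtual environment, `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation. `in_virtual_env` is a hypothetical helper name, not part of the project.

```python
import sys


def in_virtual_env() -> bool:
    """True when the running interpreter belongs to a virtual environment,
    such as one created by `python -m venv` or `uv venv`."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtual_env())
    # Should point inside the project's .venv folder when activation worked.
    print("interpreter:", sys.executable)
```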
Windows PowerShell (recommended Option A + requirements.txt)
py -m venv .venv
.\.venv\Scripts\activate
py -m pip install --upgrade pip setuptools wheel
py -m pip install --upgrade -r requirements.txt
Windows PowerShell (advanced Option B + pyproject.toml)
uv venv
.\.venv\Scripts\activate
uv pip install --upgrade pip setuptools wheel
uv pip install -e ".[dev]"
Mac/Linux/WSL (recommended Option A + requirements.txt)
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install --upgrade -r requirements.txt
Mac/Linux/WSL (advanced Option B + pyproject.toml)
uv venv
source .venv/bin/activate
uv pip install --upgrade pip setuptools wheel
uv pip install -e ".[dev]"
Execute notebooks.
git add .
git commit -m "custom message"
git push -u origin main
Examples of comparing the social media behavior of the two worlds (without compromising privacy).

