A Go-based CLI for mirroring Slack workspace data into local SQLite
for search, querying, and offline inspection.
Slack search is convenient until you need your own workflow, your own retention, or your own queries. slacrawl is a Go-based CLI that pulls Slack workspace metadata and message history into SQLite so you can inspect it without depending on the Slack UI.
Data stays on your machine. You can run it in API mode, desktop mode, or a hybrid workflow that combines both. That covers one-shot syncs, live tailing over Socket Mode, and local desktop recovery or "wiretap" style inspection from Slack Desktop artifacts already on your machine.
- local SQLite storage with full-text search backed by SQLite FTS5
- workspace, channel, user, and message sync
- thread reply backfill when a user token is available
- DM and MPIM sync when a user token is available
- incremental API history sync by default, with --full reserved for deliberate backfills
- sync --latest-only for cheap incremental refreshes on already-seeded channels
- mention extraction for structured querying
- read-only SQL access for ad hoc analysis
- doctor diagnostics for config, database, token, and desktop-source checks
- desktop-local ingestion of workspace metadata, channels, users, cached channel messages, drafts, read markers, recent-channel hints, and custom-status metadata
- optional Socket Mode live tailing via app token
- periodic desktop refresh with watch
- git-backed archive publishing, subscription, and read-time auto-refresh
- multi-workspace storage and filtering
- multi-workspace API sync when [[workspaces]] is configured
- multi-workspace live tailing when per-workspace app tokens are configured
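Mention extraction is possible because Slack stores mentions inside message text as angle-bracket tokens: <@U012AB3CD> for users, <#C0123456|incidents> for channels, and <!here> or <!channel> for broadcasts. The following minimal sketch pulls those tokens into structured records. It is not slacrawl's own extractor, and the Mention type and extractMentions function are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Mention is a hypothetical structured record; slacrawl's actual schema may differ.
type Mention struct {
	Kind   string // "user", "channel", or "special"
	Target string // user ID, channel ID, or keyword such as "here"
}

// mentionRe matches Slack's encoded mention tokens, optionally followed by a
// "|label" part: <@U…>, <#C…|name>, <!here>, <!channel>.
var mentionRe = regexp.MustCompile(`<([@#!])([^>|]+)(?:\|[^>]*)?>`)

// extractMentions returns mentions in the order they appear in the text.
func extractMentions(text string) []Mention {
	var out []Mention
	for _, m := range mentionRe.FindAllStringSubmatch(text, -1) {
		var kind string
		switch m[1] {
		case "@":
			kind = "user"
		case "#":
			kind = "channel"
		default:
			kind = "special"
		}
		out = append(out, Mention{Kind: kind, Target: m[2]})
	}
	return out
}

func main() {
	for _, m := range extractMentions("ping <@U012AB3CD> in <#C0123456|incidents> <!here>") {
		fmt.Println(m.Kind, m.Target)
	}
}
```

Storing mentions as separate rows, instead of searching raw text for "<@", is what makes queries like "every message that mentioned this user" cheap.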
Current scope:
- public channels
- private channels
- top-level messages
- channel threads
- local FTS search
- read-only SQL access
- macOS Slack Desktop discovery
Not yet supported:
- attachment blob downloads
- write-back actions
- public Marketplace-style distribution hardening
- desktop-local message extraction beyond the documented bootstrap surface
If one of those gaps matters to your workflow, open an issue so it can be tracked explicitly.
Requirements:
- Go 1.25+
- node if you want richer desktop-local IndexedDB blob decoding
- a Slack bot token for standard API sync
- an app token if you want to use tail
- an optional user token for fuller historical thread coverage
- macOS Slack Desktop only if you want desktop-local discovery
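Homebrew (macOS):
brew tap vincentkoc/tap
brew install slacrawl
Linux packages from GitHub Releases:
Download the package that matches your platform from the latest release. The file names below include the version (0.5.0). If a newer release is out, update them to match its assets, because the latest/download/ URL only resolves file names that exist in the latest release.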
Debian/Ubuntu:
curl -LO https://github.com/vincentkoc/slacrawl/releases/latest/download/slacrawl_0.5.0_amd64.deb
sudo dpkg -i slacrawl_0.5.0_amd64.deb
RHEL/Fedora:
curl -LO https://github.com/vincentkoc/slacrawl/releases/latest/download/slacrawl-0.5.0-1.x86_64.rpm
sudo rpm -i slacrawl-0.5.0-1.x86_64.rpm
Build from source:
git clone https://github.com/vincentkoc/slacrawl.git
cd slacrawl
go build -o bin/slacrawl ./cmd/slacrawl
./bin/slacrawl --help
Run without building a binary:
git clone https://github.com/vincentkoc/slacrawl.git
cd slacrawl
go run ./cmd/slacrawl --help
Quick start:
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
export SLACK_USER_TOKEN="xoxp-..."
go run ./cmd/slacrawl init
go run ./cmd/slacrawl doctor
go run ./cmd/slacrawl sync --source api
go run ./cmd/slacrawl search --workspace T01234567 "incident"
go run ./cmd/slacrawl analytics trends --weeks 4
go run ./cmd/slacrawl tail --repair-every 30m
go run ./cmd/slacrawl watch --desktop-every 5m
If you built the binary, replace go run ./cmd/slacrawl with ./bin/slacrawl.
tail is the live API side of the tool. watch is the recurring desktop-side refresh loop.
Choose the path that matches your setup:
- use sync --source api for normal incremental syncs
- use sync --source api --full only when you want a deliberate full backfill
- use sync --source api --latest-only when you only want fresh deltas on channels that already have local history
- use sync --source desktop when you want local desktop recovery only
- use watch when you want desktop-local state to refresh into SQLite continuously
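The three API sync paths differ mainly in where history fetching starts for each channel. The minimal sketch below assumes each channel's newest stored message timestamp is known locally and is passed to the Slack Web API's conversations.history as its oldest cursor. SyncMode and planChannel are hypothetical names, not slacrawl's internals.

```go
package main

import "fmt"

// SyncMode stands for the three API sync paths described above (hypothetical type).
type SyncMode int

const (
	Incremental SyncMode = iota // default: sync --source api
	Full                        // sync --source api --full
	LatestOnly                  // sync --source api --latest-only
)

// planChannel decides where history fetching should start for one channel.
// lastTS is the newest locally stored Slack message timestamp ("" if none).
// It returns the value to use as the "oldest" cursor and whether to skip the channel.
func planChannel(mode SyncMode, lastTS string) (oldest string, skip bool) {
	switch mode {
	case Full:
		return "", false // deliberate backfill: ignore local state entirely
	case LatestOnly:
		if lastTS == "" {
			return "", true // no local history yet: leave seeding to a normal sync
		}
		return lastTS, false
	default:
		return lastTS, false // incremental: resume after the newest stored message, or start fresh
	}
}

func main() {
	local := map[string]string{"C0SEEDED": "1700000000.000100", "C0NEW": ""}
	for _, mode := range []SyncMode{Incremental, Full, LatestOnly} {
		for _, id := range []string{"C0SEEDED", "C0NEW"} {
			oldest, skip := planChannel(mode, local[id])
			fmt.Printf("mode=%d channel=%s oldest=%q skip=%v\n", mode, id, oldest, skip)
		}
	}
}
```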
Commands:
- init creates a starter config file
- doctor checks config, DB access, token presence, FTS, and desktop source availability
- report summarizes archive activity and git-share freshness without writing SQL
- publish exports the local SQLite archive into a git repo as compressed JSONL shards plus a manifest
- subscribe configures a git-backed reader that can run without Slack credentials
- update pulls and imports the latest git snapshot
- sync performs a one-shot crawl from API, desktop, or both
- import imports a Slack export ZIP or extracted export directory
- tail listens for live events through Socket Mode, including one tail per configured workspace
- watch refreshes desktop-local state on a schedule
- search runs local FTS queries, optionally filtered by workspace
- messages lists stored messages with filters
- mentions lists structured mention records
- sql runs read-only SQL against the local database
- users lists synced users
- channels lists synced channels
- status prints workspace and sync status
- digest prints a per-channel activity summary for a time window
- analytics groups analytics subcommands (digest, quiet, trends)
- completion prints shell completion for bash or zsh
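Among other checks, doctor verifies that tokens are present. The stdlib-only sketch below shows one way to do that kind of check. It assumes the default environment variable names from config.example.toml and the usual Slack token prefixes shown in the quick start (xoxb- for bot, xapp- for app-level, xoxp- for user tokens). checkTokens is a hypothetical helper, and the real doctor output will look different.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

type tokenSpec struct {
	env, prefix, purpose string
	required             bool
}

// specs lists the default variables; only the bot token is needed for basic API sync.
var specs = []tokenSpec{
	{"SLACK_BOT_TOKEN", "xoxb-", "API sync", true},
	{"SLACK_APP_TOKEN", "xapp-", "tail (Socket Mode)", false},
	{"SLACK_USER_TOKEN", "xoxp-", "thread backfill, DMs and MPIMs", false},
}

// checkTokens reports missing required tokens and tokens with an unexpected
// prefix; getenv is injected so the check can be tested without real secrets.
func checkTokens(getenv func(string) string) []string {
	var problems []string
	for _, s := range specs {
		v := getenv(s.env)
		switch {
		case v == "" && s.required:
			problems = append(problems, fmt.Sprintf("%s missing (needed for %s)", s.env, s.purpose))
		case v == "":
			// Optional token absent: only the features in s.purpose are unavailable.
		case !strings.HasPrefix(v, s.prefix):
			problems = append(problems, fmt.Sprintf("%s does not start with %s", s.env, s.prefix))
		}
	}
	return problems
}

func main() {
	for _, p := range checkTokens(os.Getenv) {
		fmt.Println("doctor:", p)
	}
}
```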
slacrawl import ./my-export.zip --workspace T01234567
slacrawl import ./extracted-export/ --workspace T01234567 --dry-run
Set SLACK_USER_TOKEN with im:history, mpim:history, im:read, and mpim:read scopes to include DMs and MPIMs in API sync.
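A standard Slack export ZIP contains top-level metadata files such as channels.json and users.json, plus one folder per channel holding one JSON file per day (named YYYY-MM-DD.json). The minimal sketch below shows how an importer might walk that layout with archive/zip. It is not slacrawl's import code, and dayFiles and buildZip are hypothetical helpers.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"path"
	"sort"
	"strings"
)

// dayFiles groups "<channel>/<YYYY-MM-DD>.json" entries by channel folder.
func dayFiles(r *zip.Reader) map[string][]string {
	out := map[string][]string{}
	for _, f := range r.File {
		dir, file := path.Split(f.Name)
		dir = strings.TrimSuffix(dir, "/")
		// Skip top-level metadata (channels.json, users.json, ...), nested paths, and non-JSON entries.
		if dir == "" || strings.Contains(dir, "/") || path.Ext(file) != ".json" {
			continue
		}
		out[dir] = append(out[dir], strings.TrimSuffix(file, ".json"))
	}
	for _, days := range out {
		sort.Strings(days) // YYYY-MM-DD names sort chronologically as strings
	}
	return out
}

// buildZip creates an in-memory archive of empty JSON arrays, for demonstration only.
func buildZip(names []string) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte("[]")); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	r, err := buildZip([]string{"channels.json", "users.json", "general/2024-01-02.json", "general/2024-01-01.json"})
	if err != nil {
		panic(err)
	}
	fmt.Println(dayFiles(r)) // map[general:[2024-01-01 2024-01-02]]
}
```

For a real export, zip.OpenReader returns a *zip.ReadCloser whose embedded Reader can be passed as &rc.Reader. An extracted export directory would need a filesystem walk instead.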
analytics digest [--since 7d] [--workspace X] [--channel C]
analytics quiet [--since 30d] [--workspace X]
analytics trends [--weeks 8] [--workspace X] [--channel C]
Planned follow-ups: health, response-times, threads-stale, activity.
The CLI supports three output modes:
- --format text for the styled default terminal view
- --format json or --json for machine-readable output
- --format log for line-oriented, automation-friendly output
Color is disabled automatically when stdout is not a TTY. You can also force plain text with --no-color or NO_COLOR=1.
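The color rules above reduce to a small decision function. This minimal sketch follows the common no-color.org convention that any non-empty NO_COLOR value disables color, and it uses a stdlib-only character-device check for TTY detection. useColor and isTerminal are hypothetical helpers, and whether slacrawl applies exactly these rules is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// useColor applies the precedence described above: --no-color or a non-empty
// NO_COLOR always wins; otherwise color is used only when stdout is a terminal.
func useColor(noColorFlag bool, noColorEnv string, stdoutIsTTY bool) bool {
	if noColorFlag || noColorEnv != "" {
		return false
	}
	return stdoutIsTTY
}

// isTerminal reports whether f is a character device, a common stdlib-only TTY check.
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	return err == nil && info.Mode()&os.ModeCharDevice != 0
}

func main() {
	if useColor(false, os.Getenv("NO_COLOR"), isTerminal(os.Stdout)) {
		fmt.Println("\x1b[1mslacrawl\x1b[0m status") // bold via ANSI escape
	} else {
		fmt.Println("slacrawl status")
	}
}
```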
make build
make test
make run ARGS="status"
make completion
Completion files are generated into dist/completions/.
Generate completion scripts with:
go run ./cmd/slacrawl completion bash
go run ./cmd/slacrawl completion zsh
Or use the Makefile:
make completion
Typical install locations:
# bash
go run ./cmd/slacrawl completion bash > /usr/local/etc/bash_completion.d/slacrawl
# zsh
mkdir -p "${HOME}/.zsh/completions"
go run ./cmd/slacrawl completion zsh > "${HOME}/.zsh/completions/_slacrawl"
Default paths:
- config: ~/.slacrawl/config.toml
- database: ~/.slacrawl/slacrawl.db
- cache: ~/.slacrawl/cache
- logs: ~/.slacrawl/logs
For one workspace, keep using the top-level [slack.bot], [slack.app], and [slack.user] token config.
For multiple API workspaces or multiple live wiretap/tail sessions, add [[workspaces]] entries with per-workspace token env vars:
workspace_id = "T01234567"
[[workspaces]]
id = "T01234567"
default = true
[[workspaces]]
id = "T08976543"
bot_token_env = "SLACK_CLIENT_BOT_TOKEN"
app_token_env = "SLACK_CLIENT_APP_TOKEN"
user_token_env = "SLACK_CLIENT_USER_TOKEN"
By default, each workspace entry automatically looks for SLACK_<WORKSPACE_ID>_BOT_TOKEN, SLACK_<WORKSPACE_ID>_APP_TOKEN, and SLACK_<WORKSPACE_ID>_USER_TOKEN, so you only need the id in the common case. Top-level enabled flags still apply globally, which avoids repeating enabled = true per workspace.
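The default lookup rule above can be expressed as a small resolver. This sketch assumes the override fields map one-to-one onto the bot_token_env, app_token_env, and user_token_env keys shown in the config. Workspace and tokenEnv are hypothetical names, and the sketch does not cover how slacrawl combines these per-workspace tokens with the top-level [slack.*] tokens.

```go
package main

import (
	"fmt"
	"os"
)

// Workspace holds the token-related fields of one [[workspaces]] entry (hypothetical struct).
type Workspace struct {
	ID           string
	BotTokenEnv  string // bot_token_env; empty means use the default name
	AppTokenEnv  string // app_token_env
	UserTokenEnv string // user_token_env
}

// tokenEnv returns the environment variable to read for kind ("BOT", "APP", or
// "USER"): the explicit *_token_env override when set, else SLACK_<ID>_<KIND>_TOKEN.
func tokenEnv(ws Workspace, kind string) string {
	var override string
	switch kind {
	case "BOT":
		override = ws.BotTokenEnv
	case "APP":
		override = ws.AppTokenEnv
	case "USER":
		override = ws.UserTokenEnv
	}
	if override != "" {
		return override
	}
	return fmt.Sprintf("SLACK_%s_%s_TOKEN", ws.ID, kind)
}

func main() {
	workspaces := []Workspace{
		{ID: "T01234567"},
		{ID: "T08976543", BotTokenEnv: "SLACK_CLIENT_BOT_TOKEN", AppTokenEnv: "SLACK_CLIENT_APP_TOKEN", UserTokenEnv: "SLACK_CLIENT_USER_TOKEN"},
	}
	for _, ws := range workspaces {
		for _, kind := range []string{"BOT", "APP", "USER"} {
			name := tokenEnv(ws, kind)
			_, set := os.LookupEnv(name)
			fmt.Printf("%s %-4s -> %s (set: %v)\n", ws.ID, kind, name, set)
		}
	}
}
```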
Without --workspace, sync --source api and tail fan out across every configured workspace entry. Read commands such as search, messages, mentions, users, and channels accept --workspace to scope the shared local database when needed.
Use git-share mode when one machine has Slack credentials and should publish snapshots, while other machines only need a local read-only archive.
Typical split:
- publisher machine: runs sync, then publish --push
- subscriber machine: runs subscribe, then reads from local SQLite with optional read-time auto-refresh
Git-backed archive sharing is configured under [share]:
[share]
remote = "git@github.com:your-org/private-slacrawl-archive.git"
repo_path = "~/.slacrawl/share"
branch = "main"
auto_update = true
stale_after = "15m"
Behavior:
- publish writes gzipped JSONL shards plus manifest.json into repo_path
- subscribe writes a git-reader config, disables Slack API and desktop sources for that config, clones the repo, and imports the snapshot
- pass --db to subscribe when you want the reader archive to land in a non-default SQLite path
- update pulls and re-imports only when the manifest changes
- status, search, messages, mentions, sql, users, channels, and report auto-refresh stale git snapshots before reading when auto_update = true
- sync --source api and sync --source all warm from the git snapshot before hitting Slack when a share remote is configured
- status and doctor surface the current git-share repo, last import time, and whether the local snapshot is stale
publish is the writer-side command. It exports the current SQLite archive into the git share repo and can commit/push it in one step.
go run ./cmd/slacrawl publish --remote /path/to/private/slacrawl-archive.git --push
go run ./cmd/slacrawl publish --repo ~/.slacrawl/share --branch main --message "archive: daily refresh" --push
Relevant flags:
- --repo chooses the local git working repo path
- --remote sets or overrides the git remote used for publish
- --branch chooses the target branch
- --message sets the git commit message
- --no-commit exports files without creating a git commit
- --push pushes the new commit to origin
subscribe is the reader-side setup command. It clones the git archive, writes a share-reader config, disables live Slack sources for that config, and imports the snapshot into SQLite.
go run ./cmd/slacrawl subscribe --repo ~/.slacrawl/share --db ~/.slacrawl/slacrawl.db /path/to/private/slacrawl-archive.git
go run ./cmd/slacrawl subscribe --remote git@github.com:your-org/private-slacrawl-archive.git --stale-after 30m
go run ./cmd/slacrawl subscribe --repo ~/.slacrawl/share --no-import --no-auto-update /path/to/private/slacrawl-archive.git
Relevant flags:
- --repo chooses the local clone path
- --db chooses the SQLite file used by the reader
- --branch chooses which branch to track
- --remote stores the remote in config without requiring it as a positional arg
- --stale-after controls when read-time refresh considers the local snapshot stale
- --no-auto-update disables read-time refresh for search/status/report-style commands
- --no-import skips the initial snapshot import
update is the explicit reader-side refresh. Use it when you want to pull and import on demand instead of waiting for automatic stale checks.
go run ./cmd/slacrawl update
go run ./cmd/slacrawl update --repo ~/.slacrawl/share --branch main
report is the fastest human-readable archive summary and is especially handy in git-share mode because it shows the current archive footprint plus share freshness.
go run ./cmd/slacrawl report
Typical publish / subscribe flow:
# publisher
go run ./cmd/slacrawl sync --source api --latest-only
go run ./cmd/slacrawl publish --remote /path/to/private/slacrawl-archive.git --push
# subscriber
go run ./cmd/slacrawl subscribe --repo ~/.slacrawl/share --db ~/.slacrawl/slacrawl.db /path/to/private/slacrawl-archive.git
go run ./cmd/slacrawl search incident
The starter config lives in config.example.toml. By default it points to these environment variables:
- SLACK_BOT_TOKEN
- SLACK_APP_TOKEN
- SLACK_USER_TOKEN
Desktop discovery is enabled by default and uses:
~/Library/Containers/com.tinyspeck.slackmacgap/Data/Library/Application Support/Slack
Desktop config notes:
- set [slack.desktop].enabled = false to disable desktop ingestion
- leave [slack.desktop].path = "" to auto-detect the macOS Slack path
- set a custom absolute path if Slack Desktop data lives elsewhere
- set enabled = false under [slack.bot], [slack.app], or [slack.user] to ignore that token source entirely
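Path resolution for desktop discovery can be sketched as follows. The sketch assumes that auto-detection means falling back to the default macOS location listed above and that a custom path must be absolute, as the notes suggest. resolveDesktopPath is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultDesktopSubpath is the macOS Slack Desktop location quoted above,
// relative to the user's home directory.
var defaultDesktopSubpath = filepath.Join("Library", "Containers", "com.tinyspeck.slackmacgap",
	"Data", "Library", "Application Support", "Slack")

// resolveDesktopPath applies the config rules: an empty path means use the
// default location; otherwise the configured value must be absolute.
func resolveDesktopPath(configured, home string) (string, error) {
	if configured == "" {
		return filepath.Join(home, defaultDesktopSubpath), nil
	}
	if !filepath.IsAbs(configured) {
		return "", fmt.Errorf("desktop path %q must be absolute", configured)
	}
	return filepath.Clean(configured), nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		panic(err)
	}
	p, err := resolveDesktopPath("", home)
	if err != nil {
		panic(err)
	}
	if _, statErr := os.Stat(p); statErr != nil {
		fmt.Println("Slack Desktop data not found at", p)
		return
	}
	fmt.Println("using", p)
}
```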
go run ./cmd/slacrawl init
go run ./cmd/slacrawl sync --source api
go run ./cmd/slacrawl status
go run ./cmd/slacrawl report
go run ./cmd/slacrawl digest --since 7d
go run ./cmd/slacrawl channels
go run ./cmd/slacrawl messages --channel C12345678 --limit 20
go run ./cmd/slacrawl mentions --limit 20
go run ./cmd/slacrawl sql 'select channel_id, count(*) as messages from messages group by channel_id order by messages desc limit 10;'
Notes:
- Full historical thread reply coverage in public and private channels depends on providing a user token.
- tail requires an app token because it uses Socket Mode.
- SQLite FTS5 is the built-in full-text index that powers fast local text search without an external search server.
- Indexed text is sanitized before it reaches FTS, so malformed UTF-8, zero-width characters, and odd whitespace do not poison search.
- Desktop-local support is broader than simple discovery, but it is still not a full write-back or export-import path.
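The sanitization note above can be illustrated with a stdlib-only sketch. sanitizeForFTS is a hypothetical helper, not slacrawl's implementation. It replaces invalid UTF-8 byte sequences with spaces, strips a handful of zero-width code points, and collapses whitespace runs into single spaces. Those runs include non-breaking spaces, which unicode.IsSpace recognizes.

```go
package main

import (
	"fmt"
	"strings"
)

// zeroWidth removes invisible code points that often sneak into pasted Slack text.
var zeroWidth = strings.NewReplacer(
	"\u200b", "", // zero width space
	"\u200c", "", // zero width non-joiner
	"\u200d", "", // zero width joiner
	"\u2060", "", // word joiner
	"\ufeff", "", // byte order mark / zero width no-break space
)

// sanitizeForFTS turns invalid UTF-8 runs into spaces, drops zero-width
// characters, and collapses all whitespace (tabs, newlines, NBSP) to single spaces.
func sanitizeForFTS(s string) string {
	s = strings.ToValidUTF8(s, " ")
	s = zeroWidth.Replace(s)
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	raw := "inci\u200bdent\t\n  report\xff\u00a0now"
	fmt.Printf("%q\n", sanitizeForFTS(raw)) // "incident report now"
}
```

Stripping U+200D also splits emoji ZWJ sequences into their component emoji. That is usually acceptable for a search index but would not be for displayed text.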
go test ./...
go build ./cmd/slacrawl
See CONTRIBUTING.md for contribution workflow and SPEC.md for the implementation contract.
Built by Vincent Koc · MIT
