Switch to magika+onnx instead of guesslang+tensorflow by robherley · Pull Request #251 · robherley/snips.sh

robherley · 2026-01-10T05:13:29Z

Closes #245

We get pretty dramatic savings by ditching tensorflow (about 400 MB on amd64), but it looks like the guesslang model is faster at per-file detection.

Guesslang itself is a smaller, less accurate model: it only supports ~54 languages, with a stated 90% accuracy, whereas Magika supports more than 200 content types and states 99% accuracy.

IMO the difference in speed isn't dramatic enough to stick with guesslang.

Note: the data below was collected and summarized by Claude.

Size

Image Size Comparison Summary

Architecture	Remote (TensorFlow)	Local (Magika)	Savings
amd64	529.28 MB	128.54 MB	400.74 MB (75.7%)
arm64	104.88 MB	115.24 MB	-10.36 MB (9.9% larger)
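The Savings column is the old image size minus the new one, expressed as a share of the old size, so a negative value means the image grew. A small Go sketch of that arithmetic, using only the numbers from the table (the `savings` helper is hypothetical, written just for this check):

```go
package main

import "fmt"

// savings returns the size reduction from oldMB to newMB and that reduction
// as a percentage of oldMB. A negative result means the new image is larger.
func savings(oldMB, newMB float64) (deltaMB, pct float64) {
	deltaMB = oldMB - newMB
	return deltaMB, deltaMB / oldMB * 100
}

func main() {
	rows := []struct {
		arch         string
		oldMB, newMB float64
	}{
		{"amd64", 529.28, 128.54}, // remote TensorFlow image -> local Magika image
		{"arm64", 104.88, 115.24},
	}
	for _, r := range rows {
		d, p := savings(r.oldMB, r.newMB)
		fmt.Printf("%s\t%.2f MB (%.1f%%)\n", r.arch, d, p)
	}
}
```

This prints 400.74 MB (75.7%) for amd64 and -10.36 MB (-9.9%) for arm64, matching the table.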

Key Findings:

amd64 (x86_64):

The old TensorFlow-based image had a massive 436 MB layer for TensorFlow libs
The new Magika image uses only 22.3 MB for ONNX Runtime libs
Total savings: ~401 MB (76% reduction)

arm64:

The remote arm64 image didn't actually include TensorFlow (its extra-libs layer was 0 B), likely due to lack of TensorFlow arm64 support at the time
The new local arm64 image is slightly larger (+10 MB) because it now includes ONNX Runtime (18.9 MB), where previously there was nothing
This is actually a feature improvement: arm64 now has full ML inference support via Magika/ONNX

Performance

Initialization Time

Time to load the ML model and prepare for inference.

Library	Time	Memory	Allocations
Magika	3.29 ms	177 KB	2,745
Guesslang	34.08 ms	N/A*	N/A*

*Guesslang uses TensorFlow which manages memory internally.

Takeaway: Magika initializes ~10x faster, making it better for CLI tools or short-lived processes.

Detection Time

Average time to detect the language of a single file (after initialization).

Library	Avg Time	Throughput	Memory/op	Allocs/op
Magika	2.02 ms	0.58 MB/s	~21 KB	12
Guesslang	0.27 ms	4.30 MB/s	~6.5 KB	141

Per-Language Breakdown

Language	Magika (ns/op)	Guesslang (ns/op)	Guesslang Speedup
Go	1,995,021	235,510	8.5x
Python	1,993,068	252,971	7.9x
JavaScript	2,001,053	259,589	7.7x
Rust	2,080,715	265,405	7.8x
Java	2,051,423	288,144	7.1x
TypeScript	2,034,903	265,808	7.7x
Ruby	2,039,749	259,497	7.9x
C++	2,032,912	278,796	7.3x
C	2,022,978	276,233	7.3x
PHP	2,049,994	287,743	7.1x

Takeaway: Guesslang is ~7.5x faster for per-file detection.

Copilot

Pull request overview

This pull request switches the file type detection system from guesslang (based on TensorFlow) to magika (based on ONNX Runtime), enabling better cross-platform support including ARM64 architectures.

Changes:

Replaces TensorFlow/guesslang-go dependencies with ONNX Runtime/magika-go
Adds vendor-onnxruntime script to download and install ONNX Runtime binaries
Updates build system with new env and build scripts for CGO/linker configuration
Removes architecture-specific limitations (previously disabled on ARM64)

Reviewed changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.

File	Description
script/vendor-onnxruntime	New script to download and install ONNX Runtime for the platform
script/run	New script to run the application with proper environment setup
script/env	New script to configure CGO/linker flags for ONNX Runtime
script/build	New script to build the binary with cross-compilation support
script/install-libtensorflow	Removed TensorFlow installation script
internal/renderer/guess.go	Replaced guesslang with magika scanner implementation using lazy initialization
internal/renderer/guess_disabled.go	Updated build tags to remove arm64 restriction
internal/renderer/detect.go	Updated comments to reflect AI guessing instead of Guesslang
internal/config/config_guesser.go	Changed build tag from amd64 to cgo
internal/config/config_noguesser.go	Changed build tag from arm64 to !cgo
internal/config/config.go	Updated description to reference AI model instead of Guesslang
go.mod	Replaced guesslang-go with magika-go dependency
go.sum	Updated dependency checksums
docs/self-hosting.md	Updated documentation for new ONNX-based approach
docs/contributing.md	Updated setup instructions for local development
README.md	Updated technology credits
Dockerfile	Refactored to use ONNX Runtime instead of TensorFlow
.gitignore	Added third_party/ directory
.github/workflows/test.yml	Updated to use vendor-onnxruntime instead of install-libtensorflow
.github/workflows/lint.yml	Updated to use vendor-onnxruntime instead of install-libtensorflow

robherley added 6 commits January 1, 2026 14:04

use magika binary (642f0b0)
switch from guesslang+tensorflow to magika+onnx (eacf02c)
cleanup CI (78d7fc3)
use smaller base img, rm old tensorflow script (7b69996)
revert runtime img (5f3b5b8)
use robherley/magika-go (de031c0)

robherley changed the base branch from main to v1 January 10, 2026 19:20

robherley and others added 3 commits January 10, 2026 14:20

update CI (4dcabb6)
Merge branch 'v1' into robherley/magika (07dac43)
more docs (7fd78db)

robherley marked this pull request as ready for review January 10, 2026 19:24

robherley changed the base branch from v1 to main January 10, 2026 19:25

robherley added 3 commits January 10, 2026 14:34

chore: trigger CI (37f0c1e)
vendor onnx in ci (57fe80d)
cmd+s (0243035)

robherley requested a review from Copilot January 10, 2026 19:48

Copilot started reviewing on behalf of robherley January 10, 2026 19:48

Copilot AI reviewed Jan 10, 2026

.github/workflows/lint.yml (outdated, resolved)
Dockerfile (resolved)
script/vendor-onnxruntime (resolved)
.github/workflows/test.yml (outdated, resolved)

Apply suggestions from code review (46518e4)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

robherley changed the base branch from main to v1 January 10, 2026 21:42

robherley changed the base branch from v1 to main January 10, 2026 21:42

robherley merged commit 660a5f3 into main Jan 10, 2026
5 checks passed

robherley deleted the robherley/magika branch January 10, 2026 21:44

robherley merged 13 commits into main from robherley/magika

2 participants
