The fastest JSON parser and serializer, written in hand-optimized assembly (x86-64 & ARM64) with a clean C API.
json-asm is a high-performance JSON library that pushes the boundaries of parsing and serialization speed through hand-optimized assembly. It leverages every available SIMD instruction set and CPU feature for maximum throughput.
- Multi-architecture assembly - Hand-optimized for x86-64 (AVX-512/AVX2/SSE4.2) and ARM64 (NEON/SVE/SVE2)
- Maximum SIMD utilization - 512-bit vector operations where available
- Branch prediction optimization - Computed gotos, branchless algorithms, cmov usage
- Cache hierarchy optimization - Prefetching, cache-line alignment, TLB-friendly allocation
- Zero-copy parsing - Minimal memory allocations during parse operations
- Lock-free data structures - Wait-free progress on multi-threaded access
- Simple C API - Easy to integrate, no C++ runtime dependencies
| Architecture | SIMD Support | Status |
|---|---|---|
| x86-64 | AVX-512, AVX2, SSE4.2, BMI2, POPCNT | Full |
| ARM64 | SVE2, SVE, NEON | Full |
Preliminary benchmark results on available hardware:
Tested with 399-byte JSON document (44 values, nested objects/arrays):
| Operation | Throughput | Time per operation | Memory per value |
|---|---|---|---|
| Parse | ~290 MB/s | 1.38 μs | 24 bytes |
| Stringify | ~275 MB/s | 1.45 μs | 24 bytes |
Test Environment:
- CPU: AMD EPYC 7282 16-Core Processor
- SIMD: AVX2 (256-bit vectors)
- Compiler: GCC with -O3 -march=native
- Test data: Small JSON with mixed types
Note: These are preliminary results from development hardware. Performance will vary significantly based on:
- CPU architecture and generation
- Available SIMD instructions (SSE4.2 vs AVX2 vs AVX-512)
- JSON document size and structure
- Memory bandwidth and cache characteristics
The library automatically selects the best SIMD implementation for your CPU at runtime.
Comprehensive benchmarks comparing against other JSON libraries coming soon.
#include <json_asm.h>
int main(void) {
const char *json = "{\"name\": \"json-asm\", \"fast\": true}";
// Parse JSON
json_doc *doc = json_parse(json, strlen(json));
if (!doc) {
fprintf(stderr, "Parse error\n");
return 1;
}
// Access values
json_val *root = json_doc_root(doc);
json_val *name = json_obj_get(root, "name");
printf("Name: %s\n", json_get_str(name));
// Cleanup
json_doc_free(doc);
return 0;
}# Requirements:
# x86-64: NASM 2.15+, GCC 10+/Clang 12+
# ARM64: GCC 10+/Clang 12+ (uses inline assembly)
make
make install # installs to /usr/local by default
# Run tests
make test
# Run benchmarks
make benchSee docs/building.md for detailed build instructions.
- API Reference - Complete C API documentation
- Architecture - Internal design, SIMD optimizations, CPU features
- Benchmarks - Performance methodology and comparisons
- Building - Build instructions for x86-64 and ARM64
- Examples - Usage examples and patterns
Modern compilers are excellent, but JSON parsing has predictable patterns that benefit from manual optimization:
- AVX-512 masked operations - Process 64 bytes with predicated loads/stores
- AVX2 string scanning - Find structural chars across 32 bytes simultaneously
- BMI2 PEXT/PDEP - Branchless bit manipulation for escape handling
- POPCNT/LZCNT/TZCNT - O(1) bit counting for mask processing
- SVE/SVE2 variable-length vectors - Scalable from 128 to 2048 bits
- NEON 128-bit operations - Parallel character classification
- AdvSIMD dot product - Fast numeric accumulation
- Load/store pair instructions - Efficient memory access patterns
- Branchless number parsing - Eliminates misprediction penalties (~15 cycles)
- Computed goto dispatch - O(1) state machine transitions
- Explicit prefetching - Hide memory latency for sequential access
- Register pressure optimization - Keep hot data in registers
- Instruction-level parallelism - Maximize superscalar execution
See docs/architecture.md for technical details.
json-asm uses hand-optimized assembly to achieve high performance:
| Aspect | json-asm |
|---|---|
| Implementation | Native assembly (x86-64 + ARM64) |
| SIMD x86-64 | AVX-512/AVX2/SSE4.2 hand-written |
| SIMD ARM64 | NEON/SVE/SVE2 hand-written |
| String scanning | Custom SIMD routines |
| Number parsing | Branchless assembly algorithms |
| Memory layout | Compact 24 bytes/value |
By controlling every instruction in hot paths, assembly allows for:
- Explicit SIMD instruction selection
- Branchless algorithms to avoid misprediction penalties
- Cache-line alignment and prefetching control
- Register allocation optimization
json-asm automatically detects and uses the best available instructions:
x86-64 dispatch (best to worst):
AVX-512 + BMI2 → AVX2 + BMI2 → AVX2 → SSE4.2 → Scalar
ARM64 dispatch (best to worst):
SVE2 → SVE → NEON → Scalar
Runtime detection ensures optimal performance on any conforming CPU.
// Parsing
json_doc *json_parse(const char *json, size_t len);
json_doc *json_parse_file(const char *path);
// Document access
json_val *json_doc_root(json_doc *doc);
void json_doc_free(json_doc *doc);
// Value inspection
json_type json_get_type(json_val *val);
const char *json_get_str(json_val *val);
double json_get_num(json_val *val);
int64_t json_get_int(json_val *val);
bool json_get_bool(json_val *val);
// Object/Array access
json_val *json_obj_get(json_val *obj, const char *key);
json_val *json_arr_get(json_val *arr, size_t index);
// Serialization
char *json_stringify(json_val *val);See docs/api.md for the complete API reference.
- CPU: x86-64-v2 minimum (SSE4.2), x86-64-v4 recommended (AVX-512)
- OS: Linux, macOS, Windows, FreeBSD
- Build: NASM 2.15+, GCC 10+ or Clang 12+
- CPU: ARMv8-A minimum, ARMv9-A recommended (SVE2)
- OS: Linux, macOS (Apple Silicon), Android
- Build: GCC 10+ or Clang 12+
MIT License. See LICENSE for details.
Contributions welcome! Performance improvements should include benchmark results. Please read the architecture docs before working on assembly code.