Skip to content

⚡ The fastest JSON parser/serializer - hand-optimized x86-64 & ARM64 assembly with C API. AVX-512/AVX2/NEON SIMD, 24-byte nodes, zero-copy parsing

License

Notifications You must be signed in to change notification settings

sebyx07/json-asm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

json-asm

The fastest JSON parser and serializer, written in hand-optimized assembly (x86-64 & ARM64) with a clean C API.

Overview

json-asm is a high-performance JSON library that pushes the boundaries of parsing and serialization speed through hand-optimized assembly. It leverages every available SIMD instruction set and CPU feature for maximum throughput.

Key Features

  • Multi-architecture assembly - Hand-optimized for x86-64 (AVX-512/AVX2/SSE4.2) and ARM64 (NEON/SVE/SVE2)
  • Maximum SIMD utilization - 512-bit vector operations where available
  • Branch prediction optimization - Computed gotos, branchless algorithms, cmov usage
  • Cache hierarchy optimization - Prefetching, cache-line alignment, TLB-friendly allocation
  • Zero-copy parsing - Minimal memory allocations during parse operations
  • Lock-free data structures - Wait-free progress on multi-threaded access
  • Simple C API - Easy to integrate, no C++ runtime dependencies

Supported Architectures

Architecture SIMD Support Status
x86-64 AVX-512, AVX2, SSE4.2, BMI2, POPCNT Full
ARM64 SVE2, SVE, NEON Full

Performance

Preliminary benchmark results on available hardware:

Development System (AMD EPYC 7282 @ AVX2)

Tested with 399-byte JSON document (44 values, nested objects/arrays):

Operation Throughput Time per operation Memory per value
Parse ~290 MB/s 1.38 μs 24 bytes
Stringify ~275 MB/s 1.45 μs 24 bytes

Test Environment:

  • CPU: AMD EPYC 7282 16-Core Processor
  • SIMD: AVX2 (256-bit vectors)
  • Compiler: GCC with -O3 -march=native
  • Test data: Small JSON with mixed types

Note: These are preliminary results from development hardware. Performance will vary significantly based on:

  • CPU architecture and generation
  • Available SIMD instructions (SSE4.2 vs AVX2 vs AVX-512)
  • JSON document size and structure
  • Memory bandwidth and cache characteristics

The library automatically selects the best SIMD implementation for your CPU at runtime.

Comprehensive benchmarks comparing against other JSON libraries coming soon.

Quick Start

#include <json_asm.h>

int main(void) {
    const char *json = "{\"name\": \"json-asm\", \"fast\": true}";

    // Parse JSON
    json_doc *doc = json_parse(json, strlen(json));
    if (!doc) {
        fprintf(stderr, "Parse error\n");
        return 1;
    }

    // Access values
    json_val *root = json_doc_root(doc);
    json_val *name = json_obj_get(root, "name");
    printf("Name: %s\n", json_get_str(name));

    // Cleanup
    json_doc_free(doc);
    return 0;
}

Building

# Requirements:
#   x86-64: NASM 2.15+, GCC 10+/Clang 12+
#   ARM64:  GCC 10+/Clang 12+ (uses inline assembly)

make
make install  # installs to /usr/local by default

# Run tests
make test

# Run benchmarks
make bench

See docs/building.md for detailed build instructions.

Documentation

Why Assembly?

Modern compilers are excellent, but JSON parsing has predictable patterns that benefit from manual optimization:

x86-64 Optimizations

  • AVX-512 masked operations - Process 64 bytes with predicated loads/stores
  • AVX2 string scanning - Find structural chars across 32 bytes simultaneously
  • BMI2 PEXT/PDEP - Branchless bit manipulation for escape handling
  • POPCNT/LZCNT/TZCNT - O(1) bit counting for mask processing

ARM64 Optimizations

  • SVE/SVE2 variable-length vectors - Scalable from 128 to 2048 bits
  • NEON 128-bit operations - Parallel character classification
  • AdvSIMD dot product - Fast numeric accumulation
  • Load/store pair instructions - Efficient memory access patterns

Micro-architectural Optimizations

  • Branchless number parsing - Eliminates misprediction penalties (~15 cycles)
  • Computed goto dispatch - O(1) state machine transitions
  • Explicit prefetching - Hide memory latency for sequential access
  • Register pressure optimization - Keep hot data in registers
  • Instruction-level parallelism - Maximize superscalar execution

See docs/architecture.md for technical details.

Design Approach

json-asm uses hand-optimized assembly to achieve high performance:

Aspect json-asm
Implementation Native assembly (x86-64 + ARM64)
SIMD x86-64 AVX-512/AVX2/SSE4.2 hand-written
SIMD ARM64 NEON/SVE/SVE2 hand-written
String scanning Custom SIMD routines
Number parsing Branchless assembly algorithms
Memory layout Compact 24 bytes/value

By controlling every instruction in hot paths, assembly allows for:

  • Explicit SIMD instruction selection
  • Branchless algorithms to avoid misprediction penalties
  • Cache-line alignment and prefetching control
  • Register allocation optimization

CPU Feature Detection

json-asm automatically detects and uses the best available instructions:

x86-64 dispatch (best to worst):
  AVX-512 + BMI2 → AVX2 + BMI2 → AVX2 → SSE4.2 → Scalar

ARM64 dispatch (best to worst):
  SVE2 → SVE → NEON → Scalar

Runtime detection ensures optimal performance on any conforming CPU.

API Overview

// Parsing
json_doc *json_parse(const char *json, size_t len);
json_doc *json_parse_file(const char *path);

// Document access
json_val *json_doc_root(json_doc *doc);
void json_doc_free(json_doc *doc);

// Value inspection
json_type json_get_type(json_val *val);
const char *json_get_str(json_val *val);
double json_get_num(json_val *val);
int64_t json_get_int(json_val *val);
bool json_get_bool(json_val *val);

// Object/Array access
json_val *json_obj_get(json_val *obj, const char *key);
json_val *json_arr_get(json_val *arr, size_t index);

// Serialization
char *json_stringify(json_val *val);

See docs/api.md for the complete API reference.

Requirements

x86-64

  • CPU: x86-64-v2 minimum (SSE4.2), x86-64-v4 recommended (AVX-512)
  • OS: Linux, macOS, Windows, FreeBSD
  • Build: NASM 2.15+, GCC 10+ or Clang 12+

ARM64

  • CPU: ARMv8-A minimum, ARMv9-A recommended (SVE2)
  • OS: Linux, macOS (Apple Silicon), Android
  • Build: GCC 10+ or Clang 12+

License

MIT License. See LICENSE for details.

Contributing

Contributions welcome! Performance improvements should include benchmark results. Please read the architecture docs before working on assembly code.

About

⚡ The fastest JSON parser/serializer - hand-optimized x86-64 & ARM64 assembly with C API. AVX-512/AVX2/NEON SIMD, 24-byte nodes, zero-copy parsing

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Contributors 2

  •  
  •