e1n00r
Popular repositories

  1. tinyserve

    30 tok/s for a 20B MoE on 8 GB of VRAM, with flat throughput out to 32K context. Native MXFP4 and GGUF Q4_K/Q5_K/Q6_K via ggml CUDA kernels, with no dequantization pass. Expert offloading for models that don't fit in GPU memory (see the sketch after this list).

    Python · 9 stars · 2 forks

  2. vllm

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 1 star

  3. llama.cpp

    Forked from ggml-org/llama.cpp

    LLM inference in C/C++

    C++
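
The expert-offloading idea mentioned in the tinyserve description can be illustrated with a minimal sketch. This is not tinyserve's actual code or API; it assumes PyTorch, and the names (`ExpertCache`, `moe_layer`) are hypothetical. It shows only the basic mechanism: keep all MoE expert weights in host RAM, copy the experts chosen by the router to the GPU on demand, and evict the least-recently-used expert once the VRAM budget is exhausted.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


class ExpertCache:
    """Holds every expert in host RAM and moves only the experts picked by the
    router onto the GPU, evicting the least-recently-used expert once the
    VRAM budget (counted in experts) is reached."""

    def __init__(self, experts: list[nn.Module], max_gpu_experts: int, device: str = "cuda"):
        self.experts = experts                  # full expert set, resident on CPU
        self.max_gpu_experts = max_gpu_experts  # how many experts fit in VRAM at once
        self.device = device
        self.on_gpu: "OrderedDict[int, nn.Module]" = OrderedDict()

    def get(self, idx: int) -> nn.Module:
        if idx in self.on_gpu:
            self.on_gpu.move_to_end(idx)        # mark as most recently used
            return self.on_gpu[idx]
        if len(self.on_gpu) >= self.max_gpu_experts:
            _, evicted = self.on_gpu.popitem(last=False)
            evicted.to("cpu")                   # push the coldest expert back to host RAM
        expert = self.experts[idx].to(self.device)
        self.on_gpu[idx] = expert
        return expert


def moe_layer(x: torch.Tensor, router: nn.Linear, cache: ExpertCache, top_k: int = 2) -> torch.Tensor:
    """Route each token to its top-k experts, fetching expert weights through the cache."""
    probs = torch.softmax(router(x), dim=-1)
    weights, indices = torch.topk(probs, top_k, dim=-1)
    out = torch.zeros_like(x)
    for tok in range(x.size(0)):
        for k in range(top_k):
            expert = cache.get(int(indices[tok, k]))
            out[tok] += weights[tok, k] * expert(x[tok])
    return out
```

A real serving engine would typically overlap the host-to-device copies with compute and batch tokens per expert rather than looping token by token, but the LRU cache above captures the core trade-off: VRAM holds only the hot experts, so a model larger than GPU memory can still be served.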