## Summary
The `pkg/hfutil/modelconfig` package has 36 separate `.go` model files (6,187 lines), each defining a struct and implementing the same `HuggingFaceModel` interface with near-identical methods. No external code depends on specific model types; all consumers use the interface only.
## Key Findings
- `GetModelSizeBytes()` and `GetQuantizationType()` are character-for-character identical across 30+ files
- `GetParameterCount()` follows the exact same 3-phase pattern (safetensors → hardcoded lookup → estimation) in every file
- `GetContextLength()` differs meaningfully in only 4 models
- `HasVision()` returns `false` in 30+ files; `IsEmbedding()` returns `false` in 34 files
- 9 different parameter estimation functions scattered across 6 files
- Zero type assertions to specific model configs in production code
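The duplication looks roughly like this in each model file (an illustrative sketch, not the package's actual code; the struct and field names here are assumptions):

```go
package main

import "fmt"

// BaseModelConfig mirrors the shared fields every model config embeds
// (illustrative; the real field set may differ).
type BaseModelConfig struct {
	ModelSizeBytes int64
	TorchDtype     string
}

// Each model file wraps the base but re-implements the same accessors
// instead of inheriting them through embedding.
type LlamaConfig struct{ BaseModelConfig }
type MistralConfig struct{ BaseModelConfig }

func (c *LlamaConfig) GetModelSizeBytes() int64   { return c.ModelSizeBytes }
func (c *MistralConfig) GetModelSizeBytes() int64 { return c.ModelSizeBytes }

func (c *LlamaConfig) GetQuantizationType() string   { return c.TorchDtype }
func (c *MistralConfig) GetQuantizationType() string { return c.TorchDtype }

func main() {
	l := &LlamaConfig{BaseModelConfig: BaseModelConfig{ModelSizeBytes: 1 << 30, TorchDtype: "bfloat16"}}
	m := &MistralConfig{BaseModelConfig: BaseModelConfig{ModelSizeBytes: 1 << 30, TorchDtype: "bfloat16"}}
	// Character-for-character identical behavior, duplicated per file.
	fmt.Println(l.GetModelSizeBytes() == m.GetModelSizeBytes())
	fmt.Println(l.GetQuantizationType() == m.GetQuantizationType())
}
```

Because both methods only read embedded fields, a single implementation on the base type would serve every model identically.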
## Bugs
- `mistral.go:144`: `IsEmbedding()` returns `true`, which is wrong for an LLM
- `phi.go`, `llama4.go`: shadow a `ConfigPath` field already present in `BaseModelConfig`
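The field-shadowing bug is worth spelling out, because Go silently prefers the outer field. A minimal sketch (field names from the report; the surrounding struct shapes are assumptions):

```go
package main

import "fmt"

type BaseModelConfig struct {
	ConfigPath string
}

// PhiConfig redeclares ConfigPath, shadowing the embedded field:
// writes through the outer struct never reach the base's copy.
type PhiConfig struct {
	BaseModelConfig
	ConfigPath string
}

func main() {
	p := PhiConfig{}
	p.ConfigPath = "/models/phi/config.json" // sets the OUTER field only
	fmt.Println(p.ConfigPath)
	fmt.Println(p.BaseModelConfig.ConfigPath == "") // true — base copy never set
}
```

Any base-type method that reads `c.ConfigPath` sees the embedded (empty) field, so deleting the duplicate declaration is the whole fix.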
## Refactoring Plan
### Phase 1: Infrastructure (no behavior change)
#### Step 1.1 — Consolidate estimation functions in `interface.go`

9 estimation functions scattered across 6 files:
| Function | Location | Action |
|---|---|---|
| `estimateModelParams` | `phi3_v.go:111` | Move to `interface.go` as canonical `EstimateModelParams()` |
| `estimateGenericParams` | `interface.go:265` | Replace with the more accurate `phi3_v` version |
| `estimateParamsFromArchitecture` | `llama.go:141` | Replace with a call to `EstimateModelParams` |
| `estimateTextParams` | `mllama.go:132` | Move to `interface.go` as shared helper |
| `estimateVisionParams` | `mllama.go:138` | Deduplicate (identical to `estimateTextParams`) |
| `estimateMoEParamCount` | `llama4.go:258` | Consolidate into shared `EstimateMoEParams` |
| `estimateMoEParams` | `deepseek_vl.go:204` | Consolidate into shared `EstimateMoEParams` |
| `estimateQwen3VLMoEParams` | `qwen3_vl.go:169` | Deduplicate with shared MoE estimator |
| `estimateQwen3VLVisionParams` | `qwen3_vl.go:190` | Move to `interface.go` as shared helper |
#### Step 1.2 — Fix `mistral.go` `IsEmbedding()` bug
#### Step 1.3 — Add `StandardModelConfig` to `interface.go`

New struct embedding `BaseModelConfig` with common transformer fields. Provides default implementations for `GetParameterCount()`, `GetContextLength()`, `GetModelSizeBytes()`, `GetQuantizationType()` with a per-model `paramLookupTable`.
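A possible shape for the new struct (a sketch under the assumptions above; the JSON tags and field names mirror common HuggingFace `config.json` keys, not necessarily the package's final layout):

```go
package main

import "fmt"

// BaseModelConfig: fields shared by all configs (illustrative subset).
type BaseModelConfig struct {
	TorchDtype     string `json:"torch_dtype"`
	ModelSizeBytes int64  `json:"-"`
}

// StandardModelConfig embeds the base and adds the common transformer
// hyperparameters, so simple text models need no methods of their own.
type StandardModelConfig struct {
	BaseModelConfig
	HiddenSize            int `json:"hidden_size"`
	NumHiddenLayers       int `json:"num_hidden_layers"`
	VocabSize             int `json:"vocab_size"`
	MaxPositionEmbeddings int `json:"max_position_embeddings"`

	// paramLookupTable maps known model names to exact parameter counts,
	// consulted before falling back to estimation.
	paramLookupTable map[string]int64
}

func (c *StandardModelConfig) GetContextLength() int       { return c.MaxPositionEmbeddings }
func (c *StandardModelConfig) GetModelSizeBytes() int64    { return c.ModelSizeBytes }
func (c *StandardModelConfig) GetQuantizationType() string { return c.TorchDtype }

func main() {
	cfg := &StandardModelConfig{MaxPositionEmbeddings: 8192}
	fmt.Println(cfg.GetContextLength()) // 8192
}
```

Models that genuinely differ (e.g. Qwen's context-length handling) override just the one method and inherit the rest.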
### Phase 2: Consolidate simple text models (~20 files → 1 file)
Create `models_text.go` with thin wrapper structs embedding `StandardModelConfig`. Use a generic loader factory (Go 1.25).

Models: `llama`, `mistral`, `gemma`, `gemma2`, `gemma3_text`, `phi3`, `phi3small`, `exaone`, `command_r`, `internlm`, `internlm2`, `stablelm`, `xverse`, `minicpm`, `minicpm3`
Create `models_text_special.go` for models with a custom `GetContextLength()`: the Qwen family (`qwen`, `qwen2`, `qwen3`) and Baichuan.
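The generic loader factory could be as small as this (a sketch; `loadConfig` and the wrapper type names are hypothetical, chosen to illustrate the pattern):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type StandardModelConfig struct {
	HiddenSize int `json:"hidden_size"`
}

// Thin per-model wrappers: the type carries identity, the embedded
// StandardModelConfig carries all behavior.
type LlamaConfig struct{ StandardModelConfig }
type GemmaConfig struct{ StandardModelConfig }

// loadConfig is a generic factory: one function instantiated per model
// type, replacing a near-identical LoadXxxConfig in every file.
func loadConfig[T any](raw []byte) (*T, error) {
	cfg := new(T)
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"hidden_size": 4096}`)
	llama, _ := loadConfig[LlamaConfig](raw)
	gemma, _ := loadConfig[GemmaConfig](raw)
	fmt.Println(llama.HiddenSize, gemma.HiddenSize) // 4096 4096
}
```

The architecture-name-to-constructor registry then maps each `model_type` string to the appropriate instantiation of `loadConfig`.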
### Phase 3: Consolidate MoE models (~5 files → 1 file)
Create `models_moe.go` with `MoEModelConfig` embedding `StandardModelConfig` plus MoE fields. Override `GetParameterCount()` with MoE-aware estimation.

Models: `mixtral`, `phimoe`, `qwen3_moe`, `gpt_oss`, `kimi_k2`, `deepseek_v3`
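The MoE-aware override differs from the dense estimator in one place: the MLP term is multiplied by the expert count. A sketch (field names and the exact accounting are assumptions; router weights are ignored):

```go
package main

import "fmt"

// MoEModelConfig adds expert-routing fields; total parameters scale with
// the expert count rather than a single dense MLP per layer.
type MoEModelConfig struct {
	HiddenSize       int64
	NumHiddenLayers  int64
	VocabSize        int64
	IntermediateSize int64
	NumLocalExperts  int64
}

// EstimateMoEParams: embeddings + attention per layer, plus one gated MLP
// per expert per layer (rough accounting).
func (c *MoEModelConfig) EstimateMoEParams() int64 {
	embedding := c.VocabSize * c.HiddenSize
	attention := 4 * c.HiddenSize * c.HiddenSize
	expertMLP := 3 * c.HiddenSize * c.IntermediateSize
	perLayer := attention + c.NumLocalExperts*expertMLP
	return embedding + c.NumHiddenLayers*perLayer
}

func main() {
	// A Mixtral-8x7B-like shape lands near the model's known ~47B total.
	c := &MoEModelConfig{HiddenSize: 4096, NumHiddenLayers: 32,
		VocabSize: 32000, IntermediateSize: 14336, NumLocalExperts: 8}
	fmt.Printf("%.0fB total params\n", float64(c.EstimateMoEParams())/1e9)
}
```

One shared `EstimateMoEParams` like this replaces the three near-duplicate MoE estimators listed in Step 1.1.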
### Phase 4: Clean up vision models (keep separate, reduce duplication)
Vision models keep individual files (their nested vision configs are genuinely unique). But:
- Remove re-implemented methods that `BaseModelConfig` already provides
- Use shared estimation helpers from Phase 1
### Phase 5: Clean up standalone special models
Keep as individual files due to non-standard JSON field names or structure:
- `chatglm.go` — `num_layers`, `ffn_hidden_size`, `padded_vocab_size`
- `dbrx.go` — `d_model`, `n_heads`, `n_layers`
- `bert.go` — `IsEmbedding()` returns `true`; BERT-specific estimation
- `phi.go` — doesn't embed `BaseModelConfig`
## Target File Structure
```
modelconfig/
  interface.go            # Interface, BaseModelConfig, StandardModelConfig, utilities
  safetensors.go          # Safetensors parsing (unchanged)
  diffusion.go            # Diffusion pipelines (unchanged)
  models_text.go          # ~15 simple text models (consolidated)
  models_text_special.go  # Qwen/Baichuan with custom GetContextLength
  models_moe.go           # MoE models (consolidated)
  chatglm.go              # Standalone (non-standard fields)
  dbrx.go                 # Standalone (non-standard fields)
  bert.go                 # Standalone (embedding model)
  phi.go                  # Standalone (non-standard structure)
  mllama.go               # Vision (standalone)
  llava.go                # Vision (standalone)
  gemma3.go               # Vision (standalone)
  qwen2_vl.go             # Vision (standalone)
  qwen3_vl.go             # Vision (standalone)
  phi3_v.go               # Vision (standalone)
  deepseek_vl.go          # Vision (standalone)
  llama4.go               # Vision + MoE (standalone)
```
**Result:** 36 model files → ~18 files, significant code deduplication
🤖 Generated with Claude Code