## Summary
The `pkg/hfutil/modelconfig` package has 36 separate `.go` model files (6,187 lines), each defining a struct and implementing the same `HuggingFaceModel` interface with near-identical methods. No external code depends on specific model types; all consumers use the interface only.
## Key Findings
- `GetModelSizeBytes()` and `GetQuantizationType()` are character-for-character identical across 30+ files
- `GetParameterCount()` follows the exact same 3-phase pattern (safetensors → hardcoded lookup → estimation) in every file
- `GetContextLength()` differs meaningfully in only 4 models
- `HasVision()` returns `false` in 30+ files; `IsEmbedding()` returns `false` in 34 files
- 9 different parameter estimation functions scattered across 6 files
- Zero type assertions to specific model configs in production code
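The duplication looks roughly like this in each model file (an illustrative sketch, not the package's actual code; the struct and field names here are assumptions):

```go
package main

import "fmt"

// BaseModelConfig mirrors the shared fields every model config embeds
// (illustrative; the real field set may differ).
type BaseModelConfig struct {
	ModelSizeBytes int64
	TorchDtype     string
}

// Each model file wraps the base but re-implements the same accessors
// instead of inheriting them through embedding.
type LlamaConfig struct{ BaseModelConfig }
type MistralConfig struct{ BaseModelConfig }

func (c *LlamaConfig) GetModelSizeBytes() int64   { return c.ModelSizeBytes }
func (c *MistralConfig) GetModelSizeBytes() int64 { return c.ModelSizeBytes }

func (c *LlamaConfig) GetQuantizationType() string   { return c.TorchDtype }
func (c *MistralConfig) GetQuantizationType() string { return c.TorchDtype }

func main() {
	l := &LlamaConfig{BaseModelConfig: BaseModelConfig{ModelSizeBytes: 1 << 30, TorchDtype: "bfloat16"}}
	m := &MistralConfig{BaseModelConfig: BaseModelConfig{ModelSizeBytes: 1 << 30, TorchDtype: "bfloat16"}}
	// Character-for-character identical behavior, duplicated per file.
	fmt.Println(l.GetModelSizeBytes() == m.GetModelSizeBytes())
	fmt.Println(l.GetQuantizationType() == m.GetQuantizationType())
}
```

Because both methods only read embedded fields, a single implementation on the base type would serve every model identically.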
## Bugs
- `mistral.go:144`: `IsEmbedding()` returns `true`, which is wrong for an LLM
- `phi.go`, `llama4.go`: shadow a `ConfigPath` field already present in `BaseModelConfig`
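The field-shadowing bug is worth spelling out, because Go silently prefers the outer field. A minimal sketch (field names from the report; the surrounding struct shapes are assumptions):

```go
package main

import "fmt"

type BaseModelConfig struct {
	ConfigPath string
}

// PhiConfig redeclares ConfigPath, shadowing the embedded field:
// writes through the outer struct never reach the base's copy.
type PhiConfig struct {
	BaseModelConfig
	ConfigPath string
}

func main() {
	p := PhiConfig{}
	p.ConfigPath = "/models/phi/config.json" // sets the OUTER field only
	fmt.Println(p.ConfigPath)
	fmt.Println(p.BaseModelConfig.ConfigPath == "") // true — base copy never set
}
```

Any base-type method that reads `c.ConfigPath` sees the embedded (empty) field, so deleting the duplicate declaration is the whole fix.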
## Refactoring Plan
### Phase 1: Infrastructure (no behavior change)
#### Step 1.1 — Consolidate estimation functions in `interface.go`

9 estimation functions scattered across 6 files:
| Function | Location | Action |
|---|---|---|
| `estimateModelParams` | `phi3_v.go:111` | Move to `interface.go` as canonical `EstimateModelParams()` |
| `estimateGenericParams` | `interface.go:265` | Replace with the more accurate `phi3_v` version |
| `estimateParamsFromArchitecture` | `llama.go:141` | Replace with a call to `EstimateModelParams` |
| `estimateTextParams` | `mllama.go:132` | Move to `interface.go` as shared helper |
| `estimateVisionParams` | `mllama.go:138` | Deduplicate (identical to `estimateTextParams`) |
| `estimateMoEParamCount` | `llama4.go:258` | Consolidate into shared `EstimateMoEParams` |
| `estimateMoEParams` | `deepseek_vl.go:204` | Consolidate into shared `EstimateMoEParams` |
| `estimateQwen3VLMoEParams` | `qwen3_vl.go:169` | Deduplicate with shared MoE estimator |
| `estimateQwen3VLVisionParams` | `qwen3_vl.go:190` | Move to `interface.go` as shared helper |
#### Step 1.2 — Fix `mistral.go` `IsEmbedding()` bug
#### Step 1.3 — Add `StandardModelConfig` to `interface.go`

New struct embedding `BaseModelConfig` with common transformer fields. Provides default implementations for `GetParameterCount()`, `GetContextLength()`, `GetModelSizeBytes()`, `GetQuantizationType()` with a per-model `paramLookupTable`.
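A possible shape for the new struct (a sketch under the assumptions above; the JSON tags and field names mirror common HuggingFace `config.json` keys, not necessarily the package's final layout):

```go
package main

import "fmt"

// BaseModelConfig: fields shared by all configs (illustrative subset).
type BaseModelConfig struct {
	TorchDtype     string `json:"torch_dtype"`
	ModelSizeBytes int64  `json:"-"`
}

// StandardModelConfig embeds the base and adds the common transformer
// hyperparameters, so simple text models need no methods of their own.
type StandardModelConfig struct {
	BaseModelConfig
	HiddenSize            int `json:"hidden_size"`
	NumHiddenLayers       int `json:"num_hidden_layers"`
	VocabSize             int `json:"vocab_size"`
	MaxPositionEmbeddings int `json:"max_position_embeddings"`

	// paramLookupTable maps known model names to exact parameter counts,
	// consulted before falling back to estimation.
	paramLookupTable map[string]int64
}

func (c *StandardModelConfig) GetContextLength() int       { return c.MaxPositionEmbeddings }
func (c *StandardModelConfig) GetModelSizeBytes() int64    { return c.ModelSizeBytes }
func (c *StandardModelConfig) GetQuantizationType() string { return c.TorchDtype }

func main() {
	cfg := &StandardModelConfig{MaxPositionEmbeddings: 8192}
	fmt.Println(cfg.GetContextLength()) // 8192
}
```

Models that genuinely differ (e.g. Qwen's context-length handling) override just the one method and inherit the rest.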
### Phase 2: Consolidate simple text models (~20 files → 1 file)
Create `models_text.go` with thin wrapper structs embedding `StandardModelConfig`. Use a generic loader factory (Go 1.25).

Models: `llama`, `mistral`, `gemma`, `gemma2`, `gemma3_text`, `phi3`, `phi3small`, `exaone`, `command_r`, `internlm`, `internlm2`, `stablelm`, `xverse`, `minicpm`, `minicpm3`
Create `models_text_special.go` for models with a custom `GetContextLength()`: the Qwen family (`qwen`, `qwen2`, `qwen3`) and Baichuan.
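The generic loader factory could be as small as this (a sketch; `loadConfig` and the wrapper type names are hypothetical, chosen to illustrate the pattern):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type StandardModelConfig struct {
	HiddenSize int `json:"hidden_size"`
}

// Thin per-model wrappers: the type carries identity, the embedded
// StandardModelConfig carries all behavior.
type LlamaConfig struct{ StandardModelConfig }
type GemmaConfig struct{ StandardModelConfig }

// loadConfig is a generic factory: one function instantiated per model
// type, replacing a near-identical LoadXxxConfig in every file.
func loadConfig[T any](raw []byte) (*T, error) {
	cfg := new(T)
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"hidden_size": 4096}`)
	llama, _ := loadConfig[LlamaConfig](raw)
	gemma, _ := loadConfig[GemmaConfig](raw)
	fmt.Println(llama.HiddenSize, gemma.HiddenSize) // 4096 4096
}
```

The architecture-name-to-constructor registry then maps each `model_type` string to the appropriate instantiation of `loadConfig`.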
### Phase 3: Consolidate MoE models (~5 files → 1 file)
Create `models_moe.go` with `MoEModelConfig` embedding `StandardModelConfig` plus MoE fields. Override `GetParameterCount()` with MoE-aware estimation.

Models: `mixtral`, `phimoe`, `qwen3_moe`, `gpt_oss`, `kimi_k2`, `deepseek_v3`
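The MoE-aware override differs from the dense estimator in one place: the MLP term is multiplied by the expert count. A sketch (field names and the exact accounting are assumptions; router weights are ignored):

```go
package main

import "fmt"

// MoEModelConfig adds expert-routing fields; total parameters scale with
// the expert count rather than a single dense MLP per layer.
type MoEModelConfig struct {
	HiddenSize       int64
	NumHiddenLayers  int64
	VocabSize        int64
	IntermediateSize int64
	NumLocalExperts  int64
}

// EstimateMoEParams: embeddings + attention per layer, plus one gated MLP
// per expert per layer (rough accounting).
func (c *MoEModelConfig) EstimateMoEParams() int64 {
	embedding := c.VocabSize * c.HiddenSize
	attention := 4 * c.HiddenSize * c.HiddenSize
	expertMLP := 3 * c.HiddenSize * c.IntermediateSize
	perLayer := attention + c.NumLocalExperts*expertMLP
	return embedding + c.NumHiddenLayers*perLayer
}

func main() {
	// A Mixtral-8x7B-like shape lands near the model's known ~47B total.
	c := &MoEModelConfig{HiddenSize: 4096, NumHiddenLayers: 32,
		VocabSize: 32000, IntermediateSize: 14336, NumLocalExperts: 8}
	fmt.Printf("%.0fB total params\n", float64(c.EstimateMoEParams())/1e9)
}
```

One shared `EstimateMoEParams` like this replaces the three near-duplicate MoE estimators listed in Step 1.1.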
### Phase 4: Clean up vision models (keep separate, reduce duplication)
Vision models keep individual files (their nested vision configs are genuinely unique). But:
- Remove re-implemented methods that `BaseModelConfig` already provides
- Use shared estimation helpers from Phase 1
### Phase 5: Clean up standalone special models
Keep as individual files due to non-standard JSON field names or structure:
- `chatglm.go` — `num_layers`, `ffn_hidden_size`, `padded_vocab_size`
- `dbrx.go` — `d_model`, `n_heads`, `n_layers`
- `bert.go` — `IsEmbedding()` returns `true`; BERT-specific estimation
- `phi.go` — doesn't embed `BaseModelConfig`
## Target File Structure
```
modelconfig/
  interface.go            # Interface, BaseModelConfig, StandardModelConfig, utilities
  safetensors.go          # Safetensors parsing (unchanged)
  diffusion.go            # Diffusion pipelines (unchanged)
  models_text.go          # ~15 simple text models (consolidated)
  models_text_special.go  # Qwen/Baichuan with custom GetContextLength
  models_moe.go           # MoE models (consolidated)
  chatglm.go              # Standalone (non-standard fields)
  dbrx.go                 # Standalone (non-standard fields)
  bert.go                 # Standalone (embedding model)
  phi.go                  # Standalone (non-standard structure)
  mllama.go               # Vision (standalone)
  llava.go                # Vision (standalone)
  gemma3.go               # Vision (standalone)
  qwen2_vl.go             # Vision (standalone)
  qwen3_vl.go             # Vision (standalone)
  phi3_v.go               # Vision (standalone)
  deepseek_vl.go          # Vision (standalone)
  llama4.go               # Vision + MoE (standalone)
```
**Result:** 36 model files → ~18 files, significant code deduplication
🤖 Generated with Claude Code