[Misc] add Gemma 4 config parser #579
Conversation
Code Review
This pull request adds support for the Google Gemma 4 model family, implementing configuration loading, parameter count estimation for dense and MoE variants, and unit tests. Feedback suggests sanitizing JSON input to handle non-standard values like NaN or Infinity and replacing fmt.Printf with log.Printf for library-level warnings to avoid polluting stdout.
```go
var config Gemma4Config
if err := json.Unmarshal(data, &config); err != nil {
```
The configuration data should be sanitized before unmarshaling to handle non-standard JSON values such as `Infinity`, `-Infinity`, or `NaN`. These values are common in Hugging Face configurations exported from Python but are not valid in standard JSON, causing `json.Unmarshal` to fail. Using `SanitizeJSONBytes` (defined in `interface.go`) ensures the parser is robust against these cases, maintaining consistency with the generic loader path.
```go
data = SanitizeJSONBytes(data)
var config Gemma4Config
if err := json.Unmarshal(data, &config); err != nil {
```
Same thing applies to all model parsing logic. Will address in a later PR.
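For reference, a sanitizer along these lines can be sketched as below. This is a hypothetical minimal version, not the actual `SanitizeJSONBytes` from `interface.go` (which may handle more cases, e.g. guarding against the tokens appearing inside string values):

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Matches Python-style literals that are invalid in standard JSON.
// Caveat: this naive pattern would also rewrite the tokens if they
// appeared inside a string value; a production version should guard
// against that.
var nonStandardJSON = regexp.MustCompile(`-?\bInfinity\b|\bNaN\b`)

// sanitizeJSONBytes replaces NaN, Infinity, and -Infinity with null so
// that encoding/json can unmarshal configs exported from Python.
func sanitizeJSONBytes(data []byte) []byte {
	return nonStandardJSON.ReplaceAll(data, []byte("null"))
}

func main() {
	raw := []byte(`{"rope_theta": Infinity, "router_aux_loss_coef": NaN}`)
	clean := sanitizeJSONBytes(raw)

	var cfg map[string]any
	if err := json.Unmarshal(clean, &cfg); err != nil {
		panic(err) // would fail on the raw bytes, succeeds on the clean ones
	}
	fmt.Printf("%s\n", clean)
	// prints {"rope_theta": null, "router_aux_loss_coef": null}
}
```

Mapping the invalid literals to `null` (rather than, say, `math.MaxFloat64`) keeps the corresponding pointer-typed config fields `nil`, which matches how absent fields already behave.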
Motivation
Adds a config parser for Google Gemma 4 multimodal models to `pkg/hfutil/modelconfig`.

Modifications
- `Gemma4Config` struct implementing the `HuggingFaceModel` interface, with nested `Gemma4TextConfig`, `Gemma4VisionConfig`, and `Gemma4AudioConfig`.
- Pointer-typed MoE fields (`NumExperts`, `TopKExperts`, `MoeIntermediateSize`) so dense variants (`null` in JSON) round-trip as `nil` and MoE variants (26B-A4B) retain their 128-expert / top-k-8 shape.
- `AudioConfig *Gemma4AudioConfig`, so `null` on text+vision-only variants (26B, 31B) deserialises to `nil` and populated encoders (E2B, E4B) round-trip correctly.
- Maps the `dtype` field (used by transformers 5.x) to `TorchDtype` during `LoadGemma4Config` so `EstimateModelSizeBytes` and downstream runtime sizing stay accurate.
- `GetContextLength()` returns `text_config.max_position_embeddings` directly: 131072 on E2B/E4B, 262144 on 26B-A4B/31B.
- `GetParameterCount()` first reads from co-located safetensors files via `FindAndParseSafetensors`, then falls back to a shape-based lookup table for the four published variants, then to `estimateMoEParamCount` for MoE and `estimateTextParams + estimateVisionParams` for dense unknowns.
- `GetQuantizationType()` inspects the optional `quantization_config.quant_method` and returns `"fp8"` for FP8 checkpoints, `"int4"` for any other non-nil quant config, and `""` otherwise, so the runtime matcher can reject quantized checkpoints from unquantized runtimes.
- `HasVision()` returns `true` for every variant (all four ship the vision tower); `IsEmbedding()` returns `false`.
- Registers `"gemma4"` in the model loader registry via `init()` so `LoadModelConfig` dispatches correctly.
- Unit tests in `gemma4_test.go` against fixtures `testdata/gemma4_{e2b,e4b,26b_a4b,31b}.json`.
- Updates the `pkg/hfutil/modelconfig/README.md` supported-families list and the `config/models/SUPPORTED_MODELS.md` Google Gemma table.

Checklist
make format).
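As an illustration of the `GetQuantizationType()` rule described in the modifications above, here is a minimal sketch. The struct definitions are hypothetical and heavily trimmed; the real `Gemma4Config` in this PR carries the nested text/vision/audio sub-configs as well:

```go
package main

import "fmt"

// Hypothetical, trimmed-down structs for illustration only; the actual
// types in pkg/hfutil/modelconfig have many more fields.
type QuantizationConfig struct {
	QuantMethod string `json:"quant_method"`
}

type Gemma4Config struct {
	QuantizationConfig *QuantizationConfig `json:"quantization_config"`
}

// GetQuantizationType mirrors the rule stated in the PR description:
// "fp8" for FP8 checkpoints, "int4" for any other non-nil quant config,
// "" for unquantized checkpoints.
func (c *Gemma4Config) GetQuantizationType() string {
	if c.QuantizationConfig == nil {
		return "" // no quantization_config key, or explicit null
	}
	if c.QuantizationConfig.QuantMethod == "fp8" {
		return "fp8"
	}
	return "int4"
}

func main() {
	dense := &Gemma4Config{}
	fp8 := &Gemma4Config{QuantizationConfig: &QuantizationConfig{QuantMethod: "fp8"}}
	other := &Gemma4Config{QuantizationConfig: &QuantizationConfig{QuantMethod: "awq"}}
	fmt.Printf("%q %q %q\n", dense.GetQuantizationType(), fp8.GetQuantizationType(), other.GetQuantizationType())
	// prints "" "fp8" "int4"
}
```

Making the field a pointer is what lets the method distinguish "no quant config at all" from "present but unrecognized method", which is why the runtime matcher can safely reject any non-empty return value on unquantized runtimes.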