
fix(local-llm): restore custom GGUF setup without restart#417

Merged
penso merged 13 commits into main from issue-132
Mar 14, 2026

Conversation

@penso
Collaborator

@penso penso commented Mar 11, 2026

Summary

  • preserve hf_repo and hf_filename for custom GGUF entries and derive filename-scoped custom model ids
  • download and register custom GGUF models asynchronously with progress events, then restore them at gateway startup
  • cover the regression with Rust tests, a browser regression test, and updated local-LLM docs

Validation

Completed

  • just format
  • cargo check -p moltis-gateway --tests
  • cargo test -p moltis-gateway register_saved_local_models_registers_custom_gguf_provider_from_saved_config
  • cargo test -p moltis-gateway custom_model_
  • cargo test -p moltis-gateway round_trip_preserves_custom_gguf_metadata
  • cargo test -p moltis-providers --features local-llm custom_model_
  • cargo test -p moltis-providers --features local-llm download_gguf_file_with_progress_downloads_to_target_path
  • cargo +nightly-2025-11-30 clippy -Z unstable-options -p moltis-gateway -p moltis-providers --all-targets -- -D warnings
  • biome check --write crates/web/src/assets/js/providers.js crates/web/ui/e2e/specs/providers.spec.js

Remaining

  • ./scripts/local-validate.sh <PR_NUMBER>
  • cd crates/web/ui && npm run e2e -- e2e/specs/providers.spec.js -g "custom local model download progress modal renders without JS errors" (playwright: command not found in this environment)

Manual QA

  • Configure a custom GGUF repo and filename from Settings -> LLMs -> Local LLM
  • Confirm the model switches into the download-progress modal immediately and appears without a restart
  • Restart the gateway and verify the custom model is still selectable, then send a chat after the download completes

Closes #132

@greptile-apps
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a regression where custom GGUF models configured via the UI were lost after a gateway restart and not shown in the download-progress modal. It achieves this through three coordinated changes: persisting hf_repo and hf_filename in LocalModelEntry so the config round-trips correctly; restoring all saved entries into the ProviderRegistry at startup via the new register_saved_local_models; and wiring an async GGUF download-with-broadcast flow so the frontend's new showModelDownloadProgress modal receives real-time progress events immediately after configuration.

Key points:

  • Path traversal risk in hf_repo: validate_hf_filename_path protects hf_filename against .. components, but hf_repo receives no equivalent check. A value of ".." (no slashes to replace) resolves cache_dir/custom/.. back to cache_dir, writing files outside the intended custom/ sandbox.
  • Cache directory collision: hf_repo.replace('/', "__") can map two distinct repos (e.g. foo/bar__baz and foo__bar/baz) to the same cache subdirectory, causing silent overwrites.
  • URL encoding omitted: hf_repo and hf_filename are interpolated directly into the HuggingFace download URL without percent-encoding, which will break for any name containing spaces or other URL-reserved characters.
  • MLX backend has no-op progress callback: The spawned download task for MLX models uses |_| {}, so no local-llm.download events are fired and the UI progress bar stays at 0% throughout.
  • Misleading _provider naming in JS: The parameter is named _provider (convention for "unused") but is forwarded to pollLocalStatus; the underscore prefix should be removed.

Confidence Score: 2/5

  • Not safe to merge — the unvalidated hf_repo introduces a path traversal vector in the new cache-path logic, and the cache-directory collision can silently overwrite cached models.
  • Two logic issues need to be addressed before this is production-ready: (1) hf_repo path traversal bypasses the custom/ sandbox, and (2) the replace('/', "__") slug scheme has real collision cases for repos whose names include __. The URL-encoding gap is a reliability issue. The MLX no-op progress is a UX gap. The core design (persisting metadata, startup restoration, async progress broadcast) is sound and well-tested.
  • crates/providers/src/local_gguf/models.rs — both the path-traversal and collision issues live in custom_model_path; crates/gateway/src/local_llm_setup.rs — MLX no-op progress callback.

Important Files Changed

Filename Overview
crates/providers/src/local_gguf/models.rs Adds custom_model_path, is_custom_model_cached, and ensure_custom_model_with_progress; refactors ensure_model_with_progress into a shared download_gguf_file_with_progress helper. Two issues: hf_repo is not validated for path-traversal components (only hf_filename is), and the replace('/', "__") slug scheme has collision potential for repos whose names embed __.
crates/gateway/src/local_llm_setup.rs Largest change: adds hf_repo/hf_filename fields to LocalModelEntry, download_custom_gguf_model, register_saved_local_models, and the full async download-with-broadcast flow for custom GGUF. MLX backend in configure_custom_model uses a no-op progress callback, giving no UI feedback during download. The rest of the logic is sound and well-tested.
crates/gateway/src/server.rs Minimal change: calls register_saved_local_models at gateway startup behind the local-llm feature flag, restoring previously configured models into the registry. No issues found.
crates/web/src/assets/js/providers.js Adds showModelDownloadProgress export and wires it into createHfSearchResultCard for GGUF models; displays a real-time progress bar fed by local-llm.download SSE events. The _provider parameter naming is misleading since the value is forwarded to pollLocalStatus.
crates/web/ui/e2e/specs/providers.spec.js Adds a Playwright regression test that mounts a stub WebSocket, fires a 50% progress event, and asserts the progress bar and completion text render without JS errors. Well-structured, no issues.
docs/src/local-llm.md Two documentation corrections: updates the model storage path and changes "downloaded on first use" to "downloads immediately after configuration". Accurate and no issues.

Sequence Diagram

sequenceDiagram
    actor User
    participant UI as Browser (providers.js)
    participant GW as Gateway (local_llm_setup.rs)
    participant FS as Filesystem (models.rs)
    participant HF as HuggingFace

    User->>UI: Configure custom GGUF (hfRepo + hfFilename)
    UI->>GW: RPC: providers.local.configure-custom-model
    GW->>GW: Validate hf_filename (validate_hf_filename_path)
    GW->>GW: Build custom_gguf_model_id
    GW->>GW: Save LocalModelEntry {hf_repo, hf_filename} to config
    GW-->>UI: {ok: true, modelId, displayName}
    UI->>UI: showModelDownloadProgress()

    GW->>GW: tokio::spawn download task
    GW->>FS: ensure_custom_model_with_progress()
    FS->>HF: GET huggingface.co/{hf_repo}/resolve/main/{hf_filename}
    loop Progress chunks
        HF-->>FS: chunk
        FS->>GW: on_progress callback
        GW-->>UI: SSE: local-llm.download {progress}
        UI->>UI: Update progress bar
    end
    FS-->>GW: Ok(model_path)
    GW->>GW: register_local_model_entry()
    GW->>GW: status = Ready
    GW-->>UI: SSE: local-llm.download {complete: true}
    UI->>UI: fetchModels(), refreshProvidersPage()

    Note over GW,FS: On gateway restart
    GW->>GW: register_saved_local_models()
    GW->>FS: Load LocalLlmConfig
    GW->>GW: Register each saved entry into ProviderRegistry

Last reviewed commit: 02922de


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 02922de4e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 11, 2026

Merging this PR will not alter performance

✅ 39 untouched benchmarks
⏩ 5 skipped benchmarks ¹


Comparing issue-132 (45408cc) with main (adaa74e)

Open in CodSpeed

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 73.20319% with 302 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/gateway/src/local_llm_setup.rs 66.75% 243 Missing ⚠️
crates/providers/src/local_gguf/models.rs 82.10% 51 Missing ⚠️
crates/gateway/src/server.rs 92.79% 8 Missing ⚠️

📢 Thoughts on this report? Let us know!

@chatgpt-codex-connector bot left a comment — 💡 Codex Review — Reviewed commit: 7eb81c6605

@chatgpt-codex-connector bot left a comment — 💡 Codex Review — Reviewed commit: fbfae6948d

@chatgpt-codex-connector bot left a comment — 💡 Codex Review — Reviewed commit: e69726b8e0

@chatgpt-codex-connector bot left a comment — 💡 Codex Review — Reviewed commit: d9f99f9d26

@chatgpt-codex-connector bot left a comment — 💡 Codex Review — Reviewed commit: 1715028046

@chatgpt-codex-connector bot left a comment — 💡 Codex Review — Reviewed commit: 096bfa295f

@chatgpt-codex-connector bot left a comment — 💡 Codex Review — Reviewed commit: 52e8488aae

Comment on lines +1315 to +1317
if backend != "GGUF" && backend != "MLX" {
return Err(format!("invalid backend: {backend}. Must be GGUF or MLX").into());
}


P2: Reject custom MLX setup when runtime support is missing

configure_custom now starts immediate background downloads for MLX repos, but this validation only checks that backend is one of GGUF/MLX and never enforces the same MLX capability gate used in configure (Apple Silicon + mlx-lm installed). In unsupported environments, users can download large MLX models and receive a successful configuration state even though inference will fail at load time, so this path should fail fast before spawning the MLX download.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@penso penso merged commit f04235c into main Mar 14, 2026
50 of 54 checks passed
@penso penso deleted the issue-132 branch March 14, 2026 17:48
penso added a commit that referenced this pull request Mar 23, 2026
* fix(local-llm): restore custom gguf downloads and startup registration

* fix(local-llm): address PR review feedback

* test(sandbox): fix off mode coverage

* fix(web): unhide local model progress modal

* fix(local-llm): address review feedback

* fix(local-llm): preserve colliding provider entries

* fix(gateway): retain custom local models after discovery

* fix(local-llm): harden custom model validation and cleanup

* fix(gateway): honor local disable flag during restore

* fix(gateway): preserve local model restore state

* test(web-ui): stabilize voice fallback websocket e2e
jmikedupont2 pushed a commit to meta-introspector/moltis that referenced this pull request Mar 23, 2026
…#417)

* fix(local-llm): restore custom gguf downloads and startup registration

* fix(local-llm): address PR review feedback

* test(sandbox): fix off mode coverage

* fix(web): unhide local model progress modal

* fix(local-llm): address review feedback

* fix(local-llm): preserve colliding provider entries

* fix(gateway): retain custom local models after discovery

* fix(local-llm): harden custom model validation and cleanup

* fix(gateway): honor local disable flag during restore

* fix(gateway): preserve local model restore state

* test(web-ui): stabilize voice fallback websocket e2e


Development

Successfully merging this pull request may close these issues.

[Bug]: Model from huggingface not downloaded
