
feat: add embedding APIs, batching, and docs #75

Merged
leehack merged 6 commits into main from feat/embedding-support on Feb 28, 2026
leehack commented Feb 28, 2026

Summary

  • add first-class embedding support in LlamaEngine with optional backend capability interfaces and native backend implementations (embed and embedBatch)
  • introduce native worker/service embedding paths with multi-sequence batching controls (ModelParams.maxParallelSequences) plus benchmark/sweep tooling for throughput analysis
  • add embedding example CLI, refresh README/example/website docs, and update template parity mapping for new vendored llama.cpp template fixture
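The capability-interface and batching ideas in this summary can be sketched in plain Dart. This is a minimal illustration, not the code in this PR: `Backend`, `EmbeddingBackend`, `FakeEmbeddingBackend`, `TextOnlyBackend`, and `Engine` are hypothetical names, and the `maxParallelSequences` field on `Engine` stands in for `ModelParams.maxParallelSequences`. It assumes the engine checks at call time whether the backend implements the optional interface, then splits the inputs into chunks no larger than the parallel-sequence limit.

```dart
import 'dart:math' show min;

/// Hypothetical base contract that every backend satisfies.
abstract class Backend {
  String get name;
}

/// Hypothetical optional capability: only embedding-capable backends
/// implement it.
abstract class EmbeddingBackend implements Backend {
  Future<List<List<double>>> embedBatch(List<String> texts);
}

/// Toy backend: "embeds" each text as a one-element vector holding its
/// length, and records the size of every batch it receives.
class FakeEmbeddingBackend implements EmbeddingBackend {
  final List<int> batchSizes = [];

  @override
  String get name => 'fake';

  @override
  Future<List<List<double>>> embedBatch(List<String> texts) async {
    batchSizes.add(texts.length);
    return [for (final t in texts) [t.length.toDouble()]];
  }
}

/// Toy backend without the embedding capability.
class TextOnlyBackend implements Backend {
  @override
  String get name => 'text-only';
}

class Engine {
  Engine(this.backend, {this.maxParallelSequences = 1});

  final Backend backend;

  /// Stand-in for ModelParams.maxParallelSequences.
  final int maxParallelSequences;

  Future<List<List<double>>> embedBatch(List<String> texts) async {
    final b = backend;
    if (b is! EmbeddingBackend) {
      throw UnsupportedError('${b.name} backend does not support embeddings');
    }
    // Guard against a non-positive limit, which would never advance the loop.
    final step = maxParallelSequences < 1 ? 1 : maxParallelSequences;
    final out = <List<double>>[];
    for (var i = 0; i < texts.length; i += step) {
      final end = min(i + step, texts.length);
      out.addAll(await b.embedBatch(texts.sublist(i, end)));
    }
    return out;
  }
}

Future<void> main() async {
  final backend = FakeEmbeddingBackend();
  final engine = Engine(backend, maxParallelSequences: 2);
  final vectors = await engine.embedBatch(['a', 'bb', 'ccc']);
  print(vectors.length); // 3
  print(backend.batchSizes); // [2, 1]
}
```

With three inputs and a limit of 2, the toy backend sees one batch of two and one of one. A backend that does not implement the capability fails fast with `UnsupportedError` instead of silently returning nothing.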

Validation

  • dart analyze
  • dart test -p vm -j 1 --exclude-tags local-only
  • dart test (in example/basic_app)
  • npm run build (in website)

codecov-commenter commented Feb 28, 2026

Codecov Report

❌ Patch coverage is 29.79592% with 172 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.81%. Comparing base (d0734a0) to head (52d156c).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/src/backends/llama_cpp/llama_cpp_service.dart | 7.18% | 168 Missing ⚠️ |
| lib/src/backends/llama_cpp/llama_cpp_backend.dart | 92.59% | 2 Missing ⚠️ |
| lib/src/backends/llama_cpp/worker.dart | 83.33% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (29.79%) is below the target coverage (70.00%). You can increase the patch coverage or adjust the target coverage.
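The patch-coverage figure can be reproduced with a little arithmetic. The sketch below assumes patch coverage is computed as covered changed lines divided by all changed lines. The totals of 245 changed lines and 73 covered lines are inferred from the reported 172 missing lines and 29.79592%; the report does not state them directly. `patchCoverage` and `coveredNeeded` are illustrative helper names.

```dart
/// Patch coverage as a percentage: covered changed lines over all
/// changed lines (an assumed definition, see the note above).
double patchCoverage({required int covered, required int missing}) =>
    100 * covered / (covered + missing);

/// Number of covered lines needed out of [total] to reach [targetPercent].
int coveredNeeded({required int total, required double targetPercent}) =>
    (total * targetPercent / 100).ceil();

void main() {
  // Inferred totals: 172 missing + 73 covered = 245 changed lines.
  const missing = 172;
  const covered = 73;
  const total = covered + missing;

  print(patchCoverage(covered: covered, missing: missing)
      .toStringAsFixed(5)); // 29.79592

  final needed = coveredNeeded(total: total, targetPercent: 70);
  print(needed); // 172
  print(needed - covered); // 99
}
```

Under these assumptions, about 99 more of the changed lines would need test coverage for the patch to meet the 70% target.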

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
+ Coverage   76.48%   76.81%   +0.33%     
==========================================
  Files          66       66              
  Lines        8338     8579     +241     
==========================================
+ Hits         6377     6590     +213     
- Misses       1961     1989      +28     
| Flag | Coverage Δ |
|---|---|
| unittests | 76.81% <29.79%> (+0.33%) ⬆️ |



leehack commented Feb 28, 2026

Coverage update after backend/worker/model-params test expansion:

  • Fixed a platform-sensitive assertion in llama_cpp_service_test (backend info now asserts CPU presence instead of exact list).
  • Added and expanded unit coverage in backend routing, worker routing, and model params paths.
  • Validation: dart analyze, targeted tests, and full VM suite (dart test -p vm -j 1 --exclude-tags local-only) all pass.
  • Refreshed VM coverage (coverage/lcov.info). Current file coverage:
    • lib/src/backends/llama_cpp/llama_cpp_service.dart: 42.55% (680/1598)
    • lib/src/backends/llama_cpp/llama_cpp_backend.dart: 94.47% (239/253)
    • lib/src/backends/llama_cpp/worker.dart: 90.98% (111/122)
    • lib/src/core/models/inference/model_params.dart: 92.31% (12/13)
  • Overall lib coverage now 76.27% (6544/8580).
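The platform-sensitive assertion fix mentioned in the first bullet can be illustrated as follows. This is a sketch only: the device lists are made up, and `matchesExactList` and `reportsCpu` are hypothetical helpers, not code from `llama_cpp_service_test`. It contrasts an exact-list check, which breaks whenever a host exposes a different set of devices, with a check that only requires a CPU entry to be present.

```dart
/// Brittle check: passes only when the reported devices equal one exact list.
bool matchesExactList(List<String> devices, List<String> expected) {
  if (devices.length != expected.length) return false;
  for (var i = 0; i < devices.length; i++) {
    if (devices[i] != expected[i]) return false;
  }
  return true;
}

/// Portable check: passes on any host whose device list includes a CPU entry.
bool reportsCpu(List<String> devices) =>
    devices.any((d) => d.toUpperCase().contains('CPU'));

void main() {
  // Hypothetical device lists for two different hosts.
  const gpuHost = ['Metal', 'CPU'];
  const cpuOnlyHost = ['CPU'];

  // The exact-list check passes on one host and fails on the other.
  print(matchesExactList(gpuHost, ['Metal', 'CPU'])); // true
  print(matchesExactList(cpuOnlyHost, ['Metal', 'CPU'])); // false

  // The presence check passes on both.
  print(reportsCpu(gpuHost)); // true
  print(reportsCpu(cpuOnlyHost)); // true
}
```

In a `package:test` suite, the portable form would typically be written with a matcher such as `expect(devices, contains('CPU'))`, assuming the backend info is exposed as a list of strings.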


leehack commented Feb 28, 2026

Web bridge update completed:

  • Published new bridge asset release: leehack/llama-web-bridge-assets@v0.1.6 (llama.cpp b8157).
  • Updated llamadart web defaults/pins/docs to v0.1.6:
    • scripts/fetch_webgpu_bridge_assets.sh
    • example/chat_app/web/index.html
    • doc/webgpu_bridge.md
    • website/docs/platforms/webgpu-bridge.md
    • website/versioned_docs/version-0.6.4/platforms/webgpu-bridge.md
    • CHANGELOG.md (Unreleased note)
  • Validation: dart analyze passed in llamadart.

Reference release: https://github.com/leehack/llama-web-bridge-assets/releases/tag/v0.1.6


leehack commented Feb 28, 2026

Upstream tracking for the web bridge bump:


leehack commented Feb 28, 2026

Web embeddings are now wired and validated.

What changed in this branch:

  • Added web backend embedding support (LlamaEngine.embed / embedBatch) via WebGPU bridge APIs.
  • Updated web backend wrappers/capabilities so embeddings resolve correctly on web.
  • Added browser unit coverage for web embeddings and legacy bridge fallback behavior.
  • Bumped default bridge assets to leehack/llama-web-bridge-assets@v0.1.7.
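The comment does not describe what the legacy bridge fallback does. One plausible shape, sketched below under that caveat, is feature detection: prefer a batch entry point when the bridge exposes one, fall back to one call per input when only a single-text entry point exists, and fail clearly when neither is available. `BridgeApi`, `EmbedFn`, `EmbedBatchFn`, and `embedAll` are hypothetical names and do not mirror the real WebGPU bridge surface.

```dart
typedef EmbedFn = Future<List<double>> Function(String text);
typedef EmbedBatchFn = Future<List<List<double>>> Function(List<String> texts);

/// Hypothetical view of a bridge build; a null field means that build
/// does not expose the corresponding entry point.
class BridgeApi {
  const BridgeApi({this.embed, this.embedBatch});

  final EmbedFn? embed;
  final EmbedBatchFn? embedBatch;
}

/// Embeds all texts, preferring the batch API and falling back to
/// sequential single-text calls on older bridge builds.
Future<List<List<double>>> embedAll(
  BridgeApi bridge,
  List<String> texts,
) async {
  final batch = bridge.embedBatch;
  if (batch != null) return await batch(texts);

  final single = bridge.embed;
  if (single == null) {
    throw UnsupportedError('Bridge build does not expose embedding APIs');
  }
  final out = <List<double>>[];
  for (final text in texts) {
    out.add(await single(text));
  }
  return out;
}

Future<void> main() async {
  var singleCalls = 0;
  // A "legacy" bridge in this sketch: single-text embedding only.
  final legacy = BridgeApi(embed: (text) async {
    singleCalls++;
    return [text.length.toDouble()];
  });

  final vectors = await embedAll(legacy, ['hi', 'there']);
  print(vectors.length); // 2
  print(singleCalls); // 2
}
```

Keeping the detection in one helper means callers see the same `embedBatch`-style result regardless of which bridge version is loaded.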

Validation run:

  • dart analyze
  • dart test -p chrome test/unit/backends/webgpu/webgpu_backend_test.dart test/unit/backends/web/web_backend_test.dart
  • WEBGPU_BRIDGE_ASSETS_TAG=v0.1.7 ./scripts/fetch_webgpu_bridge_assets.sh (checksum verification passed)

Upstream references:


leehack commented Feb 28, 2026

Additional browser integration coverage added:

  • test/integration/backends/webgpu/webgpu_engine_multimodal_browser_integration_test.dart
    • new assertion path for LlamaEngine.embed(...) and embedBatch(...) over the mock WebGPU bridge.

Re-validated:

  • dart analyze
  • dart test -p chrome test/unit/backends/webgpu/webgpu_backend_test.dart test/unit/backends/web/web_backend_test.dart test/integration/backends/webgpu/webgpu_engine_multimodal_browser_integration_test.dart

leehack merged commit 9a84976 into main on Feb 28, 2026 (6 checks passed)
leehack deleted the feat/embedding-support branch on February 28, 2026 at 23:31