fix(libsql): support flexible embedding dimensions (#534)
Conversation
The libSQL schema hardcoded `F32_BLOB(1536)` for the embedding column, preventing use of models with other dimensions (e.g. 768-dim nomic-embed-text). This adds incremental migration support to the libSQL backend and a V9 migration that rebuilds the `memory_chunks` table with a plain `BLOB` column accepting any dimension.

- Add incremental migration infrastructure (`INCREMENTAL_MIGRATIONS` array + `run_incremental()` runner tracked via `_migrations` table)
- V9 migration rebuilds `memory_chunks` with a `BLOB` column and drops the vector index (which requires fixed-dimension `F32_BLOB`)
- Update base schema for fresh installs (`BLOB`, no vector index)
- Vector search gracefully falls back to FTS-only when the index is absent (matches PostgreSQL behavior after its V9 migration)
- Remove now-incorrect "dimension is not 1536" warnings

Existing embeddings are preserved during migration. Users only need to re-embed if they change their embedding model/dimension.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
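A table rebuild like this typically follows SQLite's create-copy-swap pattern, since the column type cannot be altered in place. A hedged sketch of what the V9 migration SQL might look like (the column list and the `libsql_vector_idx` index name here are assumptions for illustration, not copied from the PR):

```sql
-- Hypothetical V9 migration sketch: relax the embedding column to plain BLOB.
-- The real migration's column list may differ.
DROP INDEX IF EXISTS libsql_vector_idx;  -- vector index needs fixed-dim F32_BLOB(N)

CREATE TABLE memory_chunks_new (
    id          TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    content     TEXT NOT NULL,
    embedding   BLOB  -- was F32_BLOB(1536); plain BLOB accepts any dimension
);

-- Copy existing rows so embeddings are preserved across the migration.
INSERT INTO memory_chunks_new (id, document_id, content, embedding)
    SELECT id, document_id, content, embedding FROM memory_chunks;

DROP TABLE memory_chunks;
ALTER TABLE memory_chunks_new RENAME TO memory_chunks;
```

The swap preserves existing rows byte-for-byte, which is why users only need to re-embed when they actually change models.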
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the libSQL database backend by introducing support for flexible embedding dimensions.
Code Review
This pull request introduces support for flexible embedding dimensions in the libSQL backend, which is a great improvement for users who want to use different embedding models. The changes include a new incremental migration system, a V9 migration to alter the memory_chunks table, and updates to gracefully handle the absence of a vector index. The implementation is well-structured. I've found one potential issue regarding the atomicity of migrations which could lead to an inconsistent database state if a migration is interrupted.
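The versioned-runner idea behind the new migration system can be sketched independently of libsql. Below is a minimal std-only simulation: the names `INCREMENTAL_MIGRATIONS` and `run_incremental` come from the PR, but the tuple layout and the in-memory stand-in for the `_migrations` table are assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// A migration: (version, name, sql). In the real backend the SQL is
/// executed against libsql; here we only record which versions would run.
const INCREMENTAL_MIGRATIONS: &[(i64, &str, &str)] = &[
    (9, "flexible_embedding_dimensions", "/* rebuild memory_chunks with BLOB */"),
];

/// `applied` simulates the `_migrations` tracking table.
/// Returns the versions that were run on this invocation.
fn run_incremental(applied: &mut BTreeSet<i64>) -> Vec<i64> {
    let mut ran = Vec::new();
    for &(version, _name, _sql) in INCREMENTAL_MIGRATIONS {
        if applied.contains(&version) {
            continue; // already recorded: skip on subsequent startups
        }
        // Real code: begin tx, execute_batch(sql),
        // INSERT INTO _migrations, commit (see the review thread below).
        applied.insert(version);
        ran.push(version);
    }
    ran
}

fn main() {
    let mut applied = BTreeSet::new();
    let first = run_incremental(&mut applied);
    let second = run_incremental(&mut applied); // idempotent: nothing left
    println!("first={first:?} second={second:?}");
}
```

The key property is idempotence: a second startup sees every version already tracked and runs nothing.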
src/db/libsql_migrations.rs
Outdated
conn.execute_batch(sql).await.map_err(|e| {
    DatabaseError::Migration(format!(
        "libSQL migration V{version} ({name}) failed: {e}"
    ))
})?;

// Record as applied
conn.execute(
    "INSERT INTO _migrations (version, name) VALUES (?1, ?2)",
    libsql::params![version, name],
)
.await
.map_err(|e| {
    DatabaseError::Migration(format!(
        "Failed to record migration V{version} ({name}): {e}"
    ))
})?;
The current implementation of run_incremental executes the migration SQL and records its completion as two separate operations. If the application were to be interrupted after the migration SQL runs but before it's recorded in the _migrations table, the migration would be partially or fully applied but not marked as complete. This could lead to failures on subsequent startups when the application tries to re-run the same migration on an altered schema.
To ensure atomicity, each migration should be wrapped in a database transaction. This guarantees that either both the migration and its tracking record are committed, or neither is, preventing an inconsistent state. The libsql crate provides a convenient transaction() API that can be used for this purpose. The transaction will automatically roll back if not explicitly committed.
let tx = conn.transaction().await.map_err(|e| {
    DatabaseError::Migration(format!(
        "Failed to start transaction for V{version} ({name}): {e}"
    ))
})?;

tx.execute_batch(sql).await.map_err(|e| {
    DatabaseError::Migration(format!(
        "libSQL migration V{version} ({name}) failed: {e}"
    ))
})?;

// Record as applied
tx.execute(
    "INSERT INTO _migrations (version, name) VALUES (?1, ?2)",
    libsql::params![version, name],
)
.await
.map_err(|e| {
    DatabaseError::Migration(format!(
        "Failed to record migration V{version} ({name}): {e}"
    ))
})?;

tx.commit().await.map_err(|e| {
    DatabaseError::Migration(format!(
        "Failed to commit transaction for V{version} ({name}): {e}"
    ))
})?;
Addressed in d3cf7a4. The migration execution and recording are now wrapped in a conn.transaction() with explicit tx.commit(), so they succeed or fail atomically. If the process crashes mid-migration, the transaction rolls back and the migration retries on next startup.
Address PR review feedback: if the process crashes after executing migration SQL but before recording it in _migrations, the migration would be applied but not marked complete. Wrapping both operations in a transaction ensures they succeed or fail together.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(libsql): support flexible embedding dimensions (nearai#494)

  The libSQL schema hardcoded F32_BLOB(1536) for the embedding column, preventing use of models with other dimensions (e.g. 768-dim nomic-embed-text). This adds incremental migration support to the libSQL backend and a V9 migration that rebuilds the memory_chunks table with a plain BLOB column accepting any dimension.

  - Add incremental migration infrastructure (INCREMENTAL_MIGRATIONS array + run_incremental() runner tracked via _migrations table)
  - V9 migration rebuilds memory_chunks with a BLOB column, drops the vector index (which requires fixed-dimension F32_BLOB)
  - Update base schema for fresh installs (BLOB, no vector index)
  - Vector search gracefully falls back to FTS-only when the index is absent (matches PostgreSQL behavior after its V9 migration)
  - Remove now-incorrect "dimension is not 1536" warnings

  Existing embeddings are preserved during migration. Users only need to re-embed if they change their embedding model/dimension.

  [skip-regression-check]

* fix: wrap incremental migrations in transaction for atomicity

  Address PR review feedback: if the process crashes after executing migration SQL but before recording it in _migrations, the migration would be applied but not marked complete. Wrapping both operations in a transaction ensures they succeed or fail together.

  [skip-regression-check]

* chore: merge main and fix formatting drift

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes #494.
- Add incremental migration infrastructure (`_migrations` table tracking + `run_incremental()` runner)
- V9 migration rebuilds the `memory_chunks` table with a `BLOB` column (was `F32_BLOB(1536)`) so any embedding dimension works
- Drop the `libsql_vector_idx` vector index (it requires fixed-dimension `F32_BLOB(N)`); vector search gracefully falls back to FTS-only, matching PostgreSQL behavior after its V9 migration
- Update base schema for fresh installs (`BLOB` column)
- Remove now-incorrect "dimension is not 1536" warnings from `app.rs` and `main.rs`

Upgrade path: Existing users get the V9 migration automatically on next startup. All data (documents, chunks, embeddings) is preserved. Users only need to re-embed if they change their embedding model/dimension, which `backfill_embeddings()` handles.

Test plan

- `cargo test --lib` -- 1853 passed, 0 failed
- `cargo clippy --all --all-features` -- zero warnings
- `cargo check --no-default-features --features libsql` -- compiles

Generated with Claude Code