-
Notifications
You must be signed in to change notification settings - Fork 1
Severe performance degradation and massive allocations on indexed string queries (Potential Full Table Scan fallback) compared to LiteDB #31
Description
Hi, I am currently evaluating migrating a desktop application's data layer from LiteDB to BLite. The primary motivation is to fully support Native AOT and we have high expectations for BLite's performance, especially given the official benchmarks and its zero-reflection architecture.
However, during our local integration tests and subsequent BenchmarkDotNet profiling, we observed severe performance bottlenecks when querying data by indexed string fields.
The Issue:
Queries that filter by an indexed string field (using either .FindAsync() or .AsQueryable().Where()) are executing magnitudes slower than LiteDB and allocating an enormous amount of memory.
The discrepancy between our findings and the official BLite benchmarks makes me wonder if there is a specific API usage pattern I am missing, or if there is a defect in the Query Provider causing it to fail to use the B-Tree index and fallback to a full collection scan.
Benchmark Results:
We generated 5,000 dummy records representing photos, distributed across 20 folders. We then randomly sampled records and queried them in a loop.
Here are the BenchmarkDotNet results:
Technical Observations & Hypothesis:
Looking at the BLite: Query by FilePath row: querying a uniquely identifiable string field takes ~6.29 seconds and allocates 158.81 MB of memory for a database of only 5,000 records.
This strongly implies that:
The B-Tree Index is being completely ignored for string property queries.
The query provider might be failing to translate the closure variable (e.g., x.FilePath == localVariable) in the Expression Tree into an Index Seek operation.
As a result, it seems to fall back to scanning the entire collection and deserializing every single BSON document into memory repeatedly.
Code Snippet Context:
Here is how the Context and Indexes are configured:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<BenchmarkPhotoPo>()
.ToCollection("Photos")
.HasIndex(x => x.Id, unique: true)
.HasIndex(x => x.SourceId)
.HasIndex(x => x.FilePath, unique: true);
}
And here is the exact query executing in the Benchmark loop:
// Tried both AsQueryable and FindAsync, similar slow results
var result = await _bliteDb.Photos.FindAsync(x => x.FilePath == filePath).FirstOrDefaultAsync();
Reproduction Repo:
I have created a minimal, reproducible BenchmarkDotNet project here: PerformanceBenchmark
Could you please help shed some light on this? Is there a recommended/different way to query indexed string fields to ensure a proper Index Seek, or is this a known limitation/bug in the current query optimizer for string types?
Thank you for your fantastic work on this AOT-friendly database, looking forward to your insights!