1 change: 1 addition & 0 deletions docs/features/index.md
@@ -114,3 +114,4 @@

- [Vector Search (k-NN)](k-nn/vector-search-k-nn.md)
- [k-NN Explain API](k-nn/explain-api.md)
- [Lucene On Faiss (Memory Optimized Search)](k-nn/lucene-on-faiss.md)
209 changes: 209 additions & 0 deletions docs/features/k-nn/lucene-on-faiss.md
@@ -0,0 +1,209 @@
# Lucene On Faiss (Memory Optimized Search)

## Summary

Lucene-on-Faiss is a hybrid vector search approach that lets OpenSearch search FAISS HNSW indexes without loading the entire index into memory. Lucene's HNSW search algorithm runs directly over the FAISS index file format and reads graph and vector data on demand, so vector search works in memory-constrained environments while recall stays within a few percent of native FAISS search (see Performance Characteristics).

The feature addresses a fundamental limitation of FAISS: the requirement to load entire vector indexes into memory. With Lucene-on-Faiss, users can run vector searches on large FAISS indexes even when available memory is less than the index size.

## Details

### Architecture

```mermaid
graph TB
subgraph "Query Processing"
Query[Search Query] --> KNNQueryBuilder
KNNQueryBuilder --> |Check Settings| MemoryOptCheck{memory_optimized_search?}
end

subgraph "Memory Optimized Path"
MemoryOptCheck --> |true| LuceneSearcher[Lucene HnswGraphSearcher]
LuceneSearcher --> FaissHnswGraph[FaissHnswGraph Adapter]
FaissHnswGraph --> FaissHNSW[FaissHNSW Structure]
FaissHNSW --> IndexInput[Lucene IndexInput]
IndexInput --> |On-demand read| FaissFile[(FAISS Index File)]
end

subgraph "Traditional Path"
MemoryOptCheck --> |false| FaissJNI[FAISS JNI Layer]
FaissJNI --> NativeMemory[Native Memory]
NativeMemory --> FaissFile
end

subgraph "Vector Values"
FaissHnswGraph --> VectorValues[FloatVectorValues / ByteVectorValues]
VectorValues --> IndexInput
end
```

### Data Flow

```mermaid
flowchart LR
subgraph "Index Loading"
A[Open Index] --> B[Parse FAISS Header]
B --> C[Mark Section Offsets]
C --> D[Skip to Next Section]
D --> E[Store FaissIndex Structure]
end

subgraph "Search Execution"
F[Query Vector] --> G[Create VectorSearcher]
G --> H[Build FaissHnswGraph]
H --> I[Navigate HNSW Levels]
I --> J[Fetch Neighbors On-Demand]
J --> K[Compute Distances]
K --> L[Collect Top-K Results]
end

E --> G
```
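
The "mark section offsets, then skip" step in the diagram can be illustrated with a minimal Python sketch. The file layout used here (a 4-byte tag, an 8-byte little-endian size, then the payload) is a made-up toy format, not the real FAISS on-disk layout, and `Section`, `mark_sections`, and `read_slice` are hypothetical names chosen for the illustration.

```python
import io
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Location of one payload inside the file; no payload bytes are held."""
    offset: int
    size: int


def write_toy_file(sections: dict[str, bytes]) -> bytes:
    """Serialize sections as: 4-byte tag, 8-byte little-endian size, payload."""
    out = io.BytesIO()
    for tag, payload in sections.items():
        out.write(tag.encode("ascii"))
        out.write(struct.pack("<Q", len(payload)))
        out.write(payload)
    return out.getvalue()


def mark_sections(stream) -> dict[str, Section]:
    """Read headers only: record where each payload starts, then skip it."""
    table: dict[str, Section] = {}
    while True:
        header = stream.read(12)
        if len(header) < 12:
            break  # end of file
        tag = header[:4].decode("ascii")
        (size,) = struct.unpack("<Q", header[4:])
        table[tag] = Section(offset=stream.tell(), size=size)
        stream.seek(size, io.SEEK_CUR)  # skip the payload without reading it
    return table


def read_slice(stream, section: Section, start: int, length: int) -> bytes:
    """Fetch only the requested bytes of a section, on demand."""
    if start < 0 or start + length > section.size:
        raise ValueError("slice outside section")
    stream.seek(section.offset + start)
    return stream.read(length)


# Toy data: two float32 vectors of dimension 2, plus a small neighbor list.
vectors = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
neighbors = struct.pack("<3i", 1, 0, -1)
blob = write_toy_file({"VECS": vectors, "NBRS": neighbors})

stream = io.BytesIO(blob)
table = mark_sections(stream)  # "index loading": offsets only
# "search execution": read just the second vector (bytes 8..16 of VECS).
second_vector = struct.unpack("<2f", read_slice(stream, table["VECS"], 8, 8))
```

After `mark_sections` returns, the only state held in memory is the small offset table; payload bytes are read later, one slice at a time, which is the role the diagram assigns to Lucene's `IndexInput`.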

### Components

| Component | Description |
|-----------|-------------|
| `FaissIndex` | Abstract base class for FAISS index types with partial loading support |
| `FaissIdMapIndex` | Handles ID mapping between internal vector IDs and Lucene document IDs |
| `FaissHNSWIndex` | Represents FAISS HNSW index with flat vector storage |
| `FaissHNSW` | HNSW graph structure with neighbor lists and level information |
| `FaissHnswGraph` | Lucene `HnswGraph` adapter that wraps `FaissHNSW` |
| `FaissMemoryOptimizedSearcher` | `VectorSearcher` implementation for FAISS indexes |
| `FaissMemoryOptimizedSearcherFactory` | Factory for creating memory-optimized searchers |
| `FaissIndexFloatFlat` | Float vector storage (L2 and Inner Product) |
| `FaissIndexScalarQuantizedFlat` | Scalar quantized vector storage (8-bit, FP16) |
| `FaissSection` | Represents a section in FAISS index file with offset and size |
| `MemoryOptimizedSearchSupportSpec` | Determines if a field configuration supports memory-optimized search |
| `VectorSearcher` | Interface for vector search compatible with Lucene's search API |
| `VectorSearcherFactory` | Factory interface for creating `VectorSearcher` instances |

### Configuration

| Setting | Description | Default | Scope |
|---------|-------------|---------|-------|
| `index.knn.memory_optimized_search` | Enable memory-optimized search for FAISS indexes | `false` | Index |

### Supported Configurations

| Engine | Method | Space Types | Vector Types | Encoders |
|--------|--------|-------------|--------------|----------|
| FAISS | HNSW | L2, INNER_PRODUCT | FLOAT, BYTE | flat, sq |
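
As a rough illustration of how the table above could be turned into an eligibility check, here is a small Python sketch. It is not the logic of `MemoryOptimizedSearchSupportSpec`; the `FieldConfig` type, the function names, and the lowercase value spellings are assumptions made for this example.

```python
from dataclasses import dataclass

# Values copied from the "Supported Configurations" table, lowercased.
SUPPORTED = {
    "engine": {"faiss"},
    "method": {"hnsw"},
    "space_type": {"l2", "inner_product"},
    "vector_type": {"float", "byte"},
    "encoder": {"flat", "sq"},
}


@dataclass(frozen=True)
class FieldConfig:
    """Hypothetical description of a knn_vector field for this sketch."""
    engine: str
    method: str
    space_type: str
    vector_type: str = "float"
    encoder: str = "flat"


def unsupported_reasons(config: FieldConfig) -> list[str]:
    """Return one message per attribute that falls outside the table."""
    reasons = []
    for key, allowed in SUPPORTED.items():
        value = getattr(config, key).lower()
        if value not in allowed:
            reasons.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return reasons


def supports_memory_optimized_search(config: FieldConfig) -> bool:
    """True only when every attribute matches a supported value."""
    return not unsupported_reasons(config)


hnsw_field = FieldConfig(engine="faiss", method="hnsw", space_type="L2")
ivf_field = FieldConfig(engine="faiss", method="ivf", space_type="l2")
```

Returning the list of reasons instead of a bare boolean makes it easier to explain why a field falls back to the traditional path.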

### Usage Example

#### Enable via Index Setting

```json
PUT /my-vector-index
{
  "settings": {
    "index.knn": true,
    "index.knn.memory_optimized_search": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "ef_construction": 128,
            "m": 16
          }
        }
      }
    }
  }
}
```

#### Enable via On-Disk Mode

```json
PUT /my-vector-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "mode": "on_disk",
        "compression_level": "1x"
      }
    }
  }
}
```

#### Search Query

```json
GET /my-vector-index/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, ...],
        "k": 10
      }
    }
  }
}
```
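
For client code, the request body above can be assembled programmatically. The following Python sketch only builds and serializes the body; it does not send it. The helper name `knn_query` and the dimension check are assumptions for this example, and the vector is a placeholder, not a real embedding.

```python
import json


def knn_query(field: str, vector: list[float], k: int, expected_dimension: int) -> dict:
    """Build a k-NN query body, rejecting vectors of the wrong length early."""
    if len(vector) != expected_dimension:
        raise ValueError(
            f"expected {expected_dimension} values, got {len(vector)}"
        )
    if k <= 0:
        raise ValueError("k must be positive")
    return {"query": {"knn": {field: {"vector": list(vector), "k": k}}}}


# Placeholder vector matching the 768-dimension mapping used above.
placeholder_vector = [0.1] * 768
body = knn_query("my_vector", placeholder_vector, k=10, expected_dimension=768)
payload = json.dumps(body)  # string to send as the body of GET /my-vector-index/_search
```

The query shape is the same whether or not memory-optimized search is enabled; the setting changes how the search executes, not how it is requested.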

### Performance Characteristics

Based on benchmarks with the Cohere-10M dataset, comparing Lucene-on-Faiss against the native FAISS C++ search path:

| Configuration | QPS Change vs FAISS C++ | Recall Change |
|---------------|-------------------------|---------------|
| FP32 (k=30) | -9.56% | +0.14% |
| FP16 (k=30) | -40.43% | +0.31% |
| 8x quantization (k=30) | +76.85% | -2.76% |
| 16x quantization (k=30) | +85.10% | -3.48% |
| 32x quantization (k=30) | +51.52% | -4.52% |
| 32x quantization (k=100) | +107.27% | -1.72% |

Key observations:
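- For quantized indexes, Lucene-on-Faiss reaches up to roughly 2x the FAISS C++ throughput (+107.27% QPS at 32x quantization, k=100)
- For unquantized FP32 and FP16 indexes, throughput is lower than FAISS C++ (-9.56% and -40.43% QPS)
- Recall drops by up to 4.52% on quantized indexes due to Lucene's early termination logic, while FP32 and FP16 recall is marginally higher (+0.14% and +0.31%)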
- Enables running large indexes (e.g., 30GB) on memory-constrained instances (e.g., 8GB RAM)
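
To relate the percentage changes in the table to the "2x" figure, a QPS change of p percent corresponds to a throughput multiplier of 1 + p/100. The short Python sketch below applies that conversion to the table values; the function and variable names are chosen for this example.

```python
# QPS change versus FAISS C++ in percent, copied from the benchmark table.
qps_change_percent = {
    "FP32 (k=30)": -9.56,
    "FP16 (k=30)": -40.43,
    "8x quantization (k=30)": 76.85,
    "16x quantization (k=30)": 85.10,
    "32x quantization (k=30)": 51.52,
    "32x quantization (k=100)": 107.27,
}


def throughput_multiplier(change_percent: float) -> float:
    """Convert a relative change in percent into a multiplier of the baseline."""
    return 1.0 + change_percent / 100.0


multipliers = {
    name: round(throughput_multiplier(change), 4)
    for name, change in qps_change_percent.items()
}
best = max(multipliers, key=multipliers.get)
worst = min(multipliers, key=multipliers.get)
```

Here `best` is the 32x quantization run at k=100, at about 2.07 times the baseline QPS, and `worst` is the FP16 run, at about 0.60 times the baseline.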

## Limitations

- **Engine Support**: Only the FAISS engine is supported
- **Method Support**: Only the HNSW algorithm is supported; IVF and PQ are not yet supported
- **Quantization**: `QuantizationConfig` is not supported with memory-optimized search
- **Vector Types**: Only FLOAT and BYTE data types are supported
- **Space Types**: Only L2 and INNER_PRODUCT are supported
- **Result Consistency**: Results may differ slightly from full-memory FAISS search due to differences in loop termination conditions between Lucene and FAISS

## Related PRs

| Version | PR | Description |
|---------|-----|-------------|
| v3.0.0 | [#2630](https://github.com/opensearch-project/k-NN/pull/2630) | Main implementation (10 sub-PRs combined) |
| v3.0.0 | [#2581](https://github.com/opensearch-project/k-NN/pull/2581) | Building blocks for memory optimized search |
| v3.0.0 | [#2590](https://github.com/opensearch-project/k-NN/pull/2590) | IxMp section loading logic |
| v3.0.0 | [#2594](https://github.com/opensearch-project/k-NN/pull/2594) | FaissHNSW graph implementation |
| v3.0.0 | [#2598](https://github.com/opensearch-project/k-NN/pull/2598) | FAISS float flat index |
| v3.0.0 | [#2604](https://github.com/opensearch-project/k-NN/pull/2604) | FaissIndexScalarQuantizedFlat |
| v3.0.0 | [#2618](https://github.com/opensearch-project/k-NN/pull/2618) | Byte index, FP16 index decoding |
| v3.0.0 | [#2608](https://github.com/opensearch-project/k-NN/pull/2608) | VectorReader integration |
| v3.0.0 | [#2616](https://github.com/opensearch-project/k-NN/pull/2616) | Index setting implementation |
| v3.0.0 | [#2621](https://github.com/opensearch-project/k-NN/pull/2621) | CAGRA index partial loading |
| v3.0.0 | [#2609](https://github.com/opensearch-project/k-NN/pull/2609) | Monotonic integer encoding for HNSW |

## References

- [RFC Issue #2401](https://github.com/opensearch-project/k-NN/issues/2401): Partial loading with FAISS engine - detailed design document
- [Documentation: Memory-optimized vectors](https://docs.opensearch.org/3.0/field-types/supported-field-types/knn-memory-optimized/)
- [Blog: Lucene-on-Faiss](https://opensearch.org/blog/lucene-on-faiss-powering-opensearchs-high-performance-memory-efficient-vector-search/)

## Change History

- **v3.0.0** (2025-03-28): Initial implementation with HNSW support for FAISS engine
145 changes: 145 additions & 0 deletions docs/releases/v3.0.0/features/k-nn/lucene-on-faiss.md
@@ -0,0 +1,145 @@
# Lucene On Faiss (Memory Optimized Search)

## Summary

OpenSearch v3.0.0 introduces a new memory-optimized search mode for FAISS HNSW indexes called "Lucene-on-Faiss". This hybrid approach enables vector search on FAISS indexes in memory-constrained environments by combining Lucene's efficient HNSW search algorithm with FAISS's high-performance index format. Users can enable this feature via the `index.knn.memory_optimized_search` index setting.

## Details

### What's New in v3.0.0

The Lucene-on-Faiss feature addresses a fundamental limitation of FAISS: the requirement to load entire vector indexes into memory. By implementing partial loading, OpenSearch can now run vector searches on FAISS indexes without loading all data into memory upfront.

Key capabilities:
- Run vector search on FAISS HNSW indexes under memory-constrained environments
- Partial loading of FAISS index sections on demand
- Transparent integration with existing FAISS indexes
- Support for FP32, FP16, and scalar quantized (8-bit) vectors

### Technical Changes

#### Architecture Changes

```mermaid
graph TB
subgraph "Memory Optimized Search Flow"
Query[Query Vector] --> KNNQuery[KNNQueryBuilder]
KNNQuery --> |memory_optimized_search=true| LuceneSearch[Lucene HNSW Searcher]
KNNQuery --> |memory_optimized_search=false| FaissSearch[FAISS C++ Search]

LuceneSearch --> FaissGraph[FaissHnswGraph Adapter]
FaissGraph --> IndexInput[Lucene IndexInput]
IndexInput --> FaissFile[FAISS Index File]

FaissSearch --> NativeMemory[Native Memory]
NativeMemory --> FaissFile
end
```

#### New Components

| Component | Description |
|-----------|-------------|
| `FaissIndex` | Base class for parsing FAISS index file sections |
| `FaissHNSW` | Represents FAISS HNSW graph structure with partial loading |
| `FaissHnswGraph` | Lucene HnswGraph adapter wrapping FAISS HNSW |
| `FaissMemoryOptimizedSearcher` | VectorSearcher implementation for FAISS indexes |
| `VectorSearcherFactory` | Factory interface for creating memory-optimized searchers |
| `MemoryOptimizedSearchSupportSpec` | Determines if a field supports memory-optimized search |

#### New Configuration

| Setting | Description | Default |
|---------|-------------|---------|
| `index.knn.memory_optimized_search` | Enable memory-optimized search mode for FAISS indexes | `false` |

### Usage Example

```json
PUT /my-index
{
  "settings": {
    "index.knn": true,
    "index.knn.memory_optimized_search": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2"
        }
      }
    }
  }
}
```

Alternatively, use the `on_disk` mode with `1x` compression:

```json
PUT /my-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "mode": "on_disk",
        "compression_level": "1x"
      }
    }
  }
}
```

### How It Works

1. **Index Loading**: Instead of loading the entire FAISS index into memory, the system parses each section header, records the section's offset and size, and skips over its contents to the next section
2. **Search Execution**: When a search is triggered, Lucene's `HnswGraphSearcher` navigates the FAISS HNSW graph via the `FaissHnswGraph` adapter
3. **On-Demand Loading**: Vector data and neighbor lists are fetched via Lucene's `IndexInput` as needed during search
4. **Score Computation**: Distances are computed in Java by the Lucene-based searcher on vectors read through `IndexInput`, rather than by FAISS's native C++ code
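
The steps above can be sketched as a simplified HNSW search in which neighbor lists and vectors are fetched through callbacks only when the search reaches them. This is an illustration, not Lucene's `HnswGraphSearcher`: the graph, the `neighbors_of` and `vector_of` callbacks, and the `touched` read log are assumptions standing in for reads through `IndexInput`.

```python
import heapq


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(query, entry, ef, neighbors_of, vector_of):
    """Best-first search on one layer; returns up to ef (distance, node) pairs."""
    d0 = l2_squared(query, vector_of(entry))
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept hit on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef and dist > -results[0][0]:
            break  # the closest candidate cannot improve the result set
        for nb in neighbors_of(node):  # neighbor list fetched on demand
            if nb in visited:
                continue
            visited.add(nb)
            d = l2_squared(query, vector_of(nb))  # vector fetched on demand
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst hit
    return sorted((-neg, n) for neg, n in results)


def hnsw_search(query, k, ef, entry, top_level, neighbors_of, vector_of):
    """Greedy descent through upper levels, then a wider search on level 0."""
    for level in range(top_level, 0, -1):
        nearest = search_layer(
            query, entry, 1, lambda n, lv=level: neighbors_of(lv, n), vector_of
        )
        entry = nearest[0][1]
    hits = search_layer(query, entry, max(ef, k), lambda n: neighbors_of(0, n), vector_of)
    return [node for _, node in hits[:k]]


# Toy graph: ten points on a line; level 1 links only nodes 0, 5 and 9.
vectors = {i: (float(i), 0.0) for i in range(10)}
level0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
level1 = {0: [5], 5: [0, 9], 9: [5]}
touched = set()  # nodes whose vectors were actually read


def vector_of(node):
    touched.add(node)
    return vectors[node]


def neighbors_of(level, node):
    return (level1 if level == 1 else level0)[node]


top2 = hnsw_search((6.2, 0.0), k=2, ef=2, entry=0, top_level=1,
                   neighbors_of=neighbors_of, vector_of=vector_of)
```

In this toy run the search returns nodes 6 and 7 and never reads the vectors of nodes 1, 2, and 3, which is the property that lets the real feature avoid loading the whole index.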

### Migration Notes

- Existing FAISS indexes work without reindexing
- Enable the setting on new or existing indexes
- Performance may vary based on storage I/O characteristics

## Limitations

- Supported only for the FAISS engine with the HNSW algorithm
- Training-based methods (IVF, PQ) are not yet supported
- Quantization via `QuantizationConfig` is not supported with memory-optimized search
- Only FLOAT and BYTE vector data types are supported
- Only L2 and INNER_PRODUCT space types are supported
- Results may differ slightly from full-memory FAISS search due to Lucene's early termination logic

## Related PRs

| PR | Description |
|----|-------------|
| [#2630](https://github.com/opensearch-project/k-NN/pull/2630) | Main implementation combining 10 sub-PRs |
| [#2581](https://github.com/opensearch-project/k-NN/pull/2581) | Building blocks for memory optimized search |
| [#2590](https://github.com/opensearch-project/k-NN/pull/2590) | IxMp section loading logic |
| [#2594](https://github.com/opensearch-project/k-NN/pull/2594) | FaissHNSW graph implementation |
| [#2598](https://github.com/opensearch-project/k-NN/pull/2598) | FAISS float flat index |
| [#2604](https://github.com/opensearch-project/k-NN/pull/2604) | FaissIndexScalarQuantizedFlat |
| [#2618](https://github.com/opensearch-project/k-NN/pull/2618) | Byte index, FP16 index decoding |
| [#2608](https://github.com/opensearch-project/k-NN/pull/2608) | VectorReader integration |
| [#2616](https://github.com/opensearch-project/k-NN/pull/2616) | Index setting implementation |
| [#2621](https://github.com/opensearch-project/k-NN/pull/2621) | CAGRA index partial loading |
| [#2609](https://github.com/opensearch-project/k-NN/pull/2609) | Monotonic integer encoding for HNSW |

## References

- [RFC Issue #2401](https://github.com/opensearch-project/k-NN/issues/2401): Partial loading with FAISS engine
- [Documentation: Memory-optimized vectors](https://docs.opensearch.org/3.0/field-types/supported-field-types/knn-memory-optimized/)
- [Blog: Lucene-on-Faiss](https://opensearch.org/blog/lucene-on-faiss-powering-opensearchs-high-performance-memory-efficient-vector-search/)

## Related Feature Report

- [Full feature documentation](../../../../features/k-nn/lucene-on-faiss.md)
1 change: 1 addition & 0 deletions docs/releases/v3.0.0/index.md
@@ -117,3 +117,4 @@

- [Vector Search (k-NN)](features/k-nn/vector-search-k-nn.md)
- [Explain API Support](features/k-nn/explain-api-support.md)
- [Lucene On Faiss (Memory Optimized Search)](features/k-nn/lucene-on-faiss.md)