[data] Fix ES scaling #643

### Context

The current architecture creates one Elasticsearch index per user and per collection.
This choice was initially made to remain functionally equivalent to Qdrant, the technology previously used.

With increased usage and load, this model is showing its limits and does not comply with best practices recommended by Elastic.

AIA creates one collection per document and never deletes any of them, so indices accumulate and saturate our Elasticsearch cluster.

---

### Security

Separating data by index **does not provide any additional security** in our architecture.

Indeed:

* Elasticsearch is not directly exposed
* All access goes through our API
* Isolation (user, collection) is enforced at the application level

In the event of an API bug, unauthorized access is possible **regardless of the number of indices**.
Security therefore relies on the API, not on the index structure.

Index-level isolation is only relevant if Elasticsearch is:

* used by several distinct services
* or directly exposed to clients

➡️ Neither is the case for us.

---

### Limitations of the Current Model

* The multiplication of indices leads to an excessive number of shards
* This degrades cluster performance and stability
* Elastic explicitly discourages this model

Moreover, physical separation by business criteria (e.g., organization) would greatly complicate:

* organizational changes
* data migrations
* overall system maintenance

---

## Planned Actions

💡 The goal is not to change AIA’s implementation, but to introduce constraints that ensure the long-term sustainability of the infrastructure.

Even if AIA eventually migrates to another system, these actions are necessary to maintain the RAG system.

### Consolidate Elasticsearch indices (4 days)

* Move from one index per user / collection to **a single global index** (or very few indices)
* Isolation ensured through fields (`user_id`, `collection_id`) and mandatory application-level filters
* Objectives:

  * drastically reduce the number of shards
  * improve stability and scalability
  * align with Elastic’s recommendations
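
The mandatory application-level filters could look like the minimal sketch below. It only builds the search body as a plain dictionary and does not call Elasticsearch. The helper name `build_scoped_query` and the `content` field are hypothetical, `user_id` and `collection_id` are assumed to be mapped as `keyword` fields, and a text `match` stands in for whatever retrieval query the RAG pipeline actually runs.

```python
from typing import Any


def build_scoped_query(user_id: str, collection_id: str, text: str) -> dict[str, Any]:
    """Build a search body that is always restricted to one user and one collection.

    Hypothetical helper: the `content` field name is an assumption, and
    `user_id` / `collection_id` are assumed to be `keyword` fields.
    """
    if not user_id or not collection_id:
        # Refuse to build an unscoped query: isolation relies on these filters.
        raise ValueError("user_id and collection_id are mandatory")
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                # `filter` clauses restrict results without affecting scoring.
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"term": {"collection_id": collection_id}},
                ],
            }
        }
    }


body = build_scoped_query("user-42", "col-7", "invoice")
```

Routing every read through a single builder like this keeps the isolation rule in one place, so a query without both filters cannot be produced by accident.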

---

### Limit document volume per user (1 day)

* Define a configurable total volume cap per user, recommended: 2 GB
* Block uploads when the threshold is reached
* Objectives:

  * prevent abuse or uncontrolled usage
  * ensure fairness between users
  * protect the infrastructure
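
The upload check could be sketched as follows. The names `check_upload_allowed` and `QuotaExceededError` are hypothetical, "2 GB" is interpreted here as 2 × 10⁹ bytes, and how the current usage of a user is measured is left open.

```python
GB = 1000 ** 3
# Recommended cap from this issue; assumed to come from configuration in practice.
DEFAULT_USER_QUOTA_BYTES = 2 * GB


class QuotaExceededError(Exception):
    """Raised when an upload would push a user over the volume cap."""


def check_upload_allowed(
    current_usage_bytes: int,
    upload_size_bytes: int,
    quota_bytes: int = DEFAULT_USER_QUOTA_BYTES,
) -> None:
    """Reject the upload if the user's total volume would exceed the cap."""
    projected = current_usage_bytes + upload_size_bytes
    if projected > quota_bytes:
        raise QuotaExceededError(
            f"upload rejected: {projected} bytes would exceed the {quota_bytes}-byte cap"
        )


# An upload that stays under the cap passes silently.
check_upload_allowed(current_usage_bytes=1 * GB, upload_size_bytes=500_000_000)
```

Running the check before indexing, rather than after, avoids writing chunks that would then have to be rolled back.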

**→ For next milestone**

---

### Implement a TTL per collection (2 days) 

* Define a configurable retention period per collection
* Apply this TTL by default to all new collections
* Allow users to opt out of the TTL for a given collection
* Automatically delete documents once they expire
* Objectives:

  * control data volume
  * reduce costs
  * align with usage patterns (temporary vs. long-lived documents)
* Possible implementation via:

  * an `expires_at` field
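
One way the `expires_at` approach could work is sketched below, assuming the field is mapped as a `date` and that a periodic job (not shown) sends the purge body to Elasticsearch's Delete By Query API. The helper names `compute_expires_at` and `build_purge_query` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def compute_expires_at(created_at: datetime, ttl_days: Optional[int]) -> Optional[str]:
    """Return the ISO 8601 expiry timestamp, or None when the collection has no TTL."""
    if ttl_days is None:
        return None
    return (created_at + timedelta(days=ttl_days)).isoformat()


def build_purge_query(now: datetime) -> dict[str, Any]:
    """Build a delete-by-query body that matches only expired documents.

    Documents without an `expires_at` value never match a range query, so
    collections that opted out of the TTL are left untouched.
    """
    return {"query": {"range": {"expires_at": {"lte": now.isoformat()}}}}


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(compute_expires_at(created, 30))    # 2025-01-31T00:00:00+00:00
print(compute_expires_at(created, None))  # None
```

Storing the expiry on each document keeps the purge a single range query on the consolidated index, without any per-collection lookup.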

**→ For next milestone**

---

### Limit the number of collections per user (1 day)

* Define a configurable cap on the number of collections per user, recommended: 100
* Block creation once the threshold is reached
* Objectives:

  * limit abuse (e.g., massive creation of empty collections)
  * prevent pathological usage patterns
  * simplify product governance
  * simplify technical evaluations

#### Risks / Drawbacks

* Requires AIA to revise their implementation

**→ For next milestone**


- **+ issue 618**
