[Proposal] Rule Matching

**Author: Kaushal Kumar**
Pre-read for this issue: https://github.com/opensearch-project/OpenSearch/issues/16797

### Please describe the end goal of this project

This document covers the Rule Matching for incoming search requests in depth. After reading this document, reader will have sound understanding of the following

1. **How the request attributes are extracted ?**
2. **How exactly does the matching algorithm works on the extracted attributes ?**
3. **What are the new constructs and how they work with each other ?**

### Assumptions

* Rule attributes (cardinality) will be limited in nature at feature level and aggregated count.
    * When creating Rules the control will first go to the feature to do the
        * Cardinality validation
        * Attributes validation
* Each feature will define their own attributes which the feature will be responsible for validating.



### New Rule insertion process
Rule schema
```
{
    
   "attribute1": ["value*"],
   "attribute2": ["value*"],
   "label": "fjagjag9243421_425285",
   "updatedAt": "12-03-2024T18:00:23Z",
   "feature": "WLM"
}
```

New rule insertion works pretty much how insertion in [compressed trie](https://en.wikipedia.org/wiki/Radix_tree) works. But given each attribute can take N(this will be limited e,g; 5 at max) number there is a chance that two or more rules could share one of the values in their respective attribute values list.
In this case whichever rule was created or modified last will override the label for that attribute in the trie. 

### Attribute extraction process

Given that the Rule attributes are specific to a feature, feature will provide a construct to extract the attributes. For example WLM feature would use threadContext to retrieve the authN info and request itself to retrieve the indices.

### Rule Matching Process

In a nutshell the rule matching will be per attribute trie (prefix tree) based matching. Hence ultimately we will be dealing with a list of values per attribute. How this per attribute based list is calculated, we will understand that with the help of some examples 
![RulesMatching](https://github.com/user-attachments/assets/d86b30a6-ae51-4c76-bd15-21baf7011ff7)



lets say above picture captures a subpart of the in-memory rules Trie. Now following scenarios can occur while calculating possible labels for an attribute value. 


#### Scenario - 1

When the search for the attribute value ends on a node which has a label 

`search_value: log_q3_2024`

In this case the resulting label would be abc.


#### Scenario - 2

When the search ends on a node which doesn’t have a label then there are following sub scenarios

* There is an empty prefix match for this value i,e the search ends at the root node which holds the empty value ( e,g; `order_q3_2024`)
    * In this case we assign nothing and downstream systems can treat this as default behavior
* There is a partial match, then it should return top 10 results from the subtree. The order is decided by depth of the node.
    * for example for search string `logs_q2_2024` , in this case it would return `[abc, pqr]`


Now the above run down shows how the labels are decided per attribute value but given the request carries multiple values and rule also contains multiple attributes, how to calculate the final value. Now we will select the one which is occurring in all attribute based return label lists .

Lets say  for a feature based label evaluation we are considering two attributes and corresponding labels as per the above algorithm are [abc, xyz] and [abc,pqr] now the resulting label here will be abc for this request.

Even with multiple lists following scenarios can occur

1. All lists have nothing in common ==>>  In this case the request will be treated with a default label
2. All lists have only one value in common ==>> This will be the resultant label for the request
3. All lists have multiple values in common ==>> In this case to decide the resulting label. we will consider the depth as the tie breaker i,e; label with the min depth will be selected.




### Low Level Design

Following UML and interaction diagrams shows the new constructs, their relationships and how they interact with each other.
![RuleMatchingUML](https://github.com/user-attachments/assets/9368b49f-2358-4aad-ad83-a7486290ec08)
![RulesSequenceDiagram](https://github.com/user-attachments/assets/84578430-969f-4f3d-a3b2-e7f2ada6b35e)
**InMemoryRuleEngine** will save the label in **ThreadContext** which downstream transport actions can consume.

### Supporting References

* https://github.com/opensearch-project/OpenSearch/issues/16813
* https://github.com/opensearch-project/OpenSearch/issues/16797

### Issues

https://github.com/opensearch-project/OpenSearch/issues/16797

### Related component

Search

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Proposal] Rule Matching #16888

Please describe the end goal of this project

Assumptions

New Rule insertion process

Attribute extraction process

Rule Matching Process

Scenario - 1

Scenario - 2

Low Level Design

Supporting References

Issues

Related component

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Proposal] Rule Matching #16888

Description

Please describe the end goal of this project

Assumptions

New Rule insertion process

Attribute extraction process

Rule Matching Process

Scenario - 1

Scenario - 2

Low Level Design

Supporting References

Issues

Related component

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions