Extend process module #188

Currently the process module has the following functions:

| function | kind | explanation |
|------------|--------|-----------------|
| extractOne | one x many | returns the best match as (choice, score, index/key) |
| extract | one x many | returns the best matches, up to `limit`, as list[(choice, score, index/key)] |
| extract_iter | one x many | generator yielding (choice, score, index/key). Not recommended, since it is far slower than the others |
| cdist | many x many | returns all results as a NumPy matrix |

It would be nice to have equivalents of `extractOne` / `extract` for `many x many`. They would need less memory than `cdist`, which allocates a full `len(queries)` x `len(choices)` result matrix and can therefore use a large amount of memory when both are large.

| function | kind | explanation |
|------------|--------|-----------------|
| - | many x many | returns the best matches as list[(choice, score, index)] |
| - | many x many | returns the best matches, up to `limit`, as list[list[(choice, score, index)]] |
| - | one x many | returns all results without any sorting, like `cdist` |
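As a rough illustration of the first two rows, here is a minimal sketch of what such functions could look like. The names `extract_one_many` and `extract_many` are hypothetical placeholders (no names have been chosen yet), and the scorer is a stand-in built on the standard library's `difflib`, not the library's own scorer.

```python
import heapq
from difflib import SequenceMatcher


def ratio(a, b):
    """Stand-in scorer (0-100) built on difflib; not the library's scorer."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def extract_one_many(queries, choices, scorer=ratio, score_cutoff=0.0):
    """Hypothetical many x many `extractOne`: one best match (or None) per query."""
    results = []
    for query in queries:
        best = None
        for index, choice in enumerate(choices):
            score = scorer(query, choice)
            # Only the current best is kept, so no full score matrix is built.
            if score >= score_cutoff and (best is None or score > best[1]):
                best = (choice, score, index)
        results.append(best)
    return results


def extract_many(queries, choices, scorer=ratio, limit=5, score_cutoff=0.0):
    """Hypothetical many x many `extract`: up to `limit` best matches per query."""
    results = []
    for query in queries:
        scored = (
            (choice, scorer(query, choice), index)
            for index, choice in enumerate(choices)
        )
        kept = (item for item in scored if item[1] >= score_cutoff)
        # heapq.nlargest holds at most `limit` candidates at a time.
        results.append(heapq.nlargest(limit, kept, key=lambda item: item[1]))
    return results


queries = ["apple", "bananna"]
choices = ["banana", "apple", "apply"]
print(extract_one_many(queries, choices))
print(extract_many(queries, choices, limit=2))
```

The memory argument is visible in the shape of the results: with 100,000 queries and 100,000 choices, a full score matrix has 10^10 entries (about 40 GB at an assumed 4 bytes per score), while the best-match list holds only 100,000 tuples.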

A first thought might be to overload the existing `extractOne` / `extract` on the type passed as `query` / `queries`. However, this is not possible, since the following is a valid usage of these functions:
```python
extractOne(["hello", "world"], [["hello", "world"]])
```
which cannot be distinguished from a `many x many` call. For this reason these functions need a new API.
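The ambiguity can be reproduced with a small stand-in. The sketch below uses a simplified, hypothetical `extract_one` and a `difflib`-based scorer (both are assumptions for illustration, not the library's implementation) to show that a list of tokens is a perfectly reasonable single query.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Stand-in scorer (0-100) that accepts any sequence of hashable items."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def extract_one(query, choices, scorer=ratio):
    """Simplified one x many lookup returning (choice, score, index)."""
    scored = ((choice, scorer(query, choice), i) for i, choice in enumerate(choices))
    return max(scored, key=lambda item: item[1])


# One query that happens to be a list of tokens, compared against one choice
# that is also a list of tokens: a legitimate one x many call.
single = extract_one(["hello", "world"], [["hello", "world"]])

# The first argument has exactly the same type and shape as a batch of two
# string queries, so its type alone cannot select the many x many behaviour.
batch_like = ["hello", "world"]
print(single, isinstance(batch_like, list))
```

Any overload based on `isinstance(query, list)` would silently change the meaning of calls like the first one, which is why separate function names are the safer design.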

Besides this, in many cases users are not actually interested in the exact scores, but only care about finding elements with a score better than `score_cutoff`. These cases could potentially be implemented more efficiently, since the implementation could stop as soon as it is known that a score is better than `score_cutoff`. Possible cases are:

| function | kind | explanation |
|------------|--------|-----------------|
| - | many x many | returns matrix of bool |
| - | one x many | returns list of bool when there is a matching choice (e.g. https://stackoverflow.com/questions/70770842/matching-strings-within-two-lists/70780527#70780527) |

This could be automatically done when the user passes `dtype=bool`.
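A minimal sketch of the boolean variants, assuming a `difflib`-based stand-in scorer and the hypothetical names `reaches_cutoff`, `has_match` and `match_matrix`. It shows two shortcuts that become available once the exact score is not needed: stopping at the first qualifying choice, and rejecting a pair early using a cheap upper bound on the score.

```python
from difflib import SequenceMatcher


def reaches_cutoff(a, b, score_cutoff):
    """True if the stand-in similarity (0-100) of a and b is >= score_cutoff."""
    matcher = SequenceMatcher(None, a, b)
    # real_quick_ratio() and quick_ratio() are cheap upper bounds on ratio(),
    # so a pair can be rejected without computing the full score.
    if 100.0 * matcher.real_quick_ratio() < score_cutoff:
        return False
    if 100.0 * matcher.quick_ratio() < score_cutoff:
        return False
    return 100.0 * matcher.ratio() >= score_cutoff


def has_match(query, choices, score_cutoff):
    """Hypothetical one x many check: any() stops at the first qualifying choice."""
    return any(reaches_cutoff(query, choice, score_cutoff) for choice in choices)


def match_matrix(queries, choices, score_cutoff):
    """Hypothetical many x many variant returning nested lists of bool."""
    return [
        [reaches_cutoff(query, choice, score_cutoff) for choice in choices]
        for query in queries
    ]


print(has_match("apple", ["banana", "apply"], 75))
print(match_matrix(["apple", "banana"], ["apply", "banana"], 75))
```

A real implementation would presumably return a NumPy array for the matrix case, in line with `cdist` and the `dtype=bool` idea; plain lists are used here only to keep the sketch dependency-free.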

Any suggestions on the naming of these new APIs are welcome.
