Add runner for paging through composite aggregations#1526
Add runner for paging through composite aggregations#1526ywelsch merged 9 commits intoelastic:masterfrom
Conversation
|
|
||
| def parse(text: BytesIO, props: List[str], lists: List[str] = None) -> dict: | ||
| def parse(text: BytesIO, props: List[str], lists: List[str] = None, objects: List[str] = None) -> dict: | ||
| """ |
There was a problem hiding this comment.
Would you mind adding some information about the new objects parameter to the docstring for this function?
esrally/driver/runner.py
Outdated
|
|
||
| paths_to_composite = paths_to_composite_agg(body, []) | ||
| if not paths_to_composite or len(paths_to_composite) != 1: | ||
| raise Exception("Unique path to composite agg required") |
There was a problem hiding this comment.
Could you raise exceptions.DataError instead? We typically use that for cases where the provided parameters don't match expectations. This would also apply everywhere else exceptions are raised below within this class.
| return results | ||
|
|
||
| def select_aggs(obj): | ||
| if isinstance(obj, dict): |
There was a problem hiding this comment.
I think we can refactor the isinstance checks in this function (as well as in paths_to_composite_agg, resolve_composite_agg, and tree_copy_composite_agg below.) As far as I can tell--and let me know if I'm missing something--everywhere isinstance is called in these functions, the variable whose type is being inspected can only ever be a dict or None.
This is because body is what's being passed through the chain of function calls, and it can only ever be a dict. Along the way, we can also get intermediate None return values. I think that covers all cases, unless I'm misreading.
If so, this function could look something like:
def select_aggs(obj):
if obj is not None:
return obj.get("aggs") or obj.get("aggregations")
else:
return NoneThere was a problem hiding this comment.
There's two types of functions here. The first ones are paths_to_composite_agg (and the select_aggs function that is being used within), which figures out whether the given user-defined request body contains a composite agg definition. The parsing here cannot assume for the request body to be well-formed, and is rather lenient in the parsing (i.e. double-checks that we have dicts in the right place) so that it can later throw a proper error stating that it couldn't find the composite agg. If we were to just assume that some of these definitions would be dicts, we would give a cryptic Python error message accessing a non-dict in the wrong way.
For the above I prefer the current parsing code.
There are two more functions, resolve_composite_agg and tree_copy_composite_agg, where we use as input whatever was derived via paths_to_composite_agg, and hence we can assume that the right structure and types are in place. I made the code in those functions less defensive, as paths_to_composite_agg has already checked the right shape of dicts.
I hope this addresses your concerns.
esrally/driver/runner.py
Outdated
| return paths | ||
|
|
||
| def resolve_composite_agg(obj, key_path): | ||
| if isinstance(obj, dict): |
There was a problem hiding this comment.
A similar refactoring as the above two examples would work here, too.
esrally/driver/runner.py
Outdated
| raise Exception("Could not find composite agg - parser inconsistency") | ||
|
|
||
| def tree_copy_composite_agg(obj, key_path): | ||
| if isinstance(obj, dict): |
There was a problem hiding this comment.
And here, as well.
|
@ywelsch Thanks for putting so much effort into this! Very thorough. It looks good overall. I left some comments after a first pass, but I'm just about at the end of my day. I'll give it another look tomorrow and take a closer look at the tests in particular. |
|
Okay, I have no further suggestions. If you could address the comments I've left, we'll be good to go. |
|
Thanks @michaelbaamonde! |
Adds a "composite-agg" runner that allows paginating through composite aggregations. This allows benchmarking realistic use of composite aggs better (e.g. ML transforms) as users of composite aggs are typically paginating through all pages.
Relates elastic/elasticsearch#70035