Description
Describe the bug
When running a simple geotile_grid aggregation on a geo_shape field in an index that contains only a single document (a GeoJSON LineString), the query times out. The data node's CPU utilization jumps to a high value and stays there indefinitely until the node is manually restarted.
Related component
Search:Aggregations
To Reproduce
Note 1: I reproduced this on an Amazon OpenSearch managed cluster; I have not tried to reproduce it locally.
Note 2: I originally saw the issue on a larger cluster with r7g.4xlarge data nodes, even though the steps below use a very small single-node cluster.
- Create a new Amazon OpenSearch managed cluster with the following settings:
- Domain creation method - Standard Create
- Templates - Dev/test
- Availability Zones - 1-AZ without standby
- Engine Version - 2.19
- Number of Data Nodes - 1 r8g.large.search data node
- Create an index with the following definition:
opensearch.indices.create(
index_name,
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
},
"mappings": {
"properties": {
"geolocation": {"type": "geo_shape"}
}
}
},
)
- Index a single document with the following data:
document = {
"geolocation": {
"type": "LineString",
"coordinates": [
[120.69105000000002, -2.1092199999999366],
[120.75767000000008, -2.159189999999967],
[120.82192000000009, -2.1877399999999625],
]
}
}
opensearch.index(index_name, document)
- Run the following search query to verify that the document was indexed successfully:
import json
from opensearchpy.exceptions import RequestError
query = {
"size": 1,
"timeout": "60s",
"query": {
"match_all": {}
}
}
try:
result = opensearch.search(index=index_name, body=query)
except RequestError as e:
print(e)
raise
print(json.dumps(result, indent=2))
- Run the following query to reproduce the issue:
query = {
"size": 1,
"timeout": "60s",
"query": {
"match_all": {}
},
"aggs": {
"locations": {
"geotile_grid": {
"field": "geolocation",
"precision": 29
}
}
}
}
try:
result = opensearch.search(index=index_name, body=query)
except RequestError as e:
print(e)
raise
print(json.dumps(result, indent=2))
- Note that a TimeoutError occurs. In Amazon CloudWatch, plot the data node CPUUtilization metric: the CPU jumps to ~50% and stays there. This instance type has 2 vCPUs, so this is essentially one vCPU handling the single shard and being fully saturated.
- Monitor for a while and note that the CPU never comes back down. In addition, running POST /_tasks/_cancel does not work. The only option we have found is to restart the node.
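A rough, back-of-the-envelope illustration of why precision 29 may be expensive even for this one small LineString. The sketch assumes the standard Web Mercator ("slippy map") tiling that geotile_grid keys follow (zoom/x/y, with zoom equal to the precision) and treats each edge as straight in lon/lat. It counts the tiles in the shape's bounding box and an upper bound on the tiles the line itself crosses. The helper names are hypothetical, and this does not claim to describe how OpenSearch enumerates tiles internally.

```python
import math

# Coordinates of the LineString from the reproduction steps, as (lon, lat) pairs.
COORDS = [
    (120.69105000000002, -2.1092199999999366),
    (120.75767000000008, -2.159189999999967),
    (120.82192000000009, -2.1877399999999625),
]

# Web Mercator cannot represent the poles; clamp latitude to its usual limit.
MAX_LAT = 85.0511287798066


def tile_xy(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) tile indices of a point at the given zoom (slippy-map formula)."""
    n = 1 << zoom  # number of tiles along each axis at this zoom
    lat_rad = math.radians(max(-MAX_LAT, min(MAX_LAT, lat)))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Points exactly on the east/south edge would otherwise index one past the last tile.
    return min(x, n - 1), min(y, n - 1)


def bbox_tile_count(coords, zoom: int) -> int:
    """Number of tiles covered by the shape's bounding box at the given zoom."""
    xs, ys = zip(*(tile_xy(lon, lat, zoom) for lon, lat in coords))
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


def crossed_tiles_upper_bound(coords, zoom: int) -> int:
    """Upper bound on tiles a polyline passes through: |dx| + |dy| + 1 per segment."""
    total = 0
    for (lon0, lat0), (lon1, lat1) in zip(coords, coords[1:]):
        x0, y0 = tile_xy(lon0, lat0, zoom)
        x1, y1 = tile_xy(lon1, lat1, zoom)
        total += abs(x1 - x0) + abs(y1 - y0) + 1
    return total


if __name__ == "__main__":
    for zoom in (10, 20, 29):
        print(zoom, bbox_tile_count(COORDS, zoom), crossed_tiles_upper_bound(COORDS, zoom))
```

At precision 29 the bounding box spans roughly 195,000 tile columns by 117,000 tile rows, i.e. on the order of 10^10 tiles, while the line itself crosses at most a few hundred thousand tiles. Either figure is far larger than what a single small document would suggest.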
Expected behavior
If this type of query is supported, it should not bring down the cluster when the index holds only a single document. If it is not supported, it should be rejected with an error.
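A hypothetical client-side sketch of the kind of check described above: estimate the bounding-box tile count before sending the aggregation, then either raise an error or pick the highest precision that stays under a budget, instead of letting the request run unbounded. The 10,000-tile budget, the function names, and the bounding-box heuristic are all assumptions for illustration. None of them is an OpenSearch setting or API; the only external fact used is that geotile_grid precision ranges from 0 to 29.

```python
import math

MAX_LAT = 85.0511287798066  # Web Mercator latitude limit
MAX_PRECISION = 29  # highest precision geotile_grid accepts

# Hypothetical per-shape budget of bounding-box tiles; not an OpenSearch setting.
TILE_BUDGET = 10_000

COORDS = [
    (120.69105000000002, -2.1092199999999366),
    (120.75767000000008, -2.159189999999967),
    (120.82192000000009, -2.1877399999999625),
]


def tile_xy(lon, lat, zoom):
    """(x, y) slippy-map tile indices of a point at the given zoom."""
    n = 1 << zoom
    lat_rad = math.radians(max(-MAX_LAT, min(MAX_LAT, lat)))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return min(x, n - 1), min(y, n - 1)


def bbox_tile_count(coords, zoom):
    """Tiles covered by the shape's bounding box at the given zoom."""
    xs, ys = zip(*(tile_xy(lon, lat, zoom) for lon, lat in coords))
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


def validate_precision(coords, precision, budget=TILE_BUDGET):
    """Raise instead of sending an aggregation whose bounding box exceeds the budget."""
    if not 0 <= precision <= MAX_PRECISION:
        raise ValueError(f"precision must be in [0, {MAX_PRECISION}], got {precision}")
    count = bbox_tile_count(coords, precision)
    if count > budget:
        raise ValueError(f"precision {precision} spans {count} tiles (budget {budget})")


def max_safe_precision(coords, budget=TILE_BUDGET):
    """Largest precision whose bounding-box tile count stays within the budget."""
    best = 0
    for zoom in range(MAX_PRECISION + 1):
        # Tile counts never decrease as zoom grows, so stop at the first one over budget.
        if bbox_tile_count(coords, zoom) > budget:
            break
        best = zoom
    return best


if __name__ == "__main__":
    print(max_safe_precision(COORDS))  # 18 under the assumed 10,000-tile budget
    try:
        validate_precision(COORDS, 29)
    except ValueError as err:
        print(err)
```

A bounding-box estimate is deliberately conservative: it overcounts for a thin diagonal line, but it is cheap to compute and grows predictably (roughly 4x per precision step), which makes it easy to reason about as a guard.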
Additional Details
Plugins
Default Amazon OpenSearch configuration as mentioned above.
Host/Environment (please complete the following information):
- OS: Linux - AWS Managed cluster