Description
This document describes the implementation details of the GeoIP datasource, as part of #5856.
Tasks
Tasks are listed here to track implementation progress. One PR can cover multiple tasks if the code change is small.
Create datasource
- Create API interface
- Read default value from a cluster configuration property
- Read manifest file and validate input parameter
- Store metadata in a system index
- Schedule update GeoIP db task for new datasource
Update datasource
- Update metadata in a system index
- Schedule update GeoIP db task for existing datasource
Read datasource
- Return metadata
Delete datasource
- Return error if there is GeoIP processor using this GeoIP datasource
- Update metadata in a system index
- Schedule delete GeoIP db task
Update GeoIP database
- Check if update is required
- Download the zip file and ingest its data into an index without storing it on disk
- Delete old index
- Schedule either next update or delete task
Delete GeoIP database
- Delete GeoIP datasource index
- Delete GeoIP datasource metadata.
User scenarios
Create/Update of GeoIP data source
- The customer calls the OpenSearch cluster to create a GeoIP data source. The call takes an endpoint and an update interval as parameters. Default values are provided and can be configured through a property.
- The data about the GeoIP data source is stored in a system index named .geoip_datasource
- PUT/POST API handler for data source
  - Read the manifest file.
  - Validate the parameters.
    - The manifest file is reachable.
    - The manifest file format is correct.
    - update_interval is less than the valid_for value in the manifest file.
  - Store the data in a system index.
  - Schedule the update.
    - If the data source name exists
      - If there is an ongoing update
        - Do nothing.
      - If there is no ongoing update
        - Cancel the scheduled update task.
        - Reschedule the update task after update_interval.
    - If the data source name does not exist
      - Schedule an update task.
  - Return OK.
- Update task
  - Read the manifest file.
    - If the md5_hash value is the same as the previous one
      - Only update the metadata of the data source: expire_after, next_update_at, last_skipped_at.
    - If the md5_hash value differs from the previous one
      - Download the database and ingest it into a new system index.
      - Update the metadata of the data source: md5_hash, expire_after, updated_at, next_update_at, last_succeeded_at, last_processing_time.
      - Delete the old index.
    - Schedule the next update task.
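The update-task branching described above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: `DatasourceMeta`, `apply_update`, and the way `expire_after` and `next_update_at` are computed (current time plus `valid_for_in_days` and the update interval, in milliseconds) are assumptions made for the example.

```python
from dataclasses import dataclass, replace

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class DatasourceMeta:
    """Hypothetical in-memory view of the datasource metadata."""
    md5_hash: str
    update_interval_in_days: int
    expire_after: int = 0
    updated_at: int = 0
    next_update_at: int = 0
    last_succeeded_at: int = 0
    last_skipped_at: int = 0
    last_processing_time: int = 0


def apply_update(meta, manifest, now_ms, processing_ms=0):
    """Return (new_meta, needs_ingest) for one run of the update task."""
    # Assumption: both timestamps are derived from the current time.
    expire_after = now_ms + manifest["valid_for_in_days"] * DAY_MS
    next_update_at = now_ms + meta.update_interval_in_days * DAY_MS

    if manifest["md5_hash"] == meta.md5_hash:
        # Same database: only refresh expiry, next update, and the skip marker.
        skipped = replace(meta, expire_after=expire_after,
                          next_update_at=next_update_at,
                          last_skipped_at=now_ms)
        return skipped, False

    # New database: the caller ingests it into a new index, then deletes the old one.
    updated = replace(meta, md5_hash=manifest["md5_hash"],
                      expire_after=expire_after,
                      updated_at=manifest["updated_at"],
                      next_update_at=next_update_at,
                      last_succeeded_at=now_ms,
                      last_processing_time=processing_ms)
    return updated, True
```

Returning the `needs_ingest` flag keeps the download and index swap outside the metadata logic, so the decision itself can be tested without network access.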
Datasource API signature
PUT /_geoip/datasource/my-datasource
{
"endpoint": "https://geoip.opensearch.org/v1/geolite2-city/manifest.json",
"update_interval_in_days": 20
}
GET /_geoip/datasource/my-datasource
{
"endpoint": "https://geoip.opensearch.org/v1/manifest/geolite2-city",
"update_interval_in_days": 20,
"state": "AVAILABLE",
"expire_after": 12343434,
"next_update": 12341244,
"database": {
"provider": "maxmind",
"md5_hash": "63d0cea9d550e495fde1b81310951bd7",
"updated_at": 123123213,
"valid_for_in_days" : 30,
"fields": ["latitude", "longitude", "country", "city"]
},
"indices": [
".geoip_datasource.my-datasource.123123213",
".geoip_datasource.my-datasource.123123212"
],
"update_stats": {
"last_succeeded_at": 123123,
"last_processing_time_in_millis": 912999,
"last_failed_at": 123123213123,
"last_skipped_at": 123123213
}
}
GeoIP database in an index
Index
/.geoip_datasource.my-datasource.1
{
"_cidr" : "2a12:49c5:4380::/41",
"_data" : {
"country_name" : "Georgia",
"continent_name" : "Asia",
...
}
}
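To illustrate how documents of this shape could serve an IP lookup, here is a small client-side sketch using Python's standard `ipaddress` module. The function name `find_geo_data` and the "most specific network wins" rule are assumptions for illustration. The issue does not specify how the processor actually queries the index.

```python
import ipaddress


def find_geo_data(ip, documents):
    """Return `_data` of the most specific document whose `_cidr` contains ip."""
    address = ipaddress.ip_address(ip)
    best = None  # (prefix length, _data) of the best match so far
    for doc in documents:
        network = ipaddress.ip_network(doc["_cidr"])
        if address.version != network.version:
            continue  # never compare IPv4 addresses with IPv6 networks
        if address in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, doc["_data"])
    return best[1] if best else None


docs = [{"_cidr": "2a12:49c5:4380::/41", "_data": {"country_name": "Georgia"}}]
match = find_geo_data("2a12:49c5:4380::1", docs)
```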
Manifest.json
{
"url": "https://d17zozg08cgjfy.cloudfront.net/GeoLite2-ASN-CSV_20221206.zip",
"db_name": "GeoLite2-ASN.csv",
"md5_hash": "safasdfaskkkesadfasdf",
"valid_for_in_days": 30,
"updated_at": 3134012341236,
"provider": "maxmind"
}
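The format and interval checks from the validation step can be sketched against this manifest shape. This is a hedged example: `REQUIRED_FIELDS` is inferred from the sample manifest above, `validate_manifest` is a hypothetical helper, and the reachability check is left out because it needs network access.

```python
# Field names and types inferred from the sample manifest (an assumption).
REQUIRED_FIELDS = {
    "url": str,
    "db_name": str,
    "md5_hash": str,
    "valid_for_in_days": int,
    "updated_at": int,
    "provider": str,
}


def _has_type(value, expected):
    # bool is a subclass of int in Python, so reject it for int fields.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


def validate_manifest(manifest, update_interval_in_days):
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in manifest:
            errors.append(f"missing field: {name}")
        elif not _has_type(manifest[name], expected):
            errors.append(f"wrong type for field: {name}")
    valid_for = manifest.get("valid_for_in_days")
    # The update interval must be shorter than the validity period.
    if _has_type(valid_for, int) and update_interval_in_days >= valid_for:
        errors.append("update_interval_in_days must be less than valid_for_in_days")
    return errors
```

With the sample manifest (`valid_for_in_days` of 30), an update interval of 20 days passes, while an interval of 30 days or more is rejected.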
Deletion of GeoIP data source
- The customer calls the OpenSearch cluster to delete a GeoIP data source.
- The cluster checks whether any GeoIP processor is using the GeoIP data source.
  - If there is one, return an error.
  - If there is none
    - Mark the datasource as deleted.
    - If there is an ongoing update
      - Let the update task trigger the delete task when it finishes.
    - If there is no ongoing update
      - Cancel the scheduled update task.
      - Schedule the delete task immediately.
- Delete the GeoIP data index.
- Delete the GeoIP data source metadata.
DELETE /_geoip/datasource/my-datasource
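The deletion decisions listed above can be summarized as a small planning function. This is a sketch only: `plan_delete` and the action names it returns are hypothetical labels for the steps in the list, not identifiers from the plugin.

```python
def plan_delete(in_use_by_processor, update_in_progress):
    """Return the ordered actions the delete handler would take."""
    if in_use_by_processor:
        # A GeoIP processor still references the datasource.
        return ["return_error"]
    actions = ["mark_deleted"]
    if update_in_progress:
        # The running update task triggers the delete task when it ends.
        actions.append("defer_delete_to_running_update")
    else:
        actions.extend(["cancel_scheduled_update", "schedule_delete_now"])
    return actions
```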
Cluster manager node failure
All work related to GeoIP datasources is executed on the cluster manager node, which keeps scheduled tasks in memory. When the cluster manager node fails, the role fails over to one of the cluster-manager-eligible nodes. The new cluster manager node scans all existing GeoIP datasources and schedules their tasks again accordingly. It uses the "next_update" field of each GeoIP datasource to set the correct time for updating the GeoIP databases.
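A minimal sketch of that recovery step, assuming each stored datasource record exposes a `name` key (hypothetical) alongside `next_update`, and that `next_update` and the current time use the same unit. Updates that came due during the failover are scheduled with zero delay.

```python
def plan_reschedule(datasources, now):
    """Return {datasource name: delay until its next update}."""
    delays = {}
    for ds in datasources:
        # Overdue updates run immediately instead of getting a negative delay.
        delays[ds["name"]] = max(0, ds["next_update"] - now)
    return delays
```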