Skip to content

[Proposal]: Low Level Implementation Details for Geo Shape Aggregation and future Geo related Features #92

@navneet1v

Description

@navneet1v

Introduction

This proposal talks about the low level implementation details for the Geo Shape aggregations. We will be covering what are the components/classes/interfaces required for the Geo Shape aggregations and what should be the right place to put those components.

In whole proposal we will be using components, classes, interfaces terms interchangeably. As we will be focusing on the low level implementation details it is expected that reader is already aware that how indexing and query works on GeoShape in OpenSearch, what is a doc value and how it is used in the aggregation framework.

Goal

Goal of proposal is to answer the following questions.

Note: Module here refer to the different libs, modules folders provided in the OpenSearch core repo.

P0

  1. Which module we should use to write the Geo Shape Aggregation level interfaces? or should we directly go with the Geo-Spatial repo?
  2. Which module we should use to write the Value Source boiler plate code for converting Doc Values back to specific classes for GeoShape, as they have some dependency between classes already present like CoreValueSourceType etc?

P1

  1. What should be the long term vision of Geo team around what is the right place to put the code related to Geo space? The code involves Geo FieldMapper(Geo Point and Geo Shape) used for indexing and querying, Geo Aggregations(both older and newer geo aggregations both on Point and Shape) and Value Source boiler plate code for converting Doc Values back to specific classes.
  2. How we should do the migration of the code mentioned in #1 to the module we decide to move?

Assumptions

  1. The XYPoint and XY shape related features will be added in the Geo-Spatial repo and it will be keep on happening.

Why we need to answer the questions above?

The answer to these questions will help us do the following:

  1. As there will be a lot of common code between the aggregations and we will be abstracting that logic out, hence we need to make sure that we are abstracting them in the right place so that this is not a throw away work.
  2. With the release of XY Point and XY shape in the Geo-Spatial plugin, we need to deicide in long term where we want to put all our Geo related features(in a backward compatible fashion), how we want to provide them to customers.
  3. It will help us provide better estimate on the timelines, efforts for the launch of Geo Shape aggregations and the other features in the future like(GeoLineAggregation, GeoHexAggregations, Aggregations on XY Point and XY Shape).

Current Architecture

As of version 2.1 of OpenSearch the Geo related aggregations(which contains only GeoPoint aggregations) are present in the OpenSearch core repo. There are no aggregations present on the Geo Shape. The Field mapper which init the GeoShape mapper is initialized via the Geo module present in the OpenSearch repo.

We have created a new repo named Geo-Spatial which host GeoJson related features. The XY Point and XY shape indexing and querying functionalities are getting developed and will be released as part of 2.3 release(tentative). The intent of that repo was to provide and encapsulate all the geo-related features via that

New Aggregations which we are building

In the current roadmap we have 4 different aggregations that we will be implementing on the Geo Shape. Below are those aggregations.

  1. GeoBounds
  2. GeoCentroid
  3. GeoTile
  4. GeoHashGrid

To provide more context these aggregations are already implemented for the GeoPoint field. We are also working on planning on implementing the Geo Line Aggregation, GeoHex Gird(customer has already reached out for this) which are already supported in Elastic Search.

Proposed Solution

The solution is to move the Aggregations, Indexing and Query for both Geo Point and GeoShape to the Geo-Spatial plugin provided via Geo-Spatial Repo. As this is breaking changes, hence we are proposing a phased approach, which make sure that we are incrementally progressing to the final goal.

Phase 1

Step1: One by one move aggregation(4 different classes for aggregations + Tests) to the Geo modules folder in the OpenSearch core. The aggregation that we will move will be the one which we are developing over the GeoShape.

Why we need this? Or what is the Pros about it?
The code between the aggregations for GeoPoint and GeoShape is same and we can abstract common logic between GeoPoint and GeoShape rather than duplicating them.

Will there be customer Impact?
No, there will be no customer impact, as these classes are directly build with the OpenSearch core min distribution.

What will happen if we don’t do this?
We either need to copy the same code/logic from GeoPoint Aggregations to GeoShape aggregations or we will be creating more interfaces in the server folder of OpenSearch which will needs to be migrated later on.

Step2: Abstract out the common logic for the GeoShape and GeoPoint aggregation. Allow both the aggregation to depend on this logic.

Why we need this? Or what is the Pros about it?
This will abstract out the common logic and we can reuse the logic across aggregations

Will there be customer Impact?
No, as the aggregations will still be built in min distribution of OpenSearch.

Repeat the above 2 steps for each Aggregation that we are launching for GeoShape. This will make sure that we are moving towards right abstraction.

Step3: Move the FieldMappers, Queries and ValueSource related code(Both GeoShape and GeoPoint), Integration Tests etc. to the Geo module once all the Aggregations have moved to the Geo Module.

Why we need this? Or what is the Pros about it?
This will abstract out the common logic and we can reuse the logic across aggregations

Will there be customer Impact?
No, as the aggregations will still be built in min distribution of OpenSearch.

Phase 2

Step1: Build the proposal to move the Geo Module to Geo-Spatial Repo as a separate plugin or under a GeoSpatial Plugin. Analyze the impact and community feedback around what is the best way to provide the backward compatibly and seamless migration.

Why we need this? or what we want to achieve on this?
This will provide us the insights around what community wants and what extra support will be required from our side for the seamless upgrade.

Step2: Move the actual code from the Geo Module to Geo-Spatial Repo and release it via OpenSearch major version(exact version depends on various factors) release cycle as a breaking change.
On a high level this will be a simple movement of code from Geo Module to Geo-Spatial plugin. The complexity will arise around how customer will migrate. Given that this will be done in major release of OpenSearch(given that it is breaking change), we will use the feedbacks provided in step-1 to give customer all the tools for doing backward compatibility. Example:

  1. We can release a distribution of open search with Geo-spatial plugin already installed.
  2. We can build on top of the Extensions framework which is currently being worked upon OpenSearch.Issue

Below are the pros and cons of the above solution:

Pros:

  1. No change for customers till we go the phase-2 of the solution.
  2. No confusion in customer around from which version they need the GeoSpatial plugin for the geo based aggregations and when they will get the aggregations out of the box from OpenSearch min distribution.
  3. Code remains in 1 single repo always which provides easier debugging, development experience and release experience.

Cons:

  1. Till we have moved the aggregations to the Geo-Spatial repo we need to depend on the OpenSearch Core infrastructure, PR reviewers etc, which can increase the development time, as it is a fast moving repo. Seeing that we have many new aggregations to build this can have large overall impact on development timeline.
  2. Difficulties in back-port as the main and 2.x version of repo has breaking changes as compared to Geo-Spatial repo which has less or no breaking changes.

Alternatives

Alternative 1

Provide Aggregations on GeoShape as a part of Open search min distribution by directly implementing the GeoShape aggregations in the OpenSearch core itself.(Don’t move aggregations to even geo module in the core repo).
Pros:

  1. Code remains in 1 single repo always which provides easier debugging and development experience.
  2. No upgrade required for customers till phase-1.
  3. Customers will start to get out of box aggregations support on geo-shape. Plus they will have proper upgrade paths and SOP when we move the code to plugin.

Cons:

  1. Difficulties in back-port as the main and 2.x version of repo has breaking changes as compared to Geo-Spatial repo which has less or no breaking changes.
  2. We need to depend on the OpenSearch Core infrastructure, PR reviewers etc, which can increase the development time, as it is a fast moving repo. Seeing that we have many new aggregations to build this can have larger impact on development timeline.
  3. There can be potential changes required when we start implementing the aggregations over XY shape and Point.

The cons provided here can be removed working with the OpenSearch team to discuss on the dedicated reviewers and bandwidth for PRs. Also, as we move towards moving the code to geo-module we will have zero to no conflicts as apart from geo-spatial team no one will be working on that in best case. On con #3 as of now we don’t have any plan and customer use-case. Even if we do have that we can still use the common interfaces that we have created in the Geo module.

Overall the solution do have some cons which can create some delays and delayed launches but this is the most customer obsessed way.

Alternative 2

Create the new aggregations for GeoShape directly in the Geo Spatial repo, do all the refactoring required in core OpenSearch repo to avoid code duplication. As a backlog task move the Aggregations(Geo Point) to new repo and keep adding new features on Geo to the Geo-Spatial repo only.

Pros:

  1. Easy Upgrade: As fewer customers will be there who won’t have installed the Geo-Spatial plugin when we move all our aggregations to Geo-Spatial plugin.
  2. Easy to back-port and release the patch versions.
  3. Faster release cycles, reduction in PR review time, merges and lesser merge conflicts etc.

Cons:

  1. Customer Confusion: It can cause confusion to the customer to understand as the aggregations are divided in the plugins and core repo.
  2. We can end up modifying the code in core repo more often than not to support the new features and interfaces that we want to build in Geo-Spatial repo. Even small changes can end up touching more than 1 repo, which will create problems of backward compatibility, testing, developer frustration.
  3. As of now, we have no input from community whether we should move the aggregations to Geo-Spatial repo or keep them in the OpenSearch core.

Alternative 3

Create the new aggregations for GeoShape directly in the Geo Spatial repo, do all the refactoring required in core OpenSearch repo to avoid code duplication. Keep adding new features on the Geo Spatial repo only.
Pros:

  1. No upgrade required and no confusion as customer who wants to use new aggregations just need to install the right plugin.
  2. Easy to back-port and release for the patch versions.
  3. Faster release cycles, reduction in PR review time, merges and lesser merge conflicts etc.

Cons:

  1. We can end up modifying the code in core repo more often to support the new features and interfaces that we want to build in Geo-Spatial repo. Even small changes can end up touching more than 1 repo, which will increase the developer frustration for release.

Feedback Required

The above sections provide Phased Approach as the proposed solution, but we really want to focus on the Alternative-2 as well. It would be great to know community thinks about the Alternative-2 and should we do that instead of phased approach which is long and time consuming.

Appendix

Frequently Asked Question

Q: If after step 1 of phase 2 we found out that community doesn’t want yet another plugin for Geo related indexing and querying then will phase 1 effort be in vain?
No it won’t be in vain as we would have developed the interfaces that can be used in the XY aggregations, we would have carved out the geo related code to module which can evolve of its own, even when it is with the Core repo.

Useful Links:

  1. https://github.com/opensearch-project/OpenSearch/
  2. https://github.com/opensearch-project/geospatial/
  3. [RFC] - Implement Aggregations on Geo Shape Field #84

Metadata

Metadata

Assignees

Labels

DesignDesign ProposalenhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions