Skip to content

(WIP) Upgrade elasticsearch php dependency to 8.8#2168

Closed
coreation wants to merge 7 commits intoruflin:8.xfrom
coreation:feature/upgrade-elasticsearch-php-dependency
Closed

(WIP) Upgrade elasticsearch php dependency to 8.8#2168
coreation wants to merge 7 commits intoruflin:8.xfrom
coreation:feature/upgrade-elasticsearch-php-dependency

Conversation

@coreation
Copy link
Copy Markdown
Contributor

@coreation coreation commented May 31, 2023

Tasklist

Analysis tasks

  • Create a list of all files in Elastica that are affected by the elasticsearch-php breaking changes caused by the major upgrade 7.x to 8.x. Add the affected files to the Refactor tasks list

Refactor tasks

Docker & Tests

  • Update the docker container to use ES8, set the password to changeme
  • Update the Base.php test so that the ES config contains username: elastic, password: changeme and contains the path to the CA cert bundle of the Elasticsearch instance in the docker container

Generics

  • Task 1. Create a generic transformation function that transforms the elasticsearch-php response to an Elastica Response object. This allows for a smooth upgrade of Elastica 7.x to 8.x for clients using the Elastica library, and not forcing them to use a new type of Response object.

Specifics

  • Refactor the src/Index.php, using the new Endpoints classes (cfr. Analysis)
  • Refactor the src/Index.php, return a Response object based on the response coming from elasticsearch-php - Index::create() using the outcome of Generics - Task 1.
  • Refactor the requestEndPoint function in src/Index.php with the new elasticsearch-php Endpoint classes. This will require the introduction of at least a function that returns an elasticsearch-php Client, using ClientBuilder::create()

RFC

  • Replace the src/Client.php usage with the elasticsearch-php Client class. RFC: @ruflin - seems like the elasticsearch-php Client pretty much does everything the src/Client.php does. I'm not sure though how vital this was for other users to be using this class naked sort of speak. If the answer is: a lot. Then I suggest we rework the src/Client so that it uses the elasticsearch-php Client under the hood. Makes for an easier phasing out later on.

coreation added 2 commits May 31, 2023 14:18
…Removed usages of the latter and removed unused/deprecated constants in Reindex.php.
@coreation coreation changed the title Upgrade elasticsearch php dependency to 8.8 (WIP) Upgrade elasticsearch php dependency to 8.8 May 31, 2023
@coreation
Copy link
Copy Markdown
Contributor Author

@ruflin @thePanz since you are the 2 people I've interacted with so far, I'll just post my questions here. From my point of view, this is the only elasticsearch-php upgrade PR. I've seen some back and forth with regards to this upgrade on the discussion pages, but perhaps it's best to bundle everything here from now.

I wanted to get started, but immediately hit a first barrier. The operations with regards to indices have been rewritten. Instead of having a specific Create() class, everything regarding indices is now added to Indices class, where the create is refactored into a function, which then executes the operation.

As far as I see in the Elastica codebase, the aim is/was to create a Create() instance, which was an AbstractEndpoint class, and then calling the methods on that class to execute the creation. However, this is now all abstracted away inside the new Indices::create() function.

This makes me think that the requestEndpoint might need to be removed, and a whole bunch of code in Index.php reworked to use the high level Indices::XYZ() methods. This would require us to convert the Elastica client configuration to a elasticsearch-php ClientInterface object.

These kind of decisions I'd much rather talk through, then just go ahead and spent time on because I'm by far no expert in the approach this library takes, but happy to learn.

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Jun 6, 2023

@coreation Have not forgotten you on this one. I have not found yet the time to dig into the elasticsearch-php code. Ideally, I would like to keep the basic Elastica construct and change things under the hood. Removing requestEndpoint would be quite fundamental. I think that is where recently the discussion started, to potentially remove dependency on elasticsearch-php which I'm not a big fan of.

Are the changes on elasticsearch-php only for indices or all over the place? I'm asking because requestEndpoint exists in several places so I try to get my head around how big the change would be.

@coreation What if we don't convert the Elastic client to ClientInterface? Would it mean we require additional boiler plate around the client? I know you mentioned not to start coding at the same time, I feel like having a small sample PR on what the change would mean for one class (no tests passing) could help the conversation.

@coreation
Copy link
Copy Markdown
Contributor Author

@ruflin I'll see what I can do in terms of an example. Going by the unittest output, most of the test didn't make it much further than initiating the index, so it might be that fixing that part of the codebase might actually by 90% of the work, but that's hopeful thinking.

I like the idea of making an example and discuss things further based on that, I'll see what I can do this week.

@oleg-andreyev
Copy link
Copy Markdown
Contributor

@coreation @ruflin if you need any help with this - just tell.

coreation added 2 commits June 9, 2023 22:57
…sticsearch-php-dependency' into feature/upgrade-elasticsearch-php-dependency
@coreation
Copy link
Copy Markdown
Contributor Author

coreation commented Jun 9, 2023

@oleg-andreyev thanks for the heads up, I think we're going to be needing some extra hands here. @ruflin @oleg-andreyev if you look at the elastic-search 8 branch endpoints compared with elastic-search 7 branch endpoints, it shows that the Endpoints have undergone a serious transformation. There could be other parts of the library that have been addressed the same way, I haven't looked into that just yet.

The transformation goes from being Endpoints being a namespace with different classes representing different operations in the 7.x branch to Endpoints becoming dedicated classes with the ElasticSearch operations embedded in those classes as functions.

Example: \ElasticSearch\Endpoints\Indices\Create is now \ElasticSearch\Endpoints\Indices::create

With that, they also bring an ElasticSearch class to the table, which is an HTTP response wrapper around ElasticSearch Psr7 HTTP responses.

What you'll see in my very short refactor example in the index.php file, is that we'll likely have to refactor these kind of classes (i.e. Index) on 2 parts: business logic within the methods and transforming the ElasticSearch response into an Elastica Response. I think the latter makes sense in order to make migrating to the new Elastica version doable.

@ruflin is that enough of an example? Note that I'm not looking at good code practises, just a rough example of how things need to change.

@oleg-andreyev
Copy link
Copy Markdown
Contributor

oleg-andreyev commented Jun 13, 2023

@coreation what I see is that all endpoints are autogenerated by some script that I cannot find in the repository (probably private), depending on how we wanna manage dependency with https://github.com/elastic/elasticsearch-php, relaxed or strict, if strict that means we follow their semantic (and php version) and we don't need to think about any workaround/wrappers and etc. if we want to allow to relax versions (supporting both 7.x and 8.x), then we need to create some wrapper for each endpoint and based on Client::VERSION constant decide what to do.

@ruflin what's your thoughts on this?

To keep it simple, I'd follow elasticsearch-php as is, but maybe I'm missing something.

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Jun 27, 2023

@coreation I like the direction you are taking here. It means in many cases, users can upgrade to Elastica 8.0 without having to worry about the underlying changes (in most cases).

On the support of 7 and 8: I would keep it simple for now and only support 8. If there are many requests (and contributions) we can always introduce this later. +1 on following elasticsearch-php.

Sorry for the late reply again on my end.

@coreation
Copy link
Copy Markdown
Contributor Author

Hey @ruflin , just to summarize to make sure I understand you well:

  • You like the approach of refactoring so that to clients using the Elastica library receive the same type of response, meaning the Elastica library needs to rework a lot of basic things, mostly because of the elasticsearch-php library change
  • "support of only 8", you're talking about PHP8.x, right? :)

So moving this forward, does that mean the approach based on the example I gave would be approved? If so, then we can started and hopefully get some people involved to code that up. \cc @oleg-andreyev ;) If someone is working on it, given the big amount of work ahead of us, it would be good to give each other a heads-up, so that we don't have people working on the same thing. @ruflin since you build most of this library, any recommendations for next steps / organising this?

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Jun 27, 2023

Glad you asked back @coreation, we need to dive a bit deeper here. Lets first get PHP / Elasticsearch version out of the way. I was referring to the following:

if we want to allow to relax versions (supporting both 7.x and 8.x), then we need to create some wrapper for each endpoint and based on Client::VERSION constant decide what to do.

My assumption is @oleg-andreyev talks here about the Elasticsearch version, not PHP? At least on my end, I would focus on supporting Elasticsearch 8.x which I think is also what elasticsearch-php does. With it, we indirectly likely align also on the PHP version.

Elastica library receive the same type of response,

The same type of response we have today, not the same type of response that the elasticsearch-php library has I assume? This is indicated by line https://github.com/ruflin/Elastica/pull/2168/files#diff-858b65fbfb449bff73db68734d544fb5a4849e4bc8b986fad5896c202813641bR517 Do you forsee we can use some generic method for this or you expect it a lot of manual work?

The reason I like the above approach because one of the powers of Elastica is that you can chain calls together to build queries etc. If we move over to the elasticsearch-php response format, I think we partially loose this capability. At the same time I'm concerned it might create lots of additional code paths. It brings back the question: Would it be easier to just rip out elasticsearch-php again ...

@coreation For the split of the work, I hope we can find ways to do it in many small chunks. Have lots of small PR's that can go in in parallel. As not too many people work on the project, I'm not too worried about multiple people working at the same. What is not clear to me is if we can easily split up the work in many small PR's as the dependency update of the lib is global :-(

@coreation
Copy link
Copy Markdown
Contributor Author

Hey @ruflin

Glad you asked back @coreation, we need to dive a bit deeper here. Lets first get PHP / Elasticsearch version out of the way. I was referring to the following:

if we want to allow to relax versions (supporting both 7.x and 8.x), then we need to create some wrapper for each endpoint and based on Client::VERSION constant decide what to do.

My assumption is @oleg-andreyev talks here about the Elasticsearch version, not PHP? At least on my end, I would focus on supporting Elasticsearch 8.x which I think is also what elasticsearch-php does. With it, we indirectly likely align also on the PHP version.

Clear!

Elastica library receive the same type of response,
The same type of response we have today, not the same type of response that the elasticsearch-php library has I assume? This is indicated by line https://github.com/ruflin/Elastica/pull/2168/files#diff-858b65fbfb449bff73db68734d544fb5a4849e4bc8b986fad5896c202813641bR517 Do you forsee we can use some generic method for this or you expect it a lot of manual work?

Indeed, so that users of the library don't need an upgrade path to adhere to some new type of Response object. I assume the transformation function is going to be a very generic one.

The reason I like the above approach because one of the powers of Elastica is that you can chain calls together to build queries etc. If we move over to the elasticsearch-php response format, I think we partially loose this capability. At the same time I'm concerned it might create lots of additional code paths. It brings back the question: Would it be easier to just rip out elasticsearch-php again ...

This is very fundamental and I don't have that much insight into the earlier work that was done. If there's no massive amount of work needed for every elasticsearch-php update, then I would keep it. I think this is an odd one out, since they reworked basic functionality from a more "class per ElasticSearch operation" to "class per bundle of ElasticSearch operations". Which is now affecting the Elastica library.

@coreation For the split of the work, I hope we can find ways to do it in many small chunks. Have lots of small PR's that can go in in parallel. As not too many people work on the project, I'm not too worried about multiple people working at the same. What is not clear to me is if we can easily split up the work in many small PR's as the dependency update of the lib is global :-(

I think splitting it up would require first some "basic" things to be built in place. One of them is that we make sure return types remain the same. The example I gave with the transformation of response objects is - I hope - one of the few we'll need.

Once we got that in place, it's just working in the new way of elasticsearch-php 's classes like Index::create() instead of new Create().

To split things up, I think someone has to go through some classes that are affected, i.e. Index, and just make a kind of billboard ticket listing all the affected classes? That way people can contribute in small chunks in an organised manner?

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Jun 28, 2023

I went a bit more through the code base and I think ripping out elasticsearch-php would actually be more work then doing the transition. My hope is that most of the transition will look very similar. Also I still think Elastica benefits to rely on elasticsearch-php to get all the underlying improvements that are made to it.

To split things up, I think someone has to go through some classes that are affected, i.e. Index, and just make a kind of billboard ticket listing all the affected classes? That way people can contribute in small chunks in an organised manner?

++. We can use the new tasks features in Github issues for this. The part I'm trying to get my head around: How can we do it in iterations? As soon as we upgrade the dependency everything breaks and all has to be done at once? Can we have somehow the two dependencies coexist for a while? Ideas on how we should deal with this?

@coreation
Copy link
Copy Markdown
Contributor Author

@ruflin I agree, I would for sure keep the elasticsearch-php as a base and just do what this library is good at, providing a neat handy wrapper around it.

As far as tasks go, I think the iterations will likely be aspect based. Meaning, getting the Index class "green", meaning, all errors coming from not being able to create an endpoint (cfr. unittests) must no longer appear and are fixed because of the refactor work done using the new elasticsearch-php library for example.

In this case, I don't think it's doable of keeping the two dependencies in there, I think we're better off just with a clean refactor, putting a new major release version on it. I haven't done any analysis, but there's some gut feeling telling me that the majority of the work will be refactoring the deprecated classes + adding a response transformation function will be 80% of the work. But that's a very uneducated guess, more of a 🤞

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Jun 29, 2023

@coreation One suggestion to move this forward. Lets start having a Tasklist in the description of this PR and start using this PR for all the work (kind of a feature branch). All the contributions would go with a PR against this branch. As soon as we get to a green state, we can get it into 8.x and start more iterative from there.

@coreation
Copy link
Copy Markdown
Contributor Author

@ruflin good idea, I've updated my PR comment, feel free to update where applicable. Might be a good time to get @oleg-andreyev back into the conversation, time to get our hands dirty ;)

@coreation
Copy link
Copy Markdown
Contributor Author

@ruflin I've updated the original post and I think apart from the approach, I think we can have a first effective discussion to move the actual code forward. My suggestion in the RFC section is that we either remove the src/Client.php or make it compatible with elasticsearch-php since the new Endpoint classes like "Indices" need an elasticsearch-php Client object.

My aim for my this commit is to simply have the "testCreateNewIndex" test working. That involves:

  • Creating a valid Client object that "Indices" accepts
  • Actually calling the "Indices::create()"

What I've found is that:

  • ElasticSearch 8 by default has username/password enabled, I've updated the Base.php test and added some boilerplating in the phpunit.dist.xml. This can be disabled, but requires some set-up which can be found here. This will probably have to change in the CI.
    • This also means we have to provide a way to pass the path to the CA bundle in ClientConfiguration
  • The elasticsearch-php ClientBuilder supports a plethora of options, a lot of which I can't really backtrack to what is available in the ClientConfiguration class. Could you help out on that one @ruflin ?
  • I've provided a first go at getting an elasticsearch-php Client object from the src/Client.php class. That does the trick, in other words, a basic connection can be established and the Index::create() creates the new index! Our Hello World! has seen first light.

I think our first focus should be on deciding what to do with the src/Client and how we can get that to work with the Endpoint classes of the updated elasticsearch-php classes. Once we have that, we can build up from there!

Please excuse the messy try-out code, this is really to just get something going as I discover the hoops to jump through :D.

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Jun 30, 2023

Replace the src/Client.php usage with the elasticsearch-php Client class. RFC: @ruflin - seems like the elasticsearch-php Client pretty much does everything the src/Client.php does. I'm not sure though how vital this was for other users to be using this class naked sort of speak. If the answer is: a lot. Then I suggest we rework the src/Client so that it uses the elasticsearch-php Client under the hood. Makes for an easier phasing out later on.

I don't have data on this but I would assume, the Client object is used heavily. Every time Elastica did not support a specific call, one of the recommendations is to use the raw request method: https://github.com/ruflin/Elastica/blob/8.x/src/Client.php#L538 If possible, I would keep the Client.

@coreation I'll be out of most of July but please don't hold back on progressing with this one. @deguif @thePanz @franmomu Would be great if you could get involved in this too.

Comment thread src/Client.php
*/
public function getElasticSearchClient(): \Elastic\Elasticsearch\Client
{
$clientBuilder = ClientBuilder::create();
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any concern around recreating the builder each time or should we store in somewhere in the Client under `_elasticsearchClient?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not really. If you look at the create() function, it just returns a new instance of the class, without any arguments or state. We could store it somewhere in a property as well. Would it make it sense to store in a property if there's no real state attached to the object?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no state, no need to store it. One thing I want to make sure, we don't open a new connection to Elasticsearch if we send multiple requests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (expect) all the connections the Elasticsearch Client makes will be opened and closed within the function call. The ClientBuilder - afaik - doesn't manage some sort of connection pool, if that's the sort of thing you were looking for?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the part I was referring to here: https://github.com/ruflin/Elastica/blob/8.x/src/Transport/Http.php#L80 I don't think this is impacted by this change at all.

Comment thread src/Client.php Outdated
Comment thread tests/Base.php
Co-authored-by: Nicolas Ruflin <spam@ruflin.com>
@coreation
Copy link
Copy Markdown
Contributor Author

I don't have data on this but I would assume, the Client object is used heavily. Every time Elastica did not support a specific call, one of the recommendations is to use the raw request method: https://github.com/ruflin/Elastica/blob/8.x/src/Client.php#L538 If possible, I would keep the Client.

I absolutely agree, I thought this was the way to go as well, but I wanted to avoid injecting bias into the question. So the approach of getting an Elasticsearch Client based on the Elastica Client is an ok approach then I believe.

@coreation I'll be out of most of July but please don't hold back on progressing with this one. @deguif @thePanz @franmomu Would be great if you could get involved in this too.

No problem, enjoy your time OoO!

@franmomu
Copy link
Copy Markdown
Contributor

franmomu commented Jul 1, 2023

Some time ago I tried to start this and ended up having a ElasticSearch\Client client inside of Elastica\Client and transforming the response, I wanted to start making this work and then see how this could be improved, but something else came up and I didn't continue.

Just in case is of any help (some of the code is not tested/finished):

https://github.com/ruflin/Elastica/compare/8.x...franmomu:Elastica:elasticsearch_8?expand=1

@VincentLanglet
Copy link
Copy Markdown
Contributor

Thanks for trying to move on this migration @coreation.
How far are you in the first analysis task ?

RFC: Replace the src/Client.php usage

With Elastic 7 being EOL in few days elastic.co/support/eol, shouldn't new RFC be delayed in order to release the v8 compatibility as soon as possible ?

@coreation
Copy link
Copy Markdown
Contributor Author

hey @VincentLanglet I haven't had the time to start yet, apart from what I've already listed in the todo task list. I'm currently on vacation for another 2 weeks. After that I'm ready to pick up where I left off, but work is very demanding at this time unfortunately.

@oleg-andreyev
Copy link
Copy Markdown
Contributor

oleg-andreyev commented Jul 30, 2023

@coreation @ruflin

Here is my example for delete method:

    /**
     * Deletes the index.
     *
     * @throws ClientException
     * @throws ConnectionException
     * @throws ResponseException
     */
    public function delete(): Response
    {
        $response = (new Indices($this->getClient()->getElasticsearchClient()))->delete(['index' => $this->getName()]);
        return $this->covertToElasticaResponse($response);
    }

what I don't like is copying: $this->getClient()->getElasticsearchClient()

also had to implement helper method:

    private function covertToElasticaResponse(ElasticsearchResponse $response)
    {
        return new \Elastica\Response(
            (string) $response->getBody(),
            $response->getStatusCode(),
        );
    }

speaking about response, probably \Elastica\Response wrap \Elastic\Elasticsearch\Response\Elasticsearch and provide all B/C layers within in response.

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Aug 4, 2023

With Elastic 7 being EOL in few days

@VincentLanglet I don't think this is the case as 9.0 is not out yet: https://www.elastic.co/support/eol

what I don't like is copying: $this->getClient()->getElasticsearchClient()

@oleg-andreyev What is your preference here? Should we fetch the client only once and then store it locally. For example having $this->getElasticsearchClient() method that wraps the above? What is the main concern?

covertToElasticaResponse

If you compare the response object before and after, how close are we to backward compatibility? Is all the info we need in the ElasticsearchResponse response?

@VincentLanglet
Copy link
Copy Markdown
Contributor

With Elastic 7 being EOL in few days

@VincentLanglet I don't think this is the case as 9.0 is not out yet: elastic.co/support/eol

My bad, it's indeed the 1rst august OR when 9.0.0 is released.

@oleg-andreyev
Copy link
Copy Markdown
Contributor

With Elastic 7 being EOL in few days

@VincentLanglet I don't think this is the case as 9.0 is not out yet: elastic.co/support/eol

what I don't like is copying: $this->getClient()->getElasticsearchClient()

@oleg-andreyev What is your preference here? Should we fetch the client only once and then store it locally. For example having $this->getElasticsearchClient() method that wraps the above? What is the main concern?

covertToElasticaResponse

If you compare the response object before and after, how close are we to backward compatibility? Is all the info we need in the ElasticsearchResponse response?

Speaking about client, it would be great to have single client that is using HttpInterface (so it could be Symfony Client, Guzzle and etc) and yes, it would be better just to call $this->getClient()

Speaking about response, need to compare but they are not 1:1 - 100%

@coreation
Copy link
Copy Markdown
Contributor Author

Hi everyone, as I look towards EOY, I'm much too occupied with my own company in order to work on this in a decent way. There's so much that needs to be done, and I don't see any way I can make time available to work on this. I'm very sorry and hopefully someone finds the preparation done so far useful enough to continue.

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Sep 20, 2023

@coreation Appreciate the heads up. It seems we all are a bit in the same boat. Lets see who can pick it up and drive to completion.

@ruflin ruflin mentioned this pull request Jan 17, 2024
6 tasks
@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Jan 17, 2024

With #2181 merged, should we close this PR and continue the work on top of 8.x.

@coreation
Copy link
Copy Markdown
Contributor Author

Definitely!

@coreation coreation closed this Jan 26, 2024
@coreation coreation deleted the feature/upgrade-elasticsearch-php-dependency branch January 26, 2024 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants