Add automatic X-Opaque-Id headers to datastore msearch requests by jwils · Pull Request #1135 · block/elasticgraph

jwils · 2026-04-17T16:12:37Z

Elasticsearch and OpenSearch propagate the X-Opaque-Id HTTP header into the tasks API, slow logs, and deprecation logs. This PR now makes ElasticGraph set useful opaque ids itself for datastore _msearch traffic instead of offering an opt-in Faraday middleware that applications wire manually.

What changed

Removes the standalone ElasticGraph::Support::FaradayMiddleware::OpaqueId middleware and its opt-in datastore_client_customization_block design.
Adds ElasticGraph::GraphQL::Client#extra_opaque_id_parts as the GraphQL customization hook.
elasticgraph-graphql now automatically includes X-Opaque-Id on datastore _msearch requests using stable EG-owned parts:
- elasticgraph-graphql
- client=<client.name>
- any client.extra_opaque_id_parts
- query=<fingerprint>
Threads those parts through QueryExecutor -> QuerySource -> DatastoreSearchRouter rather than relying on Faraday env customization.
Adds coarse opaque ids for other EG-owned _msearch call sites:
- elasticgraph-health_check -> elasticgraph-health_check
- Apollo test resolver -> elasticgraph-apollo;resolver=product
- elasticgraph-indexer#source_event_versions_in_index -> elasticgraph-indexer;purpose=source_event_versions;cluster=<cluster>
Uses stable/readable values only; no per-request random suffix.

Why this shape

Addresses the review feedback that the original PR was useful but did not really belong in ElasticGraph as generic opt-in support code.
Keeps the feature in EG request-layer abstractions where EG already knows meaningful request metadata.
Avoids _bulk for now. Indexer write traffic still needs a better request-identity model before we choose default opaque-id semantics there.

Notes

A small internal ElasticGraph::Support::OpaqueID formatter remains for assembling and sanitizing header values.
The shared X-Opaque-Id header name now lives in elastic_graph/constants.

Verification

Updated unit/spec coverage for GraphQL routing, GraphQL::Client normalization, health-check _msearch, and indexer source-event-version lookups.
ruby -c passed on the touched Ruby files.
git diff --check is clean.
I could not run the full Ruby test suite in this workspace because the local bundle is incomplete and is missing gems such as aws_lambda_ric, faker, httpx, and nokogiri.

myronmarston

Left some mild feedback. I'm also curious if you can describe how you plan to use these ids (as I've not seen them before)? During an incident will you list the tasks on nodes having problems (which would include the opaque ids, I guess?) and then use those opaque ids to cancel requests? Are the properties that matter uniqueness and ability to correlate it to a client of your application so that you can understand the consequences of cancelling a task?

Elasticsearch and OpenSearch propagate the X-Opaque-Id HTTP header into the tasks API (GET _tasks?detailed=true), search slow logs, and deprecation logs. For any non-trivial deployment, being able to attribute stuck/slow/expensive datastore work back to the originating application is essential operational visibility. Operators can locate a specific task by opaque id in the tasks API output and cancel it individually via POST _tasks/<task_id>/_cancel. Motivated by a production incident on an OpenSearch 2.19 cluster where a single data node ended up holding 13 stuck search tasks for hours; without X-Opaque-Id we could not identify the originating application. Changes: - Add ElasticGraph::Support::OpaqueID.build_header(parts): joins string parts with ";", drops empty values, and replaces ";" / "\r" / "\n" inside parts with "," so each part stays a single segment. - Add OPAQUE_ID_HEADER = "X-Opaque-Id" constant in elastic_graph/constants. - DatastoreSearchRouter#msearch accepts an opaque_id_parts: kwarg (default ["elasticgraph-graphql"]) and sets X-Opaque-Id when the computed value is non-empty. - QueryExecutor sets :elastic_graph_client in the resolver context so QuerySource.execute_many can derive opaque_id_parts of ["elasticgraph-graphql", "client=<name>", *client.extra_opaque_id_parts, "query=<fingerprint>"] for datastore msearch calls. - Client (Data.define) gains extra_opaque_id_parts (default []) so an HTTPRequest resolver can attach per-client tags (e.g. tenant=acme). - HealthChecker tags its msearch with ["elasticgraph-health_check"]. - DatastoreIndexingRouter#source_event_versions_in_index tags its msearch with ["elasticgraph-indexer", "purpose=source_event_versions", "cluster=<name>"]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

myronmarston · 2026-04-28T15:23:53Z

Would it be useful to set OPAQUE_ID from the bulk calls as well?

myronmarston · 2026-04-28T15:27:22Z

+                  OPAQUE_ID_HEADER => Support::OpaqueID.build_header([
+                    "elasticgraph-indexer",
+                    "purpose=source_event_versions",
+                    "cluster=#{client_name}"


It's not clear to me why cluster is useful here. At the point you're seeing these opaque ids (e.g. in an OpenSearch task list) you'll know what cluster you're hitting, right? Or are you expecting these opaque ids to wind up in a centralized logging system that contains opaque ids data from multiple clusters?

In terms of other potential useful bits of data to include...maybe include the count of operations per type? e.g. widget_ops=7;team_ops=12

That would indicate at a glance how large of request it was and which types the request is for.

myronmarston · 2026-04-28T15:30:32Z


      def datastore_msearch(queries)
-        @datastore_search_router.msearch(queries)
+        @datastore_search_router.msearch(queries, opaque_id_parts: ["elasticgraph-health_check"])


Is it worth including of types being queried here? You could change the caller from datastore_msearch(recency_queries_by_type_name.values) to datastore_msearch(recency_queries_by_type_name) and then you'd have the queried types as keys and the queries themselves as values. The types could nicely go in the opaque id.

myronmarston · 2026-04-28T15:32:36Z

    )

-    @datastore_router.msearch([query]).fetch(query).documents.first
+    @datastore_router.msearch([query], opaque_id_parts: ["elasticgraph-apollo", "resolver=product"]).fetch(query).documents.first


This doesn't ship as part of ElasticGraph--it's just part of the implementation we use to run the apollo federation subgraph test suite against elasticgraph-apollo. As such, it only runs locally and on CI and I think we can simplify and revert this.

myronmarston · 2026-04-28T15:55:05Z

+      def msearch: (
+        ::Array[DatastoreQuery],
+        ?query_tracker: QueryDetailsTracker,
+        ?opaque_id_parts: ::Array[::String]


Suggested change

?opaque_id_parts: ::Array[::String]

?opaque_id_parts: ::Array[::String?]

...since Support::OpaqueID.build_header accepts an array of ::String?.

myronmarston · 2026-04-28T16:00:12Z

          # timeout in a specific header that the `SupportTimeouts` Faraday middleware will use.
          headers = {TIMEOUT_MS_HEADER => msearch_request_timeout_from(queries)&.to_s}.compact
+          opaque_id = Support::OpaqueID.build_header(opaque_id_parts)
+          headers[OPAQUE_ID_HEADER] = opaque_id unless opaque_id.empty?


It feels odd for build_header to return an empty string. Should we have it return nil if it can't build anything meaningful? Then this could be:

if (opaque_id = Support::OpaqueID.build_header(opaque_id_parts)) headers[OPAQUE_ID_HEADER] = opaque_id end

myronmarston · 2026-04-28T16:12:27Z

+
+        # `:elastic_graph_client` is set by `QueryExecutor` but not by all entry points
+        # (e.g. Apollo's `_entities` resolver dispatches through `execute_many` without
+        # going through `QueryExecutor` first). When it's absent we fall back to a


Apollo's _entities resolver dispatches through execute_many without
going through QueryExecutor first

I don't think this is accurate. QueryExecutor is the entry point before any of the resolvers get involved. It delegates to Query#result which invokes all the configured resolvers. When the resolvers execute, they get context which has al the context entries set by QueryExecutor. I'd expect for_context[:elastic_graph_client] to be populated when we get here via the Apollo _entities resolver.

What am I missing? (Or did an LLM hallucinate this?)

myronmarston · 2026-04-28T16:17:56Z

+            parts = QuerySource.send(:datastore_opaque_id_parts_for, for_context)
+
+            expect(parts).to eq(["elasticgraph-graphql", "client=client-name", "query=(no query string)"])
+          end


If we're going to test query_string being nil, it would be good to also include a test for when query_string is non-nil.

That said, the original reason we did query.query_string ? query.fingerprint : "(no query string)" for the fingerprint no longer applies. It was a work around from back when I reported this bug:

rmosolgo/graphql-ruby#4942

That was fixed in 2024 (and we are more than up-to-date with the GraphQL gem) so I think we can remove that bit of conditional logic entirely, from both query_executor.rb and from query_source.rb. We can just use query.fingerprint with no need to check query.string. Then this test can probably go away.

myronmarston · 2026-04-28T16:18:48Z

+        # `datastore_opaque_id_parts_for` is driven off `for_context[:elastic_graph_client]`,
+        # which `QueryExecutor` sets for every query it runs. Some entry points (notably
+        # Apollo's `_entities` resolver test harness) reach `QuerySource.execute_many`
+        # without going through `QueryExecutor`, so the fallback needs to keep working.


Some entry points (notably Apollo's _entities resolver test harness) reach QuerySource.execute_many without going through QueryExecutor

This may need to be updated if (as per my comment above) it's inaccurate.

myronmarston · 2026-04-28T16:22:13Z

+          it "includes `(no query string)` in place of the fingerprint when the query has no source string" do
+            client = Client.new(source_description: "client-description", name: "client-name")
+            graphql_query = instance_double(::GraphQL::Query, query_string: nil, fingerprint: "unused")
+            for_context = instance_double(::GraphQL::Query::Context, :[] => client, :query => graphql_query)


It's a little weird to stub :[] to always return client--if the implementation changes to access different entries this spec is going to act weird.

It's not too hard to construct a Context--we do it in some other specs (if you search for Context.new you'll see how). I'd prefer using the real thing here.

jwils requested review from BrianSigafoos-SQ, ayousufi, bsorbo, jwondrusch and myronmarston as code owners April 17, 2026 16:12

myronmarston requested changes Apr 21, 2026

View reviewed changes

jwils force-pushed the joshuaw/add-opaque-id-middleware branch 2 times, most recently from 2fcc7a8 to ac08f3c Compare April 21, 2026 20:52

jwils changed the title ~~Add Faraday middleware for tagging requests with X-Opaque-Id~~ Add automatic X-Opaque-Id headers to datastore msearch requests Apr 22, 2026

myronmarston requested changes Apr 23, 2026

View reviewed changes

jwils force-pushed the joshuaw/add-opaque-id-middleware branch 5 times, most recently from 4124689 to f6eaa34 Compare April 23, 2026 16:45

jwils force-pushed the joshuaw/add-opaque-id-middleware branch from f6eaa34 to 6b3c83f Compare April 28, 2026 14:26

Define OpaqueID helper as a class method

ed29667

myronmarston requested changes Apr 28, 2026

View reviewed changes

	?opaque_id_parts: ::Array[::String]
	?opaque_id_parts: ::Array[::String?]

Conversation

jwils commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

Why this shape

Notes

Verification

Uh oh!

myronmarston left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jwils commented Apr 17, 2026 •

edited

Loading