Rename document_type_stored_in → document_types_stored_in returning Set by marcdaniels-toast · Pull Request #1159 · block/elasticgraph

marcdaniels-toast · 2026-04-29T01:48:48Z

This PR was created as a companion to #1150 (an initial step). It paves the way for the non_subtypes_in_shared_index method by ensuring the GraphQL schema correctly tracks all document types per index, rather than only the first one seen.

`indexed_document_types_by_index_definition_name` now returns `Set` per index

Previously, the hash stored only the first type seen for a given index name (via ||=). With index inheritance, multiple concrete types can share the same index — so the old approach would non-deterministically drop all but one.

Changed to accumulate all indexed document types per index into a Set. This is required for non_subtypes_in_shared_index to work correctly: that method uses this hash to enumerate all types sharing an index when deciding whether a __typename filter is needed on abstract type queries. With the old approach, other types sharing the same index could be silently omitted, leading to incorrect or missing __typename filters.

`document_type_stored_in` → `document_types_stored_in`

Renamed to reflect that the method now returns a Set[Type] rather than a single Type. Updated the two callers:

graphql_adapter_builder (resolve_type): uses .first — safe here because this branch is only reached for individually-indexed types (no __typename in the document means no shared index), so the set always contains exactly one type.
search_response_adapter_builder (all_highlights): now prefers __typename from the document source when present. Documents stored in a shared index carry __typename to identify their concrete type; using it directly ensures highlight field paths are resolved against the right type rather than whichever abstract or sibling type .first might return. Falls back to .first for individually-indexed types.

Test plan

New unit test in search_response_adapter_builder_spec.rb verifies that all_highlights uses __typename to resolve the concrete type: a hit from the retailers index with __typename: "OnlineStore" correctly resolves a highlight on url (an OnlineStore-specific field absent from the Retailer interface — without the fix, the wrong type would be used and the highlight would be silently dropped)
In Add query-time __typename filtering and end-to-end tests for index inheritance #1150, I plan to add an all_highlights acceptance test to search_spec.rb for shared-index types that would fail without the search_response_adapter_builder change made here. That acceptance test requires additions to the stock schema types and remaining index inheritance support that I didn't want to cloud this more focused PR step.

Changes indexed_document_types_by_index_definition_name to accumulate all indexed document types per index into a Set (was: store only the first type seen via ||=). This is necessary for correctness with index inheritance, where multiple concrete types share the same index and all need to be visible to callers like non_subtypes_in_shared_index. Renames document_type_stored_in → document_types_stored_in to return the full Set. Updates the two callers: - graphql_adapter_builder: .first is safe — this branch is only reached for individually-indexed types, which always have exactly one type per index (no __typename in the document means no shared index). - search_response_adapter_builder: use __typename from the document to resolve the concrete type directly when present; shared-index types carry __typename so field paths in all_highlights resolve to the right type. Falls back to .first for individually-indexed types. Generated with Claude Code

Required by non_subtypes_in_shared_index (on the index inheritance branch), which needs to enumerate all types sharing an index when deciding whether a __typename filter is needed on abstract type queries. Generated with Claude Code

myronmarston

Thanks for spinning this off!

myronmarston · 2026-04-29T17:00:09Z

+              expect(results.dig("data", "retailers", "edges")).to contain_exactly(
+                {"all_highlights" => [{"path" => ["url"], "snippets" => ["https://example.com"]}]}
+              )
+            end


This test demonstrates that all_highlights works correctly in this case if __typename is returned in the search hit's _source. But that will only happen if:

The __typename field is included in the mapping to begin with.

The indexer properly populates __typename

The search query generated for this GraphQL query requests that __typename be included in the _source response.

If any of those things don't hold true then this won't actually work in practice, even though this test passes. This test, by its nature as a unit test which stubs the datastore client, isn't capable of answering if this case works for real.

It'd give me much greater confidence that this will work for real if we could include some end-to-end acceptance test coverage of this case. I'm guessing you may need some of the other changes from #1150 (e.g. the config/schema updates) to be able to cover this with an acceptance test. If that's correct, can you make a point to do a follow-up PR adding that acceptance test coverage? Otherwise, if it's easy to add acceptance coverage of this now, please do that as part of this PR!

myronmarston · 2026-04-29T17:06:30Z

+      # Returns a hash mapping each index definition name to all indexed document types that use that index.
+      # Multiple types may map to the same index when abstract types share an index with their concrete subtypes
+      # via index inheritance.
+      def indexed_document_types_by_index_definition_name


I think we should keep this private. document_types_stored_in is meant to be the public API facade that sits on top of this. It's a bit safer/better than allowing direct usage of this method since it guards against common mistakes and provides friendlier errors.

When you move it back to private can you also add a note to the doc comment about how it's intentionally private and public callers should use document_types_stored_in instead?

myronmarston · 2026-04-29T17:09:39Z

+            (hash[index_def.name] ||= Set.new) << type
+          end
+        end.freeze
+      end


I'd favor a slightly simplified form for this:

def indexed_document_types_by_index_definition_name @indexed_document_types_by_index_definition_name ||= indexed_document_types.each_with_object(::Hash.new { |h, k| h[k] = ::Set.new }) do |type, hash| type.index_definitions.each do |index_def| hash[index_def.name] << type end end.freeze end

::Hash.new { ... } makes it easy to build up a data structure like this as it means that hash[new_key] gets an empty set so you don't have to deal with the slightly messy (hash[index_def.name] ||= Set.new).

One potential downside to be aware of, though: ::Hash.new { ... } can create a memory leak if you naively do lookups on arbitrary keys (e.g. directly from unvalidated client requests). Each lookup of an unfound key adds a new entry to the hash which will never get cleaned up:

irb(main):001> hash = ::Hash.new { |h, k| h[k] = ::Set.new } => {} irb(main):002> hash[:foo] => Set[] irb(main):003> hash[:bar] => Set[] irb(main):004> hash[:bazz] => Set[] irb(main):005> hash => {foo: Set[], bar: Set[], bazz: Set[]}

However, that's not a risk here, because the .freeze prevents the hash from growing further:

irb(main):006> hash.freeze => {foo: Set[], bar: Set[], bazz: Set[]} irb(main):007> hash[:new_key] (irb):1:in 'block in <top (required)>': can't modify frozen Hash: {foo: Set[], bar: Set[], bazz: Set[]} (FrozenError) from (irb):7:in '<main>' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/gems/4.0.0/gems/irb-1.18.0/exe/irb:9:in '<top (required)>' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Kernel#load' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Gem.activate_and_load_bin_path' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/bin/irb:25:in '<main>'

Also, hash.fetch(key) (used from document_types_stored_in) never uses the hash's default value proc:

irb(main):008> hash = ::Hash.new { |h, k| h[k] = ::Set.new } => {} irb(main):009> hash[:foo] => Set[] irb(main):010> hash.fetch(:bar) (irb):10:in 'Hash#fetch': key not found: :bar (KeyError) from (irb):10:in '<main>' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/gems/4.0.0/gems/irb-1.18.0/exe/irb:9:in '<top (required)>' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Kernel#load' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Gem.activate_and_load_bin_path' from /Users/myron.marston/.asdf/installs/ruby/4.0.0/bin/irb:25:in '<main>' irb(main):011> hash.fetch(:bar, 17) => 17 irb(main):012> hash => {foo: Set[]}

myronmarston · 2026-04-29T17:15:42Z


-      describe "#document_type_stored_in" do
-        it "returns the type stored in the named index" do
+      describe "#document_types_stored_in" do


Can we add an example here where it returns multiple types?

myronmarston · 2026-04-29T17:17:57Z

      end

-      def document_type_stored_in(index_definition_name)
+      def document_types_stored_in(index_definition_name)


Can we add some doc comments to this method? In particular I think it's useful for it to document that the returned set will never be empty (it'll raise an error in that case instead).

marcdaniels-toast requested review from BrianSigafoos-SQ, ayousufi, bsorbo, jwils, jwondrusch and myronmarston as code owners April 29, 2026 01:48

myronmarston requested changes Apr 29, 2026

View reviewed changes

myronmarston reviewed Apr 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rename document_type_stored_in → document_types_stored_in returning Set#1159

Rename document_type_stored_in → document_types_stored_in returning Set#1159
marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
marcdaniels-toast:mdaniels/document-types-stored-in

marcdaniels-toast commented Apr 29, 2026

Uh oh!

myronmarston left a comment

Uh oh!

myronmarston Apr 29, 2026

Uh oh!

myronmarston Apr 29, 2026

Uh oh!

myronmarston Apr 29, 2026

Uh oh!

myronmarston Apr 29, 2026

Uh oh!

myronmarston Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

marcdaniels-toast commented Apr 29, 2026

indexed_document_types_by_index_definition_name now returns Set per index

document_type_stored_in → document_types_stored_in

Test plan

Uh oh!

myronmarston left a comment

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`indexed_document_types_by_index_definition_name` now returns `Set` per index

`document_type_stored_in` → `document_types_stored_in`