Rename document_type_stored_in → document_types_stored_in returning Set#1159
Rename document_type_stored_in → document_types_stored_in returning Set#1159marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
Conversation
Changes indexed_document_types_by_index_definition_name to accumulate all indexed document types per index into a Set (was: store only the first type seen via ||=). This is necessary for correctness with index inheritance, where multiple concrete types share the same index and all need to be visible to callers like non_subtypes_in_shared_index. Renames document_type_stored_in → document_types_stored_in to return the full Set. Updates the two callers: - graphql_adapter_builder: .first is safe — this branch is only reached for individually-indexed types, which always have exactly one type per index (no __typename in the document means no shared index). - search_response_adapter_builder: use __typename from the document to resolve the concrete type directly when present; shared-index types carry __typename so field paths in all_highlights resolve to the right type. Falls back to .first for individually-indexed types. Generated with Claude Code
Required by non_subtypes_in_shared_index (on the index inheritance branch), which needs to enumerate all types sharing an index when deciding whether a __typename filter is needed on abstract type queries. Generated with Claude Code
myronmarston
left a comment
There was a problem hiding this comment.
Thanks for spinning this off!
| expect(results.dig("data", "retailers", "edges")).to contain_exactly( | ||
| {"all_highlights" => [{"path" => ["url"], "snippets" => ["https://example.com"]}]} | ||
| ) | ||
| end |
There was a problem hiding this comment.
This test demonstrates that all_highlights works correctly in this case if __typename is returned in the search hit's _source. But that will only happen if:
- The
__typenamefield is included in the mapping to begin with. - The indexer properly populates
__typename - The search query generated for this GraphQL query requests that
__typenamebe included in the_sourceresponse.
If any of those things don't hold true then this won't actually work in practice, even though this test passes. This test, by its nature as a unit test which stubs the datastore client, isn't capable of answering if this case works for real.
It'd give me much greater confidence that this will work for real if we could include some end-to-end acceptance test coverage of this case. I'm guessing you may need some of the other changes from #1150 (e.g. the config/schema updates) to be able to cover this with an acceptance test. If that's correct, can you make a point to do a follow-up PR adding that acceptance test coverage? Otherwise, if it's easy to add acceptance coverage of this now, please do that as part of this PR!
| # Returns a hash mapping each index definition name to all indexed document types that use that index. | ||
| # Multiple types may map to the same index when abstract types share an index with their concrete subtypes | ||
| # via index inheritance. | ||
| def indexed_document_types_by_index_definition_name |
There was a problem hiding this comment.
I think we should keep this private. document_types_stored_in is meant to be the public API facade that sits on top of this. It's a bit safer/better than allowing direct usage of this method since it guards against common mistakes and provides friendlier errors.
When you move it back to private can you also add a note to the doc comment about how it's intentionally private and public callers should use document_types_stored_in instead?
| (hash[index_def.name] ||= Set.new) << type | ||
| end | ||
| end.freeze | ||
| end |
There was a problem hiding this comment.
I'd favor a slightly simplified form for this:
def indexed_document_types_by_index_definition_name
@indexed_document_types_by_index_definition_name ||=
indexed_document_types.each_with_object(::Hash.new { |h, k| h[k] = ::Set.new }) do |type, hash|
type.index_definitions.each do |index_def|
hash[index_def.name] << type
end
end.freeze
end::Hash.new { ... } makes it easy to build up a data structure like this as it means that hash[new_key] gets an empty set so you don't have to deal with the slightly messy (hash[index_def.name] ||= Set.new).
One potential downside to be aware of, though: ::Hash.new { ... } can create a memory leak if you naively do lookups on arbitrary keys (e.g. directly from unvalidated client requests). Each lookup of an unfound key adds a new entry to the hash which will never get cleaned up:
irb(main):001> hash = ::Hash.new { |h, k| h[k] = ::Set.new }
=> {}
irb(main):002> hash[:foo]
=> Set[]
irb(main):003> hash[:bar]
=> Set[]
irb(main):004> hash[:bazz]
=> Set[]
irb(main):005> hash
=> {foo: Set[], bar: Set[], bazz: Set[]}
However, that's not a risk here, because the .freeze prevents the hash from growing further:
irb(main):006> hash.freeze
=> {foo: Set[], bar: Set[], bazz: Set[]}
irb(main):007> hash[:new_key]
(irb):1:in 'block in <top (required)>': can't modify frozen Hash: {foo: Set[], bar: Set[], bazz: Set[]} (FrozenError)
from (irb):7:in '<main>'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/gems/4.0.0/gems/irb-1.18.0/exe/irb:9:in '<top (required)>'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Kernel#load'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Gem.activate_and_load_bin_path'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/bin/irb:25:in '<main>'
Also, hash.fetch(key) (used from document_types_stored_in) never uses the hash's default value proc:
irb(main):008> hash = ::Hash.new { |h, k| h[k] = ::Set.new }
=> {}
irb(main):009> hash[:foo]
=> Set[]
irb(main):010> hash.fetch(:bar)
(irb):10:in 'Hash#fetch': key not found: :bar (KeyError)
from (irb):10:in '<main>'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/gems/4.0.0/gems/irb-1.18.0/exe/irb:9:in '<top (required)>'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Kernel#load'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Gem.activate_and_load_bin_path'
from /Users/myron.marston/.asdf/installs/ruby/4.0.0/bin/irb:25:in '<main>'
irb(main):011> hash.fetch(:bar, 17)
=> 17
irb(main):012> hash
=> {foo: Set[]}
|
|
||
| describe "#document_type_stored_in" do | ||
| it "returns the type stored in the named index" do | ||
| describe "#document_types_stored_in" do |
There was a problem hiding this comment.
Can we add an example here where it returns multiple types?
| end | ||
|
|
||
| def document_type_stored_in(index_definition_name) | ||
| def document_types_stored_in(index_definition_name) |
There was a problem hiding this comment.
Can we add some doc comments to this method? In particular I think it's useful for it to document that the returned set will never be empty (it'll raise an error in that case instead).
This PR was created as a companion to #1150 (an initial step). It paves the way for the
non_subtypes_in_shared_indexmethod by ensuring the GraphQL schema correctly tracks all document types per index, rather than only the first one seen.indexed_document_types_by_index_definition_namenow returnsSetper indexPreviously, the hash stored only the first type seen for a given index name (via
||=). With index inheritance, multiple concrete types can share the same index — so the old approach would non-deterministically drop all but one.Changed to accumulate all indexed document types per index into a
Set. This is required fornon_subtypes_in_shared_indexto work correctly: that method uses this hash to enumerate all types sharing an index when deciding whether a__typenamefilter is needed on abstract type queries. With the old approach, other types sharing the same index could be silently omitted, leading to incorrect or missing__typenamefilters.document_type_stored_in→document_types_stored_inRenamed to reflect that the method now returns a
Set[Type]rather than a singleType. Updated the two callers:graphql_adapter_builder(resolve_type): uses.first— safe here because this branch is only reached for individually-indexed types (no__typenamein the document means no shared index), so the set always contains exactly one type.search_response_adapter_builder(all_highlights): now prefers__typenamefrom the document source when present. Documents stored in a shared index carry__typenameto identify their concrete type; using it directly ensures highlight field paths are resolved against the right type rather than whichever abstract or sibling type.firstmight return. Falls back to.firstfor individually-indexed types.Test plan
search_response_adapter_builder_spec.rbverifies thatall_highlightsuses__typenameto resolve the concrete type: a hit from theretailersindex with__typename: "OnlineStore"correctly resolves a highlight onurl(anOnlineStore-specific field absent from theRetailerinterface — without the fix, the wrong type would be used and the highlight would be silently dropped)all_highlightsacceptance test tosearch_spec.rbfor shared-index types that would fail without thesearch_response_adapter_builderchange made here. That acceptance test requires additions to the stock schema types and remaining index inheritance support that I didn't want to cloud this more focused PR step.