Skip to content

Rename document_type_stored_in → document_types_stored_in returning Set#1159

Open
marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
marcdaniels-toast:mdaniels/document-types-stored-in
Open

Rename document_type_stored_in → document_types_stored_in returning Set#1159
marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
marcdaniels-toast:mdaniels/document-types-stored-in

Conversation

@marcdaniels-toast
Copy link
Copy Markdown
Contributor

This PR was created as a companion to #1150 (an initial step). It paves the way for the non_subtypes_in_shared_index method by ensuring the GraphQL schema correctly tracks all document types per index, rather than only the first one seen.

indexed_document_types_by_index_definition_name now returns Set per index

Previously, the hash stored only the first type seen for a given index name (via ||=). With index inheritance, multiple concrete types can share the same index — so the old approach would non-deterministically drop all but one.

Changed to accumulate all indexed document types per index into a Set. This is required for non_subtypes_in_shared_index to work correctly: that method uses this hash to enumerate all types sharing an index when deciding whether a __typename filter is needed on abstract type queries. With the old approach, other types sharing the same index could be silently omitted, leading to incorrect or missing __typename filters.

document_type_stored_indocument_types_stored_in

Renamed to reflect that the method now returns a Set[Type] rather than a single Type. Updated the two callers:

  • graphql_adapter_builder (resolve_type): uses .first — safe here because this branch is only reached for individually-indexed types (no __typename in the document means no shared index), so the set always contains exactly one type.
  • search_response_adapter_builder (all_highlights): now prefers __typename from the document source when present. Documents stored in a shared index carry __typename to identify their concrete type; using it directly ensures highlight field paths are resolved against the right type rather than whichever abstract or sibling type .first might return. Falls back to .first for individually-indexed types.

Test plan

  • New unit test in search_response_adapter_builder_spec.rb verifies that all_highlights uses __typename to resolve the concrete type: a hit from the retailers index with __typename: "OnlineStore" correctly resolves a highlight on url (an OnlineStore-specific field absent from the Retailer interface — without the fix, the wrong type would be used and the highlight would be silently dropped)
  • In Add query-time __typename filtering and end-to-end tests for index inheritance #1150, I plan to add an all_highlights acceptance test to search_spec.rb for shared-index types that would fail without the search_response_adapter_builder change made here. That acceptance test requires additions to the stock schema types and remaining index inheritance support that I didn't want to cloud this more focused PR step.

Changes indexed_document_types_by_index_definition_name to accumulate
all indexed document types per index into a Set (was: store only the
first type seen via ||=). This is necessary for correctness with index
inheritance, where multiple concrete types share the same index and all
need to be visible to callers like non_subtypes_in_shared_index.

Renames document_type_stored_in → document_types_stored_in to return
the full Set. Updates the two callers:

- graphql_adapter_builder: .first is safe — this branch is only reached
  for individually-indexed types, which always have exactly one type per
  index (no __typename in the document means no shared index).
- search_response_adapter_builder: use __typename from the document to
  resolve the concrete type directly when present; shared-index types
  carry __typename so field paths in all_highlights resolve to the right
  type. Falls back to .first for individually-indexed types.

Generated with Claude Code
Required by non_subtypes_in_shared_index (on the index inheritance branch),
which needs to enumerate all types sharing an index when deciding whether a
__typename filter is needed on abstract type queries.

Generated with Claude Code
Copy link
Copy Markdown
Collaborator

@myronmarston myronmarston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spinning this off!

expect(results.dig("data", "retailers", "edges")).to contain_exactly(
{"all_highlights" => [{"path" => ["url"], "snippets" => ["https://example.com"]}]}
)
end
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test demonstrates that all_highlights works correctly in this case if __typename is returned in the search hit's _source. But that will only happen if:

  • The __typename field is included in the mapping to begin with.
  • The indexer properly populates __typename
  • The search query generated for this GraphQL query requests that __typename be included in the _source response.

If any of those things don't hold true then this won't actually work in practice, even though this test passes. This test, by its nature as a unit test which stubs the datastore client, isn't capable of answering if this case works for real.

It'd give me much greater confidence that this will work for real if we could include some end-to-end acceptance test coverage of this case. I'm guessing you may need some of the other changes from #1150 (e.g. the config/schema updates) to be able to cover this with an acceptance test. If that's correct, can you make a point to do a follow-up PR adding that acceptance test coverage? Otherwise, if it's easy to add acceptance coverage of this now, please do that as part of this PR!

# Returns a hash mapping each index definition name to all indexed document types that use that index.
# Multiple types may map to the same index when abstract types share an index with their concrete subtypes
# via index inheritance.
def indexed_document_types_by_index_definition_name
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep this private. document_types_stored_in is meant to be the public API facade that sits on top of this. It's a bit safer/better than allowing direct usage of this method since it guards against common mistakes and provides friendlier errors.

When you move it back to private can you also add a note to the doc comment about how it's intentionally private and public callers should use document_types_stored_in instead?

(hash[index_def.name] ||= Set.new) << type
end
end.freeze
end
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd favor a slightly simplified form for this:

def indexed_document_types_by_index_definition_name
  @indexed_document_types_by_index_definition_name ||=
    indexed_document_types.each_with_object(::Hash.new { |h, k| h[k] = ::Set.new }) do |type, hash|
      type.index_definitions.each do |index_def|
        hash[index_def.name] << type
      end
    end.freeze
end

::Hash.new { ... } makes it easy to build up a data structure like this as it means that hash[new_key] gets an empty set so you don't have to deal with the slightly messy (hash[index_def.name] ||= Set.new).

One potential downside to be aware of, though: ::Hash.new { ... } can create a memory leak if you naively do lookups on arbitrary keys (e.g. directly from unvalidated client requests). Each lookup of an unfound key adds a new entry to the hash which will never get cleaned up:

irb(main):001> hash = ::Hash.new { |h, k| h[k] = ::Set.new }
=> {}
irb(main):002> hash[:foo]
=> Set[]
irb(main):003> hash[:bar]
=> Set[]
irb(main):004> hash[:bazz]
=> Set[]
irb(main):005> hash
=> {foo: Set[], bar: Set[], bazz: Set[]}

However, that's not a risk here, because the .freeze prevents the hash from growing further:

irb(main):006> hash.freeze
=> {foo: Set[], bar: Set[], bazz: Set[]}
irb(main):007> hash[:new_key]
(irb):1:in 'block in <top (required)>': can't modify frozen Hash: {foo: Set[], bar: Set[], bazz: Set[]} (FrozenError)
        from (irb):7:in '<main>'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/gems/4.0.0/gems/irb-1.18.0/exe/irb:9:in '<top (required)>'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Kernel#load'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Gem.activate_and_load_bin_path'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/bin/irb:25:in '<main>'

Also, hash.fetch(key) (used from document_types_stored_in) never uses the hash's default value proc:

irb(main):008> hash = ::Hash.new { |h, k| h[k] = ::Set.new }
=> {}
irb(main):009> hash[:foo]
=> Set[]
irb(main):010> hash.fetch(:bar)
(irb):10:in 'Hash#fetch': key not found: :bar (KeyError)
        from (irb):10:in '<main>'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/gems/4.0.0/gems/irb-1.18.0/exe/irb:9:in '<top (required)>'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Kernel#load'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/lib/ruby/4.0.0/rubygems.rb:303:in 'Gem.activate_and_load_bin_path'
        from /Users/myron.marston/.asdf/installs/ruby/4.0.0/bin/irb:25:in '<main>'
irb(main):011> hash.fetch(:bar, 17)
=> 17
irb(main):012> hash
=> {foo: Set[]}


describe "#document_type_stored_in" do
it "returns the type stored in the named index" do
describe "#document_types_stored_in" do
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an example here where it returns multiple types?

end

def document_type_stored_in(index_definition_name)
def document_types_stored_in(index_definition_name)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some doc comments to this method? In particular I think it's useful for it to document that the returned set will never be empty (it'll raise an error in that case instead).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants