@Fottas Fottas commented Dec 3, 2025

Overview

This is the third PR in the IN predicate optimization series. It implements per-route filtering of IN predicate values on sharding key columns, limited to literal values.

Changes

Added ShardingInPredicateValue.java

  • Represents a single value in an IN predicate with its target route information
  • Tracks whether the value is a parameter marker or a literal via the isParameter flag
  • Provides belongsToRoute() method to check if value belongs to a specific route
  • Supports orphan value marking for values that don't match any shard
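
As a rough illustration of the value model described above, the following is a minimal sketch and not the code from this PR. The class name InPredicateValueSketch, the isOrphan() accessor, and the choice to represent an orphan as a value with an empty route set are all assumptions; only isParameter and belongsToRoute() are named in the description.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for ShardingInPredicateValue; not the PR's actual class.
final class InPredicateValueSketch {

    private final Object value;
    private final boolean parameter;
    private final Set<String> targetRoutes;

    InPredicateValueSketch(Object value, boolean parameter, Set<String> targetRoutes) {
        this.value = value;
        this.parameter = parameter;
        // Defensive copy so the route information cannot change after construction.
        this.targetRoutes = Collections.unmodifiableSet(new LinkedHashSet<>(targetRoutes));
    }

    Object getValue() {
        return value;
    }

    // True for a parameter marker (?), false for a literal.
    boolean isParameter() {
        return parameter;
    }

    // Assumption: an orphan is modeled here as a value with no target route.
    boolean isOrphan() {
        return targetRoutes.isEmpty();
    }

    // True when this value should be kept in the SQL sent to the given actual table.
    boolean belongsToRoute(String actualTable) {
        return targetRoutes.contains(actualTable);
    }
}
```

Keeping the route set inside the value means a token only has to ask each value whether it belongs to the current route, which is the simplification the design rationale refers to.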

Added ShardingInPredicateToken.java

  • Implements Substitutable, RouteUnitAware, and ParameterFilterable interfaces
  • Filters IN clause values per route using embedded route information in each value
  • Optimizes single-value cases by converting IN (value) to = value
  • Provides getRemovedParameterIndices() for parameter filtering integration
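
The per-route filtering and the single-value rewrite can be sketched as below. This is a simplified illustration under stated assumptions, not the ShardingInPredicateToken implementation: the class InPredicateTokenSketch and its render() method are hypothetical, values are restricted to numeric literals so no quoting is needed, and the fallback to the unfiltered list when no value matches the route is a guess, since the description does not say how that case is handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of route-aware IN predicate rendering.
final class InPredicateTokenSketch {

    // Renders the predicate for one actual table, keeping only values routed to it.
    // routesByValue maps each literal to the actual tables it routes to;
    // pass an ordered map (e.g. LinkedHashMap) to preserve the original value order.
    static String render(String column, Map<Long, Set<String>> routesByValue, String actualTable) {
        List<String> kept = new ArrayList<>();
        List<String> all = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> entry : routesByValue.entrySet()) {
            String literal = String.valueOf(entry.getKey());
            all.add(literal);
            if (entry.getValue().contains(actualTable)) {
                kept.add(literal);
            }
        }
        if (kept.isEmpty()) {
            // Assumption: with nothing to keep, fall back to the unfiltered list.
            kept = all;
        }
        if (kept.size() == 1) {
            // Single-value case: IN (value) becomes = value.
            return column + " = " + kept.get(0);
        }
        return column + " IN (" + String.join(", ", kept) + ")";
    }
}
```

With values 1 through 5 routed by value modulo 2, rendering for t_order_0 yields `user_id IN (2, 4)`; with only the value 2 routed to t_order_0, it yields `user_id = 2`.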

Added ShardingInPredicateTokenGenerator.java

  • Detects IN predicates on sharding key columns in SELECT statements
  • Extracts literal values from IN expressions (skips parameter markers in this version)
  • Invokes StandardShardingAlgorithm.doSharding() to calculate target routes for each value
  • Generates ShardingInPredicateToken with route distribution information
  • Skips optimization for non-sharding tables and non-standard sharding strategies
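
The route calculation step might look roughly like the sketch below. To keep it self-contained it does not use ShardingSphere types: PreciseAlgorithm is a hypothetical stand-in for the precise form of StandardShardingAlgorithm.doSharding(), and distribute() and mod() are illustrative helpers, not methods from this PR. Values whose computed target is not among the available tables are simply left out here, which is one possible reading of the orphan handling.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of computing a per-table value distribution for an IN list.
final class InPredicateRouteSketch {

    // Stand-in for the precise sharding call; NOT the ShardingSphere interface.
    interface PreciseAlgorithm {
        String doSharding(Collection<String> availableTargets, long value);
    }

    // Asks the algorithm for the target of each literal and groups literals by target.
    static Map<String, List<Long>> distribute(List<Long> literals, Collection<String> targets,
                                              PreciseAlgorithm algorithm) {
        Map<String, List<Long>> result = new LinkedHashMap<>();
        for (String target : targets) {
            result.put(target, new ArrayList<>());
        }
        for (long literal : literals) {
            String target = algorithm.doSharding(targets, literal);
            if (target != null && result.containsKey(target)) {
                result.get(target).add(literal);
            }
            // Otherwise the value matches no available table and is skipped (assumption).
        }
        return result;
    }

    // A MOD-style algorithm: the table suffix is value mod shardCount.
    static PreciseAlgorithm mod(String prefix, int shardCount) {
        return (targets, value) -> {
            String candidate = prefix + Math.floorMod(value, (long) shardCount);
            return targets.contains(candidate) ? candidate : null;
        };
    }
}
```

Calling distribute() with the literals 1 through 5, the targets t_order_0 and t_order_1, and mod("t_order_", 2) groups 2 and 4 under t_order_0 and 1, 3, and 5 under t_order_1.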

Modified ShardingTokenGenerateBuilder.java

  • Registered ShardingInPredicateTokenGenerator in the token generator list

Added comprehensive test coverage

  • ShardingInPredicateValueTest: 7 unit tests for value model behavior
  • ShardingInPredicateTokenTest: 15 unit tests for token SQL generation and filtering
  • ShardingInPredicateTokenGeneratorTest: 6 tests covering detection, generation, and edge cases
  • Total: 28 test cases across 3 test classes, all passing

Design Rationale

  • Route information embedded in values: Each value knows its target routes, simplifying token logic
  • Leverages PR 2 infrastructure: Uses ParameterFilterable interface for parameter filtering
  • Single-value optimization: Automatically converts IN (value) to = value for better database performance
  • Standard algorithm only: Supports StandardShardingAlgorithm implementations (MOD, INLINE, HASH, etc.)
  • Literal values only: This simplified version focuses on literal values to keep the change manageable

Example

Before optimization:

-- Logical SQL
SELECT * FROM t_order WHERE user_id IN (1, 2, 3, 4, 5)

-- Both shards receive all values
SELECT * FROM t_order_0 WHERE user_id IN (1, 2, 3, 4, 5)
SELECT * FROM t_order_1 WHERE user_id IN (1, 2, 3, 4, 5)

After optimization:

-- Each shard receives only its values (assuming MOD algorithm)
SELECT * FROM t_order_0 WHERE user_id IN (2, 4)
SELECT * FROM t_order_1 WHERE user_id IN (1, 3, 5)

Related Issue

Part of #36454

Next Steps

PR 4 will extend this implementation to support parameter markers (?) in IN predicates, enabling prepared statement optimization.

@Fottas Fottas force-pushed the feature/pr3-data-structures branch from 28bc65f to 37f1732 on December 3, 2025 03:43
Member

@terrymanu terrymanu left a comment

Right now the changes are spread across multiple PRs, which makes it hard to assess the overall design and impact. Please consolidate the main code into a single, complete PR so we can review the design, code, and tests together. Otherwise, each fragmented PR lacks context and cannot be merged.
