Skip to content

[Store] Add ObjectDataType enum for type-aware metadata#1719

Open
he-yufeng wants to merge 9 commits intokvcache-ai:mainfrom
he-yufeng:feat/object-data-type-phase1
Open

[Store] Add ObjectDataType enum for type-aware metadata#1719
he-yufeng wants to merge 9 commits intokvcache-ai:mainfrom
he-yufeng:feat/object-data-type-phase1

Conversation

@he-yufeng
Copy link
Copy Markdown
Contributor

Summary

Implements Phase 1 of the ObjectDataType system discussed in #1673.

This adds a data type classification for objects in Mooncake Store, so the metadata layer knows what kind of data each object holds (KV cache, weights, tensors, etc.). This lays the groundwork for type-aware eviction policies, replication strategies, and monitoring — without changing any existing behavior.

What changed

  • ObjectDataType enum in types.hUNKNOWN(0), KVCACHE(1), TENSOR(2), WEIGHT(3), SAMPLE(4), ACTIVATION(5), GRADIENT(6), OPTIMIZER_STATE(7), METADATA(8)
  • ReplicateConfig.data_type field — clients can tag objects at PutStart time. Defaults to UNKNOWN, so existing callers are unaffected.
  • ObjectMetadata.data_type — propagated from PutStart config and stored as a const member.
  • Snapshot serialization — new format writes data_type as the 8th field. Deserialization auto-detects old (7 + replicas) vs new (8 + replicas) format, so rolling upgrades won't break.
  • Python bindingsObjectDataType enum and ReplicateConfig.data_type exposed via pybind11.
  • Unit tests — enum values, stream operator, config defaults, PutStart propagation, uint8 roundtrip.

Backward compatibility

Fully backward compatible. Clients that don't set data_type get UNKNOWN everywhere. Old snapshots deserialize correctly (missing field defaults to UNKNOWN).

Next steps (Phase 2+)

  • Wire data_type into eviction scoring
  • Add per-type metrics/monitoring
  • Expose in gRPC/REST APIs

Closes phase 1 of #1673.

…cache-ai#1673)

Introduce a data type classification system for objects stored in
Mooncake Store, as agreed in the RFC discussion on issue kvcache-ai#1673.

Changes:
- Add ObjectDataType enum (UNKNOWN, KVCACHE, TENSOR, WEIGHT, etc.)
  in mooncake-store/include/types.h
- Extend ReplicateConfig with a data_type field (default UNKNOWN)
- Propagate data_type through PutStart into ObjectMetadata
- Serialize/deserialize data_type in snapshot metadata, with backward
  compatibility for old snapshots (gracefully handles missing field)
- Expose ObjectDataType enum and data_type field in Python bindings
- Add unit tests for enum values, defaults, and PutStart propagation

Existing clients that don't set data_type will continue to work
unchanged (defaults to UNKNOWN everywhere).
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational ObjectDataType system to Mooncake Store, enabling type-aware metadata for stored objects. This enhancement allows the system to understand the nature of the data it holds, such as KV caches, tensors, or weights, without altering existing behavior. This lays crucial groundwork for future advanced features like intelligent eviction policies, optimized replication strategies, and granular monitoring based on data types.

Highlights

  • Object Data Type Enum: Introduced ObjectDataType enum in types.h to classify stored objects (e.g., KVCache, Tensor, Weight), providing a structured way to categorize data within the store.
  • ReplicateConfig Integration: Added a data_type field to ReplicateConfig, allowing clients to tag objects with their specific data type during the PutStart operation.
  • Metadata Propagation: Ensured that the ObjectMetadata structure now stores the data_type propagated from PutStart configurations, making this information persistent with the object's metadata.
  • Snapshot Backward Compatibility: Implemented a new snapshot serialization format that includes the data_type, with auto-detection logic to ensure seamless rolling upgrades and backward compatibility with older snapshot formats.
  • Python Bindings: Exposed the new ObjectDataType enum and the ReplicateConfig.data_type field through pybind11, enabling Python clients to utilize and specify object data types.
  • Comprehensive Unit Tests: Added new unit tests covering ObjectDataType enum values, stream operators, default configurations, PutStart propagation, and uint8 roundtrip conversions to ensure correctness and robustness.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ObjectDataType enum to classify objects in the Mooncake Store, which is a well-designed and forward-looking change. The implementation correctly integrates the new data type into ReplicateConfig, ObjectMetadata, and the serialization/deserialization logic. The backward compatibility handling for snapshot deserialization is robust, ensuring smooth rolling upgrades. The Python bindings are appropriately updated, and the comprehensive unit tests cover enum values, stream operators, default configurations, propagation, and roundtrip serialization. This is a solid foundation for future type-aware policies.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 23, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 89.39394% with 14 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
mooncake-integration/store/store_py.cpp 0.00% 13 Missing ⚠️
mooncake-store/include/master_service.h 50.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

PYBIND11_MODULE(store, m) {
// Object data type classification
py::enum_<ObjectDataType>(m, "ObjectDataType")
.value("UNKNOWN", ObjectDataType::UNKNOWN)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a general type?

Comment on lines 3607 to 3618
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need clearer logic.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a first-class ObjectDataType classification to Mooncake Store metadata so objects can be tagged (e.g., KV cache, weights, tensors) without changing default behavior (still UNKNOWN unless explicitly set).

Changes:

  • Introduces ObjectDataType enum + stream operator, and threads data_type through ReplicateConfigMasterService::ObjectMetadata.
  • Extends snapshot msgpack serialization to include data_type while keeping new-code support for deserializing older snapshots.
  • Exposes ObjectDataType and ReplicateConfig.data_type via pybind11 and adds unit tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
mooncake-store/include/types.h Adds ObjectDataType enum and operator<<
mooncake-store/include/replica.h Adds ReplicateConfig::data_type and prints it in operator<<
mooncake-store/include/master_service.h Stores data_type in ObjectMetadata and updates constructors/helpers
mooncake-store/src/master_service.cpp Propagates data_type on PutStart and serializes/deserializes it in snapshots
mooncake-integration/store/store_py.cpp Exposes enum + config field to Python via pybind11
mooncake-store/tests/object_data_type_test.cpp Adds unit tests for enum/config/PutStart behavior
mooncake-store/tests/CMakeLists.txt Registers the new test target

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +146 to +162
inline std::ostream& operator<<(std::ostream& os,
const ObjectDataType& type) noexcept {
static const std::unordered_map<ObjectDataType, std::string_view>
type_strings{{ObjectDataType::UNKNOWN, "UNKNOWN"},
{ObjectDataType::KVCACHE, "KVCACHE"},
{ObjectDataType::TENSOR, "TENSOR"},
{ObjectDataType::WEIGHT, "WEIGHT"},
{ObjectDataType::SAMPLE, "SAMPLE"},
{ObjectDataType::ACTIVATION, "ACTIVATION"},
{ObjectDataType::GRADIENT, "GRADIENT"},
{ObjectDataType::OPTIMIZER_STATE, "OPTIMIZER_STATE"},
{ObjectDataType::METADATA, "METADATA"}};

auto it = type_strings.find(type);
os << (it != type_strings.end() ? it->second : "UNKNOWN");
return os;
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types.h now uses std::string_view in the ObjectDataType stream operator, but the header does not include <string_view> (and also relies on indirect includes for std::ostream). This makes the header non-self-contained and can cause build breaks for translation units that include types.h directly. Add the missing standard includes in types.h.

Copilot uses AI. Check for mistakes.
Comment on lines +93 to +115
// PutStart with data_type propagates to ObjectMetadata
TEST_F(ObjectDataTypeTest, PutStartWithDataType) {
std::unique_ptr<MasterService> service(new MasterService());
Segment segment = MakeSegment();
UUID client_id = generate_uuid();
auto mount_result = service->MountSegment(segment, client_id);
ASSERT_TRUE(mount_result.has_value());

UUID put_client = generate_uuid();

// Put with WEIGHT type
ReplicateConfig config;
config.replica_num = 1;
config.data_type = ObjectDataType::WEIGHT;

auto result = service->PutStart(put_client, "key_weight", 1024, config);
ASSERT_TRUE(result.has_value());
EXPECT_FALSE(result.value().empty());

auto end_result =
service->PutEnd(put_client, "key_weight", ReplicaType::MEMORY);
EXPECT_TRUE(end_result.has_value());
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PutStartWithDataType is described as verifying propagation to ObjectMetadata, but it never asserts that data_type was actually stored/serialized—only that PutStart/PutEnd succeed. Either add an assertion that reads back the stored metadata (e.g., via an existing test-only hook/snapshot roundtrip) or rename/re-scope the test so it matches what it verifies.

Copilot uses AI. Check for mistakes.
Comment on lines 3502 to +3551
@@ -3543,6 +3546,9 @@ MasterService::MetadataSerializer::SerializeMetadata(
// Serialize replicas count
packer.pack(static_cast<uint32_t>(metadata.CountReplicas()));

// Serialize data_type
packer.pack(static_cast<uint8_t>(metadata.data_type));

Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot serialization now always writes the new array shape (8 + replicas_count) by inserting data_type. While the new deserializer can read both formats, this is only backward-compatible in the "new code reads old snapshots" direction. If any mixed-version deployment expects older binaries to load snapshots produced by newer binaries, they will likely fail on the extra field. Consider adding an explicit snapshot format/version marker or a compatibility mode to write the old format during rolling upgrades, and/or clarify the upgrade guarantees.

Copilot uses AI. Check for mistakes.
- Add GENERAL(9) to ObjectDataType for objects that don't fit specific
  categories but are known to be valid
- Restructure snapshot format detection with named constants
  (kOldFieldCount/kNewFieldCount) and explicit bool flags
- Add missing <string_view> include in types.h
- Update Python bindings and tests
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Thanks for the review @stmatengss! Pushed a new commit addressing both points:

  1. GENERAL type: Added GENERAL = 9 for objects that are known/valid but don't fit a specific category. Updated enum, stream operator, Python bindings, and tests.

  2. Clearer deserialization logic: Refactored with named constants (kOldFieldCount/kNewFieldCount) and explicit is_old_format/is_new_format bools. The validation rejects early on mismatch, then the data_type read is in a separate block.

@stmatengss
Copy link
Copy Markdown
Collaborator

@he-yufeng Last thing, merge main and resolve conflicts. Thanks!

@stmatengss stmatengss self-assigned this Mar 29, 2026
…pe-phase1

# Conflicts:
#	mooncake-store/include/master_service.h
#	mooncake-store/src/master_service.cpp
@stmatengss
Copy link
Copy Markdown
Collaborator

@he-yufeng Thank you for your contribution. It appears the CI is failing.

Resolve conflict in master_service.cpp: use AllocateAndInsertMetadata
refactored helper from main while preserving config.data_type threading.
@he-yufeng
Copy link
Copy Markdown
Contributor Author

@stmatengss Merged main and resolved the conflict in master_service.cpp. The conflict was between the inline allocation code (from this PR) and the refactored AllocateAndInsertMetadata helper (from #1662). Resolved by keeping the helper while threading config.data_type through it.

CI should re-run now. The previous stress test failure (Data Mismatch for stress_fixed_t6_32) looks like a flaky concurrent test unrelated to this PR's changes.

@stmatengss
Copy link
Copy Markdown
Collaborator

@stmatengss Merged main and resolved the conflict in master_service.cpp. The conflict was between the inline allocation code (from this PR) and the refactored AllocateAndInsertMetadata helper (from #1662). Resolved by keeping the helper while threading config.data_type through it.

CI should re-run now. The previous stress test failure (Data Mismatch for stress_fixed_t6_32) looks like a flaky concurrent test unrelated to this PR's changes.

Thanks. CI is triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants