[Store] Add ObjectDataType enum for type-aware metadata#1719
[Store] Add ObjectDataType enum for type-aware metadata#1719he-yufeng wants to merge 9 commits intokvcache-ai:mainfrom
Conversation
…cache-ai#1673) Introduce a data type classification system for objects stored in Mooncake Store, as agreed in the RFC discussion on issue kvcache-ai#1673. Changes: - Add ObjectDataType enum (UNKNOWN, KVCACHE, TENSOR, WEIGHT, etc.) in mooncake-store/include/types.h - Extend ReplicateConfig with a data_type field (default UNKNOWN) - Propagate data_type through PutStart into ObjectMetadata - Serialize/deserialize data_type in snapshot metadata, with backward compatibility for old snapshots (gracefully handles missing field) - Expose ObjectDataType enum and data_type field in Python bindings - Add unit tests for enum values, defaults, and PutStart propagation Existing clients that don't set data_type will continue to work unchanged (defaults to UNKNOWN everywhere).
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a foundational Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the ObjectDataType enum to classify objects in the Mooncake Store, which is a well-designed and forward-looking change. The implementation correctly integrates the new data type into ReplicateConfig, ObjectMetadata, and the serialization/deserialization logic. The backward compatibility handling for snapshot deserialization is robust, ensuring smooth rolling upgrades. The Python bindings are appropriately updated, and the comprehensive unit tests cover enum values, stream operators, default configurations, propagation, and roundtrip serialization. This is a solid foundation for future type-aware policies.
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| PYBIND11_MODULE(store, m) { | ||
| // Object data type classification | ||
| py::enum_<ObjectDataType>(m, "ObjectDataType") | ||
| .value("UNKNOWN", ObjectDataType::UNKNOWN) |
There was a problem hiding this comment.
Maybe we should add a general type?
There was a problem hiding this comment.
Pull request overview
Adds a first-class ObjectDataType classification to Mooncake Store metadata so objects can be tagged (e.g., KV cache, weights, tensors) without changing default behavior (still UNKNOWN unless explicitly set).
Changes:
- Introduces
ObjectDataTypeenum + stream operator, and threadsdata_typethroughReplicateConfig→MasterService::ObjectMetadata. - Extends snapshot msgpack serialization to include
data_typewhile keeping new-code support for deserializing older snapshots. - Exposes
ObjectDataTypeandReplicateConfig.data_typevia pybind11 and adds unit tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| mooncake-store/include/types.h | Adds ObjectDataType enum and operator<< |
| mooncake-store/include/replica.h | Adds ReplicateConfig::data_type and prints it in operator<< |
| mooncake-store/include/master_service.h | Stores data_type in ObjectMetadata and updates constructors/helpers |
| mooncake-store/src/master_service.cpp | Propagates data_type on PutStart and serializes/deserializes it in snapshots |
| mooncake-integration/store/store_py.cpp | Exposes enum + config field to Python via pybind11 |
| mooncake-store/tests/object_data_type_test.cpp | Adds unit tests for enum/config/PutStart behavior |
| mooncake-store/tests/CMakeLists.txt | Registers the new test target |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| inline std::ostream& operator<<(std::ostream& os, | ||
| const ObjectDataType& type) noexcept { | ||
| static const std::unordered_map<ObjectDataType, std::string_view> | ||
| type_strings{{ObjectDataType::UNKNOWN, "UNKNOWN"}, | ||
| {ObjectDataType::KVCACHE, "KVCACHE"}, | ||
| {ObjectDataType::TENSOR, "TENSOR"}, | ||
| {ObjectDataType::WEIGHT, "WEIGHT"}, | ||
| {ObjectDataType::SAMPLE, "SAMPLE"}, | ||
| {ObjectDataType::ACTIVATION, "ACTIVATION"}, | ||
| {ObjectDataType::GRADIENT, "GRADIENT"}, | ||
| {ObjectDataType::OPTIMIZER_STATE, "OPTIMIZER_STATE"}, | ||
| {ObjectDataType::METADATA, "METADATA"}}; | ||
|
|
||
| auto it = type_strings.find(type); | ||
| os << (it != type_strings.end() ? it->second : "UNKNOWN"); | ||
| return os; | ||
| } |
There was a problem hiding this comment.
types.h now uses std::string_view in the ObjectDataType stream operator, but the header does not include <string_view> (and also relies on indirect includes for std::ostream). This makes the header non-self-contained and can cause build breaks for translation units that include types.h directly. Add the missing standard includes in types.h.
| // PutStart with data_type propagates to ObjectMetadata | ||
| TEST_F(ObjectDataTypeTest, PutStartWithDataType) { | ||
| std::unique_ptr<MasterService> service(new MasterService()); | ||
| Segment segment = MakeSegment(); | ||
| UUID client_id = generate_uuid(); | ||
| auto mount_result = service->MountSegment(segment, client_id); | ||
| ASSERT_TRUE(mount_result.has_value()); | ||
|
|
||
| UUID put_client = generate_uuid(); | ||
|
|
||
| // Put with WEIGHT type | ||
| ReplicateConfig config; | ||
| config.replica_num = 1; | ||
| config.data_type = ObjectDataType::WEIGHT; | ||
|
|
||
| auto result = service->PutStart(put_client, "key_weight", 1024, config); | ||
| ASSERT_TRUE(result.has_value()); | ||
| EXPECT_FALSE(result.value().empty()); | ||
|
|
||
| auto end_result = | ||
| service->PutEnd(put_client, "key_weight", ReplicaType::MEMORY); | ||
| EXPECT_TRUE(end_result.has_value()); | ||
| } |
There was a problem hiding this comment.
PutStartWithDataType is described as verifying propagation to ObjectMetadata, but it never asserts that data_type was actually stored/serialized—only that PutStart/PutEnd succeed. Either add an assertion that reads back the stored metadata (e.g., via an existing test-only hook/snapshot roundtrip) or rename/re-scope the test so it matches what it verifies.
| @@ -3543,6 +3546,9 @@ MasterService::MetadataSerializer::SerializeMetadata( | |||
| // Serialize replicas count | |||
| packer.pack(static_cast<uint32_t>(metadata.CountReplicas())); | |||
|
|
|||
| // Serialize data_type | |||
| packer.pack(static_cast<uint8_t>(metadata.data_type)); | |||
|
|
|||
There was a problem hiding this comment.
Snapshot serialization now always writes the new array shape (8 + replicas_count) by inserting data_type. While the new deserializer can read both formats, this is only backward-compatible in the "new code reads old snapshots" direction. If any mixed-version deployment expects older binaries to load snapshots produced by newer binaries, they will likely fail on the extra field. Consider adding an explicit snapshot format/version marker or a compatibility mode to write the old format during rolling upgrades, and/or clarify the upgrade guarantees.
- Add GENERAL(9) to ObjectDataType for objects that don't fit specific categories but are known to be valid - Restructure snapshot format detection with named constants (kOldFieldCount/kNewFieldCount) and explicit bool flags - Add missing <string_view> include in types.h - Update Python bindings and tests
|
Thanks for the review @stmatengss! Pushed a new commit addressing both points:
|
|
@he-yufeng Last thing, merge main and resolve conflicts. Thanks! |
…pe-phase1 # Conflicts: # mooncake-store/include/master_service.h # mooncake-store/src/master_service.cpp
|
@he-yufeng Thank you for your contribution. It appears the CI is failing. |
Resolve conflict in master_service.cpp: use AllocateAndInsertMetadata refactored helper from main while preserving config.data_type threading.
|
@stmatengss Merged main and resolved the conflict in CI should re-run now. The previous stress test failure ( |
Thanks. CI is triggered. |
Summary
Implements Phase 1 of the
ObjectDataTypesystem discussed in #1673.This adds a data type classification for objects in Mooncake Store, so the metadata layer knows what kind of data each object holds (KV cache, weights, tensors, etc.). This lays the groundwork for type-aware eviction policies, replication strategies, and monitoring — without changing any existing behavior.
What changed
ObjectDataTypeenum intypes.h—UNKNOWN(0),KVCACHE(1),TENSOR(2),WEIGHT(3),SAMPLE(4),ACTIVATION(5),GRADIENT(6),OPTIMIZER_STATE(7),METADATA(8)ReplicateConfig.data_typefield — clients can tag objects atPutStarttime. Defaults toUNKNOWN, so existing callers are unaffected.ObjectMetadata.data_type— propagated fromPutStartconfig and stored as a const member.data_typeas the 8th field. Deserialization auto-detects old (7 + replicas) vs new (8 + replicas) format, so rolling upgrades won't break.ObjectDataTypeenum andReplicateConfig.data_typeexposed via pybind11.PutStartpropagation, uint8 roundtrip.Backward compatibility
Fully backward compatible. Clients that don't set
data_typegetUNKNOWNeverywhere. Old snapshots deserialize correctly (missing field defaults toUNKNOWN).Next steps (Phase 2+)
data_typeinto eviction scoringCloses phase 1 of #1673.