Chunking a quantized model leads to unequal chunks: for example, a ~153 MB model gets chunked into a ~153 MB piece and a ~2 KB piece.
```python
# helpers from python_coreml_stable_diffusion/chunk_mlprogram.py
from python_coreml_stable_diffusion.chunk_mlprogram import (
    _get_op_idx_split_location,
    _load_prog_from_mlmodel,
    _make_first_chunk_prog,
    _make_second_chunk_prog,
)

prog = _load_prog_from_mlmodel(model)
main_block = prog.functions["main"]

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(prog)

print(f"First chunk size = {first_chunk_weights_size:.2f} MB")  # 152.67 MB
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")  # 0.42 MB
print(f"index={op_idx}/{len(main_block.operations)}")  # index=587/2720

prog_chunk1 = _make_first_chunk_prog(prog, op_idx)
prog_chunk2 = _make_second_chunk_prog(_load_prog_from_mlmodel(model), op_idx)
```
How can I chunk a model whose weights live in constant nodes (as happens with quantization)? It looks like the bisection might have trouble accounting for quantized consts.
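If the bisection only counts plain `const` ops holding ndarrays, then the payload of a quantized model would be missed, since quantized weights, scales, and zero points are folded into `constexpr_*` ops (e.g. `constexpr_affine_dequantize`) rather than appearing as separate const ops in the block. Below is a rough sketch of a size accounting that would also count those payloads. The helper name `_op_weight_size_mb` is made up for illustration, and it assumes the MIL Python API where `op.inputs` maps input names to `Var`s and `Var.val` returns the folded numpy value:

```python
import numpy as np

def _op_weight_size_mb(op):
    """Hypothetical helper: weight payload of one MIL op, in MB."""
    size_bytes = 0
    if op.op_type == "const" and isinstance(op.val.val, np.ndarray):
        # Plain const op: this is all the current bisection appears to see
        size_bytes = op.val.val.size * op.val.val.itemsize
    elif op.op_type.startswith("constexpr_"):
        # Quantized data, scales, zero points, etc. are folded into
        # constexpr ops and never show up as standalone const ops
        for var in op.inputs.values():
            if isinstance(getattr(var, "val", None), np.ndarray):
                size_bytes += var.val.size * var.val.itemsize
    return size_bytes / (1024 * 1024)

# Re-derive the split point using the augmented accounting
sizes = [_op_weight_size_mb(op) for op in main_block.operations]
total = sum(sizes)
cumulative, op_idx = 0.0, 0
for i, s in enumerate(sizes):
    cumulative += s
    if cumulative >= total / 2:
        op_idx = i
        break
print(f"total = {total:.2f} MB, split at op {op_idx}/{len(sizes)}")
```

Note this only shows where the weight mass actually lies; the real split-location logic also has to pick an op index that is a valid cut point in the graph.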