Open
Description
Use case: Hybrid FHE LLM inference. Output ciphertexts from Server.run() dominate communication cost (~1.6 GB per inference for a 2-layer TinyLlama setup). Input ciphertext compression via compress_input_ciphertexts=True works great, but there is no equivalent for outputs.
Request: Add compress_output_ciphertexts: bool = False to concrete.fhe.Configuration, using the GLWE packing / CompressedCiphertextList mechanism already available in the TFHE-rs backend.
Context: TFHE-rs already supports post-computation compression via CompressedCiphertextList / CompressedCiphertextListBuilder. Exposing this through Concrete's Python API should substantially reduce server→client bandwidth for FHE-as-a-service workloads, where output ciphertexts are currently sent uncompressed.
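To give a rough sense of why packing helps, here is a back-of-the-envelope size estimate. It is only a sketch: the parameter values (LWE dimension 2048, GLWE dimension 1, polynomial size 2048, 64-bit coefficients) are illustrative assumptions, not the parameters of the TinyLlama circuit above, and the helper names are hypothetical. The model also ignores the modulus switching and serialization overhead that a real compressed list would involve, so the actual ratio will differ.

```python
def lwe_bytes(lwe_dimension: int, modulus_bits: int = 64) -> int:
    """Size of one LWE ciphertext: lwe_dimension mask elements plus one body."""
    return (lwe_dimension + 1) * modulus_bits // 8


def packed_glwe_bytes(
    count: int,
    glwe_dimension: int,
    polynomial_size: int,
    modulus_bits: int = 64,
) -> int:
    """Size of `count` values packed into GLWE ciphertexts.

    Simplified model: each GLWE ciphertext holds up to `polynomial_size`
    values (one per coefficient slot) and is stored in full.
    """
    glwe_count = -(-count // polynomial_size)  # ceiling division
    return glwe_count * (glwe_dimension + 1) * polynomial_size * modulus_bits // 8


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from the issue).
    outputs = 2048
    unpacked = outputs * lwe_bytes(lwe_dimension=2048)
    packed = packed_glwe_bytes(outputs, glwe_dimension=1, polynomial_size=2048)
    print(f"unpacked: {unpacked} bytes")
    print(f"packed:   {packed} bytes")
    print(f"ratio:    {unpacked / packed:.1f}x")
```

Under this model each output LWE ciphertext carries its own full mask, while packed outputs share the mask of a single GLWE ciphertext, which is where the saving comes from.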