Hi, I have successfully continued pre-training the model and then fine-tuned it on high-quality voices.
The next major challenge is using the model in production and serving multiple concurrent users. My use case is a telephony customer support agent. I have tried various methods but could not set up an inference pipeline that serves this purpose.
I tried vLLM and TensorRT, and I also tried quantizing the model, but everything seems to fail.