@andjsmi This looks like a nice PR. Did you find someone to test it on SageMaker? We do not have access.
Hey @zhaochenyang20. Thanks! Yes, I've tested it on SageMaker myself and have included a screenshot below. The main requirements are responding empty 200 OK from the If there's a particular way you'd like me to test further, please let me know. |
@@ -0,0 +1,78 @@
ARG CUDA_VERSION=12.5.1

FROM nvcr.io/nvidia/tritonserver:24.04-py3-min
Can we use the lmsysorg/sglang:latest as the base image?
This change looks good overall. I can merge it first, and minor changes can be submitted in a follow-up. Thank you very much for AWS's support!

Motivation
SageMaker Endpoints use /ping for health checks and /invocations for invocation payloads. sglang currently doesn't support this invocation pattern, so the package can't be used on SageMaker Endpoints as-is.
Modifications
This pull request adds two endpoints, /ping and /invocations, in http_server.py. /ping provides the same functionality as /health. At present, /invocations acts the same as /v1/chat/completions; however, it may be worth expanding this to dispatch based on the request content.
I've included test cases as well and have been able to test on a SageMaker endpoint.
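As a rough illustration of the contract described above (not the code in this PR), here is a minimal standard-library sketch of a server that answers GET /ping with an empty 200 and forwards POST /invocations to a chat-completions handler, plus two small client helpers for smoke-testing it. The names `chat_completions`, `SageMakerStyleHandler`, `start_stub_server`, `ping`, and `invoke` are hypothetical, and the stub handler just echoes the messages back instead of calling a model.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any proxy settings so local requests go straight to the stub.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def chat_completions(payload: dict) -> dict:
    """Hypothetical stand-in for the /v1/chat/completions handler."""
    return {"object": "chat.completion", "echo": payload.get("messages", [])}


class SageMakerStyleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ping":
            # Health check: an empty 200 OK is enough.
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def do_POST(self):
        if self.path != "/invocations":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # For now, /invocations simply behaves like chat completions.
        body = json.dumps(chat_completions(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # Keep the example quiet.


def start_stub_server() -> HTTPServer:
    """Start the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), SageMakerStyleHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ping(base_url: str) -> int:
    """Return the HTTP status code of GET /ping."""
    with _OPENER.open(f"{base_url}/ping") as resp:
        return resp.status


def invoke(base_url: str, payload: dict) -> dict:
    """POST a JSON payload to /invocations and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/invocations",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _OPENER.open(req) as resp:
        return json.loads(resp.read())
```

The `ping` and `invoke` helpers could also be pointed at a locally running container to check the two routes before deploying; the base URL and payload shape in that case are up to whoever runs the test.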
Checklist