Respond with JSON if the request is non-stream by aponcedeleonch · Pull Request #149 · stacklok/codegate

aponcedeleonch · 2024-12-02T11:00:39Z

We currently do not handle non-streaming requests (requests sent with "stream": false), for example:

% curl -SsX POST "http://localhost:8989/vllm/chat/completions" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $token" \
     -d '{
           "model": "Qwen/Qwen2.5-Coder-14B-Instruct",
           "messages": [{"role": "user", "content": "hello."}],
           "stream": false
        }'

This PR makes the server respond with the entire JSON response when the request is non-streaming.
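A minimal sketch of the idea, assuming an OpenAI-style backend that returns one complete response object when stream is false and an async iterator of chunks when stream is true. The names fake_completion, respond and demo are hypothetical and are not CodeGate's actual functions; the point is only that the stream flag decides whether the body is serialized once as JSON or emitted as SSE data lines.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Dict, Union

Response = Dict[str, Any]


async def fake_completion(request: Response) -> Union[Response, AsyncIterator[Response]]:
    """Hypothetical backend call: a full chat.completion object when stream is false,
    otherwise an async iterator of chat.completion.chunk objects."""
    if not request.get("stream", False):
        return {
            "object": "chat.completion",
            "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello!"}}],
        }

    async def chunks() -> AsyncIterator[Response]:
        for piece in ("Hel", "lo!"):
            yield {
                "object": "chat.completion.chunk",
                "choices": [{"index": 0, "delta": {"content": piece}}],
            }

    return chunks()


async def respond(request: Response) -> Union[str, AsyncIterator[str]]:
    """Return one JSON string for non-streaming requests, or an async iterator of SSE lines."""
    result = await fake_completion(request)
    if not request.get("stream", False):
        # Non-streaming: there is nothing to iterate over, so serialize the whole object once.
        return json.dumps(result)

    async def sse() -> AsyncIterator[str]:
        # Streaming: wrap each chunk as a Server-Sent Events "data:" line.
        async for chunk in result:
            yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return sse()


async def demo() -> None:
    body = await respond({"stream": False})
    print(json.loads(body)["choices"][0]["message"]["content"])  # Hello!
    stream = await respond({"stream": True})
    async for line in stream:
        print(line, end="")


if __name__ == "__main__":
    asyncio.run(demo())
```

Branching on the request flag before touching the result keeps the streaming path unchanged and avoids ever iterating over a complete response object.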

Response before this PR:

data: 'async for' requires an object with __aiter__ method, got ModelResponse

data: [DONE]
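The output above is consistent with the streaming path being handed a complete ModelResponse: in Python, async for raises TypeError on any object that lacks __aiter__, and the streaming wrapper appears to have caught that exception and forwarded its message as an SSE data line. The snippet below reproduces only the Python error; ModelResponse here is a hypothetical empty stand-in rather than the backend's real class, and stream_chunks, collect and reproduce are illustrative names.

```python
import asyncio
from typing import AsyncIterator, List


class ModelResponse:
    """Hypothetical stand-in for a complete, non-streaming response: a plain object with no __aiter__."""


async def stream_chunks(response) -> AsyncIterator[str]:
    # Streaming-only code path: assumes `response` is an async iterator of chunks.
    async for chunk in response:
        yield f"data: {chunk}\n\n"


async def collect(response) -> List[str]:
    return [line async for line in stream_chunks(response)]


def reproduce() -> str:
    """Run the streaming path on a non-streaming response and return the TypeError text."""
    try:
        asyncio.run(collect(ModelResponse()))
    except TypeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(reproduce())
```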

Response after this PR:

{
  "id": "chatcmpl-AZyYAAOFMsHdx2rabESYcHjsObIB2",
  "created": 1733137598,
  "model": "gpt-4o-mini-2024-07-18",
  "object": "chat.completion",
  "system_fingerprint": "fp_0705bf87c0",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! How can I assist you today?",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 9,
    "prompt_tokens": 9,
    "total_tokens": 18,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  },
  "service_tier": null
}
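On the client side, the two response shapes have to be read differently. A sketch, assuming OpenAI-compatible formats: a non-streaming body is one chat.completion object whose text lives in choices[0].message.content, as in the example above, while a streaming response is a sequence of "data:" lines carrying chat.completion.chunk objects with choices[0].delta.content, terminated by data: [DONE]. The helper names content_from_completion and content_from_sse are hypothetical.

```python
import json
from typing import Iterable


def content_from_completion(body: str) -> str:
    """Extract the assistant text from a non-streaming chat.completion JSON body."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


def content_from_sse(lines: Iterable[str]) -> str:
    """Reassemble the assistant text from streaming SSE lines.

    Assumes OpenAI-style chunks where each "data:" line holds a JSON object with
    choices[0].delta.content, and the stream ends with "data: [DONE]".
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and anything that is not an SSE data field
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")  # role-only deltas may carry no content
    return "".join(parts)


if __name__ == "__main__":
    # Trimmed copy of the non-streaming response shown above.
    body = json.dumps({
        "object": "chat.completion",
        "choices": [{
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Hello! How can I assist you today?", "role": "assistant"},
        }],
    })
    print(content_from_completion(body))  # Hello! How can I assist you today?

    # Hypothetical streamed equivalent of a short answer.
    stream = [
        'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"index": 0, "delta": {"content": "Hello!"}}]}',
        "",
        "data: [DONE]",
    ]
    print(content_from_sse(stream))  # Hello!
```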


aponcedeleonch requested a review from jhrozek December 2, 2024 11:00

aponcedeleonch changed the title ~~Respond with JSON if the request is non-async~~ Respond with JSON if the request is non-stream Dec 2, 2024

jhrozek approved these changes Dec 2, 2024


aponcedeleonch merged commit d4f1ab8 into main Dec 2, 2024


aponcedeleonch merged 1 commit into main from normalize-vllm-output
