
feat: add MiniMax as LLM provider for text generation #90

Open
octo-patch wants to merge 1 commit into rapidaai:main from octo-patch:feature/add-minimax-llm-provider

Conversation

@octo-patch

Summary

Add MiniMax as a first-class LLM provider in the integration-api service, extending the existing TTS-only integration to include chat completion with both streaming (SSE) and non-streaming modes.

MiniMax Models

  • MiniMax-M2.7 — Latest flagship model with 204K context window
  • MiniMax-M2.7-highspeed — Optimized for low-latency voice AI workloads

Changes

Backend (api/integration-api/internal/caller/minimax/):

  • minimax.go — Base HTTP client with credential resolver and usage metrics
  • llm.go — LargeLanguageCaller implementation with:
    • Non-streaming GetChatCompletion via OpenAI-compatible /v1/chat/completions
    • Streaming StreamChatCompletion with SSE parsing and first-token-time tracking
    • Temperature clamping to MiniMax's required (0, 1] range
    • <think> tag stripping for reasoning model output (both sketched after this list)
    • Full tool/function calling support
  • verify-credential.go — Credential verification via minimal chat completion
  • Registered MINIMAX constant and factory cases in caller.go
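
For illustration, a minimal sketch of the two normalization helpers flagged above. stripThinkingTags matches the name used in the PR's llm.go; clampTemperature and its 0.01 floor are assumptions for the sketch, not necessarily the PR's exact code.

package minimax

import (
    "regexp"
    "strings"
)

// Matches a <think>...</think> block emitted by reasoning models ((?s) lets . span newlines).
var thinkTagRe = regexp.MustCompile(`(?s)<think>.*?</think>`)

// stripThinkingTags removes reasoning output so only the final answer remains.
func stripThinkingTags(s string) string {
    return strings.TrimSpace(thinkTagRe.ReplaceAllString(s, ""))
}

// clampTemperature forces a value into MiniMax's required (0, 1] range:
// non-positive values are raised to a small epsilon, values above 1 are capped.
func clampTemperature(t float32) float32 {
    const epsilon = 0.01 // assumed floor; MiniMax rejects exactly 0
    if t <= 0 {
        return epsilon
    }
    if t > 1 {
        return 1
    }
    return t
}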

Frontend (ui/src/providers/minimax/):

  • text-models.json — Model catalog with per-model parameter configs (temperature, top_p, max_tokens, tool_choice, stop sequences)
  • Added "text" to featureList in both dev and prod provider registries

Tests:

  • 10 unit tests covering message building, think-tag stripping, temperature clamping, usage metrics, endpoint resolution, and error handling
  • 3 integration tests (chat completion, streaming, credential verification) guarded by the //go:build integration tag (example below)
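
The integration tests are opt-in via a standard Go build constraint; a file guarded this way is skipped by a plain go test run:

//go:build integration

package minimax

// Compiled and run only with:
//   go test -tags=integration ./api/integration-api/internal/caller/minimax/...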

Docs:

  • Updated integration test config example with MiniMax sections
  • Added MiniMax to README provider list

Test plan

  • Unit tests pass: go test ./api/integration-api/internal/caller/minimax/...
  • Integration tests pass with MINIMAX_API_KEY configured in integration_config.yaml
  • MiniMax appears in the text provider dropdown in the UI
  • Non-streaming chat completion returns valid responses
  • Streaming chat completion streams tokens and reports metrics
  • Credential verification succeeds with valid API key
  • Temperature values at boundaries (0, 1) are correctly clamped
  • Existing providers unaffected (no regressions)

Add MiniMax (MiniMax-M2.7, MiniMax-M2.7-highspeed) as a first-class LLM
provider in the integration-api, extending the existing TTS-only support
to include chat completion with both streaming and non-streaming modes.

Backend changes:
- New minimax caller package with LargeLanguageCaller and Verifier
- OpenAI-compatible HTTP API with SSE streaming support
- Temperature clamping to MiniMax's (0, 1] range
- Think-tag stripping for reasoning model output
- Tool calling / function calling support

Frontend changes:
- text-models.json with M2.7 and M2.7-highspeed model configs
- Added "text" to featureList in provider registry (dev + prod)

Tests:
- 10 unit tests (message building, think-tag stripping, temp clamping,
  usage metrics, endpoint resolution, error handling)
- 3 integration tests (chat completion, streaming, credential verify)

Copilot AI left a comment


Pull request overview

Adds MiniMax as a first-class text LLM provider in the integration-api service (chat completions + SSE streaming), and wires it into the UI provider catalogs/registries and docs so it can be selected/configured like other providers.

Changes:

  • Implement MiniMax caller support in integration-api (non-streaming + streaming chat completion, usage metrics, credential verification) and register it in the provider factory.
  • Add MiniMax text model catalog + enable "text" in provider registries (dev/prod).
  • Update docs/test config examples and add unit + integration tests for the new provider.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Per-file summary:

  • ui/src/providers/provider.production.json — Enables the MiniMax "text" feature in the production provider registry and updates its description.
  • ui/src/providers/provider.development.json — Enables the MiniMax "text" feature in the development provider registry and updates its description.
  • ui/src/providers/minimax/text-models.json — Adds the MiniMax text model catalog and parameter schema for UI configuration.
  • README.md — Mentions MiniMax in the LLM-agnostic/provider lists.
  • api/integration-api/internal/caller/testdata/integration_config.yaml.example — Adds MiniMax sections for chat and credential-verification integration test config.
  • api/integration-api/internal/caller/minimax/verify-credential.go — Implements MiniMax credential verification via a minimal chat completion request.
  • api/integration-api/internal/caller/minimax/minimax.go — Adds the MiniMax HTTP client wrapper, endpoint resolution, and error/usage structures.
  • api/integration-api/internal/caller/minimax/llm.go — Implements the MiniMax LargeLanguageCaller for chat completion and SSE streaming (tools supported).
  • api/integration-api/internal/caller/minimax/llm_test.go — Unit tests for request building, think-tag stripping, temperature clamping, metrics, etc.
  • api/integration-api/internal/caller/minimax/integration_test.go — Integration tests for chat completion, streaming, and credential verification.
  • api/integration-api/internal/caller/caller.go — Registers the MINIMAX provider in the caller and verifier factories.

"github.com/rapidaai/pkg/commons"
type_enums "github.com/rapidaai/pkg/types/enums"
"github.com/rapidaai/protos"
integration_api "github.com/rapidaai/protos"

Copilot AI Mar 31, 2026


minimax.go imports github.com/rapidaai/protos twice (once as protos and once as integration_api). Go disallows duplicate imports of the same path, so this file will not compile. Consolidate to a single import and use one package name consistently (e.g., use *protos.Credential everywhere).

Suggested change (remove the duplicate import):
- integration_api "github.com/rapidaai/protos"

client := &http.Client{
    Timeout: TIMEOUT,
}
mm.logger.Debugf("making request to minimax with %+v", req)

Copilot AI Mar 31, 2026


Do() logs the full http.Request with %+v, which includes headers. Since Call() sets the Authorization: Bearer ... header, this can leak MiniMax API keys into logs. Log only sanitized request metadata (method/URL) and avoid emitting auth headers.

Suggested change:
- mm.logger.Debugf("making request to minimax with %+v", req)
+ mm.logger.Debugf("making request to minimax %s %s", req.Method, req.URL.String())

Comment on lines +175 to +183

    bodyBytes, _ := io.ReadAll(resp.Body)
    var apiErr MiniMaxError
    if jsonErr := json.Unmarshal(bodyBytes, &apiErr); jsonErr == nil && apiErr.Error != nil {
        apiErr.StatusCode = resp.StatusCode
        return nil, fmt.Errorf("%s", apiErr.ErrorString())
    }
    return nil, fmt.Errorf("minimax api error: status=%d body=%s", resp.StatusCode, string(bodyBytes))
}

Copilot AI Mar 31, 2026


In CallJSON(), the non-2xx path ignores the io.ReadAll error (bodyBytes, _ := ...). If the body read fails, the error information is lost and the returned message may be misleading. Handle the read error and return it (or include it) instead of discarding it.
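
A minimal fix along these lines (a sketch; the exact error wording is the author's choice):

    bodyBytes, readErr := io.ReadAll(resp.Body)
    if readErr != nil {
        return nil, fmt.Errorf("minimax api error: status=%d (failed to read body: %w)", resp.StatusCode, readErr)
    }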

Comment on lines +168 to +187
for _, choice := range resp.Choices {
    switch choice.FinishReason {
    case "stop":
        content := stripThinkingTags(choice.Message.Content)
        assistantMsg.Contents = append(assistantMsg.Contents, content)
    case "tool_calls":
        for _, tc := range choice.Message.ToolCalls {
            if tc.Type == "function" {
                assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, &protos.ToolCall{
                    Id:   tc.ID,
                    Type: tc.Type,
                    Function: &protos.FunctionCall{
                        Name:      tc.Function.Name,
                        Arguments: tc.Function.Arguments,
                    },
                })
            }
        }
    }
}

Copilot AI Mar 31, 2026


GetChatCompletion() only appends assistant content when finish_reason == "stop". If MiniMax returns other valid finish reasons like "length" (OpenAI-compatible), the response content would be silently dropped and the caller may get an empty assistant message. Treat non-tool-call finish reasons as content-bearing when choice.Message.Content is present (or handle length explicitly).
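
One way to restructure this (a sketch using the names from the snippet above, not the PR's code):

for _, choice := range resp.Choices {
    if choice.FinishReason == "tool_calls" {
        // existing tool-call handling elided
        continue
    }
    // "stop", "length", and any other finish reason: keep returned content if present.
    if choice.Message.Content != "" {
        assistantMsg.Contents = append(assistantMsg.Contents, stripThinkingTags(choice.Message.Content))
    }
}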

Comment on lines +248 to +252
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
    line := scanner.Text()
    if !strings.HasPrefix(line, "data: ") {
        continue

Copilot AI Mar 31, 2026


Streaming SSE is parsed with bufio.Scanner using the default token size limit (~64K). Tool-call arguments or long chunks can exceed this and cause scanner.Err() == bufio.ErrTooLong, breaking streaming. Set a larger buffer via scanner.Buffer(...) (and/or switch to bufio.Reader with ReadString('\n')).
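
For example (the 1 MiB cap here is an arbitrary assumption):

scanner := bufio.NewScanner(resp.Body)
// Start from a 64 KiB buffer but allow lines up to 1 MiB before bufio.ErrTooLong.
scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)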

Comment on lines +245 to +343
// Track tool call accumulation by index
toolCallMap := map[int]*protos.ToolCall{}

scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
    line := scanner.Text()
    if !strings.HasPrefix(line, "data: ") {
        continue
    }
    data := strings.TrimPrefix(line, "data: ")
    if data == "[DONE]" {
        break
    }

    var chunk MiniMaxStreamChunk
    if err := json.Unmarshal([]byte(data), &chunk); err != nil {
        llc.logger.Warnf("failed to parse minimax stream chunk: %v", err)
        continue
    }

    // Capture final usage from the last chunk
    if chunk.Usage != nil {
        metrics.OnAddMetrics(llc.UsageMetrics(chunk.Usage)...)
    }

    for i, choice := range chunk.Choices {
        // Accumulate tool calls
        for _, tc := range choice.Delta.ToolCalls {
            hasToolCalls = true
            existing, ok := toolCallMap[tc.Index]
            if !ok {
                existing = &protos.ToolCall{
                    Id:   tc.ID,
                    Type: tc.Type,
                    Function: &protos.FunctionCall{
                        Name:      tc.Function.Name,
                        Arguments: tc.Function.Arguments,
                    },
                }
                toolCallMap[tc.Index] = existing
            } else {
                if tc.ID != "" {
                    existing.Id = tc.ID
                }
                if tc.Function.Name != "" {
                    existing.Function.Name += tc.Function.Name
                }
                existing.Function.Arguments += tc.Function.Arguments
            }
        }

        content := choice.Delta.Content
        if content != "" {
            if len(contentBuffer) <= i {
                contentBuffer = append(contentBuffer, content)
            } else {
                contentBuffer[i] += content
            }

            if !hasToolCalls {
                if firstTokenTime == nil {
                    now := time.Now()
                    firstTokenTime = &now
                }
                tokenMsg := &protos.Message{
                    Role: "assistant",
                    Message: &protos.Message_Assistant{
                        Assistant: &protos.AssistantMessage{
                            Contents: []string{content},
                        },
                    },
                }
                if err := onStream(options.Request.GetRequestId(), tokenMsg); err != nil {
                    llc.logger.Warnf("error streaming token: %v", err)
                }
            }
        }
    }
}

if err := scanner.Err(); err != nil {
    llc.logger.Errorf("error reading minimax stream: %v", err)
    onError(options.Request.GetRequestId(), err)
    options.PostHook(map[string]interface{}{
        "error": err,
    }, metrics.OnFailure().Build())
    return err
}

// Strip thinking tags from accumulated content
for i, c := range contentBuffer {
    contentBuffer[i] = stripThinkingTags(c)
}
assistantMsg.Contents = contentBuffer

// Collect accumulated tool calls
for _, tc := range toolCallMap {
    assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, tc)
}

Copilot AI Mar 31, 2026


Accumulated tool calls are stored in a map[int]*protos.ToolCall and then appended by ranging over the map. Map iteration order is non-deterministic, so tool call ordering can change between runs even when the provider sends stable indices. Collect indices, sort them, and append tool calls in index order.
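
A deterministic variant (a sketch using the names from the snippet above; assumes sort is imported):

// Append tool calls in ascending index order instead of map iteration order.
indices := make([]int, 0, len(toolCallMap))
for idx := range toolCallMap {
    indices = append(indices, idx)
}
sort.Ints(indices)
for _, idx := range indices {
    assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, toolCallMap[idx])
}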

Comment on lines +39 to +49
_, err := vc.CallJSON(ctx, "chat/completions", "POST", map[string]string{}, payload)
if err != nil {
    vc.logger.Debugf("minimax credential verification with error %v", err)
    // Check if the error indicates auth failure specifically
    if resp, callErr := vc.Call(ctx, "chat/completions", "POST", map[string]string{}, payload); callErr == nil {
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusUnauthorized && resp.StatusCode != http.StatusForbidden {
            return utils.Ptr("valid"), nil
        }
    }
    return nil, err

Copilot AI Mar 31, 2026


CredentialVerifier() makes a second HTTP request when the first request fails (CallJSON then Call). This doubles latency and load on MiniMax in error cases. Consider using a single request that returns both status code and body (e.g., call Call(...) once, inspect StatusCode, and optionally read/parse the body) to decide whether the credential is valid.
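
A single-request sketch under the same assumptions (vc.Call returning *http.Response as in the snippet above; assumes fmt and net/http are imported):

resp, err := vc.Call(ctx, "chat/completions", "POST", map[string]string{}, payload)
if err != nil {
    return nil, err
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
    return nil, fmt.Errorf("minimax credential rejected: status=%d", resp.StatusCode)
}
// Any other status (including model or parameter errors) means the key authenticated.
return utils.Ptr("valid"), nil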

@iamprashant iamprashant added the enhancement New feature or request label Mar 31, 2026
@iamprashant iamprashant assigned octo-patch and unassigned Copilot Apr 1, 2026

Labels

enhancement New feature or request


4 participants