feat: add MiniMax as LLM provider for text generation #90
octo-patch wants to merge 1 commit into rapidaai:main
Conversation
Add MiniMax (MiniMax-M2.7, MiniMax-M2.7-highspeed) as a first-class LLM provider in the integration-api, extending the existing TTS-only support to include chat completion with both streaming and non-streaming modes.

Backend changes:
- New minimax caller package with LargeLanguageCaller and Verifier
- OpenAI-compatible HTTP API with SSE streaming support
- Temperature clamping to MiniMax's (0, 1] range (sketched below)
- Think-tag stripping for reasoning model output (sketched below)
- Tool calling / function calling support

Frontend changes:
- text-models.json with M2.7 and M2.7-highspeed model configs
- Added "text" to featureList in provider registry (dev + prod)

Tests:
- 10 unit tests (message building, think-tag stripping, temp clamping, usage metrics, endpoint resolution, error handling)
- 3 integration tests (chat completion, streaming, credential verify)
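A minimal sketch of the two normalization steps listed above (temperature clamping and think-tag stripping), using illustrative helper names and signatures rather than the PR's actual code:

```go
package minimax

import (
	"regexp"
	"strings"
)

// thinkTagRe matches <think>...</think> blocks emitted by reasoning models.
// Hypothetical pattern; the PR's actual implementation may differ.
var thinkTagRe = regexp.MustCompile(`(?s)<think>.*?</think>`)

// clampTemperature maps an arbitrary temperature into MiniMax's (0, 1] range.
func clampTemperature(t float64) float64 {
	if t <= 0 {
		return 0.01 // MiniMax rejects 0, so use a small positive floor
	}
	if t > 1 {
		return 1
	}
	return t
}

// stripThinkingTags removes <think> blocks and trims leftover whitespace.
func stripThinkingTags(s string) string {
	return strings.TrimSpace(thinkTagRe.ReplaceAllString(s, ""))
}
```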
Pull request overview
Adds MiniMax as a first-class text LLM provider in the integration-api service (chat completions + SSE streaming), and wires it into the UI provider catalogs/registries and docs so it can be selected/configured like other providers.
Changes:
- Implement MiniMax caller support in `integration-api` (non-streaming + streaming chat completion, usage metrics, credential verification) and register it in the provider factory.
- Add MiniMax text model catalog + enable `"text"` in provider registries (dev/prod).
- Update docs/test config examples and add unit + integration tests for the new provider.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| ui/src/providers/provider.production.json | Enables MiniMax "text" feature in production provider registry and updates description. |
| ui/src/providers/provider.development.json | Enables MiniMax "text" feature in development provider registry and updates description. |
| ui/src/providers/minimax/text-models.json | Adds MiniMax text model catalog + parameter schema for UI configuration. |
| README.md | Mentions MiniMax in LLM-agnostic/provider lists. |
| api/integration-api/internal/caller/testdata/integration_config.yaml.example | Adds MiniMax sections for chat + credential verification integration test config. |
| api/integration-api/internal/caller/minimax/verify-credential.go | Implements MiniMax credential verification via a minimal chat completion request. |
| api/integration-api/internal/caller/minimax/minimax.go | Adds MiniMax HTTP client wrapper, endpoint resolution, error/usage structures. |
| api/integration-api/internal/caller/minimax/llm.go | Implements MiniMax LargeLanguageCaller for chat completion + SSE streaming (tools supported). |
| api/integration-api/internal/caller/minimax/llm_test.go | Unit tests for request building, think-tag stripping, temperature clamping, metrics, etc. |
| api/integration-api/internal/caller/minimax/integration_test.go | Integration tests for chat completion, streaming, and credential verification. |
| api/integration-api/internal/caller/caller.go | Registers MINIMAX provider in the caller + verifier factories. |
| "github.com/rapidaai/pkg/commons" | ||
| type_enums "github.com/rapidaai/pkg/types/enums" | ||
| "github.com/rapidaai/protos" | ||
| integration_api "github.com/rapidaai/protos" |
minimax.go imports github.com/rapidaai/protos twice (once as protos and once as integration_api). Importing the same path under two names is redundant and confusing, linters such as staticcheck flag it, and the file fails to compile as soon as either alias goes unused. Consolidate to a single import and use one package name consistently (e.g., use *protos.Credential everywhere).
```go
	integration_api "github.com/rapidaai/protos"
```
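After consolidating, the import block might look like the sketch below (only the relevant imports shown; all call sites would then use protos.* such as *protos.Credential):

```go
import (
	"github.com/rapidaai/pkg/commons"
	type_enums "github.com/rapidaai/pkg/types/enums"
	"github.com/rapidaai/protos"
)
```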
```go
	client := &http.Client{
		Timeout: TIMEOUT,
	}
	mm.logger.Debugf("making request to minimax with %+v", req)
```
Do() logs the full http.Request with %+v, which includes headers. Since Call() sets the Authorization: Bearer ... header, this can leak MiniMax API keys into logs. Log only sanitized request metadata (method/URL) and avoid emitting auth headers.
```diff
-	mm.logger.Debugf("making request to minimax with %+v", req)
+	mm.logger.Debugf("making request to minimax %s %s", req.Method, req.URL.String())
```
```go
	bodyBytes, _ := io.ReadAll(resp.Body)
	var apiErr MiniMaxError
	if jsonErr := json.Unmarshal(bodyBytes, &apiErr); jsonErr == nil && apiErr.Error != nil {
		apiErr.StatusCode = resp.StatusCode
		return nil, fmt.Errorf("%s", apiErr.ErrorString())
	}
	return nil, fmt.Errorf("minimax api error: status=%d body=%s", resp.StatusCode, string(bodyBytes))
}
```
In CallJSON(), the non-2xx path ignores the io.ReadAll error (bodyBytes, _ := ...). If the body read fails, the error information is lost and the returned message may be misleading. Handle the read error and return it (or include it) instead of discarding it.
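A drop-in sketch for the block above that propagates the read error (identifiers as in the snippet; the error wording is illustrative):

```go
	bodyBytes, readErr := io.ReadAll(resp.Body)
	if readErr != nil {
		return nil, fmt.Errorf("minimax api error: status=%d (failed to read response body: %w)", resp.StatusCode, readErr)
	}
	var apiErr MiniMaxError
	if jsonErr := json.Unmarshal(bodyBytes, &apiErr); jsonErr == nil && apiErr.Error != nil {
		apiErr.StatusCode = resp.StatusCode
		return nil, fmt.Errorf("%s", apiErr.ErrorString())
	}
	return nil, fmt.Errorf("minimax api error: status=%d body=%s", resp.StatusCode, string(bodyBytes))
```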
```go
	for _, choice := range resp.Choices {
		switch choice.FinishReason {
		case "stop":
			content := stripThinkingTags(choice.Message.Content)
			assistantMsg.Contents = append(assistantMsg.Contents, content)
		case "tool_calls":
			for _, tc := range choice.Message.ToolCalls {
				if tc.Type == "function" {
					assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, &protos.ToolCall{
						Id:   tc.ID,
						Type: tc.Type,
						Function: &protos.FunctionCall{
							Name:      tc.Function.Name,
							Arguments: tc.Function.Arguments,
						},
					})
				}
			}
		}
	}
```
GetChatCompletion() only appends assistant content when finish_reason == "stop". If MiniMax returns other valid finish reasons like "length" (OpenAI-compatible), the response content would be silently dropped and the caller may get an empty assistant message. Treat non-tool-call finish reasons as content-bearing when choice.Message.Content is present (or handle length explicitly).
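One possible restructuring, sketched against the snippet above: branch on tool calls first and treat every other finish reason ("stop", "length", ...) as content-bearing. This is one option, not the only fix.

```go
	for _, choice := range resp.Choices {
		if choice.FinishReason == "tool_calls" {
			for _, tc := range choice.Message.ToolCalls {
				if tc.Type == "function" {
					assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, &protos.ToolCall{
						Id:   tc.ID,
						Type: tc.Type,
						Function: &protos.FunctionCall{
							Name:      tc.Function.Name,
							Arguments: tc.Function.Arguments,
						},
					})
				}
			}
			continue
		}
		// "stop", "length", and any other finish reason: keep whatever content was returned.
		if content := stripThinkingTags(choice.Message.Content); content != "" {
			assistantMsg.Contents = append(assistantMsg.Contents, content)
		}
	}
```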
```go
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
```
Streaming SSE is parsed with bufio.Scanner using the default token size limit (~64K). Tool-call arguments or long chunks can exceed this and cause scanner.Err() == bufio.ErrTooLong, breaking streaming. Set a larger buffer via scanner.Buffer(...) (and/or switch to bufio.Reader with ReadString('\n')).
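For example (the 1 MiB cap below is an arbitrary illustrative value):

```go
	scanner := bufio.NewScanner(resp.Body)
	// Allow SSE lines larger than bufio.Scanner's default 64 KiB token limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		// ...
	}
```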
```go
	// Track tool call accumulation by index
	toolCallMap := map[int]*protos.ToolCall{}

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		data := strings.TrimPrefix(line, "data: ")
		if data == "[DONE]" {
			break
		}

		var chunk MiniMaxStreamChunk
		if err := json.Unmarshal([]byte(data), &chunk); err != nil {
			llc.logger.Warnf("failed to parse minimax stream chunk: %v", err)
			continue
		}

		// Capture final usage from the last chunk
		if chunk.Usage != nil {
			metrics.OnAddMetrics(llc.UsageMetrics(chunk.Usage)...)
		}

		for i, choice := range chunk.Choices {
			// Accumulate tool calls
			for _, tc := range choice.Delta.ToolCalls {
				hasToolCalls = true
				existing, ok := toolCallMap[tc.Index]
				if !ok {
					existing = &protos.ToolCall{
						Id:   tc.ID,
						Type: tc.Type,
						Function: &protos.FunctionCall{
							Name:      tc.Function.Name,
							Arguments: tc.Function.Arguments,
						},
					}
					toolCallMap[tc.Index] = existing
				} else {
					if tc.ID != "" {
						existing.Id = tc.ID
					}
					if tc.Function.Name != "" {
						existing.Function.Name += tc.Function.Name
					}
					existing.Function.Arguments += tc.Function.Arguments
				}
			}

			content := choice.Delta.Content
			if content != "" {
				if len(contentBuffer) <= i {
					contentBuffer = append(contentBuffer, content)
				} else {
					contentBuffer[i] += content
				}

				if !hasToolCalls {
					if firstTokenTime == nil {
						now := time.Now()
						firstTokenTime = &now
					}
					tokenMsg := &protos.Message{
						Role: "assistant",
						Message: &protos.Message_Assistant{
							Assistant: &protos.AssistantMessage{
								Contents: []string{content},
							},
						},
					}
					if err := onStream(options.Request.GetRequestId(), tokenMsg); err != nil {
						llc.logger.Warnf("error streaming token: %v", err)
					}
				}
			}
		}
	}

	if err := scanner.Err(); err != nil {
		llc.logger.Errorf("error reading minimax stream: %v", err)
		onError(options.Request.GetRequestId(), err)
		options.PostHook(map[string]interface{}{
			"error": err,
		}, metrics.OnFailure().Build())
		return err
	}

	// Strip thinking tags from accumulated content
	for i, c := range contentBuffer {
		contentBuffer[i] = stripThinkingTags(c)
	}
	assistantMsg.Contents = contentBuffer

	// Collect accumulated tool calls
	for _, tc := range toolCallMap {
		assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, tc)
	}
```
Accumulated tool calls are stored in a map[int]*protos.ToolCall and then appended by ranging over the map. Map iteration order is non-deterministic, so tool call ordering can change between runs even when the provider sends stable indices. Collect indices, sort them, and append tool calls in index order.
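A deterministic variant of the final collection loop, sketched assuming a sort import is added:

```go
	// Collect accumulated tool calls in index order so output is stable across runs.
	indices := make([]int, 0, len(toolCallMap))
	for idx := range toolCallMap {
		indices = append(indices, idx)
	}
	sort.Ints(indices)
	for _, idx := range indices {
		assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, toolCallMap[idx])
	}
```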
```go
	_, err := vc.CallJSON(ctx, "chat/completions", "POST", map[string]string{}, payload)
	if err != nil {
		vc.logger.Debugf("minimax credential verification with error %v", err)
		// Check if the error indicates auth failure specifically
		if resp, callErr := vc.Call(ctx, "chat/completions", "POST", map[string]string{}, payload); callErr == nil {
			defer resp.Body.Close()
			if resp.StatusCode != http.StatusUnauthorized && resp.StatusCode != http.StatusForbidden {
				return utils.Ptr("valid"), nil
			}
		}
		return nil, err
```
CredentialVerifier() makes a second HTTP request when the first request fails (CallJSON then Call). This doubles latency and load on MiniMax in error cases. Consider using a single request that returns both status code and body (e.g., call Call(...) once, inspect StatusCode, and optionally read/parse the body) to decide whether the credential is valid.
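A single-request sketch using only Call() (status handling mirrors the snippet above; the invalid-credential error message is illustrative):

```go
	resp, err := vc.Call(ctx, "chat/completions", "POST", map[string]string{}, payload)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("minimax credential invalid: status=%d body=%s", resp.StatusCode, string(body))
	}
	// Any other status (2xx, rate limit, model error) shows the key itself was accepted.
	return utils.Ptr("valid"), nil
```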
Summary
Add MiniMax as a first-class LLM provider in the integration-api service, extending the existing TTS-only integration to include chat completion with both streaming (SSE) and non-streaming modes.
MiniMax Models
Changes
Backend (`api/integration-api/internal/caller/minimax/`):
- `minimax.go` — Base HTTP client with credential resolver and usage metrics
- `llm.go` — `LargeLanguageCaller` implementation with:
  - `GetChatCompletion` via OpenAI-compatible `/v1/chat/completions`
  - `StreamChatCompletion` with SSE parsing and first-token-time tracking
  - `<think>` tag stripping for reasoning model output
- `verify-credential.go` — Credential verification via minimal chat completion
- `MINIMAX` constant and factory cases in `caller.go`

Frontend (`ui/src/providers/minimax/`):
- `text-models.json` — Model catalog with per-model parameter configs (temperature, top_p, max_tokens, tool_choice, stop sequences)
- Added `"text"` to `featureList` in both dev and prod provider registries

Tests:
- Unit tests plus integration tests gated behind the `//go:build integration` tag

Docs:
- README.md updated to mention MiniMax in the provider lists
Test plan
- `go test ./api/integration-api/internal/caller/minimax/...`
- Integration tests with `MINIMAX_API_KEY` configured in `integration_config.yaml`