feat: add MiniMax as LLM provider for text generation #90
octo-patch wants to merge 1 commit into rapidaai:main
Conversation
Add MiniMax (MiniMax-M2.7, MiniMax-M2.7-highspeed) as a first-class LLM provider in the integration-api, extending the existing TTS-only support to include chat completion with both streaming and non-streaming modes.

Backend changes:
- New minimax caller package with LargeLanguageCaller and Verifier
- OpenAI-compatible HTTP API with SSE streaming support
- Temperature clamping to MiniMax's (0, 1] range (sketched below)
- Think-tag stripping for reasoning model output (sketched below)
- Tool calling / function calling support

Frontend changes:
- text-models.json with M2.7 and M2.7-highspeed model configs
- Added "text" to featureList in provider registry (dev + prod)

Tests:
- 10 unit tests (message building, think-tag stripping, temp clamping, usage metrics, endpoint resolution, error handling)
- 3 integration tests (chat completion, streaming, credential verify)
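A minimal sketch of the two normalization steps listed above (temperature clamping and think-tag stripping), using illustrative helper names and signatures rather than the PR's actual code:

```go
package minimax

import (
	"regexp"
	"strings"
)

// thinkTagRe matches <think>...</think> blocks emitted by reasoning models.
// Hypothetical pattern; the PR's actual implementation may differ.
var thinkTagRe = regexp.MustCompile(`(?s)<think>.*?</think>`)

// clampTemperature maps an arbitrary temperature into MiniMax's (0, 1] range.
func clampTemperature(t float64) float64 {
	if t <= 0 {
		return 0.01 // MiniMax rejects 0, so use a small positive floor
	}
	if t > 1 {
		return 1
	}
	return t
}

// stripThinkingTags removes <think> blocks and trims leftover whitespace.
func stripThinkingTags(s string) string {
	return strings.TrimSpace(thinkTagRe.ReplaceAllString(s, ""))
}
```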
Pull request overview
Adds MiniMax as a first-class text LLM provider in the integration-api service (chat completions + SSE streaming), and wires it into the UI provider catalogs/registries and docs so it can be selected/configured like other providers.
Changes:
- Implement MiniMax caller support in `integration-api` (non-streaming + streaming chat completion, usage metrics, credential verification) and register it in the provider factory.
- Add MiniMax text model catalog + enable `"text"` in provider registries (dev/prod).
- Update docs/test config examples and add unit + integration tests for the new provider.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| ui/src/providers/provider.production.json | Enables MiniMax "text" feature in production provider registry and updates description. |
| ui/src/providers/provider.development.json | Enables MiniMax "text" feature in development provider registry and updates description. |
| ui/src/providers/minimax/text-models.json | Adds MiniMax text model catalog + parameter schema for UI configuration. |
| README.md | Mentions MiniMax in LLM-agnostic/provider lists. |
| api/integration-api/internal/caller/testdata/integration_config.yaml.example | Adds MiniMax sections for chat + credential verification integration test config. |
| api/integration-api/internal/caller/minimax/verify-credential.go | Implements MiniMax credential verification via a minimal chat completion request. |
| api/integration-api/internal/caller/minimax/minimax.go | Adds MiniMax HTTP client wrapper, endpoint resolution, error/usage structures. |
| api/integration-api/internal/caller/minimax/llm.go | Implements MiniMax LargeLanguageCaller for chat completion + SSE streaming (tools supported). |
| api/integration-api/internal/caller/minimax/llm_test.go | Unit tests for request building, think-tag stripping, temperature clamping, metrics, etc. |
| api/integration-api/internal/caller/minimax/integration_test.go | Integration tests for chat completion, streaming, and credential verification. |
| api/integration-api/internal/caller/caller.go | Registers MINIMAX provider in the caller + verifier factories. |
| "github.com/rapidaai/pkg/commons" | ||
| type_enums "github.com/rapidaai/pkg/types/enums" | ||
| "github.com/rapidaai/protos" | ||
| integration_api "github.com/rapidaai/protos" |
minimax.go imports github.com/rapidaai/protos twice (once as protos and once as integration_api). Importing the same path under two names is redundant and confusing, linters such as staticcheck flag it, and the file fails to compile as soon as either alias goes unused. Consolidate to a single import and use one package name consistently (e.g., use *protos.Credential everywhere).
```go
	integration_api "github.com/rapidaai/protos"
```
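After consolidating, the import block might look like the sketch below (only the relevant imports shown; all call sites would then use protos.* such as *protos.Credential):

```go
import (
	"github.com/rapidaai/pkg/commons"
	type_enums "github.com/rapidaai/pkg/types/enums"
	"github.com/rapidaai/protos"
)
```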
```go
	client := &http.Client{
		Timeout: TIMEOUT,
	}
	mm.logger.Debugf("making request to minimax with %+v", req)
```
Do() logs the full http.Request with %+v, which includes headers. Since Call() sets the Authorization: Bearer ... header, this can leak MiniMax API keys into logs. Log only sanitized request metadata (method/URL) and avoid emitting auth headers.
```diff
-	mm.logger.Debugf("making request to minimax with %+v", req)
+	mm.logger.Debugf("making request to minimax %s %s", req.Method, req.URL.String())
```
```go
	bodyBytes, _ := io.ReadAll(resp.Body)
	var apiErr MiniMaxError
	if jsonErr := json.Unmarshal(bodyBytes, &apiErr); jsonErr == nil && apiErr.Error != nil {
		apiErr.StatusCode = resp.StatusCode
		return nil, fmt.Errorf("%s", apiErr.ErrorString())
	}
	return nil, fmt.Errorf("minimax api error: status=%d body=%s", resp.StatusCode, string(bodyBytes))
}
```
In CallJSON(), the non-2xx path ignores the io.ReadAll error (bodyBytes, _ := ...). If the body read fails, the error information is lost and the returned message may be misleading. Handle the read error and return it (or include it) instead of discarding it.
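A drop-in sketch for the block above that propagates the read error (identifiers as in the snippet; the error wording is illustrative):

```go
	bodyBytes, readErr := io.ReadAll(resp.Body)
	if readErr != nil {
		return nil, fmt.Errorf("minimax api error: status=%d (failed to read response body: %w)", resp.StatusCode, readErr)
	}
	var apiErr MiniMaxError
	if jsonErr := json.Unmarshal(bodyBytes, &apiErr); jsonErr == nil && apiErr.Error != nil {
		apiErr.StatusCode = resp.StatusCode
		return nil, fmt.Errorf("%s", apiErr.ErrorString())
	}
	return nil, fmt.Errorf("minimax api error: status=%d body=%s", resp.StatusCode, string(bodyBytes))
```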
```go
	for _, choice := range resp.Choices {
		switch choice.FinishReason {
		case "stop":
			content := stripThinkingTags(choice.Message.Content)
			assistantMsg.Contents = append(assistantMsg.Contents, content)
		case "tool_calls":
			for _, tc := range choice.Message.ToolCalls {
				if tc.Type == "function" {
					assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, &protos.ToolCall{
						Id:   tc.ID,
						Type: tc.Type,
						Function: &protos.FunctionCall{
							Name:      tc.Function.Name,
							Arguments: tc.Function.Arguments,
						},
					})
				}
			}
		}
	}
```
GetChatCompletion() only appends assistant content when finish_reason == "stop". If MiniMax returns other valid finish reasons like "length" (OpenAI-compatible), the response content would be silently dropped and the caller may get an empty assistant message. Treat non-tool-call finish reasons as content-bearing when choice.Message.Content is present (or handle length explicitly).
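One possible restructuring, sketched against the snippet above: branch on tool calls first and treat every other finish reason ("stop", "length", ...) as content-bearing. This is one option, not the only fix.

```go
	for _, choice := range resp.Choices {
		if choice.FinishReason == "tool_calls" {
			for _, tc := range choice.Message.ToolCalls {
				if tc.Type == "function" {
					assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, &protos.ToolCall{
						Id:   tc.ID,
						Type: tc.Type,
						Function: &protos.FunctionCall{
							Name:      tc.Function.Name,
							Arguments: tc.Function.Arguments,
						},
					})
				}
			}
			continue
		}
		// "stop", "length", and any other finish reason: keep whatever content was returned.
		if content := stripThinkingTags(choice.Message.Content); content != "" {
			assistantMsg.Contents = append(assistantMsg.Contents, content)
		}
	}
```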
```go
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
```
Streaming SSE is parsed with bufio.Scanner using the default token size limit (~64K). Tool-call arguments or long chunks can exceed this and cause scanner.Err() == bufio.ErrTooLong, breaking streaming. Set a larger buffer via scanner.Buffer(...) (and/or switch to bufio.Reader with ReadString('\n')).
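For example (the 1 MiB cap below is an arbitrary illustrative value):

```go
	scanner := bufio.NewScanner(resp.Body)
	// Allow SSE lines larger than bufio.Scanner's default 64 KiB token limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		// ...
	}
```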
```go
	// Track tool call accumulation by index
	toolCallMap := map[int]*protos.ToolCall{}

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		data := strings.TrimPrefix(line, "data: ")
		if data == "[DONE]" {
			break
		}

		var chunk MiniMaxStreamChunk
		if err := json.Unmarshal([]byte(data), &chunk); err != nil {
			llc.logger.Warnf("failed to parse minimax stream chunk: %v", err)
			continue
		}

		// Capture final usage from the last chunk
		if chunk.Usage != nil {
			metrics.OnAddMetrics(llc.UsageMetrics(chunk.Usage)...)
		}

		for i, choice := range chunk.Choices {
			// Accumulate tool calls
			for _, tc := range choice.Delta.ToolCalls {
				hasToolCalls = true
				existing, ok := toolCallMap[tc.Index]
				if !ok {
					existing = &protos.ToolCall{
						Id:   tc.ID,
						Type: tc.Type,
						Function: &protos.FunctionCall{
							Name:      tc.Function.Name,
							Arguments: tc.Function.Arguments,
						},
					}
					toolCallMap[tc.Index] = existing
				} else {
					if tc.ID != "" {
						existing.Id = tc.ID
					}
					if tc.Function.Name != "" {
						existing.Function.Name += tc.Function.Name
					}
					existing.Function.Arguments += tc.Function.Arguments
				}
			}

			content := choice.Delta.Content
			if content != "" {
				if len(contentBuffer) <= i {
					contentBuffer = append(contentBuffer, content)
				} else {
					contentBuffer[i] += content
				}

				if !hasToolCalls {
					if firstTokenTime == nil {
						now := time.Now()
						firstTokenTime = &now
					}
					tokenMsg := &protos.Message{
						Role: "assistant",
						Message: &protos.Message_Assistant{
							Assistant: &protos.AssistantMessage{
								Contents: []string{content},
							},
						},
					}
					if err := onStream(options.Request.GetRequestId(), tokenMsg); err != nil {
						llc.logger.Warnf("error streaming token: %v", err)
					}
				}
			}
		}
	}

	if err := scanner.Err(); err != nil {
		llc.logger.Errorf("error reading minimax stream: %v", err)
		onError(options.Request.GetRequestId(), err)
		options.PostHook(map[string]interface{}{
			"error": err,
		}, metrics.OnFailure().Build())
		return err
	}

	// Strip thinking tags from accumulated content
	for i, c := range contentBuffer {
		contentBuffer[i] = stripThinkingTags(c)
	}
	assistantMsg.Contents = contentBuffer

	// Collect accumulated tool calls
	for _, tc := range toolCallMap {
		assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, tc)
	}
```
Accumulated tool calls are stored in a map[int]*protos.ToolCall and then appended by ranging over the map. Map iteration order is non-deterministic, so tool call ordering can change between runs even when the provider sends stable indices. Collect indices, sort them, and append tool calls in index order.
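A deterministic variant of the final collection loop, sketched assuming a sort import is added:

```go
	// Collect accumulated tool calls in index order so output is stable across runs.
	indices := make([]int, 0, len(toolCallMap))
	for idx := range toolCallMap {
		indices = append(indices, idx)
	}
	sort.Ints(indices)
	for _, idx := range indices {
		assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, toolCallMap[idx])
	}
```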
```go
	_, err := vc.CallJSON(ctx, "chat/completions", "POST", map[string]string{}, payload)
	if err != nil {
		vc.logger.Debugf("minimax credential verification with error %v", err)
		// Check if the error indicates auth failure specifically
		if resp, callErr := vc.Call(ctx, "chat/completions", "POST", map[string]string{}, payload); callErr == nil {
			defer resp.Body.Close()
			if resp.StatusCode != http.StatusUnauthorized && resp.StatusCode != http.StatusForbidden {
				return utils.Ptr("valid"), nil
			}
		}
		return nil, err
```
CredentialVerifier() makes a second HTTP request when the first request fails (CallJSON then Call). This doubles latency and load on MiniMax in error cases. Consider using a single request that returns both status code and body (e.g., call Call(...) once, inspect StatusCode, and optionally read/parse the body) to decide whether the credential is valid.
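A single-request sketch using only Call() (status handling mirrors the snippet above; the invalid-credential error message is illustrative):

```go
	resp, err := vc.Call(ctx, "chat/completions", "POST", map[string]string{}, payload)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("minimax credential invalid: status=%d body=%s", resp.StatusCode, string(body))
	}
	// Any other status (2xx, rate limit, model error) shows the key itself was accepted.
	return utils.Ptr("valid"), nil
```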
Summary
Add MiniMax as a first-class LLM provider in the integration-api service, extending the existing TTS-only integration to include chat completion with both streaming (SSE) and non-streaming modes.
MiniMax Models
Changes
Backend (`api/integration-api/internal/caller/minimax/`):
- `minimax.go` — Base HTTP client with credential resolver and usage metrics
- `llm.go` — `LargeLanguageCaller` implementation with:
  - `GetChatCompletion` via OpenAI-compatible `/v1/chat/completions`
  - `StreamChatCompletion` with SSE parsing and first-token-time tracking
  - `<think>` tag stripping for reasoning model output
- `verify-credential.go` — Credential verification via minimal chat completion
- `MINIMAX` constant and factory cases in `caller.go`

Frontend (`ui/src/providers/minimax/`):
- `text-models.json` — Model catalog with per-model parameter configs (temperature, top_p, max_tokens, tool_choice, stop sequences)
- Added `"text"` to `featureList` in both dev and prod provider registries

Tests:
- Unit tests plus integration tests gated behind the `//go:build integration` tag

Docs:
- README.md updated to mention MiniMax in the provider lists
Test plan
- `go test ./api/integration-api/internal/caller/minimax/...`
- Integration tests with `MINIMAX_API_KEY` configured in `integration_config.yaml`