feat(tts): implement tts #965

Merged

Changes from all commits (26 commits)
- 336e5c9 wip: implement tts (qbc2016)
- 97bc2eb implement in print (qbc2016)
- e7186b8 remove tts in agentbase (qbc2016)
- 0842d9c modify (qbc2016)
- 4b5de17 update (qbc2016)
- 3897bbf add kwargs in call (qbc2016)
- 4840f79 remove format in DashScopeRealtimeTTSModel (qbc2016)
- 12f40c4 Remove old audio blocks before adding the final one (qbc2016)
- 39d0899 modify according to comments (qbc2016)
- f03d137 add literal voice (qbc2016)
- 81f7fea add readme (qbc2016)
- 77388b4 update (qbc2016)
- 4889a6c refactor(tts): refactor the tts model (DavdGao)
- 7153317 fix (DavdGao)
- b0cb4da add english tutorial for tts (qbc2016)
- 860d1fd update (qbc2016)
- 95f7e23 support stream (qbc2016)
- 2003755 add tts tests (qbc2016)
- 43784e2 docs(tts): add Chinese version and fix some typos (#5) (DavdGao)
- b525dcb add chinese tutorial (qbc2016)
- 20f17c4 bug fix (qbc2016)
- c2b64d2 close (qbc2016)
- f52321a Use `speech` argument in the print method for the "Separation of Conc… (DavdGao)
- bf8b5c1 Merge remote-tracking branch 'agentscope/main' into bc/tts (DavdGao)
- dd5225c finish (DavdGao)
- b8148b5 fix error in unittests (DavdGao)
# -*- coding: utf-8 -*-
"""
.. _tts:

TTS
====================

AgentScope provides a unified interface for Text-to-Speech (TTS) models across multiple API providers.
This tutorial demonstrates how to use TTS models in AgentScope.

AgentScope supports the following TTS APIs:

.. list-table:: Built-in TTS Models
    :header-rows: 1

    * - API
      - Class
      - Streaming Input
      - Non-Streaming Input
      - Streaming Output
      - Non-Streaming Output
    * - DashScope Realtime API
      - ``DashScopeRealtimeTTSModel``
      - ✅
      - ✅
      - ✅
      - ✅
    * - DashScope API
      - ``DashScopeTTSModel``
      - ❌
      - ✅
      - ✅
      - ✅
    * - OpenAI API
      - ``OpenAITTSModel``
      - ❌
      - ✅
      - ✅
      - ✅
    * - Gemini API
      - ``GeminiTTSModel``
      - ❌
      - ✅
      - ✅
      - ✅

.. note:: The streaming input and output of AgentScope TTS models are all accumulative, i.e., each chunk carries all the content produced so far rather than only the delta.

**Choosing the Right Model:**

- **Use non-realtime TTS** when you have complete text ready (e.g., pre-written responses or complete LLM outputs).
- **Use realtime TTS** when text is generated progressively (e.g., streaming LLM responses) for lower latency.

"""

import asyncio
import os

from agentscope.agent import ReActAgent, UserAgent
from agentscope.formatter import DashScopeChatFormatter
from agentscope.message import Msg
from agentscope.model import DashScopeChatModel
from agentscope.tts import (
    DashScopeRealtimeTTSModel,
    DashScopeTTSModel,
)

# %%
# Non-Realtime TTS
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Non-realtime TTS models process complete text inputs and are the simplest
# to use: you can directly call their ``synthesize()`` method.
#
# Taking the DashScope TTS model as an example:

async def example_non_realtime_tts() -> None:
    """A basic example of using non-realtime TTS models."""
    # Example with DashScope TTS
    tts_model = DashScopeTTSModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
        model_name="qwen3-tts-flash",
        voice="Cherry",
        stream=False,  # Non-streaming output
    )

    msg = Msg(
        name="assistant",
        content="Hello, this is DashScope TTS.",
        role="assistant",
    )

    # Directly synthesize without connecting
    tts_response = await tts_model.synthesize(msg)

    # tts_response.content contains an audio block with base64-encoded audio data
    print(
        "The length of audio data:",
        len(tts_response.content[0]["source"]["data"]),
    )


asyncio.run(example_non_realtime_tts())
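To play or save the result, the base64 payload must be decoded back to raw bytes. A minimal sketch, assuming the audio-block layout shown above (``content[0]["source"]["data"]`` holding base64 audio); ``save_audio_block`` and the file name are made-up placeholders, and the written bytes are whatever container/codec the provider returned:

```python
import base64


def save_audio_block(audio_block: dict, path: str) -> int:
    """Decode a base64 audio block and write the raw bytes to `path`.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(audio_block["source"]["data"])
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)


# A hypothetical audio block mimicking tts_response.content[0]
block = {"source": {"data": base64.b64encode(b"\x00\x01\x02").decode()}}
print(save_audio_block(block, "output.audio"))  # -> 3
```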

# %%
# **Streaming Output for Lower Latency:**
#
# When ``stream=True``, the model returns audio chunks progressively, allowing
# you to start playback before synthesis completes. This reduces perceived latency.


async def example_non_realtime_tts_streaming() -> None:
    """An example of using non-realtime TTS models with streaming output."""
    # Example with DashScope TTS with streaming output
    tts_model = DashScopeTTSModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
        model_name="qwen3-tts-flash",
        voice="Cherry",
        stream=True,  # Enable streaming output
    )

    msg = Msg(
        name="assistant",
        content="Hello, this is DashScope TTS with streaming output.",
        role="assistant",
    )

    # Synthesize and receive an async generator for streaming output
    async for tts_response in await tts_model.synthesize(msg):
        # Process each audio chunk as it arrives
        print(
            "Received audio chunk of length:",
            len(tts_response.content[0]["source"]["data"]),
        )


asyncio.run(example_non_realtime_tts_streaming())
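The consumption pattern can be sketched without any provider by faking the async generator. Everything here is a made-up stand-in (``fake_tts_stream`` is not an AgentScope API); it only demonstrates iterating accumulative chunks with ``async for`` and keeping the latest one:

```python
import asyncio
import base64


async def fake_tts_stream(text: str):
    """Stand-in for a streaming synthesize() call: yields accumulative
    base64 chunks (the 'audio' here is just the encoded text)."""
    encoded = base64.b64encode(text.encode()).decode()
    for end in range(8, len(encoded) + 8, 8):
        await asyncio.sleep(0)  # stand-in for network latency
        yield encoded[:end]


async def consume() -> str:
    last = ""
    async for chunk in fake_tts_stream("Hello, streaming TTS."):
        # Accumulative output: the latest chunk supersedes earlier ones
        last = chunk
    return base64.b64decode(last).decode()


print(asyncio.run(consume()))  # -> Hello, streaming TTS.
```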


# %%
# Realtime TTS
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Realtime TTS models are designed for scenarios where text is generated
# incrementally, such as streaming LLM responses. They achieve the lowest
# possible latency by starting audio synthesis before the complete text is ready.
#
# **Key Concepts:**
#
# - **Stateful Processing**: Realtime TTS maintains state for a single streaming
#   session, identified by ``msg.id``. Only one streaming session can be active
#   at a time.
# - **Two Methods**:
#
#   - ``push(msg)``: Non-blocking method that submits text chunks and returns
#     immediately. It may return partial audio if available.
#   - ``synthesize(msg)``: Blocking method that finalizes the session and returns
#     all remaining audio. When ``stream=True``, it returns an async generator.
#
# .. code-block:: python
#
#     async def example_realtime_tts_streaming():
#         tts_model = DashScopeRealtimeTTSModel(
#             api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
#             model_name="qwen3-tts-flash-realtime",
#             voice="Cherry",
#             stream=False,
#         )
#
#         # The realtime TTS model receives accumulative text chunks
#         res = await tts_model.push(msg_chunk_1)  # non-blocking
#         res = await tts_model.push(msg_chunk_2)  # non-blocking
#         ...
#         res = await tts_model.synthesize(final_msg)  # blocking, get all remaining audio
#
# When ``stream=True`` is set during initialization, the ``synthesize()`` method
# returns an async generator of ``TTSResponse`` objects, allowing you to process
# audio chunks as they arrive.
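The push/synthesize lifecycle above can be modeled as a small state machine. ``ToySessionTTS`` below is purely illustrative and is not the AgentScope ``TTSModelBase`` API; it only captures the single-active-session rule (keyed by a message id) and the accumulative-input convention:

```python
class ToySessionTTS:
    """Toy model of the push/synthesize lifecycle (illustrative only)."""

    def __init__(self) -> None:
        self._session_id: str | None = None
        self._text = ""

    def push(self, msg_id: str, accumulated_text: str) -> None:
        """Non-blocking submit of the text accumulated so far."""
        if self._session_id is None:
            self._session_id = msg_id  # open a new session
        elif self._session_id != msg_id:
            raise RuntimeError("only one streaming session may be active")
        self._text = accumulated_text  # accumulative: replace, don't append

    def synthesize(self, msg_id: str, final_text: str) -> str:
        """Finalize the session and return the 'audio' (here: the text)."""
        self.push(msg_id, final_text)
        audio, self._session_id, self._text = self._text, None, ""
        return audio


tts = ToySessionTTS()
tts.push("msg-1", "Hello")
tts.push("msg-1", "Hello, realtime")
print(tts.synthesize("msg-1", "Hello, realtime TTS."))  # -> Hello, realtime TTS.
```

Finalizing resets the state, so a new session (a new ``msg_id``) may start afterwards.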
#
#
# Integrating with ReActAgent
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# AgentScope agents can automatically synthesize their responses to speech
# when provided with a TTS model. This works seamlessly with both realtime
# and non-realtime TTS models.
#
# **How It Works:**
#
# 1. The agent generates a text response (potentially streamed from an LLM).
# 2. The TTS model synthesizes the text to audio automatically.
# 3. The synthesized audio is attached to the ``speech`` field of the ``Msg`` object.
# 4. The audio is played during the agent's ``self.print()`` method.


async def example_agent_with_tts() -> None:
    """An example of using TTS with ReActAgent."""
    # Create an agent with TTS enabled
    agent = ReActAgent(
        name="Assistant",
        sys_prompt="You are a helpful assistant.",
        model=DashScopeChatModel(
            api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
            model_name="qwen-max",
            stream=True,
        ),
        formatter=DashScopeChatFormatter(),
        # Enable TTS
        tts_model=DashScopeRealtimeTTSModel(
            api_key=os.getenv("DASHSCOPE_API_KEY"),
            model_name="qwen3-tts-flash-realtime",
            voice="Cherry",
        ),
    )
    user = UserAgent("User")

    # Build a conversation just like normal
    msg = None
    while True:
        msg = await agent(msg)
        msg = await user(msg)
        if msg.get_text_content() == "exit":
            break


# %%
# Customizing TTS Model
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# You can create custom TTS implementations by inheriting from ``TTSModelBase``,
# which provides a flexible interface for both realtime and non-realtime TTS
# models. The attribute ``supports_streaming_input`` indicates whether the TTS
# model is realtime or not.
#
# For realtime TTS models, you need to implement the ``connect``, ``close``,
# ``push``, and ``synthesize`` methods to handle the session lifecycle and
# streaming input.
#
# For non-realtime TTS models, you only need to implement the ``synthesize``
# method.
#
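A hedged sketch of the non-realtime case follows. Since the exact ``TTSModelBase`` signature is not reproduced here, ``StandInTTSModelBase`` is a self-contained stand-in that only mimics the shape described above (a ``supports_streaming_input`` attribute plus a ``synthesize`` method); the real base class in ``agentscope.tts`` may differ:

```python
import asyncio
import base64


class StandInTTSModelBase:
    """Stand-in for agentscope.tts.TTSModelBase (illustrative only)."""

    supports_streaming_input: bool = False

    async def synthesize(self, text: str) -> dict:
        raise NotImplementedError


class EchoTTSModel(StandInTTSModelBase):
    """A non-realtime 'TTS' model that returns the text bytes as fake audio."""

    supports_streaming_input = False  # non-realtime: no streaming input

    async def synthesize(self, text: str) -> dict:
        # A real implementation would call a TTS API here and return an
        # audio block; we just base64-encode the input text instead.
        data = base64.b64encode(text.encode()).decode()
        return {"type": "audio", "source": {"type": "base64", "data": data}}


block = asyncio.run(EchoTTSModel().synthesize("hello"))
print(block["source"]["data"])  # -> aGVsbG8= (base64 of b"hello")
```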
# Further Reading
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# - :ref:`agent` - Learn more about agents in AgentScope
# - :ref:`message` - Understand the message format in AgentScope
# - API Reference: :class:`agentscope.tts.TTSModelBase`