1 change: 1 addition & 0 deletions README.md
@@ -54,6 +54,7 @@
</p>

## 📢 News
- **[2025-12]** AgentScope now supports [TTS (Text-to-Speech)](https://doc.agentscope.io/tutorial/task_tts.html)! Check our [example]() and [tutorial](https://doc.agentscope.io/tutorial/task_tts.html) for more details.
- **[2025-11]** AgentScope supports [Anthropic Agent Skill](https://claude.com/blog/skills) now! Check our [example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) and [tutorial](https://doc.agentscope.io/tutorial/task_agent_skill.html) for more details.
- **[2025-11]** AgentScope open-sources [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) for diverse real-world tasks and [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent) for data processing.
- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) by integrating the [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) library.
1 change: 1 addition & 0 deletions README_zh.md
@@ -54,6 +54,7 @@
</p>

## 📢 News
- **[2025-12]** AgentScope now supports [TTS (Text-to-Speech) models](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html)! Check out the [example]() and [tutorial](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html) for more details.
- **[2025-11]** AgentScope now supports [Anthropic Agent Skill](https://claude.com/blog/skills)! Check out the [example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) and [tutorial](https://doc.agentscope.io/zh_CN/tutorial/task_agent_skill.html) for more details.
- **[2025-11]** AgentScope open-sources [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) for diverse real-world tasks and [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent) for natural-language-driven data processing.
- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) by integrating the [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) library.
31 changes: 24 additions & 7 deletions docs/tutorial/en/index.rst
@@ -33,26 +33,42 @@ Welcome to AgentScope's documentation!

.. toctree::
:maxdepth: 1
:caption: Task Guides
:caption: Model and Context

tutorial/task_model
tutorial/task_prompt
tutorial/task_tool
tutorial/task_token
tutorial/task_memory
tutorial/task_long_term_memory

.. toctree::
:maxdepth: 1
:caption: Tool

tutorial/task_tool
tutorial/task_mcp
tutorial/task_agent_skill

.. toctree::
:maxdepth: 1
:caption: Agent

tutorial/task_agent
tutorial/task_state
tutorial/task_hook

.. toctree::
:maxdepth: 1
:caption: Features

tutorial/task_pipeline
tutorial/task_plan
tutorial/task_rag
tutorial/task_state
tutorial/task_hook
tutorial/task_mcp
tutorial/task_agent_skill
tutorial/task_studio
tutorial/task_tracing
tutorial/task_eval
tutorial/task_embedding
tutorial/task_token
tutorial/task_tts

.. toctree::
:maxdepth: 1
@@ -76,3 +92,4 @@ Welcome to AgentScope's documentation!
api/agentscope.tracing
api/agentscope.session
api/agentscope.exception
api/agentscope.tts
243 changes: 243 additions & 0 deletions docs/tutorial/en/src/task_tts.py
@@ -0,0 +1,243 @@
# -*- coding: utf-8 -*-
"""
.. _tts:

TTS
====================

AgentScope provides a unified interface for Text-to-Speech (TTS) models across multiple API providers.
This tutorial demonstrates how to use TTS models in AgentScope.

AgentScope supports the following TTS APIs:

.. list-table:: Built-in TTS Models
:header-rows: 1

* - API
- Class
- Streaming Input
- Non-Streaming Input
- Streaming Output
- Non-Streaming Output
* - DashScope Realtime API
- ``DashScopeRealtimeTTSModel``
- ✅
- ✅
- ✅
- ✅
* - DashScope API
- ``DashScopeTTSModel``
- ❌
- ✅
- ✅
- ✅
* - OpenAI API
- ``OpenAITTSModel``
- ❌
- ✅
- ✅
- ✅
* - Gemini API
- ``GeminiTTSModel``
- ❌
- ✅
- ✅
- ✅

.. note:: Streaming input and output in AgentScope TTS models are cumulative: each new chunk contains all of the content received so far, not just the latest increment.

**Choosing the Right Model:**

- **Use Non-Realtime TTS** when you have complete text ready (e.g., pre-written
responses, complete LLM outputs)
- **Use Realtime TTS** when text is generated progressively (e.g., streaming
LLM responses) for lower latency

"""

import asyncio
import os

from agentscope.agent import ReActAgent, UserAgent
from agentscope.formatter import DashScopeChatFormatter
from agentscope.message import Msg
from agentscope.model import DashScopeChatModel
from agentscope.tts import (
DashScopeRealtimeTTSModel,
DashScopeTTSModel,
)

# %%
# Non-Realtime TTS
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Non-realtime TTS models process complete text inputs and are the simplest
# to use. You can directly call their ``synthesize()`` method.
#
# Taking DashScope TTS model as an example:


async def example_non_realtime_tts() -> None:
"""A basic example of using non-realtime TTS models."""
# Example with DashScope TTS
tts_model = DashScopeTTSModel(
api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
model_name="qwen3-tts-flash",
voice="Cherry",
stream=False, # Non-streaming output
)

msg = Msg(
name="assistant",
content="Hello, this is DashScope TTS.",
role="assistant",
)

# Directly synthesize without connecting
tts_response = await tts_model.synthesize(msg)

# tts_response.content contains an audio block with base64-encoded audio data
print(
"The length of audio data:",
len(tts_response.content[0]["source"]["data"]),
)


asyncio.run(example_non_realtime_tts())
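The audio block carries base64-encoded bytes, so a common next step is decoding it and writing it to disk for playback. A minimal sketch, using a hand-built block rather than a real API response (the field layout ``content[0]["source"]["data"]`` follows the tutorial above; ``media_type`` and the file name are illustrative assumptions — the actual container format depends on the provider's settings):

```python
import base64

# Hypothetical audio block shaped like a TTSResponse content entry;
# the payload here is four placeholder bytes, not real audio.
audio_block = {
    "type": "audio",
    "source": {
        "type": "base64",
        "media_type": "audio/mpeg",  # assumed; depends on provider settings
        "data": base64.b64encode(b"\x00\x01\x02\x03").decode(),
    },
}

# Decode the base64 payload back to raw bytes and write it to disk.
raw_audio = base64.b64decode(audio_block["source"]["data"])
with open("tts_output.bin", "wb") as f:
    f.write(raw_audio)

print(len(raw_audio))  # 4
```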

# %%
# **Streaming Output for Lower Latency:**
#
# When ``stream=True``, the model returns audio chunks progressively, allowing
# you to start playback before synthesis completes. This reduces perceived latency.
#


async def example_non_realtime_tts_streaming() -> None:
"""An example of using non-realtime TTS models with streaming output."""
# Example with DashScope TTS with streaming output
tts_model = DashScopeTTSModel(
api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
model_name="qwen3-tts-flash",
voice="Cherry",
stream=True, # Enable streaming output
)

msg = Msg(
name="assistant",
content="Hello, this is DashScope TTS with streaming output.",
role="assistant",
)

# Synthesize and receive an async generator for streaming output
async for tts_response in await tts_model.synthesize(msg):
# Process each audio chunk as it arrives
print(
"Received audio chunk of length:",
len(tts_response.content[0]["source"]["data"]),
)


asyncio.run(example_non_realtime_tts_streaming())
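The note near the top of this tutorial describes streaming output as cumulative. A toy sketch of what that means for playback, with a stand-in async generator instead of a real TTS model: since each chunk is a superset of the previous one, a player should extract only the new suffix before handing it off.

```python
import asyncio


async def cumulative_chunks():
    # Stand-in for a streaming TTS response: each yielded chunk
    # contains everything produced so far, not just the new part.
    full = b"ABCDEFGH"
    for end in (3, 6, 8):
        yield full[:end]


async def play_incrementally():
    played = b""
    async for chunk in cumulative_chunks():
        # Only the bytes beyond what we already played are new audio.
        new_audio = chunk[len(played):]
        played += new_audio  # hand new_audio to an audio player here
    return played


result = asyncio.run(play_incrementally())
print(result)  # b'ABCDEFGH'
```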


# %%
# Realtime TTS
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Realtime TTS models are designed for scenarios where text is generated
# incrementally, such as streaming LLM responses. This enables the lowest
# possible latency by starting audio synthesis before the complete text is ready.
#
# **Key Concepts:**
#
# - **Stateful Processing**: Realtime TTS maintains state for a single streaming
# session, identified by ``msg.id``. Only one streaming session can be active
# at a time.
# - **Two Methods**:
#
# - ``push(msg)``: Non-blocking method that submits text chunks and returns
# immediately. May return partial audio if available.
# - ``synthesize(msg)``: Blocking method that finalizes the session and returns
# all remaining audio. When ``stream=True``, it returns an async generator.
#
# .. code-block:: python
#
# async def example_realtime_tts_streaming():
# tts_model = DashScopeRealtimeTTSModel(
# api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
# model_name="qwen3-tts-flash-realtime",
# voice="Cherry",
# stream=False,
# )
#
# # realtime tts model received accumulative text chunks
# res = await tts_model.push(msg_chunk_1) # non-blocking
# res = await tts_model.push(msg_chunk_2) # non-blocking
# ...
# res = await tts_model.synthesize(final_msg) # blocking, get all remaining audio
#
# When setting ``stream=True`` during initialization, the ``synthesize()`` method returns an async generator of ``TTSResponse`` objects, allowing you to process audio chunks as they arrive.
#
#
# Integrating with ReActAgent
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# AgentScope agents can automatically synthesize their responses to speech
# when provided with a TTS model. This works seamlessly with both realtime
# and non-realtime TTS models.
#
# **How It Works:**
#
# 1. The agent generates a text response (potentially streamed from an LLM)
# 2. The TTS model synthesizes the text to audio automatically
# 3. The synthesized audio is attached to the ``speech`` field of the ``Msg`` object
# 4. The audio is played during the agent's ``self.print()`` method
#


async def example_agent_with_tts() -> None:
"""An example of using TTS with ReActAgent."""
# Create an agent with TTS enabled
agent = ReActAgent(
name="Assistant",
sys_prompt="You are a helpful assistant.",
model=DashScopeChatModel(
api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
model_name="qwen-max",
stream=True,
),
formatter=DashScopeChatFormatter(),
# Enable TTS
tts_model=DashScopeRealtimeTTSModel(
api_key=os.getenv("DASHSCOPE_API_KEY"),
model_name="qwen3-tts-flash-realtime",
voice="Cherry",
),
)
user = UserAgent("User")

# Build a conversation just like normal
msg = None
while True:
msg = await agent(msg)
msg = await user(msg)
if msg.get_text_content() == "exit":
break


# %%
# Customizing TTS Model
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# You can create custom TTS implementations by inheriting from ``TTSModelBase``.
# The base class provides a flexible interface for both realtime and non-realtime
# TTS models.
# The ``supports_streaming_input`` attribute indicates whether a model accepts
# streaming (realtime) input.
#
# For realtime TTS models, implement the ``connect``, ``close``, ``push``, and
# ``synthesize`` methods to handle the session lifecycle and streaming input.
#
# For non-realtime TTS models, only the ``synthesize`` method is required.
#
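A minimal sketch of a custom non-realtime model. To keep it self-contained, a stand-in base class is defined here; in practice you would subclass the real ``agentscope.tts.TTSModelBase``, and its constructor arguments and return type may differ from this sketch:

```python
import asyncio


class TTSModelBase:
    # Stand-in for agentscope.tts.TTSModelBase, included only so this
    # sketch runs on its own; use the real base class in practice.
    supports_streaming_input = False

    async def synthesize(self, msg):
        raise NotImplementedError


class EchoTTSModel(TTSModelBase):
    # A non-realtime model only needs `synthesize`. Here we fabricate
    # placeholder audio bytes from the text length instead of calling
    # a provider API.
    supports_streaming_input = False

    async def synthesize(self, msg):
        text = msg["content"]  # real messages are Msg objects, assumed dict here
        return {"audio_bytes": b"\x00" * len(text)}


response = asyncio.run(EchoTTSModel().synthesize({"content": "Hello"}))
print(len(response["audio_bytes"]))  # 5
```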
# Further Reading
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# - :ref:`agent` - Learn more about agents in AgentScope
# - :ref:`message` - Understand message format in AgentScope
# - API Reference: :class:`agentscope.tts.TTSModelBase`
#
32 changes: 25 additions & 7 deletions docs/tutorial/zh_CN/index.rst
@@ -31,28 +31,45 @@ Welcome to AgentScope's documentation!

tutorial/faq


.. toctree::
:maxdepth: 1
:caption: Task Guides
:caption: Model and Context

tutorial/task_model
tutorial/task_prompt
tutorial/task_tool
tutorial/task_token
tutorial/task_memory
tutorial/task_long_term_memory

.. toctree::
:maxdepth: 1
:caption: Tool

tutorial/task_tool
tutorial/task_mcp
tutorial/task_agent_skill

.. toctree::
:maxdepth: 1
:caption: Agent

tutorial/task_agent
tutorial/task_state
tutorial/task_hook

.. toctree::
:maxdepth: 1
:caption: Features

tutorial/task_pipeline
tutorial/task_plan
tutorial/task_rag
tutorial/task_state
tutorial/task_hook
tutorial/task_mcp
tutorial/task_agent_skill
tutorial/task_studio
tutorial/task_tracing
tutorial/task_eval
tutorial/task_embedding
tutorial/task_token
tutorial/task_tts

.. toctree::
:maxdepth: 1
@@ -76,3 +93,4 @@ Welcome to AgentScope's documentation!
api/agentscope.tracing
api/agentscope.session
api/agentscope.exception
api/agentscope.tts