I believe the dawn of universal agents for the physical world is already on the horizon.
I’m here to build tiny, working pieces of that future.
- Multimodal agents that can reason, call tools, and actually finish tasks
- Deep Learning + CV in the real world (detection / segmentation / tracking / action)
- Speech interaction loops (ASR / TTS / lip-sync) that feel low-latency and natural
- Edge deployment that survives messy constraints (Jetson / RK3588, ONNX, TensorRT)
LLM · VLM · Deep Learning · Computer Vision · ASR/TTS · MCP · ONNX/TensorRT
- mcp-minicpmv-video - video understanding MCP server.
- mcp-funasr-stdio - speech transcription MCP server.
- sherpa_onnx_ortv2 - Flutter + ONNX Runtime v2 compatibility work.
- Deep-Learning-Toolbox - DL experiments and utilities.
- SaySee - a multimodal speaking-coach app with low-latency voice interaction.
- Digital Human Productization - turning demos into deployable product + ops pipelines.
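The two MCP servers above (mcp-minicpmv-video and mcp-funasr-stdio) communicate over stdio, exchanging one JSON-RPC 2.0 message per line. A minimal, hypothetical sketch of that transport loop is below; the tool name `transcribe` and the registry are illustrative placeholders, not the actual server code, and the `tools/list` method name follows the MCP specification.

```python
import json
import sys

# Hypothetical tool registry -- a real server would wrap a model call here.
TOOLS = {
    "transcribe": "Transcribe an audio clip to text.",
}

def handle_request(req: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request and build the response envelope."""
    if req.get("method") == "tools/list":
        result = {
            "tools": [{"name": n, "description": d} for n, d in TOOLS.items()]
        }
        return {"jsonrpc": "2.0", "id": req.get("id"), "result": result}
    # Unknown method: standard JSON-RPC "method not found" error.
    return {
        "jsonrpc": "2.0",
        "id": req.get("id"),
        "error": {"code": -32601, "message": "Method not found"},
    }

def serve_stdio() -> None:
    # Stdio transport: read newline-delimited JSON-RPC from stdin,
    # write one JSON response per line to stdout.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        resp = handle_request(json.loads(line))
        sys.stdout.write(json.dumps(resp) + "\n")
        sys.stdout.flush()
```

Keeping the dispatch logic in `handle_request` separate from the `serve_stdio` I/O loop makes the server testable without a subprocess.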
North star: turning frontier models into dependable agents that can operate in the physical world.

