A voice bot that joins Google Meet calls, listens to participants, generates responses via LLM (Gemini), and speaks back with a synthesized voice (ElevenLabs).
- The bot joins a Google Meet call via headless Chromium (Playwright)
- It captures participant audio through WebRTC peer connections
- A voice activity detector (VAD) flags the end of speech after 1.2 s of silence
- Audio is transcribed via ElevenLabs STT (scribe_v1)
- The transcript is sent to Gemini to generate a reply
- The reply is synthesized to audio via ElevenLabs TTS
- The audio is injected back into the meeting via WebRTC track replacement
```
Participant speaks → WebRTC capture → VAD → ElevenLabs STT
                   → Gemini LLM → ElevenLabs TTS → WebRTC playback
```
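The end-of-speech detection step above can be sketched as a small energy-based detector. This is an illustrative sketch only: the class name, frame format, and energy threshold are assumptions, not the actual implementation in `src/audio-pipeline.js` — only the 1.2 s silence threshold comes from the pipeline description.

```javascript
// Illustrative energy-based end-of-speech detector (not the real pipeline code).
// Feed it PCM frames; it reports true once per utterance, after 1.2 s of silence.
class EndOfSpeechDetector {
  constructor({ silenceMs = 1200, energyThreshold = 0.01 } = {}) {
    this.silenceMs = silenceMs;           // 1.2 s threshold from the pipeline
    this.energyThreshold = energyThreshold; // assumed RMS cutoff for "speech"
    this.silentSince = null;
    this.speaking = false;
  }

  // frame: Float32Array of PCM samples; timestampMs: capture time in ms.
  // Returns true exactly when an utterance has just ended.
  push(frame, timestampMs) {
    const rms = Math.sqrt(frame.reduce((s, x) => s + x * x, 0) / frame.length);
    if (rms >= this.energyThreshold) {
      this.speaking = true;     // voice detected: reset the silence timer
      this.silentSince = null;
      return false;
    }
    if (!this.speaking) return false;  // silence before any speech: ignore
    if (this.silentSince === null) this.silentSince = timestampMs;
    if (timestampMs - this.silentSince >= this.silenceMs) {
      this.speaking = false;    // 1.2 s of silence: utterance is over
      this.silentSince = null;
      return true;              // caller would now send the buffer to STT
    }
    return false;
  }
}
```

In the real bot, a `true` result would trigger the ElevenLabs STT call on the buffered audio.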
- Node.js >= 18
- A Google account (to bypass the waiting room on your own meetings)
- API keys: ElevenLabs + Google Gemini
```bash
npm install
npx playwright install chromium
```

Copy the example env file and fill in your keys:

```bash
cp .env.example .env
```

| Variable | Description |
|---|---|
| `ELEVENLABS_API_KEY` | ElevenLabs API key |
| `ELEVENLABS_VOICE_ID` | Voice ID to use for TTS |
| `ELEVENLABS_TTS_MODEL` | TTS model (default: `eleven_turbo_v2_5`) |
| `GEMINI_API_KEY` | Google Gemini API key |
| `GEMINI_MODEL` | Gemini model (default: `gemini-2.5-flash-lite`) |
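A filled-in `.env` might look like this (the key values are placeholders; the model names are the defaults from the table above):

```shell
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_voice_id
ELEVENLABS_TTS_MODEL=eleven_turbo_v2_5
GEMINI_API_KEY=your_gemini_api_key
GEMINI_MODEL=gemini-2.5-flash-lite
```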
To let the bot join meetings without being stuck in the waiting room, log into a Google account:
```bash
npm run login
```

A browser window opens. Sign into your Google account, then close the browser. The session is persisted in `chrome-profile/` and reused by the headless bot.
```bash
npm start
```

The server starts on http://localhost:3000.
```http
POST /join
Content-Type: application/json

{
  "meetId": "abc-defg-hij",
  "botName": "My Agent",
  "timeout": 60000
}
```
`meetId` accepts a bare Google Meet ID or a full meeting URL. The request returns immediately (202):
```json
{ "sessionId": "uuid", "meetUrl": "https://...", "status": "joining" }
```

GET /session/:sessionId
Possible statuses: `idle` → `joining` → `waiting_room` → `in_meeting`, ending in `closed` (or `error` on failure).
DELETE /session/:sessionId
GET /sessions
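Since `/join` accepts either a bare meeting ID or a full URL, a client might normalize its input before calling the API. The `toMeetUrl` helper below is illustrative only — it is not part of the server, and the server does its own normalization:

```javascript
// Illustrative helper: turn either a bare Meet ID ("abc-defg-hij") or a full
// URL ("https://meet.google.com/abc-defg-hij") into a canonical meeting URL.
function toMeetUrl(meetId) {
  const id = meetId.includes("/")
    ? new URL(meetId).pathname.split("/").pop() // full URL: take the last path segment
    : meetId;                                   // already a bare ID
  // Google Meet IDs follow the xxx-xxxx-xxx lowercase-letter pattern.
  if (!/^[a-z]{3}-[a-z]{4}-[a-z]{3}$/.test(id)) {
    throw new Error(`Not a Google Meet ID: ${id}`);
  }
  return `https://meet.google.com/${id}`;
}

// Usage against the /join endpoint (Node 18+ global fetch):
// await fetch("http://localhost:3000/join", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({ meetId: toMeetUrl("abc-defg-hij"), botName: "My Agent" }),
// });
```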
```bash
# Join
SESSION=$(curl -s -X POST http://localhost:3000/join \
  -H "Content-Type: application/json" \
  -d '{"meetId":"abc-defg-hij","botName":"TestBot"}' | jq -r .sessionId)

# Poll status
curl -s http://localhost:3000/session/$SESSION

# Leave
curl -X DELETE http://localhost:3000/session/$SESSION
```

```
src/
  server.js           # Express server (REST API, session management)
  meet-bot.js         # Google Meet automation (Playwright, WebRTC)
  audio-pipeline.js   # VAD, STT, TTS, Gemini LLM
  login.js            # Google login helper (persistent Chrome profile)
avatar.png            # Reference avatar image
.env.example          # Environment variables template
```
An earlier version of this project included real-time lip sync using MuseTalk. The setup was:
- A separate Python FastAPI server (`lipsync/server.py`) running on port 8765
- MuseTalk models (VAE, UNet, Whisper, face parsing) loaded on GPU/MPS/CPU
- The Node.js bot would POST the TTS audio to `/generate` and receive back NDJSON-streamed mouth region crops (JPEG sprites at 25 fps)
- The browser would composite the static avatar image and the animated mouth crops on a canvas, then inject the resulting video track into WebRTC
Pipeline with lip sync:
```
TTS audio → MuseTalk server → mouth crops (25fps NDJSON stream)
                                    ↓
Static avatar (fetched from /avatar) + mouth overlay → Canvas → WebRTC video track
```
This was removed to simplify the setup (no Python/GPU dependency). The current version is audio-only. The lip sync code can be found in the git history if needed.