
Feat/add webrtc transport #780

Open
Nkovaturient wants to merge 111 commits into libp2p:main from Nkovaturient:feat/add-webrtc-transport

@Nkovaturient
Contributor

@Nkovaturient Nkovaturient commented Jul 20, 2025

Description

  • This PR introduces a comprehensive WebRTC transport implementation for py-libp2p, enabling browser-to-browser, browser-to-server, and server-to-server real-time peer-to-peer connections

Issue #546

Updates

Core Transport Implementation

  • Private-to-Private WebRTC: Circuit relay-based signaling for NAT traversal
  • WebRTC-Direct: UDP hole punching for direct peer connections
  • Stream Multiplexing: Multiple protocols over single WebRTC data channels
  • Certificate Management: Self-signed certificates with proper hash encoding
  • Protocol Registration: Full multiaddr support with WebRTC, WebRTC-Direct, and certhash protocols
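As a rough illustration of what multiaddr support for these protocols involves, here is a minimal sketch of a string-level parser. The table and the `parse_multiaddr` helper are hypothetical and are not py-libp2p's API; the WebRTC-related codes are the three quoted in this description, mapped to names as I understand the multiaddr protocol table, and the certhash value in the usage note is a placeholder.

```python
from typing import List, Optional, Tuple

# Protocol name -> (code, takes_value). Hypothetical table for illustration;
# the name-to-code mapping is an assumption based on the multiaddr table.
PROTOCOLS = {
    "ip4": (0x04, True),
    "udp": (0x0111, True),
    "webrtc-direct": (0x0118, False),
    "webrtc": (0x0119, False),
    "certhash": (0x01D2, True),
    "p2p": (0x01A5, True),
}


def parse_multiaddr(addr: str) -> List[Tuple[str, Optional[str]]]:
    """Split a textual multiaddr into (protocol, value) pairs."""
    if not addr.startswith("/"):
        raise ValueError("multiaddr must start with '/'")
    parts = addr.strip("/").split("/")
    components: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(parts):
        name = parts[i]
        if name not in PROTOCOLS:
            raise ValueError(f"unknown protocol: {name}")
        _, takes_value = PROTOCOLS[name]
        if takes_value:
            if i + 1 >= len(parts):
                raise ValueError(f"protocol {name} requires a value")
            components.append((name, parts[i + 1]))
            i += 2
        else:
            # Marker protocols such as webrtc-direct carry no value.
            components.append((name, None))
            i += 1
    return components
```

For example, `parse_multiaddr("/ip4/127.0.0.1/udp/9090/webrtc-direct/certhash/uEiPLACEHOLDER")` yields four components, with `webrtc-direct` carrying no value.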

js-libp2p Compatibility

  • Protocol Codes: Exact match with js-libp2p specifications (0x0119, 0x0118, 0x01d2)
  • Multiaddr Format: Compatible address formatting for cross-implementation connections
  • Signaling Protocol: /libp2p/webrtc/signal/1.0.0 message format compliance
  • Certificate Format: uEi prefixed base64url certificate hashes
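The "uEi" prefix follows from how such a hash is typically encoded. A minimal sketch, assuming the certhash is a SHA-256 multihash of the DER-encoded certificate, rendered as multibase base64url (the `certhash_component` name is hypothetical, not the PR's function):

```python
import base64
import hashlib

SHA2_256_CODE = 0x12    # multihash code for sha2-256
SHA2_256_LENGTH = 0x20  # digest length in bytes (32)


def certhash_component(cert_der: bytes) -> str:
    """Return a multibase base64url string for the SHA-256 multihash of cert_der."""
    digest = hashlib.sha256(cert_der).digest()
    multihash = bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + digest
    # base64url without padding; the leading "u" is the multibase prefix.
    encoded = base64.urlsafe_b64encode(multihash).rstrip(b"=").decode("ascii")
    return "u" + encoded
```

Because the multihash always begins with the bytes 0x12 0x20, the first two base64url characters are always "Ei", which gives the "uEi" prefix regardless of the certificate.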

Production-Ready Features

  • Async Bridge: Robust trio-asyncio integration for WebRTC operations
  • Timeout Protection: Network operations are bounded by a 3 s timeout, with graceful fallback
  • Error Handling: Comprehensive error states and recovery mechanisms
  • Resource Cleanup: Proper connection and stream lifecycle management
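The timeout-with-fallback pattern can be sketched on the asyncio side with the standard library alone. This is an illustration, not the PR's code: `with_timeout`, `slow_lookup`, and `fast_lookup` are hypothetical names, and in Trio code the equivalent construct would be `trio.move_on_after`.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def with_timeout(
    op: Awaitable[T], timeout: float = 3.0, fallback: Optional[T] = None
) -> Optional[T]:
    """Await op, returning fallback if it does not finish within timeout seconds."""
    try:
        return await asyncio.wait_for(op, timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the operation at this point.
        return fallback


async def slow_lookup() -> str:
    await asyncio.sleep(10)  # stand-in for a network call that hangs
    return "public-address"


async def fast_lookup() -> str:
    await asyncio.sleep(0)  # stand-in for a network call that answers quickly
    return "public-address"


def demo() -> tuple:
    slow = asyncio.run(with_timeout(slow_lookup(), timeout=0.05, fallback="unknown"))
    fast = asyncio.run(with_timeout(fast_lookup(), timeout=0.05, fallback="unknown"))
    return slow, fast
```

The short timeout in `demo` only keeps the example quick; the 3 s default mirrors the figure in the description.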

Technical Architecture & Design Decisions

Why trio-asyncio Instead of Pure Trio?

Core Challenge: py-libp2p uses Trio for async operations, but aiortc (the robust Python WebRTC library) is built entirely on asyncio. This created a fundamental integration challenge that required careful architectural decisions.

Solution: I have implemented a sophisticated trio-asyncio bridge (WebRTCAsyncBridge) that provides:

  • Context-managed integration: Safe async context handling across both frameworks
  • Trio token capture: Cross-thread communication from asyncio callbacks to trio contexts
  • Resource lifecycle management: Proper cleanup of both trio and asyncio resources
  • Performance optimization: Minimal overhead bridge operations with connection pooling

Alternative Considered: Writing a pure trio WebRTC implementation would eliminate bridge complexity but would require reimplementing significant portions of WebRTC protocols (DTLS, SCTP, ICE) - a massive undertaking that would delay delivery and introduce bugs.

The trio-asyncio Bridge: Why Essential?

# aiortc operations are asyncio-native:
from aiortc import RTCPeerConnection

peer_connection = RTCPeerConnection(config)  # asyncio context required
offer = await peer_connection.createOffer()   # asyncio coroutine

# py-libp2p expects trio operations:
stream = await host.new_stream(peer_id, protocols)  # trio context
await stream.write(data)  # trio async

# Our bridge connects both (bind the bridge so it can be used in the block):
async with WebRTCAsyncBridge() as bridge:
    offer = await bridge.create_offer(peer_connection)  # trio-safe operation

Bridge Benefits:

  • Thread Safety: Handles asyncio callbacks safely in trio contexts via trio tokens
  • Resource Management: Ensures proper cleanup of both asyncio and trio resources
  • Context Isolation: Prevents context bleeding between async frameworks
  • Performance: Minimal overhead with connection reuse and smart batching
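To make the general idea of bridging into an asyncio loop concrete, here is a standard-library-only analogue. It is not trio-asyncio and not the PR's WebRTCAsyncBridge: it runs an asyncio loop in a background thread and submits coroutines to it from outside. `LoopThreadBridge` and `fake_create_offer` are hypothetical names.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Coroutine


class LoopThreadBridge:
    """Hypothetical helper: owns an asyncio loop running in a background thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)

    def __enter__(self) -> "LoopThreadBridge":
        self._thread.start()
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Ask the loop to stop from its own thread, then release its resources.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()

    def submit(self, coro: Coroutine[Any, Any, Any]) -> Future:
        """Schedule a coroutine on the asyncio loop from any other thread."""
        return asyncio.run_coroutine_threadsafe(coro, self._loop)


async def fake_create_offer() -> str:
    # Stand-in for an asyncio-native call such as createOffer().
    await asyncio.sleep(0.01)
    return "offer-sdp"


def bridged_offer() -> str:
    with LoopThreadBridge() as bridge:
        return bridge.submit(fake_create_offer()).result(timeout=1.0)
```

The context manager makes the cleanup obligation explicit: the loop is stopped and closed even if the caller raises, which is the same concern the resource-management bullet above describes.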

Current Status

Main Test Suite (test_webrtc_transport.py)

  • 15 comprehensive tests covering basic functionality, interoperability, and advanced features
  • Network-independent operation with smart timeout handling
  • Real WebRTC connection testing with SDP generation and data channel setup

Specialized Test Suites

  • test_js_libp2p_interop.py: Dedicated js-libp2p compatibility validation
  • test_live_signaling.py: Live signaling with circuit relay simulation
  • test_network_optimized.py: Network-independent testing for CI/CD environments

Questions

  1. Performance Impact: How does the trio-asyncio bridge affect performance in high-throughput scenarios with 100+ concurrent WebRTC connections? Should we prioritize a pure trio WebRTC implementation for v2?

  2. Stream Multiplexing Integration: How should WebRTC streams integrate with existing yamux/mplex stream multiplexing? Should WebRTC connections be treated as muxed connections themselves, or as transport-level primitives?

  3. Circuit Relay Strategy: What's the preferred integration path for circuit relay reservations with the existing relay discovery mechanism? Should WebRTC transport handle relay discovery independently or leverage existing infrastructure?

  4. NAT Traversal Prioritization: When both STUN servers and direct UDP hole punching are available, what's the preferred fallback hierarchy? How do we balance connection speed vs reliability?

  5. Resource Management Philosophy: Given the trio-asyncio bridge complexity, should we add connection pooling and resource limits to prevent memory leaks in long-running applications?

Next work

  • Implement IHost so that WebRTC listeners and get_network use real network resources [currently backed by timeouts and mocks]
  • TODO: Return circuit relay addresses that can be used for WebRTC signaling in the private-to-private transport.py. [Must be modular so that the webrtc-direct transport can extend and reuse it]
  • Close and test the remaining gaps in the NAT traversal and circuit relay implementations
  • Implement pubsub-based offer/answer exchange in WebRTC-Direct transport
  • Demonstrate complete end-to-end connectivity via WebRTC private-to-private and private-to-public connections
  • Demonstrate direct WebRTC connection with NAT traversal
  • Demonstrate P2P connection through circuit relay

Cute Animal Picture

orca

@Nkovaturient Nkovaturient mentioned this pull request Jul 20, 2025
5 tasks
@sukhman-sukh
Contributor

Hey @Nkovaturient, after our discussion, I have made some changes to the SDP and ICE exchange protocols to make them interoperable with JS.
Key changes:

  1. Replaced JSON with .proto serialization
  2. Removed the message-length bytes at the start of each message for now (JS does not send them; we can add them later if needed)
  3. Fixed the ICE_candidate object creation, sending, and receiving handlers.
  4. Fixed some linting errors.

Please have a look at the changes.
Now, I will move ahead with the above checkboxes.
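For reference, if the message-length prefix from item 2 is reintroduced later, one common form in libp2p protocols is an unsigned-varint length in front of each serialized message. A minimal sketch under that assumption (the helper names are hypothetical, and the payloads stand in for serialized .proto messages):

```python
from typing import List, Tuple


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with its varint-encoded length."""
    return encode_uvarint(len(payload)) + payload


def unframe_all(buffer: bytes) -> List[bytes]:
    """Split a buffer of back-to-back framed messages into payloads."""
    payloads = []
    offset = 0
    while offset < len(buffer):
        length, offset = decode_uvarint(buffer, offset)
        if offset + length > len(buffer):
            raise ValueError("truncated message")
        payloads.append(buffer[offset:offset + length])
        offset += length
    return payloads
```

The prefix lets a receiver split several messages that arrive in one read, which is the case a reader without framing cannot handle.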

Also, in the meantime, can you check for pyrefly typecheck and try to fix them?

@Nkovaturient
Contributor Author

Also, in the meantime, can you check for pyrefly typecheck and try to fix them?

Sure, gonna fix them.

@sukhman-sukh sukhman-sukh force-pushed the feat/add-webrtc-transport branch from 969868f to 13378e6 Compare August 11, 2025 20:54
@seetadev
Contributor

@Nkovaturient, @sukhman-sukh: Great work :)

Added some feedback on the tests proposed at #839.

Wonderful progress indeed. Reviewing the PR and looking forward to successful completion of the WebRTC direct PR in the coming days. CCing @pacrob, @Winter-Soren, @AkMo3, @acul71, @guha-rahul and @lla-dane for their feedback and pointers.

Looking forward to seeing WebRTC direct in production soon :)

@sukhman-sukh sukhman-sukh force-pushed the feat/add-webrtc-transport branch from 1cd4b3f to 13378e6 Compare August 13, 2025 02:40
@sukhman-sukh
Contributor

Hey @seetadev,
I don't know why this is happening, but I have 2 more commits locally that are not getting pushed.
I tried a force push too, but they still do not show up on GitHub.
Do you have any idea?

commit 177c14939fa785780bce43bff99401ab9101f6ca (HEAD)
Author: sukhman <sukhmansinghsaluja@gmail.com>
Date:   Tue Aug 12 03:28:00 2025 +0530

    Fix lint error in CI

commit bd7f93a6192b3a610162cd57fe37d695e2157c8a
Author: sukhman <sukhmansinghsaluja@gmail.com>
Date:   Mon Aug 11 06:20:19 2025 +0530

    Add stream read/write for webrtc

"Add stream read/write for webrtc" is the second-to-last commit and "Fix lint error in CI" is the last one.

@sukhman-sukh
Contributor

Also, I am having difficulty with some of the listener functions, which for now I have created as placeholders. JS has something called ICEUDPMUXListener in the base library of its WebRTC implementation for listening to STUN requests, and I can't find anything equivalent in Python (still looking for workarounds). In the worst case we will have to implement it ourselves.
Ref: https://github.com/libp2p/js-libp2p/blob/cf9aab5c841ec08bc023b9f49083c95ad78a7a07/packages/transport-webrtc/src/private-to-public/listener.ts#L154-L155
It would be really nice if you could guide me on how to go ahead with this.

@seetadev
Contributor

@sukhman-sukh, @Nkovaturient: Thank you for making improvements and fixing linting issues.

Re-ran the CI/CD pipeline. We do have some issues that are yet to be fixed too.

…o-pvt

- Add _hold_loop_open from start() to stop()
- Wrap dial/signaling/cleanup in with_webrtc_context
- Ensure data pump ready before upgrade
- Example: host.connect() + new_stream() for proper upgrade
@Nkovaturient
Contributor Author

Hello @seetadev @acul71 @sumanjeet0012

Together with @asmit27rai, I believe I have fixed and covered all the remaining issues, and this WebRTC transport PR is ready for review.
I have compiled the discussions on the fixes and work in webrtc-direct and webrtc pvt-to-pvt here.
Kindly share your guidance and insights.

Thank you!

…x-docs

- pyproject.toml: remove [project.optional-dependencies]; keep only
  [dependency-groups] (PEP 735) as single source of truth. Install via
  uv (e.g. uv sync --group dev, uv pip install --group test -e .).
- tox.ini: drop extras=; envs install only with uv pip install --group ...
- Makefile: docs and linux-docs are synonyms; one recipe that runs
  check-docs then opens with xdg-open or open, help lists both.

Co-authored-by: Cursor <cursoragent@cursor.com>
@acul71
Contributor

acul71 commented Feb 23, 2026

@Nkovaturient @asmit27rai — short update on the recent pyproject.toml, tox, and docs (Makefile) changes:

1. uv as preferred installer (PEP 735 only)

  • pyproject.toml: Removed [project.optional-dependencies] and kept only [dependency-groups] (PEP 735) as the single source of truth. All optional deps (test, docs, webrtc, dev) are now in dependency-groups.
  • Install with uv. Example (create venv, activate, install dev group, pre-commit):
    cd py-libp2p
    uv venv venv
    source venv/bin/activate
    uv pip install --upgrade pip
    uv pip install --group dev -e .
    pre-commit install
    Alternatively: uv sync --group dev (uses default .venv), or uv pip install --group test --group docs --group webrtc -e . for only test/docs/webrtc.
  • tox.ini: Dropped the extras= directive; testenvs install only via uv pip install --group ... -e .. No setuptools extras involved.

2. Make docs / linux-docs

  • Makefile: docs and linux-docs are synonyms: one target that runs check-docs then opens the built HTML using xdg-open (Linux) or open (macOS), with a fallback message if neither is available. Both names appear in make help.

Rationale: avoid duplicating dependency lists (optional-dependencies vs dependency-groups) and standardise on uv + PEP 735; simplify docs targets so one implementation works everywhere.

…tAssertRewriteWarning

- connect.py: replace 'if raw_connection._handshake_failure_event:' with
  'if raw_connection._handshake_failure_event is not None:' to avoid
  deprecated trio.Event.__bool__ (Trio 0.31+).
- conftest.py: remove top-level import of relay_fixtures; lazy-import
  store_relay_addrs inside nat_peer_a and nat_peer_b so relay_fixtures
  is not loaded before pytest's assertion rewriter (fixes 24×
  PytestAssertRewriteWarning). Add comments explaining the lazy import.

Co-authored-by: Cursor <cursoragent@cursor.com>
@acul71
Contributor

acul71 commented Feb 23, 2026

Latest commit (ef60bbc):

  • trio.Event deprecation (connect.py): Replaced if raw_connection._handshake_failure_event: with if raw_connection._handshake_failure_event is not None: so we no longer rely on the deprecated trio.Event.__bool__ (Trio 0.31+).
  • PytestAssertRewriteWarning (conftest.py): Removed the top-level import of relay_fixtures and lazy-import store_relay_addrs inside the nat_peer_a and nat_peer_b fixtures so relay_fixtures is not loaded before pytest’s assertion rewriter. Added comments in the fixtures explaining the lazy import.
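The difference between the two checks in the first bullet can be reproduced without Trio. In this sketch `FakeEvent` is a hypothetical stand-in whose `__bool__` emits a DeprecationWarning, imitating the behaviour described for trio.Event; it is not Trio's implementation.

```python
import warnings
from typing import Callable, Optional


class FakeEvent:
    """Stand-in for an event type whose truth value is deprecated."""

    def __init__(self) -> None:
        self._flag = False

    def set(self) -> None:
        self._flag = True

    def is_set(self) -> bool:
        return self._flag

    def __bool__(self) -> bool:
        warnings.warn(
            "truth-testing an event is deprecated; use is_set()",
            DeprecationWarning,
            stacklevel=2,
        )
        return True


def has_event_truthy(event: Optional[FakeEvent]) -> bool:
    # Old style: 'if event:' calls __bool__ and triggers the warning.
    return bool(event)


def has_event_identity(event: Optional[FakeEvent]) -> bool:
    # New style: only asks whether an event object exists at all.
    return event is not None


def count_warnings(check: Callable[[Optional[FakeEvent]], bool]) -> int:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check(FakeEvent())
    return len(caught)
```

Note that neither check reports whether the event has fired; that still requires `is_set()`. The identity check only distinguishes "no event attached" from "event attached".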

Tests: make pr — 2209 passed, 16 skipped; no trio.Event or PytestAssertRewriteWarning warnings.

@seetadev
Contributor

@Nkovaturient : Thank you for the continued efforts. Appreciate it. Also, thanks to @asmit27rai for his contribution too.

Luca and I had a brief discussion on your PR.

@acul71 has shared detailed feedback on your PR. Please review and make additions.

Also, adding @sumanjeet0012 and @deepso from Huddle01 to the thread. It would be great if you could share a peer review of this PR.

CCing @pacrob.

@IronJam11
Contributor

@Nkovaturient From what I understand of the WebRTC private-to-private implementation, it seems that protobuf framing is not currently being used. Instead, there appears to be an ad-hoc JSON protocol running over a single data channel. My understanding of the spec is that it expects one data channel per stream along with protobuf framing, so I’m wondering if this might cause interoperability issues with other libp2p implementations.

@asabya

asabya commented Mar 13, 2026

@Nkovaturient From what I understand of the WebRTC private-to-private implementation, it seems that protobuf framing is not currently being used. Instead, there appears to be an ad-hoc JSON protocol running over a single data channel. My understanding of the spec is that it expects one data channel per stream along with protobuf framing, so I’m wondering if this might cause interoperability issues with other libp2p implementations.

@IronJam11 is accurate. This implementation will not interoperate with other libp2p implementations.

  • Stream lifecycle (FIN, RESET, STOP_SENDING flags) is not implemented per spec
  • Protobuf Message framing on data channels is missing
  • It uses a single data channel with ad-hoc JSON muxing instead of one-data-channel-per-stream
  • Massive bloat — 29K+ lines, 100 commits, unrelated kademlia changes, debug print statements everywhere, emoji-laden logging. Cleaning this up would take longer than writing it correctly.
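To illustrate what protobuf Message framing with lifecycle flags could look like on the wire, here is a hand-rolled sketch. It assumes a message with an optional enum `flag` in field 1 and optional `bytes` in field 2; the field numbers and the numeric flag values are my reading of the libp2p WebRTC spec and should be checked against it. A real implementation would use generated protobuf code rather than this hypothetical encoder.

```python
from enum import IntEnum
from typing import Optional


class Flag(IntEnum):
    # Numeric values are assumptions; verify against the spec's .proto file.
    FIN = 0
    STOP_SENDING = 1
    RESET = 2


def _uvarint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_message(payload: Optional[bytes] = None, flag: Optional[Flag] = None) -> bytes:
    """Serialize the assumed Message: field 1 = flag (varint), field 2 = bytes."""
    body = bytearray()
    if flag is not None:
        body += b"\x08" + _uvarint(int(flag))  # tag: field 1, wire type 0
    if payload is not None:
        body += b"\x12" + _uvarint(len(payload)) + payload  # tag: field 2, wire type 2
    return bytes(body)
```

For instance, `encode_message(b"hi", Flag.FIN)` produces the bytes `08 00 12 02 68 69`: data and a half-close signal travel in the same message on that stream's own data channel.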

CC: @seetadev @acul71 @pacrob

@Nkovaturient
Contributor Author

Thank you both @IronJam11 and @asabya for the detailed review — I genuinely appreciate your feedback.

You're right on the technical gaps: protobuf framing, per-stream data channels, and proper stream lifecycle signaling (FIN/RESET/STOP_SENDING) are missing from the current implementation. I won't dispute that. These are real spec deviations, and I'm committed to correcting them in subsequent iterations.

That said, I'd like to offer some context:

  • This PR represents months of dedicated research, iterative experimentation, and persistence to bring WebRTC transport to py-libp2p.
  • Everything you see here, including the rough edges, is the product of someone building largely in uncharted territory for this codebase.

As the saying goes: everything starts as a mess before it becomes something refined. This PR was always meant to be a foundation — a working proof-of-concept that opens the door, not a final production implementation.

Regarding the "bloat" concern: I acknowledge the 29K+ lines and the stray debug prints. The Kademlia commits were not unrelated, though: they addressed a failing CI/CD issue, which has since been fixed.
I'll isolate the WebRTC transport into a clean, scoped branch, strip debug artifacts, remove unrelated changes, and restructure around the spec — one data channel per stream, protobuf framing, and correct stream lifecycle handling.
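As a starting point for the stream lifecycle handling mentioned above, the effect of the three flags on a stream can be sketched as a small state holder. This is a simplified, hypothetical model: it tracks only how flags received from the remote side affect the local read and write halves and ignores acknowledgements and locally initiated closes.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    """Hypothetical per-stream state, one instance per data channel."""

    can_read: bool = True
    can_write: bool = True

    def on_remote_flag(self, flag: str) -> None:
        if flag == "FIN":
            # Remote will send no more data: our read half is finished.
            self.can_read = False
        elif flag == "STOP_SENDING":
            # Remote will read no more data: our write half should stop.
            self.can_write = False
        elif flag == "RESET":
            # Abrupt termination of both halves.
            self.can_read = False
            self.can_write = False
        else:
            raise ValueError(f"unknown flag: {flag}")

    @property
    def closed(self) -> bool:
        return not (self.can_read or self.can_write)
```

With one such object per data channel, a FIN on one stream leaves every other stream untouched, which is what the one-data-channel-per-stream design is meant to guarantee.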

I'd welcome guidance on the preferred way to proceed — whether that's a rebase into smaller focused PRs, or a new clean branch with spec-compliant implementation — whatever best serves the project. I'm here to see this through correctly.

@asabya

asabya commented Mar 13, 2026

Thanks for being open to feedback @Nkovaturient. The research and exploration here is valuable and that groundwork is useful context for us.

That said, I'd recommend a new clean branch rather than a rebase. The core architecture needs to change at a fundamental level; it's not something that can be incrementally patched.

The best reference is go-libp2p/p2p/transport/webrtc/, which is a clean, well-tested implementation. A 1:1 port of its architecture would give us something that interoperates with Go and JS out of the box. We can improve things along the way if needed.

Happy to collaborate on this.
