WebRTC in 150 Lines of C++
The first time you try to build Google’s libwebrtc, you run gclient sync and watch 20GB of Chromium toolchain download to your machine. An hour or two later, depending on the hardware, you have a 50MB static library that bundles BoringSSL, Google’s fork of OpenSSL, which is binary-incompatible with the OpenSSL your project already links against. Two TLS libraries in the same process, and a linker that has opinions about that.
Most C++ developers who try this once arrive at the same conclusion: native WebRTC isn’t worth the pain. They shell out to a managed service or pick a different language.
None of that pain is intrinsic to the protocol.
What follows is a complete WebRTC media server in C++20 that captures a camera and streams it to a browser. The whole application is 150 lines. It compiles in minutes against system OpenSSL and drops into any CMake project with a single FetchContent block.
Why everyone uses libwebrtc
For over a decade, Google’s libwebrtc has been the de facto native WebRTC implementation. The protocol is complex enough that few people wanted to build an alternative from scratch: ICE gathering, DTLS, SRTP, and RTP packetisation with retransmission and bandwidth estimation add up to thousands of pages of RFCs for the transport layer alone. So everyone used the monolith.
The build is just the entrance fee. The API is where the time goes. PeerConnectionFactory requires a signalling thread, a networking thread, and a worker thread, all managed manually. Creating a single track involves MediaStreamTrackInterface, an AudioSourceInterface or VideoTrackSourceInterface, frame format adapters, and careful threading discipline because most objects are thread-affine. Callback interfaces change between releases. The reference documentation, in practice, is the Chromium source.
A minimal camera-to-browser streamer in libwebrtc runs about 800-1200 lines, not counting build configuration and thread management. That’s assuming you already know the API.
The way out turned out to be simpler than the framing suggested: don’t replace the whole thing. Replace the layers.
The Stack
| Layer | Library | Role |
|---|---|---|
| ICE, DTLS, SRTP | libdatachannel | Transport. Independent C++ implementation by Paul-Louis Ageneau, not a Chromium fork. |
| Capture, encode, decode | FFmpeg 5/6/7 | Media. Any FFmpeg encoder whose codec the browser can negotiate (H.264, for example) can feed WebRTC. |
| Signalling | Symple | SDP and ICE candidate exchange over WebSocket. |
| Pipeline glue | Icey | PacketStream architecture that ties capture, encoding, and transport into a single composable chain. |
| TURN relay | Icey | Self-hosted RFC 5766 TURN server for symmetric NATs. |
libdatachannel handles everything below the application layer: ICE candidate gathering via libjuice, DTLS key exchange via stock OpenSSL, SRTP encryption via libsrtp, RTP packetisation with NACK retransmission and REMB bandwidth estimation. Icey wires capture, encoding, and the libdatachannel transport into a single pipeline.
The Pipeline
Icey’s core abstraction is PacketStream: a chain of sources, processors, and sinks that data flows through. A source emits packets, processors transform them, and a sink consumes them. You snap components together and call start().
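Stripped of Icey’s specifics, the pattern fits in a few dozen lines. The toy version below is not Icey’s PacketStream: MiniStream, Packet, and the lambda-based stages are hypothetical names, processors simply run in the order they were attached, and there is no threading or state synchronisation. It only shows how a packet moves from a source, through processors, into a sink.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical packet type for this sketch: a payload plus a sequence index.
struct Packet {
    std::string data;
    int index = 0;
};

// Toy source -> processors -> sink chain. Illustrative only; not Icey's API.
class MiniStream {
public:
    using Processor = std::function<Packet(Packet)>;
    using Sink = std::function<void(const Packet&)>;

    // Processors run in attachment order.
    void attach(Processor p) { processors_.push_back(std::move(p)); }
    void setSink(Sink s) { sink_ = std::move(s); }

    // A source calls this once per packet it produces.
    void emit(Packet pkt)
    {
        for (auto& p : processors_)
            pkt = p(std::move(pkt));
        if (sink_)
            sink_(pkt);
    }

private:
    std::vector<Processor> processors_;
    Sink sink_;
};
```

A capture source would call emit() once per frame. Swapping the camera for a file, or one encoder for another, replaces a single stage and leaves the rest of the chain untouched.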
For WebRTC, the pipeline is:
```
MediaCapture  →  VideoPacketEncoder  →  WebRtcTrackSender  →  browser
  (camera)         (FFmpeg H.264)         (RTP packetise,
                                           SRTP encrypt,
                                           ICE send)
```
In code:
```cpp
stream.attachSource(capture.get(), false, true);
stream.attach(encoder, 1, true);
stream.attach(&session->media().videoSender(), 5, false);
stream.start();
capture->start();
```
Five lines. Camera frames flow into the H.264 encoder, then into the WebRTC track sender, which hands encoded NAL units to libdatachannel for RTP packetisation and encrypted transport. The browser receives standard WebRTC media and renders it in a <video> element.
The equivalent in libwebrtc requires implementing an AdaptedVideoTrackSource subclass to feed and convert frames, creating a video track from it through the PeerConnectionFactory, adding the track to a transceiver, and getting the threading model right. Around 150-200 lines for the same step.
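The encoded NAL units mentioned above usually leave libx264 in Annex B byte-stream form, where each NAL unit is preceded by a 00 00 01 or 00 00 00 01 start code. Before RTP packetisation, that stream has to be split into individual NAL units. Here is a minimal sketch of the split, assuming a well-formed stream; splitAnnexB and nalType are hypothetical helpers, not functions from Icey or libdatachannel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Length of the Annex B start code at position i (3 or 4 bytes), or 0 if none.
static std::size_t startCodeAt(const Bytes& buf, std::size_t i)
{
    if (i + 3 <= buf.size() && buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
        return 3;
    if (i + 4 <= buf.size() && buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 0 && buf[i + 3] == 1)
        return 4;
    return 0;
}

// Hypothetical helper: splits an Annex B stream into NAL units, start codes removed.
std::vector<Bytes> splitAnnexB(const Bytes& stream)
{
    std::vector<Bytes> nals;
    std::size_t i = 0;
    std::size_t nalStart = 0;
    bool inNal = false;
    auto at = [&](std::size_t k) { return stream.begin() + static_cast<std::ptrdiff_t>(k); };
    while (i < stream.size()) {
        const std::size_t sc = startCodeAt(stream, i);
        if (sc == 0) {
            ++i;
            continue;
        }
        // A start code closes the previous NAL unit (if any) and opens the next.
        if (inNal && i > nalStart)
            nals.emplace_back(at(nalStart), at(i));
        i += sc;
        nalStart = i;
        inNal = true;
    }
    if (inNal && nalStart < stream.size())
        nals.emplace_back(at(nalStart), stream.end());
    return nals;
}

// NAL unit type is the low five bits of the first byte (5 = IDR slice, 7 = SPS, 8 = PPS).
inline int nalType(const Bytes& nal) { return nal.empty() ? -1 : (nal[0] & 0x1F); }
```

Emulation prevention bytes in H.264 guarantee that a start-code pattern never appears inside a NAL unit, which is why a plain byte scan is enough here.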
The Code
Here is the complete webcam streamer, section by section. The listing calls a stopStreaming() helper whose body is not reproduced here; the full sample is in the repository linked at the end.
Includes and setup
```cpp
#include "icy/application.h"
#include "icy/av/devicemanager.h"
#include "icy/av/mediacapture.h"
#include "icy/av/videopacketencoder.h"
#include "icy/logger.h"
#include "icy/packetstream.h"
#include "icy/symple/client.h"
#include "icy/webrtc/codecnegotiator.h"
#include "icy/webrtc/peersession.h"
#include "symplesignaller.h"
#include <iostream>
#include <memory>
#include <string>

using namespace icy;

#define USE_CAMERA 0
```
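Thirteen includes: three standard headers plus ten from Icey and the sample, covering base (PacketStream, signals, logging), av (device management, capture, and the VideoPacketEncoder pipeline processor), symple (signalling), and webrtc (codec negotiation and the peer session). symplesignaller.h is a local header providing wrtc::SympleSignaller, which connects the Symple client to the peer session. USE_CAMERA is meant to switch between a real camera and a test file; the start() shown below always opens the test file, and the pipeline is identical either way.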
For comparison, a libwebrtc project typically starts with 30-40 includes pulled from api/, media/, pc/, rtc_base/, and various internal headers that aren’t part of any stable public API.
The class
```cpp
class WebcamStreamer
{
public:
    smpl::Client client;
    std::unique_ptr<wrtc::SympleSignaller> signaller;
    std::unique_ptr<wrtc::PeerSession> session;
    std::shared_ptr<av::MediaCapture> capture;
    std::shared_ptr<av::VideoPacketEncoder> encoder;
    PacketStream stream;
    av::Device::VideoCapability videoCap;

    WebcamStreamer(const smpl::Client::Options& opts)
        : client(opts)
        , stream("webcam-stream")
    {
    }
```
Seven members: a Symple client for signalling, the signaller that carries SDP and ICE candidates over it, a peer session that manages the WebRTC connection, a media capture source, a video encoder, a packet stream for the pipeline, and the video capability (resolution, framerate, pixel format). That is all the state a WebRTC media server needs; no factory hierarchies or plugin registries underneath it.
Opening the video source
```cpp
    void start()
    {
        capture = std::make_shared<av::MediaCapture>();
        capture->openFile(ICY_DATA_DIR "/test.mp4");
        capture->setLoopInput(true);
        capture->setLimitFramerate(true);
        videoCap = {640, 480, 30, 30, "yuv420p"};

        client.Announce += slot(this, &WebcamStreamer::onAnnounce);
        client.StateChange += slot(this, &WebcamStreamer::onStateChange);
        client.CreatePresence += slot(this, &WebcamStreamer::onCreatePresence);
        client.connect();
    }
```
MediaCapture wraps FFmpeg’s demuxer and decoder. For a real camera, you’d use DeviceManager to negotiate resolution and framerate, then call openVideo() instead of openFile(). The rest is identical; the pipeline doesn’t care where the frames come from.
The signal-slot wiring (+=) is Icey’s event system. Type-safe, zero-allocation for small captures, compiles to a direct function call when there’s a single listener.
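For readers who haven’t used this style, here is a minimal type-safe signal in standard C++ that supports the same `+=` idiom. Signal and its members are hypothetical names for this sketch; it is not Icey’s implementation and makes none of the allocation or dispatch guarantees described above (std::function may allocate for large captures).

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal signal: every listener must accept exactly Args...
template <typename... Args>
class Signal {
public:
    using Slot = std::function<void(Args...)>;

    // Attach a listener, mirroring the `client.Announce += ...` style.
    Signal& operator+=(Slot slot)
    {
        slots_.push_back(std::move(slot));
        return *this;
    }

    // Invoke every listener in attachment order.
    void emit(Args... args) const
    {
        for (const auto& s : slots_)
            s(args...);
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::vector<Slot> slots_;
};
```

Type safety comes from the template parameters: attaching a listener whose signature cannot accept `Args...` fails to compile rather than misbehaving at runtime.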
Creating the WebRTC session
```cpp
    void createSession()
    {
        // Configure H.264 with WebRTC-safe settings for browser playback
        av::VideoCodec videoCodec = wrtc::CodecNegotiator::resolveWebRtcVideoCodec(
            av::VideoCodec("H264", "libx264",
                           videoCap.width, videoCap.height, videoCap.maxFps));

        wrtc::PeerSession::Config config;
        config.rtcConfig.iceServers.emplace_back("stun:stun.l.google.com:19302");
        config.mediaOpts.videoCodec = videoCodec;
        config.enableDataChannel = false;

        signaller = std::make_unique<wrtc::SympleSignaller>(client);
        session = std::make_unique<wrtc::PeerSession>(*signaller, config);

        session->IncomingCall += [this](const std::string& peerId) {
            std::cout << "Incoming call from " << peerId << '\n';
            session->accept();
        };

        session->StateChanged += [this](wrtc::PeerSession::State state) {
            std::cout << "Call state: " << wrtc::stateToString(state) << '\n';
            if (state == wrtc::PeerSession::State::Active)
                startStreaming();
            else if (state == wrtc::PeerSession::State::Ended)
                stopStreaming();
        };

        session->media().BitrateEstimate += [](unsigned int bps) {
            std::cout << "REMB: " << bps / 1000 << " kbps" << '\n';
        };

        session->media().KeyframeRequested += []() {
            std::cout << "PLI: keyframe requested" << '\n';
        };
    }
```
This is where the complexity lives, and most of it is in libdatachannel rather than in this file.
PeerSession manages the full WebRTC lifecycle: SDP offer/answer exchange, ICE candidate trickle, DTLS handshake, SRTP setup. You give it a signalling backend and codec preferences; it handles the rest. IncomingCall fires when a browser peer requests a call. StateChanged tells us when the DTLS handshake completes and media can flow.
BitrateEstimate and KeyframeRequested are RTCP feedback signals. REMB (Receiver Estimated Maximum Bitrate) carries the receiver’s estimate of the available bandwidth; PLI (Picture Loss Indication) asks the sender for a fresh keyframe when the receiver has lost data it cannot recover from. In production you’d wire these into the encoder for adaptive bitrate. Here we log them to show they’re working. The CodecNegotiator::resolveWebRtcVideoCodec(...) call above centralises the browser-safe H.264 defaults instead of scattering them through samples.
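One possible shape for that wiring is a small controller that turns REMB estimates into an encoder target and collapses bursts of PLIs into a single keyframe. AdaptiveBitrate is a hypothetical class, and the 85% headroom, the clamping bounds, and the one-second keyframe interval are illustrative choices rather than Icey defaults. Applying the target to the encoder depends on what your encoder exposes.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

// Hypothetical controller: feed it REMB and PLI events, read back what the
// encoder should do. It does not touch any real encoder API.
class AdaptiveBitrate {
public:
    using Clock = std::chrono::steady_clock;

    AdaptiveBitrate(unsigned minBps, unsigned maxBps)
        : minBps_(minBps), maxBps_(maxBps), targetBps_(maxBps) {}

    // REMB: aim at 85% of the receiver's estimate to leave room for RTP/RTCP
    // overhead and retransmissions, clamped to [minBps, maxBps].
    unsigned onRemb(unsigned estimateBps)
    {
        const auto headroom = static_cast<unsigned>(
            static_cast<unsigned long long>(estimateBps) * 85 / 100);
        targetBps_ = std::clamp(headroom, minBps_, maxBps_);
        return targetBps_;
    }

    // PLI: returns true if a keyframe should be forced now. Requests that
    // arrive within minKeyframeInterval_ of the last keyframe are ignored.
    bool onPli(Clock::time_point now)
    {
        if (lastKeyframe_ && now - *lastKeyframe_ < minKeyframeInterval_)
            return false;
        lastKeyframe_ = now;
        return true;
    }

    unsigned targetBps() const { return targetBps_; }

private:
    unsigned minBps_;
    unsigned maxBps_;
    unsigned targetBps_;
    std::optional<Clock::time_point> lastKeyframe_;
    std::chrono::milliseconds minKeyframeInterval_{1000};
};
```

In the streamer, onRemb() would be called from the BitrateEstimate handler and onPli() from KeyframeRequested, in place of the log lines.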
Underneath, libdatachannel is doing the actual transport work: libjuice gathers ICE candidates and punches through NATs, OpenSSL negotiates the DTLS handshake, libsrtp encrypts every RTP packet, and the H.264 packetiser breaks encoded frames into MTU-sized RTP payloads with sequence numbers, timestamps, and marker bits. The application code configures it and the library runs it.
In libwebrtc the same surface is a PeerConnectionObserver subclass with about fifteen virtual methods, most undocumented, some called from threads you didn’t create.
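The MTU-sized payloads mentioned above follow RFC 6184: a NAL unit that fits goes into one RTP packet unchanged, and a larger one is split into FU-A fragments, each starting with a two-byte header (the FU indicator, then an FU header carrying start and end bits). The sketch below covers only that payload split. fragmentFuA is a hypothetical function, not libdatachannel’s packetiser, and it leaves out the RTP header itself (sequence number, timestamp, marker bit).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: splits one NAL unit (no start code) into RTP payloads
// of at most maxPayload bytes, using single-NAL mode or FU-A (type 28).
std::vector<Bytes> fragmentFuA(const Bytes& nal, std::size_t maxPayload)
{
    if (nal.empty())
        return {};
    if (nal.size() <= maxPayload)
        return {nal}; // fits: single NAL unit packet
    if (maxPayload <= 2)
        throw std::invalid_argument("maxPayload too small for FU-A");

    const std::uint8_t header = nal[0];
    // FU indicator keeps the F and NRI bits and sets the type to 28 (FU-A).
    const std::uint8_t fuIndicator = static_cast<std::uint8_t>((header & 0xE0) | 28);
    const std::uint8_t type = header & 0x1F;
    const std::size_t chunk = maxPayload - 2;

    std::vector<Bytes> out;
    // The original header byte is not copied; the receiver rebuilds it from
    // the FU indicator and FU header.
    for (std::size_t pos = 1; pos < nal.size(); pos += chunk) {
        const std::size_t len = std::min(chunk, nal.size() - pos);
        std::uint8_t fuHeader = type;
        if (pos == 1)
            fuHeader |= 0x80; // S bit: first fragment
        if (pos + len == nal.size())
            fuHeader |= 0x40; // E bit: last fragment
        Bytes pkt{fuIndicator, fuHeader};
        auto first = nal.begin() + static_cast<std::ptrdiff_t>(pos);
        pkt.insert(pkt.end(), first, first + static_cast<std::ptrdiff_t>(len));
        out.push_back(std::move(pkt));
    }
    return out;
}
```

All fragments of one frame share the same RTP timestamp; the packetiser then sets the marker bit on the last packet of the access unit.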
Starting the pipeline
```cpp
    void startStreaming()
    {
        if (!session || !session->media().hasVideo())
            return;

        // Create the H.264 encoder for the pipeline
        encoder = std::make_shared<av::VideoPacketEncoder>();
        capture->getEncoderVideoCodec(encoder->iparams);
        encoder->oparams = wrtc::CodecNegotiator::resolveWebRtcVideoCodec(
            av::VideoCodec("H264", "libx264",
                           videoCap.width, videoCap.height, videoCap.maxFps));

        // Pipeline: capture → encoder → WebRTC sender
        stream.attachSource(capture.get(), false, true);
        stream.attach(encoder, 1, true);
        stream.attach(&session->media().videoSender(), 5, false);
        stream.start();
        capture->start();

        std::cout << "Streaming started" << '\n';
    }
```
The core of the application. The VideoPacketEncoder takes decoded frames from the capture source and produces H.264 NAL units. The WebRTC track sender packetises them into RTP, encrypts them with SRTP, and sends them over the ICE transport to the browser. The encoder’s output parameters come from the same CodecNegotiator::resolveWebRtcVideoCodec() call used for the session; the resolved codec’s options map is passed straight to FFmpeg, and its WebRTC-safe defaults (baseline profile, zerolatency tune) let every major browser decode the stream without buffering delay.
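On the SDP side, browsers describe which H.264 streams they can decode through the profile-level-id fmtp parameter from RFC 6184: three hex bytes giving profile_idc, the constraint flags, and level_idc. A commonly offered value, 42e01f, decodes to Baseline profile with constraint_set1 set (Constrained Baseline) at level 3.1. The sketch below decodes such a value; ProfileLevelId, parseProfileLevelId, and isConstrainedBaseline are hypothetical helpers, not part of Icey’s CodecNegotiator, and the Constrained Baseline check is simplified to the profile_idc 66 case.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Fields packed into an H.264 profile-level-id (RFC 6184). Hypothetical type.
struct ProfileLevelId {
    std::uint8_t profileIdc;  // 66 = Baseline, 77 = Main, 100 = High
    std::uint8_t constraints; // constraint_set0..5 flags, set0 is the MSB (0x80)
    std::uint8_t levelIdc;    // level * 10, e.g. 31 = level 3.1
};

// Parses a six-hex-digit value such as "42e01f".
std::optional<ProfileLevelId> parseProfileLevelId(const std::string& hex)
{
    if (hex.size() != 6)
        return std::nullopt;
    std::uint32_t value = 0;
    for (char c : hex) {
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return std::nullopt;
        value = (value << 4) | static_cast<std::uint32_t>(digit);
    }
    return ProfileLevelId{
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>(value & 0xFF)};
}

// Simplified: Baseline (66) with constraint_set1 (0x40) set. RFC 6184 also
// allows other profile/constraint combinations that this check ignores.
bool isConstrainedBaseline(const ProfileLevelId& p)
{
    return p.profileIdc == 66 && (p.constraints & 0x40) != 0;
}
```

A negotiator built along these lines would prefer an offered payload type whose profile-level-id passes this check, since that is the profile every major browser can decode.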
Signalling callbacks and main
```cpp
    void onAnnounce(const int& status)
    {
        if (status != 200)
            std::cerr << "Auth failed: " << status << '\n';
    }

    void onStateChange(void*, smpl::ClientState& state, const smpl::ClientState&)
    {
        std::cout << "Client: " << state.toString() << '\n';
        if (state.id() == smpl::ClientState::Online) {
            std::cout << "Online as " << client.ourID() << '\n';
            client.joinRoom("public");
            createSession();
        }
    }

    void onCreatePresence(smpl::Peer& peer)
    {
        peer["agent"] = "Icey";
        peer["type"] = "streamer";
    }

    void shutdown()
    {
        stopStreaming();
        stream.close();
        if (session)
            session->hangup("shutdown");
        session.reset();
        encoder.reset();
        capture.reset();
        client.close();
    }
};
```
Symple presence management. When the client comes online, it joins a room and creates a peer session. When the browser peer sees the streamer’s presence, it initiates a call. The streamer accepts and the pipeline starts. Shutdown is the reverse: stop the stream, hang up the call, close the client.
```cpp
int main(int argc, char** argv)
{
    Logger::instance().add(std::make_unique<ConsoleChannel>("debug", Level::Debug));

    smpl::Client::Options opts;
    opts.host = "127.0.0.1";
    opts.port = 4500;

    for (int i = 1; i + 1 < argc; i += 2) {
        std::string key = argv[i];
        std::string val = argv[i + 1];
        if (key == "-host") opts.host = val;
        else if (key == "-port") opts.port = static_cast<uint16_t>(std::stoi(val));
        else if (key == "-token") opts.token = val;
        else if (key == "-user") opts.user = val;
        else if (key == "-name") opts.name = val;
    }

    if (opts.user.empty()) {
        opts.user = "webcam-streamer";
        opts.name = "Webcam Streamer";
    }

    WebcamStreamer app(opts);
    app.start();

    waitForShutdown([](void* opaque) {
        reinterpret_cast<WebcamStreamer*>(opaque)->shutdown();
    }, &app);

    Logger::destroy();
    return 0;
}
```
Argument parsing and lifecycle. waitForShutdown keeps the libuv event loop running until the process receives SIGINT or SIGTERM, then invokes the callback, which calls shutdown() on the app. That’s the entire application.
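For readers curious how a shutdown wait can work without a framework, here is a portable pattern using only the C++ standard library: a handler that sets a `volatile std::sig_atomic_t` flag, and a loop that polls it. The demo namespace, installStopHandlers, and waitForStop are hypothetical stand-ins for illustration; per the description above, Icey’s waitForShutdown keeps the libuv event loop running instead of polling.

```cpp
#include <chrono>
#include <csignal>
#include <thread>

namespace demo { // hypothetical names, not Icey's API

// Written from the signal handler; volatile std::sig_atomic_t is the type
// the standard guarantees can be assigned safely there.
volatile std::sig_atomic_t g_stopRequested = 0;

void onStopSignal(int) { g_stopRequested = 1; }

// Route SIGINT (Ctrl+C) and SIGTERM (kill) to the same handler.
void installStopHandlers()
{
    std::signal(SIGINT, onStopSignal);
    std::signal(SIGTERM, onStopSignal);
}

// Block until a stop signal has arrived, then run the shutdown callback once.
template <typename Fn>
void waitForStop(Fn&& shutdown, std::chrono::milliseconds poll = std::chrono::milliseconds(50))
{
    while (g_stopRequested == 0)
        std::this_thread::sleep_for(poll);
    shutdown();
}

} // namespace demo
```

The handler does nothing but set the flag, because very little is safe to call from inside a signal handler; the real teardown runs afterwards on the normal thread.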
Building It
```shell
git clone https://github.com/nilstate/icey.git
cd icey
cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_SAMPLES=ON
cmake --build build --parallel $(nproc)
```
CMake fetches libdatachannel (which brings libjuice, usrsctp, and libsrtp as submodules), discovers system FFmpeg and OpenSSL, and builds everything. The whole tree builds in minutes against the OpenSSL your project already links against, with no separate WebRTC build step.
For integration into your own project:
```cmake
include(FetchContent)
FetchContent_Declare(icey
  GIT_REPOSITORY https://github.com/nilstate/icey.git
  GIT_TAG v2.4.10
)
FetchContent_MakeAvailable(icey)
target_link_libraries(myapp PRIVATE Icey::webrtc)
```
Going Deeper
The webcam streamer uses PeerSession (Layer 3), which is a convenience wrapper. You don’t have to use it.
Icey’s WebRTC module is three layers, each independently usable. If your signalling is plain WebSocket, MQTT, or REST, drop to Layer 1 and Layer 2:
```cpp
// Assumes `config` is an rtc::Configuration and `capture` is an opened
// av::MediaCapture, as in the webcam streamer above.

// Layer 1: create tracks directly on a PeerConnection
auto pc = std::make_shared<rtc::PeerConnection>(config);
av::VideoCodec codec = wrtc::CodecNegotiator::resolveWebRtcVideoCodec(
    av::VideoCodec("H264", "libx264", 1280, 720, 30));
auto video = wrtc::createVideoTrack(pc, codec);

// Layer 2: wire into a PacketStream with encoder
auto encoder = std::make_shared<av::VideoPacketEncoder>();
capture->getEncoderVideoCodec(encoder->iparams);
encoder->oparams = codec;
wrtc::WebRtcTrackSender sender(video);

PacketStream stream;
stream.attachSource(capture.get(), false, true);
stream.attach(encoder, 1, true);
stream.attach(&sender, 5, false);
stream.start();
capture->start(); // start capture, as in the full streamer
```
You handle SDP exchange however you want. The WebRTC transport doesn’t care how the offer and answer got there; it only cares that they did. The full layer architecture is documented in the WebRTC module README.
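Because the transport only needs the SDP and candidates to arrive, the wire format is yours to choose. As an illustration, here is a hypothetical line-oriented format for relaying an offer, answer, or ICE candidate over any text channel (an MQTT payload, an HTTP body, a WebSocket frame). SignalMessage, encodeSignal, and decodeSignal are invented for this sketch. On the libdatachannel side you would typically produce messages from the PeerConnection’s onLocalDescription and onLocalCandidate callbacks and apply received ones with setRemoteDescription and addRemoteCandidate.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical signalling message: an offer, an answer, or one ICE candidate.
struct SignalMessage {
    std::string type;    // "offer", "answer" or "candidate"
    std::string mid;     // media stream id; only meaningful for candidates
    std::string payload; // SDP text or candidate line, carried verbatim
};

// Format: first line is "<type> <mid or ->", everything after it is the payload.
std::string encodeSignal(const SignalMessage& m)
{
    return m.type + " " + (m.mid.empty() ? "-" : m.mid) + "\n" + m.payload;
}

std::optional<SignalMessage> decodeSignal(const std::string& text)
{
    const auto nl = text.find('\n');
    if (nl == std::string::npos)
        return std::nullopt;
    std::istringstream head(text.substr(0, nl));
    SignalMessage m;
    if (!(head >> m.type >> m.mid))
        return std::nullopt;
    if (m.type != "offer" && m.type != "answer" && m.type != "candidate")
        return std::nullopt;
    if (m.mid == "-")
        m.mid.clear();
    m.payload = text.substr(nl + 1); // SDP keeps its own \r\n line endings
    return m;
}
```

Anything that delivers these strings intact between the two peers works as signalling; ordering only matters in that the remote description should be applied before its candidates.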
What’s Next
The webcam streamer is one of four samples that ship with Icey. The other three:
- file-streamer turns any video file into a live WebRTC stream. Feed an MP4 in, get real-time WebRTC out.
- media-recorder does the reverse: browser sends camera and microphone over WebRTC, the server decodes and writes to disk via FFmpeg.
- data-echo is the minimal starting point: WebRTC data channels without any media, for when you want the transport without the video.
The full source, build instructions, and the browser-side player are at github.com/nilstate/icey. 150 lines of application code, with the hard parts kept in the libraries that already solved them.