WebRTC in 150 Lines of C++

Kam Low · 9 min read

The first time you try to build Google’s libwebrtc, you run gclient sync and watch 20GB of Chromium toolchain download to your machine. Then you wait. An hour, maybe two, depending on the hardware. The build system is GN and Ninja, not CMake, because Google doesn’t use CMake and doesn’t care that you do. When it finally finishes, you have a 50MB static library that bundles its own fork of OpenSSL called BoringSSL, which is binary-incompatible with the OpenSSL your project already links against. You now have two TLS libraries in the same process. The linker errors are spectacular.

This is the experience that thousands of C++ developers have had, and most of them arrive at the same conclusion: WebRTC in native C++ is not worth the pain. They give up, or they shell out to a managed service, or they use a different language entirely.

It doesn’t have to be this way.

What follows is a complete WebRTC media server in C++20 that captures a camera and streams it to a browser. The whole application is 150 lines. It compiles in minutes, links against system OpenSSL, and integrates into any CMake project with a single FetchContent block. No Google toolchain. No BoringSSL. No misery.

The Hostage Situation

For over a decade, Google’s libwebrtc has been the only serious native WebRTC implementation. Not because it’s good, but because the protocol is complex enough that nobody wanted to build an alternative from scratch. ICE candidate gathering, DTLS key exchange, SRTP encryption, RTP packetisation with retransmission and bandwidth estimation; the transport layer alone is thousands of pages of RFCs. So everyone used Google’s monolith.

Surviving the build is just the entrance fee. The real pain is the API. The PeerConnectionFactory requires a signalling thread, a networking thread, and a worker thread, all managed manually. Creating a single track involves MediaStreamTrackInterface, AudioSourceInterface or VideoTrackSourceInterface, frame format adapters, and careful threading discipline because most objects are thread-affine. The callback interfaces change between releases without notice. The documentation is the Chromium source code.

A minimal camera-to-browser streamer with libwebrtc runs to about 800-1200 lines, not counting build system configuration and thread management boilerplate. And that’s if you already know the API. If you don’t, add a month.

The escape route turned out to be simpler than anyone expected: stop trying to replace the whole thing. Replace the layers independently.

The Stack

Layer                   | Library        | Role
------------------------|----------------|-----------------------------------------------------------------
ICE, DTLS, SRTP         | libdatachannel | Transport. Not a Google fork; an independent C++ implementation by Paul-Louis Ageneau.
Capture, encode, decode | FFmpeg 5/6/7   | Media. The industry standard. Any codec FFmpeg supports, WebRTC can send.
Signalling              | Symple         | SDP and ICE candidate exchange over WebSocket.
Pipeline glue           | Icey           | PacketStream architecture that ties capture, encoding, and transport into a single composable chain.
TURN relay              | Icey           | Self-hosted RFC 5766 TURN server. No third-party relay service.

libdatachannel handles everything below the application layer: ICE candidate gathering via libjuice, DTLS key exchange via OpenSSL (the real one), SRTP encryption via libsrtp, RTP packetisation with NACK retransmission and REMB bandwidth estimation. It is the pipe. Icey is the water and the faucet.

The Pipeline

Icey’s core abstraction is PacketStream: a chain of sources, processors, and sinks that data flows through. A source emits packets. A processor transforms them. A sink consumes them. You snap components together and call start().
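The pattern itself is ordinary C++. A stripped-down sketch (hypothetical names, not Icey's actual classes; the real PacketStream adds threading, lifecycle, and ordering controls) shows the shape of the chain:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the source → processor → sink pattern.
struct Packet {
    std::string data;
};

class MiniStream {
public:
    using Stage = std::function<Packet(Packet)>;

    // Processors transform packets in attachment order.
    void attach(Stage stage) { stages_.push_back(std::move(stage)); }

    // The sink consumes whatever falls out of the last processor.
    void attachSink(std::function<void(const Packet&)> sink) { sink_ = std::move(sink); }

    // A source calls emit() once per packet it produces.
    void emit(Packet p)
    {
        for (auto& stage : stages_)
            p = stage(std::move(p));
        if (sink_)
            sink_(p);
    }

private:
    std::vector<Stage> stages_;
    std::function<void(const Packet&)> sink_;
};
```

Everything else in this article is an instance of this shape: a capture source emitting frames, an encoder stage transforming them, a WebRTC sender consuming the result.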

For WebRTC, the pipeline is:

MediaCapture → VideoPacketEncoder → WebRtcTrackSender → browser
   (camera)     (FFmpeg H.264)       (RTP packetise,
                                      SRTP encrypt,
                                      ICE send)

In code:

stream.attachSource(capture.get(), false, true);
stream.attach(encoder, 1, true);
stream.attach(&session->media().videoSender(), 5, false);
stream.start();
capture->start();

Five lines. Camera frames flow into the H.264 encoder, then into the WebRTC track sender, which hands encoded NAL units to libdatachannel for RTP packetisation and encrypted transport. The browser receives standard WebRTC media and renders it in a <video> element. It has no idea the server is five lines of pipeline wiring.

With libwebrtc, this same step requires manually creating a VideoTrackSource, implementing an AdaptedVideoTrackSource subclass to handle format conversion, registering it with a PeerConnectionFactory, adding the track to a transceiver, and hoping your threading model is correct. About 150-200 lines on its own. We just did it in five.

The Code

Here is the complete webcam streamer. Everything that follows is the actual application; nothing has been removed for brevity.

Includes and setup

#include "icy/application.h"
#include "icy/av/devicemanager.h"
#include "icy/av/mediacapture.h"
#include "icy/av/videopacketencoder.h"
#include "icy/logger.h"
#include "icy/packetstream.h"
#include "icy/symple/client.h"
#include "icy/webrtc/codecnegotiator.h"
#include "icy/webrtc/peersession.h"
#include "symplesignaller.h"

#include <iostream>
#include <memory>
#include <string>

using namespace icy;

#define USE_CAMERA 0

Ten project includes plus three standard headers. The Icey modules we need: base (PacketStream, signals), av (capture, encoding, the VideoPacketEncoder pipeline processor), symple (signalling), webrtc (peer session), and the sample's own symplesignaller.h. USE_CAMERA toggles between a real camera and a test file; the pipeline is identical either way.

For comparison, a libwebrtc project typically starts with 30-40 includes pulled from api/, media/, pc/, rtc_base/, and various internal headers that aren’t part of any stable public API.

The class

class WebcamStreamer
{
public:
    smpl::Client client;
    std::unique_ptr<wrtc::SympleSignaller> signaller;
    std::unique_ptr<wrtc::PeerSession> session;
    std::shared_ptr<av::MediaCapture> capture;
    std::shared_ptr<av::VideoPacketEncoder> encoder;
    PacketStream stream;
    av::Device::VideoCapability videoCap;

    WebcamStreamer(const smpl::Client::Options& opts)
        : client(opts)
        , stream("webcam-stream")
    {
    }

Seven members. A Symple client for signalling, a peer session that manages the WebRTC connection, a media capture source, a video encoder, a packet stream for the pipeline, and video capabilities. No framework. No factory hierarchies. No plugin registry. This is the entire state of a WebRTC media server.

Opening the video source

    void start()
    {
        capture = std::make_shared<av::MediaCapture>();
        capture->openFile(ICY_DATA_DIR "/test.mp4");
        capture->setLoopInput(true);
        capture->setLimitFramerate(true);
        videoCap = {640, 480, 30, 30, "yuv420p"};

        client.Announce += slot(this, &WebcamStreamer::onAnnounce);
        client.StateChange += slot(this, &WebcamStreamer::onStateChange);
        client.CreatePresence += slot(this, &WebcamStreamer::onCreatePresence);
        client.connect();
    }

MediaCapture wraps FFmpeg’s demuxer and decoder. For a real camera, you’d use DeviceManager to negotiate resolution and framerate, then call openVideo() instead of openFile(). The rest is identical; the pipeline doesn’t care where the frames come from.

The signal-slot wiring (+=) is Icey’s event system. Type-safe, zero-allocation for small captures, compiles to a direct function call when there’s a single listener.
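A toy version of such a signal type shows the mechanics of +=. This is a hypothetical illustration, far simpler than Icey's real signals, which add slot objects, disconnection, and the single-listener fast path mentioned above:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy signal: listeners attach with +=, the emitter invokes them with
// operator(). Not Icey's implementation; just the core idea.
template <typename... Args>
class Signal {
public:
    Signal& operator+=(std::function<void(Args...)> fn)
    {
        listeners_.push_back(std::move(fn));
        return *this;
    }

    void operator()(Args... args)
    {
        for (auto& fn : listeners_)
            fn(args...);
    }

private:
    std::vector<std::function<void(Args...)>> listeners_;
};
```

With a type like this, `client.Announce += slot(...)` is just listener registration, and the client fires the signal when the server responds.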

Creating the WebRTC session

    void createSession()
    {
        // Configure H.264 with WebRTC-safe settings for browser playback
        av::VideoCodec videoCodec = wrtc::CodecNegotiator::resolveWebRtcVideoCodec(
            av::VideoCodec("H264", "libx264",
                videoCap.width, videoCap.height, videoCap.maxFps));

        wrtc::PeerSession::Config config;
        config.rtcConfig.iceServers.emplace_back("stun:stun.l.google.com:19302");
        config.mediaOpts.videoCodec = videoCodec;
        config.enableDataChannel = false;

        signaller = std::make_unique<wrtc::SympleSignaller>(client);
        session = std::make_unique<wrtc::PeerSession>(*signaller, config);

        session->IncomingCall += [this](const std::string& peerId) {
            std::cout << "Incoming call from " << peerId << '\n';
            session->accept();
        };

        session->StateChanged += [this](wrtc::PeerSession::State state) {
            std::cout << "Call state: " << wrtc::stateToString(state) << '\n';
            if (state == wrtc::PeerSession::State::Active)
                startStreaming();
            else if (state == wrtc::PeerSession::State::Ended)
                stopStreaming();
        };

        session->media().BitrateEstimate += [](unsigned int bps) {
            std::cout << "REMB: " << bps / 1000 << " kbps" << '\n';
        };

        session->media().KeyframeRequested += []() {
            std::cout << "PLI: keyframe requested" << '\n';
        };
    }

This is where the complexity lives, and there isn’t much of it.

PeerSession manages the full WebRTC lifecycle: SDP offer/answer exchange, ICE candidate trickle, DTLS handshake, SRTP setup. You give it a signalling backend and codec preferences; it handles the rest. IncomingCall fires when a browser peer requests a call. StateChanged tells us when the DTLS handshake completes and media can flow.

BitrateEstimate and KeyframeRequested are RTCP feedback signals. REMB tells you how much bandwidth the receiver has; PLI requests a keyframe when packets are lost. In production you’d wire these into the encoder for adaptive bitrate. Here we log them to show they’re working. The CodecNegotiator::resolveWebRtcVideoCodec(...) call above centralises the browser-safe H.264 defaults instead of scattering them through samples.

Underneath all of this, libdatachannel is doing the real work: libjuice gathers ICE candidates and punches through NATs, OpenSSL negotiates the DTLS handshake, libsrtp encrypts every RTP packet, and the H.264 packetiser breaks encoded frames into MTU-sized RTP payloads with sequence numbers, timestamps, and marker bits. None of this is our code. We don’t touch the transport layer. We configure it and let it run.
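To make "MTU-sized RTP payloads" concrete, here is a schematic of the chunking step. This is not libdatachannel's code; the real H.264 packetiser also handles NAL unit fragmentation (FU-A), payload headers, and SSRC. It shows only the splitting, sequencing, and the marker bit on a frame's final packet:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Schematic RTP-style packetisation: one encoded frame becomes several
// chunks, each stamped with a sequence number and the frame's timestamp.
struct RtpChunk {
    std::uint16_t seq;                 // monotonically increasing per packet
    std::uint32_t timestamp;           // same for every chunk of one frame
    bool marker;                       // true on the last chunk of the frame
    std::vector<std::uint8_t> payload;
};

std::vector<RtpChunk> packetise(const std::vector<std::uint8_t>& frame,
                                std::uint16_t& seq,
                                std::uint32_t timestamp,
                                std::size_t mtu = 1200)
{
    std::vector<RtpChunk> out;
    for (std::size_t off = 0; off < frame.size(); off += mtu) {
        const std::size_t len = std::min(mtu, frame.size() - off);
        out.push_back({seq++, timestamp, off + len == frame.size(),
                       {frame.begin() + off, frame.begin() + off + len}});
    }
    return out;
}
```

The marker bit is what lets the receiver reassemble a frame before handing it to the decoder; NACK retransmission works off the sequence numbers.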

The equivalent in libwebrtc is a PeerConnectionObserver subclass with about fifteen virtual methods you’re expected to implement, most of them undocumented, some of them called from threads you didn’t create. Good luck.

Starting the pipeline

    void startStreaming()
    {
        if (!session || !session->media().hasVideo())
            return;

        // Create the H.264 encoder for the pipeline
        encoder = std::make_shared<av::VideoPacketEncoder>();
        capture->getEncoderVideoCodec(encoder->iparams);
        encoder->oparams = wrtc::CodecNegotiator::resolveWebRtcVideoCodec(
            av::VideoCodec("H264", "libx264",
                videoCap.width, videoCap.height, videoCap.maxFps));

        // Pipeline: capture → encoder → WebRTC sender
        stream.attachSource(capture.get(), false, true);
        stream.attach(encoder, 1, true);
        stream.attach(&session->media().videoSender(), 5, false);
        stream.start();
        capture->start();
        std::cout << "Streaming started" << '\n';
    }

The core of the entire application. The VideoPacketEncoder takes decoded frames from the capture source and produces H.264 NAL units. The WebRTC track sender packetises them into RTP, encrypts with SRTP, and sends over the ICE transport to the browser. The encoder’s options map passes FFmpeg parameters directly; baseline profile and zerolatency tune ensure every browser can decode the stream without buffering delay.

Signalling callbacks and main

    void onAnnounce(const int& status)
    {
        if (status != 200)
            std::cerr << "Auth failed: " << status << '\n';
    }

    void onStateChange(void*, smpl::ClientState& state, const smpl::ClientState&)
    {
        std::cout << "Client: " << state.toString() << '\n';
        if (state.id() == smpl::ClientState::Online) {
            std::cout << "Online as " << client.ourID() << '\n';
            client.joinRoom("public");
            createSession();
        }
    }

    void onCreatePresence(smpl::Peer& peer)
    {
        peer["agent"] = "Icey";
        peer["type"] = "streamer";
    }

    void shutdown()
    {
        stopStreaming();
        stream.close();
        if (session)
            session->hangup("shutdown");
        session.reset();
        encoder.reset();
        capture.reset();
        client.close();
    }
};

Symple presence management. When the client comes online, it joins a room and creates a peer session. When the browser peer sees the streamer’s presence, it initiates a call. The streamer accepts and the pipeline starts. Shutdown is the reverse: stop the stream, hang up the call, close the client.

int main(int argc, char** argv)
{
    Logger::instance().add(std::make_unique<ConsoleChannel>("debug", Level::Debug));

    smpl::Client::Options opts;
    opts.host = "127.0.0.1";
    opts.port = 4500;

    for (int i = 1; i + 1 < argc; i += 2) {
        std::string key = argv[i];
        std::string val = argv[i + 1];
        if (key == "-host") opts.host = val;
        else if (key == "-port") opts.port = static_cast<uint16_t>(std::stoi(val));
        else if (key == "-token") opts.token = val;
        else if (key == "-user") opts.user = val;
        else if (key == "-name") opts.name = val;
    }

    if (opts.user.empty()) {
        opts.user = "webcam-streamer";
        opts.name = "Webcam Streamer";
    }

    WebcamStreamer app(opts);
    app.start();

    waitForShutdown([](void* opaque) {
        reinterpret_cast<WebcamStreamer*>(opaque)->shutdown();
    }, &app);

    Logger::destroy();
    return 0;
}

Argument parsing and lifecycle. waitForShutdown blocks on a signal handler (SIGINT/SIGTERM) so the libuv event loop runs until you kill the process. That’s the entire application.
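The underlying idea reduces to a flag flipped by a signal handler. A minimal standard C++ sketch, assuming nothing about Icey's internals beyond what's stated above (the real waitForShutdown also pumps the libuv loop and invokes the caller's cleanup function):

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <thread>

// Minimal shutdown latch: the signal handler flips an atomic flag,
// and the wait loop polls it until the process is told to stop.
std::atomic<bool> g_stop{false};

void onSignal(int) { g_stop.store(true); }

void waitForStop()
{
    std::signal(SIGINT, onSignal);
    std::signal(SIGTERM, onSignal);
    while (!g_stop.load())
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

An event-loop version replaces the sleep with one loop iteration, which is why the application keeps servicing sockets and timers right up until shutdown.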

Building It

git clone https://github.com/nilstate/icey.git
cd icey
cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_SAMPLES=ON
cmake --build build --parallel $(nproc)

CMake fetches libdatachannel (which brings libjuice, usrsctp, and libsrtp as submodules), discovers system FFmpeg and OpenSSL, and builds everything. The whole tree builds in minutes against the OpenSSL your project already links against, with no separate WebRTC build step and no Google toolchain anywhere in sight.

For integration into your own project:

include(FetchContent)
FetchContent_Declare(icey
    GIT_REPOSITORY https://github.com/nilstate/icey.git
    GIT_TAG v2.4.10
)
FetchContent_MakeAvailable(icey)
target_link_libraries(myapp PRIVATE Icey::webrtc)

Going Deeper

The webcam streamer uses PeerSession (Layer 3), which is a convenience wrapper. You don’t have to use it.

Icey’s WebRTC module is three layers, each independently usable. If your signalling is plain WebSocket, MQTT, or REST, drop to Layer 1 and Layer 2:

// Layer 1: create tracks directly on a PeerConnection
auto pc = std::make_shared<rtc::PeerConnection>(config);
av::VideoCodec codec = wrtc::CodecNegotiator::resolveWebRtcVideoCodec(
    av::VideoCodec("H264", "libx264", 1280, 720, 30));
auto video = wrtc::createVideoTrack(pc, codec);

// Layer 2: wire into a PacketStream with encoder
auto encoder = std::make_shared<av::VideoPacketEncoder>();
capture->getEncoderVideoCodec(encoder->iparams);
encoder->oparams = codec;

wrtc::WebRtcTrackSender sender(video);
PacketStream stream;
stream.attachSource(capture);
stream.attach(encoder, 1, true);
stream.attach(&sender, 5, false);
stream.start();

You handle SDP exchange however you want. The WebRTC transport doesn’t care how the offer and answer got there; it only cares that they did. The full layer architecture is documented in the WebRTC module README.

What’s Next

The webcam streamer is one of four samples that ship with Icey:

  • file-streamer turns any video file into a live WebRTC stream. Feed an MP4 in, get real-time WebRTC out.
  • media-recorder does the reverse: browser sends camera and microphone over WebRTC, the server decodes and writes to disk via FFmpeg.
  • data-echo is the minimal starting point: WebRTC data channels without any media, for when you want the transport without the video.

The full source, build instructions, and the browser-side player are at github.com/nilstate/icey.

150 lines. Camera to browser. No Google.