This is a long-term, very much in-progress project, so this post currently serves primarily as a progress tracker and collection of notes on the project, along with some information about what I've accomplished so far.
A recreation of a 5 minute presentation I gave at Recurse Center for the project.
So, earlier this year I was working on my first React Native application, a small app to plan events. It was my first mobile project in general, and at the time I started it on a Windows machine, hoping to avoid the pains that I assumed might accompany mobile development package management.
It didn't take long to realize that it's very, very hard to develop for iOS on anything other than a Mac, because the tooling to "talk" to iPhones for development relies on Xcode. That means I couldn't use a physical iPhone, or even run an emulator (a "fake" phone running on my computer).
For Android development, Google is much nicer about it, and lets you run emulators and bridge a physical phone over on all platforms. They recommend using Android Studio, which is fine, but a pain in the neck. The process to create an emulator in Android Studio is pretty straightforward, but requires a lot of clicking and is quite tedious and magical. You choose a "skin" and an android image, sometimes configure the virtual device a bit further with their GUI, and then run it from the app.
I used it for the project, and it worked well enough for my needs. Metro, the bundler that creates and ships APKs based on your react native project, works well with Android Studio's emulator, and at the time it was only moderately annoying to work with.
Fast forward a few months, and I was returning to the project, but this time I was running NixOS. It was the first half of my Recurse Center batch, and I was still figuring a lot of things out. One of the things that nix claims to be very good at is reproducible builds, meaning that I would expect packages in the nixpkgs package registry to "just work."
So, I tried adding android-studio to my developer environment, and then ran it. It launched!
I browsed through the configuration options to set up an emulator again, and, upon launching the emulator, it would report that the emulator was booting, but it would fail to actually start.
Debugging this, as a pretty new nix user at the time, was painful. I ran android studio from the command line and enabled debug logs. It looked like there were a whole bunch of missing dependencies, not for android studio itself, but for the emulators it was spawning. Part of the issue is that android studio usually pulls emulator images when you set up the emulators, not when you install android studio, so while the build of android studio that I got was working just fine, the emulators that it was pulling were broken, likely because of upstream changes.
Android Studio is very heavy though; it's an entire JetBrains IDE. I find it really clunky to use, and when things don't work it's a pain in the neck to debug, since it's a massive GUI desktop app that does so many things under the hood.
Plus, the licensing and codebase for it seem a bit sketchy and abstruse.
The code is under Apache-2.0, but: If one selects Help -> Licenses in Android Studio, the dialog shows the following: "Android Studio includes proprietary code subject to separate license, including JetBrains CLion(R) (www.jetbrains.com/clion) and IntelliJ(R) IDEA Community Edition (www.jetbrains.com/idea)." Also: For actual development the Android SDK is required and the Google binaries are also distributed as proprietary software (unlike the source-code itself).
I like open source tooling, and I don't like magic. So, I decided that it would be productive to delve deeper and figure out how to do what it's doing without the entire desktop app.
The general breakdown is that there's an Android SDK toolkit that provides a ton of command line utilities for android development and debugging. The SDK is included in distributions of Android Studio, and its tools are what Android Studio uses under the hood. Furthermore, nixpkgs has already packaged them as android-tools (source here). Because the android build system is so annoying to deal with, someone created a project to build these CLI tools with CMake, which is what the official nixpkgs package uses.
There are a lot of CLI tools for dealing with android, but it turns out that a few critical ones get an emulator up and running, and give you a better conceptual understanding of the magic behind android studio.
First things first: you need to set ANDROID_SDK_ROOT to the directory with the various binary CLI tools we were just talking about (so, if they're in your PATH, you can figure out what to set ANDROID_SDK_ROOT to by running which emulator, where emulator is one of the many android tools). Since I'm using nix, I can easily keep things simple and automated with a devshell.
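As a rough sketch of that "figure it out from your PATH" trick (this assumes the usual SDK layout where the binary sits at $SDK/emulator/emulator; adjust the number of dirname hops if your layout differs):

# Sketch: derive ANDROID_SDK_ROOT from wherever the emulator binary lives on PATH.
# Assumes the standard layout $ANDROID_SDK_ROOT/emulator/emulator.
export ANDROID_SDK_ROOT="$(dirname "$(dirname "$(readlink -f "$(which emulator)")")")"
echo "Using SDK at $ANDROID_SDK_ROOT"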
Okay, now we want to create a virtual device, so we do something along the lines of
avdmanager create avd -n my_avd -k "system-images;android-30;google_apis;x86" -d "pixel"
To create an "avd," which is basically a configuration file that specifies the details of some virtual android device that could be run as an emulator. AVD stands for "android virtual device."
We can check to make sure that our avd shows up by doing
avdmanager list avd
And then, to run it, use
emulator -avd my_avd
And then, if you're on nix, you'll be welcomed with a friendly message informing you that there's some dynamically linked dependency that you don't have, and that you'll have to wrap the binary with LD_LIBRARY_PATH. And then again.
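The workaround has this general shape (a sketch only; which library is actually missing, and which /nix/store path provides it, depends entirely on your system -- the path below is a placeholder, not something from my notes):

# Sketch: prepend the directory of whatever shared library the emulator complains about.
# "/nix/store/<hash>-some-library/lib" is a placeholder path.
export LD_LIBRARY_PATH="/nix/store/<hash>-some-library/lib:$LD_LIBRARY_PATH"
emulator -avd my_avd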
All of these android CLI tools are very powerful, and have a gagillion options that you can fiddle with.
I spent almost a week trying to get all the packages and settings fine-tuned for nix, and ended up getting very close -- I had a bash script that would create an avd, create a virtual sdcard for "external" device storage, and then actually run the emulator. But for some reason I was getting a segfault, and it wouldn't launch or give me any more useful error messages.
At that point I decided to do further research on android emulators on Nix, and found that Nix's standard library itself has already solved the problem.
There's "documentation" here that talks about android development on Nix. Here, they talk about some helper functions that you can use to create android SDKs with the exact binaries that you need, all nicely and purely packaged with nix.
With the help of that guide and reading some source, I was able to scrape together this flake, which gets you most of the lower level CLI tools you need for android development. It's not entirely trivial, and not every tool is necessary for every task, but it's quite good.
{
  description = "React native environment";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils, ... }@inputs:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs {
          inherit system;
          config = {
            allowUnfree = true;
            android_sdk.accept_license = true;
          };
        };

        pinnedJDK = pkgs.jdk17;
        buildToolsVersion = "34.0.0";
        ndkVersion = "25.1.8937393";
        androidComposition = pkgs.androidenv.composeAndroidPackages {
          cmdLineToolsVersion = "8.0";
          toolsVersion = "26.1.1";
          platformToolsVersion = "34.0.4";
          buildToolsVersions = [ buildToolsVersion "33.0.1" ];
          includeEmulator = false;
          emulatorVersion = "30.3.4";
          platformVersions = [ "34" ];
          includeSources = false;
          includeSystemImages = false;
          systemImageTypes = [ "google_apis_playstore" ];
          abiVersions = [ "armeabi-v7a" "arm64-v8a" ];
          cmakeVersions = [ "3.10.2" "3.22.1" ];
          includeNDK = true;
          ndkVersions = [ ndkVersion ];
          useGoogleAPIs = false;
          useGoogleTVAddOns = false;
          includeExtras = [ "extras;google;gcm" ];
        };
      in {
        devShells.default = pkgs.mkShell rec {
          packages = [
            pkgs.android-tools
            pkgs.nodejs
            pkgs.corepack
            pkgs.zulu17
          ];

          JAVA_HOME = pinnedJDK;
          ANDROID_SDK_ROOT = "${androidComposition.androidsdk}/libexec/android-sdk";
          ANDROID_NDK_ROOT = "${ANDROID_SDK_ROOT}/ndk-bundle";
          GRADLE_OPTS = "-Dorg.gradle.project.android.aapt2FromMavenOverride=${ANDROID_SDK_ROOT}/build-tools/${buildToolsVersion}/aapt2";
          shellHook = ''
            export PATH=$PATH:${androidComposition.androidsdk}/bin
            adb start-server
            adb devices
          '';
        };
      });
}
After reading through all the source though, a super neat function caught my eye: emulateApp. It isn't documented anywhere, but it was super promising. Here's the entire source code, which is definitely worth reading over. It's very similar to what I was doing, but they were able to figure out how to get all the flags just right.
{ composeAndroidPackages, stdenv, lib, runtimeShell }:
{ name
, app ? null
, platformVersion ? "33"
, abiVersion ? "armeabi-v7a"
, systemImageType ? "default"
, enableGPU ? false # Enable GPU acceleration. It's deprecated, instead use `configOptions` below.
, configOptions ? (
    # List of options to add in config.ini
    lib.optionalAttrs enableGPU
      (lib.warn
        "enableGPU argument is deprecated and will be removed; use configOptions instead"
        { "hw.gpu.enabled" = "yes"; }
      )
  )
, extraAVDFiles ? [ ]
, package ? null
, activity ? null
, androidUserHome ? null
, avdHomeDir ? null # Support old variable with non-standard naming!
, androidAvdHome ? avdHomeDir
, deviceName ? "device"
, sdkExtraArgs ? { }
, androidAvdFlags ? null
, androidEmulatorFlags ? null
}:

let
  sdkArgs = {
    includeEmulator = true;
    includeSystemImages = true;
  } // sdkExtraArgs // {
    cmdLineToolsVersion = "8.0";
    platformVersions = [ platformVersion ];
    systemImageTypes = [ systemImageType ];
    abiVersions = [ abiVersion ];
  };

  sdk = (composeAndroidPackages sdkArgs).androidsdk;
in
stdenv.mkDerivation {
  inherit name;

  buildCommand = ''
    mkdir -p $out/bin

    cat > $out/bin/run-test-emulator << "EOF"
    #!${runtimeShell} -e

    # We need a TMPDIR
    if [ "$TMPDIR" = "" ]
    then
        export TMPDIR=/tmp
    fi

    ${if androidUserHome == null then ''
      # Store the virtual devices somewhere else, instead of polluting a user's HOME directory
      export ANDROID_USER_HOME=$(mktemp -d $TMPDIR/nix-android-user-home-XXXX)
    '' else ''
      mkdir -p "${androidUserHome}"
      export ANDROID_USER_HOME="${androidUserHome}"
    ''}

    ${if androidAvdHome == null then ''
      export ANDROID_AVD_HOME=$ANDROID_USER_HOME/avd
    '' else ''
      mkdir -p "${androidAvdHome}"
      export ANDROID_AVD_HOME="${androidAvdHome}"
    ''}

    # We need to specify the location of the Android SDK root folder
    export ANDROID_SDK_ROOT=${sdk}/libexec/android-sdk

    ${lib.optionalString (androidAvdFlags != null) ''
      # If NIX_ANDROID_AVD_FLAGS is empty
      if [[ -z "$NIX_ANDROID_AVD_FLAGS" ]]; then
        NIX_ANDROID_AVD_FLAGS="${androidAvdFlags}"
      fi
    ''}

    ${lib.optionalString (androidEmulatorFlags != null) ''
      # If NIX_ANDROID_EMULATOR_FLAGS is empty
      if [[ -z "$NIX_ANDROID_EMULATOR_FLAGS" ]]; then
        NIX_ANDROID_EMULATOR_FLAGS="${androidEmulatorFlags}"
      fi
    ''}

    # We have to look for a free TCP port

    echo "Looking for a free TCP port in range 5554-5584" >&2

    for i in $(seq 5554 2 5584)
    do
        if [ -z "$(${sdk}/bin/adb devices | grep emulator-$i)" ]
        then
            port=$i
            break
        fi
    done

    if [ -z "$port" ]
    then
        echo "Unfortunately, the emulator port space is exhausted!" >&2
        exit 1
    else
        echo "We have a free TCP port: $port" >&2
    fi

    export ANDROID_SERIAL="emulator-$port"

    # Create a virtual android device for testing if it does not exist
    if [ "$(${sdk}/bin/avdmanager list avd | grep 'Name: ${deviceName}')" = "" ]
    then
        # Create a virtual android device
        yes "" | ${sdk}/bin/avdmanager create avd --force -n ${deviceName} -k "system-images;android-${platformVersion};${systemImageType};${abiVersion}" -p $ANDROID_AVD_HOME/${deviceName}.avd $NIX_ANDROID_AVD_FLAGS

        ${builtins.concatStringsSep "\n" (
          lib.mapAttrsToList (configKey: configValue: ''
            echo "${configKey} = ${configValue}" >> $ANDROID_AVD_HOME/${deviceName}.avd/config.ini
          '') configOptions
        )}

        ${lib.concatMapStrings (extraAVDFile: ''
          ln -sf ${extraAVDFile} $ANDROID_AVD_HOME/${deviceName}.avd
        '') extraAVDFiles}
    fi

    # Launch the emulator
    echo "\nLaunch the emulator"
    $ANDROID_SDK_ROOT/emulator/emulator -avd ${deviceName} -no-boot-anim -port $port $NIX_ANDROID_EMULATOR_FLAGS &

    # Wait until the device has completely booted
    echo "Waiting until the emulator has booted the ${deviceName} and the package manager is ready..." >&2

    ${sdk}/libexec/android-sdk/platform-tools/adb -s emulator-$port wait-for-device

    echo "Device state has been reached" >&2

    while [ -z "$(${sdk}/libexec/android-sdk/platform-tools/adb -s emulator-$port shell getprop dev.bootcomplete | grep 1)" ]
    do
        sleep 5
    done

    echo "dev.bootcomplete property is 1" >&2

    #while [ -z "$(${sdk}/libexec/android-sdk/platform-tools/adb -s emulator-$port shell getprop sys.boot_completed | grep 1)" ]
    #do
        #sleep 5
    #done

    #echo "sys.boot_completed property is 1" >&2

    echo "ready" >&2

    ${lib.optionalString (app != null) ''
      # Install the App through the debugger, if it has not been installed yet

      if [ -z "${package}" ] || [ "$(${sdk}/libexec/android-sdk/platform-tools/adb -s emulator-$port shell pm list packages | grep package:${package})" = "" ]
      then
          if [ -d "${app}" ]
          then
              appPath="$(echo ${app}/*.apk)"
          else
              appPath="${app}"
          fi

          ${sdk}/libexec/android-sdk/platform-tools/adb -s emulator-$port install "$appPath"
      fi

      # Start the application
      ${lib.optionalString (package != null && activity != null) ''
        ${sdk}/libexec/android-sdk/platform-tools/adb -s emulator-$port shell am start -a android.intent.action.MAIN -n ${package}/${activity}
      ''}
    ''}
    EOF
    chmod +x $out/bin/run-test-emulator
  '';
}
Basically, they generate a bash script that "hard codes" paths to the various packages (things like ${pkgs.hello} here evaluate to /nix/store/jfe...hash...jfei), which are the locations of specific, version-locked dependencies.
And it works! Using this function turns out to be extremely trivial. It's literally 4 lines of nix, since you don't need to build an android composition to use the function, even though you can.
So, here it is. Just a few lines of nix, and then nix run, and you have yourself a full-on, working, ready-to-go android emulator running on your computer.
emulator = pkgs.androidenv.emulateApp {
  name = "AndroidEmulator";
  platformVersion = "30";
  abiVersion = "x86_64"; # armeabi-v7a, mips, x86_64
  systemImageType = "google_apis_playstore";
};
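Running it is a one-two punch; this is a sketch of what I'd expect, assuming the derivation above is exposed as a flake package called emulator (that attribute name is mine, for illustration -- the wrapper script name comes from the emulateApp source above, which writes it to $out/bin/run-test-emulator):

# Sketch: build the emulateApp derivation, then run the generated wrapper.
nix build .#emulator
./result/bin/run-test-emulator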
It's so crazy cool, it just works!
After all of the suffering to get a good working android emulator, I had an idea: what if I could solve this problem for everyone, on every OS, everywhere, once and for all? What if I could offer android phones as a service, just using nix builds?
One of the massive pain points for Android development is actually building Android itself. Google provides scattered documentation that helps with this process, but it's very disparate, and just about everyone uses prebuilt images as a result.
In the process of tinkering, I came across a project that packaged all of google's android images with nix. It seemed pretty neat and promising here, since it would solve the issue of images changing over time in ways that break my setup because of differing dependencies.
I spent a long time trying to get an emulator working using the images it provided, and in the process I learned a ton about android cli tooling.
Tad Fisher's project that I just mentioned also packages the android sdk tools, and his packaging is much finer-grained.
android-sdk.packages = sdk: with sdk; [
  build-tools-34-0-0
  cmdline-tools-latest
  emulator
  platforms-android-34
  sources-android-34
];
One of the most important android cli tools is adb, the android debug bridge. adb is a CLI tool (and has a socket interface) that is used to communicate with android devices. It can automatically pick up on all android devices plugged into your computer over USB, devices on the local network with wireless debugging enabled, and even emulators (which will be important later).
It's very powerful. It can do just about everything. To use adb for android development, usually you begin by running
adb start-server
Which starts a server running on a socket on your computer that lets you use the adb cli client, or other community clients, to manipulate android devices.
What isn't as commonly known is that there's also a tcp/ip mode for adb, with adb tcpip. Once adb is running, you can then list out android devices with
adb devices
And start issuing commands. You can do things like (there's a quick example session sketched after this list):
- adb shell to enter a shell on the android phone itself
- adb push to push a file onto the device, and adb pull to yoink files off it
- adb install / adb uninstall to install or uninstall apks (android apps) on the device
- adb shell input tap x y to tap the screen at a specific spot
- and a ton of other manipulation commands to do things like entering text input, capturing screenshots, recording audio, and more.
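To make that concrete, here's a rough sketch of the kind of session this enables. The paths and coordinates are placeholders I made up, but the commands themselves are standard adb:

adb devices                               # confirm the phone/emulator shows up
adb install ./app-release.apk             # install a local APK (placeholder path)
adb shell input tap 540 960               # tap roughly mid-screen on a 1080x1920 display
adb exec-out screencap -p > screen.png    # pull a screenshot straight to the host
adb pull /sdcard/Download/log.txt .       # copy a file off the device (placeholder path)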
It's super powerful, and will play an important role in this project, as I'll discuss a bit later.
What I've learned while researching these tools is that they are really horribly documented. Google has docs that talk about the general processes, and, if you can figure out where to find things, you can get the source code for the CLI programs too. But, it seems like their main intent is to try to get people to use Android Studio as much as possible.
Appetize.io is a company that offers roughly just that: phones as a service. I found out about them later on when reviewing the react native documentation -- all the little phone widgets that say "Expo" on them are powered by appetize.io devices. They let you spin up android or iOS devices, and then embed them in webpages with decently low latency. They also expose an API, so that you can do CI/CD testing with proper phones.
But appetize comes at a cost. It's very expensive, and very restrictive. It isn't open source, and definitely isn't nixified.
It's worth mentioning them here before going further though, since what they do is genuinely really cool. Their main thing is automated testing, although they may also be able to do android development with the regular SDK tools too.
So, I started brainstorming. My idea was to stream nix-packaged android emulators to web browsers.
One other super cool thing that nix rocks at is containerizing things. Nix's standard library provides utilities to create docker images that contain all of the dependencies and transitive dependencies of a nix derivation (build). This means that you can implicitly refer to packages inside the function nix provides for creating docker images, and if they get mentioned then they get "baked" into the container. Part of what's so nice about dockerizing android emulators is that the processes live and die with the containers, since the qemu processes that the emulators create are kind of a pain to track and kill.
I am still working on getting them working in a docker container, but what I currently have, which is pretty close to what I expect to work, looks like this
{
  pkgs,
  android-tools,
  system,
  ...
}: let
  run-android-emulator = import ./emulator.nix {inherit pkgs;};
  android-sdk = import ./android-sdk.nix {inherit pkgs system android-tools;};
in
  pkgs.dockerTools.buildImage {
    name = "android-emulator";
    tag = "latest";

    copyToRoot = pkgs.buildEnv {
      name = "root";
      pathsToLink = ["/bin"];
      paths = [
        pkgs.jdk
        pkgs.coreutils
        pkgs.bash
        pkgs.busybox
        pkgs.bun
        android-sdk
      ];
    };

    config = {
      Env = [
        "JAVA_HOME=${pkgs.jdk}"
      ];
      # has the script that uses the emulate-app function
      Cmd = ["./${run-android-emulator}/bin/android-emulator"];
    };

    extraCommands = ''
      mkdir -p tmp
    '';
  }
Which does basically exactly what I just explained -- although we do need some dependencies that I didn't reference directly off of a nix object, like coreutils, and cli things like grep.
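For reference, this is roughly how I'd expect to build and run the image once it's working -- a sketch that assumes the image is exposed as a flake package named dockerImage (a placeholder name), and that KVM is passed through so the emulator can actually use hardware virtualization:

# Sketch: build the image with nix, load it into docker, and run it.
nix build .#dockerImage            # "dockerImage" is a placeholder attribute name
docker load < result               # dockerTools.buildImage produces a loadable tarball
docker run --rm -it --device /dev/kvm android-emulator:latest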
My goal was to have the easiest, most streamlined devex possible for the project. So, ideally the final product for the end user could just be a react component, or eventually even an iframe. Right now, the interface is this.
export default function App() {
  return <Android ip={["ws://127.0.0.1:8188"]} apiSecret="secret" />
}
The first goal is to stream video out of the phone at as low a latency as possible -- as close to literally 0 as possible (so in reality, ideally <300ms). Doing this, however, is actually pretty nontrivial.
The first step is figuring out how to get the screen out of the android device at all. Obviously this is possible, because when you run android emulators (without the -no-window flag) you get an interactive window that shows the screen of the android device. Android studio also natively provides this feature.
adb, which we talked about earlier, does do this out of the box with a built-in android utility called screenrecord. More information about that can be found here.
To use it, it looks something like this...
sudo apt-get install adb ffmpeg
adb exec-out screenrecord --output-format=h264 - |
  ffplay -framerate 60 -probesize 32 -sync video -
It does work, but unfortunately the latency is pretty high. My tests got about 6 seconds of latency, although this stackoverflow post's video seems to have gotten about a second or less with some ffmpeg tuning. I may return to this later, but for now I am using what I believe to be a better alternative, scrcpy.
Scrcpy ("screen copy") is a 3rd party utility that lets you stream an android device to a window at extremely low latency (30ms or less!). It "just works," and is available in nixpkgs (the nix package registry). Using it is literally as simple as plugging in an android device, or running an emulator (headless or headed, it doesn't care), and then running
scrcpy
And a window will open up on your computer with a live stream of the screen of the android phone. The window is interactive and very responsive, and audio works out of the box too.
The client (the window it opens up) has support for more advanced things too.
So, they seem to have solved the issue of adb having bad latency, but how?
To figure out how scrcpy works behind the scenes, they lay out a general schematic in their docs for contributors. I'm not going to reiterate everything they say there, but there are a few important takeaways if I want to use scrcpy's system of streaming at low latency.
The general idea of how scrcpy works is that you run it, and it ships an apk (android app) to the phone's temp directory, /data/local/tmp (which android phones have; they are also unix!). This apk exposes a server on the android phone on a scrcpy socket, which the scrcpy client can then access and send data through.
This scrcpy server that runs on the phone is implemented in java (as most android apps are), and just acts as a server running in the background.
It turns out that scrcpy streams data out over three "channels": the first connection streams video, followed by audio, and finally control data (interactions like gestures, which scrcpy also handles itself with a super low-latency custom binary interface).
Their client itself is very complex, and is implemented in raw C and uses some very advanced frameworks to optimize for very high performance. That's less relevant here.
Knowing that scrcpy is just using a server running on the phone was enticing though.
At this point, I really just wanted to pipe the video into ffmpeg so that I could do things with it, since I still didn't know how I would stream it, but I knew that, pretty much no matter what, ffmpeg should be able to do the necessary forwarding.
I did a bit of googling, and it looks like it is possible to do. I found a github issue, with a link to a VLC (the video viewing program) PR that fixes a latency issue having to do with how VLC throttles video stream outputs.
I was able to follow their steps, and the main pain point was getting adb to forward a tcp port.
To use scrcpy to stream video output, you need to push the scrcpy server to the phone and start it there, and then remap the port that scrcpy is using on the phone to a different port on your computer. They provide an example script that shows how to do this:
adb push scrcpy-server-v2.1 /data/local/tmp/scrcpy-server-manual.jar
adb forward tcp:1234 localabstract:scrcpy
adb shell CLASSPATH=/data/local/tmp/scrcpy-server-manual.jar \
    app_process / com.genymobile.scrcpy.Server 2.1 \
    tunnel_forward=true audio=false control=false cleanup=false \
    raw_stream=true max_size=1920
It's not that bad. Another important thing here is that you need to set raw_stream to true, since otherwise scrcpy sends some metadata at the start of its streams, which could stop ffmpeg from correctly interpreting the stream.
I found this great medium post that talks about how adb forward works, since it is horribly documented. They mention SERVICES.txt, which has even more helpful docs.
I didn't end up needing to fiddle around much with the default command that they say to use to forward the port, but I like having an understanding of how it works.
So, at this point I have a working script that can copy the scrcpy server over to an android phone, run it on the phone, and then stream out the screen.
Now, what's actually coming out of the port? Raw h264 video. Great! Viewing the screen using the vlc command they suggest does work, but it is very, very laggy, just like they say.
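ffplay can read the same forwarded port directly as a quick sanity check; here's a sketch, assuming the adb forward from above is still mapping the stream to tcp port 1234 (the low-latency flags are the usual suspects for this kind of thing, not values I've carefully tuned):

# Sketch: view the raw h264 coming out of the forwarded scrcpy socket.
ffplay -f h264 -fflags nobuffer -flags low_delay -probesize 32 tcp://127.0.0.1:1234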
Okay, I now (kinda) have a working stream coming out of the android device. This is pretty sick. Now comes the hard part: how do I take this output stream and get it to a web browser, with as low latency as possible?
I did some research here, and there's a ton of different streaming protocols that browsers support -- a google search reveals many, like HLS and WebRTC. HLS is really nice: it's just relaying video live over http, and you can access the media directly through a <video> element. Also, ffmpeg supports streaming out HLS out of the box, which is nice. Unfortunately though, it adds a lot of latency. To achieve the latency I want in this case, I really don't have any option other than WebRTC.
WebRTC is a peer to peer browser streaming protocol. It's pretty simple to use since browsers have a good unified API for it, where you essentially have a signaling websocket server that tells clients when and whom to connect to, and then they can stream things to each other from the browser itself, like video camera output.
That's great, but in my case I am doing a very centralized broadcast. I have media streaming off of a server, and want to ship that media to browsers directly. This is something that WebRTC isn't really designed for.
There's a few implementations of the WebRTC protocol, like Google's official C++ one, or a community rewrite in rust (of course). To use them, there are projects like node-webrtc, which expose a javascript API to set up webrtc connections. The issue here is that I would have to implement a lot of the handshake process myself, even if I didn't need to actually manage the packets and connection stuff, which is a lot of work that I'd rather not get caught up in.
It turns out that I'm not the first person who has wanted extremely low latency server-based WebRTC, and there are tools that do roughly exactly what I was trying to do. There's a super cool project called Janus Gateway that is designed to be a centralized WebRTC server. It can do a ton of different things, and is a very large C project. The build process is a huge pain in the neck, but thankfully it was already packaged with nix, so I didn't need to deal with it.
The important thing to know about Janus Gateway is that it has a whole bunch of plugins for common WebRTC media server tasks. They're documented on their website here, and some of them are quite fun.
There's a really neat project called Jangouts that lets you do google-hangouts-style conference calling through an open source server running Janus Gateway, which is a nice showcase of what the plugins can do.
Finally, there's an API to create plugins for Janus Gateway in C, lua, and even javascript, if none of the preexisting plugins work. I think this is really neat and may return to trying out their API, or write a native plugin once I learn C, but for now I found what I needed: the streaming plugin.
The streaming Janus Gateway plugin lets you broadcast prerecorded media over WebRTC, but, more importantly, it also lets you pass live media in over RTP. Here's how they word it:
the plugin is configured to listen on a few ports for RTP: this means that the plugin is implemented to receive RTP on those ports and relay them to all peers attached to that stream. Any tool that can generate audio/video RTP streams and specify a destination is good for the purpose: the examples section contains samples that make use of GStreamer (http://gstreamer.freedesktop.org/) but other tools like FFmpeg (http://www.ffmpeg.org/), LibAV (http://libav.org/) or others are fine as well. This makes it really easy to capture and encode whatever you want using your favourite tool, and then have it transparently broadcasted via WebRTC using Janus. Notice that we recently added the possibility to also add a datachannel track to an RTP streaming mountpoint: this allows you to send, via UDP, a text-based message to relay via datachannels (e.g., the title of the current song, if this is a radio streaming channel). When using this feature, though, beware that you'll have to stay within the boundaries of the MTU, as each message will have to stay within the size of an UDP packet.
Literally just what I want! How convenient. But now, we have to configure it.
To say it briefly, Janus is a pain in the neck to configure. It's a monster of a project and there are a billion different options that all need to be configured correctly.
We talked about this a bit earlier, but Janus is an all-in-one WebRTC server, so it is both a "peer" that shares media and the signaling server itself (the thing that tells clients to start getting content, and from whom). It does the signaling through an API that it exposes, which can either be in the form of Websockets or HTTP (or both!). It also exposes an "admin API" that you can use to query metadata about janus (sessions, handles, and so on).
First things first, Janus has its own DSL for configuring it. It's not that bad; it's mostly key-value pair stuff.
Janus provides a nice set of example configurations that have a ton of comments on their github. It's very helpful, and I would have been totally lost (well, more totally lost than the amount of totally lost that I was) had they not had example configurations.
There's a few different things we have to configure, starting with Janus's hosts and general configuration.

There were a few manual things I had to set up in the configuration. First, I explicitly enabled the streaming plugin. I'm not entirely sure if this is necessary, but I believe it would at least give me an error if it couldn't find it, which is good.
plugins: {
  enable = "janus.plugin.streaming"
}
Second, I set up an admin secret. janus does have support for token auth, and eventually I'll figure out how to use that. The auth isn't well documented, but to pass the secret when making a request you literally just add apisecret: "secret" to the request.
admin_secret = "secret" # String that all Janus requests must contain
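As a quick sanity check of the secret, you can poke the Janus HTTP API directly. This is a sketch that assumes the default HTTP transport settings from the example configs (port 8088, path /janus); if you've changed the transport config, adjust accordingly:

# Sketch: create a Janus session, passing the secret as the "apisecret" field.
curl -s -X POST http://127.0.0.1:8088/janus \
  -H 'Content-Type: application/json' \
  -d '{"janus": "create", "transaction": "sanity-check-1", "apisecret": "secret"}'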
There's also a configuration section called media, which seems to be where you put global media configuration settings.
I'll be honest in saying I don't totally understand every setting here, but I did a lot of trial and error to figure things out. It seems pretty important to enable nack_optimizations for this project, which significantly helped with stuttering.
media: {
  #ipv6 = true
  #ipv6_linklocal = true
  min_nack_queue = 1200
  rtp_port_range = "20000-40000"
  dtls_mtu = 1500
  no_media_timer = 2
  slowlink_threshold = 4
  twcc_period = 100
  dtls_timeout = 500

  # Janus can do some optimizations on the NACK queue, specifically when
  # keyframes are involved. Namely, you can configure Janus so that any
  # time a keyframe is sent to a user, the NACK buffer for that connection
  # is emptied. This allows Janus to ignore NACK requests for packets
  # sent shortly before the keyframe was sent, since it can be assumed
  # that the keyframe will restore a complete working image for the user
  # anyway (which is the main reason why video retransmissions are typically
  # required). While this optimization is known to work fine in most cases,
  # it can backfire in some edge cases, and so is disabled by default.
  nack_optimizations = true

  # If you need DSCP packet marking and prioritization, you can configure
  # the 'dscp' property to a specific values, and Janus will try to
  # set it on all outgoing packets using libnice. Normally, the specs
  # suggest to use different values depending on whether audio, video
  # or data are used, but since all PeerConnections in Janus are bundled,
  # we can only use one. You can refer to this document for more info:
  # https://tools.ietf.org/html/draft-ietf-tsvwg-rtcweb-qos-18#page-6
  # That said, DON'T TOUCH THIS IF YOU DON'T KNOW WHAT IT MEANS!
  dscp = 46
}
Getting the streaming plugin working at first seemed really annoying, since the plugins are all C files that have to be built and then placed in a specific folder on your system, which is not that portable. However, analyzing the janus cli (the thing you use to launch janus gateway), I figured out that you can set flags to specify where to look things up. Also, the nix build of janus comes with all the plugins!
$JANUS \
  -P "$JANUS_INSTALL" \
  -F "$JANUS_CONFIG_DIR" \
  -C "$JANUS_CONFIG"
Now these official names are really confusing. TLDR: $JANUS_INSTALL is the directory where the janus binary and plugin binaries live, $JANUS_CONFIG_DIR is the location where plugin configuration files go, and $JANUS_CONFIG is where the general configuration file goes.
Now, where is Janus installed? Well, I've installed it with nix, so the install location will look something like /nix/store/jfieao...hash....fjeioa/bin/janus, which could change and is bad practice to reference directly. Eventually, I'll generate a shell script using pkgs.writeShellScriptBin, where I can reference Janus's root path in the generator for the shell script as pkgs.janus. To get started though, I just did it in a janky way with which. I'll clean it up eventually.
The directory where janus lives looks like
/nix/store/46c284cqdgia0dxzmi8rs5vzwszxalwg-janus-gateway-1.2.3
├── bin
│ ├── janus
│ ├── janus-cfgconv
│ ├── janus-pp-rec
│ └── mjr2pcap
└── lib
└── janus
├── events
│ ├── libjanus_gelfevh.la
│ ├── libjanus_gelfevh.so -> libjanus_gelfevh.so.2.0.3
│ ├── libjanus_gelfevh.so.2 -> libjanus_gelfevh.so.2.0.3
│ ├── libjanus_gelfevh.so.2.0.3
│ ├── libjanus_sampleevh.la
│ ├── libjanus_sampleevh.so -> libjanus_sampleevh.so.2.0.3
│ ├── libjanus_sampleevh.so.2 -> libjanus_sampleevh.so.2.0.3
│ ├── libjanus_sampleevh.so.2.0.3
│ ├── libjanus_wsevh.la
│ ├── libjanus_wsevh.so -> libjanus_wsevh.so.2.0.3
│ ├── libjanus_wsevh.so.2 -> libjanus_wsevh.so.2.0.3
│ └── libjanus_wsevh.so.2.0.3
├── loggers
│ ├── libjanus_jsonlog.la
│ ├── libjanus_jsonlog.so -> libjanus_jsonlog.so.2.0.3
│ ├── libjanus_jsonlog.so.2 -> libjanus_jsonlog.so.2.0.3
│ └── libjanus_jsonlog.so.2.0.3
├── plugins
│ ├── libjanus_audiobridge.la
│ ├── libjanus_audiobridge.so -> libjanus_audiobridge.so.2.0.3
│ ├── libjanus_audiobridge.so.2 -> libjanus_audiobridge.so.2.0.3
│ ├── libjanus_audiobridge.so.2.0.3
│ ├── libjanus_echotest.la
│ ├── libjanus_echotest.so -> libjanus_echotest.so.2.0.3
│ ├── libjanus_echotest.so.2 -> libjanus_echotest.so.2.0.3
│ ├── libjanus_echotest.so.2.0.3
│ ├── libjanus_nosip.la
│ ├── libjanus_nosip.so -> libjanus_nosip.so.2.0.3
│ ├── libjanus_nosip.so.2 -> libjanus_nosip.so.2.0.3
│ ├── libjanus_nosip.so.2.0.3
│ ├── libjanus_recordplay.la
│ ├── libjanus_recordplay.so -> libjanus_recordplay.so.2.0.3
│ ├── libjanus_recordplay.so.2 -> libjanus_recordplay.so.2.0.3
│ ├── libjanus_recordplay.so.2.0.3
│ ├── libjanus_sip.la
│ ├── libjanus_sip.so -> libjanus_sip.so.2.0.3
│ ├── libjanus_sip.so.2 -> libjanus_sip.so.2.0.3
│ ├── libjanus_sip.so.2.0.3
│ ├── libjanus_streaming.la
│ ├── libjanus_streaming.so -> libjanus_streaming.so.2.0.3
│ ├── libjanus_streaming.so.2 -> libjanus_streaming.so.2.0.3
│ ├── libjanus_streaming.so.2.0.3
│ ├── libjanus_textroom.la
│ ├── libjanus_textroom.so -> libjanus_textroom.so.2.0.3
│ ├── libjanus_textroom.so.2 -> libjanus_textroom.so.2.0.3
│ ├── libjanus_textroom.so.2.0.3
│ ├── libjanus_videocall.la
│ ├── libjanus_videocall.so -> libjanus_videocall.so.2.0.3
│ ├── libjanus_videocall.so.2 -> libjanus_videocall.so.2.0.3
│ ├── libjanus_videocall.so.2.0.3
│ ├── libjanus_videoroom.la
│ ├── libjanus_videoroom.so -> libjanus_videoroom.so.2.0.3
│ ├── libjanus_videoroom.so.2 -> libjanus_videoroom.so.2.0.3
│ └── libjanus_videoroom.so.2.0.3
└── transports
├── libjanus_http.la
├── libjanus_http.so -> libjanus_http.so.2.0.3
├── libjanus_http.so.2 -> libjanus_http.so.2.0.3
├── libjanus_http.so.2.0.3
├── libjanus_pfunix.la
├── libjanus_pfunix.so -> libjanus_pfunix.so.2.0.3
├── libjanus_pfunix.so.2 -> libjanus_pfunix.so.2.0.3
├── libjanus_pfunix.so.2.0.3
├── libjanus_websockets.la
├── libjanus_websockets.so -> libjanus_websockets.so.2.0.3
├── libjanus_websockets.so.2 -> libjanus_websockets.so.2.0.3
└── libjanus_websockets.so.2.0.3
So to reference the plugins in the correct places, I can write a bash script that uses janus like this
echo "Starting janus in $PWD"
CONFIGS=./src/janus/configs
JANUS_INSTALL=$(dirname "$(dirname "$(which janus)")")
echo "$JANUS_INSTALL"
janus -P "$JANUS_INSTALL/lib/janus/plugins" -F "$CONFIGS" -C "./src/janus/janus.jcfg"
dirname just gets the directory something is located in, so I'm effectively hopping two directories up.
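Spelled out against the tree above (the hash is whatever your store path happens to be):

which janus                              # /nix/store/<hash>-janus-gateway-1.2.3/bin/janus
dirname "$(which janus)"                 # /nix/store/<hash>-janus-gateway-1.2.3/bin
dirname "$(dirname "$(which janus)")"    # /nix/store/<hash>-janus-gateway-1.2.3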
Once I was able to load the streaming plugin, I then had to configure it. They have this handy example of various streaming plugin setups for different use cases.
The first example that called out to me is this one:
# This is an example of an RTP source stream, which is what you'll need
# in the vast majority of cases: here, the Streaming plugin will bind to
# some ports, and expect media to be sent by an external source (e.g.,
# FFmpeg or Gstreamer). This sample listens on 5002 for audio (Opus) and
# 5004 for video (VP8), which is what the sample gstreamer script in the
# plugins/streams folder sends to. Whatever is sent to those ports will
# be the source of a WebRTC broadcast users can subscribe to.
#
rtp-sample: {
type = "rtp"
id = 1
description = "Opus/VP8 live stream coming from external source"
metadata = "You can use this metadata section to put any info you want!"
audio = true
video = true
audioport = 5002
audiopt = 111
audiocodec = "opus"
videoport = 5004
videopt = 100
videocodec = "vp8"
secret = "adminpwd"
}
It claims to do exactly what I want: take media in over RTP and broadcast it over WebRTC.
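To actually feed that mountpoint, the quoted docs suggest GStreamer or FFmpeg. I haven't locked in my exact pipeline yet, but a sketch of the ffmpeg side -- assuming the forwarded scrcpy stream from earlier is on tcp port 1234, and using the videoport/videopt of 5004/100 from the sample mountpoint above -- would look something like:

# Sketch: re-encode the raw h264 from scrcpy to VP8 and push it to the
# streaming mountpoint as RTP (port 5004, payload type 100, per the sample config).
ffmpeg -f h264 -i tcp://127.0.0.1:1234 \
  -an -c:v libvpx -deadline realtime -b:v 1M \
  -payload_type 100 -f rtp rtp://127.0.0.1:5004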
Okay, so now I just place all the janus configurations into the right spots, run janus using my hacky bash script, and pray it works.
It didn't at first, or for the first day of trying to get it to, but eventually I got it to a functional state where janus would at least run. Now what? I'm not streaming media to a browser yet, but I now have something I can stream RTP to on localhost.
The next step was to figure out how to connect to janus, which is totally a pain in the neck and nontrivial.
Modern browsers are designed to support WebRTC connections. They provide an API that you can use to create PeerConnections, do handshaking, set up media streams, and all that fun stuff.
It looks something like this (signalingServer is a websocket server). Note that this example is ai-written/modified, since it's a very minimal, short example of what you'd do to get webrtc working. ICE is a technique for establishing peer-to-peer connections with the help of some central server for discovering routes (google hosts a commonly used one).
// 1. Create peer connection
const pc = new RTCPeerConnection();

// 2. Create and set local description
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

// 3. Send offer to remote peer (via signaling server)
signalingServer.send(JSON.stringify(offer));

// 4. Receive answer from remote peer
signalingServer.onmessage = async (event) => {
  const answer = JSON.parse(event.data);
  await pc.setRemoteDescription(answer);
};

// 5. Exchange ICE candidates
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signalingServer.send(JSON.stringify(event.candidate));
  }
};

// 6. Handle incoming ICE candidates
signalingServer.onmessage = async (event) => {
  const iceCandidate = JSON.parse(event.data);
  await pc.addIceCandidate(iceCandidate);
};

// 7. Handle getting streams
pc.ontrack = (event) => {
  const [remoteStream] = event.streams;
  console.log("Received remote stream", remoteStream);
  // Use the remoteStream, e.g., attach it to a video element
  const videoElement = document.querySelector("#remoteVideo");
  videoElement.srcObject = remoteStream;
};

// 8. Connection established
pc.onconnectionstatechange = (event) => {
  if (pc.connectionState === "connected") {
    console.log("WebRTC connection established!");
  }
};
In this case though, I didn't actually create the websocket server, and janus's is much more complicated than this. It isn't a matter of just accepting the first message received; rather, janus requires you to actually have a "conversation" and tell it what you want -- you ask for a list of streams, choose one by id, etc., all while sending heartbeats. It's annoying to work with, but they provide a javascript sdk with type declarations.
To procrastinate figuring out how to use their sdk, I started off by setting up a super simple vite react app so that I could nicely abstract things.
I learned a bit about vite here, since I hadn't used it before. It seems like vite's entrypoint is an index.html file that looks something like
<body>
  <div id="root"></div>
  <script type="module" src="/src/main.tsx"></script>
</body>
Where vite hosts a server, and then automatically intercepts requests for /src/main.tsx and serves a javascript bundle (which it can prebuild or dynamically generate).
I found this random example usage that was pretty helpful for figuring out how to interface with janus.
React hooks are ways to move logic out of your components so that you can reuse it. The difference between hooks and regular functions is that you can use hooks within hooks, so, like, hooks can call useState to maintain their own state.
The way you define a hook is by placing it in a file with the use prefix, and then providing a function as a default export. So, like, in my case, useJanusStream.ts and export default function useJanusStream.
In that function body we can use hooks like useState. Okay, so let's start writing the logic for connecting to janus and getting a stream into a <video> element.
There are docs on their javascript sdk, but it's kinda awful to work with. It sends all the right api requests and works well enough, but it was implemented before const and async/await, so it's full of callbacks and is painful to deal with.
To start, we init Janus, which is already kinda yucky -- we're setting a global state of how Janus is to behave.
Janus.init({
  debug: true,
  dependencies: Janus.useDefaultDependencies({ adapter }),
  callback: () => {
We get the adapter with import adapter from "webrtc-adapter", which is from here. It's a common package that exposes the WebRTC api in a browser-agnostic way.
Okay, now we instantiate a new Janus (yes, init just set a global config state, it didn't actually create a connection or anything like that)
const janus = new Janus({
  server: servers, // a list of websocket/http server IPs (of janus servers)
  apisecret: "secret",
  success: () => {
And, once the Janus gets created we handle the success with
janus.attach({
  plugin: "janus.plugin.streaming",
This attaches the streaming plugin, which makes an api request that asserts that there is a streaming plugin running on the janus server, and then (as usual) has a callback for once the assertion is done...
success: (receivedStreamingPluginHandle) => {
  console.debug("Got streaming plugin information");
  streamingPluginHandle = receivedStreamingPluginHandle;
  console.debug("Requesting stream from plugin");
  streamingPluginHandle.send({
    message: { request: "list" },
    success: (list: any) => {
      console.debug("Listed!", list);
    },
  });
  streamingPluginHandle.send({
    message: { request: "info", id: 1 },
    success: (info: any) => {
      console.debug("Got info", info);
    },
  });
  streamingPluginHandle.send({
    message: { request: "watch", id: 1 },
    success: (resp: any) => {
      console.debug("Resp", resp);
      console.debug(
        "Watching stream success. Now waiting to start stream.",
      );
    },
  });
},
Once it is ready, we ask for a list of streams (to log, for debugging purposes for now), we get information on the stream that should be the one we are going to connect to, and then we ask to watch the stream.
That last step is the tricky part -- in order to obtain a MediaStreamTrack (raw video/audio that we can wrap in a MediaStream and slot into a <video> element), we need to ask janus to send us the stream, and define a callback for when it does.
Before it can send the stream, it'll ask to do a handshake, which we answer like this:
onmessage: (msg, jsep) => {
  console.debug("Received msg from Janus server", msg, jsep);
  if (streamingPluginHandle === null) return;
  if (jsep === undefined) return;
  console.debug("Received JSEP!", jsep);
  console.debug("Answering the JSEP request.");
  streamingPluginHandle.createAnswer({
    jsep: jsep,
    media: { audioSend: false, videoSend: false },
    success: (jsep: any) => {
      console.debug("Successful SDP answer created");
      let body = { request: "start" };
      streamingPluginHandle.send({ message: body, jsep: jsep });
    },
    error: (error: any) => {
      console.error("WebRTC Error:", error);
    },
  });
},
JSEP is a complicated handshake that goes on to establish a webrtc connection. It's mostly abstracted away from us.
Once the handshake is done, we just define an onremotetrack function for when the MediaStreamTrack is ready.
onremotetrack: (track: MediaStreamTrack) =>
  onReceivedMediaTrack(track),
Then we have a MediaStreamTrack, which is a "container" of sorts for the inbound video stream. We can attach it to our <video> element by creating a MediaStream with it. A MediaStream is a collection of tracks -- so, like, audio and video, for example. It looks something like this...
const setupStream = () => {
  if (videoPlayer.current && mediaStreamTrack.readyState === "live") {
    const newMediaStream = new MediaStream();
    newMediaStream.addTrack(mediaStreamTrack);
    videoPlayer.current.srcObject = newMediaStream;
  }
};
MediaStreamTracks have an enum for their readyState, so if it's "live" (video is streaming), we create a MediaStream with the track, and set our video element's srcObject (what it's playing) to the stream we just created. (We can grab the video element with a selector, or, in this case, a useRef.)
Okay, so I implemented all of this, and, to put it briefly, it did connect, but no video would play.
The next major obstacle