
Interconnecting Jitsi Video Bridge, ZipDX & YouTube Live

In the production of over 530 VUC sessions we’ve undertaken some odd and occasionally rather complicated arrangements. Quite possibly the most complex is when we interconnect the WebRTC-based Jitsi Video Bridge with YouTube Live and the ZipDX conference bridge. I set about describing aspects of this process a year ago, but stopped short of describing how the entire arrangement worked. Well, worked most of the time. This article will bring you current with my various attempts to make this process robust and repeatable.

Preface: When we use Jitsi Video Bridge we lose a couple of the conveniences that come with a Hangout-On-Air. Where a Hangout-On-Air has an automatic link to a YouTube Live event, we must do this manually when we use JVB.

The Audio Setup

With respect to audio, a major hurdle was overcome when VB-Audio released VoiceMeeter Pro (aka Banana) in August 2014. VoiceMeeter Banana is very much an extension of VoiceMeeter, which I described earlier. However, where VoiceMeeter has two physical outputs and one virtual output, Banana provides three physical outputs and two virtual outputs. The added virtual outputs make it much easier to create multiple, simultaneous mix-minus feeds. It also has a built-in playback function, which is how I generally roll the prerecorded opening music.

This entire arrangement is built around a singular, critical difference between Jitsi Video Bridge and a Google+ Hangout. A Hangout gives you explicit control of both the microphone and the speaker. WebRTC-based services like JVB only provide flexibility in microphone selection. They don’t presently allow the end-user to select where the playback audio is routed. It goes to the default playback device of the system.

The value in VoiceMeeter and VoiceMeeter Banana is that, when used correctly, they become the default audio device! They effectively virtualize the default audio device. That’s the root of their flexibility, and that’s AWESOME!



  • Bria Mic setting = VoiceMeeter Aux Output
  • Bria Speaker setting = VB-Audio Cable A Input
  • VoiceMeeter Input #1 = Sennheiser headset mic, route to output B1 & B2
  • VoiceMeeter Input #3 = VB-Audio Cable A Output (from Bria), route to output B1, A1
  • VoiceMeeter Output A1 = Sennheiser headset speakers
  • VoiceMeeter Output B1 = to Jitsi Video Bridge
  • VoiceMeeter Output B2 = to Bria (ZipDX)
  • Wirecast set to use VoiceMeeter Output as its primary audio source
  • Wirecast set to use my headset microphone as its secondary audio source
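The routing in the list above can be checked with a small model. The following is a minimal sketch in Python, not anything VoiceMeeter itself provides: the names ROUTES, BUS_DESTINATION and bus_contents are hypothetical, and only the two input strips named in the list are modeled (the JVB playback path is not listed, so it is left out). It inverts the routing table to show what ends up on each output bus.

```python
# Routing copied from the list above: input strip -> output buses it feeds.
# All names here are illustrative labels, not VoiceMeeter identifiers.
ROUTES = {
    "headset_mic": {"B1", "B2"},   # VoiceMeeter Input #1
    "bria_return": {"B1", "A1"},   # VoiceMeeter Input #3 (VB-Audio Cable A Output)
}

# Where each output bus is delivered, per the list above.
BUS_DESTINATION = {
    "A1": "headset_speakers",
    "B1": "jitsi_video_bridge",
    "B2": "bria_zipdx",
}


def bus_contents(routes):
    """Invert the routing table: bus name -> sorted list of inputs mixed into it."""
    buses = {}
    for source, targets in routes.items():
        for bus in targets:
            buses.setdefault(bus, []).append(source)
    return {bus: sorted(sources) for bus, sources in buses.items()}


MIX = bus_contents(ROUTES)

if __name__ == "__main__":
    for bus in sorted(MIX):
        print(f"{bus} -> {BUS_DESTINATION[bus]}: {', '.join(MIX[bus])}")
```

The point of the check is bus B2: it carries only the headset mic, so the audio arriving from Bria is never sent back to Bria. That is the mix-minus property for the ZipDX leg.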

The Video Setup

Jitsi Video Bridge (JVB) is a great video conference tool. It leverages the evolving WebRTC standard, allowing multi-party video chat using only the Chrome web browser as a client.

JVB differs from Hangouts in a couple of significant ways. First, it doesn’t provide a means of selecting the output audio device. This is true of all WebRTC-based services at present.

Secondly, it doesn’t have server-side integration with YouTube. That means that, when using JVB, we must either capture a local recording or, preferably, find a way to send the stream to a YouTube Live event.

Fortunately, we’ve been able to work through these problems. In fact, we’ve tried a couple of approaches.

Method #1: Wirecast Local Desktop Capture

With the release of Wirecast v5.0, Telestream trumpeted a new, high-performance “local desktop capture” capability. This feature was targeted at gamers who were seeking a way to stream their game play. The application can capture all or part of a local display, sending that stream onward to any of a variety of services, including YouTube.

At the same time, the application can mimic a webcam. It does this in support of VC soft clients, like Jitsi Video Bridge.

My desktop computer has two monitors. On the primary display I had my production tools running, including Wirecast, VoiceMeeter and Bria. I put Chrome with JVB on the second display, using the Wirecast local desktop capture tool to capture the JVB session.

It’s curious to consider just how far back this sort of “screen scraping” goes in the realm of computers. One of the earliest corporate uses of personal computers was to emulate a serial terminal connection to a mainframe or mini-computer.

The advantage of keeping the entire production process on a single host computer is that audio handling is easier. VoiceMeeter Banana can provide appropriate mix-minus feeds to JVB & ZipDX, and a one-way feed to the YouTube Live event.

The disadvantage of keeping the entire production process on a single host computer is that the workload presented to the host is substantial. A WebRTC app like Jitsi Video Bridge is not a lightweight process. Wirecast is itself a beast. The pair are near the limit of what my 2+ year old HP desktop (AMD FX-6100 6-cores @ 3.3 GHz, 10 GB memory, Radeon HD7400 1 GB GPU, 256 GB SSD, 2TB HD) can accommodate.

I put a lot of effort into fine tuning the setup to make it work. For example, I found that I had to disable the Windows Aero desktop animations in order to conserve GPU resources.

I also had to set the display resolution of the second monitor to 1280×720 pixels. Thus I avoided the effort of capturing & scaling the JVB session. After all, 720p was the native resolution of the JVB session, and the typical resolution of our Hangouts/YouTube archives.

I made every tweak I could find to minimize the workload of the system. Even so, when it was all in motion the CPU was running around 75% occupied.

I tried to add a more powerful GPU card, but found that the HP H8-1214 lacks the power supply output to support a more capable video card. It only supports cards that can be powered from the PCIe bus. More capable cards require a dedicated power feed. Upgrading the video card and the power supply seemed like more than I should invest in this aging box.

I further discovered that YouTube Live is picky about jitter in the source stream. When the system was running with very high CPU load I occasionally could not get the YouTube Live event to start. In such cases the service didn’t acknowledge the stream from the Wirecast host.

This reality was also impacted by the fact that JVB presents a variable amount of load to the system. The load varies with the number of participants in the JVB session. This is in the nature of the SFU (selective forwarding unit) architecture, where each participant’s stream is sent to every end-point. This is very different from a traditional MCU (multipoint control unit), where the various participants’ streams are composited by the server, sending only one stream onward to the end-point for viewing.

The upshot of the variability was that I could test with 2-3 people in JVB and the setup would work perfectly, sending 720p30 to YouTube reliably. Later on, with 6-8 people in JVB the setup would become considerably less reliable.
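The way the load grows with the participant count can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name streams_received is hypothetical, and it ignores any optimizations a real bridge might apply to limit how many streams it forwards. It only counts the incoming video streams that one end-point has to receive and decode.

```python
def streams_received(participants: int, architecture: str) -> int:
    """Count the video streams one end-point receives in a session.

    Simplified model: an SFU forwards every other participant's stream,
    while an MCU sends a single composited stream.
    """
    if participants < 1:
        raise ValueError("a session needs at least one participant")
    if architecture == "sfu":
        # Everyone else's stream arrives separately.
        return participants - 1
    if architecture == "mcu":
        # The server composites all participants into one stream.
        return 1
    raise ValueError(f"unknown architecture: {architecture!r}")


if __name__ == "__main__":
    for n in (3, 8):
        print(n, "participants:",
              streams_received(n, "sfu"), "streams via SFU,",
              streams_received(n, "mcu"), "via MCU")
```

In this model a test session with 3 people means 2 incoming streams on the production host, while a session with 8 people means 7. With an MCU the count would stay at one regardless of how many people joined.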

The second approach involves splitting the workload between two computers. I’ll detail how that’s done in a second post.
