How-To: Using an RTSP Stream as a Source for a WebRTC application
This post arises from a question posed by someone via Quora. I’m not all that engaged with that Q&A platform, but this question seemed novel, so I offered an answer. I thought the answer worth sharing in a little more depth, so I offer it here as well.
The question was, “How can I use the RTSP stream from an IP camera as a source for a WebRTC application?”
There are two parts to solving this puzzle: (1) connect to the RTSP stream and (2) make it appear as a webcam to the client application.
Obvious Answer: vMix
At the outset, let me say that I would address this using vMix. vMix solves both parts of the puzzle handily. If this is all that you needed to achieve, the $60 Basic HD license would suffice.
Of course, you’d need to learn a little about the application, which is deep. To my mind it’s fun, but some might find it daunting. Further, vMix requires a considerable host platform. You’re not going to run it on trivial hardware.
Let’s just say that we’d like to solve the problem with less spending and less knowledge overhead.
Less Obvious Answer: VLC & NDI Tools
VLC is the ubiquitous, open source media player. Available on all platforms, it can play anything I’ve ever wanted to open. Beyond files, it can open network streams. I’ve used it to listen to my local PBS radio station. I’ve also used it to watch video streams from our Grandstream surveillance cameras, as shown below.
NDI stands for Network Device Interface. It’s a network protocol, developed by Newtek of TriCaster and Video Toaster fame, that allows low-latency, lightly compressed video to be passed over a gigabit Ethernet network. NDI is impressive, but I won’t wax poetic about that here.
Newtek offers a free set of NDI tools that are handy when dealing with an NDI installation or project. To solve this puzzle I’ll use three items from v3.6 of that suite:
1. NDI Plugin for VLC
This piece of software allows VLC to turn any media that it can open into an NDI stream. It’s effectively an on-ramp to networked video. Have an MP4 video file? Use VLC to “play” it, and it becomes available on the network. It can be viewed by any NDI-capable destination.
You configure VLC to do this as described here. So configured, you won’t see the video in VLC. The output is redirected to the network.
In this case, I open the RTSP stream from the camera. VLC looks like it’s playing, but the window is blank.
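Incidentally, you can also launch this from the command line rather than the Open Network Stream dialog. A rough sketch is below; the address, credentials and stream path are placeholders that depend on how your camera is configured, and the NDI output is assumed to already be enabled in VLC’s preferences per the plugin’s instructions.

    # Hypothetical example – the IP address, credentials and stream path are placeholders
    # NDI output is assumed to be enabled in VLC's preferences per the plugin instructions
    vlc "rtsp://admin:password@192.168.1.50:554/stream1"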
2. NDI Studio Monitor
I can confirm that the stream is indeed on the network using the NDI Studio Monitor application. This is a program that sniffs the network for available NDI sources. It then allows you to watch your desired stream(s) over the network. It can do a lot of novel things, like picture-in-picture, graphic overlays, or audio metering. Very handy.
The very fact that we can see the camera’s live stream in NDI Studio Monitor verifies that everything to this point is working as expected. The application title bar reports that it’s seeing the stream from VLC at 1080p30. So far, so good.
3. NDI Virtual Input
NDI Virtual Input is a small application that allows you to select one of the NDI streams available on the network, making it appear as a webcam to another local application. It’s a lot like SparkoCam or ManyCam for NDI sources, but without the goofy consumer features.
Returning to the task at hand, I find the NDI Virtual Input icon in the Windows system tray. I right-click and select the stream from VLC as the source.
Note that the Virtual Input app and VLC are running on the same host PC. I’m using network tools, but all are on localhost.
Then I go into my communications application and select Newtek NDI Video as the camera.
I tried this in several common video chat applications.
It works in Skype for Windows.
It works in Talky.io accessed in Chrome.
It works in Meet.Jit.si accessed via Chrome or Firefox.
It works in Google Hangouts accessed via Chrome or Firefox.
The Qualitative
Ok, so this approach to bridging an RTSP stream into a WebRTC application can work. How well does it work? And what sort of compute power is required to do it reliably?
The screenshots you see here are from an experiment I performed using the following items:
- Lenovo X1 Carbon (2013) with i5-3427U 2-core CPU@1.8 GHz & 8 GB RAM
- Gigabit Ethernet connection to my LAN
- Grandstream GXV3672 FHD IP camera delivering 1080p30@2048 kbps
The host PC has to:
- VLC – Receive the RTSP stream, decoding it into a blind buffer.
- VLC – Encode the result using NDI
- Virtual Camera – Decode the NDI stream
- Virtual Camera – Encode the result into YUY2 or MJPEG as required by the WebRTC app
- Browser – Collect the video stream, encode to VP8 and send it afield
All of that is a not inconsiderable load on the host. The RTSP stream is decoded from H.264 at 2 Mbps and re-encoded to NDI at around 70 Mbps, then scaled from 1080p to 720p, encoded to YUY2 (what webcams mostly deliver), and finally encoded to VP8 by the WebRTC application.
On my admittedly older laptop the process consumed 80-90% of available CPU power. That’s high enough that I don’t expect the process would be stable. Anything else that might occur on the PC would definitely rock the boat.
Since my experiment didn’t involve audio, it’s not clear if the laptop would be stable enough to handle sound reliably.
As a practical matter, few WebRTC apps require 1080p30 from a video source. Thus the process was required to scale the video from 1080p30 to 720p30, a not inconsiderable task on its own.
I also found that the Virtual Camera app had an option to deliver a “reduced quality” stream to the WebRTC application. Setting this option reduced the overall CPU load to a more manageable 63% at the expense of a little stream quality.
If this was something that I needed to do on an ongoing basis, I’d likely split the load across two machines. I could use something like a NUC to act as an RTSP to NDI gateway. Then run Virtual Camera on my desktop, which is most likely where the WebRTC app would need the stream.
Summary
As promised, I’ve described an approach to using the RTSP stream from an IP camera as a video source for an application that would otherwise expect a webcam. And I’ve done it without spending any money on software, or requiring deep skills in either computer science or video production.
This is certainly not the only possible approach. It’s just one that I might use, given the tools that I find in my belt. I can imagine an approach that leveraged FFMPEG and v4l2loopback, but that would require some fairly serious Linux skills that I simply do not possess.
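For the curious, here’s a rough, untested sketch of what that FFMPEG and v4l2loopback route might look like. The camera URL, credentials and loopback device number are placeholders, and I haven’t verified any of this on a real system.

    # Create a virtual webcam device at /dev/video10 (the device number is arbitrary)
    sudo modprobe v4l2loopback video_nr=10 card_label="RTSP Camera"

    # Pull the RTSP stream, scale it to 720p, and feed raw YUYV frames to the virtual webcam
    ffmpeg -rtsp_transport tcp -i "rtsp://admin:password@192.168.1.50:554/stream1" \
        -vf scale=1280:720 -pix_fmt yuyv422 -f v4l2 /dev/video10

In principle, a browser running on that same Linux box should then see the loopback device as just another camera.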