Jan Grulich

WebRTC: journey to make wayland screen sharing enabled by default

While we have pretty good support for screen sharing on Wayland in WebRTC, which is included in browsers like Chromium or Firefox, it is still not enabled by default in Chromium and it is kept behind a flag. Not only you have to remember to always enable it for new configurations, but for many users it is not even something they are aware of. This has been my main focus recently and I would like to share with you steps that has been done and what are the plans for the future.

What are the changes to expect in Chromium soon?

DMA-BUF improvements/fixes:

Last year I landed proper DMA-BUF support in WebRTC, which made things way faster. It was working, but it was not perfect and there were some corner cases where it might not be working at all. Here are changes I made recently:

  • Advertise DMA-BUF support when it is really supported. Older versions of PipeWire don’t handle the new way of DMA-BUF negotiation and therefore it shouldn’t be used in such cases. Also using DMA-BUF modifiers requires some recent versions of PipeWire on both sides.
  • Implemented stream renegotiation. In situation when we fail to import a DMA-BUF with given modifier, we will drop this modifier and try to renegotiate stream parameters and go with a different modifier or fallback to shared memory buffers in case we fail completely.
  • Make sure to import DMA-BUF with correct render node. In case of multi-gpu setups, we always picked the first render node to import DMA-BUFs, but it can happen that they were actually produced by a different render node and for that reason we might fail to import them. We now try to get default EGLDisplay, which should be the same one used by the wayland compositor and we should be using same render node.

Better mouse cursor support:

Until now we had mouse cursor as part of the screen content. This means that everytime you moved with your mouse cursor, we had to update whole image and that is very inefficient. The API in WebRTC allows you to implement MouseCursorMonitor which can be used to track mouse changes only and each platform can have both MouseCursorMonitor and DesktopCapturer implementations combined in DesktopAndCursorComposer to get complete image and this all works automatically like a magic. Unlike X11 implementation, our only option is to get everything from one PipeWire stream we connect to and there was no way how to make it shared from DesktopCapturer implementation so it can be used by MouseCursorMonitor implementation. I had to split DesktopCapturer to have xdg-desktop-portal and PipeWire separate implementations. Code for PipeWire is now a SharedScreenCastStream class which is being shared through DesktopCaptureOptions. This is set of parameters associated with each capturer instance and luckily this is also passed to MouseCursorMonitor so we can have access to already initialized PipeWire stream and get the cursor data from there. Implement MouseCursorMonitor with SharedScreenCastStream was then piece of cake.

List of merge requests:

This should again significantly improve performance of screen sharing, because moving with a mouse over a static screen content doesn’t need full screen content update.

Misc:

Last but not least, I’m now in touch with Google developers who help me to review all my changes and discuss with me the current state, issues I have, etc. on monthly meetings we have. The plan is to make this finally enabled by default, hopefully in the first half of this year. There are still some things that need to be solved before this is enabled and there is lot of work ahead, but things look promising.

Plans for the future:

  • Implement stream restoration
    • this will allow us to skip the second portal dialog and I already have plan in my head how to do this in WebRTC. This is currently only supported by xdg-desktop-portal-gnome and xdg-desktop-portal-kde lacks this functionality.
  • Improve UX of the Chromium screen sharing dialog
  • Write tests for all PipeWire/portal code in WebRTC

Even though WebRTC is used in Firefox, I mostly talk about Chromium, because Firefox doesn’t use most recent WebRTC and will need to pick all the changes I did or rebase to newer WebRTC in order to have them. Firefox also has PipeWire/Wayland screen sharing enabled by default and doesn’t have UX issues as there is no internal screen sharing dialog like in Chromium.

I hope all these changes will make your experience better and next time when you read a new blog post I will be informing you about end of this journey.

WebRTC/Chromium updates in 2020

In 2019, I started with my first contribution to WebRTC. This was all about screen sharing support on Linux Wayland sessions, using xdg-desktop-portal and PipeWire. Back then, it was quite simple, we only had PipeWire 0.2 and all portal backends supported only screen sharing (no window sharing). While this was relatively easy, it was not ideal as each screen sharing request involved two portal dialogs to get the screen content on the web page itself. For me it was a big success, because I made quite a significant contribution to such a big project, which is used by many people, and a project which is used by all modern web browsers.

At the beginning of 2020, the year everyone would like to erase from their memories, we got PipeWire 0.3 (with slightly different API) and later with xdg-desktop-portal-gtk and xdg-desktop-portal-kde (later this year) people were finally able to share application windows. Support for all of this was lacking in WebRTC, because back then those were not available. I wanted to tackle all issues at once, bring support for window sharing and get rid of the “dialog hell” with portals, which was even worse with the new window sharing capabilities in portal backends.

This is what the situation looks like. With each request to share a screen, you got the preview dialog from Chromium. This dialog consists from three pages. One is for screen sharing making one portal request, second one is for window sharing, which is another portal request, and the last one is just to allow you to share a web page you have opened. You had to confirm both portal dialogs, then confirm the Chromium dialog and finally you got one more portal dialog (ouch) to get the screen content on the web page itself.

I had a solution. I made all portal calls identified with an ID and shared this ID (portal call) in Chromium between both pages in the Chromium preview dialog and with the request made for the web page itself. With this solution we only had ONE portal dialog. This was a perfect solution (at least seemed to be). I started working on this at the beginning of this year, we exchanged many emails with people from Chromium UX team, because I wanted to do also some minor UI changes in the preview dialog. Unfortunately, those were rejected for consistency with all platforms. It was not a big deal and I submitted my changes for review, keeping UI as it was, just adding all necessary bits into Chromium and WebRTC to make it all work.

I wish to say things went smoothly since then, but the opposite is true. It took a while to get everything reviewed, but this is probably no surprise with this year being weird and many people working from home with less than ideal conditions. Anyway, few months passed away, I ended up rewriting my changes many times, not even counting hours I spent on it. This all resulted into me being obsessed with this change, it mattered to me so much to get it merged. I was constantly thinking about how to make it better, I was many times fixing issues in the evening (as reviewers were mostly US based), instead spending time with my family. It would be even better to waste my time with my beloved Playstation. This had really negative impact on my mental health and I realized this has to stop and I simply gave up, because I couldn’t continue this way and needed a break. I abandoned both changes (WebRTC and Chromium) and decided to just pick changes I will be able to successfuly upstream. I probably made my change too ambitious and complicated or maybe it’s just Chromium not being ready for this kind of change, because some tweaks were specific for my use-case. It’s also hard to say I wish upstream devs had helped me more, because there is so much to understand around Wayland, portals and PipeWire and way how it all works together.

Anyway, with a new start, without pressure after gaving up on the change, I picked the most important changes and submitted them separately. I was surprised now how smoothly this went and how fast those changes were upstreamed. Simply those changes were simple, understandable and easy to review. I didn’t gave up on fixing the “dialog hell” completely, I have some other ideas, but next time I will try to submit them step by step and will keep some distance and my free time.

And what are the changes you can expect in upcoming Chromium release in 2021?

Support for PipeWire 0.3

You can now build Chromium/WebRTC with both PipeWire 0.2 and Pipewire 0.3. There is a new “rtc_pipewire_version” option you can pass to your build configs.

Window sharing support

There is probably no description needed. You will be able to share application windows in case you don’t want to share whole screen.

Suppport for DmaBuf and MemFd buffer types

This should allow faster transfer of your screen content from your Wayland compositor, through PipeWire to your browser.

Less portal dialogs involved

If you look back into the screenshot I posted above, you can see there are two portal dialogs opened just for the Chromium preview dialog. I at least tried to reduce this to just one portal dialog. This was done by removing the page for window sharing, because the screen share request will already handle both screen and windows.

I think you can expect above mentioned changes in Chromium 89 and I hope you will at least appreciate some of these improvements even though I didn’t deliver everything I wanted to. Also, thanks to Martin Stránský from our Firefox team, you can expect all these changes to be also part of Firefox.

Happy holidays and see you in a better year.

Screen sharing support in WebRTC for Wayland sessions

Last time I wrote about possibility to share a screen of Plasma wayland session, using xdg-desktop-portal and our xdg-desktop-portal-kde backend implementation. Problem was that during that time, there was no application which would implement support for this, leaving my previous effort useless so far. Luckily, this should change pretty soon. I, together with my Red Hat collegues Tomáš Popela and Eike Rathke, have been working for past few weeks on bringing support for screen sharing on Wayland to web browsers. All modern browsers use WebRTC for all audio-video communication, including screen sharing, meaning that in a perfect world, just one implementation would be needed, which is not that exactly this case. Let’s go a bit into the details first.

Each system (Windows, Mac and X11) in WebRTC reimplements an abstract class called DesktopCapturer, which defines API used by applications to support screen sharing. For our wayland support, we started with a new implementation using Pipewire as the core technology used for screen content delivery and for the communication part, to request which screen to share and to obtain Pipewire stream information, we use xdg-desktop-portal, providing simple API to do so. Advantage of using xdg-desktop-portal is that it will work also in sandbox (Flatpak and Snap) and that there is support in Plasma (using xdg-desktop-portal-kde backend) and support for Gnome (using xdg-desktop-portal-gtk backend), both using same API. Using our desktop capturer implementation, WebRTC starts to communicate with xdg-desktop-portal, we set up a session associated with our request, we tell xdg-desktop-portal that we have interest in screen sharing so xdg-desktop-portals asks your backend implementation to provide a dialog to select a screen to share and once this is settled, we request to start screen sharing and backend implementation will set up Pipewire stream, sending us back file descriptor of a Pipewire remote which we can open and connect to it. Once we are connected, we finally start receiving buffers from Pipewire with screen data and providing them to applications. This so far sounds simple and that the work is basically done, but this is unfortunately not an ideal world.

We found out that e.g. Firefox uses some older copy of WebRTC, so while working on WebRTC trunk, to have our work ready for upstream, we had to modify slightly our changes for Firefox older copy in order to be able to test our changes. There is also one thing we haven’t figured out yet, the thing is that Firefox has its own dialog to select a screen (for other capturer implementations) and we were not able to avoid displaying it when our capturer is used. Another thing is Chromium, which seems to be using WebRTC in a different way when compared to Firefox as Chromium is using plugins so this is also something we have to figure out. There are probably still plenty of other things to do before all of this can be upstream, but we have made great progress on this so far. We even had couple of Bluejeans calls where both sides could share their screens, one running Gnome Wayland session and me running Plasma Wayland session.

And for those who like adventures, I have set up a Fedora COPR repository (currently building), with Firefox containing our changes so you can test it yourself, we will need testers soon or later anyway. You just need to make sure that Pipewire is running, otherwise this won’t work, portals (both xdg-desktop-portal and backend implementation) should be started automatically.

Screenshot of Firefox in action:

Scroll To Top