Take

Surfaces

I have been lucky enough so far to avoid doing screen sharing and streaming myself, but I've sat in meetings where other people do it, and I've seen plenty of streamed videos on all subjects where something is demoed. It is astonishing to me how much effort goes into moving stuff around the screen, and how the thing you want to show is always locked into what a window shows.

For example: The .NET Community Standups have regressed tremendously even as they got a professional studio with its own technology staff. It still comes down to sharing a big dumb thing with floating buttons that get in the way of the person and that no one watching actually cares about. And this is just what one of the biggest companies in the world, in control of its own OS and several video conferencing solutions, can muster.

There is a big need for a new user-visible, user-manipulable primitive on the OS level: surfaces that an application can project. Skype or Slack or Zoom or Discord should be able to have a separate surface for just the video, laid out and with no UI, and a surface that's just the chat, and you should be able to go into your streaming software and grant the 75 OS-level mother-may-I permissions and say: I want the chat from Discord, the video from Skype, my own webcam in this corner and this window from this game or application.

All those things would be latently available, would render directly into some buffer when needed, would be easy to add support for and would cost nothing when not used. The word surface comes from many similar concepts, including IOSurface on Apple platforms where it would most certainly be used to implement this, but the thing I'm proposing is something concrete and user-level. People who care about streaming would know that a surface is available, could hook up and preview any surface from any app (subject to completion of the obligatory round of permission Twister), and would know to ask application authors to add support for them.
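To make the idea a little more concrete, here is a minimal sketch in Python of what "latently available" could mean. Everything in it is hypothetical: `Surface`, `SurfaceRegistry`, `advertise`, `grant` and `pull_frame` are made-up names for illustration, not an existing OS API, and a plain `bytes` value stands in for a real pixel buffer such as an IOSurface. The point it tries to show is that advertising a surface costs nothing, and rendering only happens when a consumer with permission asks for a frame.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical names throughout: no OS exposes this API today.
Frame = bytes  # stand-in for a real pixel buffer


@dataclass
class Surface:
    app: str
    name: str
    render: Callable[[], Frame]  # called only when a consumer asks for a frame


class SurfaceRegistry:
    """Stand-in for an OS-level service that tracks advertised surfaces."""

    def __init__(self) -> None:
        self._surfaces: Dict[Tuple[str, str], Surface] = {}
        self._granted: Set[Tuple[str, str]] = set()

    def advertise(self, surface: Surface) -> None:
        # Advertising is cheap: nothing is rendered here.
        self._surfaces[(surface.app, surface.name)] = surface

    def list_surfaces(self) -> List[Tuple[str, str]]:
        # What a streaming tool would show in its "add surface" picker.
        return sorted(self._surfaces)

    def grant(self, app: str, name: str) -> None:
        # Stands in for the user approving a permission prompt.
        self._granted.add((app, name))

    def pull_frame(self, app: str, name: str) -> Frame:
        key = (app, name)
        if key not in self._granted:
            raise PermissionError(f"no permission for {app}/{name}")
        # The application renders on demand, straight into the buffer.
        return self._surfaces[key].render()


render_calls: List[str] = []


def render_chat() -> Frame:
    render_calls.append("chat")  # record that real rendering work happened
    return b"chat-pixels"


registry = SurfaceRegistry()
registry.advertise(Surface("Discord", "chat", render_chat))
registry.advertise(Surface("Skype", "video", lambda: b"video-pixels"))
```

In this sketch, `render_calls` stays empty until a consumer calls `registry.grant("Discord", "chat")` followed by `registry.pull_frame("Discord", "chat")`. An unused surface does no rendering work at all, and pulling without a grant raises `PermissionError`.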

Answers to presumptive questions:

  • Doesn't this exist? For all I know, this does exist in some form. But since I've never heard about it as an application developer, and have never heard of any video conferencing application implementing something like it (while they resort to enormous hacky workarounds just to get call recording to work), it effectively isn't what it could be.

  • Since OBS plugins and overlays exist, isn't this superfluous? OBS plugins and overlays exist and are great insofar as people can tweak and add in what they want, and they shouldn't go away. But they're not this, since they're not open-ended. Application A can't say "I've got views A, B and C if anyone is interested" and application B can't say "I'll take view B and C" without application A and application B having engaged in some form of blood pact beforehand.

  • Should this really be called "surfaces"? This could very well be named "views" to users, but it would be very difficult to call it that to developers who are used to "views" meaning controls in a window. Calling it "surfaces" has a similar issue but can be seen as raising some surfaces to the user level and making them accessible.

  • Okay, so how isn't this like "sources", then? What I'm asking for is pretty much like the interface that allows streaming applications (or any application) to take a stream from a webcam, which is why I think it's technically a solved problem. But by convention those all come from a hardware device, and since I'm not a hardware engineer or driver developer, my guess is that it's hard for just an application to project one of these things. And besides, having a separate name could be good too: "please add a pseudo-webcam to your application" is ripe for confusion.

  • Why the OS specifically, couldn't this be polyfilled/provided by something else? Sure, and software like OBS is in a good position to do so if there is a good technical path to stream video to some sort of receptacle that can facilitate it. Making it suitable for real use means keeping it low-latency and efficient and preferably low on copying while respecting security, privacy and permissions. But, hell, if you can figure out a way for applications to advertise that "if you connect to me in this way I'll talk libretro to you and you do with that what you wish", we're most of the way there.

And the final one: Can't you just do this with windows?

Yes, almost, but the problem is that the window has to present a good UI within it. Something that's a good visual that you want to show – when what you want to show isn't literally a UI itself – doesn't have a bunch of buttons and crap on it to annoy, distract and occlude. It's the difference between just looking at the slideshow in PowerPoint, confined to a segment of the screen, with UI affordances and borders and grip handles and what have you, and entering presentation mode, except that surfaces could be even cleaner because no one but the presenter would see the little toolbar with back/forward buttons and drawing tools because they wouldn't be on the surface.

And on a technical level, now you're getting into syncing with the window manager and compositing and screen recording and god help you if someone moves a window or something shows in front, compared to the application saying "this is what you want to see" and filling a buffer, and that buffer being rendered. Recognizing that some things are more like screws than they are like nails, and putting away the hammer for a while.
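As a rough illustration of the "fill a buffer, render that buffer" side, here is a small Python sketch of what the streaming software would do with the buffers it receives. It is a toy under stated assumptions: one character stands in for one pixel, and `blank`, `blit`, `video` and `chat` are hypothetical names, not part of any real compositor. What it tries to show is that the output frame depends only on the buffers and the layout the streamer chose, so there is no window position, z-order or occlusion to track.

```python
from typing import List

Buffer = List[List[str]]  # rows of "pixels"; one character per pixel for readability


def blank(width: int, height: int, fill: str = ".") -> Buffer:
    """Create a buffer of the given size filled with a single value."""
    return [[fill] * width for _ in range(height)]


def blit(dest: Buffer, src: Buffer, x: int, y: int) -> None:
    """Copy src into dest with its top-left corner at (x, y), clipping at the edges."""
    for row_index, row in enumerate(src):
        dy = y + row_index
        if not 0 <= dy < len(dest):
            continue
        for col_index, pixel in enumerate(row):
            dx = x + col_index
            if 0 <= dx < len(dest[dy]):
                dest[dy][dx] = pixel


# Hypothetical surfaces: each application fills its own buffer on request.
video = blank(4, 2, "V")  # e.g. the video surface from a call application
chat = blank(2, 3, "C")   # e.g. the chat surface from a chat application

# The streaming software decides the layout, not the window manager.
frame = blank(6, 3)
blit(frame, video, 0, 0)
blit(frame, chat, 4, 0)
rendered = ["".join(row) for row in frame]
```

Here `rendered` holds the composed frame as three strings, with the video buffer in the top-left and the chat buffer down the right-hand side. Nothing about it changes if either application's window is moved, covered or minimized.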
