An Esoteric Type of Memory "Leak"

Last modified: 5 days ago

A little while ago, my colleague Sebastian started complaining about OOMs caused by Evolution taking up tens of gigabytes of memory. We discussed using sysprof to debug it, but it was too busy a time for Sebastian to set aside a few hours to do that.

Funnily enough, the most efficient fix at the time was to buy more RAM, since rust-analyzer was also causing OOM issues.

A few weeks went by. Restarting Evolution had become a daily ritual for Sebastian.

Then, on a whim, I decided investigating this might be a good test for an LLM.

I updated my Evolution git repo, built it, and started up Claude Code in the source root. This was the only prompt I supplied:

Find memory leaks in Evolution, current sourcedir. Particularly leaks that could accumulate over several hours. A colleague has a leak that slowly accumulates memory usage to several GB over the course of a day, requiring a restart of Evolution. That is the main focus, but we can fix other leaks in the process.

I wish I was lying, but that was all Claude Code needed to find the problem: Evolution just needed to call malloc_trim(0) from time to time.

I refused to believe it at first. I was only convinced when we saw the memory drop after running:

$ gdb -p $(pidof evolution) -batch -ex "call malloc_trim(0)" -ex detach

This seems absurd! Doesn't glibc reclaim freed memory from time to time?

Yes, it does. It calls sbrk() to do that. However, sbrk() can only reclaim free memory at the top of the heap, since it simply moves the program break downward to do so. malloc_trim(0) calls sbrk() and then also calls madvise(..., MADV_DONTNEED) on the free pages, which allows the kernel to reclaim them.

So if you have 10GB of unused memory followed by 4 bytes allocated at the top of the heap, your RSS is >10GB even if you're only using a few hundred megabytes, until you call malloc_trim(0).
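
To make that arithmetic concrete, here is a toy model of the situation in Python. It is only a sketch: the block list, the 4 KiB page size, and the name `releasable_bytes` are assumptions for illustration, and real glibc bookkeeping (chunk headers, arenas, trim thresholds) is ignored.

```python
PAGE_SIZE = 4096  # assumed page size


def whole_pages(start, end):
    """Bytes of whole pages that lie entirely inside [start, end)."""
    first = -(-start // PAGE_SIZE) * PAGE_SIZE  # round start up to a page
    last = (end // PAGE_SIZE) * PAGE_SIZE       # round end down to a page
    return max(0, last - first)


def releasable_bytes(blocks):
    """blocks: (size_in_bytes, in_use) tuples, from heap bottom to heap top.

    Returns (top_only, all_free):
      top_only - what lowering the program break alone could release
      all_free - what releasing every fully free page could release
    """
    offset = 0
    run_start = None  # start offset of the current run of free blocks
    inner_free = 0
    for size, in_use in blocks:
        if in_use:
            if run_start is not None:
                # A free run ended below an in-use block: not at the top.
                inner_free += whole_pages(run_start, offset)
                run_start = None
        elif run_start is None:
            run_start = offset
        offset += size
    # Only a free run that reaches the end of the heap can be released
    # by moving the program break downward.
    top_only = whole_pages(run_start, offset) if run_start is not None else 0
    return top_only, inner_free + top_only


GIB = 1024 ** 3
print(releasable_bytes([(10 * GIB, False), (4, True)]))
```

For the layout from the text (10GB free, then 4 bytes in use at the top), the model reports that moving the break releases nothing, while releasing free pages would give back the full 10GB.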

Note that you can only get into this situation if you have hundreds of thousands of small allocs/deallocs happening repeatedly. If an allocation is larger than 128KB (glibc's default mmap threshold), mmap() is used for it, and none of this applies.

Coincidentally, GLib's use of GSlice for GObject allocations was masking this issue in the past, but GSlice has been a no-op for some time now (for good reasons). Ideally, Evolution should not be using GObject for such ephemeral objects.

Lesson learned: if you have memory usage issues and you suspect fragmentation, try malloc_trim(0) before you go thinking about fancy allocators.
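
If you want to try this from inside a process rather than through gdb, the following is a minimal sketch in Python using ctypes. A C application such as Evolution would instead call malloc_trim() from <malloc.h> directly. The helper name `trim_heap` is made up for this example, and it assumes a glibc-based system; on other C libraries it simply reports that trimming is unavailable.

```python
import ctypes
import ctypes.util


def trim_heap():
    """Ask glibc to hand free heap pages back to the kernel.

    Returns True if memory was released, False if nothing could be
    released, and None if malloc_trim() is not available (non-glibc).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    try:
        libc = ctypes.CDLL(libc_name)
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    # pad=0: keep no extra free space at the top of the heap.
    return malloc_trim(0) == 1


if __name__ == "__main__":
    print("trimmed:", trim_heap())
```

A long-running application could call something like this from a periodic timer, which is roughly what "from time to time" means above.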

GStreamer 1.28.3 stable bug fix release

Last modified: a week ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.28 release series of your favourite cross-platform multimedia framework!

This release only contains bug fixes as well as a number of security fixes. It should be safe to update from 1.28.x, and we recommend you do so at your earliest convenience.

Highlighted bugfixes:

Various security fixes and playback fixes
applemedia: vtdec stability, MoltenVK integration and planar video format handling fixes
audioresample: Fix regression on armv7hf
bpmdetect: Fixes for stereo and multi-channel modes
devicemonitor: wait for start thread to finish when listing devices so all the info is there for e.g. v4l2 provider
fallbacksrc: Add fallback-source and enable-dummy properties
nvidia: fix cudaconvert performance regression and nvdec device creation regression
opengl: add GBRA swizzle support, and fix glcolorconvert vertical flip issue on crop
rtspsrc: include user-agent property in HTTP tunnel requests and fix mikey regression
threadshare: add leaky mode to dataqueue-based elements
v4l2: fix negotiation error when trying to force stateful decoders to output dmabufs
webrtcsink: Add support for imx8mp vpuenc_hevc hardware H.265 encoder
cerbero: Extend gst-plugins-rs melding to Darwin platforms for smaller binary sizes and static linking improvements
inno Windows installer fixes, including silent install mode via the command line
macOS: provide script to allow uninstalling the package; relocate absolute paths to Python.framework in wheels
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.28.3 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly and will be published on the Downloads page.

New high-performance GStreamer UDP source element and SMPTE ST2110 capture

Last modified: 2 weeks ago

From Centricular Devlog by Sebastian Dröge (Centricular)

New `udpsrc2` element

Over the past few years, I have worked on a new GStreamer UDP source element. This is finally merged now and will be part of both the GStreamer 1.30.0 release and the gst-plugins-rs 0.16.0 release.

The old element uses GIO for networking, which is quite inefficient by design. The new implementation uses about 50% less CPU on my machine compared to the old element for a 3 Gbit/s stream.

As can be seen from the docs of the new element, it preserves the API of the old element. As such it should generally be possible to use it as a drop-in replacement.

In addition to performance improvements, the new element also includes various other improvements:

Support for faster packet receiving via Generic Receive Offload (GRO) on Linux, and for using recvmmsg() on platforms where it is available to significantly improve receive performance.
Complete support for multicast source filtering, including negative filters, and support for platforms that do not have APIs for the IGMPv3 SSM mechanism.
Always obtaining kernel-side packet receive times if available, which was opt-in in the old element due to GIO performance issues with socket control messages.
New preserve-packetization property that allows outputting multiple packets in the same buffer, which improves performance for formats like MPEG-TS where the UDP packetization is not necessary.
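
To illustrate why the UDP packetization carries no information for MPEG-TS, here is a small Python sketch. It relies only on the fact that MPEG-TS consists of fixed 188-byte packets that each start with the sync byte 0x47. The function name and the idea that merged datagram payloads are simply concatenated are assumptions for illustration, not a description of how udpsrc2 is implemented.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-TS packet size in bytes
TS_SYNC_BYTE = 0x47   # every MPEG-TS packet starts with this byte


def split_ts_packets(buffer):
    """Split a buffer of concatenated MPEG-TS data into 188-byte packets.

    The buffer may contain the payload of one UDP datagram or of many:
    the packet boundaries are recovered from the stream itself.
    """
    if len(buffer) % TS_PACKET_SIZE != 0:
        raise ValueError("buffer is not a whole number of TS packets")
    packets = [
        buffer[i:i + TS_PACKET_SIZE]
        for i in range(0, len(buffer), TS_PACKET_SIZE)
    ]
    for packet in packets:
        if packet[0] != TS_SYNC_BYTE:
            raise ValueError("lost sync: packet does not start with 0x47")
    return packets


if __name__ == "__main__":
    ts_packet = bytes([TS_SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)
    datagram = ts_packet * 7            # a common choice: 7 TS packets per datagram
    merged = datagram + datagram        # two datagrams in one buffer
    print(len(split_ts_packets(merged)))
```

Whether the 14 packets arrive as two buffers or one makes no difference to a downstream parser, so merging them saves per-buffer overhead.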

Give it a try with your pipelines and workloads and share your feedback or any issues you encounter.

In the future, io_uring support on Linux could be added for even better receive performance.

SMPTE ST2110 capture

While udpsrc2 is an improvement in general, its primary motivation is better SMPTE ST2110 support in GStreamer. The old element could not handle the packet rates typically used for such streams very well.

ST2110 defines a UDP/RTP-based set of standards for transmitting raw or very-high bitrate audio / video / ancillary data over Ethernet. It is intended as a replacement for SDI.

Related to this, we recently also merged some other improvements:

A new, improved raw video RTP depayloader that supports ST2110-20.
A new, improved raw PCM audio RTP depayloader that supports ST2110-30.
An ST291 ancillary data RTP depayloader that supports ST2110-40.
Various improvements to the rtprecv element, especially for performance and handling of high-packet rate streams.

For all the new depayloaders there are also new, improved implementations of the corresponding payloaders available.

Together, these improvements enable reliable ST2110 stream capture in GStreamer.

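An example pipeline putting it all together would look as follows. The '# ...' lines are annotations for the reader and have to be removed before running it, because a shell comment ends the command even after a line continuation: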

$ gst-launch-1.0 \
    \ # Video capture pipeline part
    udpsrc2 address=239.255.64.20 port=16388 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=96, clock-rate=90000, encoding-name=RAW, sampling=YCbCr-4:2:2, depth=10, width=1920, height=1080, exactframerate=60, colorimetry=BT709, pm=2110GPM, ssn=ST2110-20:2017, tp=2110TPN, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-327995485-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpvrawdepay2 ! \
    \
    \ # Ancillary data capture pipeline part
    udpsrc2 address=239.255.64.20 port=16386 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=98, clock-rate=90000, encoding-name=SMPTE291, vpid_code=138, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-2672978631-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpsmpte291depay ! combiner.st2038 \
    \
    \ # Combination of video and ancillary data streams and output
    st2038combiner name=combiner start-time-selection=first ! videoconvert ! queue max-size-bytes=0 max-size-time=0 max-size-buffers=3 ! autovideosink \
    \
    \ # Audio capture and output pipeline part
    udpsrc2 address=239.255.64.20 port=16384 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=audio, payload=(int)97, clock-rate=48000, encoding-name=(string)L24, encoding-params=64, a-sendonly="", a-ptime=0.125, a-ts-refclk="ptp\=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct\=0", ssrc-603238248-cname=(string)E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpL24depay2 ! audioconvert ! autoaudiosink

This pipeline receives a 1080p60 4:2:2 YUV 10-bit video stream, ST291 ancillary data, and a 24-bit 48kHz 64-channel PCM audio stream. The video and ancillary data are combined into a single stream, and then both the combined video-ancillary stream and the audio are output.
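
To give a feeling for the data rates involved, here is a rough back-of-the-envelope calculation in Python based on the caps in the pipeline above. It counts payload only (no RTP/UDP/IP headers, no blanking), assumes 20 bits per pixel for 4:2:2 at 10-bit depth, and reads a-ptime=0.125 as 0.125 ms per packet. The function names are made up for this sketch.

```python
def video_bits_per_second(width, height, fps, bits_per_pixel):
    """Payload bitrate of the active picture area."""
    return width * height * fps * bits_per_pixel


def audio_packet_stats(rate_hz, channels, bytes_per_sample, ptime_us):
    """Return (samples per packet, payload bytes per packet, packets per second)."""
    samples_per_packet = rate_hz * ptime_us // 1_000_000
    payload_bytes = samples_per_packet * channels * bytes_per_sample
    packets_per_second = 1_000_000 // ptime_us
    return samples_per_packet, payload_bytes, packets_per_second


if __name__ == "__main__":
    # 4:2:2 at 10 bits: 10 bits of luma per pixel plus two 10-bit chroma
    # samples shared between every 2 pixels = 20 bits per pixel.
    video = video_bits_per_second(1920, 1080, 60, 20)
    print(f"video: {video / 1e9:.2f} Gbit/s")

    samples, payload, pps = audio_packet_stats(48000, 64, 3, 125)
    print(f"audio: {samples} samples/packet, {payload} bytes/packet, {pps} packets/s")
```

The video stream alone comes to roughly 2.49 Gbit/s of payload, and the audio stream to 8000 small packets per second, which is the kind of load the receive path has to sustain.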

rtprecv is used here for translating packet capture timestamps and RTP header timestamps to consistent GStreamer timestamps.

Ancillary data

The pipeline above captures all three streams and merges the ancillary data stream with the video. The ancillary data itself is not processed further.

One way to process the ancillary data further is to extract ST12 timecodes from it and overlay them over the video.

For this, insert the following elements before the video sink:

 ... ! timecodestamper source=ancillary-meta ancillary-meta-locations='8:2000,570:2000' \
     ! videoconvert ! timeoverlay time-mode=time-code \
     ! autovideosink

Here timecodes from ancillary data at positions (8,2000) and (570,2000) would be extracted and converted to GstVideoTimeCodeMeta on the video buffers.

We recently added support for extracting ST12 timecodes from ancillary meta as well.

The positions depend on the video signal standard in use and can be found in the ST12 specifications.

Improved Support for iOS and new tvOS support in GStreamer 1.28

Last modified: a month ago

From Centricular Devlog by Nirbheek Chauhan (Centricular)

We've been hard at work making numerous small and large improvements to GStreamer for people who want to target Apple platforms: macOS, iOS, and tvOS.

iOS ARM64 Simulator Support via an XCFramework

With the GStreamer 1.28.0 release, the project now releases an XCFramework for iOS. As expected, this XCFramework supports iOS arm64, iOS Simulator x86_64, and iOS Simulator arm64. The legacy iOS framework that lipo-ed iOS arm64 and iOS Simulator x86_64 is now deprecated, and will be removed in a future release.

You can download the XCFramework from the official download page.

Thanks to Amy for helping me with this!

tvOS Support

As of version 1.28.1, GStreamer officially supports tvOS, and binaries for it are shipped as part of the iOS XCFramework. This means that the GStreamer 1.28.1 iOS XCFramework contains: ios-arm64, ios-arm64_x86_64-simulator, tvos-arm64, tvos-arm64_x86_64-simulator.

Most of the relevant Apple-specific plugins are supported:

osxaudio: Audio source/sink, using CoreAudio
atdec: Audio decoder, using AudioToolbox
atenc: Audio encoder, using AudioToolbox
vtdec: Video decoder, using VideoToolbox
vtenc: Video encoder, using VideoToolbox
glimagesink: Deprecated EAGL video sink
vulkansink: Metal-based video sink, using MoltenVK
vulkancolorconvert: Metal-accelerated video conversion, using MoltenVK
vulkanoverlaycompositor: Metal-accelerated video overlay compositor, using MoltenVK
... more Metal/Vulkan elements

Two elements that use AVCaptureDevice had to be disabled because they need more work to support tvOS:

avfvideosrc: Video capture source, using AVFoundation
avfdeviceprovider: Video capture device provider, using AVFoundation

Thanks to Remote Studio for sponsoring this work!

Improved support for using Rust plugins on Apple platforms

Linking more than one Rust plugin into your app had been broken on macOS and iOS for some time. The fix for that requires prelinking, which Amy has written about previously, but it couldn't be enabled on macOS due to some LLVM/LLD issues. We had to wait for the fixes to percolate down to a Rust toolchain release. That finally happened in Rust 1.93, but by that time a new problem had cropped up: Xcode 26.

Due to some toolchain changes in Xcode 26, linking Rust plugins was failing on macOS and also on iOS with the legacy framework. After weighing all the options, the best solution was to add -no_compact_unwind to the linker flags on macOS, and direct people to use the XCFramework when using Rust plugins on iOS.

This is now added automatically if you use pkg-config (using CMake or Meson, for example), but if you're using a plain Xcode project, you need to add -no_compact_unwind manually to linker flags in Xcode.

This fix will be available in the upcoming 1.28.3 release.

Many more macOS, iOS, tvOS improvements

Contributors have been hard at work with small and large improvements to the Apple-specific elements in GStreamer, ranging from AV1 and VP9 decoding support in vtdec to better debug info, bugfixes, memory leak fixes, crash fixes, and much more. The patches are too many to list or even link!

Introducing GstPrinceOfParser 0.4.3

Last modified: a month ago

From Dabrain34's igalian blog by Stéphane Cerveau

GstPrinceOfParser: An All-in-One Tool to Play With GStreamer on Any Platform

Introducing gst-pop, the GStreamer Prince of Parser — a tool to make interaction with GStreamer easier, global, and remotely accessible.

What is GStreamer?

GStreamer is an open-source multimedia framework started in 1999. It lets you build pipelines of interconnected elements to stream, encode, decode, and manipulate media. The core idea is simple: a source element produces data, passes it through one or more transform elements, and delivers it to a sink. For example, here is a pipeline that decodes an MP3 audio file:

    filesrc --> mp3dec --> audiosink
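
As a rough analogy in plain Python (this is not GStreamer's API, and all names here are invented for the illustration), the source, transform, and sink idea can be sketched with generators that pass chunks of data along a chain:

```python
def source(chunks):
    """Source: produces data, one chunk at a time."""
    for chunk in chunks:
        yield chunk


def transform(stream, func):
    """Transform: applies func to every chunk that flows through."""
    for chunk in stream:
        yield func(chunk)


def sink(stream):
    """Sink: consumes the stream; here it just collects the chunks."""
    return list(stream)


def run_pipeline(chunks, func):
    """Wire source --> transform --> sink together and run it."""
    return sink(transform(source(chunks), func))


if __name__ == "__main__":
    print(run_pipeline([b"ab", b"cd"], bytes.upper))
```

A real GStreamer pipeline adds a great deal on top of this picture, such as format negotiation, threading, and clock synchronisation, but the direction of data flow is the same.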

For more than 20 years, GStreamer has relied on its in-house toolbox to demonstrate the power of its pipelines. As this toolbox is used in thousands of projects and serves as a reference implementation, modifications and enhancements are deliberately kept minimal to maintain stability. gst-pop was created to go beyond these limitations.

A Unified Interface for GStreamer

Accessible over the network, via CLI arguments, or through D-Bus, gst-pop aims to provide a multi-pipeline-capable command-line tool.

With a simple invocation of gst-pop (or its alias gst-popd), you can run a daemon that accepts multiple pipelines simultaneously, accessible through D-Bus or WebSocket via the pipeline ID. You’ll be able to control, query, and get information about each pipeline — all of that over a remote network, secured with API key authentication and origin validation to prevent unauthorized access.

As demonstrated in the blog post related to GstPipelineStudio, it will be possible to connect to a remote pipeline or launch new pipelines through the GStreamer GUI. If a GUI is not available on the platform, it will soon be possible to use a web interface to control GStreamer, offering everything GStreamer can provide and more, limited only by your imagination.

Remote Element Inspection

gst-pop (or its alias gst-pop-inspect) is also capable of listing the elements on a local or remote host, inspecting their capabilities, and providing a remote way to interact with your GStreamer installation.

Media Discovery

It can also provide information on a media file through GStreamer’s discovery interface with gst-pop-discovery, offering an easy and remote-capable media discovery system for your setup.

Playback

And of course, it can serve as an alternative to the gst-play tool, with gst-pop-play, allowing you to instantiate as many playback sessions as you need, with the ability to use any sink you want.

The possibilities are vast: provide multimedia services such as transcoding, media analysis, or remote playback to your setup using the power of a remote machine, all controllable from your terminal or a GUI such as GstPipelineStudio.

Cross-Platform and Language Support

The tool is written in Rust for memory safety and reliability and provides client libraries in both Rust and C, offering all the flexibility needed for your existing applications. It is available on Linux (deb, rpm, or Docker), macOS, and Windows; see the release page.

Examples

# Start the daemon
gst-pop

# Launch a pipeline
gst-pop launch videotestsrc ! autovideosink

# Inspect an element
gst-pop inspect videotestsrc

# Discover media info
gst-pop discover file:///path/to/video.mp4

# Play a media file
gst-pop play file:///path/to/video.mp4

# Create a pipeline with the client
gst-popctl create "videotestsrc ! autovideosink"

# List pipelines on a remote daemon
gst-popctl list

# Play the pipeline with ID 0
gst-popctl play 0

# Stop the pipeline with ID 0
gst-popctl stop 0

# Run via Docker
docker run -d -p 9000:9000 ghcr.io/dabrain34/gstpop:latest

Give it a try and let us know what ideas you might have — we have plenty coming, so stay tuned.

As usual, if you would like to learn more about gst-pop, GStreamer, or any other open multimedia framework, please contact us!

GStreamer 1.28.2 stable bug fix release

Last modified: a month ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.28 release series of your favourite cross-platform multimedia framework!

This release only contains bug fixes as well as a number of security fixes. It should be safe to update from 1.28.x, and we recommend you do so at your earliest convenience.

Highlighted bugfixes:

Various security fixes and playback fixes
audioencoder: allow change of channel configuration with avenc_aac
audioinvert: fix float format handling
h264parse, h265parse, baseparse: Preserve upstream buffer duration if possible
compositor: fix segfault with force-live=true and no sink pads (regression)
fallbacksrc: send select-streams event to collection source element directly
hlsdemux2: fix seekable range for live HLS streams
glupload: Fix linking glupload with restrictive caps filter
nvcodec: Add capability caching to speed up plugin initialization
RTP and RTCP packet handling fixes
RTSP server fixes for clean-up of timed out play requests
video-converter: fix I420/A420 BGRA/ARGB output on big-endian
qtdemux: fix invalid WebVTT timestamps, and other fixes
qmlgl6sink: Qt6GLVideoItem caps update handling fixes
threadshare udp sink and source fixes
transcriberbin and speechmatics text-to-speech fixes and improvements
videorate: Fix wrong caps in case of PTS going backward
vtdec: more Apple VideoToolbox decoder fixes
wavparse: Fix parsing of RF64 wave files
wasapi2sink: Ignore transient device errors from default device
waylandsink: various fixes and improvements
WebRTC DTLS robustness/stability improvements
Cerbero: Various inno Windows installer fixes and improvements; new 'gstreamer_bundle' wheels meta-package
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.28.2 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly and will be published on the Downloads page.

Introducing GstPipelineStudio 0.5.1

Last modified: a month ago

From Dabrain34's igalian blog by Stéphane Cerveau

GstPipelineStudio 0.5.1

Your GStreamer Pipelines, at a Glance

A new version of GstPipelineStudio is out!

After months of improvements and intermediate releases since October 2024, it’s time for an official announcement for 0.5.1.

GstPipelineStudio provides a visual interface to GStreamer, the marvelous Swiss Army knife of multimedia pipelines. But what is GStreamer exactly?

What is GStreamer?

    filesrc --> mp3dec --> audiosink

GStreamer is written in C, with a growing ecosystem of plugins in Rust and bindings for languages such as Python and C++. It ships with many command-line tools to build and test pipelines, but validating ideas still requires writing C/Rust/Python code or using the command line. That’s where GstPipelineStudio comes in — providing a visual interface to help newcomers discover and adopt GStreamer, and skilled developers debug their pipelines.

The Story Behind GstPipelineStudio

The GstPipelineStudio project started in 2021 with the idea to provide the same environment that brought me to multimedia: GraphEdit on Windows with DirectShow. Indeed, DirectShow and GStreamer share the same idea of plugins sharing data. As I started to implement a DVB decoder with DirectShow, the graphical interface made it easier to validate which filters to use. But DirectShow only works natively on Windows, unlike GStreamer which can run everywhere — Linux, macOS, Windows, iOS, Android, and even low-power devices such as a Raspberry Pi.

GstPipelineStudio aims to work on all these platforms, easing GStreamer adoption where its use was not always obvious, such as on Windows. GStreamer is based on GLib, a cross-platform toolkit that abstracts system calls and provides a common base layer. For the GUI, since Rust was offering very good bindings, GTK was the natural choice to achieve cross-platform support. There was an attempt to create a GUI using Qt, named pipeviz, which has been a great inspiration for GPS, but the Qt Rust bindings were not mature enough, unlike those for GTK.

The first official release of GPS was 0.3.4, and you can read its official blog post published in 2023. Since then, we have been devoted to providing new features to bring GPS to another level.

A first revision, GPS 0.4.0, came out before Christmas 2024 with a refreshed interface — including zoom on the graph and contextual menus on any element or pad of the pipeline. The versions of GStreamer and GTK have also been updated to get the latest plugins and features from both frameworks. A new icon has also been introduced to let GPS dive into another dimension.

What’s New in 0.5.1

0.5.1 is here, and it brings a game changer: the dot file reader. Previously, it was possible to open a command-line pipeline or save/open pipelines with an XML-based format, but now you can also open the generated dot files, the native format in GStreamer, to display a pipeline graphically. This is still a beta version, as it can only display high-level pipelines such as those described with the command line. Nevertheless, this is a great improvement and allows users to see their pipeline and manipulate it.

Here is the list of other improvements you’ll find in this release:

Open Dot Folder menu entry for loading dot files from the common GStreamer folder
Remote pipeline introspection using the GStreamer tracers
App ID renamed to dev.mooday.GstPipelineStudio
Improved look and feel of the interface
Auto-connect on node click (node-link-request)
File selector button for location property
Logger copy to clipboard with multi-selection support
Auto-arrange elements on screen
GStreamer 1.28.0
GTK 4.20
RPM and AppImage artifacts

Remote Pipeline Introspection

The remote pipeline introspection is a new way to connect to the WebSocket tracer available in GStreamer, pipeline-snapshot. In addition to the dot file loader, it allows you to visualize a pipeline directly in GPS from an external process running with the tracer.

As you may know, GStreamer pipelines can be very complex, so one dream was to be able to visualize them live. There is already a mini tool in GStreamer named gst-dots-viewer which creates a web server to display pipelines in a browser from the $XDG_CACHE_DIR folder; see the blog post from Thibault about it.

Now with GPS, you can directly create a WebSocket server and let the tracer connect to it and provide available dot files to be displayed.

For example, to visualize a running pipeline in GPS:

In GPS: Menu → Remote Pipeline → Listen…
Enter the WebSocket address (e.g., ws://localhost:8080)
Run your GStreamer pipeline with the pipeline-snapshot tracer:

GST_TRACERS="pipeline-snapshot(dots-viewer-ws-url=ws://localhost:8080)" \
  gst-launch-1.0 videotestsrc ! autovideosink

The pipeline graph will appear in GPS once the tracer connects.
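
If you start your pipelines from a script rather than an interactive shell, the same environment can be prepared programmatically. The following Python sketch only reuses the tracer name and the dots-viewer-ws-url parameter shown above; the helper names are invented for this example, and actually launching requires gst-launch-1.0 to be installed and on PATH.

```python
import os
import subprocess


def tracer_env(ws_url, base_env=None):
    """Return a copy of the environment with GST_TRACERS set for GPS."""
    env = dict(os.environ if base_env is None else base_env)
    env["GST_TRACERS"] = f"pipeline-snapshot(dots-viewer-ws-url={ws_url})"
    return env


def launch_with_tracer(pipeline_args, ws_url):
    """Start gst-launch-1.0 with the tracer enabled (not run on import)."""
    return subprocess.Popen(
        ["gst-launch-1.0", *pipeline_args],
        env=tracer_env(ws_url),
    )


if __name__ == "__main__":
    print(tracer_env("ws://localhost:8080", {})["GST_TRACERS"])
    # launch_with_tracer(["videotestsrc", "!", "autovideosink"], "ws://localhost:8080")
```

Passing the pipeline as a list of arguments avoids shell quoting issues with the parentheses in the tracer description.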

These dot files are converted to GPS pipelines, making it possible to modify them. That’s a first step for real interaction with GStreamer pipelines — and there are more features coming in the pipeline.

Coming in 0.6.0

In parallel, a new tool named GstPrinceOfParser (gst-pop) has also been implemented. This tool allows remote control of all pipelines instantiated locally or over the network. It is a multi-pipeline daemon accessible through WebSocket or D-Bus, aiming to centralize all GStreamer options in one tool for launch, inspection, and discovery. GstPipelineStudio will be able to control this daemon, making gst-pop the backbone of the GStreamer GUI. A blog post will come soon, stay tuned…

A new tracer is under development: a WebSocket server that will allow you to inspect and interact with the current pipeline — modify the play state (pause, seek), fetch the logs, and of course see the current dot representation, all from the GstPipelineStudio interface.

In addition, more features are on the way: a new look and feel based on libadwaita on Linux/macOS/Windows, better localization, an auto-plug feature, seek and step-by-step playback, and bug fixes on demand.

We hope you’ll enjoy this new version of the tool and please feel free to propose new features with an RFC here or merge requests here.

Stay tuned for the next GStreamer Spring hackfest 2026 coming soon (end of May) where new features and deeper interaction with GStreamer pipelines will be discussed.

As usual, if you would like to learn more about GstPipelineStudio, GStreamer, or any other open multimedia framework, please contact us!

Using AI to create some hardware tools and bring back the past

Last modified: 2 months ago

From Christian F.K. Schaller by Christian Fredrik Kalager Schaller (Christian Schaller)

As I have mentioned in a couple of blog posts now, I have been working a lot with AI recently as part of my day-to-day job at Red Hat, but also spending a lot of evening and weekend time on this (sorry kids, pappa has switched to 1950’s mode for now). One of the things I have spent time on is trying to figure out what the limitations of AI models are and what kind of use they can have for Open Source developers.

One thing to mention before I start talking about some of my concrete efforts is that I have more and more come to the conclusion that AI is an incredible tool to hypercharge someone in their work, but I feel it tends to fall short for fully autonomous systems. In my experiments AI can do things many, many times faster than you ordinarily could, talking specifically in the context of coding here, which is what is most relevant for those of us in the open source community.

So one annoyance I have had for years as a Linux user is that I get new hardware with features that are not easily available to me on Linux. So I have tried using AI to create applications that expose those features for some of my hardware, which includes an Elgato Light and a Dell Ultrasharp Webcam.

What I found with AI, and this is based on using Google Gemini, Claude Sonnet and Opus, and OpenAI Codex, is that they all required me to direct and steer the AI continuously. If I let the AI just work on its own, more often than not it would end up going in circles, diverging from the route it was supposed to take, or taking shortcuts that made the wanted output useless. On the other hand, if I kept on top of the AI and intervened and pointed it in the right direction, it could put things together for me in very short time spans.

My projects are also mostly what I would describe as end leaf nodes, the kind of projects that already are one-person projects in the community for the most part. There are extra considerations when contributing to bigger efforts, and I think a point I have seen made by others in the community too is that you need to own the patches you submit, meaning that even if an AI helped you write the patch, you still need to ensure that what you submit is in a state where it can be helpful and is mergeable. I know that some people feel that means you need to be capable of reviewing the proposed patch and ensuring it’s clean and nice before submitting it, and I agree that if you expect your patch to get merged that has to be the case. On the other hand, I don’t think AI patches are useless even if you are not able to validate them beyond ‘does it fix my issue’.

My friend and PipeWire maintainer Wim Taymans and I were talking a few years ago about what I described at the time as the problem of ‘bad quality patches’, and this was long before AI-generated code was a thing. Wim’s response, which I have often thought about afterwards, was “a bad patch is often a great bug report”. And that would hold true for AI-generated patches too. If someone makes a patch using AI, a patch they don’t have the ability to code review themselves, but they test it and it fixes their problem, it might be a good bug report, and a clearer one than just a written description by the user submitting the report. Of course they should be clear in their bug report that they don’t have the skills to review the patch themselves, but that they hope it can be useful as a tool for pinpointing what isn’t working in the current codebase.

Anyway, let me talk about the projects I made. They can all be found on my personal website, Linuxrising.org, a website that I also used AI to update after not having touched it in years.

Elgato Light GNOME Shell extension

The first project I worked on is a GNOME Shell extension for controlling my Elgato Key Wifi Lamp. The Elgato lamp is basically meant for podcasters and people doing a lot of video calls to be able to easily configure light in their room to make a good recording. The lamp announces itself over mDNS, and thus can be controlled via Avahi. For Windows and Mac the vendor provides software to control their lamp, but unfortunately not for Linux.

There had been GNOME Shell extensions for controlling the lamp in the past, but they had not been kept up to date and their feature set was quite limited. Anyway, I grabbed one of these old extensions and told Claude to update it for the latest version of GNOME. It took a few iterations of testing, but we eventually got there and I had a simple GNOME Shell extension that could turn the lamp off and on and adjust hue and brightness. This was quite a straightforward process because I had code that had been working at some point; it just needed some adjustments to work with the current generation of GNOME Shell.

Once I had the basic version done, I decided to take it a bit further and try to recreate the configuration dialog that the Windows application offers for the full feature set, which took quite a bit of back and forth with Claude. I found that if I ask Claude to re-implement from a screenshot, it recreates the functionality of the user interface first, meaning that it makes sure that if the screenshot has 10 buttons, then you get a GUI with 10 buttons. You then have to iterate both on the UI design, for example telling Claude that I want a dark UI style to match the GNOME Shell, and on each bit of functionality in the UI. For example, most of the buttons in the UI didn’t really do anything at first, but when you go back and ask Claude to add specific functionality per button, it is usually able to do so.

Elgato Light Settings Application

So this was probably a fairly easy thing for the AI because all the functionality of the lamp could be queried over Avahi; there were no ‘secret’ USB registers to be set or things like that.
Since the application was meant to be part of the GNOME Shell extension I didn’t want it to have any dependencies that the Shell extension itself didn’t have, so I asked Claude to make this application in JavaScript. I have to say that so far I haven’t seen any major differences in the AI’s ability to generate code in different languages. The application now reproduces most of the functionality of the Windows application. Looking back, I think it took me a couple of days in total to put this tool together.

Dell Ultrasharp Webcam 4K

Dell UltraSharp 4K settings application for Linux

The second application on the list is a controller application for my Dell UltraSharp Webcam 4K UHD (WB7022). This is a high-end webcam that I have been using for a while, and it is comparable to something like the Logitech BRIO 4K webcam. It has mostly worked with the generic UVC driver since I got it, and I have been using it for my Google Meetings and similar, but since there was no native Linux control application I could not easily access a lot of the camera’s features. To address this I downloaded the Windows application installer, installed it under Windows, and took a bunch of screenshots showcasing all the features of the application. I then fed the screenshots into Claude and told it I wanted a GTK+ version of this application for Linux. I originally wanted to have Claude write it in Rust, but after hitting some issues in the PipeWire Rust bindings I decided to just use C instead.

It took me probably 3-4 days of intermittent work to get this application working, and Claude turned out to be really good at digging into Windows binaries and finding things like USB property values. Claude was also able to analyze the screenshots and figure out the features the application needed to have. It was a lot of trial and error writing the application, but one way I was able to automate it was by building a screenshot option into the application, allowing it to programmatically take screenshots of itself. That allowed me to tell Claude to try fixing something and then check the screenshot to see if it worked, without me having to interact with the prompt. Also, to get the user interface looking nicer, once I had all the functionality in I asked Claude to tweak the user interface to follow the GNOME Human Interface Guidelines, which greatly improved the quality of the UI.

At this point my application should have almost all the features of the Windows application. Since it is using PipeWire underneath it is also tightly integrated with the PipeWire media graph, allowing you to see it connect and work with your application in PipeWire patchbay applications like Helvum. The remaining features are software features of Dell’s application, like background removal and so on, but I think that if I decide to implement that it should be as a standalone PipeWire tool that can be used with any camera, and not tied to this specific one.

Red Hat Planet

The application shows Red Hat’s offices around the world and includes links to the latest Red Hat news.

The next application on my list is called Red Hat Planet. It is mostly a fun toy, but I made it partly to revisit the Xtraceroute modernisation I blogged about earlier. As I mentioned in that blog post, Xtraceroute, while cute, isn’t really very useful IMHO, since the way the modern internet works rarely has your packets jump around the world. Anyway, as people pointed out after I posted about the port, it wasn’t an actual Vulkan application; it was a GTK+ application using the GTK+ Vulkan backend. The globe animation itself was all software rendered.

I decided that if I was going to revisit the Vulkan problem I wanted to use a different application idea than traceroute. The idea I had was once again a 3D rendered globe, but this one reads the coordinates of Red Hat’s global offices from a file and renders them on the globe, and alongside that provides clickable links to recent Red Hat news items. So once again maybe not the world’s most useful application, but I thought it was a cute idea and hopefully it would allow me to create it using actual Vulkan rendering this time.
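To make the "coordinates from a file onto a globe" step concrete, here is a small Python sketch of the usual latitude/longitude to 3D conversion. It is only an illustration: the "name, latitude, longitude" file format, the function names and the axis convention (Y up, longitude 0 on the +X axis) are all assumptions of mine, and the real application does this in its own code and feeds the result to Vulkan.

```python
import math


def latlon_to_xyz(lat_deg, lon_deg, radius=1.0):
    """Convert latitude/longitude in degrees to a point on a sphere.

    Assumed convention: Y is up, and latitude 0 / longitude 0 lies on +X.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.sin(lat)
    z = radius * math.cos(lat) * math.sin(lon)
    return (x, y, z)


def parse_offices(text):
    """Parse hypothetical 'name, lat, lon' lines into (name, xyz) tuples.

    Blank lines and lines starting with '#' are skipped. Splitting from the
    right lets the name itself contain commas.
    """
    offices = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, lat, lon = (part.strip() for part in line.rsplit(",", 2))
        offices.append((name, latlon_to_xyz(float(lat), float(lon))))
    return offices
```

Each resulting point sits on the unit sphere, so the same rotation that spins the globe mesh can be applied to the dots.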

Creating this turned out to be quite the challenge (although it seems to have gotten easier since I started this effort), with Claude Opus 4.6 being more capable at writing Vulkan code than Claude Sonnet, Google Gemini or OpenAI Codex were when I started trying to create this application.
When I started this project I had to keep extremely close tabs on the AI and what it was doing in order to force it to keep working on this as a Vulkan application, as it kept wanting to simplify with software rendering or OpenGL and sometimes would start down that route without even asking me. That hasn’t happened more recently, so maybe that was a problem of the AI of 5 months ago.

I also discovered as part of this that rendering Vulkan inside a GTK4 application is far from trivial and would ideally need the GTK4 developers to create such a widget to get rendering timings and similar correct. It is one of the few times I have had Claude outright say that writing a widget like that was beyond its capabilities (I haven’t tried again, so I don’t know if I would get the same response today). So I started by moving the application to SDL3, which worked in that I got a spinning globe with red dots on it, but it came with its own issues, in the sense that SDL is not a UI toolkit as such. So while I got the globe rendered and working, the AI struggled badly with the news area when using SDL.

So I ended up trying to port the application to Qt, which again turned out to be non-trivial in terms of how much time it took with trial and error to get it right. In my mind I had a working globe using Vulkan, so how hard could it be to move it from SDL3 to Qt? But there were a million rendering issues. In fact I ended up using the Qt Vulkan rendering example as a starting point and then ‘porting’ the globe over bit by bit, testing it at each step, to finally get a working version. The current version is a Vulkan+Qt app and it basically works, although it seems the planet is not spinning correctly on AMD systems at the moment, while it seems to work well on Intel and NVIDIA systems.

WMDock

WmDock fullscreen with config application.

This project came out of a chat with Matthias Clasen over lunch where I mused about whether Claude would be able to bring the old Window Maker dockapps to GNOME and Wayland. It turns out the answer is yes, although the method of doing so changed as I worked on it.

My initial thought was for Claude to create a shim that the old dockapps could be compiled against, without any changes. That worked, but then I had a ton of dockapps showing up in things like the alt+tab menu. It also required me to restart my GNOME Shell session all the time as I was testing the extension to house the dockapps. In the end I decided that since a lot of the old dockapps don’t work with modern Linux versions anyway, and thus would need to be actively ported, I should accept that I ship the dockapps with the tool and port them to work with modern Linux technologies. This worked well and is what I currently have in the repo. I think the wildest port was porting the old dockapp webcam app from V4L1 to PipeWire, although updating the sound controller from ESD to PulseAudio was also a generational jump.

XMMS resuscitated

XMMS brought back to life

So the last effort I did was reviving the old XMMS media player. I had tried asking Claude to do this for months and it kept failing, but with Opus 4.6 it plowed through it and had something working in a couple of hours, with no input from me beyond kicking it off. This was a big lift, moving it from GTK2 and Esound to GTK4, GStreamer and PipeWire. One thing I realized is that a challenge with bringing an old app back is that, since keeping the themeable UI is a big part of this specific application, adding new features is a little kludgy. Anyway, I did set it up to be able to use network speakers through PipeWire, and you can also import your Spotify playlists and play those, although you need to run the Spotify application in the background to be able to play sound on your local device.

Monkey Bubble

Monkey Bubble game

Monkey Bubble was a game created in the heyday of GNOME 2, and while I always thought it was a well-made little game, it had never been updated to newer technologies. So I asked Claude to port it to GTK4 and use GStreamer for audio. This port was fairly straightforward, with Claude having few problems with it. I also asked Claude to add highscores using the libmanette library and network game discovery with Avahi. So some nice little improvements.

All the applications are available either as Flatpaks or Fedora RPMs through the GitLab project page, so I hope people enjoy these applications and tools, and enjoy the blasts from the past as much as I did.

Worries about Artificial Intelligence

When I speak to people both inside Red Hat and outside in the community I often come across negativity or even sometimes anger towards Artificial Intelligence in the coding space. And to be clear, I too worry about where things could be heading and how it will affect my livelihood, so I am not unsympathetic to those worries at all. I probably worry about these things at least a few times a day. At the same time I don’t think we can hide from or avoid this change; it is happening with or without us. We have to adapt to a world where this tool exists, just like our ancestors adapted to jobs changing due to industrialization and science before. So do I worry about the future? Yes, I do. Do I worry about how I might personally get affected by this? Yes, I do. Do I worry about how society might change for the worse due to this? Yes, I do. But I also remind myself that I don’t know the future and that people have found ways to move forward before, and society has survived and thrived. So what I can control is that I try to be on top of these changes myself and take advantage of them where I can, and that is my recommendation to the wider open source community too: leverage these tools to move open source forward, and at the same time put our weight on the scale towards the best practices and policies around Artificial Intelligence.

The Next Test and where AI might have hit a limit for me.

So all these previous efforts did teach me a lot of tricks and helped me understand how I can work with an AI agent like Claude, but especially after the success with the webcam I decided to up the stakes and see if I could use Claude to help me create a driver for my Plustek OpticFilm 8200i scanner. I have zero background in any kind of driver development and probably less than zero in the field of scanner drivers specifically. So I ended up going down a long row of dead ends on this journey, and to this day I have not been able to get a single scan out of the scanner that even remotely resembles the images I am trying to scan.

My idea was to have Claude analyse the Windows and Mac drivers and build me a SANE driver based on that, which turned out to be horribly naive and led nowhere. One thing I realized is that I would need to capture USB traffic to help Claude contextualize some of the findings it had from looking at the Windows and Mac drivers. I started out with Wireshark, feeding Claude the Wireshark capture logs. Claude quite soon concluded that the Wireshark logs weren’t good enough and that I needed lower-level traffic capture. A USB packet analyzer isn’t cheap, so I had the idea that I could use one of the ARM development boards floating around the house as a USB relay, allowing me to perfectly capture the USB traffic. With some work I did manage to get my LibreComputer Solitude AML-S905D3-CC ARM board going and set it in device mode. I also had a usb-relay daemon going on the board. After a lot of back and forth, and even at one point trying to ask Claude to implement a missing feature in the USB kernel stack, I realized this would never work and I ended up ordering a Beagle USB 480 USB hardware analyzer.

At about the same time I came across the chipset documentation for the Genesys Logic GL845 chip in the scanner. I assumed that between my new USB analyzer and the chipset docs this would be easy going from here on, but so far no. I even had Claude decompile the Windows driver using Ghidra and then try to extract the needed information from the decompiled code.
I also bought a network-controlled electric outlet so that Claude can cycle the power of the scanner on its own.

So the problem here is that with zero scanner driver knowledge I don’t even know what I should be looking for, or where I should point Claude, so I keep trying to brute force it by trial and error. I managed to make SANE detect the scanner and I managed to get motor and lamp control going, but that is about it. I can hear the scanner motor running when I ask for a scan, but I don’t know if it moves correctly. I can see the light turning on and off inside the scanner, but I once again don’t know if it is happening at the correct times and for the correct durations. And Claude of course has no way of knowing either, relying on me to tell it if something seems to have improved compared to how it was.

I have now used Claude to create two tools for Claude to use, one using a camera to detect what is happening with the light inside the scanner and the other recording sound, to compare the sound this driver makes with the sounds coming out when doing a working scan with the macOS application. I don’t know if this will take me to the promised land eventually, but so far I consider my scanner driver attempt a giant failure. At the same time I do believe that if someone actually skilled in scanner driver development was doing this they could have guided Claude to do the right things and probably would have had a working driver by now.

So I don’t know if I hit the kind of thing that will always be hard for an AI to do, as it has to interact with things existing in the real world, or if newer versions of Claude, Gemini or Codex will suddenly get past a threshold and make this seem easy, but this is where things are at for me at the moment.

GStreamer 1.29.1 unstable development snapshot

Last modified: 2 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce the first development snapshot in the API/ABI-unstable 1.29 release series.

The API/ABI-unstable 1.29 release series is for testing and development purposes in the lead-up to the stable 1.30 series which is scheduled for release in Q4 2026. Any newly-added API can still change until that point.

This development release is primarily for developers and early adopters, and distros should probably not package it.

Highlighted changes:

ac4parse: New basic AC-4 parser element, plus AC-4 typefinding
analytics: New GstAnalyticsMtd derivative to represent grouping of Mtd's and Keypoint
Parse HDR10+ metadata out of H.265 and AV1 bitstreams
Matroska demuxer: Can build a dynamic seek index now if needed
New h264seiinserter and h265seiinserter elements that support both closed captions and unregistered user data SEIs
Add HLS WebVTT sink element to the hlssink3 plugin
New plugin for general purpose compress/decompress
New udpsrc2 element with better performance for high bitrate streams
New VA-API overlay compositor
Opus audio support for F32 and S24_32 samples and 96kHz sample rate
Playbin3 subtitle switching fixes
Bump ranks of the new Rust RTP (de)payloaders to PRIMARY and default to mtu 1200 for payloaders
rtspsrc2 authentication support
GstPlay track selection notification improvements
QML6 GL Source now supports navigation events
QuickTime demuxer gained Bayer support
Splitmuxsink now includes the start and end timecodes in fragment-opened and closed messages
srtpdec gained a way to invalidate keys for a specific SSRC
The APE tag demuxer can extract cover art tags now
translationbin can control the textaccumulate latency now via a new property
Allow device providers rank override using GST_PLUGIN_FEATURE_RANK
cerbero gained support for Android on RISC-V64
Countless bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

GStreamer Spring Hackfest on 29-31 May 2026 in Nice, France

Last modified: 2 months ago

From GStreamer News by GStreamer

The GStreamer project is thrilled to announce that there will be a spring hackfest on Friday-Sunday 29-31 May 2026 in Nice, France.

For more details and latest updates check out the announcement on Discourse.

We will announce any further updates on Discourse, but you can also follow us on Bluesky and on Mastodon.

We hope to see you in Nice!

Please spread the word!

GStreamer 1.26.11 old-stable bug fix release

Last modified: 2 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the now old-stable 1.26 release series of your favourite cross-platform multimedia framework!

Please note that the 1.26 old-stable series is no longer actively maintained and has been superseded by the GStreamer 1.28 stable series now.

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

Security fixes for the JPEG, H.265 and H.266 video parsers and the DVB subtitle overlay
Security fixes for the ASF, RealMedia and QuickTime/MP4 demuxers and RIFF library
Security fixes for the WAV audio parser and the RTP QDM2 depayloader
avviddec: Handle field/order changes in mixed interlace mode
awstranscriber2: workaround AWS SDK regression, plus transcriberbin latency reporting fixes
videodecoder: Handle recovery from temporary reordered output
Decodebin3 collection handling improvements
fmp4mux: fragment duration handling improvements for HLS
Gap event handling fixes in audio/video decoders and aggregator/muxers
GstPlay stream selection handling improvements
Input selector fixes
MPEG-TS demuxer continuity counter handling fixes
Reverse playback fixes for the MP4 demuxer
RTP, SRTP, RTCP and RTSP and Transport-Wide Congestion Control fixes
Webrtcsink: better handling of platform-specific NVIDIA encoder differences, plus other improvements
Apple macOS VideoToolbox decoder improvements
Python bindings: more type hinting
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.11 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly and will be published on the Downloads page.

GStreamer 1.28.1 stable bug fix release

Last modified: 2 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce the first bug fix release in the new stable 1.28 release series of your favourite cross-platform multimedia framework!

This release only contains bug fixes as well as a number of security fixes. It should be safe to update from 1.28.0, and we recommend you do so at your earliest convenience.

Highlighted bugfixes:

Various security fixes and playback fixes
Add new whisper-based speech-to-text transcription element
Add new debugseimetainserter plugin for testing SEI meta insertion
Fix scaling and resizing with UIView on EAGL and Vulkan
Reverse playback and gap handling fixes in various components
avviddec: Handle field/order changes in mixed interlace mode
awstranscriber2: workaround for suspected Rust SDK regression
cudaupload, cudadownload: Fix CUDA/GL interop copy path
decodebin3: Fix switch to smaller collections and improve collection change on existing pad
devenv: Add a subproject for providing the LunarG MoltenVK SDK
livesync: fixes and reverse playback handling; ignore upstream latency when upstream is not live
objectdetectionoverlay: add support for rotated bounding boxes
qml6glsrc: Fix rendering of scene with clipped items
speechmatics: allow configuring audio events such as detecting applause, laughter and music
livekit webrtc: emit session-requested only for Producer role
tsdemux: Fix Continuity Counter handling and handle clock change/resets without skew correction
v4l2: Add support for AV1 stateful V4l2 decoder
vpxdec: Support downstream pools with alignment requirements
vtdec, vtenc: Lots of Apple VideoToolbox decoder and encoder fixes
applemedia build improvements, patches for tvOS support, tvos cross file
wavpack: Fix handling of format changes, extend parser with new features, handle non-S32 samples
webrtcsink: allow specifying custom headers to signalling server
webrtcsink: negotiate profile and level for input encoded in H.264
webrtcsrc: add request type pads and allow sending encoded data downstream
cerbero: wheel: Add a new `gstreamer_meta` package with fewer deps
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.28.1 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly and will be published on the Downloads page.

Python Wheels for GStreamer

Last modified: 3 months ago

From Centricular Devlog by Nirbheek Chauhan (Centricular)

GStreamer has shipped binaries for all the major platforms for many years now: Windows, Android, macOS, iOS. Linux packages are, of course, handled by all the various distros.

However, if you wanted to use the Python bindings on macOS or Windows, you had to jump through hoops. Till now. GStreamer 1.28.0 ships Python wheels supporting Python 3.9, 3.10, 3.11, 3.12, 3.13, 3.14 on macOS (GIL) and Windows (GIL and free-threading). All you need to do is to run:

python3 -m pip install gstreamer-bundle==1.28.0

And that's it! You will have a complete GStreamer install, with all the plugins you expect on macOS and Windows, and all utilities including gst-launch-1.0 gst-inspect-1.0 gst-device-monitor-1.0 ges-launch-1.0 and so on.
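A quick way to check that the bindings are importable after installing is a short script along these lines. This is only a sketch: the helper name gstreamer_version is made up for the example, and it uses the standard PyGObject entry points (gi.require_version, Gst.init, Gst.version_string).

```python
def gstreamer_version():
    """Return the GStreamer version string, or None if the bindings are missing."""
    try:
        import gi

        # Raises ValueError if the Gst 1.0 typelib cannot be found.
        gi.require_version("Gst", "1.0")
        from gi.repository import Gst
    except (ImportError, ValueError):
        return None
    Gst.init(None)
    return Gst.version_string()


if __name__ == "__main__":
    print(gstreamer_version() or "GStreamer Python bindings not found")
```

If the install worked, this should print a version string for the bundled GStreamer rather than the "not found" message.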

The gstreamer-bundle package is a complete distribution, so it will pull in all the plugins, libraries, cmd-line tools, etc. If you want to depend on a more minimal GStreamer installation or you want to avoid pulling in GPL or known-patent-encumbered ("restricted") plugins, you can use the gstreamer-meta package. That puts plugins behind "extras" like gpl cli restricted gtk4 etc.

Many thanks to Pollen Robotics for sponsoring this work. The Reachy Mini companion robot by Pollen Robotics/Hugging Face uses GStreamer via the Python bindings and is the first production user of these wheels!

We're very excited to see more people make use of these wheels.

Read on for technical details on how all this was accomplished.

Step 1: Ship Python bindings via introspection on macOS and Windows

After many years, Python bindings support was re-introduced in GStreamer 1.26 and was shipped with the installers on macOS and Windows. This required significant work:

Re-introduce gobject-introspection support
Re-introduce Python bindings support and ship it on macOS + Windows
Load typelibs relocatably

Thanks to Amy for doing the bulk of the work here, and to everyone else who contributed towards this over the years: Andoni, Nacho, Thibault, Tjitte, and more that I'm sure I've missed.

Step 2: Build wheels for all supported Python versions

When shipping Python bindings for C libraries, it is necessary to also ship the accompanying libraries and plugins, lest ABI mismatches and incompatibilities arise. That's why the wheels we ship constitute a complete GStreamer distribution, including all plugin dependencies such as GTK4. This means you also have Python bindings for GTK4 available on macOS and Windows.

This wasn't easy to accomplish, especially because PyGObject doesn't use the limited Python C API. That means we can't just build for Python 3.9 and call it a day. We need separate wheels for each Python version × target.

The count goes something like this:

We split the gstreamer libraries, plugins, and dependencies across 11 wheels
We support 8 Python versions: 3.9 3.10 3.11 3.12 3.13 3.13t 3.14 3.14t
And 3 platforms: macOS universal, Windows MSVC x86_64, Windows MSVC x86

That's 11 × 8 × 3 = 264 wheels. That is absolutely untenable!

So we have to do some chicanery to trim that down:

Put everything that links to or loads Python in one wheel called gstreamer_python, so that everything else is agnostic to the Python version being used
Override py_limited_api to be cp39 for all agnostic wheels and mark them as not containing ext modules
Rebuild the recipes responsible for generating libraries or plugins that go into gstreamer_python with each Python version we need to support
On macOS, override plat_name to be macosx_10_13_universal2 for all agnostic wheels even if the Python version we're using doesn't support macOS 10.13, so that they can be reused across all Python versions

That brings us down to 92 wheels. Still quite a lot, but now it's a manageable number!

The long-term solution is to port PyGObject over to the Limited Python C API—which is quite a big undertaking—but should allow us to skip most of this for Python >=3.12.

Thanks to Amy once again for doing most of the work to make this possible, and to Pollen Robotics for sponsoring us to do it. Here are the relevant merge requests:

Step 3: Linux support

You may have noticed that there was no mention of wheels targeting Linux. That's a much harder problem to solve than shipping on macOS or Windows, so we had to punt it for a later release, likely one of the 1.28.x stable releases.

We're planning to target manylinux_2_28 and support Python 3.9+, but there are still unknowns that could throw a spanner in our plans. For instance:

GStreamer often utilizes subtle characteristics of the Linux graphics stack for good performance, which may break by targeting such an old base.
The difference in library versions shipped with the wheels vs on the system may cause subtle or catastrophic breakage in apps that also load system libraries.

We're hoping that we can overcome all this and ship something that allows users on any Linux distro to get a functional GStreamer just by doing pip install gstreamer-bundle.

In the meantime, please continue to use the distro-provided GStreamer packages and Python bindings, and if they're missing plugins or are too old, please contact your distro maintainer(s).

GStreamer Whisper Speech-to-Text Element

Last modified: 3 months ago

From Centricular Devlog by Mathieu Duponchelle (Centricular)

At the '25 GStreamer conference I gave a talk titled Costly Speech: an introduction.

This was in reference to the fact that all the speech-related elements used in the pipeline I presented were wrappers around for-pay cloud services or for-pay on-site servers.

At the end of the talk, I mentioned that plans for future development included new, "free" backends. The first piece of the puzzle was a Whisper-based transcriber.

I have the pleasure to announce that it is now implemented and published, thank you to Ray Tiley from Tightrope Media Systems for sponsoring this work!

Design / Implementation

The main design goal was for the new transcriber to behave identically to the existing transcribers, in particular:

It needed to output timestamped words one at a time
It needed to handle live streams with a configurable latency

In order to fulfill that second requirement, the implementation has to feed the model with chunks of a configurable duration.

This approach works well for constraining the latency, but didn't give the best results accuracy-wise, as words close to the chunk boundaries would often go missing, be poorly transcribed, or be duplicated.

To address this, the implementation uses two mechanisms:

It always feeds the previous chunk when running inference for a given chunk
It extracts tokens from a sliding window at a configurable distance from the "live edge"

Here's an example with a 4-second chunk duration and a 1-second live edge offset:

0     1     2     3     4     5     6     7     8
| 4-second chunk        | 4-second chunk        |
                  | 4-second token window |

This approach greatly mitigates the boundary issues, as the tokens are always extracted from a "stable" region of the model's output.

With the above settings, the element reports a 5-second latency, to which a configurable processing latency is added. That processing latency depends on the hardware: on my machine, using CUDA and an NVIDIA RTX 5080 GPU, processing runs at around 10x real time, which means a 1-second processing latency is sufficient.
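The timing arithmetic behind the diagram and the latency figure can be sketched in a few lines of Python. This is not the element's code, just an illustration under stated assumptions: the names are hypothetical, the token window is taken to be as long as one chunk (as the diagram suggests), and the reported latency is taken to be chunk duration plus live edge offset, which matches the 5 seconds quoted for these settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkPlan:
    inference_start: float  # start of the audio fed to the model (seconds)
    inference_end: float    # end of the audio fed to the model (seconds)
    window_start: float     # start of the region tokens are extracted from
    window_end: float       # end of the region tokens are extracted from


def plan_chunk(index, chunk_duration=4.0, live_edge_offset=1.0):
    """Plan inference input and token window for the chunk at `index` (0-based)."""
    chunk_start = index * chunk_duration
    chunk_end = chunk_start + chunk_duration
    # The previous chunk is always fed along with the current one.
    inference_start = max(0.0, chunk_start - chunk_duration)
    # Tokens come from a window that stops short of the live edge.
    window_end = chunk_end - live_edge_offset
    window_start = max(0.0, window_end - chunk_duration)
    return ChunkPlan(inference_start, chunk_end, window_start, window_end)


def reported_latency(chunk_duration=4.0, live_edge_offset=1.0):
    """Latency before processing time: a full chunk plus the live edge offset."""
    return chunk_duration + live_edge_offset
```

For the second chunk in the diagram, plan_chunk(1) feeds seconds 0 to 8 to the model and extracts tokens from seconds 3 to 7, and reported_latency() gives the 5 seconds mentioned above.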

The obvious drawback of this approach is a doubling of the resource usage, as each chunk is fed twice through the inference model. It could be further refined to only feed part of the previous chunk and thus increase performance without sacrificing accuracy.

As the interface of the element follows that of other transcribers, it can be used as an alternative transcriber within transcriberbin.

Future prospects

The biggest missing piece to bring the transcriber to feature parity with other transcribers such as the speechmatics-based one is speaker diarization (~ identification).

Whisper itself does not support diarization. The tinydiarize project aimed to finetune models to address this, but it has unfortunately been put on hold for now, and only supported detecting speaker changes, not identifying individual speakers.

It is not clear at the moment what would be the best open source option to integrate for this task. Models such as NVIDIA's streaming sortformer are promising, but are limited to four speakers, for example.

We are very interested in suggestions on this front. Don't hesitate to hit us up if you have any or are interested in sponsoring further improvements to our growing stack of speech-related elements!

New GStreamer icecastsink with AAC support

Last modified: 3 months ago

From Centricular Devlog by Tim-Philipp Müller (Centricular)

Icecast is a Free and Open Source multimedia streaming server, primarily used for audio and radio streaming over HTTP(S).

In GStreamer you can send an audio stream to such a server with the shout2send sink element based on libshout2.

This works perfectly fine, but has one limitation: it does not support the AAC audio codec, which for some use cases and target systems is the preferred audio codec. This is because libshout2 does not support it and will not support it, at least not officially upstream.

Some streaming servers such as the Rocket Streaming Audio Server (RSAS) do support this though, and as such it would be nice to be able to send streams to them in AAC format as well.

Enter icecastsink, which is a new sink element written in Rust to send audio to an Icecast server.

It supports sending AAC audio in addition to Ogg/Vorbis, Ogg/Opus, FLAC and MP3, and also has support for automatic re-connect in case the server kicks off the client, which might happen if the client doesn't send data for a while.

Give it a spin and let us know how it goes!

GStreamer 1.28 Natively Supports Windows ARM64

Last modified: 3 months ago

From Centricular Devlog by Nirbheek Chauhan (Centricular)

One of the many items on my "nice-to-have" TODO list has been shipping a GStreamer installer that natively targets Windows ARM64. Cerbero has had support for cross-compiling to Windows ARM64 since GStreamer 1.16 in the form of targeting UWP. However, once that was laid to rest with GStreamer 1.22, we didn't start shipping Windows ARM64 installers instead because it was looking like Microsoft's ARM64 experiment had also failed.

Lately, however, there's been a significant resurgence of ARM64 laptops that run Windows, and they seem to actually have compelling features for some types of users. So I spent a day or two and reinstated support for Windows ARM64 built with MSVC in Cerbero.

My purpose was just to find the shortest path to getting that to a usable state, so a bunch of plugins are missing. In particular all Rust plugins had to be disabled due to an issue building the ring crate. I am optimistic that someone will come along and help fix these issues 😉

You can find the installer at the usual location: https://gstreamer.freedesktop.org/download/#windows

Note that these binaries are cross-compiled from x86_64, so the installer itself is x86, and the contents are missing gobject-introspection and Python bindings. We are also unable to generate Python wheels for Windows ARM64 because of this. If someone would like to help with any of this, please get in touch on the Windows channel in GStreamer's Matrix community.

GStreamer 1.28.0 new major stable release

Last modified: 3 months ago

From GStreamer News by GStreamer

The GStreamer team is excited to announce a new major feature release of your favourite cross-platform multimedia framework!

As always, this release is again packed with new features, bug fixes and many other improvements.

The 1.28 release series adds new features on top of the previous 1.26 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

Highlights:

AMD HIP plugin and integration helper library
Vulkan Video AV1 and VP9 decoding, H.264 encoding, and 10-bit support for H.265 decoder
waylandsink: Parse and set the HDR10 metadata and other color management improvements
Audio source separation element based on demucs in Rust
Analytics combiner and splitter elements plus batch meta to batch buffers from one or more streams
LiteRT inference element; move modelinfo to analytics lib; add script to help with modelinfo generation and upgrade
Add general classifier tensor-decoder, facedetector, and more analytics convenience API
New tensordecodebin element to auto-plug compatible tensor decoders based on their caps and many other additions and improvements
Add a burn-based YOLOX inference element and a YOLOX tensor decoder in Rust
applemedia: VideoToolbox VP9 and AV1 hardware-accelerated decoding support, and 10-bit HEVC encoding
Add new GIF decoder element in Rust with looping support
input-selector: implements a two-phase sinkpad switch now to avoid races when switching input pads
The inter wormhole sink and source elements gained a way to forward upstream events to the producer as well as new fine-tuning properties
webrtcsink: add renegotiation support and support for va hardware encoders
webrtc WHEP client and server signaller
New ST-2038 ancillary data combiner and extractor elements
fallbacksrc gained support for encoded streams
flv: enhanced rtmp H.265 video support, and support for multitrack audio
glupload: Implement udmabuf uploader to share buffers between software decoders/sources and GPUs, display engines (wayland), and other dma devices
video: Add crop, scale, rotate, flip, shear and more GstMeta transformation
New task pool GstContext to share a thread pool amongst elements for better resource management and performance, especially for video conversion and compositing
New Deepgram speech-to-text transcription plugin and many other translation and transcription improvements
Speech synthesizers: expose new "compress" overflow mode that can speed up audio while preserving pitch
ElevenLabs voice cloning element and support for Speechmatics speaker identification API
textaccumulate: new element for speech synthesis or translation preprocessing
New vmaf element to calculate perceptual video quality assessment scores using Netflix's VMAF framework
decodebin3: expose KLV, ID3 PES and ST-2038 ancillary data streams with new metadata GstStream type
New MPEG-H audio decoding plugin plus MP4 demuxing support
LCEVC: Add autoplugging decoding support for LCEVC H.265 and H.266 video streams and LCEVC H.265 and H.266 encoders
RTP "robust MPEG audio", raw audio (L8, L16, L24), and SMPTE ST291 ancillary metadata payloaders/depayloaders in Rust
Add a Rust-based icecastsink element with AAC support
The Windows IPC plugin gained support for passing generic data in addition to raw audio/video, and various properties
New D3D12 interlace and overlay compositor elements, plus many other D3D12 improvements
Blackmagic Decklink elements gained support for capturing and outputting all types of VANC via GstAncillaryMeta
GstLogContext API to reduce log spam in several components and `GST_DEBUG_ONCE` (etc) convenience macros to log things only once
hlssink3, hlscmafsink: Support the use of a single media file, plus I-frame only playlist support
Webkit: New wpe2 plugin making use of the "WPE Platform API"
MPEG-TS demuxer can now disable skew corrections
New Qt6 QML render source element
qml6gloverlay: support directly passing a QQuickItem for the QML render tree
unixfdsink: Add a property to allow copying to make sink usable with more upstream elements
dots-viewer: Improve dot file generation and interactivity
Python bindings: more syntactic sugar, analytics API improvements and type annotations
cerbero: add support for Python wheel packaging, Windows ARM64, new iOS xcframework, Gtk4 on macOS and Windows, and more plugins
Smaller binary sizes of Rust plugins in Windows and Android binary packages
Peel: New C++ bindings for GStreamer
Lots of new plugins, features, performance improvements and bug fixes
Countless bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

For more details check out the GStreamer 1.28 release notes.

Many thanks to everyone who contributed to this release!

Binaries for Android, iOS, macOS and Windows will be provided in due course.

You can download release tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-devtools, gstreamer-vaapi, gstreamer-sharp, gstreamer-docs.

Igalia Multimedia contributions in 2025

Last modified: 3 months ago

From GStreamer – Happy coding by eocanha (Enrique Ocaña González)

Now that 2025 is over, it’s time to look back and feel proud of the path we’ve walked. Last year was really exciting in terms of contributions to GStreamer and WebKit for the Igalia Multimedia team.

With more than 459 contributions over the year, we’ve been one of the top contributors to the GStreamer project, in areas like Vulkan Video, GstValidate, VA, GStreamer Editing Services, WebRTC or H.266 support.

Pie chart of Igalia's contributions to different areas of the GStreamer project:
other (30%)
vulkan (24%)
validate (7%)
va (6%)
ges (4%)
webrtc (3%)
h266parse (3%)
python (3%)
dots-viewer (3%)
tests (2%)
docs (2%)
devtools (2%)
webrtcbin (1%)
tracers (1%)
qtdemux (1%)
gst (1%)
ci (1%)
y4menc (1%)
videorate (1%)
gl (1%)
alsa (1%)

Igalia’s contributions to the GStreamer project

In Vulkan Video we’ve worked on the VP9 video decoder, and cooperated with other contributors to push the AV1 decoder as well. There’s now an H.264 base class for video encoding that is designed to support general hardware-accelerated processing.

GStreamer Editing Services, the framework for building video editing applications, has gained time remapping support, which now makes it possible to include fast/slow motion effects in videos. Video transformations (scaling, cropping, rounded corners, etc.) are now hardware-accelerated thanks to the addition of new Skia-based GStreamer elements and integration with OpenGL. Buffer pool tuning and pipeline improvements have helped to optimize memory usage and performance, enabling the editing of 4K video at 60 frames per second. Much of this work to improve and ensure quality in GStreamer Editing Services has also brought improvements in the GstValidate testing framework, which will be useful for other parts of GStreamer.

Regarding H.266 (VVC), full playback support (with decoders such as vvdec and avdec_h266, demuxers and muxers for Matroska, MP4 and TS, and parsers for the vvc1 and vvi1 formats) is now available in GStreamer 1.26 thanks to Igalia’s work. This allows user applications such as the WebKitGTK web browser to leverage the hardware accelerated decoding provided by VAAPI to play H.266 video using GStreamer.

Igalia has also been one of the top contributors to GStreamer Rust, with 43 contributions. Most of the commits there have been related to Vulkan Video.

Pie chart of Igalia's contributions to different areas of the GStreamer Rust project:
vulkan (28%)
other (26%)
gstreamer (12%)
ci (12%)
tracer (7%)
validate (5%)
ges (7%)
examples (5%)

Igalia’s contributions to the GStreamer Rust project

In addition to GStreamer, the team also has a strong presence in WebKit, where we leverage our GStreamer knowledge to implement many of the web engine's multimedia features. Of the 1739 contributions Igalia made to the WebKit project last year, the Multimedia team made 323. Nearly one third of those were related to generic multimedia playback, and the rest were in areas such as WebRTC, MediaStream, MSE, WebAudio, a new Quirks system to provide adaptations for specific hardware multimedia platforms at runtime, WebCodecs or MediaRecorder.

Pie chart of Igalia's contributions to different areas of the WebKit project:
Generic GStreamer work (33%)
WebRTC (20%)
Regression bugfixing (9%)
Other (7%)
MSE (6%)
BuildStream SDK (4%)
MediaStream (3%)
WPE platform (3%)
WebAudio (3%)
WebKitGTK platform (2%)
Quirks (2%)
MediaRecorder (2%)
EME (2%)
GLib (1%)
WTF (1%)
WebCodecs (1%)
GPUProcess (1%)
Streams (1%)

Igalia Multimedia Team’s contributions to different areas of the WebKit project

We’re happy about what we’ve achieved over the year and look forward to maintaining this success and bringing even more exciting features and contributions in 2026.

Can AI help ‘fix’ the patent system?

Last modified: 3 months ago

From Christian F.K. Schaller by Christian Fredrik Kalager Schaller (Christian Schaller)

One thing I think anyone involved with software development over the last few decades can see is the problem of the “forest of bogus patents”. I have recently been trying to use AI to look at patents in various ways, and one idea I had was: “could AI help improve the quality of patents and free us from obvious ones?”

Let's start with the justification for patents existing at all. The most common argument for the patent system I hear is this one: “Patents require public disclosure of inventions in exchange for protection. Without patents, inventors would keep innovations as trade secrets, slowing overall technological progress.” This reasoning makes sense to me, but it is also screamingly obvious to me that for it to hold true you need to ensure the patents granted are genuinely inventions that would otherwise stay hidden as trade secrets. If you allow patents on things that are obvious to someone skilled in the art, you are not enhancing technological progress, you are hampering it, because the next person along will be blocked from doing it.

Based on this justification, the question becomes: does, for example, the US Patent Office do a good job of filtering out obvious patents? I believe the answer is “No, they don’t”. Having worked in the software space for many decades now, it is very clear to me that the patent office does very little to avoid patents getting approved for obvious things. I think there are many reasons why that happens. First of all, if you are an expert in a field, you would likely not be working as a case worker in the patent office. No disrespect to the people working there, but if you are, for instance, a really good software engineer, there are much better-paying options to be found elsewhere; the same goes for medical researchers, for example. So we have people with a limited understanding of a patent's field trying to evaluate whether the patent is obvious or not.

Secondly, the patent office is not really incentivized to deny patents, as it is a self-funded entity and its ‘success’ and revenue are based on how many patent applications are submitted. So the lower the bar for approval, the more patent applications get submitted, and thus the patent office makes more money and is more “successful”. This is a problem, but if we fix the first issue I mentioned, it may be possible to reduce the impact of this second one.

So what can AI do to help here? It can help patent reviewers at the patent office evaluate patents for obviousness, since it has been trained on these fields. If we required every patent application to be run through an AI engine for review, then the patent office reviewer would have to justify why he or she believes something is non-obvious after the AI has looked at it and concluded it is not. My assumption is that they would very rarely do so.

To test this out, I decided to ask Gemini to review an H.264 video patent that is still valid and has been used in litigation before, to see if it could do a decent job of reviewing the patent application and judging whether it is a true invention or not. Here is what Gemini came up with.

US Patent 7,916,782 (Samsung Electronics) is one of the most famous “battleground” patents in the history of smartphone litigation. It has been asserted against Apple, Huawei, and others.

It covers a specific part of the H.264 standard known as CAVLC (Context-Adaptive Variable Length Coding).

The Patent: US 7,916,782: Method and apparatus for video coding and decoding
The Mechanism: Efficiently coding the count of zero-valued pixels in a video block (“TotalZeros”).

The Problem
In video compression, you divide the image into small 4×4 pixel blocks. After processing (transform and quantization), most of the 16 pixels in the block end up being zero. You only have a few non-zero numbers left.
Goal: You need to tell the decoder how many zeros are in the block so it can reconstruct the empty space.
Constraint: You already told the decoder how many non-zero coefficients exist (let’s say you have 5 non-zeros).
The “Invention”
The patent claims a method where the encoder selects a specific lookup table (VLC table) to encode the “TotalZeros” value, and, crucially, this selection is based on the number of non-zero coefficients (TotalCoeff) that were just processed. If TotalCoeff is 1, use Table A (allows for many zeros). If TotalCoeff is 10, use Table B (allows for fewer zeros).

The “Skilled Engineer” Test
Imagine you are a video coding engineer in 2002. You are tasked with compressing the “TotalZeros” value. Here is the logical deduction chain you would inevitably follow:

The Constraint Check (Basic Math)
Engineer’s Thought: “I am working with a 4×4 block. That’s 16 pixels total.”
Engineer’s Thought: “I just transmitted the number of non-zero pixels. Let’s say there are 5 non-zeros.”
Deduction: “Therefore, the maximum number of zeros possible is 16−5=11.”
The Optimization (Common Sense)
Engineer’s Thought: “I need a Huffman table (lookup table) to compress the TotalZeros value.”
Engineer’s Thought: “Should I use a generic table that goes from 0 to 16?”
Deduction: “No, that’s wasteful. I know for a fact the value cannot be 12, 13, 14, 15, or 16. If I use a table that includes those impossible numbers, I’m wasting bits on shorter codes for values that will never happen.”
The Solution (The “Invention”)

Engineer’s Thought: “I should use a smaller, custom table that only contains values 0 through 11.”
Engineer’s Thought: “Actually, I should just make a different table for every possible number of non-zeros. If I have 15 non-zeros, the only possible zero count is 1. I don’t even need a table for that; it’s implicit!”

The Verdict: Valid or Broken?
Engineer’s View (Obvious): This is not an invention; it is constraint propagation. If you have a container of capacity 16, and you fill it with 5 rocks, it is “obvious” that the remaining space for air is 11. Creating a table sized to “11” isn’t a creative spark; it is the mathematical consequence of the block size. To an engineer, failing to do this would be incompetence, not a design choice.

Patent Office View (Novel): The USPTO granted this because the patent didn’t just claim “optimizing tables.” It claimed the specific structure of the solution: Check TotalCoeff. If TotalCoeff is 1, select Table_1. If TotalCoeff is 2, select Table_2. …and so on. Because no prior paper had explicitly written down “Use these specific tables in this specific order for H.264,” it was deemed novel.

Conclusion
US 7,916,782 is a prime example of the “broken” system because it effectively patents arithmetic efficiency. By granting a monopoly on the logic “Use a small table when you have small space,” the patent office allowed Samsung to assert ownership over the only logical way to implement the H.264 standard efficiently. An engineer trying to solve this problem without seeing the patent would essentially be forced to “re-invent” it, because the math leaves no other optimal path.

Conclusion: I have a hard time believing a USPTO reviewer would have greenlit this patent after getting this feedback from the AI engine. Hopefully, over time, having something like this in place could help us reduce the patent pool to things that genuinely deserve patent protection.

Accessibility Update: Enabling Mono Audio

Last modified: 4 months ago

From Arun Raghavan by Arun (Arun Raghavan)

If you maintain a Linux audio settings component, we now have a way to globally enable/disable mono audio for users who do not want stereo separation of their audio (for example, due to hearing loss in one ear). Read on for the details on how to do this.

Background

Most systems support stereo audio via their default speaker output or 3.5mm analog connector. These devices are exposed as stereo devices to applications, and applications typically render stereo content to these devices.

Visual media use stereo for directional cues, and music is usually produced using stereo effects to separate instruments, or provide a specific experience.

It is not uncommon for modern systems to provide a “mono audio” option that allows users to have all stereo content mixed together and played to both output channels. The most common scenario is hearing loss in one ear.

PulseAudio and PipeWire have supported forcing mono audio on the system via configuration files for a while now. However, this is not easy to expose via user interfaces, and unfortunately remains a power-user feature.

Implementation

Recently, Julian Bouzas implemented a WirePlumber setting to force all hardware audio outputs to mono (MR 721 and 769). This lets the system run in stereo mode, but configures the audioadapter around the device node to mix down the final audio to mono.

This can be enabled using the WirePlumber settings via API, or using the command line with:

wpctl settings node.features.audio.mono true

The WirePlumber settings API allows you to query the current value as well as clear the setting and restore the default state.

I have also added (MR 2646 and 2655) a mechanism to set this using the PulseAudio API (via the messaging system). Assuming you are using pipewire-pulse, PipeWire’s PulseAudio emulation daemon, you can use pa_context_send_message_to_object() or the command line:

pactl send-message /core pipewire-pulse:force-mono-output true

This API allows for a few things:

Query existence of the feature: when an empty message body is sent, if a null value is returned, the feature is not supported
Query current value: when an empty message body is sent, the current value (true or false) is returned if the feature is supported
Setting a value: the requested setting (true or false) can be sent as the message body
Clearing the current value: sending a message body of null clears the current setting and restores the default
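
As a rough illustration of the four cases above, here is a minimal Python sketch that builds the corresponding pactl command lines. The helper names are hypothetical, and it assumes that leaving out the message parameter is how an empty message body is sent; only the recipient, the message name and the true/false/null bodies come from the description above.

```python
import subprocess

# Recipient and message name as given in the pactl example above.
RECIPIENT = "/core"
MESSAGE = "pipewire-pulse:force-mono-output"

# Bodies described in the list above; None stands for "empty body" (a query).
VALID_BODIES = (None, "true", "false", "null")


def force_mono_command(body=None):
    """Build the pactl argument list for one of the four operations.

    body=None     -> query (assumption: omitting the parameter sends an empty body)
    body="true"   -> force mono output
    body="false"  -> disable forced mono output
    body="null"   -> clear the setting and restore the default
    """
    if body not in VALID_BODIES:
        raise ValueError(f"unsupported message body: {body!r}")
    command = ["pactl", "send-message", RECIPIENT, MESSAGE]
    if body is not None:
        command.append(body)
    return command


def send_force_mono(body=None):
    """Run pactl and return its reply as text (requires pactl and pipewire-pulse)."""
    result = subprocess.run(
        force_mono_command(body), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

A settings UI could call send_force_mono() once at startup: per the list above, a null reply means the feature is unsupported and the toggle can be hidden.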

Looking ahead

This feature will become available in the next release of PipeWire (both 1.4.10 and 1.6.0).

I will be adding a toggle in Pavucontrol to expose this, and I hope that GNOME, KDE and other desktop environments will be able to pick this up before long.

Hit me up if you have any questions!

Orc 0.4.42 release

Last modified: 4 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This release contains both bug fixes and new features.

Highlights:

Initial 64-bit RISC-V support
Add 64-bit LoongArch support
Implement release and reuse of temporary registers for some targets
x86: Implement EVEX encoding and an opcode validation system
x86: Opcode refactor, improved constant handling and various other fixes
x86: add missing rounding operands for AVX and SSE
x86: Implement 64-bit single move constant load
includes: stop exporting the private compiler and OrcTarget definitions
Use hotdoc instead of gtk-doc to generate the documentation
ORC_DEBUG_FATAL environment variable allows abort on log messages of a certain level
Error message improvements and NEON backend clean-ups
Fix a few valgrind issues
Build: enable tools such as orcc and orc-bugreport by default
Various build fixes

Direct tarball download: orc-0.4.42.tar.xz.

GStreamer 1.27.90 (1.28.0 rc1) pre-release

Last modified: 4 months ago

From GStreamer News by GStreamer

The GStreamer team is excited to announce the first release candidate for the upcoming stable 1.28.0 feature release.

This 1.27.90 pre-release is for testing and development purposes in the lead-up to the stable 1.28 series which is now frozen for commits and scheduled for release very soon.

Depending on how things go there might be more release candidates in the next couple of days, but in any case we're aiming to get 1.28.0 out as soon as possible.

Highlighted changes:

Add a burn-based YOLOX inference element and a YOLOX tensor decoder in Rust
Add an audio source separation element based on demucs in Rust
Add new GIF decoder element in Rust with looping support
Add a Rust-based icecastsink element with AAC support
analytics: Improvement to inference elements; move modelinfo to analytics lib; add script to help with modelinfo generation and upgrade
decklinkvideosink: Fix frame duration to be based on the decklink clock
flv: Fix track ID 0 semantics and extended FLV for non multitrack type packets
GstPlay: Add support for gapless looping
input-selector: implements a two-phase sinkpad switch now to avoid races when switching input pads
intersrc: new event-types property to forward upstream events to sink
isomp4mux: Support caps change and add support for raw audio as per ISO/IEC 23003-5
jpegparse: fix handling of JPEGs with HDR gain maps
jsontovtt: add property to enable per-cue line attributes
textaccumulate: implement no-timeout mode for forwarding full sentences
matroskademux: make maximum allowed block size large enough to support 4k uncompressed video
qtdemux: fix various MP4 demuxing issues and regressions
GstValue: The recently-introduced GstSet API was renamed to GstUniqueList
cerbero: add support for Python wheel packaging, fix Windows build with Python 3.14, support system recipes, ship Gtk4 and more plugins
Countless bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

Binaries for Android, iOS, macOS and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

Using GstAnalytics from Rust with burn and YOLOX object detection

Last modified: 4 months ago

From Centricular Devlog by Sebastian Dröge (Centricular)

Currently most code using the GStreamer Analytics library is written in C or Python. To check how well the API works from Rust, and to have an excuse to play with the Rust burn deep-learning framework, I've implemented an object detection inference element based on the YOLOX model and a corresponding tensor decoder that allows usage with other elements based on the GstAnalytics API. I started this work at the last GStreamer hackfest, but this has now finally been merged and will be part of the GStreamer 1.28.0 release.

burn is a deep-learning framework in Rust that is approximately on the same level of abstraction as PyTorch. It features lots of computation backends (CPU-based, Vulkan, CUDA, ROCm, Metal, libtorch, ...), has loaders (or better: code generation) for e.g. ONNX or PyTorch models, and compiles and optimizes the model for a specific backend. It also comes with a repository containing various example models and links to other community models.

The first element is burn-yoloxinference. It takes raw RGB video frames and passes them through burn; at the time of writing, either through a CPU-based or a Vulkan-based computation backend. The output is the very same video frames with the raw object detection results attached as a GstTensorMeta. This is essentially an 85x8400 float matrix, which contains 8400 rows of candidate object detection boxes (4 floats) together with confidence values for the classes (80 floats for the pre-trained models on the COCO classes) and one confidence value for the overall box. The element itself is mostly boilerplate, caps negotiation code and glue code between GStreamer and burn.

The second element is yoloxtensordec. This takes the output of the first element and decodes the GstTensorMeta into a GstAnalyticsRelationMeta, which describes the detected objects with their bounding boxes in an abstract way. As part of this it also implements a non-maximum suppression (NMS) filter using the intersection over union (IoU) of bounding boxes to reduce the 8400 candidate boxes to a much lower number of actual likely object detections. The GstAnalyticsRelationMeta can then be used e.g. by the generic objectdetectionoverlay to render rectangles on top of the video, or the ioutracker elements to track objects over a sequence of frames. Again, this element is mostly boilerplate and caps negotiation code, plus around 100 SLOC of algorithm. In comparison the C YOLOv9 tensor decoder element is about 3x as much code, mostly thanks to the overhead of C memory book-keeping, lack of useful data structures and lack of language tools for abstraction.
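
To make the NMS step more concrete, here is a minimal Python sketch of a greedy, per-class non-maximum suppression filter based on IoU. It is not the code of yoloxtensordec: the Detection type, the function names and the threshold values are all assumptions made for illustration, and whether the real element suppresses per class or across classes is not stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical decoded candidate: corner coordinates, confidence, class index."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: int


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(candidates, score_threshold=0.3, iou_threshold=0.45):
    """Greedy NMS: keep the best-scoring box, drop same-class boxes that overlap it.

    The thresholds are illustrative values, not the element's defaults.
    """
    ordered = sorted(
        (c for c in candidates if c.score >= score_threshold),
        key=lambda c: c.score,
        reverse=True,
    )
    kept = []
    for cand in ordered:
        overlaps_kept = any(
            cand.label == k.label and iou(cand, k) > iou_threshold for k in kept
        )
        if not overlaps_kept:
            kept.append(cand)
    return kept
```

Sorting by confidence first means each surviving box is the strongest candidate in its neighbourhood, which is how thousands of raw candidates collapse to a handful of detections.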

The reason why the tensor decoder is a separate element is mostly to have one such element per model and to have it implemented independently of the actual implementation and runtime of the model. The same tensor decoder should, for example, also work fine on the output of the onnxinference element with a YOLOX model. From GStreamer 1.28 onwards it will also be possible to autoplug suitable tensor decoders via the tensordecodebin element.

That the tensor decoders are independent of the actual implementation of the model also has the advantage that they can be implemented in a different language, preferably one that is safer and less verbose than C.

For using both elements together and using objectdetectionoverlay to render rectangles around the object detections, the following pipeline can be used:

gst-launch-1.0 souphttpsrc location=https://raw.githubusercontent.com/tracel-ai/models/f4444a90955c1c6fda90597aac95039a393beb5a/squeezenet-burn/samples/cat.jpg \
    ! jpegdec ! videoconvertscale ! "video/x-raw,width=640,height=640" \
    ! burn-yoloxinference model-type=large backend-type=vulkan ! yoloxtensordec label-file=COCO_classes.txt \
    ! videoconvertscale ! objectdetectionoverlay \
    ! videoconvertscale ! imagefreeze ! autovideosink -v

The output should look similar to this.

I also did a lightning talk about this at the GStreamer conference this year.

Single media file support with hlssink3 and hlscmafsink

Last modified: 4 months ago

From Centricular Devlog by Sanchayan Maity (Centricular)

When using HTTP Live Streaming (HLS), a common use case is to use MPEG-TS segments or fragmented MP4 fragments. This is done so that the overall stream is available as a sequence of small HTTP-based file downloads, each being one short chunk of an overall bounded or unbounded media stream.

The playlist file (.m3u8) contains a list of these small segments or fragments. This is the standard and most common approach for HLS. For the HLS CMAF case, a multi-segment playlist would look like below.

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-TARGETDURATION:5
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="init00000.mp4"
#EXTINF:5,
segment00000.m4s
#EXTINF:5,
segment00001.m4s
#EXTINF:5,
segment00002.m4s

An alternative approach is to use a single media file with the EXT-X-BYTERANGE tag. This method is primarily used for on-demand (VOD) streaming where the complete media file already exists, and it can reduce the number of files that need to be managed on the server. Using a single file with byte ranges requires the server and client to support HTTP byte range requests and 206 Partial Content responses.

Until now, the single media file use case wasn't supported by either hlssink3 or hlscmafsink. A new property, single-media-file, has been added, which lets users specify the use of a single media file.

hlscmafsink.set_property("single-media-file", "main.mp4");
hlssink3.set_property("single-media-file", "main.ts");

For the HLS CMAF case, this would generate a playlist like below.

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-TARGETDURATION:5
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="main.mp4",BYTERANGE="768@0"
#EXT-X-BYTERANGE:100292@768
#EXTINF:5,
main.mp4
#EXT-X-BYTERANGE:98990@101060
#EXTINF:5,
main.mp4
#EXT-X-BYTERANGE:99329@200050
#EXTINF:5,
main.mp4

This can be useful if one has storage requirements where the use of a single media file for HLS might be favourable.
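
The byte-range arithmetic in the playlist above can be sketched in a few lines of Python. This is only an illustration of how the `length@offset` values follow from the sizes of the init section and the fragments, not how hlscmafsink generates its playlist; the function name and parameters are hypothetical.

```python
def byterange_playlist(media_uri, init_size, fragments, target_duration):
    """Build a VOD playlist that addresses one media file via byte ranges.

    fragments is a list of (size_in_bytes, duration_in_seconds) tuples, in
    the order the fragments are stored after the init section.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:6",
        "#EXT-X-INDEPENDENT-SEGMENTS",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        # The init section occupies the first init_size bytes of the file.
        f'#EXT-X-MAP:URI="{media_uri}",BYTERANGE="{init_size}@0"',
    ]
    offset = init_size
    for size, duration in fragments:
        # Each range is "length@offset"; the next fragment starts where this one ends.
        lines.append(f"#EXT-X-BYTERANGE:{size}@{offset}")
        lines.append(f"#EXTINF:{duration},")
        lines.append(media_uri)
        offset += size
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sizes taken from the example playlist above.
    print(byterange_playlist("main.mp4", 768, [(100292, 5), (98990, 5), (99329, 5)], 5))
```

Each offset is simply the running total of everything written before it, which is why the second fragment in the example starts at 768 + 100292 = 101060.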

Audio source separation in GStreamer with demucs

Last modified: 4 months ago

From Centricular Devlog by Sebastian Dröge (Centricular)

Audio source separation describes the process of splitting an already mixed audio stream into its individual, logical sources. For example, splitting a song into separate streams for its individual instruments and vocals. This can be used for example for karaoke, music practice, or isolating the speaker from background noise for easier understanding by humans or improving results of speech-to-text processing.

Starting with GStreamer 1.28.0 an element for this purpose will be included. It is based on the Python/PyTorch implementation of demucs and comes with various pre-trained models that differ in performance, accuracy, and the set of sources they can separate. CPU-based processing is generally multiple times real-time on modern CPUs (around 8x on mine) but GPU-based processing via PyTorch is also possible.

The element itself is part of the GStreamer Rust plugins and can either run demucs locally in-process using an embedded Python interpreter via pyo3, or via a small Python service over WebSockets that can run either locally or remotely (e.g. for thin clients). The model used, the chunk size, and the overlap between chunks can all be configured. Chunk size and overlap provide control over the introduced latency (lower values give lower latency) and quality (higher values give better quality).

The separate sources are provided on individual source pads of the element and it effectively behaves like a demuxer. A pipeline for karaoke would for example look as follows:

gst-launch-1.0 uridecodebin uri=file:///path/to/music/file ! audioconvert ! tee name=t ! \
  queue max-size-time=0 max-size-bytes=0 max-size-buffers=2 ! demucs name=demucs model-name=htdemucs \
  demucs.src_vocals ! queue ! audioamplify amplification=-1 ! mixer.sink_0 \
  t. ! queue max-size-time=9000000000 max-size-bytes=0 max-size-buffers=0 ! mixer.sink_1 \
  audiomixer name=mixer ! audioconvert ! autoaudiosink

This takes a URI to a music file, passes that through the demucs element for extracting the vocals, then takes the original input via a tee and subtracts the vocals from it by first inverting all samples of the vocals stream with the audioamplify element and then mixing it with the original input with an audiomixer.
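
The subtraction trick is plain arithmetic on samples. The following Python sketch shows the idea on a few made-up sample values (the function name `remove_vocals` is hypothetical; in the pipeline this work is done by audioamplify and audiomixer):

```python
def remove_vocals(original, vocals):
    """Invert the vocals (amplification=-1) and mix them with the original."""
    inverted = [-1.0 * sample for sample in vocals]
    return [o + v for o, v in zip(original, inverted)]


# Made-up samples: the original mix is instruments plus vocals.
instruments = [0.25, -0.5, 0.125]
vocals = [0.5, 0.25, -0.25]
original = [i + v for i, v in zip(instruments, vocals)]

print(remove_vocals(original, vocals))  # [0.25, -0.5, 0.125]
```

This only cancels cleanly if both streams are aligned sample by sample when they reach the mixer.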

I also did a lightning talk about this at the GStreamer conference this year.

GStreamer 1.26.10 stable bug fix release

Last modified: 4 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

curlhttpsrc fixes and improvements
decklinkvideosink: Fix frame completion callbacks for firmware 14.3+
flac: Fix 6.1 and 7.1 channel layouts and support encoding and decoding of 32-bit audio
glimagesink: Fix handling of odd height buffers
matroskademux: make maximum allowed block size large enough to support 4k uncompressed video
mxf: Add support for custom Sony XDCAM video variant
opusenc: multichannel and surround sound handling improvements
playbin3: HLS/DASH stream selection handling improvements to fix disabling and re-enabling of audio/video streams with adaptivedemux2
qtmux: robust recording mode space left estimation fixes for streams that start with a timestamp offset
splitmuxsrc seeking improvements
Support FLAC audio in DASH manifests
Python bindings: fix regression where buffers were no longer writable in pad probe callbacks
cerbero: add python bindings for GstApp; Windows installer improvements
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.10 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly and will be published on the Downloads page.

GStreamer 1.27.50 unstable development release (feature freeze)

Last modified: 5 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another development release in the API/ABI-unstable 1.27 release series.

The API/ABI-unstable 1.27 release series is for testing and development purposes in the lead-up to the stable 1.28 series which is scheduled for release in late 2025. Any newly-added API can still change until that point.

The 1.27.50 release marks the beginning of our feature freeze in preparation of the upcoming 1.28 stable release.

This development release is primarily for developers and early adopters, and distros should probably not package it.

Highlighted changes:

Vulkan Video AV1 decoder
webrtcsink: add renegotiation support, and support for va hardware encoders
New ST-2038 ancillary data combiner and extractor elements
applemedia: VP9 and AV1 hardware-accelerated video decoding support, and 10-bit HEVC encoding
fallbacksrc gained support for encoded streams
flv: enhanced rtmp H.265 video support, and support for multitrack audio
glupload: Implement udmabuf uploader to share buffers between software decoders/sources and GPUs, display engines (wayland), and other dma devices
video: Add crop, scale, rotate, flip, shear and more GstMeta transformation
New task pool GstContext to share a thread pool amongst elements for better resource management and performance, especially for video conversion and compositing
analytics: New tensordecodebin element to auto-plug compatible tensor decoders based on their caps and many other additions and improvements
New Deepgram speech-to-text transcription plugin
Speech synthesizers: expose new "compress" overflow mode that can speed up audio while preserving pitch
Support new Speechmatics speaker identification API
ElevenLabs voice cloning element
New Qt6 QML qml6 render source element
appsink, appsrc: new bindings-friendly "simple" callbacks API
New element to calculate perceptual video quality assessment scores using Netflix's VMAF framework
Add new metadata GstStream type and use in decodebin3 for KLV, ID3 PES and ST-2038 ancillary data
New MPEG-H audio decoding plugin plus MP4 demuxing support
The inter plugin wormhole sink and source elements gained new properties to fine tune the inner elements
hlscmafsink can generate I-frame only playlist now
New LCEVC H.266 encoder element
webrtc: add WHEP server signaller
Added "robust MPEG audio", raw audio (L8, L16, L24), and ancillary metadata RTP payloaders in Rust
The Windows IPC plugin gained support for passing generic data in addition to raw audio/video, and various properties
New D3D12 interlace and overlay compositor elements
GStreamer AMD HIP integration functionality is now available in a helper library
Blackmagic Decklink elements gained support for capturing and outputting all types of VANC via GstAncillaryMeta
Replaygain R128 gain tags support
aws: URI handler for S3 URIs; dropped registration of rusotos3src and rusotos3sink
quinn: Support sharing of QUIC/WebTransport connection/session
validate: New plugin with a check-last-frame-qrcode action
clocksync: new "rate" property and "resync" action signal
debug logging: Add convenience macros around GstLogContext for logging things only once
Countless bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

Meow: Process log text files as if you could make cat speak

Last modified: 5 months ago

From GStreamer – Happy coding by eocanha (Enrique Ocaña González)

Some years ago I mentioned some command line tools I used to analyze and find useful information in GStreamer logs. I’ve been using them consistently throughout all these years, but some weeks ago I thought about unifying them in a single tool that could provide more flexibility in the mid term, and also as an excuse to unrust my Rust knowledge a bit. That’s how I wrote Meow, a tool to make cat speak (that is, to provide meaningful information).

The idea is that you can cat a file through meow and apply the filters, like this:

cat /tmp/log.txt | meow appsinknewsample n:V0 n:video ht: \
ft:-0:00:21.466607596 's:#([A-za-z][A-Za-z]*/)*#'

which means “select those lines that contain appsinknewsample (with case insensitive matching), but don’t contain V0 nor video (that is, by exclusion, only those that contain audio, probably because we’ve analyzed both and realized that we should focus on audio for our specific problem), highlight the different thread ids, only show those lines with timestamp lower than 21.46 sec, and change strings like Source/WebCore/platform/graphics/gstreamer/mse/AppendPipeline.cpp to become just AppendPipeline.cpp”, to get an output as shown in this terminal screenshot:

Screenshot of a terminal output showing multiple log lines. Some of them have the word "appsinkNewSample" highlighted in red. Some lines have the hexadecimal id of the thread that printed them highlighted (purple for one thread, brown for the other)

Cool, isn’t it? After all, I’m convinced that the answer to any GStreamer bug is always hidden in the logs (or will be, as soon as I add “just a couple of log lines more, bro” ).

Currently, meow supports this set of manipulation commands:

Word filter and highlighting by regular expression (fc:REGEX, or just REGEX): Every expression will highlight its matched words in a different color.
Filtering without highlighting (fn:REGEX): Same as fc:, but without highlighting the matched string. This is useful for those times when you want to match lines that have two expressions (E1, E2) but the highlighting would pollute the line too much. In those cases you can use a regex such as E1.*E2 and then highlight the subexpressions manually later with an h: rule.
Negative filter (n:REGEX): Selects only the lines that don’t match the regex filter. No highlighting.
Highlight with no filter (h:REGEX): Doesn’t discard any line, just highlights the specified regex.
Substitution (s:/REGEX/REPLACE): Replaces one pattern with another. Any other delimiter character can be used instead of /, if that’s more convenient to the user (for instance, using # when dealing with expressions to manipulate paths).
Time filter (ft:TIME-TIME): Assuming the lines start with a GStreamer log timestamp, this filter selects only the lines between the target start and end time. Any of the time arguments (or both) can be omitted, but the - delimiter must be present. Specifying multiple time filters will generate matches that fit on any of the time ranges, but overlapping ranges can trigger undefined behaviour.
Highlight threads (ht:): Assuming a GStreamer log, where the thread id appears as the third word in the line, highlights each thread in a different color.

The REGEX pattern is a regular expression. All the matches are case insensitive. When used for substitutions, capture groups can be defined as (?<CAPTURE_NAME>REGEX).

The REPLACEment string is the text that the REGEX will be replaced by when doing substitutions. Text captured by a named capture group can be referred to by ${CAPTURE_NAME}.

The TIME pattern can be any sequence of numbers, : or . . Typically, it will be a GStreamer timestamp (eg: 0:01:10.881123150), but it can actually be any other numerical sequence. Times are compared lexicographically, so it’s important that all of them have the same string length.
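
The same-length requirement is easy to see with plain string comparison. This is a small Python sketch of a lexicographic time filter, not meow's actual code; the helper name `in_range` is made up, and it treats both ends of the range as inclusive, matching the output of Example 3 below.

```python
def in_range(timestamp, start=None, end=None):
    """Lexicographic check that start <= timestamp <= end (either bound may be omitted)."""
    if start is not None and timestamp < start:
        return False
    if end is not None and timestamp > end:
        return False
    return True


# Same string length: string order agrees with time order.
print(in_range("0:00:24.790382735", "0:00:24.787450146", "0:00:24.790741865"))  # True

# Different lengths: 9h59m sorts *after* 10h00m, because "9" > "1" as characters.
print("9:59:59.000000000" < "10:00:00.000000000")  # False
```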

The filtering algorithm has a custom set of priorities for operations, so that they get executed in an intuitive order. For instance, a sequence of filter matching expressions (fc:, fn:) will have the same priority (that is, any of them will let a text line pass if it matches, not forbidding any of the lines already allowed by sibling expressions), while a negative filter will only be applied on the results left by the sequence of filters before it. Substitutions will be applied at their specific position (not before or after), and will therefore modify the line in a way that can alter the matching of subsequent filters. In general, the user doesn’t have to worry about any of this, because the rules are designed to generate the result that you would expect.
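
As a rough mental model only, the interplay of positive and negative filters can be sketched in Python like this. It is a simplification under my own assumptions, not meow's algorithm: the function `select` is hypothetical, it ignores rule ordering, highlighting and substitutions, and it simply applies all negative filters after all positive ones.

```python
import re


def select(lines, positives, negatives):
    """Keep lines that match any positive regex and none of the negative ones."""
    pos = [re.compile(p, re.IGNORECASE) for p in positives]
    neg = [re.compile(n, re.IGNORECASE) for n in negatives]
    kept = []
    for line in lines:
        # Sibling positive filters are alternatives: one match is enough.
        if pos and not any(p.search(line) for p in pos):
            continue
        # Negative filters then prune what the positive filters let through.
        if any(n.search(line) for n in neg):
            continue
        kept.append(line)
    return kept


log = ["000 one small orange", "001 two apples", "002 ONE video frame"]
print(select(log, positives=["one", "orange"], negatives=["video"]))
```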

Now some practical examples:

Example 1: Select lines with the word “one”, or the word “orange”, or a number, highlighting each pattern in a different color except the number, which will have no color:

$ cat file.txt | meow one fc:orange 'fn:[0-9][0-9]*'
000 one small orange
005 one big orange

Example 2: Assuming a pictures filename listing, select filenames not ending in “jpg” nor in “jpeg”, and rename the filename to “.bak”, preserving the extension at the end:

$ cat list.txt | meow 'n:jpe?g' \
's:#^(?<f>[^.]*)(?<e>[.].*)$#${f}.bak${e}'
train.bak.png
sunset.bak.gif
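
For comparison, the filter and substitution of Example 2 can be reproduced with Python's re module. Note that the syntax differs slightly there: named groups are written (?P<NAME>...) and referenced as \g<NAME>. The function name and the input file names are made up for illustration.

```python
import re


def rename_non_jpeg(filenames):
    """Drop names matching jpe?g, then insert '.bak' before the extension."""
    exclude = re.compile(r"jpe?g", re.IGNORECASE)
    pattern = re.compile(r"^(?P<f>[^.]*)(?P<e>[.].*)$", re.IGNORECASE)
    renamed = []
    for name in filenames:
        if exclude.search(name):
            continue  # equivalent of the n:jpe?g negative filter
        renamed.append(pattern.sub(r"\g<f>.bak\g<e>", name))
    return renamed


print(rename_non_jpeg(["train.png", "beach.JPG", "sunset.gif", "city.jpeg"]))
# ['train.bak.png', 'sunset.bak.gif']
```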

Example 3: Only print the log lines with times between 0:00:24.787450146 and 0:00:24.790741865 or those at 0:00:30.492576587 or after, and highlight every thread in a different color:

$ cat log.txt | meow ft:0:00:24.787450146-0:00:24.790741865 \
ft:0:00:30.492576587- ht:
0:00:24.787450146 739 0x1ee2320 DEBUG …
0:00:24.790382735 739 0x1f01598 INFO …
0:00:24.790741865 739 0x1ee2320 DEBUG …
0:00:30.492576587 739 0x1f01598 DEBUG …
0:00:31.938743646 739 0x1f01598 ERROR …

This is only the beginning. I have great ideas for this new tool (as time allows), such as support for parentheses (so the expressions can be grouped), or call stack indentation on logs generated by tracers, in a similar way to what Alicia’s gst-log-indent-tracers tool does. I might also predefine some common expressions to use in regular expressions, such as the ones to match paths (so that the user doesn’t have to think about them and reinvent the wheel every time). Anyway, these are only ideas. Only time and hyperfocus slots will tell…

For now, you can find the source code on my github. Meow!

GStreamer 1.26.9 stable bug fix release

Last modified: 5 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

playback: playbin3 and decodebin3 stability fixes
Ancillary metadata handling fixes for AJA playout and Blackmagic Decklink capture cards
HLS and DASH adaptive streaming clients stability improvements
gst-play-1.0 will now print details of any missing plugins again
gtk4paintablesink: Add property to fine-tune reconfiguration behaviour on window-resize
macOS device monitoring: fix potential crash when probing for audio devices
macOS video decoder stability improvements
NDI source: fix audio corruption for non-interleaved audio with stride padding
Add SMPTE ST291-1 ancillary metadata RTP payloader and depayloader
Add ST-2038 metadata combiner and extractor
webrtcsink: support hardware-accelerated encoders from the `va` VA-API plugin
spotifysrc: fix the Spotify integration by using Spotify's extended metadata endpoint
Python bindings cross compilation fixes
cerbero: add Visual Studio 2026 support, fix building on drives other than C:\, and ship svtjpegxs plugin on Windows
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.9 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly and will be published on the Downloads page.

New GStreamer ElevenLabs speech synthesis plugin

Last modified: 5 months ago

From Centricular Devlog by Mathieu Duponchelle (Centricular)

Back in June '25, I implemented a new speech synthesis element using the ElevenLabs API.

In this post I will briefly explain some of the design choices I made, and provide one or two usage examples.

POST vs. WSS

ElevenLabs offers two interfaces for speech synthesis:

Either open a websocket and feed the service small chunks of text (eg words) to receive a continuous audio stream
Or POST longer segments of text to receive independent audio fragments

The websocket API is well-adapted to conversational use cases, and can offer the lowest latency, but isn't the most well-suited to the use cases I was targeting: my goal was to use it to synthesize audio from text that was first transcribed, then translated from an original input audio stream.

In this situation we have two constraints we need to be mindful of:

For translation purposes we need to construct large enough text segments prior to translating, in order for the translation service to operate with enough context to do a good job.
Once audio has been synthesized, we might also need to resample it in order to have it fit within the original duration of the speech.

Given that:

The latency benefits from using the websocket API are largely negated by the larger text segments we would use as the input
Resampling the continuous stream we would receive to make sure individual words are time-shifted back to the "correct" position, while possible thanks to the sync_alignment option, would have increased the complexity of the resulting element

I chose to use the POST API for this element. We might still choose to implement a websocket-based version if there is a good story for using GStreamer in a conversational pipeline, but that is not on my radar for now.

Additionally, we already have a speech synthesis element around the AWS Polly API which is also POST-based, so both elements can share a similar design.

Audio resampling

As mentioned previously, the ElevenLabs API does not offer direct control over the duration of the output audio.

For instance, you might be dubbing speech from a fast speaker with a slow voice, potentially causing the output audio to drift out of sync.

To address this, the element can optionally make use of signalsmith_stretch to resample the audio in a pitch-preserving manner.

When the feature is enabled it can be used through the overflow=compress property.

The effect can sometimes be pretty jarring for very short input, so an extra property is also exposed to allow some tolerance for drift: max-overflow. It represents the maximum duration by which the audio output is allowed to drift out of sync, and does a good job using up intervals of silence between utterances.
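
One way to picture the interplay between compression and max-overflow is the following Python sketch. It is purely illustrative and not taken from the element's code: the function name, the parameters and the exact formula are assumptions about how such a budget could be computed.

```python
def required_speedup(slot_duration, synthesized_duration, max_overflow=0.0):
    """Speed-up factor needed so the audio fits its slot plus the allowed overflow.

    All durations are in seconds. A result of 1.0 means no compression is needed.
    """
    budget = slot_duration + max_overflow
    if synthesized_duration <= budget:
        return 1.0
    return synthesized_duration / budget


# 3 s of synthesized speech for a 2 s slot, no drift tolerated: play 1.5x faster.
print(required_speedup(2.0, 3.0))                    # 1.5
# Allowing up to 1 s of drift removes the need for compression in this case.
print(required_speedup(2.0, 3.0, max_overflow=1.0))  # 1.0
```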

Voice cloning

The ElevenLabs API exposes a pretty powerful feature, Instant Voice Cloning. It can be used to create a custom voice that will sound very much like a reference voice, requiring only a handful of seconds to a few minutes of reference audio data to produce useful results.

Using the multilingual model, that newly-cloned voice can even be used to generate convincing speech in a different language.

A typical pipeline for my target use case can be represented as (pseudo gst-launch):

input_audio_src ! transcriber ! translator ! synthesizer

When using a transcriber element such as speechmaticstranscriber, speaker "diarization" (fancy word for detection) can be used to determine when a given speaker was speaking, thus making it possible to clone voices even in a multi-speaker situation.

The challenge in this situation however is that the synthesizer element doesn't have access to the original audio samples, as it only deals with text as the input.

I thus decided on the following solution:

input_audio_src ! voicecloner ! transcriber ! .. ! synthesizer

The voice cloner element will accumulate audio samples, then upon receiving custom upstream events from the transcriber element with information about speaker timings it will start cloning voices and trim its internal sample queue.

To be compatible, a transcriber simply needs to send the appropriate events upstream. The speechmaticstranscriber element can be used as a reference.

Finally, once a voice clone is ready, the cloner element sends another event downstream with a mapping of speaker id to voice id. The synthesizer element can then intercept the event and start using the newly-created voice clone.

The cloner element can also be used in single-speaker mode by just setting the speaker property to some identifier and watching for messages on the bus:

gst-launch-1.0 -m -e alsasrc ! audioconvert ! audioresample ! queue ! elevenlabsvoicecloner api-key=$SPEECHMATICS_API_KEY speaker="Mathieu" ! fakesink

Putting it all together

At this year's GStreamer conference I gave a talk where I demo'd these new elements.

This is the pipeline I used then:

AWS_ACCESS_KEY_ID="XXX" AWS_SECRET_ACCESS_KEY="XXX" gst-launch-1.0 uridecodebin uri=file:///home/meh/Videos/spanish-convo-trimmed.webm name=ud \
  ud. ! queue max-size-time=15000000000 max-size-bytes=0 max-size-buffers=0 ! clocksync ! autovideosink \
  ud. ! audioconvert ! audioresample ! clocksync ! elevenlabsvoicecloner api-key=XXX ! \
    speechmaticstranscriber url=wss://eu2.rt.speechmatics.com/v2 enable-late-punctuation-hack=false join-punctuation=false api-key="XXX" max-delay=2500 latency=4000 language-code=es diarization=speaker ! \
    queue max-size-time=15000000000 max-size-bytes=0 max-size-buffers=0 ! textaccumulate latency=3000 drain-on-final-transcripts=false extend-duration=true ! \
    awstranslate latency=1000 input-language-code="es-ES" output-language-code="en-EN" ! \
    elevenlabssynthesizer api-key=XXX retry-with-speed=false overflow=compress latency=3000 language-code="en" voice-id="iCKVfVbyCo5AAswzTkkX" model-id="eleven_multilingual_v2" max-overflow=0 ! \
    queue max-size-time=15000000000 max-size-bytes=0 max-size-buffers=0 ! audiomixer name=m ! autoaudiosink audiotestsrc volume=0.03 wave=violet-noise ! clocksync ! m.

Watch my talk for the result, or try it yourself (you will need API keys for speechmatics / AWS / elevenlabs)!

Rusty Pipes and Oxidized Wires

Last modified: 5 months ago

From Arun Raghavan by Arun (Arun Raghavan)

In case you missed it, the GStreamer Conference 2025 videos are up!

This includes my talk on the new PipeWire native Rust bindings. You’ll want to skip the first 1:20 to get to the start.

I talk a little bit about the motivation and structure of the project, and discuss my experience writing this low-level library in Rust.

There are a lot of great talks, so it’s worth catching up if you weren’t there (or, if like me, you were there and had to pick between the two tracks with great difficulty).

Comments and feedback are welcome! In the future, I’ll post a more long form update about the state of these bindings here as well.

Support for non-closed caption VANC with MXF in GStreamer

Last modified: 5 months ago

From Centricular Devlog by Sanchayan Maity (Centricular)

The GStreamer Material Exchange Format (MXF) muxer and demuxer elements so far only supported extracting Vertical Ancillary Data (VANC) as closed captions. Any other VANC data was silently dropped. This was primarily reflected by the sink pad template of mxfmux.

  SINK template: 'vanc_sink_%u'
    Availability: On request
    Capabilities:
      closedcaption/x-cea-708
                 format: cdp
              framerate: [ 0/1, 2147483647/1 ]

mxfmux and mxfdemux have now been extended to support arbitrary VANC data.

The SMPTE 436 (pdf) specification defines how the ancillary data is stored in MXF. SMPTE 2038 (pdf) defines the carriage of Ancillary Data Packets in an MPEG-2 Transport Stream, acting as a more structured format (ST2038) in comparison to the line-based format (ST436M). mxfdemux converts from ST436M to ST2038 while mxfmux converts from ST2038 to ST436M. So mxfdemux now outputs VANC (ST436M) essence tracks as ST2038 streams and mxfmux consumes ST2038 streams to output VANC (ST436M) essence tracks.

A breaking change was introduced to support this in the muxer, by updating the acceptable caps on the pad. The sink pad template of mxfmux has now changed to meta/x-st-2038 instead of the earlier closedcaption/x-cea-708. Applications can use cctost2038anc for converting closed captions to ST2038.

  SINK template: 'vanc_sink_%u'
    Availability: On request
    Capabilities:
      meta/x-st-2038
              alignment: frame (gchararray)

While the pad templates of mxfdemux haven't changed as shown below, the caps on the source pad are going to be meta/x-st-2038 for VANC data, so applications have to handle different caps now. Closed captions can be extracted via st2038anctocc.

  SRC template: 'track_%u'
    Availability: Sometimes
    Capabilities:
      ANY

The older behaviour is still available via an environment variable GST_VANC_AS_CEA708. In addition, mxfdemux can now read both 8-bit and 10-bit VANC data from MXF files.

The ST2038 elements, available in the Rust plugins and described in an earlier post here, have also seen some fixes for correctly handling alignment and framerate.

Blackmagic DeckLink support for handling arbitrary VANC data

Last modified: 6 months ago

From Centricular Devlog by Sebastian Dröge (Centricular)

As part of our ongoing efforts to extend GStreamer's support for ancillary data, I've recently improved the ancillary data handling in the Blackmagic DeckLink plugin. This plugin can be used to capture or output SDI/HDMI/ST2110 streams with Blackmagic DeckLink capture/output cards.

Previously only CEA 608/708 closed captions and AFD/Bar ancillary data were handled in that plugin. Now it can additionally handle any other kind of ancillary data via GstAncillaryMeta and leave interpretation or handling of the concrete payload to the application or other elements.

This new behaviour was added in this MR, which is part of git main now, and can be enabled via the output-vanc properties on the video source / sink elements.

The same was already supported before by the plugin for AJA capture/output cards.

For example, the following pipeline can be used to forward an SDI stream from a DeckLink card to an AJA card:

gst-launch-1.0 decklinkvideosrc output-vanc=true ! queue ! combiner.video \
  decklinkaudiosrc ! queue ! combiner.audio \
  ajasinkcombiner name=combiner ! ajasink handle-ancillary-meta=true

With both the AJA and DeckLink sink elements, special care is needed to not e.g. output closed captions twice. Both sinks can retrieve them from GstVideoClosedCaptionMeta and GstAncillaryMeta, and outputting from both will likely lead to problems at the consumer of the output.

GStreamer Conference 2025

Last modified: 6 months ago

From Herostratus’ legacy by Víctor Jáquez

The GStreamer Conference is an annual gathering that brings together developers, contributors, and users of the GStreamer multimedia framework. It serves as a platform for sharing knowledge, discussing the latest advancements, and fostering collaboration within the open-source multimedia community.

This year’s conference was held in London at the impressive Barbican Center, located within the Barbican Estate, a residential complex rebuilt after World War II in the brutalist architectural style.

It was a pleasure to meet in person the colleagues from different companies and backgrounds whom I usually collaborate with remotely, and to share and discuss their projects and results.

Recently, UbiCast, which generously records and streams the conference, has uploaded all talks of this year's conference to their site, exclusively for the GStreamer Conference.

In this blog post, I’ll share the talks delivered by my fellow igalians:

Animate Your Subtimelines in GES
GstVA and GStreamer-VAAPI updates
Time Remapping and GES: Implementation Details and Latest Updates
soothe: a proposal for encoder testing
GstWebRTC in WebKit, current status & plans
VVC/H.266 in GStreamer
Video Reshaping with Skia
Vulkan Video: pipeline update

Following the GStreamer Conference, we hosted our Autumn hackfest at Amazon’s offices in the City of London. This time I worked on GStreamer Vulkan.

This year, two other conferences typically held in the US, FOMS and Demuxed, also took place in London. I attended FOMS, where I discovered the vibrant MOQ project.

Finally, I’d like to thank Centricular for organizing the event, especially Tim-Philipp Müller, and even more particularly Igalia for sponsoring it and allowing me to participate in this project that’s close to my heart.

And that’s all, mates. Cheers!

GStreamer Conference 2025 Video Recordings

Last modified: 6 months ago

From GStreamer News by GStreamer

We're pleased to announce that the video recordings for this year's GStreamer conference are now available on the GStreamer Conference 2025 channel on ubicast.tv.

Many thanks to Ubicast for recording the talks again!

Support for SMPTE ST 291-1 ancillary data over RTP

Last modified: 6 months ago

From Centricular Devlog by Sebastian Dröge (Centricular)

While working on other ancillary data related features in GStreamer (more on that some other day), I noticed that we didn't have support for sending or receiving ancillary data via RTP in GStreamer despite it being a quite simple RTP mapping defined in RFC 8331 and it being used as part of ST 2110.

The new RTP rtpsmpte291pay payloader and rtpsmpte291depay depayloader can be found in this MR for gst-plugins-rs, which should be merged in the next few days.

The new elements pass the SMPTE ST 291-1 ancillary data as ST 2038 streams through the pipeline. ST 2038 streams can be directly extracted from or stored in MXF or MPEG-TS containers, can be extracted or inserted into SDI streams with the AJA or Blackmagic Decklink sources/sinks, or can be handled generically by the ST 2038 elements from the rsclosedcaption plugin.

For example the following pipeline can be used to convert an SRT subtitle file to CEA-708 closed captions, which are then converted to an ST 2038 stream and sent over RTP:

$ gst-launch-1.0 filesrc location=file.srt ! subparse ! \
    tttocea708 ! closedcaption/x-cea-708,framerate=30/1 ! ccconverter ! \
    cctost2038anc ! rtpsmpte291pay ! \
    udpsink host=123.123.123.123 port=45678

Now you might be wondering how ST 291-1 and ST 2038 are related to each other and what ST 2038 has to do with RTP.

ST 291-1 is the basic standard that defines the packet format for ancillary packets as e.g. transmitted over SDI. ST 2038 on the other hand defines a mechanism for packaging ST 291-1 into MPEG-TS, and in addition to the plain ST 291-1 packets provides some additional information like the line number on which the ST 291-1 packet is to be stored. RFC 8331 defines a similar mapping just for RTP, and apart from one field it provides exactly the same information and conversion between the two formats is relatively simple.

Using ST 2038 as generic ancillary data stream format in GStreamer seemed like the pragmatic choice here. GStreamer already had support for handling ST 2038 streams in various elements, a set of helper elements to handle ST 2038 streams, and e.g. GStreamer's MXF ANC support (ST 436) also uses ST 2038 as stream format.

GStreamer 1.26.8 stable bug fix release

Last modified: 6 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

Fix showtime video player showing washed-out colours for HDR videos when subtitles are active
core: performance improvements for elements with many source pads
aacparse: support streams which do not have frequent LOAS config
av1parse: Fix duplicated frames issue in frame splitting
fmp4mux: Fix EAC3 datarate calculation and substream writing
gtk4paintablesink: fixes glitches with padded buffers such as for sub-sampled video formats with odd sizes
mpegtsmux: PUSI flag and ID3 tag handling fixes
rtpbaseaudiopay2: Fix marker bit handling for DISCONT and RESYNC buffer flags
rtpvp9pay: Fix parsing of show-existing-frame flag, fixes compatibility with vavp9lpenc
splitmuxsink: accept pads named 'sink_%u' on the muxer for fmp4 muxer support
webrtcsink: Correct lock ordering to prevent deadlock
gst-plugins-rs meson build gained an `auto_plugin_features` option and no longer requires all gstreamer libraries to be available
v4l2 device monitor fixes
x265enc: advertise latency based on encoder parameters instead of hard-coding it to 5 frames
cerbero package builder: Add Rust support for 32-bit Linux x86
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.8 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly and will be published on the Downloads page.

Multitrack Audio Capability in GStreamer FLV Plugin

Last modified: 6 months ago

From Centricular Devlog by Taruntej Kanakamalla (Centricular)

For one of our recent projects, we worked on adding multitrack audio capabilities to the GStreamer FLV plugin following the Enhanced RTMP (v2) specification. All changes are now merged upstream (see MR 9682).

Enhanced RTMP

As the name suggests, this is an enhancement to the RTMP (and FLV) specifications. The latest version was released earlier this year and is aimed at meeting the technical standards of current and future online media broadcasting requirements, which include:

Contemporary audio/video codecs (HEVC, AV1, Opus, FLAC, etc.)
Multitrack capabilities (for concurrent management and processing)
Connection stability and resilience
and more

FLV and RTMP in GStreamer

The existing FLV and RTMP2 plugins followed the previous versions of the RTMP/FLV specifications, so they could handle at most one video and one audio track at a time. This is where most of the work was needed, to add the ability to handle multiple tracks.

Multitrack Audio

We considered a couple of options for adding multitrack audio and enhanced FLV capabilities:

Write completely new element(s), preferably in Rust (or)
Extend the current FLV muxer and demuxer elements

Writing a fresh set of elements from scratch, perhaps even in Rust, would potentially have made it easier to accommodate newer versions of the specification. But the second option, extending the existing FLV muxer/demuxer elements, turned out to be simpler.

Problems to Solve

So, at a high level, we had two problems to solve:

Handle multiple tracks

As mentioned above, the FLV and RTMP plugins were equipped to handle only one audio and one video track. So we needed to add support for handling multiple audio and video tracks.

Maintain backwards compatibility

There should be no breakage in any existing applications that stream using the legacy FLV format. So, the muxer needs a mechanism to decide whether a given audio input needs to be written into the FLV container in the enhanced format or the legacy format.

A two-step solution

We arrived at a two-step solution for the implementation of multiple track handling:

Use the audio template pad only for the legacy format and define a new audio_%u template for the enhanced format. That makes it clear which stream needs to be written as a legacy FLV track or an enhanced FLV track. The index of the audio_%u pads is also used as the track ID when writing enhanced FLV.
Derive a new element called eflvmux from the existing FLV muxer, and have it define the new audio_%u pad templates. The old flvmux will continue to support only the legacy codecs/format. That way, existing applications that use flvmux for legacy FLV streaming will not face any conflicts when requesting pads.

Minor Caveat

Note that applications that use eflvmux need to specify the correct pad template name (audio or audio_%u) when requesting sink pads to ensure that the input audio data is written to the correct FLV track (legacy or enhanced).

Some formats such as MP3 and AAC are supported in both legacy and enhanced tracks, so we can't just auto-detect the right thing to do.
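To make the pad naming rule concrete, here is a minimal Python sketch of how a requested pad name could map to a track type and track ID. The helper classify_audio_pad is hypothetical and not part of GStreamer; it only mirrors the convention described above (audio for the legacy track, audio_%u for enhanced tracks, with the pad index used as the track ID).

```python
import re

# Hypothetical helper, not GStreamer API: it mirrors the pad naming
# convention described in the text for eflvmux sink pads.
_ENHANCED_PAD = re.compile(r"audio_(\d+)")


def classify_audio_pad(pad_name):
    """Return ("legacy", None) or ("enhanced", track_id) for a pad name."""
    if pad_name == "audio":
        # The plain "audio" pad is reserved for the legacy FLV track.
        return ("legacy", None)
    match = _ENHANCED_PAD.fullmatch(pad_name)
    if match:
        # The pad index doubles as the enhanced FLV track ID.
        return ("enhanced", int(match.group(1)))
    raise ValueError(f"not an audio pad name: {pad_name!r}")
```

Because the codec alone cannot decide between the two formats (MP3 and AAC are valid in both), the pad name is the only input this sketch looks at.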

Interoperability issues

An interesting thing we noticed while testing multitrack audio streaming with Twitch.tv is that none of the combinations we tried worked, whether multiple enhanced FLV tracks or a mix of a single legacy track and one or more enhanced FLV tracks.

On the other hand, OBS was able to stream multitrack audio just fine to the same endpoint. Dissecting the RTMP packets sent out by OBS revealed that Twitch can accept at most two tracks, one legacy and one enhanced, and the enhanced FLV track's ID needs to be a non-zero value. To our knowledge, this is not documented anywhere.

It was a simple matter of track ID semantics, which could easily be missed without referring to the OBS Studio code. As we recently noticed, FFmpeg is affected in the same way.

So we requested clarification on the track ID semantics from the enhanced RTMP specification maintainers and got confirmation that 0 remains a valid value for the track ID. As mentioned in the specification, it can be used to represent the highest priority track or the default track.

However, when streaming to servers like Twitch you may need to take care to request only pads with index greater than 0 from eflvmux because it may not accept tracks with ID 0.
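As a small illustration, this sketch builds request pad names for a server that rejects track ID 0. The function name and the start_index parameter are made up for this example; the only facts taken from the text are the audio_%u naming and the need to skip index 0 for such servers.

```python
def enhanced_pad_names(track_count, start_index=1):
    """Build audio_%u request pad names starting at start_index.

    Hypothetical helper: the default start_index=1 avoids track ID 0,
    which the text says servers like Twitch may not accept.
    """
    if track_count < 0 or start_index < 0:
        raise ValueError("track_count and start_index must be non-negative")
    return [f"audio_{i}" for i in range(start_index, start_index + track_count)]
```

For a Twitch-style endpoint that takes one legacy and one enhanced track, an application would request the plain audio pad plus the single name returned by enhanced_pad_names(1).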

Sample Pipelines to test

Here are some sample pipelines I used for testing the muxer and demuxer during the implementation.

Scope for other features

The FLV muxer and demuxer have undergone significant structural changes in order to support multiple audio tracks. This should make it easy to update the existing multitrack video capability merge request as well as add support for advanced codecs listed in the specification, some of which (like H265 and AV1) are already in progress.

There is also a work-in-progress merge request to add the eRTMP related support to the rtmp2 plugin.

P.S.: You can also refer to my talk on this topic at the GStreamer Conference that took place in London last month. The recording will soon be published on Ubicast.

Linking and shrinking Rust static libraries: a tale of fire

Last modified: 6 months ago

From Centricular Devlog by Amy (Centricular)

At the GStreamer project, we produce SDKs for lots of platforms: Linux, Android, macOS, iOS, and Windows. However, as we port more and more plugins to Rust 🦀, we are finding ourselves backed into a corner.

Rust static libraries are simply too big.

To give you an example, the AWS folks changed their SDK back in March to switch their cryptographic toolkit over to their aws-lc-rs crate ^[1]. However, that causes a 2-10x increase in code size (bug reports here and here), which gets duplicated on every plugin that makes use of their ecosystem!

What are Rust staticlibs made of?

To summarise, each Rust plugin packs a copy of its dependencies, plus a copy of the Rust standard library. This is not a problem on shared libraries and executables by their very nature, but on static libraries it causes several issues:

Rust leaks unexported symbols from native staticlibs
On some platforms, linking against multiple Rust staticlibs is impossible

First approach: Single-Object Prelinking

I won't bore you with the details as I've written another blog post on the subject; the gist is that you can unpack the library, and then ask the linker to perform "partial linking" or "relocatable linking" (Linux term) or "Single-Object Prelinking" (the Apple term, which I'll use throughout the post) over the object files. Setting which symbols you want to be visible for downstream consumers lets dead-code elimination take place at the plugin level, ensuring your libraries are now back to a reasonable size.

Why is it not enough?

Single-Object Prelinking has two drawbacks:

Unoptimized code: the linker won't be able to deduplicate functions between melded objects, as they've been hidden by the prelinking process.
Windows: there are no officially supported tools (read: Visual Studio, LLVM, GCC) to perform this at the compiler level. It is possible to do this with binutils, but the PE-COFF format doesn't allow changing the visibility of unexported functions.

Melt all the object files with the power of dragons' fire!

As said earlier, no tools on Windows support prelinking officially yet, but there's another thing we can do: library deduplication.

Thanks to Rust's comprehensive crate ecosystem, I wrote a new CLI tool which I called dragonfire. Given a complete Rust workspace or list of static libraries, dragonfire:

reads all the static libraries in one pass
deduplicates the object files inside them based on their size and naming (Rust has its own, unique naming convention for object files -- pretty useful!)
copies the duplicate objects into a new static library (usually called gstrsworkspace as its primary use is for the GStreamer ecosystem)
removes the duplicates from the rest of the libraries
updates the symbol table in each of the libraries with the bundled LLVM tools

Thanks to the ar crate, the unpacking and writing only happens at stage 3, ensuring no wasteful I/O slowdowns take place. The llvm-tools-preview component in turn takes care of locating and calling llvm-ar to update the workspace's symbol tables.
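The steps above can be sketched in a few lines of Python. This is only an in-memory illustration of the deduplication idea, not dragonfire's actual code (which is a Rust CLI): the function name meld and the data layout are assumptions, archive I/O is left out, and so is the symbol table update done with llvm-ar.

```python
from collections import defaultdict


def meld(libraries, workspace_name="gstrsworkspace"):
    """Move objects that appear in more than one library into a shared one.

    `libraries` maps a library name to a list of (object_name, size)
    tuples, with object names already stripped of any per-library prefix.
    Returns a new mapping; the input is left untouched.
    """
    # Pass 1: record which libraries contain each (name, size) key.
    owners = defaultdict(set)
    for lib, members in libraries.items():
        for key in members:
            owners[key].add(lib)

    duplicates = {key for key, libs in owners.items() if len(libs) > 1}

    # Pass 2: keep unique objects in place, collect each duplicate once.
    result = {
        lib: [m for m in members if m not in duplicates]
        for lib, members in libraries.items()
    }
    result[workspace_name] = sorted(duplicates)
    return result
```

Keying on both name and size means two objects with the same name but different contents (for instance after crate-level LTO) are left where they are rather than merged.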

The object files' naming convention deserves a special mention. For a Rust staticlib named libfoo, the object files will be named as follows:

crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o
On Windows only: foo.crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o
On non-Windows platforms: same as above, but replacing foo with libfoo-hash

In all cases, crate_name means a dependency present somewhere in the workspace tree, and nnn is a number that will be bigger than zero whenever -C codegen-units was set to higher than 1.
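As a rough illustration of dropping the library prefix, the following Python sketch extracts the library-independent part of an object file name. The regular expression is my own reading of the convention above (in particular, it assumes the hashes are alphanumeric), and strip_library_prefix is a hypothetical name, not something taken from dragonfire.

```python
import re

# Assumed shape of the suffix shared by all platforms:
#   crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o
_RCGU = re.compile(
    r"(?P<crate>[A-Za-z0-9_]+)-(?P<hash1>[0-9A-Za-z]+)"
    r"\.(?P=crate)\.(?P<hash2>[0-9A-Za-z]+)-cgu\.(?P<cgu>\d+)\.rcgu\.o$"
)


def strip_library_prefix(object_name):
    """Return the library-independent part of a Rust object file name.

    Returns None when the name does not follow the convention (for
    example, Windows import library stubs).
    """
    match = _RCGU.search(object_name)
    return match.group(0) if match else None
```

The backreference to the crate name is what lets the pattern skip over whatever prefix the platform adds, since the prefix is never followed by a repeat of itself.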

For dragonfire purposes, dropping the library prefix is enough to be able to deduplicate object files; however, on Windows we can also find import library stubs, which LLVM can generate on its own through the use of the #[link(kind = "raw-dylib")] annotation ^[2]. Import stubs can have any extension, e.g. .dll, .exe and .sys (the latter two coming from private Win32 APIs). These stubs cannot be deduplicated as they are generated individually per imported function, so dragonfire must preserve them where they are.

Drawbacks of object file deduplication

Again, this approach has several disadvantages. On Apple platforms, deduplicating libraries triggers a strange linker error, which I've not seen before:

ld: multiple errors: compact unwind must have at least 1 fixup in '<framework>/GStreamer[arm64][1021](libgstrsworkspace_a-3f2b47962471807d-lse_ldset4_acq.o)'; r_symbolnum=-19 out of range in '<framework>/GStreamer[arm64][1022](libgstrsworkspace_a-compiler_builtins-350c23344d78cfbc.compiler_builtins.5e126dca1f5284a9-cgu.162.rcgu.o)'

This also led me to find that Rust libraries were packing bitcode, which is forbidden by Apple. (This was thankfully already fixed before shipping time, but we've not yet updated our Rust minimum version to take advantage of it.)

Another drawback is that Rust's implementation of LTO causes dead-code elimination at the crate level, as opposed to the workspace level. This makes object file deduplication impossible, as each copy is different.

For the Windows platform, there is an extra drawback which specifically affects object files produced by LLVM: the COMDAT sections are set to IMAGE_COMDAT_SELECT_NODUPLICATES. This means that the linker will outright reject functions with multiple definitions, rather than realising they're all duplicates and discarding all but one of the copies. MSVC in particular performs symbol resolution before dead-code elimination. This means that linking will fail because of unresolved symbols before dead-code elimination kicks in; to use deduplicated libraries, one must set the linker flags /OPT:REF /FORCE:UNRESOLVED to ensure the dead code can be successfully eliminated.

Results

With library deduplication, we can make static libraries up to 44x smaller when building under MSVC ^[3] (you can expand the tables below for the full comparison):

gstaws.lib: from 173M to 71M (~2.5x)
gstrswebrtc.lib: from 193M to 66M (~2.9x)
gstwebrtchttp.lib: from 66M to 1,5M (~44x)

Table: before and after melding under MSVC

file	no prelinking	melded
gstaws.lib	173M	71M
gstcdg.lib	36M	572K
gstclaxon.lib	32M	568K
gstdav1d.lib	34M	936K
gstelevenlabs.lib	59M	1008K
gstfallbackswitch.lib	37M	2,3M
gstffv1.lib	34M	744K
gstfmp4.lib	39M	3,2M
gstgif.lib	34M	1,1M
gstgopbuffer.lib	30M	456K
gsthlsmultivariantsink.lib	46M	1,6M
gsthlssink3.lib	41M	1,2M
gsthsv.lib	34M	796K
gstjson.lib	31M	704K
gstlewton.lib	33M	1,2M
gstlivesync.lib	33M	728K
gstmp4.lib	38M	2,2M
gstmpegtslive.lib	31M	704K
gstndi.lib	38M	2,8M
gstoriginalbuffer.lib	34M	376K
gstquinn.lib	75M	23M
gstraptorq.lib	33M	2,4M
gstrav1e.lib	46M	11M
gstregex.lib	38M	404K
gstreqwest.lib	58M	1,4M
gstrsanalytics.lib	35M	1000K
gstrsaudiofx.lib	54M	22M
gstrsclosedcaption.lib	52M	8,4M
gstrsinter.lib	35M	604K
gstrsonvif.lib	46M	2,0M
gstrspng.lib	35M	1,2M
gstrsrtp.lib	59M	11M
gstrsrtsp.lib	57M	4,4M
gstrstracers.lib	40M	2,4M
gstrsvideofx.lib	48M	11M
gstrswebrtc.lib	193M	66M
gstrsworkspace.lib	N/A	137M
gststreamgrouper.lib	30M	376K
gsttextahead.lib	30M	332K
gsttextwrap.lib	32M	2,1M
gstthreadshare.lib	52M	12M
gsttogglerecord.lib	35M	808K
gsturiplaylistbin.lib	31M	648K
gstvvdec.lib	34M	564K
gstwebrtchttp.lib	66M	1,5M

The results from the melding above can be compared with the file sizes obtained using LTO on Windows ^[4] (remember it doesn't actually fix linking against plugins):

gstaws.lib: from 67M (LTO) to 71M (melded) (+6.0%)
gstrswebrtc.lib: from 105M to 66M (-37.1%)
gstwebrtchttp.lib: from 28M to 1,5M (-94.6%)

Table: before and after LTO under MSVC (no melding involved)

file (codegen-units=1 in all cases)	no prelinking	lto=thin	opt-level=s + lto=thin	debug=1 + opt-level=s	debug=1 + lto=thin + opt-level=s
old/gstaws.lib	199M	199M	171M	78M	67M
old/gstcdg.lib	11M	11M	11M	7,5M	7,5M
old/gstclaxon.lib	11M	11M	11M	7,7M	7,7M
old/gstdav1d.lib	12M	12M	12M	7,9M	7,8M
old/gstelevenlabs.lib	52M	52M	49M	24M	22M
old/gstfallbackswitch.lib	18M	18M	17M	11M	11M
old/gstffv1.lib	11M	11M	11M	7,6M	7,6M
old/gstfmp4.lib	20M	20M	19M	12M	11M
old/gstgif.lib	12M	12M	12M	7,9M	7,9M
old/gstgopbuffer.lib	9,7M	9,7M	9,7M	7,5M	7,4M
old/gsthlsmultivariantsink.lib	16M	16M	16M	9,6M	9,4M
old/gsthlssink3.lib	14M	14M	14M	8,9M	8,8M
old/gsthsv.lib	11M	11M	11M	7,8M	7,7M
old/gstjson.lib	12M	12M	12M	8,4M	8,2M
old/gstlewton.lib	12M	12M	12M	8,1M	8,1M
old/gstlivesync.lib	12M	12M	12M	8,3M	8,2M
old/gstmp4.lib	17M	17M	17M	9,9M	9,7M
old/gstmpegtslive.lib	12M	12M	12M	8,0M	7,9M
old/gstndi.lib	21M	21M	20M	12M	11M
old/gstoriginalbuffer.lib	9,6M	9,6M	9,7M	7,4M	7,3M
old/gstquinn.lib	94M	94M	86M	39M	35M
old/gstraptorq.lib	18M	18M	17M	9,8M	9,4M
old/gstrav1e.lib	39M	39M	37M	19M	18M
old/gstregex.lib	26M	26M	25M	14M	14M
old/gstreqwest.lib	53M	53M	49M	24M	22M
old/gstrsanalytics.lib	15M	15M	14M	9,2M	8,9M
old/gstrsaudiofx.lib	57M	57M	56M	23M	22M
old/gstrsclosedcaption.lib	40M	40M	36M	20M	18M
old/gstrsinter.lib	14M	14M	13M	8,5M	8,4M
old/gstrsonvif.lib	21M	21M	20M	11M	11M
old/gstrspng.lib	13M	13M	13M	8,2M	8,2M
old/gstrsrtp.lib	47M	47M	44M	22M	20M
old/gstrsrtsp.lib	35M	35M	33M	16M	15M
old/gstrstracers.lib	28M	28M	27M	16M	15M
old/gstrsvideofx.lib	16M	16M	35M	9,2M	15M
old/gstrswebrtc.lib	329M	329M	284M	124M	105M
old/gststreamgrouper.lib	9,6M	9,6M	9,7M	7,2M	7,2M
old/gsttextahead.lib	9,6M	9,6M	9,5M	7,4M	7,3M
old/gsttextwrap.lib	13M	13M	13M	8,4M	8,4M
old/gstthreadshare.lib	49M	49M	45M	23M	20M
old/gsttogglerecord.lib	13M	13M	13M	8,5M	8,4M
old/gsturiplaylistbin.lib	11M	11M	11M	7,9M	7,9M
old/gstvvdec.lib	11M	11M	11M	7,5M	7,5M
old/gstwebrtchttp.lib	69M	69M	63M	30M	28M

Conclusion

This article presents several longstanding pain points in Rust, namely staticlib binary sizes, symbol leaking, and incompatibilities between Rust and MSVC. I demonstrate the tool dragonfire that aims to address and work around, where possible, these issues, along with remaining issues to be addressed.

As explained earlier, dragonfire-treated libraries are live on all platforms except Apple's if you use the development packages from mainline; this is hopefully on track for the 1.28 release of GStreamer. There's already a merge request pending to enable it for Apple platforms; we're only waiting to update the Rust minimum version.

If you want to have a look, dragonfire's source code is available at Freedesktop's GitLab instance. Please note that at the moment I have no plans to submit this to crates.io.

Feel free to contact me with any feedback, and thanks for reading!

[1] See its default-https-client feature at lib.rs; you will find it throughout the AWS SDK ecosystem.
[2] https://doc.rust-lang.org/reference/items/external-blocks.html#dylib-versus-raw-dylib
[3] In all cases the -C flags are debug=1 + codegen-units=1 + opt-level=s; see this comment for the complete results across all platforms.
[4] Source: https://gitlab.freedesktop.org/gstreamer/cerbero/-/merge_requests/1895

GStreamer 1.26.7 stable bug fix release

Last modified: 7 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

cea608overlay: improve handling of non-system memory
cuda: Fix runtime kernel compile with CUDA 13.0
d3d12: Fix crop meta support in converter and passthrough handling in deinterlacer
fallbacksrc: source handling improvements; no-more-pads signal for streams-unaware parents
inter: add properties to fine tune the inner elements
qtdemux: surround sound channel layout handling fixes and performance improvements for GoPro videos
rtp: Add linear audio (L8, L16, L24) RTP payloaders / depayloaders
rtspsrc: Send RTSP keepalives in TCP/interleaved modes
rtpamrpay2: frame quality indicator flag related fixes
rtpbasepay2: reuse last PTS when possible, to work around problems with NVIDIA Jetson AV1 encoder
mpegtsmux, tsdemux: Opus audio handling fixes
threadshare: latency related improvements and many other fixes
matroskamux, tsmux, flvmux, cea608mux: Best pad determination fixes at EOS
unixfd: support buffers with a big payload
videorate: fix unknown buffer duration assertion failure with variable framerates
editing services: Make GESTimeline respect `SELECT_ELEMENT_TRACK` signal discard decision; memory leak fixes
gobject-introspection annotation fixes
cerbero: Update meson to 1.9.0 to enable Xcode 26 compatibility
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.7 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

GStreamer Conference 2025: Full Schedule, Talk Abstracts and Speakers Biographies now available

Last modified: 7 months ago

From GStreamer News by GStreamer

The GStreamer Conference team is pleased to announce that the full conference schedule including talk abstracts and speaker biographies is now available for this year's lineup of talks and speakers, covering again an exciting range of topics!

The GStreamer Conference 2025 will take place on 23-24 October 2025 in London, UK, followed by a hackfest.

Details about the conference, hackfest and how to register can be found on the conference website.

This year's topics and speakers:

GStreamer State of the Union
Tim-Philipp Müller, Centricular
Region-Based Compression in GStreamer
Axel Tobieson, Spiideo
Rusty Pipes and Oxidized Wires
Arun Raghavan, Valve Corporation
The Quest for Low-Latency Desktop Audio
Nirbheek Chauhan, Centricular
Rebuilding Our Video Server Engine on GStreamer
Ray Tiley, Tightrope Media Systems
Enabling I-frame playlists with HLS CMAF
Sanchayan Maity, Centricular
From Streams to Insights: Advancing GstAnalytics
Daniel Morin, Collabora
Brief history of GStreamer adoption at Twilio
Jeff Foster and Andrey Kovalenko, Twilio
Threadshare, a plugin collection to increase scalability
François Laignel, Centricular
Costly Speech: an introduction
Mathieu Duponchelle, Centricular
GstWebRTC in WebKit, current status & plans
Philippe Normand, Igalia
Lessons Learned - Transforming Our Video Management System with GStreamer
Bumi̇n Kaan Aydin, HAVELSAN
VVC/H.266 Alpha Channel support in GStreamer
Andoni Morales Alastruey, Fluendo
librice: the TURNing point
Matthew Waters, Centricular
Virtual Hardware: Emulating a Video4linux Hardware Decoder
Jan Schmidt, Centricular
A Global Crisis => Global Leadership - WildCamera
Will Robertson
Tools to profile a video encoder
Diego Nieto Munoz, Fluendo
A new era for GStreamer C++ bindings
Sebastian Dröge, Centricular
Auxiliary Stream Wrangling in playbin3
Jan Schmidt, Centricular
VVC/H.266 in GStreamer
Carlos Bentzen, Igalia
GStreamer in the Medical Simulation Environment
Thomas Goodwin and Jeff Wilson, Laerdal Labs
What’s New in GStreamer D3D12
Seungha Yang, Centricular
Fluster news
Rubén Gonzalez, Fluendo
dcSCTP in GStreamer
Tulio Beloqui, Pexip
The road to Enhanced FLV and RTMP in GStreamer
Taruntej Kanakamalla, Centricular
Improving WebRTC datachannel performance
Emil Ljungdahl, Axis Communications
Cutting audio latency with bidirectional WebRTC
Albert Sjölund, Axis Communications
GStreamer in VR devices manufacturing
Ivan Loskutov, Meta
GStreamer for audio distribution at Sveriges Radio
Karl Johannes Jondell and Christofer Bustad, Sveriges Radio
GStreamer at Scale: Recent Lessons from Real-World Video Conferencing
Håvard Graff, Pexip
Gst.wasm season 3
Jorge Zapata and Fabián Orccón, Fluendo
Time Remapping and GES: Implementation Details and Latest Updates
Thibault Saunier, Igalia
PipeWire’s pipeline operation vs GStreamer’s explained
George Kiagiadakis, Collabora
Why Keep a Thread Running? Meet GstBaseIdleSrc
Camilo Celis Guzman, Pexip
Bringing AMD HIP into GStreamer
Max Campbell, Veo Technologies; Seungha Yang, Centricular
WirePlumber, present challenges and future directions
Julian Bouzas, Collabora
Hardware-Accelerated Live Broadcasting of Uncompressed ST 2110 Streams with GStreamer leveraging NVIDIA GPUs and NICs
Johan Jino, Nvidia
The Art of Debugging GStreamer Software
Nicolas Dufresne, Collabora
Making GStreamer Go!
Wilhelm Bartel

Lightning Talks:

Showing the invisible: Analysing buffer flow with tracers
François Laignel, Centricular
Video Frame Scheduler Plugin for Improving WebRTC Playback Quality
Haejung Hwang, LG Electronics
Audio source separation using snakes, crabs and torches
Sebastian Dröge, Centricular
soothe: a proposal for encoder testing
Victor Manuel Jáquez Leal, Igalia
burn: a little case study on using GstAnalytics from Rust
Sebastian Dröge, Centricular
GstVA and GStreamer-VAAPI updates
Victor Manuel Jáquez Leal, Igalia
Vulkan Video: pipeline update
Stéphane Cerveau, Igalia
Fallback Streaming for RTSP Server
Axel Tobieson, Spiideo
Fast and dirty: vibe coding use case plugging the NVIDIA kernel into GStreamer streams
Jesús Luque, Mediapro
What's up with Video4Linux support
Nicolas Dufresne, Collabora
pexLGPL bundle
Tulio Beloqui, Pexip
Reading v4l2 data from MR813 devices in a way that doesn't suck
Vivia Nikolaidou
Rewriting CoreAudio-based elements on macOS
Piotr Brzeziński, Centricular
fallbacksrc, now with more streams
Piotr Brzeziński, Centricular
State of MPEG-TS in GStreamer
Edward Hervey, Centricular
Full GPU driven AI workloads with GStreamer and Raven
Andoni Morales Alastruey, Fluendo
...and more to come
Submit your lightning talk now!

Many thanks to our amazing sponsors ‒ Platinum sponsors Centricular, Collabora, Igalia, and Pexip, Gold sponsors Axis Communications and Fluendo, and Silver sponsors MediaScribe and Veo, without whom the conference would not be possible in this form.

We hope to see you all in London soon! Don't forget to register as soon as possible if you're planning on joining us, so we can order enough food and drinks!

There's also a hackfest after the conference. If you're planning on joining us for the hackfest you must register separately for that, by 21 October 2025 at the latest.

Asymptotic on hiatus

Last modified: 7 months ago

From Arun Raghavan by Arun (Arun Raghavan)

Asymptotic was started 6 years ago, when I wanted to build something that would be larger than just myself.

We’ve worked with some incredible clients in this time, on a wide range of projects. I would be remiss to not thank all the teams that put their trust in us.

In addition to working on interesting challenges, our goal was to make sure we were making a positive impact on the open source projects that we are part of. I think we truly punched above our weight class (pardon the boxing metaphor), on this front – all the upstream work we have done stands testament to that.

Of course, the biggest single contributor to what we were able to achieve is our team. My partner, Deepa, was instrumental in shaping how the company was formed and run. Sanchayan (who took a leap of faith in joining us first), and Taruntej were stellar colleagues and friends on this journey.

It’s been an incredibly rewarding experience, but the time has come to move on to other things, and we have now paused operations. I’ll soon write about some recent work and what’s next.

GStreamer 1.26.6 stable bug fix release

Last modified: 8 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

analytics GstTensorMeta handling changes (see release notes)
closed caption combiner and transcriberbin stability fixes
decklinkvideosrc: fix unrecoverable state after failing to start streaming because device is busy
decodebin3 tag handling improvements
fallbacksrc: Fix sources only being restarted once, as well as some deadlocks and race conditions on shutdown
gtk4paintablesink: Try importing dmabufs without DMA_DRM caps
hlsdemux2: Fix parsing of byterange and init map directives
rtpmp4gdepay2: allow only constantduration with neither constantsize nor sizelength set
spotifysrc: update to librespot 0.7 to make it work after recent Spotify changes
threadshare: new blocking adapter element for use in front of blocking elements such as sinks that sync to the clock
threadshare: various other threadshare element fixes and improvements
v4l2: Add support for WVC1 and WMV3
videorate: possible performance improvements when operating in drop-only mode
GstBaseParse fixes
Vulkan video decoder fixes
Fix gst-device-monitor-1.0 tool device-path regression on Windows
Python bindings: Handle buffer PTS, DTS, duration, offset, and offset-end as unsigned long long (regression fix)
Cerbero: Reduce recipe parallelism in various cases and dump cerbero and recipe versions into datadir during packaging
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.6 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

More adventures in the land of AI and Open Source

Last modified: 8 months ago

From Christian F.K. Schaller by Christian Fredrik Kalager Schaller (Christian Schaller)

I have been doing a lot of work with AI recently, both as part of a couple of projects at work and out of a personal interest in understanding the current state and what is possible. My favourite AI tool currently is Claude.ai. Anyway, I have a Prusa Core One 3D printer now that I also love playing with, and one thing I have been wanting to do is to print some multicolor prints with it. The Prusa Core One is a single extruder printer, which means it only has 1 filament loaded at any given time. Other printers on the market, like the Prusa XL, have 5 extruders, so they can have 5 filaments or colors loaded at the same time.

Prusa Single Extruder Multimaterial setting

The thing is that the Prusa Slicer (the slicer is the software that takes a 3d model and prepares the instructions for the printer based on that 3d model) got this feature called Single Extruder Multi Material. And while it is a process that wastes a lot of filament and takes a lot of manual intervention during the print, it does basically work.

What I quickly discovered was that using this feature is non-trivial. First of all I had to manually add some G Code to the model to actually get it to ask me to switch filament for each color in my print. The bigger issue is that the printer will ask you to change the color or filament, but you have no way of knowing which one to switch to; my model had 15 filament changes and no simple way of knowing which order to switch in. People were solving this by, among other things, looking through the print layer by layer and writing down the color changes, but I thought that this must be possible to automate with an application. So I opened Claude and started working on this thing I ended up calling Prusa Color Mate.

The idea for the application was simple enough: have it analyze the project file, extract information about the order of color changes, and display them to the user in a way that allows them to manually check off each color as it’s inserted. I started off with a simple Python script that would just print to the console. It quickly turned out that the hard part of this project was parsing the input files, and it was made worse by my ignorance. What I learned the hard way is that if you store a project in Prusa Slicer it will use a format called 3mf. So my thought was, let’s just analyze the 3mf file and extract the information I need. It took me quite a bit of back and forth with Claude, feeding Claude source code from Prusa’s implementation and PDF files with specifications, but eventually the application did spit out a list of 15 toolchanges and the colors associated with them. So I happily tried to use it to print my model.

I quickly discovered that the color ordering was all wrong. After even more back and forth with Claude and reading online, I realized that 3mf is a format for storing 3D models, but that is not what is fed to your 3D printer; the file provided to the printer is a bgcode file. And while the 3mf file did contain the information that you had to change filament 15 times, the order of the changes is simply not stored in the 3mf file, as that is chosen as part of composing your print. So I now had to extract the information from the bgcode file, which took me basically a full day to figure out with the help of Claude. I could probably have gotten over the finish line sooner by making some better choices along the way, but the extreme optimism of the AI probably led me to believe it was going to be easier than it was to, for instance, just do everything in Python.

At first I tried using the libbgcode library written in C++, but I had a lot of issues getting Claude to incorporate it properly into my project, with Meson and CMake interaction issues (in retrospect I should have just made a quick RPM of libbgcode and used that). After a lot of struggles with this, Claude thought that parsing the bgcode file natively in Python would be easier than trying to use the C++ library, so I went down that route. I started by feeding Claude a description of the format that I found online and asked it to write me a parser for it. It didn’t work very well and I ended up having a lot of back and forth, testing and debugging, and finding more documentation, including a blog post about the meatpack format used inside the file, but it still didn’t really work very well. In the end what probably helped the most was asking it to use the relevant files from libbgcode and Prusa Slicer as documentation, because even if that too took a lot of back and forth, eventually I had a working application that was able to extract the tool change data and associated colors from the file. I ended up using one external dependency, the heatshrink2 library, which I installed with pip. While that worked correctly, it took a long time for me and Claude to figure out exactly what parameters to feed it to work with the Prusa generated file.

Screenshot of Prusa Color Mate

So now I had the working application going and was able to verify it with my first print. I even polished it up a little by adding detection of the manual filament change code, so that people who try to use the application will be made aware they need to add that through Prusa Slicer. Maybe I could bake that into the tool, but at the moment I only have bgcode decoders, not encoders, in my project.

Warning shown for missing G Code

Dialog that gives detailed instructions for how to add G Code

So to conclude, it probably took me 2.5 days to write this application using Claude. It is a fairly niche tool, so I don’t expect a lot of users, but I made it to solve a problem for myself. If I had to write this myself pre-AI it would have taken me weeks; figuring out the different formats and how the library APIs worked would have taken me a long time. I am not an especially proficient coder, so a better coder than me could probably put this together quicker than I would, but I think this is part of what will change with AI: even with limited time and technical skills you can put together simple applications like this to solve your own problems.

If you are a Prusa Core One user and would like to play with multicolor prints you can find Prusa Color Mate on GitLab. I have not tested it on any other system or printer than my own, so I don’t even know if it will work with other non-Core One Prusa printers. There are RPMs for Fedora you can download in the packaging directory of the GitLab repo, which also includes an RPM for the heatshrink2 library.

As for future plans for this application I don’t really have any. It solves my issue the way it is today, but if there turns out to be an interested user community out there maybe I will try to clean it up and create a proper flatpak for it.

GStreamer 1.27.2 unstable development release

Last modified: 8 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another development release in the API/ABI-unstable 1.27 release series.

This development release is primarily for developers and early adopters, and distros should probably not package it.

Highlighted changes:

Add more 10bit RGB formats in GStreamer Video, OpenGL and Wayland, as well as in deinterlace and gdkpixbufoverlay
analytics: new analytics combiner and splitter elements plus batch meta to batch buffers from one or more streams
analyticsoverlay: Add expire-overlay property
onnx: Add Verisilicon provider support
awstranscriber2: add property for setting show_speaker_labels
awstranslate: expose property for turning brevity on
speechmatics: expose mask-profanities property
textaccumulate: new element for speech synthesis or translation preprocessing
tttocea608: expose speaker-prefix property
cea708mux: expose "discarded-services" property on sink pads
cuda crop meta support
hlssink3, hlscmafsink: Support the use of a single media file
s302mparse: Add new S302M audio parser
webrtc: add WHEP client signaller; sdp and stats-related improvements
threadshare: many improvements to the various elements, plus examples and a new benchmark program; relicense to MPL-2.0
gtk4paintablesink: Add YCbCr memory texture formats and improve color-state fallbacks
OpenGL: Add support for the NV24 pixel format; support changing caps and `get_gl_context()` in glbasesrc
rtspsrc: Send RTSP keepalives also in TCP/interleaved modes
nvencoder: interlaced video handling improvements
vaav1enc: Enable intrablock copy and palette mode
videopool: support parsing dma_drm caps
Vulkan VP9 video decode support and many other video encode and decode improvements
waylandsink: Parse and set the HDR10 metadata and other color management improvements
LCEVC: Add autoplugging decoding support for LCEVC H265 and H266 video streams and LCEVC H.265 encoder
GstMiniObject: Add missing `take()` and `steal()` functions and convert `is_writable()` and `make_writable()` macros to inline functions
alsa: Improve PCM sink enumeration
d3d12: various d3d12swapchainsink enhancements and bug fixes; fisheye image dewarping support
wasapi2: add support for dynamic device switching, exclusive mode and format negotiation; device provider and latency enhancements
windows: Disable all audio device providers except wasapi2
dots-viewer: Improve dot file generation and interactivity
gst-editing-services: Make framepositioner zorder controllable and expose it
Various introspection fixes and bindings updates
Cerbero: Update to Android API level 24; add config for number of cargo jobs; ship unixfd plugin
Cerbero: Implement library melding for smaller binary sizes of Rust plugins

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

GStreamer Conference 2025 to take place 23-24 October 2025 in London, UK

Last modified: 8 months ago

From GStreamer News by GStreamer

The GStreamer project is thrilled to announce that this year's GStreamer Conference is now confirmed to take place on Thursday and Friday 23-24 October 2025 in London, UK, followed by a 2-3 day hackfest.

You can find more details about the conference on the GStreamer Conference 2025 web site.

A Call for Presentations will be sent out shortly.

Registration is expected to open soon as well.

We will announce those and any further updates on the gstreamer-announce mailing list, the conference website, on Bluesky, on Mastodon, and on our Discourse forum.

Talk slots will be available in varying durations from 20 minutes up to 45 minutes. Whatever you're doing or planning to do with GStreamer, we'd like to hear from you!

We also plan to have sessions with short lightning talks / demos / showcase talks for those who just want to show what they've been working on or do a mini-talk instead of a full-length talk. Lightning talk slots will be allocated on a first-come-first-serve basis, so make sure to reserve your slot if you plan on giving a lightning talk.

A GStreamer hackfest will take place right after the conference.

We hope to see you in London!

Please spread the word!

Perfect Audio Device Switching on macOS and Windows

Last modified: 9 months ago

From Centricular Devlog by Nirbheek Chauhan (Centricular)

Over the past few years, we've been slowly working on improving the platform-specific plugins for Windows, macOS, iOS, and Android, and making them work as well as the equivalent plugins on Linux. In this episode, we will look at audio device switching in the source and sink elements on macOS and Windows.

On Linux, if you're using the PulseAudio elements (both with the PulseAudio daemon and PipeWire), you get perfect device switching: quick, seamless, easy, and reliable. Simply set the device property whenever you want and you're off to the races. If the device gets unplugged, the pipeline will continue, and you will get notified of the unplug via the GST_MESSAGE_DEVICE_REMOVED bus message from GstDeviceMonitor so you can change the device.

As of a few weeks ago, the Windows Audio plugin wasapi2 implements the same behaviour. All you have to do is set the device property to whatever device you want (fetched using the GstDeviceMonitor API), at any time.

A merge request is open for adding the same feature to the macOS audio plugin, and is expected to be merged soon.

For graceful error handling, such as accidental device unplug or other unexpected errors, there's a new continue-on-error property. Setting that will cause the source to output silence after unplug, whereas the sink will simply discard the buffers. An element warning will be emitted to notify the app (alongside the GST_MESSAGE_DEVICE_REMOVED bus message if there was a hardware unplug), and the app can switch the device by setting the device property.

Thanks to Seungha and Piotr for working on this!

GStreamer 1.26.5 stable bug fix release

Last modified: 9 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

audioconvert: Fix caps negotiation regression when using a mix matrix
aws: Add support for brevity in awstranslate and add option to partition speakers in the transcription output of awstranscriber2
speechmatics speech-to-text: Expose mask-profanities property
cea708mux: Add support for discarding select services on each input
cea608overlay, cea708overlay: Accept GPU memory buffers if downstream supports the overlay composition meta
d3d12screencapture source element and device provider fixes
decodebin3: Don't error on an incoming ONVIF metadata stream
uridecodebin3: Fix potential crash when adding URIs to messages, e.g. if no decoder is available
v4l2: Fix memory leak for dynamic resolution change
VA encoder fixes
videorate, imagefreeze: Add support for JPEG XS
Vulkan integration fixes
wasapi2 audio device monitor improvements
webrtc: Add WHEP client signaller and add whepclientsrc element on top of webrtcsrc using that
threadshare: Many improvements and fixes to the generic threadshare and RTP threadshare elements
rtpbin2 improvements and fixes
gst-device-monitor-1.0 command line tool improvements
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.5 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Artificial Intelligence and the Linux Community

Last modified: 9 months ago

From Christian F.K. Schaller by Christian Fredrik Kalager Schaller (Christian Schaller)

I have wanted to write this blog post for quite some time, but have been unsure about the exact angle of it. I think I have found that angle now, where I will root the post in a very tangible, concrete example.

So the reason I wanted to write this is that I do feel there is a palpable skepticism and negativity towards AI in the Linux community, and I understand that there are societal implications that worry us all, like how deep fakes have the potential to upend a lot of things from news dissemination to court proceedings. Or how malign forces can use AI to drive narratives in social media etc., as if social media wasn’t toxic enough as it is. But for open source developers like us in the Linux community there are also, I think, deep concerns about tooling that intrudes so deeply into something so close to the heart of our community: writing code and being skilled at writing code. I hear and share all those concerns, but at the same time, having spent time over the last weeks using Claude.ai, I do feel it is not something we can afford not to engage with. I know people have probably used a lot of different AI tools in the last year, some being more cute than useful, others being somewhat useful, and others being interesting improvements to your Google search, for instance. I shared a lot of those impressions, but using Claude this last week has opened my eyes to what AI engines are going to be capable of going forward.

So my initial test was writing a Python application for internal use at Red Hat, basically connecting to a variety of sources, pulling data and putting together reports: typical management fare. How simple it was impressed me, though. I think most of us who have had to pull data from a new source know how painful it can be, with issues ranging from missing to outdated to hard-to-parse API documentation. I think a lot of us also then spend a lot of time experimenting to figure out the right API calls to make in order to pull the data we need. Well, Claude was able to give me Python scripts that pulled that data right away. I still had to spend some time with it to fine-tune the data being pulled and ensure we pulled the right data, but I did it in a fraction of the time I would have spent figuring that stuff out on my own. The one data source Claude struggled with was Fedora’s Bodhi. Once I pointed it to the URL with the latest documentation, it figured out that it would be better to use the Bodhi client library to pull data, and once it had that figured out it was clear sailing.

So coming off pretty impressed by that experience, I wanted to understand if Claude would be able to put together something programmatically more complex, like a GTK+ application using Vulkan. [Note: I should have checked the code better, but thanks to the people who pointed this out. I told the AI to use Vulkan, which it did, but not in the way I expected. I expected it to render the globe using Vulkan, but it instead decided to ensure GTK used its Vulkan backend, an important lesson in both prompt engineering and checking the code afterwards.] So I thought about what would be a good example of such an application, and I also figured it would be fun if I found something really old and asked Claude to help me bring it into the current age. I suddenly remembered xtraceroute, an old application originally written in GTK1 and OpenGL that shows your traceroute on a 3D globe.

Screenshot of the original Xtraceroute application

I went looking for it and found that while it had been updated to GTK2 since last I looked at it, it had not been touched in 20 years. So I thought, this is a great testcase. So I grabbed the code and fed it into Claude, asking Claude to give me a modern GTK4 version of this application using Vulkan. Ok so how did it go? Well it ended up being an iterative effort, with a lot of back and forth between myself and Claude. One nice feature Claude has is that you can upload screenshots of your application and Claude will use it to help you debug. Thanks to that I got a long list of screenshots showing how this application evolved over the course of the day I spent on it.

This screenshot shows Claude’s first attempt at transforming the 20-year-old xtraceroute application into a modern one using GTK4 and Vulkan, and also adding a Meson build system. My prompt to create this was feeding in the old code and asking Claude to come up with a GTK4 and Vulkan equivalent. As you can see, the GTK4 UI is very simple, but OK as it is. The rendered globe leaves something to be desired though. I assume the old code had some 2D fallback code, so Claude latched onto that and focused on trying to use the Cairo API to recreate this application, despite me telling it I wanted a Vulkan application. What we ended up with was a 2D circle that I could spin around like a wheel of fortune. The code did have some Vulkan stuff, but defaulted to the Cairo code.

Second attempt at updating this application

Anyway, I fed the screenshot of my first version back into Claude and said that the image was not a globe, that it was missing the texture, and that the interaction model was more like a wheel of fortune. As you can see, the second attempt did not fare any better; in fact we went from circle to square. This was also the point where I realized that I hadn’t uploaded the textures into Claude, so I had to tell it to load the earth.png from the local file repository.

Third attempt from Claude.

OK, so I fed my second screenshot back into Claude and pointed out that it was no globe, in fact it wasn’t even a circle, and the texture was still missing. With me pointing out it needed to load the earth.png file from disk, it came back with the texture loading. Well, I really wanted it to be a globe, so I said thank you for loading the texture, now do it on a globe.

This is the output of the 4th attempt. As you can see, it did bring back a circle, but the texture was gone again. At this point I also decided I didn’t want Claude to waste any more time on the Cairo code; this was meant to be a proper 3D application. So I told Claude to drop all the Cairo code and instead focus on making a Vulkan application.

So now we finally had something that started looking like something, although it was still a circle, not a globe, and it had that weird division-into-four thing on the globe. Anyway, I could see it using Vulkan now and it was loading the texture, so I was feeling like we were making some decent forward movement. I wrote a longer prompt describing the globe I wanted and how I wanted to interact with it, and this time Claude did come back with Vulkan code that rendered this as a globe, though unfortunately I didn’t end up screenshotting it.

So with the working globe now in place, I wanted to bring in the day/night cycle from the original application. I asked Claude to load the night texture and use it as an overlay to get that day/night effect. I also asked it to calculate the position of the sun relative to the earth at the current time, so that it could overlay the texture in the right location. As you can see, Claude did a decent job of it, although the colors were broken.
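For the curious, the sun position part can be approximated in a few lines. The sketch below is not the code Claude produced; it uses a common textbook approximation for solar declination and ignores the equation of time, so the result can be off by a degree or so in latitude and a few degrees in longitude. The function name is hypothetical.

```python
import math
from datetime import datetime, timezone


def subsolar_point(when: datetime) -> tuple[float, float]:
    """Approximate (latitude, longitude) in degrees where the sun is overhead.

    `when` must be timezone-aware.
    """
    when = when.astimezone(timezone.utc)
    # N = 0 on January 1st.
    day_number = when.timetuple().tm_yday - 1
    # Cosine model of the solar declination, peaking near the June solstice.
    latitude = -23.44 * math.cos(2 * math.pi / 365 * (day_number + 10))
    # The sun moves 15 degrees of longitude westward per hour and is over
    # the prime meridian at roughly 12:00 UTC (equation of time ignored).
    hours = when.hour + when.minute / 60 + when.second / 3600
    longitude = -15.0 * (hours - 12.0)
    return latitude, longitude
```

A renderer could then blend the day and night textures based on the angle between each surface normal and the direction to this point.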

So I kept fighting with the color for a bit. Claude could see it was rendering it brown, but could not initially figure out why. I could tell the code was doing things mostly right, so I also asked it to look at some other things; for instance, I realized that when I tried to spin the globe it just twisted the texture. We got that fixed, and I also got Claude to create some test scripts that helped us figure out that the color issue was an RGB vs BGR issue. As soon as we understood that, Claude was able to fix the code to render colors correctly. I also had a few iterations trying to get the scaling and mouse interaction behaving correctly.
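A channel-order mix-up like this is easy to reproduce: if the red and blue bytes of each pixel are read in the wrong order, blue oceans come out brownish. The following is a minimal sketch of the swap on a raw pixel buffer, not the actual fix in the xtraceroute code (where the fix could equally be choosing a different texture format); the function name is hypothetical.

```python
def swap_red_blue(pixels: bytes, channels: int = 4) -> bytes:
    """Swap the first and third channel of every pixel (RGB(A) <-> BGR(A))."""
    if channels < 3 or len(pixels) % channels:
        raise ValueError("buffer length must be a multiple of the channel count")
    out = bytearray(pixels)
    # Extended slices pick every pixel's red (index 0) and blue (index 2) byte.
    out[0::channels] = pixels[2::channels]
    out[2::channels] = pixels[0::channels]
    return bytes(out)
```

For example, an opaque pure-blue RGBA pixel `bytes([0, 0, 255, 255])` becomes pure red after the swap.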

So at this point I had probably worked on this for 4-5 hours; the globe was rendering nicely and I could interact with it using the mouse. The next step was adding the traceroute lines back. By default Claude had just put in code to render some small dots on the hop points, not draw the lines. Also, the old method for getting the geocoordinates was out of date, so I asked Claude to help me find some current services, which it did, and once I picked one it gave me, on the first try, code that was able to request the geolocation of the IP addresses it got back. To polish it up I also asked Claude to make sure we drew the lines following the globe’s curvature instead of just drawing straight lines.
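Drawing a line that follows the curvature comes down to interpolating along the great circle between two hops. Here is a minimal sketch of that idea using spherical linear interpolation on unit vectors; it is not the code from the application, the function names are hypothetical, and it does not handle exactly antipodal points.

```python
import math


def to_unit_vector(lat_deg: float, lon_deg: float) -> tuple:
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def great_circle_points(start: tuple, end: tuple, segments: int = 32) -> list:
    """Return segments + 1 unit vectors along the arc from start to end.

    start and end are (latitude, longitude) pairs in degrees.
    """
    a, b = to_unit_vector(*start), to_unit_vector(*end)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two points
    if omega < 1e-9:
        return [a] * (segments + 1)  # both hops are at the same place
    points = []
    for i in range(segments + 1):
        t = i / segments
        weight_a = math.sin((1 - t) * omega) / math.sin(omega)
        weight_b = math.sin(t * omega) / math.sin(omega)
        points.append(tuple(weight_a * x + weight_b * y for x, y in zip(a, b)))
    return points
```

Scaling each returned vector by the globe radius (plus a small offset so the line floats above the surface) gives the vertices of the polyline to draw.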

Final version of the updated Xtraceroute application. It mostly works now, but I did realize why I always thought this was a fun idea but less interesting in practice: you often don’t get very good traceroutes back, probably due to websites being cached or hosted globally. But I felt that I had proven that with a day’s work Claude was able to help me bring this old GTK application into the modern world.

Conclusions

So I am not going to argue that Xtraceroute is an important application that deserved to be saved, in fact while I feel the current version works and proves my point I also lost motivation to try to polish it up due to the limitations of tracerouting, but the code is available for anyone who finds it worthwhile.

But this wasn’t really about Xtraceroute. What I wanted to show here is how someone lacking C and Vulkan development skills can actually use a tool like Claude to put together a working application, even one using more advanced stuff like Vulkan, which I know many others besides me would find daunting. I also found Claude really good at producing documentation and architecture documents for your application. It was also able to give me a working Meson build system and create all the desktop integration files for me, like the .desktop file, the metainfo file and so on. For the icons I ended up using Gemini, as Claude does not do image generation at this point, although it was able to take a PNG file and create an SVG version of it (although not a perfect likeness of the original PNG).

Another thing I want to say is that, the way I think about this, it does not make coding skills less valuable. AIs can do amazing things, but you need to keep a close eye on them to ensure the code they create actually does what you want and that it does it in a sensible manner. For instance, in my reporting application I wanted to embed a PDF file, and Claude’s initial thought was to bring in WebKit to do the rendering. That would have worked, but would have added a very big and complex dependency to my application, so I had to tell it that it could just use libpoppler to do it, something Claude agreed was a much better solution. The bigger the codebase, the harder it also becomes for the AI to deal with it, but I think in those circumstances what you can do is use the AI to give you sample code for the functionality you want in the programming language you want, and then you can work on incorporating that into your big application.

The other part here, of course, in terms of open source, is how contributors and projects should deal with this. I know there are projects that are drowning in AI-generated CVEs or patches, and that helps nobody. But I think if we see AI as a developer’s tool, and the developer using the tool as responsible for the code generated, then that mindset can help us navigate this. So if you used an AI tool to create a patch for your favourite project, it is your responsibility to verify that patch before sending it in, and by that I don’t mean just verifying the functionality it provides, but also that the code is clean, readable and following the coding standards of said upstream project. Maintainers on the other hand can use AI to help them review and evaluate patches quicker, and thus this can be helpful on both sides of the equation. I also found Claude and other AI tools like Gemini pretty good at generating test cases for the code they make, so this is another area where open source patch contributions can improve, by improving test coverage for the code.

I do also believe there are many areas where projects can greatly benefit from AI. For instance, in the GNOME project a constant challenge for extension developers has been keeping their extensions up to date; well, I do believe a tool like Claude or Gemini should be able to update GNOME Shell extensions quite easily. So maybe having a service which tries to provide a patch each time there is a GNOME Shell update might be a great help there. At the same time, having an AI take a look at updated extensions and give a first review of the update might help reduce the load on people doing code reviews on extensions and help flag problematic extensions.

I know for a lot of cases and situations uploading your code to a web service like Claude, Gemini or Copilot is not something you want to or can do. I know privacy is a big concern for many people in the community. My team at Red Hat has been working on a code assistant tool using the IBM Granite model, called Granite.code. What makes Granite different is that it relies on having the model run locally on your own system, so you don’t send your code or data off somewhere else. This of course has great advantages in terms of improving privacy and security, but it has challenges too. The top-end AI models out there at the moment, of which Claude is probably the best at the time of writing this blog post, are running on hardware with vast resources in terms of computing power and memory available. Most of us do not have that kind of capability available at home, so the model size and performance will be significantly lower. So at the moment, if you are looking for a great open source tool to use with VS Code to do things like code completion, I recommend giving Granite.code a look. If, on the other hand, you want to do something like I have described here, you need to use something like Claude, Gemini or ChatGPT. I do recommend Claude, not just because I believe them to be the best at it at the moment, but also because they are a company trying to hold themselves to high ethical standards. Over time we hope to work with IBM and others in the community to improve local models, and I am also sure local hardware will keep improving, so over time the experience you can get with a local model on your laptop will at least have less of a gap to the big cloud-hosted models than it does today. There is also the middle-of-the-road option that will become increasingly viable, where you have a powerful server in your home or at your workplace that can at least host a midsize model, and then you connect to that on your LAN.
I know IBM is looking at that model for the next iteration of Granite models where you can choose from a wide variety of sizes, some small enough to be run on a laptop, others of a size where a strong workstation or small server can run them or of course the biggest models for people able to invest in top of the line hardware to run their AI.

Also, the AI space is moving blazingly fast; if you are reading this 6 months from now, I am sure the capabilities of online and local models will have changed drastically already.

So to all my friends in the Linux community, I ask you to take a look at AI and what it can do, and then let’s work together on improving it, not just in terms of capabilities, but also by trying to figure out things like the societal challenges around it and the sustainability concerns I know a lot of us have.

What’s next for this code

As I mentioned, while I felt I got it to a point where I proved to myself it worked, I am not planning on working on it any more. But I did make a cute little application for internal use that shows a spinning globe with all global Red Hat offices showing up as little red lights, and which pulls in Red Hat news at the bottom. Not super useful either, but I was able to use Claude to refactor the globe rendering code from xtraceroute into this in just a few hours.

Red Hat Offices Globe and news.

Digitising CDs (aka using your phone as an image scanner)

Published: 9 months ago
Last modified: 9 months ago

From /bɑs ˈtjɛ̃ no ˈse ʁɑ/ (hadess) | News by Bastien Nocera

I recently found, under the rain, next to a book swap box, a pile of 90's “software magazines” which I spent my evenings cleaning, drying, and sorting in the days afterwards.

Magazine cover CDs with nary a magazine

Those magazines are a peculiar thing in France, using the mechanism of “Commission paritaire des publications et des agences de presse” or “Commission paritaire” for short. This structure exists to assess whether a magazine can benefit from state subsidies for the written press (whether on paper at the time, and also the internet nowadays), which include a reduced VAT charge (2.1% instead of 20%), reduced postal rates, and tax exemptions.

In the 90s, this was used by Diamond Editions[1] (a publisher related to tech shop Pearl, which French and German computer enthusiasts probably know) to publish magazines with just enough original text to qualify for those subsidies, bundled with the really interesting part, a piece of software on CD.

If you were to visit a French newsagent nowadays, you would be able to find other examples of this: magazines bundled with music CDs, DVDs or Blu-rays, or even toys or collectibles. Some publishers (including the infamous and now shuttered Éditions Atlas) will even get you a cheap kickstart to a new collection, with the first few issues (and collectibles) available at very interesting prices of a couple of euros, before making that “magazine” subscription-only, with each issue being increasingly more expensive (article from a consumer protection association).

Other publishers have followed suit.

I guess you can only imagine how much your scale model would end up costing with that business model (50 eurocent for the first part, 4.99€ for the second), although I would expect them to have given up the idea of being categorised as “written press”.

To go back to Diamond Editions, this meant the eventual birth of 3 magazines: Presqu'Offert, BestSellerGames and StratéJ. I remember my dad or me buying a few of those: an older but legit and complete version of ClarisWorks or CorelDraw, or a talkie version of a LucasArts point'n'click, was certainly a more interesting proposition than a cut-down warez version full of viruses when budget was tight.

3 of the magazines I managed to rescue from the rain

You might also be interested in the UK “covertape wars”.

Don't stress the technique

This brings us back to today. While the magazines are still waiting to be scanned, I tried to get a wee bit organised and digitise the CDs.

Some of them will have printing that covers the whole of the CD, but a fair few use the foil/aluminium backing of the CD as a blank surface, which will give you pretty bad results when scanning them with a flatbed scanner: the light source keeps moving with the sensor, and what you'll be scanning is the sensor's reflection on the CD.

My workaround for this is to use a digital camera (my phone's 24MP camera), with a white foam board behind it, so the blank parts appear more light grey. Of course, this means that you need to take the picture from an angle, and that the CD will appear as an oval instead of perfectly circular.

I tried for a while to use GIMP perspective tools, and “Multimedia” Mike Melanson's MobyCAIRO rotation and cropping tool. In the end, I settled on Darktable, which allowed me to do 4-point perspective deskewing, I just had to have those reference points.

So I came up with a simple "deskew" template, which you can print yourself, although you could probably achieve similar results with grid paper.

My janky setup

The resulting picture

After opening your photo with Darktable, and selecting the “darkroom” tab, go to the “rotate and perspective tool”, select the “manually defined rectangle” structure, and adjust the rectangle to match the centers of the 4 deskewing targets. Then click on “horizontal/vertical fit”. This will give you a squished CD, but don't worry: select the “specific” lens model and voilà.
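For those wondering what a 4-point perspective correction does under the hood, the sketch below computes the projective transform (homography) that maps four photographed target centers onto the corners of a square. This illustrates the general technique only and is not how Darktable implements its tool; the function names and sample coordinates are made up.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return 9 coefficients mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs) + [1.0]


def apply_homography(h, x, y):
    """Map one point through the transform."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Made-up target centers as seen in the photo, and where they should end up.
photo_targets = [(10, 20), (110, 30), (120, 140), (5, 130)]
square = [(0, 0), (100, 0), (100, 100), (0, 100)]
transform = homography(photo_targets, square)
```

Applying `apply_homography(transform, x, y)` to every pixel position (or rather its inverse, when resampling) straightens the picture so that the CD becomes circular again.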

Tools at the ready

Targets acquired

Straightened but squished

You can now export the processed image (I usually use PNG to avoid data loss at each step), open things up in GIMP and use the ellipse selection tool to remove the background (don't forget the center hole), the rotate tool to make the writing straight, and the crop tool to crop it to size.

And we're done!

The result of this example is available on Archive.org, with the rest of my uploads being made available on Archive.org and Abandonware-Magazines for those 90s magazines and their accompanying CDs.

[1]: Full disclosure, I wrote a couple of articles for Linux Pratique and Linux Magazine France in the early 2000s, that were edited by that same company.

Summer updates

Last modified: 9 months ago

From Herostratus’ legacy by Víctor Jáquez

Somehow I internalized that my duty as software programmer was to silently work on a piece of code as if it were a magnum opus, until it’s finished, and then release it to the world with no need of explanations, because it should speak for itself. In other words, I tend to consider my work as a form of art, and myself as an artist. But I’m not. There’s no magnum opus and there will never be one. I’m rather a craftsman, in the sense of Richard Sennett: somebody who cares about their craft, making small, quick but thoughtful and clean changes, here and there, hoping that they will be useful to someone, now and in the future. And those little efforts need to be exposed openly, in spaces such as this one and social media, as if I were a bazaar merchant.

This reflection invites me to add another task to my duties as software programmer: a periodical exposition of the work done. And this is the first attempt to forge a (monthly) discipline in that direction, not in the sense of bragging, or looking to overprice a product (in the sense of commodity fetishism), but to build bridges with those that might find those pieces of software useful.

Let’s start.

GStreamer YUV4MPEG2 encoder and decoder

We have been working lately on video encoding, and we wanted an easy way to test our work, using common samples such as those shared in Derf’s collection. They are in a file format known as YUV4MPEG2, more commonly known as y4m because of the file name extension.

YUV4MPEG2 is a simple file format designed to hold uncompressed frames of YUV video, formatted as YCbCr 4:2:0, YCbCr 4:2:2 or YCbCr 4:4:4 data for the purpose of encoding. Instead of using raw YUV streams, where the frame size and color format have to be provided out-of-band, these metadata are embedded in the file.
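To illustrate how that embedded metadata looks, here is a minimal sketch of parsing the stream header in Python. It is unrelated to the y4menc and y4mdec implementations; the function name is hypothetical, and only the most common header tags are handled (anything else, such as extension tags, is ignored).

```python
from fractions import Fraction


def parse_y4m_header(line: bytes) -> dict:
    """Parse the first line of a YUV4MPEG2 stream into a dict of parameters."""
    tokens = line.rstrip(b"\n").split(b" ")
    if tokens[0] != b"YUV4MPEG2":
        raise ValueError("not a YUV4MPEG2 stream")
    info = {}
    for token in tokens[1:]:
        if not token:
            continue
        tag = token[:1].decode("ascii")
        value = token[1:].decode("ascii")
        if tag == "W":
            info["width"] = int(value)
        elif tag == "H":
            info["height"] = int(value)
        elif tag == "F":
            numerator, denominator = value.split(":")
            info["framerate"] = Fraction(int(numerator), int(denominator))
        elif tag == "I":
            info["interlacing"] = value
        elif tag == "A":
            info["pixel_aspect"] = value  # kept as text, "0:0" means unknown
        elif tag == "C":
            info["chroma"] = value
    return info
```

After this header line, each frame is introduced by a line starting with `FRAME`, followed by the raw plane data, so a reader knows the frame size and color format without any out-of-band information.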

There were already GStreamer elements for encoding and decoding y4m streams, but y4mdec was in gst-plugins-bad while y4menc in gst-plugins-good.

Our first task was to fix and improve y4menc [!8654], then move y4mdec to gst-plugins-good [!8719], which implied rewriting the element and adding unit tests, while adding more features such as handling more color formats.

Soothe — video encoders testing framework

Heavily inspired by Fluster, a testing framework written in Python for decoder conformance, we are sketching Soothe, a script that aims to be a testing framework for video encoders, using VMAF, a perceptual video quality assessment algorithm.

GStreamer Vulkan H.264 encoder

This is the reason for the efforts described above: vulkanh264enc, an H.264 encoder using the Vulkan Video extension [!7197].

One interesting side of this task was to propose a base class for hardware accelerated H.264 encoders, based on the vah264enc, the GStreamer VA-API H.264 encoder. We talked about this base class in the GStreamer Conference 2024.

Now the H.264 encoder merged and it will be part of the future release of GStreamer 1.28.

Removal of GStreamer-VAAPI subproject

We’re very grateful to GStreamer-VAAPI. A few months after its maintenance was handed over to us, we had the privilege of merging it as an official GStreamer subproject.

Now GStreamer-VAAPI functionality has been replaced by the VA plugin in gst-plugins-bad. It isn’t a full-featured replacement yet [#3947], but it’s complete and stable enough to be widely deployed. As Tim said at the GStreamer Conference 2024: it just works.

So the GStreamer-VAAPI subproject has been removed from the main branch of the git repository [!9200], and its GitLab project has been archived.

Vulkan Video Status page

We believe that the Vulkan Video extension will be one of the main APIs for video encoding, decoding and processing. Igalia participates in the Vulkan Video Technical Sub Group (TSG) and helps with the Conformance Test Suite (CTS).

The Vulkan Video extension is big and constantly updated. To keep track of it, we maintain a web page with the latest news about the specification, proprietary drivers, open source drivers and open source applications, along with articles and talks about it.

https://vulkan-video-status.igalia.com

GStreamer Planet

Last but not least, GStreamer Planet has been updated and overhauled.

Given that the old Planet script, written in Python 2, is unmaintained, we worked on a new one in Rust: planet-rs. Internally it uses tera for templates, feed-rs for feed parsing, and reqwest for HTTP handling. The planet is generated using GitLab scheduled CI pipelines.

https://gstreamer.freedesktop.org/planet

GStreamer 1.26.4 stable bug fix release

Last modified: 10 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

adaptivedemux2: Fixed reverse playback
d3d12screencapture: Add support for monitor add/remove in device provider
rtmp2src: various fixes to make it play back AWS medialive streams
rtph265pay: add profile-id, tier-flag, and level-id to output rtp caps
vp9parse: Fix handling of spatial SVC decoding
vtenc: Fix negotiation failure with `profile=main-422-10`
gtk4paintablesink: Add YCbCr memory texture formats and other improvements
livekit: add room-timeout
mp4mux: add TAI timestamp muxing support
rtpbin2: fix various race conditions, plus other bug fixes and performance improvements
threadshare: add a `ts-rtpdtmfsrc` element, implement run-time input switching in `ts-intersrc`
webrtcsink: fix deadlock on error setting remote description and other fixes
cerbero: WiX installer: fix missing props files in the MSI packages
smaller macOS/iOS package sizes
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.4 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

GstHip: A Cross-Vendor HIP Backend for GStreamer with Runtime Flexibility

Last modified: 10 months ago

From Centricular Devlog by Seungha Yang (Centricular)

HIP (formerly known as Heterogeneous-computing Interface for Portability) is AMD’s GPU programming API that enables portable, CUDA-like development across both AMD and NVIDIA platforms.

On AMD GPUs, HIP runs natively via the ROCm stack.
On NVIDIA GPUs, HIP operates as a thin translation layer over the CUDA runtime and driver APIs.

This allows developers to maintain a single codebase that can target multiple GPU vendors with minimal effort.

Where HIP Is Used

HIP has seen adoption in AMD-focused GPU computing workflows, particularly in environments that require CUDA-like programmability. Examples include:

PyTorch ROCm backend for deep learning workloads
Select scientific applications like LAMMPS and GROMACS have experimented with HIP backends for AMD GPU support
GPU-accelerated media processing on systems that leverage AMD hardware

While HIP adoption has been more limited compared to CUDA, its reach continues to expand as support for AMD GPUs grows across a broader range of use cases.

The Challenge: Compile-Time Platform Lock-in

Despite its cross-vendor goal, HIP still has a fundamental constraint at the build level. As of HIP 6.3, HIP requires developers to statically define their target platform at compile time via macros like:

#define __HIP_PLATFORM_AMD__    // for AMD ROCm
#define __HIP_PLATFORM_NVIDIA__ // for CUDA backend

This leads to three key limitations:

You must compile separate binaries for AMD and NVIDIA
A single binary cannot support both platforms simultaneously
HIP does not support runtime backend switching natively

GstHip’s Solution

To overcome this limitation, GstHip uses runtime backend dispatch through:

dlopen() on Linux
LoadLibrary() on Windows

Instead of statically linking against a single HIP backend, GstHip loads both the ROCm HIP runtime and the CUDA driver/runtime API at runtime. This makes it possible to:

Detect available GPUs dynamically
Choose the appropriate backend per device
Even support simultaneous use of AMD and NVIDIA GPUs in the same process

Unified Wrapper API

GstHip provides a clean wrapper layer that abstracts backend-specific APIs via a consistent naming scheme:

hipError_t HipFooBar(GstHipVendor vendor, ...);

The Hip prefix (capital H) clearly distinguishes the wrapper from native hipFooBar(...) functions. The GstHipVendor enum indicates which backend to target:

GST_HIP_VENDOR_AMD
GST_HIP_VENDOR_NVIDIA

Internally, each HipFooBar(...) function dispatches to the correct backend by calling either:

hipFooBar(...) for AMD ROCm
cudaFooBar(...) for NVIDIA CUDA

These symbols are dynamically resolved via dlopen() / LoadLibrary(), enabling runtime backend selection without GPU vendor-specific builds.
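The dispatch pattern described above can be sketched in plain C as a per-vendor table of function pointers. This is a simplified illustration under stated assumptions, not GstHip's actual code: every `Demo*` and `demo_*` name is hypothetical, and the two stub functions stand in for symbols that a real implementation would resolve from the vendor libraries with dlsym() or GetProcAddress().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vendor enum, modelled on the GstHipVendor idea. */
typedef enum {
  DEMO_VENDOR_AMD,
  DEMO_VENDOR_NVIDIA,
  DEMO_VENDOR_COUNT
} DemoVendor;

typedef int (*DemoGetDeviceCountFn) (int *count);

/* One table of resolved entry points per backend. An entry stays NULL
 * when the backend library (or the symbol) was not found at runtime. */
typedef struct {
  DemoGetDeviceCountFn get_device_count;
} DemoBackend;

static DemoBackend backends[DEMO_VENDOR_COUNT];

/* Stubs standing in for the dynamically resolved vendor functions. */
static int
fake_amd_get_device_count (int *count)
{
  *count = 2;
  return 0;
}

static int
fake_nvidia_get_device_count (int *count)
{
  *count = 1;
  return 0;
}

/* In a real loader this is where dlopen()/LoadLibrary() would run. */
static void
demo_load_backends (void)
{
  backends[DEMO_VENDOR_AMD].get_device_count = fake_amd_get_device_count;
  backends[DEMO_VENDOR_NVIDIA].get_device_count = fake_nvidia_get_device_count;
}

/* Wrapper in the HipFooBar(vendor, ...) style: selects the backend at
 * runtime and returns -1 if that backend is unavailable. */
static int
DemoGetDeviceCount (DemoVendor vendor, int *count)
{
  assert (count != NULL);
  if ((int) vendor < 0 || vendor >= DEMO_VENDOR_COUNT
      || backends[vendor].get_device_count == NULL)
    return -1;
  return backends[vendor].get_device_count (count);
}
```

Because the vendor is an ordinary runtime argument, one binary can query both backends in the same process, and a missing backend degrades to an error code rather than a link failure.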

Memory Interop

All memory interop in GstHip is handled through the hipupload and hipdownload elements. While zero-copy is not supported due to backend-specific resource management and ownership ambiguity, GstHip provides optimized memory transfers between systems:

System Memory ↔ HIP Memory: Utilizes HIP pinned memory to achieve fast upload/download operations between host and device memory
GstGL ↔ GstHip: Uses HIP resource interop APIs to perform GPU-to-GPU memory copies between OpenGL and HIP memory
GstCUDA ↔ GstHip (on NVIDIA platforms): Since both sides use CUDA memory, direct GPU-to-GPU memory copies are performed using CUDA APIs.

GPU-Accelerated Filter Elements

GstHip includes GPU-accelerated filters optimized for real-time media processing:

hipconvertscale/hipconvert/hipscale: Image format conversion and image scaling
hipcompositor: Composes multiple video streams into a single video stream

These filters use the same unified dispatch system and are compatible with both AMD and NVIDIA platforms.

Application Integration Support

As of Merge Request !9340, GstHip exposes public APIs that allow applications to access HIP resources managed by GStreamer. This also enables applications to implement custom GstHip-based plugins using the same underlying infrastructure without duplicating resource management.

Summary of GstHip Advantages

Single plugin/library binary supports both AMD and NVIDIA GPUs
Compatible with Linux and Windows
Supports multi-GPU systems, including hybrid AMD + NVIDIA configurations
Seamless memory interop with System Memory, GstGL, and GstCUDA
Provides high-performance GPU filters for video processing
Maintains a clean API layer via HipFooBar(...) wrappers, enabling backend-neutral development

GStreamer 1.27.1 unstable development release

Last modified: 10 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce the first development release in the API/ABI-unstable 1.27 release series.

This development release is primarily for developers and early adopters, and distros should probably not package it.

Highlighted changes:

Add AMD HIP plugin
Add Vulkan H.264 encoder and add 10-bit support to Vulkan H.265 decoder
Add LiteRT inference element
Aggregator: expose current-level-* properties on sink pads
Analytics: add general classifier tensor-decoder, facedetector, and more convenience API
alsa: Support enumerating virtual PCM sinks
d3d12: Add d3d12remap element
Wayland: Add basic colorimetry support
Webkit: New wpe2 plugin making use of the "WPE Platform API"
MPEG-TS demuxer: Add property to disable skew corrections
qml6gloverlay: support directly passing a QQuickItem for the QML render tree
unixfdsink: Add a property to allow copying to make sink usable with more upstream elements
videorate: Revive "new-pref" property for better control in case of caps changes
wasapi2: Port to IMMDevice based device selection
GstReferenceTimestampMeta can carry additional per-timestamp information now
Added GstLogContext API that makes it possible to fix log spam in several components
New tracer hook to track when buffers are queued/dequeued in buffer pools
gst-inspect-1.0: Prints type info for caps fields now
Pipeline graph dot files now contain information about active tracers
Python bindings: add Gst.Float wrapper, Gst.ValueArray.append_value(), analytics API improvements
cerbero packages: ship vvdec and curl plugins; ship wasapi2 on MingW builds
Countless bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

New generic multi-stream analytics batcher element

Last modified: 10 months ago

From Centricular Devlog by Sebastian Dröge (Centricular)

For a recent project, it was necessary to collect video frames (and, in the future, also audio) from multiple streams during a specific interval, and to pass them through an inference framework that extracts additional metadata from the media and attaches it to the frames.

While GStreamer has gained quite a bit of infrastructure in the past years for machine learning use-cases in the analytics library, there was nothing for this specific use-case yet.

As part of solving this, I proposed a design for a generic interface that allows combining and batching multiple streams into a single one. It uses empty buffers with a GstMeta that contains the buffers of the original streams, and caps that include the caps of the original streams, so that format negotiation in the pipeline works as usual.

While this covers my specific use case of combining multiple streams, it should be generic enough to also handle other cases that came up during the discussions.

In addition I wrote two new elements, analyticscombiner and analyticssplitter, that make use of this new API for combining and batching multiple streams in a generic, media-agnostic way over specific time intervals, and later splitting it out again into the original streams. The combiner can be configured to collect all media in the time interval, or only the first or last.
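The three collection modes can be illustrated with a small C sketch that picks, from one stream's buffer timestamps, the ones that fall into a batching interval. It is not the analyticscombiner implementation: the `BatchStrategy` enum, the `batch_select` function and the half-open interval `[start, end)` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three modes: all, first only, last only. */
typedef enum { BATCH_ALL, BATCH_FIRST, BATCH_LAST } BatchStrategy;

/* Scans timestamps in arrival order and stores the indices of those in
 * [start, end) that the strategy keeps. Writes at most max_out indices
 * to out and returns how many were written. */
static size_t
batch_select (const uint64_t *pts, size_t n, uint64_t start, uint64_t end,
    BatchStrategy strategy, size_t *out, size_t max_out)
{
  size_t i, n_out = 0;

  assert (pts != NULL && out != NULL);
  for (i = 0; i < n; i++) {
    if (pts[i] < start || pts[i] >= end)
      continue;  /* outside the batching interval */

    switch (strategy) {
      case BATCH_ALL:
        if (n_out < max_out)
          out[n_out++] = i;
        break;
      case BATCH_FIRST:
        if (n_out == 0 && max_out > 0)
          out[n_out++] = i;
        break;
      case BATCH_LAST:
        if (max_out > 0) {
          out[0] = i;  /* overwritten by every later match */
          n_out = 1;
        }
        break;
    }
  }
  return n_out;
}
```

Running this once per input stream for the same interval yields the set of buffers that would travel together in one batch.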

Conceptually the combiner element is similar to NVIDIA's DeepStream nvstreammux element, and in the future it should be possible to write a translation layer between the GStreamer analytics library and DeepStream.

The basic idea for the usage of these elements is to have a pipeline like

-- stream 1 --\                                                                  / -- stream 1 with metadata --
               -- analyticscombiner -- inference elements -- analyticssplitter --
-- stream 2 --/                                                                  \ -- stream 2 with metadata --
   ........                                                                           ......................
-- stream N -/                                                                     \- stream N with metadata --

The inference elements would only add additional metadata to each of the buffers, which can then be made use of further downstream in the pipeline for operations like overlays or blurring specific areas of the frames.

In the future there are likely going to be more batching elements for specific stream types, operating on multiple or a single stream, or making use of completely different batching strategies.

Special thanks also to Olivier and Daniel who provided very useful feedback during the review of the two merge requests.

GStreamer 1.26.3 stable bug fix release

Last modified: 10 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes as well as a number of security fixes and important playback fixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

Various security fixes and playback fixes
Security fix for the H.266 video parser
Fix regression for WAV files with acid chunks
Fix high memory consumption caused by a text handling regression in uridecodebin3 and playbin3
Fix panic on late GOP in fragmented MP4 muxer
Closed caption conversion, rendering and muxing improvements
Decklink video sink preroll frame rendering and clock drift handling fixes
MPEG-TS demuxing and muxing fixes
MP4 muxer fixes for creating very large files with faststart support
New thread-sharing 1:N inter source and sink elements, and a ts-rtpdtmfsrc
New speech synthesis element around ElevenLabs API
RTP H.265 depayloader fixes and improvements, as well as TWCC and GCC congestion control fixes
Seeking improvements in DASH client for streams with gaps
WebRTC sink and source fixes and enhancements, including to LiveKit and WHIP signallers
The macOS osxvideosink now posts navigation messages
QtQML6GL video sink input event handling improvements
Overhaul detection of hardware-accelerated video codecs on Android
Video4Linux capture source fixes and support for BT.2100 PQ and 1:4:5:3 colorimetry
Vulkan buffer upload and memory handling regression fixes
Python bindings: fix various regressions introduced in 1.26.2
cerbero: fix text relocation issues on 32-bit Android and fix broken VisualStudio VC templates
packages: ship pbtypes plugin and update openssl to 3.5.0 LTS
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.3 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

The Unbearable Anger of Broken Audio

Last modified: 10 months ago

From Arun Raghavan by Arun (Arun Raghavan)

It should be surprising to absolutely nobody that the Linux audio stack is often the subject of varying levels of negative feedback, ranging from drive-by meme snark to apoplectic rage^[1].

A lot of what computers are used for today involves audiovisual media in some form or the other, and having that not work can throw a wrench in just going about our day. So it is completely understandable for a person to get frustrated when audio on their device doesn’t work (or maybe worse, stops working for no perceivable reason).

It is also then completely understandable for this person to turn up on Matrix/IRC/Gitlab and make their displeasure known to us in the PipeWire (and previously PulseAudio) community. After all, we’re the maintainers of the part of the audio stack most visible to you.

To add to this, we have two and a half decades’ worth of history in building the modern Linux desktop audio stack, which means there are historical artifacts in the stack (OSS -> ALSA -> ESD/aRTs -> PulseAudio/JACK -> PipeWire). And a lot of historical animus that apparently still needs venting.

In large centralised organisations, there is a support function whose (thankless) job it is to absorb some of that impact before passing it on to the people who are responsible for fixing the problem. In the F/OSS community, sometimes we’re lucky to have folks who step up to help users and triage issues. Usually though, it’s just maintainers managing this.

This has a number of … interesting … impacts for those of us who work in the space. For me this includes:

Developing thick skin
Trying to maintain equanimity while being screamed at
Knowing to step away from the keyboard when that doesn’t work
Repeated reminders that things do work for millions of users every day

So while the causes for the animosity are often sympathetic, this is not a recipe for a healthy community. I try to be judicious while invoking the fd.o Code of Conduct, but thick skin or not, abusive behaviour only results in a toxic community, so there are limits to that.

While I paint a picture of doom and gloom, most recent user feedback and issue reporting in the PipeWire community has been refreshingly positive. Even the trigger for this post is an issue from an extremely belligerent user (who I do sympathise with), who was quickly supplanted by someone else who has been extremely courteous in the face of what is definitely a frustrating experience.

So if I had to ask something of you, dear reader – the next time you’re angry with the maintainers of some free software you depend on, please get some of the venting out of your system in private (tell your friends how terrible we are, or go for a walk maybe), so we can have a reasonable conversation and make things better.

Thank you for reading!

[1] I’m not linking to examples, because that’s not the point of this post.

GStreamer Direct3D12 Rust bindings

Last modified: 11 months ago

From Centricular Devlog by Seungha Yang (Centricular)

With GStreamer 1.26, a new D3D12 backend GstD3D12 public library was introduced in gst-plugins-bad.

Now, with the new gstreamer-d3d12 Rust crate, Rust can finally access the Windows-native GPU features written in GStreamer in a safe and idiomatic way.

What You Get with GStreamer D3D12 Support in Rust

Pass D3D12 textures created by your Rust application directly into GStreamer pipelines without data copying
Likewise, GStreamer-generated GPU resources (such as frames decoded by D3D12 decoders) can be accessed directly in your Rust app
GstD3D12-based GStreamer elements can be written in Rust

Beyond Pipelines: General D3D12 Utility Layer

GstD3D12 is not limited to multimedia pipelines. It also acts as a convenient D3D12 runtime utility, providing:

GPU resource pooling such as command allocator and descriptor heap, to reduce overhead and improve reuse
Abstractions for creating and recycling GPU textures with consistent lifetime tracking
Command queue and fence management helpers, greatly simplifying GPU/CPU sync
A foundation for building custom GPU workflows in Rust, with or without the full GStreamer pipeline

GStreamer 1.24.13 old-stable bug fix release

Last modified: 11 months ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the now old-stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes as well as a number of security fixes and important playback fixes, and it should be safe to update from 1.24.x.

Please note that the 1.24 old-stable series is no longer actively maintained and has been superseded by the GStreamer 1.26 stable series now.

Highlighted bugfixes:

Various security fixes and playback fixes
MP4 demuxer atom parsing improvements and security fixes
H.265 decoder base class and caption inserter SPS/PPS handling fixes
Subtitle parser security fixes
Subtitle rendering and seeking fixes
Closed caption fixes
Matroska rotation tag support and v4 muxing support
Ogg seeking improvements in streaming mode
Windows plugin loading fixes
MIDI parser improvements for tempo changes
Video time code support for 119.88 fps and drop-frames-related conversion fixes
GStreamer editing services fixes for sources with non-1:1 aspect ratios
RTP session handling and RTSP server fixes
Thread-safety improvements for the Media Source Extension (MSE) library
macOS video capture improvements for external devices
Python bindings: Fix compatibility with PyGObject >= 3.52.0
cerbero: bootstrapping fixes on Windows, improved support for RHEL, and openh264 recipe update
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.13 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

GStreamer 1.26.2 stable bug fix release

Last modified: a year ago

From GStreamer News by GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.26 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes as well as a number of security fixes and important playback fixes, and it should be safe to update from 1.26.x.

Highlighted bugfixes:

Various security fixes and playback fixes
aggregator base class fixes to not produce buffers too early in live mode
AWS translate element improvements
D3D12 video decoder workarounds for crashes on NVIDIA cards on resolution changes
dav1d AV1-decoder performance improvements
fmp4mux: tfdt and composition time offset fixes, plus AC-3 / EAC-3 audio support
GStreamer editing services fixes for sources with non-1:1 aspect ratios
MIDI parser improvements for tempo changes
MP4 demuxer atom parsing improvements and security fixes
New skia-based video compositor element
Subtitle parser security fixes
Subtitle rendering and seeking fixes
Playbin3 and uridecodebin3 stability fixes
GstPlay stream selection improvements
WAV playback regression fix
GTK4 paintable sink colorimetry support and other improvements
WebRTC: allow webrtcsrc to wait for a webrtcsink producer to initiate the connection
WebRTC: new Janus Video Room WebRTC source element
vah264enc profile decision making logic fixes
Python bindings gained support for handling mini object writability (buffers, caps, etc.)
Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.26.2 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.
