Recording system audio in Electron on macOS
Introduction
This article explores two approaches to recording system audio in Electron apps on macOS. One leverages Chromium's built-in capabilities, while the other uses Apple's native Core Audio Taps API via AudioTee.js. Both work, and both have trade-offs. If you've already chosen AudioTee.js and need help shipping it, look out for a forthcoming packaging and distribution guide. If not, read on to help make up your mind.
The built-in Chromium approach
Alec Armbruster wrote an excellent article on bringing system audio loopback to Electron using Chromium's built-in capabilities. The approach leverages internal Chromium flags (MacLoopbackAudioForScreenShare and MacSckSystemAudioLoopbackOverride) to enable system audio capture without any external dependencies.
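As a rough illustration of how flags like these get set: Chromium feature flags are normally passed as a single comma-separated `enable-features` switch, which Electron exposes through `app.commandLine.appendSwitch`. The sketch below only builds the switch value. The helper name `enableFeaturesValue` is made up for this example, and whether you need one flag or both depends on your macOS and Electron versions, so treat it as a starting point rather than a recipe.

```javascript
// Feature flag names as given in the paragraph above.
const LOOPBACK_FLAGS = [
  'MacLoopbackAudioForScreenShare',
  'MacSckSystemAudioLoopbackOverride',
];

// Hypothetical helper: join flag names into the comma-separated value
// Chromium expects for --enable-features, dropping duplicates.
function enableFeaturesValue(flags) {
  return [...new Set(flags)].join(',');
}

const value = enableFeaturesValue(LOOPBACK_FLAGS);

// In an Electron main process this would be applied before the app is ready:
//   const { app } = require('electron')
//   app.commandLine.appendSwitch('enable-features', value)
console.log(value);
```

Alec's electron-audio-loopback package takes care of this wiring for you, so you'd only write it by hand if you want to avoid the dependency.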
It's clever, it's functional, and it requires zero additional binaries to ship. On Windows, it's an excellent choice. On macOS though, there are some user experience drawbacks, which we'll touch upon shortly.
The Core Audio Taps approach
The second approach uses macOS's native Core Audio Taps API, introduced in macOS 14.2, which provides low-level direct access to system audio output. You can use this API directly by writing Swift (or Objective-C) code, or you can use AudioTee.js, which wraps a pre-packaged Swift binary (unimaginatively called AudioTee) and provides a familiar Node.js EventEmitter interface.
I've written about AudioTee.js before, but in brief: it bundles a tiny pre-built universal macOS binary, spawns it as a child process, and streams PCM audio data through to Node.js (in Electron parlance, your 'main' process).
Comparing the approaches
Permissions
The Chromium approach currently requires the macOS "Screen & System Audio Recording" permission, which often requires a full app restart once granted—not a great user experience. AudioTee.js only needs "System Audio Recording Only", which is both more accurate and doesn't require an app restart. You can also customise the permission prompt message by setting NSAudioCaptureUsageDescription in your app's Info.plist.
Figure: The Chromium approach asks for screen and audio recording permissions.
Figure: After granting permission, macOS often requires a full app restart.
Figure: An example of the audio recording only dialog, with a customisable message.
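If you package with electron-builder (an assumption on my part; other packagers have their own equivalents), one way to get NSAudioCaptureUsageDescription into Info.plist is the `mac.extendInfo` option. A minimal sketch, with placeholder wording for the message:

```javascript
// Sketch of an electron-builder config fragment (assumes electron-builder).
// Entries under mac.extendInfo are merged into the packaged app's Info.plist.
const builderConfig = {
  mac: {
    extendInfo: {
      // Placeholder wording: this text appears in the system audio
      // permission prompt, so explain why your app needs the access.
      NSAudioCaptureUsageDescription:
        'This app records system audio so it can transcribe your meetings.',
    },
  },
};

console.log(JSON.stringify(builderConfig, null, 2));
```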
Screen recording warning
Due to requiring both screen and system audio recording permissions, the Chromium approach triggers a purple "screen recording in progress" indicator in Control Centre on macOS, even if you're only capturing audio. This is made worse by the fact that when hovering over the indicator, the user sees a mini picture-in-picture of what the OS thinks you're recording, which is their entire desktop. Confusing, misleading, slightly alarming.
Platform support
The Chromium approach works on both macOS and Windows. AudioTee.js is currently macOS-only (14.2+), though a Windows port using WASAPI is in the works thanks to the community.
Volume independence
Core Audio Taps captures audio pre-mixer, which means you get clean audio regardless of system volume. Turn your speakers down to zero and you'll still record sound. The Chromium approach captures post-mixer audio—at least on Windows—meaning the recording level follows your speaker volume. I can't find anywhere this behaviour is documented, but it makes sense for the original screen sharing use case, where you want others to hear what you're hearing. Whether that's a drawback depends on your needs, but for me it's not ideal.
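One way to check this on your own machine is to log the level of captured chunks while you move the volume slider: if capture really is volume-independent, the level shouldn't follow it. A minimal sketch, assuming chunks arrive as 16-bit signed little-endian PCM in a Node.js Buffer (the function name `rmsDbfs` is illustrative):

```javascript
// Root-mean-square level of 16-bit signed little-endian PCM, in dBFS.
// 0 dBFS is full scale; digital silence yields -Infinity.
function rmsDbfs(pcm) {
  const sampleCount = Math.floor(pcm.length / 2); // 2 bytes per sample
  if (sampleCount === 0) return -Infinity;

  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = pcm.readInt16LE(i * 2) / 32768; // normalise to [-1, 1)
    sumSquares += sample * sample;
  }
  // 10 * log10(mean square) is the same as 20 * log10(rms).
  return 10 * Math.log10(sumSquares / sampleCount);
}

// Example: four samples at half of full scale.
const halfScale = Buffer.alloc(8);
for (let i = 0; i < 4; i++) halfScale.writeInt16LE(16384, i * 2);
console.log(rmsDbfs(halfScale).toFixed(2), 'dBFS');
```

Play a steady tone, log the level for each chunk, and compare the readings at full and zero volume.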
macOS version compatibility
The Chromium approach has a complex version support matrix, reverse engineered through community effort, that varies wildly depending on your macOS version. Loopback capture only works reliably from macOS 13.2 onwards, and different flags are required for different macOS versions.
AudioTee's support matrix is simple: it requires macOS 14.2 (released in December 2023) or later. Note that Apple's Core Audio Taps documentation would have you believe that the API only exists from macOS 26.0 onwards, but this is untrue: that documentation previously said 14.2+, and I've been using the Core Audio Taps API successfully on 14.2+ for months.
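If you want to gate the feature at runtime, the check is a simple version comparison. A minimal sketch, assuming you have the macOS version as a dotted string (in Electron, `process.getSystemVersion()` returns one); the helper name `isAtLeastMacOS` is hypothetical:

```javascript
// Hypothetical helper: true if a macOS version string such as "14.2.1"
// is at or above the given minimum (14.2 by default).
function isAtLeastMacOS(version, minMajor = 14, minMinor = 2) {
  const [major = 0, minor = 0] = version
    .split('.')
    .map((part) => parseInt(part, 10) || 0);
  if (major !== minMajor) return major > minMajor;
  return minor >= minMinor;
}

// In Electron this might be used as:
//   const canUseTaps = isAtLeastMacOS(process.getSystemVersion())
for (const version of ['13.6', '14.1.2', '14.2', '15.0']) {
  console.log(version, isAtLeastMacOS(version));
}
```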
Packaging complexity
The Chromium approach requires no additional packaging work. You're using built-in Electron and Chromium capabilities, so there's nothing extra to bundle or configure when you ship your app.
AudioTee.js requires bundling the Swift binary with your app, configuring your packager to extract it to the right location, and setting additional entitlements for audio capture and library validation. It's not difficult, but it's extra work. I'll be writing a guide on this shortly.
General purpose
The Chromium approach is purpose-built for Chromium and Electron. It's a solution to a specific problem in a specific context—you need Chromium to record audio, and Electron to set the necessary feature flags.
AudioTee (the Swift binary) is a general-purpose CLI tool. You can spawn it from any host language—wrap it for Tauri using Rust, call it from Python, or just run it from the command line. AudioTee.js is framework-agnostic and runs anywhere Node.js runs, as long as the host environment is macOS.
Architectural differences
Main vs renderer process
The Chromium approach handles all recording in the renderer process. The main process sets up setDisplayMediaRequestHandler, but the actual MediaStream capture, audio processing, and any PCM conversion all happen in the renderer. If you need the audio in the main process, you'll need to send it over IPC.
AudioTee.js runs entirely in the main process. The Swift binary is spawned as a child process and streams PCM data directly to your main process code. No renderer involvement, no IPC overhead for the audio data itself.
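To make the "child process streaming PCM" idea a little more concrete: a pipe delivers bytes in arbitrary-sized pieces, so a wrapper typically re-frames them into fixed-size chunks before handing them on. The sketch below shows that general technique only. It is not AudioTee.js's actual implementation, and the `FrameChunker` name and the mono assumption are mine.

```javascript
// Collects arbitrary-sized byte buffers and emits fixed-size frames.
class FrameChunker {
  constructor(frameBytes, onFrame) {
    this.frameBytes = frameBytes;
    this.onFrame = onFrame;
    this.pending = Buffer.alloc(0); // bytes not yet emitted
  }

  push(bytes) {
    this.pending = Buffer.concat([this.pending, bytes]);
    while (this.pending.length >= this.frameBytes) {
      this.onFrame(this.pending.subarray(0, this.frameBytes));
      this.pending = this.pending.subarray(this.frameBytes);
    }
  }
}

// 16 kHz, mono (assumed), 16-bit audio in 20 ms frames:
// 16000 samples/s * 0.02 s * 2 bytes = 640 bytes per frame.
const frames = [];
const chunker = new FrameChunker(640, (frame) => frames.push(frame));

// With a real child process this would be fed from its stdout, e.g.
//   child.stdout.on('data', (bytes) => chunker.push(bytes))
chunker.push(Buffer.alloc(1000)); // emits one frame, keeps 360 bytes
chunker.push(Buffer.alloc(300)); // 660 pending: emits a second frame
console.log(frames.length, 'frames,', chunker.pending.length, 'bytes pending');
```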
Whether this matters depends on your architecture. If you're building a desktop app that sends audio to a remote ASR service, having everything in the main process makes more sense to me. If you're doing client-side audio processing or visualisation in the UI, the renderer-based approach might be more natural.
PCM vs MediaStream
The Chromium approach gives you a MediaStream, which is a renderer-side Web API construct. MediaStream and the Web Audio APIs needed to extract raw audio from it don't exist in the main process, so if you need raw audio, you're forced to extract it in the renderer:
- Create an AudioContext
- Load an AudioWorklet module (in my experience, always a bit of a faff in Electron—you'll either need to bundle the worklet code as an inline string and create a Blob URL, or fight with CSP and file protocol constraints)
- Set up the AudioWorklet to process audio chunks
- Send the audio data to the main process via IPC
The architectural constraint is that you must use Web Audio APIs in the renderer to extract anything useful out of the MediaStream. For more details, see Alec's electron-audio-loopback repo and his linked examples.
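The steps above can be sketched roughly as follows, using the inline-string and Blob URL route for the worklet. This is a minimal illustration rather than Alec's code: the names `PcmForwarder`, `createWorkletUrl` and `tapMediaStream` are made up, your CSP needs to allow `blob:` scripts, and `tapMediaStream` only works in a renderer, where `AudioContext` exists.

```javascript
// Worklet source kept as a string so it can be loaded from a Blob URL.
// It runs inside the AudioWorkletGlobalScope, not in this file's scope.
const workletSource = `
  class PcmForwarder extends AudioWorkletProcessor {
    process(inputs) {
      const channel = inputs[0][0]; // first channel of the first input
      // Copy before posting: the engine reuses the underlying buffer.
      if (channel) this.port.postMessage(channel.slice());
      return true; // keep the processor alive
    }
  }
  registerProcessor('pcm-forwarder', PcmForwarder);
`;

// Builds a Blob URL that audioWorklet.addModule() can load.
function createWorkletUrl(source) {
  const blob = new Blob([source], { type: 'text/javascript' });
  return URL.createObjectURL(blob);
}

// Renderer-only wiring: taps a MediaStream and hands Float32Array chunks
// to onChunk, which is where an IPC send to the main process would go.
async function tapMediaStream(stream, onChunk) {
  const context = new AudioContext();
  await context.audioWorklet.addModule(createWorkletUrl(workletSource));
  const node = new AudioWorkletNode(context, 'pcm-forwarder');
  node.port.onmessage = (event) => onChunk(event.data);
  context.createMediaStreamSource(stream).connect(node);
  return context; // caller can close() it to stop capturing
}
```

Each chunk arrives as 32-bit float samples at the AudioContext's sample rate, so any format conversion still has to happen before or after the IPC hop.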
If you need sample rate or bit depth conversion (and most ASR services prefer 16 kHz/16-bit as the sweet spot for voice quality vs bandwidth), you'll need to implement that yourself somewhere. AudioTee.js handles both for you via configuration, with no additional code required:
import { AudioTee } from 'audiotee'
const audiotee = new AudioTee({
  sampleRate: 16000,
  chunkDurationMs: 20
})
audiotee.on('data', (chunk) => {
  // chunk.data is a Buffer containing 16-bit, 16 kHz PCM audio
  // Ready to send to your ASR service
})
await audiotee.start()
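For contrast, here is roughly what "implement that yourself" can involve on the Chromium path, where Web Audio hands you 32-bit float samples. This is a deliberately naive sketch: the 48 kHz input rate is an assumption, the function names are illustrative, and a production resampler would low-pass filter before reducing the rate.

```javascript
// Naive downsampler using linear interpolation. No low-pass filter is
// applied, so content above the new Nyquist frequency will alias.
function downsample(samples, fromRate, toRate) {
  if (toRate >= fromRate) return Float32Array.from(samples);
  const ratio = fromRate / toRate;
  const out = new Float32Array(Math.floor(samples.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, samples.length - 1);
    const frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}

// Converts floats in [-1, 1] to 16-bit signed integers,
// clamping anything outside that range.
function floatTo16BitPcm(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    out[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return out;
}

// 10 ms of audio at an assumed 48 kHz becomes 160 samples at 16 kHz.
const input = new Float32Array(480).fill(0.5);
const pcm = floatTo16BitPcm(downsample(input, 48000, 16000));
console.log(pcm.length, 'samples,', pcm.byteLength, 'bytes');
```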
Making the choice
For my use case—realtime ASR applications running primarily on macOS, requiring raw PCM audio in the main process—AudioTee.js wins. I would say that because I wrote it, and try as I might, I'm never going to be completely objective on the subject. But the cleaner permissions model, no misleading screen recording warnings, and PCM out of the box make it the more suitable approach—for me.
If you're targeting Windows, the Chromium approach is compelling, especially if you're comfortable with renderer-side audio processing and don't mind (or don't need) the MediaStream-to-PCM transformation work. If support evolves to the point where audio can be requested without video, which I'm hopeful it will, the Chromium approach will look very appealing on macOS, too.
To be clear: I would love for the Chromium approach to work without the screen recording permission caveats. They don't exist on Windows, which is why I recommend the approach on that platform and why this article is specifically scoped to macOS. The trade-offs I've outlined are based on my own testing and the collective experimentation happening in Alec's Electron PR thread, where I've added my voice to the ongoing effort to explore what's possible. If the permission model improves on macOS, the Chromium approach becomes significantly more attractive—especially given it requires zero packaging overhead.
Next steps
If you've chosen AudioTee.js, the next challenge is packaging it for distribution. That's covered in detail in a forthcoming article on packaging and shipping Electron apps with AudioTee.js, which will cover bundling the binary, setting entitlements, and testing the packaged app.
If you're going with the Chromium approach, Alec's article is the best resource I've found. The Electron PR discussion he opened is also worth reading for the full context on platform support and caveats.
