Realtime audio processing with Linux: part 2
Introduction
In the previous article we introduced basic concepts about audio processing and began to map them to ALSA (Advanced Linux Sound Architecture).
If you didn't have the chance to read the first article, you can find it here:
Now it's time to dig a bit deeper. In this article we will get familiar with ALSA using simple command line tools which come packaged with any Linux distribution, giving us sufficient understanding of ALSA concepts to write a real application.
Command line tools
There's a nifty set of tools called “alsa-utils”. The package name is for Debian-based systems; other distributions might have different names.
Inside alsa-utils we find the two utilities we are going to use: aplay (for playback) and arecord (for recording).
Know thy devices
Before we can do anything useful, we need to know what devices we have on our system.
A PC, for example, could have several sound cards; each card can manage different types of devices (for instance, a device for analog connections to the PC's headphones and another device for digital HDMI audio).
Each device might have subdevices. For example, an output device might have two subdevices to which software can stream audio; these two streams would then get mixed in hardware without the need for software intervention.
Each subdevice, in turn, consists of one or more channels: a stereo device will have two channels, whereas a device for surround audio will typically have 6 channels (5 full-band channels plus 1 low-frequency effects channel).
We can use aplay and arecord to query the audio configuration of our system:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: NVidia [HDA NVidia], device 3: HDMI 0 [HDMI 0]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 7: HDMI 1 [HDMI 1]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 8: HDMI 2 [HDMI 2]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 9: HDMI 3 [HDMI 3]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 10: HDMI 4 [HDMI 4]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 11: HDMI 5 [HDMI 5]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: Generic [HD-Audio Generic], device 0: ALC293 Analog [ALC293 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
This output, captured from my PC, shows that the machine includes two cards: an Nvidia digital audio card with 6 devices (each of which maps to an HDMI port), and a generic audio card containing an analog ALC293 chip. All devices include only one subdevice, which is the most common case.
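As a small preview of the C API we will use in the next article, here is a minimal sketch of roughly what “aplay -l” does internally: it walks the sound cards with the ALSA control interface and lists their PCM playback devices. Error handling is kept to a bare minimum; this is an illustration, not a replacement for aplay.

/* Sketch: enumerate cards and their PCM devices, roughly what "aplay -l" does.
 * Build with: gcc list_devices.c -lasound */
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    int card = -1;

    while (snd_card_next(&card) == 0 && card >= 0) {
        char name[16];
        snd_ctl_t *ctl;

        snprintf(name, sizeof(name), "hw:%d", card);
        if (snd_ctl_open(&ctl, name, 0) < 0)
            continue;

        int device = -1;
        while (snd_ctl_pcm_next_device(ctl, &device) == 0 && device >= 0)
            printf("card %d, device %d\n", card, device);

        snd_ctl_close(ctl);
    }
    return 0;
}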
Now that we know our device list we can try to stream some random audio to, for example, the analog output connected to the speaker.
$ aplay -c 2 -r 44100 -D hw:1,0,0 /dev/urandom
The parameters have the following meaning: -c 2 selects two channels (stereo), -r 44100 sets the sampling rate to 44100 Hz, and -D hw:1,0,0 selects card 1, device 0, subdevice 0.
The “hw:” prefix means we are addressing a hardware device directly; more on this later.
When we run this command we hear white noise from the speaker.
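For the curious, this is a minimal C sketch of roughly what that aplay invocation does under the hood. The device name “hw:1,0” and the buffer size are just examples that happen to match my machine, and error handling is largely omitted.

/* Sketch: stream random samples ("white noise") to a hardware device,
 * roughly equivalent to "aplay -c 2 -r 44100 -D hw:1,0 /dev/urandom".
 * Build with: gcc noise.c -lasound */
#include <stdlib.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_pcm_t *pcm;
    short buf[2 * 441];                 /* 441 stereo frames = 10 ms at 44.1 kHz */

    if (snd_pcm_open(&pcm, "hw:1,0", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return 1;

    /* S16_LE, interleaved, 2 channels, 44100 Hz, no resampling, 100 ms latency */
    if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           2, 44100, 0, 100000) < 0)
        return 1;

    for (int i = 0; i < 1000; i++) {    /* about 10 seconds of noise */
        for (size_t j = 0; j < sizeof(buf) / sizeof(buf[0]); j++)
            buf[j] = (short)rand();     /* random samples sound like white noise */
        snd_pcm_writei(pcm, buf, 441);  /* write 441 frames */
    }

    snd_pcm_drain(pcm);
    snd_pcm_close(pcm);
    return 0;
}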
Let's now try streaming that random sound at a different sampling frequency:
$ aplay -c 2 -r 8000 -D hw:1,0,0 /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Little Endian, Rate 8000 Hz, Stereo
Warning: rate is not accurate (requested = 8000Hz, got = 44100Hz)
please, try the plug plugin
This happens because we are trying to play sound at a sampling frequency that the device does not support.
The same happens if we try to change the data format to an unsupported one. aplay has chosen “Signed 16 bit Little Endian”, which means each sample consists of two bytes, interpreted as a signed value (therefore with a range of -32768 to +32767), and the lower byte comes before the higher byte (Little Endian, a.k.a. Intel format).
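To make the byte layout concrete, this tiny snippet shows how a single S16_LE sample sits in memory; on a little-endian (Intel) machine it prints the low byte first.

/* The sample value 1000 is 0x03E8; stored as S16_LE it becomes the bytes E8 03. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t sample = 1000;                        /* valid range: -32768 .. +32767 */
    unsigned char *bytes = (unsigned char *)&sample;
    printf("%02x %02x\n", bytes[0], bytes[1]);    /* prints "e8 03" on little-endian CPUs */
    return 0;
}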
If we try to choose another format we get an error:
$ aplay -c 2 -f S16_BE -D hw:1,0 /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Big Endian, Rate 8000 Hz, Stereo
aplay: set_params:1368: Sample format not available
Available formats:
- S16_LE
- S32_LE
But how do we know in advance which formats a hardware device supports? Once again, aplay comes to the rescue:
$ aplay --dump-hw-params -D hw:1,0,0 /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Little Endian, Rate 8000 Hz, Mono
HW Params of device "hw:1,0,0":--------------------
ACCESS: MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT: S16_LE S32_LE
SUBFORMAT: STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 185760)
PERIOD_SIZE: [16 8192]
PERIOD_BYTES: [128 65536]
PERIODS: [2 32]
BUFFER_TIME: (166 371520)
BUFFER_SIZE: [32 16384]
BUFFER_BYTES: [128 65536]
TICK_TIME: ALL
From this output we can see that our device supports 16 and 32 bit signed little-endian samples (S16_LE and S32_LE) at sampling frequencies between 44.1 and 192 kHz.
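The same information can be queried programmatically through the hw_params interface. Here is a minimal sketch, again assuming the “hw:1,0” device from the examples above, that prints the supported rate and channel ranges and tests one format.

/* Sketch: query hardware capabilities, roughly what "aplay --dump-hw-params" prints.
 * Build with: gcc hwparams.c -lasound */
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_pcm_t *pcm;
    snd_pcm_hw_params_t *params;
    unsigned int rate_min, rate_max, ch_min, ch_max;

    if (snd_pcm_open(&pcm, "hw:1,0", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return 1;

    snd_pcm_hw_params_alloca(&params);
    snd_pcm_hw_params_any(pcm, params);           /* fill with the full configuration space */

    snd_pcm_hw_params_get_rate_min(params, &rate_min, NULL);
    snd_pcm_hw_params_get_rate_max(params, &rate_max, NULL);
    snd_pcm_hw_params_get_channels_min(params, &ch_min);
    snd_pcm_hw_params_get_channels_max(params, &ch_max);

    printf("rate: %u..%u Hz, channels: %u..%u\n", rate_min, rate_max, ch_min, ch_max);
    printf("S16_LE supported: %s\n",
           snd_pcm_hw_params_test_format(pcm, params, SND_PCM_FORMAT_S16_LE) == 0 ? "yes" : "no");

    snd_pcm_close(pcm);
    return 0;
}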
But what if we want to play a file which has been recorded at a different sampling frequency, or with a different format? When we tried playing at 8 kHz, aplay gave us this hint: “please, try the plug plugin”.
Plug is an ALSA plugin, i.e. an intermediate software layer which can be inserted into an audio processing chain. It sits on top of a real hardware device and presents itself as a virtual ALSA device to which (or from which) we can connect as if it were a native ALSA device.
“plug” will perform all necessary format conversions between the input format you request and the formats natively supported by the hardware device itself.
Instead of specifying “hw:” we specify “plughw:” with the same card and device number, and plug will do all the magic for us. Let's try violating two constraints at once: sampling rate and data format:
$ aplay -c 2 -r 8000 -f S16_BE -D plughw:1,0,0 /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Big Endian, Rate 8000 Hz, Stereo
Thus, “plug” has digested whatever we fed into it, and converted it to a format which the hardware device likes.
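The same trick is available from the C API: we simply open “plughw:” instead of “hw:”. A minimal sketch, again assuming card 1, device 0; the noise-generating loop would be identical to the earlier example.

/* Sketch: by opening "plughw:1,0" instead of "hw:1,0", the plug layer accepts
 * formats the card cannot handle natively (here 8 kHz, big endian) and converts
 * them on the fly. Build with: gcc plug.c -lasound */
#include <alsa/asoundlib.h>

int main(void)
{
    snd_pcm_t *pcm;

    if (snd_pcm_open(&pcm, "plughw:1,0", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return 1;

    /* soft_resample = 1 lets plug convert 8000 Hz to a rate the card supports */
    if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_BE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           2, 8000, 1, 100000) < 0)
        return 1;

    /* ... feed S16_BE frames at 8 kHz with snd_pcm_writei(), as before ... */

    snd_pcm_close(pcm);
    return 0;
}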
This diagram shows the difference between direct play to a hardware device and mediated play through a plug:
“plug” is one of many “virtual” ALSA devices which don't map directly to a hardware component, but perform some sort of processing instead.
We can get a list of such devices by calling “aplay -L” (notice the uppercase L).
This is a shortened output of that command on my PC:
$ aplay -L
default
Playback/recording through the PulseAudio sound server
null
Discard all samples (playback) or generate zero samples (capture)
pulse
PulseAudio Sound Server
hw:CARD=NVidia,DEV=3
HDA NVidia, HDMI 0
Direct hardware device without any conversions
plughw:CARD=NVidia,DEV=3
HDA NVidia, HDMI 0
Hardware device with all software conversions
hw:CARD=Generic,DEV=0
HD-Audio Generic, ALC293 Analog
Direct hardware device without any conversions
plughw:CARD=Generic,DEV=0
HD-Audio Generic, ALC293 Analog
Hardware device with all software conversions
We can notice our familiar “1,0” device, here referenced by name (Generic,0) instead of by number, and its plug counterpart. But we can also notice a very important device: pulse (also referred to as “default” in the current configuration).
Enter pulse
So far we have been able to stream audio (albeit random) to an ALSA sink, so we should be moderately satisfied.
Our satisfaction, however, drops abruptly if we try to do the same thing while another application is streaming audio.
So let's fire up a YouTube video and, while it's playing, type our usual command:
$ aplay -c 2 -r 44100 -D hw:1,0,0 /dev/urandom
aplay: main:852: audio open error: Device or resource busy
This happens because the audio output device is already busy streaming audio on behalf of another application (our web browser in this specific instance). We have seen that our audio device only has one subdevice, so it can't mix more than one audio stream at a time. What we want is a software mixing solution, so instead of trying to access the hardware device directly we stream to a software device we met when we ran “aplay -L”: pulse.
$ aplay -c 2 -r 44100 -D pulse /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
In this way not only do we avoid any errors, but we should hear both the YouTube video and the white noise coming from aplay.
Here is what is going on under the hood:
The pulse server exposes a virtual port called “pulse” which can be shared among applications. Each application is unaware that it's actually talking to the pulse server and “believes” it has its own instance of an ALSA port. Pulse takes care of mixing all the sources together, and it also exposes a mixer interface which can be used by system utilities to provide audio control:
In the screenshot above we can see all applications that are using the pulse mixer, we can monitor their output level, and adjust their volume.
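From C, by the way, the “busy” condition is just a negative error code returned by snd_pcm_open. Here is a minimal sketch of a possible fallback strategy (the device names are only examples): try the hardware device first and switch to the shared default/pulse device if it is unavailable.

/* Sketch: fall back to the shared "default" (PulseAudio) device when the
 * hardware device is already in use. Build with: gcc fallback.c -lasound */
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_pcm_t *pcm;
    int err = snd_pcm_open(&pcm, "hw:1,0", SND_PCM_STREAM_PLAYBACK, 0);

    if (err < 0) {                       /* typically -EBUSY if another app owns it */
        fprintf(stderr, "hw:1,0 unavailable (%s), falling back to default\n",
                snd_strerror(err));
        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
            return 1;
    }

    /* ... configure and stream as usual ... */

    snd_pcm_close(pcm);
    return 0;
}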
A simple loopback “application”
So far we have seen how to use aplay to play an audio stream (a random source producing white noise). Let's have a look at aplay's companion, arecord.
Unsurprisingly, arecord is very similar to aplay. Here's a simple command to record audio into a file:
$ arecord -c 2 -r 44100 -f S16_LE -D default > test.wav
All of these parameters should be familiar by now. arecord writes to stdout, so we need to redirect its output to a file (test.wav in this case). If we don't, we will see a lot of unprintable characters, which are the binary stream of audio samples captured by arecord.
Once we are satisfied with recording we can press CTRL-C to stop arecord.
We can then play back the file we created:
$ aplay -c 2 -r 44100 -f S16_LE -D default test.wav
We can now pipe the two commands together (aplay, if no file name is specified, will take its input from stdin):
$ arecord -c 2 -r 44100 -f S16_LE -D default | aplay -c 2 -r 44100 -f S16_LE -D default
Running this command we can speak into the microphone and hear ourselves through the speaker. If you do the test, make sure you are using headphones or keep the speaker volume low, otherwise you will get a very loud howl caused by the sound produced by the speaker being captured by the microphone and fed back into the audio loop, causing a positive feedback oscillation (the so-called Larsen effect).
One final note: the file produced by arecord is a raw dump of the samples coming from the input device. It does not contain any header which would allow us to understand how it was recorded (number of channels, sample format, etc.). On the other hand, aplay wouldn't be able to decode such a header either. That's why we need to specify the same information twice in our “loopback” command: arecord needs to know how to capture and how to format its output stream, and aplay needs to know what to expect.
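To close the circle, here is a minimal C sketch equivalent to the pipe above: it captures frames from the default device and writes them straight back out. There is no xrun recovery and no error handling, so take it as a starting point rather than a finished loopback application.

/* Sketch: capture from the default device and play the frames straight back,
 * the C equivalent of "arecord ... | aplay ...". Build with: gcc loopback.c -lasound */
#include <alsa/asoundlib.h>

#define FRAMES 441                      /* 10 ms at 44.1 kHz */

int main(void)
{
    snd_pcm_t *capture, *playback;
    short buf[2 * FRAMES];              /* interleaved stereo S16_LE frames */

    if (snd_pcm_open(&capture, "default", SND_PCM_STREAM_CAPTURE, 0) < 0 ||
        snd_pcm_open(&playback, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return 1;

    /* the same format on both ends, just as we had to tell arecord and aplay twice */
    snd_pcm_set_params(capture, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                       2, 44100, 1, 100000);
    snd_pcm_set_params(playback, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                       2, 44100, 1, 100000);

    for (;;) {
        snd_pcm_sframes_t n = snd_pcm_readi(capture, buf, FRAMES);
        if (n > 0)
            snd_pcm_writei(playback, buf, n);
    }
}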
Final considerations
We have now acquired most of the concepts needed to write a real application of our own in C/C++. In the next article we will see how these concepts map (almost) one-to-one to actual code.