Realtime audio processing with Linux: part 2

Introduction

In the previous article we introduced basic concepts about audio processing and began to map them to ALSA (Advanced Linux Sound Architecture).

If you didn't have the chance to read the first article, you can find it here:

Now it's time to dig a bit deeper. In this article we will get familiar with ALSA using simple command line tools which come packaged with any Linux distribution, giving us sufficient understanding of ALSA concepts to write a real application.

Command line tools

There's a nifty set of tools called “alsa-utils”. The package name is for Debian-based systems; other distributions might have different names.
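
On a Debian-based system it can be installed with the usual package manager (a sketch; the exact command and package name may differ on other distributions):

$ sudo apt install alsa-utils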

Inside alsa-utils we find two utilities we are going to use:

  • aplay: this tool allows us, among other things, to stream audio data from a file to an audio sink (like a speaker)
  • arecord: this tool allows us to capture audio from an audio source (like a microphone) to a file

Know thy devices

Before we can do anything useful, we need to know what devices we have on our system.

A PC, for example, could have several sound cards; each card could manage different types of devices (for instance, a device for analog connections to the PC's headphones and another device for digital HDMI audio).

Each device might have subdevices. For example, an output device might have two subdevices to which software can stream audio, and these two subdevices would then get mixed in hardware without the need of software intervention.

Each subdevice, in turn, consists of one or more channels: a stereo device will have two channels, whereas a device for surround audio will typically have 6 channels (5 full-band channels + 1 low-frequency effects channel).

We can use aplay and arecord to query the audio configuration of our system:

$ aplay -l

**** List of PLAYBACK Hardware Devices ****
card 0: NVidia [HDA NVidia], device 3: HDMI 0 [HDMI 0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 7: HDMI 1 [HDMI 1]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 8: HDMI 2 [HDMI 2]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 9: HDMI 3 [HDMI 3]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 10: HDMI 4 [HDMI 4]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 11: HDMI 5 [HDMI 5]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: Generic [HD-Audio Generic], device 0: ALC293 Analog [ALC293 Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
        

This output was captured from my PC.

As can be seen, the PC includes two cards, an Nvidia digital audio card with 6 devices (each of which maps to an HDMI port), and a generic audio card containing an analog ALC293 chip. All devices include only one subdevice, which is the most common case.
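
The kernel exposes the same card list through procfs, by the way. On this machine the output would look roughly like the following (trimmed and illustrative, not a verbatim capture):

$ cat /proc/asound/cards
 0 [NVidia ]: HDA-Intel - HDA NVidia
 1 [Generic]: HDA-Intel - HD-Audio Generic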

Now that we know our device list we can try to stream some random audio to, for example, the analog output connected to the speaker.

$ aplay -c 2 -r 44100 -D hw:1,0,0 /dev/urandom

The parameters have the following meaning:

  • -c 2: this defines the number of audio channels, two because it's a stereo output
  • -r 44100: this defines the sampling rate; 44.1 kHz is a standard frequency for CD-quality audio
  • -D hw:1,0,0 : this defines the output sink we want to stream to. The 3 numbers represent card number, device number, and subdevice number. 1,0,0 represents the first subdevice within the ALC293 device. As most devices only have one subdevice, the subdevice number can be omitted. Hence, -D hw:1,0 is equally valid.
  • /dev/urandom is the file we want to stream, in this case a random number generator

The “hw:” prefix means we are directly addressing a hardware device; more on this later.

When we run this command we hear white noise from the speaker.
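
As a gentler alternative to raw /dev/urandom, alsa-utils also ships speaker-test, which can generate test signals such as pink noise on a chosen device (a sketch reusing our example device and rate):

$ speaker-test -D hw:1,0 -c 2 -r 44100 -t pink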

Let's now try streaming that random sound at a different sampling frequency:

$ aplay -c 2 -r 8000 -D hw:1,0,0 /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Little Endian, Rate 8000 Hz, Stereo
Warning: rate is not accurate (requested = 8000Hz, got = 44100Hz)
please, try the plug plugin
        

This happens because we are trying to play sound at a sampling frequency that the device does not support.

The same happens if we try to change the data format to an unsupported one. aplay has chosen “Signed 16 bit Little Endian” by default, which means each sample consists of two bytes, interpreted with sign (therefore with a range of -32768 to +32767), where the lower byte comes before the higher byte (Little Endian, a.k.a. Intel format).
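
We can visualize this sample layout by decoding a few random bytes as signed 16-bit values; on a little-endian machine (such as any x86 PC) od interprets them exactly as the device would:

$ head -c 16 /dev/urandom | od -A d -t d2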

If we try to choose another format we get an error:

$ aplay -c 2 -f S16_BE -D hw:1,0 /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Big Endian, Rate 8000 Hz, Stereo
aplay: set_params:1368: Sample format not available
Available formats:
- S16_LE
- S32_LE        

But how do we know in advance which formats a hardware device supports? Once again, aplay comes to the rescue:

$ aplay --dump-hw-params -D hw:1,0,0 /dev/urandom

Playing raw data '/dev/urandom' : Signed 16 bit Little Endian, Rate 8000 Hz, Mono
HW Params of device "hw:1,0,0":--------------------
ACCESS: MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT: S16_LE S32_LE
SUBFORMAT: STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 185760)
PERIOD_SIZE: [16 8192]
PERIOD_BYTES: [128 65536]
PERIODS: [2 32]
BUFFER_TIME: (166 371520)
BUFFER_SIZE: [32 16384]
BUFFER_BYTES: [128 65536]
TICK_TIME: ALL        

From this output we can see that our device supports 16- and 32-bit signed little-endian samples (S16_LE and S32_LE) at sampling rates ranging from 44.1 to 192 kHz (the square brackets denote a min-max range).
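
The remaining entries are consistent with each other: a frame is one sample per channel, so FRAME_BITS is simply SAMPLE_BITS × CHANNELS (16 × 2 = 32, or 32 × 2 = 64), and the period limits follow from size and rate. For instance, the maximum period of 8192 frames played at 44.1 kHz lasts 8192 / 44100 ≈ 185760 µs, exactly the upper bound reported for PERIOD_TIME.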

But what if we want to play a file which has been recorded at a different sampling frequency, or with a different format? When we tried playing at 8 kHz, aplay gave us this hint: “please, try the plug plugin”.

Plug is an ALSA plugin, i.e. an intermediate software layer which can be inserted into an audio processing chain. It sits on top of a real hardware device and it presents itself as a virtual ALSA device to which (or from which) we can connect as if it were a native ALSA device.

“plug” will perform all necessary format conversions between the input format you request and the formats natively supported by the hardware device itself.

Instead of specifying “hw:” we specify “plughw:” with the same card and device number, and plug will do all the magic for us. Let's try violating two parameters at once, sampling rate and data format:

$ aplay -c 2 -r 8000 -f S16_BE -D plughw:1,0,0 /dev/urandom
Playing raw data '/dev/urandom' : Signed 16 bit Big Endian, Rate 8000 Hz, Stereo        

Thus, “plug” has digested whatever we fed into it, and converted it to a format which the hardware device likes.
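
If you are curious about which conversions plug actually set up, aplay's -v (verbose) switch prints the plugin chain and the hardware parameters that were finally negotiated with the device:

$ aplay -v -c 2 -r 8000 -f S16_BE -D plughw:1,0 /dev/urandom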

This diagram shows the difference between direct play to a hardware device and mediated play through plug:

[Diagram: direct play to a hardware device vs. play mediated by the plug plugin]

“plug” is one of many “virtual” ALSA devices which don't map directly to a hardware component, but perform some sort of processing instead.

We can get a list of such devices by calling “aplay -L” (notice the uppercase L).

This is a shortened output of that command on my PC:

$ aplay -L
default
  Playback/recording through the PulseAudio sound server
null
  Discard all samples (playback) or generate zero samples (capture)
pulse
  PulseAudio Sound Server
hw:CARD=NVidia,DEV=3
  HDA NVidia, HDMI 0
  Direct hardware device without any conversions
plughw:CARD=NVidia,DEV=3
  HDA NVidia, HDMI 0
  Hardware device with all software conversions
hw:CARD=Generic,DEV=0
  HD-Audio Generic, ALC293 Analog
  Direct hardware device without any conversions
plughw:CARD=Generic,DEV=0
  HD-Audio Generic, ALC293 Analog
  Hardware device with all software conversions        

We can notice our familiar “1,0” device, here referenced by name instead of number (Generic,0), and its plug counterpart. But we can also notice a very important device: pulse (also referred to as “default” in the current configuration).

Enter pulse

So far we have been able to stream audio (albeit random) to an ALSA sink, so we should be moderately satisfied.

Our satisfaction however drops abruptly if we try to do the same thing while another application is streaming audio.

So let's fire up a YouTube video and, while it's playing, type our usual command:

$ aplay -c 2 -r 44100 -D hw:1,0,0 /dev/urandom

aplay: main:852: audio open error: Device or resource busy        

This happens because the audio output device is already busy streaming audio on behalf of another application (our web browser in this specific instance). We have seen that our audio device only has one subdevice, so it can't mix more than one audio stream at a time. What we want is a software mixing solution: instead of trying to access the hardware device directly, we stream to a software device we met when we ran “aplay -L”: pulse.

$ aplay -c 2 -r 44100 -D pulse /dev/urandom

Playing raw data '/dev/urandom' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo        

This way we not only avoid the error, but we should also hear both the YouTube video and the white noise coming from aplay.

Here is what is going on under the hood:

[Diagram: multiple applications each streaming to the pulse server, which mixes them and feeds the hardware device]

The pulse server exposes a virtual port called “pulse” which can be shared among applications. Each application is unaware that it's actually talking to the pulse server and “believes” it has its own instance of an ALSA port. Pulse takes care of mixing all sources together, and it also exposes a mixer interface which can be used by system utilities to provide audio control:

[Screenshot: a mixer application showing per-application output levels and volume controls]

In the screenshot above we can see all applications that are using the pulse mixer, monitor their output levels, and adjust their volumes.
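
The same information is available from the command line through pactl (packaged, on Debian-based systems, as pulseaudio-utils); for example, to list the streams currently playing and the available output sinks:

$ pactl list short sink-inputs
$ pactl list short sinks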

A simple loopback “application”

So far we have seen how to use aplay to play an audio stream (a random source producing white noise). Let's have a look at aplay's companion, arecord.

Unsurprisingly, arecord is very similar to aplay. Here's a simple command to record audio into a file:

$ arecord -c 2 -r 44100 -f S16_LE -D default > test.wav        

All of these parameters should be familiar by now. arecord writes its output to stdout, so we need to redirect its output to a file (test.wav in this case). If we don't, we will see a lot of unprintable characters which represent the binary stream of audio samples captured by arecord.

Once we are satisfied with the recording, we can press CTRL-C to stop arecord.
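
Alternatively, arecord's -d option stops the capture automatically after the given number of seconds, which is handy for scripted tests:

$ arecord -c 2 -r 44100 -f S16_LE -D default -d 5 > test.wav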

We can then play back the file we created:

$ aplay -c 2 -r 44100 -f S16_LE -D default test.wav        

We can now pipe the two commands together (aplay, if no file name is specified, takes its input from stdin):

$ arecord -c 2 -r 44100 -f S16_LE -D default | aplay -c 2 -r 44100 -f S16_LE -D default        

Running this command we can speak into the microphone and hear ourselves through the speaker. If you do this test, make sure you are using headphones or keep the speaker volume low, otherwise you will get a very loud howl: the sound produced by the speaker gets captured by the microphone and re-introduced into the audio loop, causing a positive feedback oscillation (the so-called Larsen effect).

One final note: the file produced by arecord is a raw dump of samples coming from the input device. It does not contain any header describing how it was recorded (number of channels, sample format, etc.); aplay, on the other hand, wouldn't be able to decode such a header anyway. That's why we need to specify the same information twice in our “loopback” command: arecord needs to know how to capture and format its output stream, and aplay needs to know what to expect.
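
As a quick sanity check on such a raw stream: at 44.1 kHz, with 2 channels of 2 bytes each, the data rate is 44100 × 2 × 2 = 176,400 bytes per second, so a one-minute recording takes a little over 10 MB.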

Final considerations

We have now acquired most of the concepts needed to write a real application of our own in C/C++. In the next article we will see how these concepts map (almost) one-to-one to actual code.
