librespeaker Documentation

librespeaker is an audio processing library that performs noise suppression, direction-of-arrival calculation, beamforming, and hotword searching. It reads the microphone stream from a Linux sound server, e.g. PulseAudio. It exposes a few APIs that let users be notified when the hotword is detected, and obtain the processed microphone data in PCM format, which can then be sent to cloud services such as Alexa for further processing.

At a high level, librespeaker has two parts:

Audio processing chain

The chain consists of several nodes linked together in a user-specified order. A node can be considered a black box which has an input and an output, and internally applies filter algorithms to the audio stream passing through it. The input / output is a block of audio data, described by a set of parameters; see respeaker::NodeParameter. Please note that some nodes accept only a particular block length, e.g. respeaker::VepAecBeamformingNode accepts only 8-millisecond blocks; this is dictated by the algorithms. Generally, users don't need to worry about these restrictions, as we have assembled the nodes into another application, named respeakerd, which is a typical server application for the ReSpeaker v2 hardware with tuned configurations.
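To make the fixed block-length constraint concrete: 8 milliseconds of 16 kHz mono 16-bit audio is 128 samples per block. The following standalone sketch illustrates the idea of a node that only accepts such blocks; the class and method names here are hypothetical and are not the actual librespeaker API.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of a chain node: a black box that accepts
// fixed-size PCM blocks and returns filtered blocks of the same size.
// The real librespeaker interface (respeaker::NodeParameter etc.) differs.
class BlockNode {
public:
    // 8 ms at 16 kHz mono -> 128 samples per block.
    static constexpr std::size_t kBlockSamples = 16000 * 8 / 1000;

    std::vector<int16_t> Process(const std::vector<int16_t>& block) {
        if (block.size() != kBlockSamples)
            throw std::invalid_argument("node accepts only 8 ms blocks");
        // A real node would run NS / AEC / beamforming here; this
        // sketch just passes the audio through unchanged.
        return block;
    }
};
```

A user of such a node would be responsible for feeding it blocks of exactly the right size, which is why librespeaker handles this internally on behalf of applications like respeakerd.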

The list of nodes:

Each node inherits from one or more interface classes:

The nodes are linked together by calling the Uplink method; please see the examples to learn how to link them up.
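The linking pattern can be pictured with a small standalone sketch. Note that this is only an illustration of the uplink idea: the ChainNode class and the Uplink signature below are invented for this sketch and may differ from librespeaker's real API.

```cpp
#include <string>

// Illustrative chain node with an Uplink-style linking call.
// Each node remembers its upstream source; audio conceptually flows
// from the source node down through every linked node.
class ChainNode {
public:
    explicit ChainNode(std::string name) : name_(std::move(name)) {}

    // Link this node's input to an upstream node's output.
    void Uplink(ChainNode* upstream) { upstream_ = upstream; }

    // Walk upstream to show the order data flows through the chain.
    std::string PathFromSource() const {
        if (!upstream_) return name_;
        return upstream_->PathFromSource() + " -> " + name_;
    }

private:
    std::string name_;
    ChainNode* upstream_ = nullptr;
};
```

For example, linking a beamformer onto a collector and a hotword node onto the beamformer yields the flow "collector -> beamformer -> hotword".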

There is a data structure shared between all the nodes of the chain: respeaker::ChainSharedData. This structure is used to pass status through the chain. The status items are:

Some items in this shared structure are used internally; the rest are exposed to users through the supervisor's methods.
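The pattern of chain-wide shared status can be sketched as follows. The field names and classes here are stand-ins invented for illustration; the real respeaker::ChainSharedData has its own members.

```cpp
#include <mutex>

// Hypothetical stand-in for a chain-wide shared status structure.
// It only illustrates the pattern: nodes write status under a lock,
// and the supervisor's methods read a consistent snapshot of it.
struct ChainState {
    bool hotword_detected = false;
    int direction_of_arrival = 0;  // degrees
};

class SharedChainData {
public:
    // A node (e.g. a hotword-detection node) updates the shared status.
    void SetHotword(bool detected, int doa_degrees) {
        std::lock_guard<std::mutex> lock(mu_);
        state_.hotword_detected = detected;
        state_.direction_of_arrival = doa_degrees;
    }

    // A supervisor method reads the status on behalf of the user.
    ChainState Snapshot() const {
        std::lock_guard<std::mutex> lock(mu_);
        return state_;
    }

private:
    mutable std::mutex mu_;
    ChainState state_;
};
```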

Supervisor - the ReSpeaker class

After creating the chain, we need to register it with a supervisor: respeaker::ReSpeaker. The supervisor exposes the methods for users to call, so users don't need to operate the nodes directly. This is done by registering the following node pointers:

With all of this done, we have a topological graph like this (an example):

+--------------+        +---------------+        +-----------+
|collector node+------->|processing node+------->|output node|
+------+-------+        +-------+-------+        +-----+-----+
       |                        |                      |
       |    +------------+      |                      |
       +--->| event node |      |                      |
       |    +-----+------+      |                      |
       |          |             |                      |
       +----------+------ supervisor ------------------+

Then we call the supervisor's Start method to start the threads and the processing, and call other methods such as DetectHotword or Listen to get the outputs. Please see respeaker::ReSpeaker for the methods it provides.
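The supervisor pattern described above can be sketched in standalone form. Everything below (the node classes, Register* methods, and return types) is invented for illustration and simplified to the extreme; see respeaker::ReSpeaker for the real API.

```cpp
#include <string>

// Minimal stand-ins for registered nodes. The real nodes process a
// live microphone stream; these just model "start" and "read output".
class CollectorNode {
public:
    bool Start() { running = true; return true; }
    bool running = false;
};

class OutputNode {
public:
    std::string Read() { return "pcm-block"; }
};

// Hypothetical supervisor sketch: it holds pointers to the registered
// nodes and exposes the user-facing calls, so the user never touches
// the nodes directly.
class Supervisor {
public:
    void RegisterCollector(CollectorNode* c) { collector_ = c; }
    void RegisterOutput(OutputNode* o) { output_ = o; }

    // Start the whole chain through the registered entry node.
    bool Start() { return collector_ && collector_->Start(); }

    // Pull processed audio from the registered output node.
    std::string Listen() { return output_ ? output_->Read() : ""; }

private:
    CollectorNode* collector_ = nullptr;
    OutputNode* output_ = nullptr;
};
```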

Next Step

Please briefly review each class's methods, and then go to the examples. To compile an example (replace EXAMPLE_NAME with the example's name; the .cc source-file extension is an assumption here):

g++ EXAMPLE_NAME.cc -o EXAMPLE_NAME -lrespeaker -lsndfile -fPIC -std=c++11 -fpermissive -I/usr/include/respeaker/ -DWEBRTC_LINUX -DWEBRTC_POSIX -DWEBRTC_NS_FLOAT -DWEBRTC_APM_DEBUG_DUMP=0 -DWEBRTC_INTELLIGIBILITY_ENHANCER=0