Snowboy, a Customizable Hotword Detection Engine

Contact: snowboy@kitt.ai
GitHub: https://github.com/kitt-ai/snowboy
Version: 1.0.4 (2016-07-13)

Introduction

Snowboy is an embedded, real-time, always-listening yet offline, and highly customizable hotword detection engine that runs on Raspberry Pi, (Ubuntu) Linux, and Mac OS X. A hotword is a key word or phrase that a computer constantly listens for to trigger other actions. A hotword is also called a wake word or trigger word.

Common usages of hotwords include Alexa on Amazon Echo, OK Google on some Android devices, and Hey Siri on iPhones. These hotwords are used to initiate a full-fledged speech interaction interface. But hotwords can also be used in other ways, such as performing simple command & control actions.

A hacker’s way to perform hotword detection is to run full ASR (Automatic Speech Recognition) on the device and watch for specific trigger words in the ASR transcriptions. ASR consumes a lot of device and bandwidth resources, and a cloud-based solution does not protect your privacy. Snowboy was created by KITT.AI to solve these pains.

Snowboy is:

  • highly customizable: you can freely define your own magic phrase here –
    be it “open sesame”, “garage door open”, or “hello dreamhouse”; you name it.
  • always listening but protects your privacy: Snowboy does not use Internet
    and does not stream your voice anywhere.
  • light-weight and embedded: it runs on Raspberry Pi’s and consumes less than 10%
    CPU on the weakest Pi’s (single-core 700MHz ARMv6).
  • Apache licensed!

Currently Snowboy supports:

  • all versions of Raspberry Pi (with Raspbian based on Debian Jessie 8.0)
  • 64bit Mac OS X
  • 64bit Ubuntu (12.04 and 14.04)

It ships in the form of a C library with Python wrappers. It has limited support for:

  • iOS
  • Android with ARMv7 CPUs

For iOS/Android, please check out Snowboy’s GitHub page.

If you have it working with more devices, OS, or programming languages, please send a pull request.

Downloads

We provide pre-packaged Snowboy binaries and their Python wrappers for:

Or you can check out GitHub to compile a version yourself.

Quick Start

To use Snowboy, you’ll need:

  1. a supported device with microphone input
  2. corresponding decoder (downloaded above)
  3. trained model(s) from https://snowboy.kitt.ai.

Access Microphone

We use PortAudio for cross-platform audio in/out. We also use sox as a quick utility to check whether your microphone is set up correctly.

On Linux systems:

sudo apt-get install python-pyaudio python3-pyaudio sox

On Mac:

brew install portaudio sox

(If you don’t have Homebrew, you can install it here)

Finally install PortAudio’s Python bindings:

pip install pyaudio

Note

If you don’t have pip, you can install it here

Note

If you have a Permission Error from pip, you can either use sudo pip install pyaudio or change the folder owner to yourself: sudo chown $USER -R /usr/local

To check whether you can record via your microphone, open a terminal and run:

rec temp.wav

If you see an error on Raspberry Pi’s, please refer to the Running_on_Pi section.

Decoder Structures

The decoder tarball contains the following files:

├── README.md
├── _snowboydetect.so
├── demo.py
├── demo2.py
├── light.py
├── requirements.txt
├── resources
│   ├── ding.wav
│   ├── dong.wav
│   ├── common.res
│   └── snowboy.umdl
├── snowboydecoder.py
├── snowboydetect.py
└── version

_snowboydetect.so is a dynamically linked library compiled with SWIG. It depends on your system’s Python 2 library. However, all Snowboy-related libraries are statically linked into this file.

snowboydetect.py is a Python wrapper file generated by SWIG. It is not very easy to read, hence the higher-level wrapper: snowboydecoder.py.

You should already have a trained model file, let’s say snowboy.pmdl. Or simply use the universal model resources/snowboy.umdl.

Running a Demo

(This demo runs on any device, but we suggest running it on a laptop/desktop with speaker output because the demo plays a Ding sound when your hotword is triggered.)

The __main__ code of snowboydecoder.py contains a simple demo:

python demo.py snowboy.pmdl

Here snowboy.pmdl is your trained model downloaded from https://snowboy.kitt.ai. The .pmdl suffix indicates a personal model and a .umdl suffix indicates a universal model.

Then speak to your microphone and see whether snowboy detects your magic phrase.

The demo is fairly straightforward:

import snowboydecoder
import sys
import signal

interrupted = False

def signal_handler(signal, frame):
    global interrupted
    interrupted = True

def interrupt_callback():
    global interrupted
    return interrupted

if len(sys.argv) == 1:
    print("Error: need to specify model name")
    print("Usage: python demo.py your.model")
    sys.exit(-1)

model = sys.argv[1]

signal.signal(signal.SIGINT, signal_handler)

detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
print('Listening... Press Ctrl+C to exit')

detector.start(detected_callback=snowboydecoder.play_audio_file,
               interrupt_check=interrupt_callback,
               sleep_time=0.03)

detector.terminate()

The main program loops inside detector.start(). Every sleep_time=0.03 seconds, the function:

  1. checks a ring buffer filled with microphone data to see whether a hotword was detected; if so, it calls the detected_callback function.
  2. calls the interrupt_check function; if it returns True, it breaks the main loop and returns.

Here we assigned detected_callback the default snowboydecoder.play_audio_file, so that every time your hotword is heard the computer plays a ding sound.
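The polling loop described above can be sketched roughly as follows. This is a simplified illustration based on this description, not the actual snowboydecoder source; the function and parameter names are hypothetical:

```python
import time

def run_loop(get_detection, detected_callback, interrupt_check, sleep_time=0.03):
    """Simplified sketch of the detector.start() polling loop."""
    while True:
        if interrupt_check():      # break the main loop when interrupted
            return
        if get_detection() > 0:    # a positive value means a hotword index fired
            detected_callback()
        time.sleep(sleep_time)     # poll again after sleep_time seconds
```

In the real detector, `get_detection` corresponds to checking the microphone ring buffer; here it is just a stand-in callable.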

Note

Do not append () to your callback function: the correct way is to assign detected_callback=your_func instead of detected_callback=your_func(). However, what if your callback function takes parameters? Use a lambda function! Your callback would then look like: callback=lambda: callback_function(parameters).
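To make the note above concrete, here is a tiny, self-contained illustration (the handler name and its argument are hypothetical):

```python
events = []

def on_hotword(name):
    # Hypothetical parameterized handler; records which hotword fired.
    events.append(name)

# Wrong: detected_callback=on_hotword("snowboy") would call the function
# immediately and hand its return value (None) to the detector.
# Right: the lambda defers the call until a hotword is actually detected.
callback = lambda: on_hotword("snowboy")

callback()  # this is how the detector would invoke it on detection
```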

Running on Raspberry Pi

Raspberry Pi’s are excellent hardware for running Snowboy. We support all versions of the Raspberry Pi (1, 2, 3, and Zero). The supported OS is Raspbian 8.0.

Set up Audio

You’ll need a USB microphone for audio input. The on-board 3.5mm audio jack only has audio out, no audio in, so don’t bother plugging a microphone in there. We have successfully used both generic USB microphones and the PlayStation 3 Eye webcam.

Note

You can buy a PS3 Eye for $5 on Amazon. Linux has built-in kernel modules for it, but Windows PCs do not have free drivers for the Eye.

Please first follow Access_Microphone to install portaudio. Then test whether your microphone can be accessed with rec:

rec t.wav

Note

Even though USB webcams should be “plug-n-play”, we found that for some of them you have to reboot the Pi after plugging in the webcam.

If you see errors, first check whether your alsa/pulseaudio is configured properly. List the playback devices:

$ aplay -l
 **** List of PLAYBACK Hardware Devices ****
 card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
   Subdevices: 8/8
   Subdevice #0: subdevice #0
   Subdevice #1: subdevice #1
   Subdevice #2: subdevice #2
   Subdevice #3: subdevice #3
   Subdevice #4: subdevice #4
   Subdevice #5: subdevice #5
   Subdevice #6: subdevice #6
   Subdevice #7: subdevice #7
 card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 ALSA [bcm2835 IEC958/HDMI]
   Subdevices: 1/1
   Subdevice #0: subdevice #0

Here the playback device is card 0, device 0, or hw:0,0 (hw:0,1 is HDMI audio out).

Then list your recording device:

$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Camera [Vimicro USB2.0 UVC Camera], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Here the recording device is card 1, device 0, or hw:1,0.

Finally change your ~/.asoundrc file to:

pcm.!default {
  type asym
   playback.pcm {
     type plug
     slave.pcm "hw:0,0"
   }
   capture.pcm {
     type plug
     slave.pcm "hw:1,0"
   }
}

Then try rec temp.wav again. Your microphone should now be set up properly.

Now please go back to try Running_a_Demo. If that is successful, let’s try to Blink an LED light and Toggle an AC-powered Lamp with your Pi.

Note

If you see the following error:

ImportError: /usr/lib/arm-linux-gnueabihf/libstdc++.so.6:
version `GLIBCXX_3.4.20' not found (required by rpi-arm-raspbian-8.0-1.0.1/_snowboy.so)

It means that your g++ library is not up to date. You are probably still using Debian Wheezy 7.5 (check with lsb_release -a). However, we compiled the snowboy library under Raspbian based on Debian Jessie 8.0, which comes with g++-4.9. You can either upgrade your Raspbian to Jessie, follow this post to install g++-4.9 on Wheezy, or compile a version yourself from GitHub.

Note

If you cannot hear any audio from the 3.5mm audio jack, it could be that audio is streamed to the HDMI port. Follow this config to make audio output to the 3.5mm audio jack.

Toggle an AC-powered Lamp

Controlling an LED light is just a toy example; let’s control some real home appliances! In this example we use the Raspberry Pi’s GPIO output to connect and disconnect a higher-voltage AC circuit.

This can be done with the help of a relay, and luckily someone has already built one through a successful Kickstarter campaign: the IoT Relay, which can be purchased on Amazon for $15 (as of April 2016).

_images/iot-relay.jpg

The mechanism of the IoT Relay is very simple:

  • when the red wire has a high DC voltage (say, 3.3V or 12V), the top two “normally ON”
    outlets turn off and the bottom two “normally OFF” outlets turn on
  • when the red wire has no DC voltage, the top two “normally ON”
    outlets turn on and the bottom two “normally OFF” outlets turn off
  • the top two and bottom two outlets can only be controlled as two groups; there
    is no way to control each outlet individually.
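The outlet logic above can be modeled in a few lines of Python. This is only an illustration of the described behavior, not code that drives real hardware:

```python
def relay_state(control_high):
    """Model of the IoT Relay's two outlet groups: a single control
    voltage flips both groups together, in opposite directions."""
    return {
        "normally_on": not control_high,   # top two outlets
        "normally_off": control_high,      # bottom two outlets
    }
```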

Now if we connect the red wire directly to Pin 17 of a Raspberry Pi and reuse light.py or demo.py above, we can control any home appliance plugged into the IoT Relay!

The following is a little video with Snowboy running on a Raspberry Pi controlling three small LED lights on the right and a lamp on the left through the IoT relay:

Advanced Usage

Multiple Models and Callbacks

So far we have worked with only one model, which can dictate just a binary state. Wouldn’t it be nice to listen for multiple models at the same time? demo2.py shows how to do it:

import snowboydecoder
import sys
import signal

interrupted = False


def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_callback():
    global interrupted
    return interrupted

if len(sys.argv) != 3:
    print("Error: need to specify 2 model names")
    print("Usage: python demo2.py 1st.model 2nd.model")
    sys.exit(-1)

models = sys.argv[1:]

# capture SIGINT signal, e.g., Ctrl+C
signal.signal(signal.SIGINT, signal_handler)

sensitivity = [0.5]*len(models)
detector = snowboydecoder.HotwordDetector(models, sensitivity=sensitivity)
callbacks = [lambda: snowboydecoder.play_audio_file(snowboydecoder.DETECT_DING),
             lambda: snowboydecoder.play_audio_file(snowboydecoder.DETECT_DONG)]
print('Listening... Press Ctrl+C to exit')

# main loop
# make sure you have the same numbers of callbacks and models
detector.start(detected_callback=callbacks,
               interrupt_check=interrupt_callback,
               sleep_time=0.03)

detector.terminate()

Here we used two models for the decoder and provided two callback functions. If the first hotword is detected, it’ll play a Ding sound. If the second hotword is detected, it’ll play a Dong sound.

You are not limited to using only two models. You are also not limited to using only the personal or the universal models. You can give HotwordDetector a mixture of multiple personal and universal models as long as your CPU is powerful enough to process them all.
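When pairing an arbitrary number of models with callbacks, a subtle Python pitfall is worth knowing: lambdas created in a loop bind their variables late. The sketch below (with hypothetical model filenames) uses a factory function to bind each name at creation time:

```python
# Hypothetical model filenames; the pattern works for any number of models.
models = ["first.pmdl", "second.pmdl", "snowboy.umdl"]
events = []

def make_callback(name):
    # The factory binds `name` now. A bare `lambda: events.append(m)`
    # written directly in the comprehension would late-bind `m` and every
    # callback would report the *last* model.
    return lambda: events.append(name)

callbacks = [make_callback(m) for m in models]

callbacks[0]()  # simulate detecting the first hotword
callbacks[2]()  # simulate detecting the third hotword
```

The resulting callbacks list can be passed to detector.start(detected_callback=callbacks, ...) as in demo2.py, as long as it has the same length as the models list.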

FAQ

  • What’s the CPU/RAM usage?

Snowboy takes minimal CPU on modern computers. On Raspberry Pi’s with decade-old CPU chips, it takes between 5% and 10% of CPU, depending on the specific chip. In terms of memory, the PortAudio Python wrapper usually uses about 10MB of RAM, while the standalone C binary uses less than 2MB.

Name       CPU                        CPU Usage
RPi 1      single-core 700MHz ARMv6   <10%
RPi 2      quad-core 900MHz ARMv7     <5%
RPi 3      quad-core 1.2GHz ARMv8     <5%
RPi Zero   single-core 1GHz ARMv6     <5%
MacBooks   Intel Core i3/i5/i7        <1%

RAM usage: Python wrapper < 15MB; standalone C binary < 2MB.
  • What is detection sensitivity?

Detection sensitivity controls how sensitive the detection is. It is a value between 0 and 1. Increasing the sensitivity leads to a better detection rate, but also to a higher false alarm rate. It is an important parameter to experiment with in your actual application.

  • Audio format

The supported audio format is WAVE (linear PCM; 8-bit unsigned integer, 16-bit signed integer, or 32-bit signed integer). The actual audio format to be used is decided by Snowboy. See SampleRate(), NumChannels() and BitsPerSample() for the required sampling rate, number of channels, and bits per sample.
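If you want to sanity-check a recording against those required values, a small helper using Python’s standard wave module might look like this. The helper name and the 16000/1/16 values are only examples; the real required values come from the API calls mentioned above:

```python
import wave

def matches_required_format(path, sample_rate, num_channels, bits_per_sample):
    """Check a WAV file's header against required PCM parameters."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == sample_rate and
                w.getnchannels() == num_channels and
                w.getsampwidth() * 8 == bits_per_sample)
```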

  • My pmdl model works well for me, but does not work well for others

Models with the suffix pmdl are personal models, and they are only supposed to work well for the person who provided the audio samples. If you are looking for a model that works well for everyone, please use the universal model (with the suffix umdl).

  • My trained model works well on laptops but not on Pi’s

It’s due to acoustic distortion from the microphones. If you record your voice with two different microphones (one on your laptop and the other on your Pi) and then play the recordings back (play t.wav), you’ll hear that they sound very different, even though it’s the same voice!

The best way around this is to use the same recording setup for both training your model and testing your voice. If you want to use Snowboy on a Raspberry Pi, first record your voice there with rec t.wav (make sure to apt-get install sox), then upload the 3 recordings to the Snowboy website, and finally download the trained model. The upload button looks like the following:

_images/upload.png

Another trick is to play with the audio gain (see the answer regarding audio_gain below). In our experience, USB microphones on a Raspberry Pi usually have low volume, so increasing the audio gain may help.

  • The volume of my recording is too low/high

When you construct a HotwordDetector from snowboydecoder, there is an audio_gain parameter:

HotwordDetector(decoder_model,
                resource=RESOURCE_FILE,
                sensitivity=[],
                audio_gain=1)

Set audio_gain to be larger than 1 if your test recording’s volume is too low, or smaller than 1 if too high.
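Conceptually, an audio gain is just a multiplier applied to the PCM samples. The sketch below illustrates the effect on 16-bit samples; this is an assumption about what the parameter does, not snowboy’s actual implementation:

```python
def apply_gain(samples, gain):
    """Scale 16-bit signed PCM samples by `gain`, clipping to the valid
    range so loud input cannot overflow."""
    return [max(-32768, min(32767, int(s * gain))) for s in samples]
```

Note the clipping: setting audio_gain too high can distort loud input, which is why moderate values are preferable.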

  • Does Snowboy come with VAD?

Yes, Snowboy comes with Voice Activity Detection. Detecting whether there is human voice in the audio usually requires far fewer resources than detecting a hotword. Thus VAD serves as a filtering layer before hotword detection to reduce CPU usage.

  • How to use Snowboy’s VAD to detect voice and silence?

The return value of the SnowboyDetect.RunDetection() function indicates silence, voice, error, or a triggered hotword:

return value   meaning
-2             silence
-1             error
 0             voice
 1, 2, ...     index of the triggered hotword

Check out snowboydecoder.py for usages.
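A typical way to consume these return codes is a small dispatch function; the sketch below simply maps the codes from the table above to labels:

```python
def classify(result):
    """Map a SnowboyDetect.RunDetection() return code to a label."""
    if result == -2:
        return "silence"
    if result == -1:
        return "error"
    if result == 0:
        return "voice"
    return "hotword %d" % result  # positive codes are 1-based hotword indices
```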

  • Who wrote Snowboy?

The KITT.AI co-founders. Core modules of Snowboy were created by Guoguo Chen, who is also a contributor to the open-source speech recognition toolkit Kaldi and the general-purpose neural network toolkit CNTK.
