Server-side Audio Processing in Node.js
A major benefit of writing code for the web is that you can access the many APIs available in modern browsers. Unfortunately, server-side code is not afforded such a luxury, so we have to find another way. In this tutorial, we will build a simple Node.js application that uses Transformers.js for speech recognition with Whisper and, in the process, learn how to process audio on the server.
The main problem we need to solve is that the Web Audio API is not available in Node.js, meaning we can’t use the AudioContext class to process audio. So, we will need to install third-party libraries to obtain the raw audio data. For this example, we will only consider .wav files, but the same principles apply to other audio formats.
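To show what "obtaining the raw audio data" from a .wav file involves, here is a minimal, dependency-free sketch that uses only Node's built-in Buffer. The function name parseWav is hypothetical, and the sketch assumes uncompressed PCM data (format code 1), rejecting anything else. Libraries such as wavefile handle many more variants, so treat this as an illustration of the file layout rather than a replacement for a library.

```javascript
// Hypothetical helper: read the format fields and the raw PCM bytes from a .wav Buffer.
// Assumes a RIFF/WAVE file with uncompressed PCM (format code 1); unknown chunks are skipped.
function parseWav(buffer) {
  if (buffer.toString('ascii', 0, 4) !== 'RIFF' || buffer.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }
  let fmt = null;
  let data = null;
  let offset = 12; // the first chunk starts right after the 12-byte RIFF header
  while (offset + 8 <= buffer.length) {
    const id = buffer.toString('ascii', offset, offset + 4);
    const size = buffer.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ') {
      // fmt layout: format(2) channels(2) sampleRate(4) byteRate(4) blockAlign(2) bitsPerSample(2)
      fmt = {
        audioFormat: buffer.readUInt16LE(body),
        numChannels: buffer.readUInt16LE(body + 2),
        sampleRate: buffer.readUInt32LE(body + 4),
        bitsPerSample: buffer.readUInt16LE(body + 14),
      };
    } else if (id === 'data') {
      data = buffer.subarray(body, body + size); // raw interleaved sample bytes
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  if (!fmt || !data) throw new Error('Missing fmt or data chunk');
  if (fmt.audioFormat !== 1) throw new Error('Only uncompressed PCM is handled in this sketch');
  return { ...fmt, data };
}
```

The returned data is still a byte buffer of interleaved integer samples; it has to be decoded and converted before a model can use it.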
This tutorial will be written as an ES module, but you can easily adapt it to use CommonJS instead. For more information, see the Node.js tutorial.
Prerequisites
Getting started
Let’s start by creating a new Node.js project and installing Transformers.js via NPM:
npm init -y
npm i @xenova/transformers
Remember to add "type": "module" to your package.json to indicate that your project uses ECMAScript modules.
Next, let’s install the wavefile package, which we will use for loading .wav files:
npm i wavefile
Creating the application
Start by creating a new file called index.js, which will be the entry point for our application. Let’s also import the necessary modules:
import { pipeline } from '@xenova/transformers';
import wavefile from 'wavefile';
For this tutorial, we will use the Xenova/whisper-tiny.en model, but feel free to choose one of the other Whisper models from the BOINC AI Hub. Let’s create our pipeline with:
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
Next, we need to load an audio file and convert it to the format required by Transformers.js: a single-channel (mono) Float32Array sampled at 16 kHz, which is the sampling rate Whisper expects.
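The wavefile package can take care of bit-depth and sample-rate conversion for you. To illustrate what the conversion to a mono 16 kHz Float32Array actually involves, here is a dependency-free sketch that assumes 16-bit little-endian PCM input stored in a Node Buffer (for example, the data bytes of a .wav file). All function names are hypothetical. Averaging the channels is one simple downmixing choice, and linear interpolation is a deliberately basic resampler that applies no anti-aliasing filter, so a library resampler is preferable in practice.

```javascript
const TARGET_SAMPLE_RATE = 16000; // Whisper models expect 16 kHz audio

// Decode interleaved 16-bit little-endian PCM (a Node Buffer) into one Float32Array
// per channel, scaled to the range [-1, 1).
function pcm16ToChannels(data, numChannels) {
  const frameCount = Math.floor(data.length / (2 * numChannels));
  const channels = Array.from({ length: numChannels }, () => new Float32Array(frameCount));
  for (let i = 0; i < frameCount; i++) {
    for (let c = 0; c < numChannels; c++) {
      channels[c][i] = data.readInt16LE((i * numChannels + c) * 2) / 32768;
    }
  }
  return channels;
}

// Average all channels into a single mono channel.
function downmix(channels) {
  if (channels.length === 1) return channels[0];
  const mono = new Float32Array(channels[0].length);
  for (const ch of channels) {
    for (let i = 0; i < mono.length; i++) mono[i] += ch[i] / channels.length;
  }
  return mono;
}

// Linear-interpolation resampler (no low-pass filter, so downsampling can alias).
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return input;
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const ratio = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio; // position of this output sample in the input signal
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Full conversion: raw PCM bytes -> mono Float32Array at 16 kHz.
function toModelInput(data, numChannels, sampleRate) {
  return resampleLinear(downmix(pcm16ToChannels(data, numChannels)), sampleRate, TARGET_SAMPLE_RATE);
}
```

The result of toModelInput is the kind of array that can be passed directly to the speech-recognition pipeline.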
Finally, let’s run the model and measure its execution duration.
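One way to measure the duration is to wrap the call in a small helper built on performance.now(), which is monotonic and therefore better suited to measuring intervals than Date.now(). This is a minimal sketch assuming Node.js 16 or later, where performance is available as a global. The helper name timed is hypothetical, and so are the variable names in the usage line: with the pipeline stored in transcriber and the converted samples in audio, the call would be const { result, seconds } = await timed(() => transcriber(audio));

```javascript
// Hypothetical helper: run an async function and report its result and duration in seconds.
async function timed(fn) {
  const start = performance.now();
  const result = await fn();
  const seconds = (performance.now() - start) / 1000; // performance.now() is in milliseconds
  return { result, seconds };
}
```

Keeping the timing around the inference call only, rather than around pipeline creation, avoids mixing model download and loading time into the measurement.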
You can now run the application with node index.js. Note that the first run may take a while, because the model has to be downloaded and cached. Subsequent runs load the model from the cache, so startup is much faster.
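When you print the results, you should see the execution duration followed by an object whose text property contains the transcription of the audio file.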
That’s it! You’ve successfully created a Node.js application that uses Transformers.js for speech recognition with Whisper. You can now use this as a starting point for your own applications.