Server-side Audio Processing in Node.js
A major benefit of writing code for the web is that you can access the multitude of APIs available in modern browsers. Unfortunately, server-side code does not enjoy that luxury, so we have to find another way. In this tutorial, we will build a simple Node.js application that uses Transformers.js for speech recognition with Whisper and, in the process, learn how to process audio on the server.
The main problem we need to solve is that the browser’s built-in audio APIs are not available in Node.js, so we can’t rely on them to decode and process audio. Instead, we will install third-party libraries to obtain the raw audio data. For this example, we will only consider .wav files, but the same principles apply to other audio formats.
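To make "raw audio data" concrete: a .wav file is usually a RIFF container holding a small format header followed by interleaved PCM samples. The sketch below is only an illustration of what a .wav loading library does for you, not the library this tutorial installs. It assumes uncompressed 16-bit integer PCM and rejects anything else; the helper names parseWav and makeWav16 are hypothetical.

```javascript
// Minimal sketch: extract raw samples from an uncompressed 16-bit PCM .wav file.
// Assumes the RIFF/WAVE layout with "fmt " and "data" chunks; other encodings
// (32-bit float, 24-bit PCM, WAVE_FORMAT_EXTENSIBLE) are rejected.
function parseWav(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = (offset) =>
    String.fromCharCode(
      view.getUint8(offset), view.getUint8(offset + 1),
      view.getUint8(offset + 2), view.getUint8(offset + 3));

  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  let format = null;
  let offset = 12; // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= view.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // RIFF fields are little-endian
    const body = offset + 8;

    if (id === 'fmt ') {
      format = {
        audioFormat: view.getUint16(body, true), // 1 = integer PCM
        channels: view.getUint16(body + 2, true),
        sampleRate: view.getUint32(body + 4, true),
        bitsPerSample: view.getUint16(body + 14, true),
      };
    } else if (id === 'data') {
      if (!format) throw new Error('"data" chunk before "fmt " chunk');
      if (format.audioFormat !== 1 || format.bitsPerSample !== 16) {
        throw new Error('Only 16-bit integer PCM is handled in this sketch');
      }
      const frameCount = Math.floor(size / (2 * format.channels));
      const channels = Array.from({ length: format.channels }, () => new Float32Array(frameCount));
      for (let i = 0; i < frameCount; i++) {
        for (let c = 0; c < format.channels; c++) {
          // Samples are interleaved: frame i holds one sample per channel.
          const s = view.getInt16(body + 2 * (i * format.channels + c), true);
          channels[c][i] = s / 32768; // scale to the range [-1, 1)
        }
      }
      return { sampleRate: format.sampleRate, channels };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No "data" chunk found');
}

// Usage: build a tiny stereo file (2 frames) in memory and parse it.
function makeWav16(sampleRate, channels, interleaved) {
  const dataSize = interleaved.length * 2;
  const buf = new ArrayBuffer(44 + dataSize);
  const v = new DataView(buf);
  const writeTag = (o, s) => [...s].forEach((ch, i) => v.setUint8(o + i, ch.charCodeAt(0)));
  writeTag(0, 'RIFF'); v.setUint32(4, 36 + dataSize, true); writeTag(8, 'WAVE');
  writeTag(12, 'fmt '); v.setUint32(16, 16, true);
  v.setUint16(20, 1, true);                         // PCM
  v.setUint16(22, channels, true);
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * channels * 2, true); // byte rate
  v.setUint16(32, channels * 2, true);              // block align
  v.setUint16(34, 16, true);                        // bits per sample
  writeTag(36, 'data'); v.setUint32(40, dataSize, true);
  interleaved.forEach((s, i) => v.setInt16(44 + 2 * i, s, true));
  return buf;
}

const wav = parseWav(makeWav16(16000, 2, [16384, -16384, 32767, 0]));
console.log(wav.sampleRate, wav.channels.length, wav.channels[0]);
```

A dedicated library also handles the many encodings this sketch refuses, which is why the tutorial relies on one rather than hand-written parsing.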
This tutorial is written as an ES module, but you can easily adapt it to use CommonJS instead. The Node.js documentation covers the differences between the two module systems.
Prerequisites:
Node.js version 18+
version 9+
Let’s start by creating a new Node.js project and installing Transformers.js.
Remember to add "type": "module" to your package.json to indicate that your project uses ECMAScript modules.
Create a new file called index.js, which will be the entry point for our application, and import the necessary modules.
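Next, let’s load an audio file and convert it to the format the pipeline expects: a single-channel Float32Array of samples at a 16 kHz sampling rate, the rate Whisper models are trained on.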
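As a hedged sketch of that conversion step, assuming the decoded audio arrives as one Float32Array per channel plus its sampling rate, the hypothetical helpers below average the channels into mono and then resample to 16 kHz. The resampler is a deliberately simple linear interpolation, swapped in for illustration; it is not what any particular audio library does internally.

```javascript
// Sketch: turn decoded channels into one mono Float32Array at 16 kHz.
const TARGET_RATE = 16000; // Whisper models operate on 16 kHz audio

// Average all channels into a single mono channel.
function downmixToMono(channels) {
  if (channels.length === 1) return Float32Array.from(channels[0]);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length;
  }
  return mono;
}

// Naive linear-interpolation resampler. It applies no anti-aliasing filter,
// so downsampling can fold high frequencies back into the output;
// a dedicated resampling library is preferable for production use.
function resampleLinear(samples, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(samples);
  const outLength = Math.round(samples.length * toRate / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, samples.length - 1);
    const frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}

function toWhisperInput(channels, sampleRate) {
  return resampleLinear(downmixToMono(channels), sampleRate, TARGET_RATE);
}

// Usage: one second of a stereo 440 Hz tone at 44.1 kHz becomes 16 kHz mono.
const rate = 44100;
const leftChannel = new Float32Array(rate).map((_, i) => Math.sin(2 * Math.PI * 440 * i / rate));
const rightChannel = leftChannel.slice();
const input = toWhisperInput([leftChannel, rightChannel], rate);
console.log(input.length); // 16000
```

Downmixing before resampling means the interpolation runs over a single channel instead of one pass per channel.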
Finally, let’s run the model and measure execution duration.
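Loading the model is the slow part of this script. In a long-running server you would typically create the pipeline once and reuse it for every request, timing only the inference call. The sketch below shows that pattern under stated assumptions: createTranscriberCache and transcribeTimed are hypothetical helpers, and the only thing assumed about Transformers.js is that a pipeline is created asynchronously and can then be called as await transcriber(audio). A fake loader stands in for the real model so the example runs offline.

```javascript
// Create the (expensive) pipeline once and share it across callers.
// `loadModel` is a stand-in for something like
// () => pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en')
function createTranscriberCache(loadModel) {
  let transcriberPromise = null;
  return function getTranscriber() {
    // Reuse the in-flight or resolved promise so concurrent callers
    // do not trigger a second model load.
    if (transcriberPromise === null) {
      transcriberPromise = loadModel().catch((err) => {
        transcriberPromise = null; // allow a retry after a failed load
        throw err;
      });
    }
    return transcriberPromise;
  };
}

// Time only the inference call, not the one-off model load.
async function transcribeTimed(getTranscriber, audio) {
  const transcriber = await getTranscriber();
  const start = performance.now(); // high-resolution timer, global in Node.js 16+
  const output = await transcriber(audio);
  const seconds = (performance.now() - start) / 1000;
  return { output, seconds };
}

// Usage with a fake loader so the sketch runs without downloading a model.
let loads = 0;
const getTranscriber = createTranscriberCache(async () => {
  loads += 1; // counts how often the "model" is actually loaded
  return async (audio) => ({ text: `received ${audio.length} samples` });
});

transcribeTimed(getTranscriber, new Float32Array(16000)).then(({ output, seconds }) => {
  console.log(`Execution duration: ${seconds} seconds`);
  console.log(output);
});
```

Caching the promise rather than the resolved pipeline means requests that arrive during the first load wait on the same download instead of starting their own.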
Before running the application, make sure you have also installed a third-party package for loading .wav files. This tutorial creates the pipeline with the Xenova/whisper-tiny.en model, but feel free to choose one of the other Whisper models instead.
You can now run the application with node index.js. Note that the first run may take a while, because the model has to be downloaded and cached. Subsequent runs use the cached model, so model loading is much faster.
You should see the measured execution duration and the transcribed text printed to the console.
That’s it! You’ve successfully created a Node.js application that uses Transformers.js for speech recognition with Whisper. You can now use it as a starting point for your own applications.