SingleApi

Internet, programming, artificial intelligence

Offline Whisper Audio Transcription and Ollama-Voice Assistant

Posted on September 23, 2024

The WhisperLive project is a real-time transcription application built on OpenAI's Whisper model. It can transcribe live audio from a microphone as well as pre-recorded audio files.

To set up the server-side of this integration on a Mac, you’ll need to use:

from whisper_live.server import TranscriptionServer
server = TranscriptionServer()
server.run("0.0.0.0", 9090)  # listen on all interfaces, port 9090

This code snippet initializes a TranscriptionServer instance and begins running it on port 9090. For the client-side, you can utilize:

from whisper_live.client import TranscriptionClient
client = TranscriptionClient("localhost", 9090, model="base.en")
client()               # transcribe live microphone input
# client("audio.wav")  # or pass a file path to transcribe a pre-recorded file

Before you start with this integration, ensure that the necessary tools are installed on your Mac. PortAudio is available through Homebrew, while WhisperLive itself is a Python package installed from PyPI:

brew install portaudio
pip install whisper-live

Apart from the one-time model download, everything runs locally, so no internet connection is needed at transcription time. That makes this setup a good fit for projects that need real-time transcription without network access.

Using Ollama-Voice for Whisper Audio Transcription

Ollama-voice is a simple but effective combination of three tools designed to work together in offline mode, which makes it ideal for applications where internet connectivity is limited or unavailable. The system consists of:

Speech Recognition: Whisper running local models in offline mode. Whisper transcribes speech to text directly on the device, without requiring any online connection.

whisper your_audio.wav --model base --language en

This command transcribes your_audio.wav locally; replace base with whichever Whisper model you have downloaded. Note that the whisper CLI takes the audio file as a positional argument, there is no whisper run subcommand.
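The same transcription can also be driven from Python with the openai-whisper package rather than the CLI. A minimal sketch; the model name, the audio path, and the whisper_options helper are illustrative choices, not part of the Ollama-voice project:

```python
def whisper_options(language: str = "en") -> dict:
    """Options forwarded to Whisper's transcribe() call."""
    # fp16=False avoids a half-precision warning on CPU-only Macs
    return {"language": language, "fp16": False}

def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Load a local Whisper model and transcribe an audio file.

    Runs fully offline once the model has been downloaded and cached.
    """
    import whisper  # pip install openai-whisper
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **whisper_options())
    return result["text"]
```

Calling transcribe_file("your_audio.wav") returns the recognized text, which is what gets handed to the language model in the next step.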

Large Language Model: Ollama running local models in offline mode. Ollama generates the assistant's responses based on the transcribed input.

ollama run your_model_name

This starts a local Ollama model in an interactive session; replace your_model_name with the model you have pulled. Note that ollama run takes the model name directly, without a --model flag.
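In an assistant, you would normally call Ollama's local HTTP API rather than the interactive CLI. A minimal sketch, assuming Ollama is serving on its default port 11434; the model name is a placeholder:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """JSON payload for Ollama's /api/generate endpoint."""
    # stream=False returns one complete JSON object instead of a stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its reply."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # requires a running ollama server
        return json.loads(resp.read())["response"]
```

With stream set to False the server answers with a single JSON object whose response field holds the generated text, which is the string the text-to-speech step then speaks.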

Offline Text To Speech: pyttsx3 converts text into speech without needing any internet connection, so the system can produce audible responses even when fully offline.

import pyttsx3
engine = pyttsx3.init()   # auto-selects the platform driver (nsss on macOS)
engine.say('Hello, World!')
engine.runAndWait()       # blocks until playback finishes

pyttsx3.init() picks the native speech driver for the platform (nsss on macOS; sapi5 is Windows-only), and runAndWait() is required for the speech to actually play. In the full setup, this is where the text generated by the Ollama model is converted into spoken language.

By combining these three tools in offline mode, you get a complete pipeline for speech recognition, language generation, and text-to-speech conversion. This setup is particularly useful for applications that must run locally without internet access.
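How the three pieces fit together can be sketched as a single turn of the assistant loop. The function names here are placeholders to be wired up with whatever transcribe, generate, and speak implementations you use, such as the three tools above:

```python
from typing import Callable

def run_turn(audio_path: str,
             transcribe: Callable[[str], str],
             generate: Callable[[str], str],
             speak: Callable[[str], None]) -> str:
    """One assistant turn: audio file -> text -> LLM reply -> spoken output."""
    prompt = transcribe(audio_path)   # e.g. Whisper, running a local model
    reply = generate(prompt)          # e.g. a local Ollama model
    speak(reply)                      # e.g. pyttsx3
    return reply
```

Because each stage is just a callable, any component can be swapped out (a different speech recognizer, a different local LLM) without touching the loop itself.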

©2026 SingleApi | Design: Newspaperly WordPress Theme