SingleApi

Internet, programming, artificial intelligence

Offline Whisper Audio Transcription and Ollama-Voice Assistant

Posted on September 23, 2024

The WhisperLive project is a real-time transcription application built on OpenAI's Whisper model. It can transcribe live audio from a microphone as well as pre-recorded audio files.

To set up the server-side of this integration on a Mac, you’ll need to use:

from whisper_live.server import TranscriptionServer
server = TranscriptionServer()
server.run("0.0.0.0", 9090)  # listen on all interfaces, port 9090

This code snippet initializes a TranscriptionServer instance and begins running it on port 9090. For the client-side, you can utilize:

from whisper_live.client import TranscriptionClient
client = TranscriptionClient("localhost", 9090, model="base.en")
client()               # transcribe live microphone input
# client("audio.wav")  # or pass a file path to transcribe a pre-recorded file

Before you start with this integration, ensure that the necessary tools are installed on your Mac. PortAudio is available through Homebrew, while WhisperLive itself is a Python package installed from PyPI:

brew install portaudio
pip install whisper-live

Apart from the one-time model download, everything runs locally, so no internet connection is needed at transcription time. That makes this setup a good fit for projects that need real-time transcription without network access.

Using Ollama-Voice for Whisper Audio Transcription

Ollama-voice is a simple but effective combination of three tools designed to work together in offline mode, which makes it ideal for applications where internet connectivity is limited or unavailable. The system consists of:

Speech Recognition: Whisper running local models in offline mode. Whisper transcribes speech to text directly on the device, without requiring any online connection.

whisper your_audio.wav --model base --language en

This command transcribes your_audio.wav locally; replace base with whichever Whisper model you have downloaded. Note that the whisper CLI takes the audio file as a positional argument, there is no whisper run subcommand.
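The same transcription can also be driven from Python with the openai-whisper package rather than the CLI. A minimal sketch; the model name, the audio path, and the whisper_options helper are illustrative choices, not part of the Ollama-voice project:

```python
def whisper_options(language: str = "en") -> dict:
    """Options forwarded to Whisper's transcribe() call."""
    # fp16=False avoids a half-precision warning on CPU-only Macs
    return {"language": language, "fp16": False}

def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Load a local Whisper model and transcribe an audio file.

    Runs fully offline once the model has been downloaded and cached.
    """
    import whisper  # pip install openai-whisper
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **whisper_options())
    return result["text"]
```

Calling transcribe_file("your_audio.wav") returns the recognized text, which is what gets handed to the language model in the next step.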

Large Language Model: Ollama running local models in offline mode. Ollama generates the assistant's responses based on the transcribed input.

ollama run your_model_name

This starts a local Ollama model in an interactive session; replace your_model_name with the model you have pulled. Note that ollama run takes the model name directly, without a --model flag.
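In an assistant, you would normally call Ollama's local HTTP API rather than the interactive CLI. A minimal sketch, assuming Ollama is serving on its default port 11434; the model name is a placeholder:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """JSON payload for Ollama's /api/generate endpoint."""
    # stream=False returns one complete JSON object instead of a stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its reply."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # requires a running ollama server
        return json.loads(resp.read())["response"]
```

With stream set to False the server answers with a single JSON object whose response field holds the generated text, which is the string the text-to-speech step then speaks.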

Offline Text To Speech: pyttsx3 converts text into speech without needing any internet connection, so the system can produce audible responses even when fully offline.

import pyttsx3
engine = pyttsx3.init()   # auto-selects the platform driver (nsss on macOS)
engine.say('Hello, World!')
engine.runAndWait()       # blocks until playback finishes

pyttsx3.init() picks the native speech driver for the platform (nsss on macOS; sapi5 is Windows-only), and runAndWait() is required for the speech to actually play. In the full setup, this is where the text generated by the Ollama model is converted into spoken language.

By combining these three tools in offline mode, you get a complete pipeline for speech recognition, language generation, and text-to-speech conversion. This setup is particularly useful for applications that must run locally without internet access.
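How the three pieces fit together can be sketched as a single turn of the assistant loop. The function names here are placeholders to be wired up with whatever transcribe, generate, and speak implementations you use, such as the three tools above:

```python
from typing import Callable

def run_turn(audio_path: str,
             transcribe: Callable[[str], str],
             generate: Callable[[str], str],
             speak: Callable[[str], None]) -> str:
    """One assistant turn: audio file -> text -> LLM reply -> spoken output."""
    prompt = transcribe(audio_path)   # e.g. Whisper, running a local model
    reply = generate(prompt)          # e.g. a local Ollama model
    speak(reply)                      # e.g. pyttsx3
    return reply
```

Because each stage is just a callable, any component can be swapped out (a different speech recognizer, a different local LLM) without touching the loop itself.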

©2026 SingleApi | Design: Newspaperly WordPress Theme