Short:        Neural speech narrator.device via Piper
Author:       simond@irrelevant.org (Simon Dick)
Uploader:     simond irrelevant org (Simon Dick)
Type:         util/sys
Version:      44.0
Architecture: m68k-amigaos
Distribution: Aminet

narrator.wyoming
================

A drop-in replacement for the Amiga narrator.device that speaks with a
modern neural voice instead of the old Paula formant synthesizer.

Rather than running local formant synthesis, it forwards text to a Piper
neural text-to-speech server over the Wyoming protocol (plain TCP) and
plays the returned PCM audio through AHI. Because it replaces
narrator.device by name in DEVS:, existing Amiga software gets neural
speech transparently - the stock Say command, and any program that opens
narrator.device, work with no modification.

It ships as a pair:

  * narrator.device    - the drop-in device (goes in DEVS:)
  * translator.library - a pass-through translator so callers deliver
                         plain English to the device (Piper wants
                         English, not the classic ARPABET phonemes)

Aimed at PiStorm / emu68k accelerated and other fast 68k Amigas, where
the accelerated CPU makes the network round-trip practical. Developed
and tested under the Amiberry emulator.

REQUIREMENTS
------------

  * AHI v4 or later, with a Unit 0 audio mode configured (the
    paula.audio driver is fine under emulation).
  * A TCP/IP stack providing bsdsocket.library (Roadshow or AmiTCP).
  * A fast 68k (PiStorm/emu68k, or roughly 68030/40MHz and up). No FPU
    required.
  * A Piper TTS server reachable on your LAN with the Wyoming protocol
    enabled (default port 10200), e.g. the wyoming-piper
    add-on/container.
  * No TLS / AmiSSL - Wyoming is plain TCP.

INSTALLATION
------------

  Copy narrator.device    to DEVS:narrator.device
  Copy translator.library to LIBS:translator.library

Create the prefs file ENV:narrator.wyoming (and ENVARC: to persist it
across reboots) with at least your server address:

  host 192.168.1.50
  port 10200

Then use speech as normal, for example:

  Say "Hello from the Amiga."

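For readers who want to generate or sanity-check the prefs file from
another machine, here is a minimal Python sketch of how a "key value"
file like ENV:narrator.wyoming could be read. It is an illustration
only, not the device's own parser (the device is built with m68k-amigaos
GCC). The function name parse_prefs is hypothetical, the defaults are the
ones listed under CONFIGURATION, and treating # or ; as a comment marker
anywhere on a line (not just at the start) is an assumption.

```python
# Illustrative sketch only: NOT the parser used by narrator.device.
# Defaults are the documented ones; keys without a documented default
# (voice, voice_male, voice_female) are simply absent until set.
DEFAULTS = {"host": "127.0.0.1", "port": 10200, "ahi_unit": 0, "split_words": 0}
INT_KEYS = {"port", "ahi_unit", "split_words"}


def parse_prefs(text):
    """Parse 'key value' lines into a dict, starting from DEFAULTS."""
    prefs = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw
        # Assumption: '#' or ';' starts a comment anywhere on the line.
        for marker in "#;":
            line = line.split(marker, 1)[0]
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) != 2:
            continue  # blank line, comment-only line, or key with no value
        key, value = parts[0], parts[1].strip()
        # Numeric keys become ints; a non-numeric value raises ValueError.
        prefs[key] = int(value) if key in INT_KEYS else value
    return prefs
```

Calling parse_prefs on the two-line example above yields a dict with
that host and port, plus the default ahi_unit and split_words values.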

CONFIGURATION (ENV:narrator.wyoming)
------------------------------------

"key value" per line; # or ; start comments. All keys are optional
except host.

  host          Piper/Wyoming server address (default 127.0.0.1)
  port          server port (default 10200)
  voice         default Piper voice name (server default)
  voice_male    voice when the caller selects MALE (Say's default sex)
  voice_female  voice when the caller selects FEMALE
  ahi_unit      ahi.device unit to play through (default 0)
  split_words   break long input into ~N-word pipelined chunks for a
                faster start (0 = off)

Voice names must exist on your Piper server. Choose a
clearly-articulating voice for voice_male - it is the everyday voice,
since Say defaults to the MALE sex.

COMPANION SOFTWARE
------------------

Anything that speaks through narrator.device benefits from the neural
voice. A good companion is speak-handler by Alexander Fritsch (Aminet:
util/sys/speak-handler), a from-scratch native replacement for the
Commodore SPEAK: handler. Mount its SPEAK: and you can pipe text
straight to neural speech:

  Type myfile.txt TO SPEAK:
  echo "Hello from the Amiga" >SPEAK:

NOTES / LIMITATIONS
-------------------

  * Latency: warm first-audio is around 0.4s on an emulated 68020 over
    a local network. The first request after the server has been idle
    is slower (~1.8s) while Piper loads the voice model; warm requests
    are fast. split_words cuts the start delay on long sentences.
  * rate and pitch from the IOSpeech request are accepted but ignored -
    the Wyoming synthesize request has no per-request rate/pitch knob
    (those are properties of the Piper voice, set server-side). volume
    and sex are honoured (sex selects the configured voice).
  * Direct ARPABET phoneme input (the classic narrator contract,
    bypassing translator.library) is silently discarded - Piper takes
    text, not phonemes. Normal English, including all-caps words, is
    always spoken.
  * CMD_READ mouth-shapes and word/syllable sync are not implemented
    (Piper returns no phoneme timing).
    CMD_READ returns ND_NoWrite, so talking-head software gets no
    animation but does not hang.

DEVELOPMENT
-----------

This project was developed with the assistance of AI tooling
(Anthropic's Claude, via Claude Code), under human direction and tested
on-target.

SOURCE
------

Full source, build instructions and design notes:

  https://github.com/sidick/narrator.wyoming

Built with the Bebbo m68k-amigaos GCC cross-toolchain.

LICENSE
-------

MIT - see the LICENSE file in the source repository.