edge-tts

Safe
Speech & Transcription

|.

SKILL.md

# Edge-TTS Skill ## Overview Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation. ## Quick Start When you detect TTS intent from triggers or user request: 1. **Call the tts tool** (Clawdbot built-in) to convert text to speech 2. The tool returns a MEDIA: path 3. Clawdbot routes the audio to the current channel ```javascript // Example: Built-in tts tool usage tts("Your text to convert to speech") // Returns: MEDIA: /path/to/audio.mp3 ``` ## Trigger Detection Recognize "tts" keyword as TTS requests. The skill automatically filters out TTS-related keywords from text before conversion to avoid converting the trigger words themselves to audio. ## Advanced Customization ### Using the Node.js Scripts For more control, use the bundled scripts directly: #### TTS Converter ```bash cd scripts npm install node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3 ``` **Options:** - `--voice, -v`: Voice name (default: en-US-AriaNeural) - `--lang, -l`: Language code (e.g., en-US, es-ES) - `--format, -o`: Output format (default: audio-24khz-48kbitrate-mono-mp3) - `--pitch`: Pitch adjustment (e.g., +10%, -20%, default) - `--rate, -r`: Rate adjustment (e.g., +10%, -20%, default) - `--volume`: Volume adjustment (e.g., +0%, -10%, default) - `--save-subtitles, -s`: Save subtitles as JSON file - `--output, -f`: Output file path (default: tts_output.mp3) - `--proxy, -p`: Proxy URL (e.g., http://localhost:7890) - `--timeout`: Request timeout in milliseconds (default: 10000) - `--list-voices, -L`: List available voices #### Configuration Manager ```bash cd scripts npm install node config-manager.js --set-voice en-US-AriaNeural node config-manager.js --set-rate +10% node config-manager.js --get node config-manager.js --reset ``` ### Voice Selection Common voices (use `--list-voices` for full list): **English:** - `en-US-MichelleNeural` (female, natural, **default**) - `en-US-AriaNeural` (female, natural) - `en-US-GuyNeural` (male, natural) - `en-GB-SoniaNeural` (female, British) - `en-GB-RyanNeural` (male, British) **Other Languages:** - `es-ES-ElviraNeural` (Spanish, Spain) - `fr-FR-DeniseNeural` (French) - `de-DE-KatjaNeural` (German) - `ja-JP-NanamiNeural` (Japanese) - `zh-CN-XiaoxiaoNeural` (Chinese) - `ar-SA-ZariyahNeural` (Arabic) ### Rate Guidelines Rate values use percentage format: - `"default"`: Normal speed - `"-20%"` to `"-10%"`: Slow, clear (tutorials, stories, accessibility) - `"+10%"` to `"+20%"`: Slightly fast (summaries) - `"+30%"` to `"+50%"`: Fast (news, efficiency) ### Output Formats Choose audio quality based on use case: - `audio-24khz-48kbitrate-mono-mp3`: Standard quality (voice notes, messages) - `audio-24khz-96kbitrate-mono-mp3`: High quality (presentations, content) - `audio-48khz-96kbitrate-stereo-mp3`: Highest quality (professional audio, music) ## Resources ### scripts/tts-converter.js Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing. ### scripts/config-manager.js Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores config in `~/.tts-config.json`. ### scripts/package.json NPM package configuration with node-edge-tts dependency. ### references/node_edge_tts_guide.md Complete documentation for node-edge-tts npm package including: - Full voice list by language - Prosody options (rate, pitch, volume) - Usage examples (CLI and Module) - Subtitle generation - Output formats - Best practices and limitations ### Voice Testing Test different voices and preview audio quality at: https://tts.travisvn.com/ Refer to this when you need specific voice details or advanced features. ## Installation To use the bundled scripts: ```bash cd /home/user/clawd/skills/public/tts-skill/scripts npm install ``` This installs: - `node-edge-tts` - TTS library - `commander` - CLI argument parsing ## Workflow 1. **Detect intent**: Check for "tts" trigger or keyword in user message 2. **Choose method**: Use built-in `tts` tool for simple requests, or `scripts/tts-converter.js` for customization 3. **Generate audio**: Convert the target text (message, search results, summary) 4. **Return to user**: The tts tool returns a MEDIA: path; Clawdbot handles delivery ## Testing ### Basic Test Run the test script to verify TTS functionality: ```bash cd /home/user/clawd/skills/public/edge-tts/scripts npm test ``` This generates a test audio file and verifies the TTS service is working. ### Voice Testing Test different voices and preview audio quality at: https://tts.travisvn.com/ ### Integration Test Use the built-in `tts` tool for quick testing: ```javascript // Example: Test TTS with default settings tts("This is a test of the TTS functionality.") ``` ### Configuration Test Verify configuration persistence: ```bash cd /home/user/clawd/skills/public/edge-tts/scripts node config-manager.js --get node config-manager.js --set-voice en-US-GuyNeural node config-manager.js --get ``` ## Troubleshooting - **Test connectivity**: Run `npm test` to check if TTS service is accessible - **Check voice availability**: Use `node tts-converter.js --list-voices` to see available voices - **Verify proxy settings**: If using proxy, test with `node tts-converter.js "test" --proxy http://localhost:7890` - **Check audio output**: The test should generate `test-output.mp3` in the scripts directory ## Notes - node-edge-tts uses Microsoft Edge's online TTS service (updated, working authentication) - No API key needed (free service) - Output is MP3 format by default - Requires internet connection - Supports subtitle generation (JSON format with word-level timing) - **Temporary File Handling**: By default, audio files are saved to the system's temporary directory (`/tmp/edge-tts-temp/` on Unix, `C:\Users\<user>\AppData\Local\Temp\edge-tts-temp\` on Windows) with unique filenames (e.g., `tts_1234567890_abc123.mp3`). Files are not automatically deleted - the calling application (Clawdbot) should handle cleanup after use. You can specify a custom output path with the `--output` option if permanent storage is needed. - **TTS keyword filtering**: The skill automatically filters out TTS-related keywords (tts, TTS, text-to-speech) from text before conversion to avoid converting the trigger words themselves to audio - For repeated preferences, use `config-manager.js` to set defaults - **Default voice**: `en-US-MichelleNeural` (female, natural) - Neural voices (ending in `Neural`) provide higher quality than Standard voices

More in Speech & Transcription