Love the Tesla sound? xAI officially launches Grok voice API, TTS at $4.20 per million characters, outperforming ElevenLabs in recognition rate
動區 BlockTempo · 2026-04-19 03:39:41

xAI officially launched standalone Grok Speech-to-Text (STT) and Text-to-Speech (TTS) APIs this week. The same technology stack already runs in Grok Voice, Tesla vehicles, and Starlink customer support systems. The STT service is priced at $0.10 per hour for batch processing and $0.20 per hour for streaming, and supports more than 25 languages.

(Previous coverage: Grok 4.3 beta released for Heavy subscribers! Musk: True flagship version training to be completed in 5 days)
(Background: Google launches Gemini 3.1 Flash TTS: Audio tags for more vivid AI voiceovers, supporting 70+ languages, available for free on Google AI Studio)

On April 17, xAI announced the launch of standalone Grok Speech-to-Text (STT) and Text-to-Speech (TTS) APIs, giving external developers direct access to the voice infrastructure that already powers xAI products. The voice technology that lets Tesla vehicles speak and Starlink customer service respond to users is now available through an API.

According to the official documentation, the Grok STT API offers two access modes: batch processing through a REST API and low-latency real-time streaming through a WebSocket API. Batch processing costs $0.10 per hour and streaming costs $0.20 per hour. The company states that this pricing offers a significant advantage over mainstream competitors such as ElevenLabs and Deepgram.

Grok STT supports more than 25 languages and provides word-level timestamps, speaker diarization, multi-channel audio, and intelligent inverse text normalization. It targets enterprise scenarios that require high accuracy, such as meeting transcription, legal and medical records, and customer service call logs.

In entity recognition benchmarks, Grok STT posted the lowest error rate among the services compared.
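
As a rough illustration of the STT pricing quoted above, the following Python sketch estimates transcription cost from audio duration. The function name `stt_cost_usd` and the assumption that billing is strictly proportional to audio length (no minimum charge, no rounding rules) are illustrative only and are not taken from xAI's documentation.

```python
# Prices quoted in the article, in USD per hour.
STT_PRICE_PER_HOUR = {"batch": 0.10, "streaming": 0.20}


def stt_cost_usd(audio_seconds: float, mode: str = "batch") -> float:
    """Estimate STT cost for a given audio duration.

    Assumption (not from xAI's docs): cost scales linearly with
    audio length, with no minimum charge or rounding.
    """
    if mode not in STT_PRICE_PER_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * STT_PRICE_PER_HOUR[mode]


if __name__ == "__main__":
    # Example workload: 1,000 hours of recorded calls.
    hours = 1_000
    for mode in STT_PRICE_PER_HOUR:
        print(f"{mode}: ${stt_cost_usd(hours * 3600, mode):,.2f}")
```

Under these assumptions, 1,000 hours of audio would cost about $100 in batch mode and $200 in streaming mode.
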
When identifying key entities such as names, account numbers, and dates in phone calls, Grok STT recorded an error rate of 5.0%, compared with 12.0% for ElevenLabs, 13.5% for Deepgram, and 21.3% for AssemblyAI.

The Grok TTS API offers five voices: Ara (female, warm and friendly), Eve (female, lively and energetic), Leo (male, authoritative and powerful), Rex (male, confident and clear), and Sal (neutral, smooth and balanced). The API automatically detects the input language, natively supports more than 20 languages, and allows pronunciation control through BCP-47 language codes. Audio output formats include MP3, WAV, PCM (Linear16), G.711 μ-law, and G.711 A-law. The last two are common telephony codecs, which signals xAI's strategic focus on telecommunications integration.

A key feature of the TTS API is "voice tagging," which lets developers embed instructions in the text to control pauses, laughter, whispers, emphasis, speech rate, and pitch, bringing synthesized speech closer to natural human expression. TTS is priced at $4.20 per million characters.

xAI emphasized that the two APIs are not newly developed technology but the same infrastructure already running in Grok Voice, Tesla vehicle voice interactions, and Starlink customer support systems. This infrastructure debuted in late 2025 as the Grok Voice Agent API, which provides real-time conversational voice agents. It ranked first on the Big Bench Audio benchmark and achieved a time-to-first-audio of under 1 second, approximately 5 times faster than its nearest competitors.

The standalone STT and TTS endpoints effectively unbundle the components of this integrated voice pipeline, so developers can combine them according to their specific needs.
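
The error rates and TTS price quoted above can be put in perspective with simple arithmetic. The Python sketch below computes how much lower Grok STT's entity error rate is relative to each competitor and estimates TTS cost from character count. The helper names, and the assumption that every character (including whitespace) is billed, are illustrative only and not confirmed by xAI.

```python
# Figures quoted in the article.
TTS_PRICE_PER_MILLION_CHARS = 4.20  # USD
ENTITY_ERROR_RATE = {  # percent, phone-call entity benchmark
    "Grok STT": 5.0,
    "ElevenLabs": 12.0,
    "Deepgram": 13.5,
    "AssemblyAI": 21.3,
}


def tts_cost_usd(text: str) -> float:
    """Estimate TTS cost by character count.

    Assumption (not from xAI's docs): every character,
    including whitespace, is billed at the same rate.
    """
    return len(text) * TTS_PRICE_PER_MILLION_CHARS / 1_000_000


def relative_error_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which the candidate's error rate is below the baseline's."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    grok = ENTITY_ERROR_RATE["Grok STT"]
    for name, rate in ENTITY_ERROR_RATE.items():
        if name != "Grok STT":
            reduction = relative_error_reduction(rate, grok)
            print(f"vs {name}: {reduction:.0%} lower entity error rate")
    print(f"250,000 characters of TTS: ${tts_cost_usd('x' * 250_000):.2f}")
```

On the article's figures, Grok STT's entity error rate is roughly 58% lower than that of ElevenLabs, 63% lower than Deepgram's, and 77% lower than AssemblyAI's, and 250,000 characters of TTS would cost about $1.05.
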