Generate ultra-realistic audio from text. This endpoint supports multiple models, voices, and advanced features like word-level timestamps.

Endpoint

POST https://sonna.web.id/api/v1/audio/speech

Request Parameters

| Parameter       | Type    | Required | Description                                                |
|-----------------|---------|----------|------------------------------------------------------------|
| input           | string  | Yes      | The text to generate audio for. Max 4096 characters.       |
| voice           | string  | Yes      | The ID of the voice to use (e.g., amanda, pale-perkasa).   |
| model           | string  | No       | The model to use. Default is tts-1.                        |
| speed           | float   | No       | Speed of speech from 0.5 to 4.0. Default is 1.0.           |
| language        | string  | No       | Language hint (e.g., id, en-us). Default is en-us.         |
| word_timestamps | boolean | No       | If true, returns word-level timing data. Default is false. |
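
The parameters above can be assembled into a request roughly as follows. This is a minimal sketch using only the Python standard library, not an official client. It assumes the parameters are sent as a JSON body and that authentication uses an `Authorization: Bearer` header; neither is stated in this section. The helper names `build_payload` and `synthesize` are illustrative, and rejecting empty input is an extra local check rather than a documented rule.

```python
import json
import urllib.request

API_URL = "https://sonna.web.id/api/v1/audio/speech"
MAX_INPUT_CHARS = 4096  # documented limit for `input`


def build_payload(text, voice, model="tts-1", speed=1.0,
                  language="en-us", word_timestamps=False):
    """Check arguments against the documented limits and return the request body."""
    # Rejecting empty text is an assumption; only the 4096 maximum is documented.
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    if not 0.5 <= speed <= 4.0:
        raise ValueError("speed must be between 0.5 and 4.0")
    return {
        "input": text,
        "voice": voice,
        "model": model,
        "speed": speed,
        "language": language,
        "word_timestamps": word_timestamps,
    }


def synthesize(api_key, **params):
    """POST the payload and return the decoded JSON response."""
    body = json.dumps(build_payload(**params)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your account documentation.
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `synthesize(api_key, text="Hello World", voice="amanda")` would then return the JSON object described in the Response section. Validating locally before sending avoids spending a round trip on a request that breaks the documented limits.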

Response

The API returns a JSON object whose url field links to the generated audio file, hosted on our CDN.

Standard Response

{
  "url": "https://cdn.sonna.web.id/api-generated/user_id/audio_id.mp3",
  "cache": "MISS"
}

Response with Timestamps

If word_timestamps is set to true, the response also includes a word_timestamps array:

{
  "url": "https://cdn.sonna.web.id/api-generated/user_id/audio_id.mp3",
  "word_timestamps": [
    { "word": "Hello", "start": 0.1, "end": 0.4 },
    { "word": "World", "start": 0.5, "end": 0.9 }
  ],
  "cache": "MISS"
}
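
One common use of the timing data is caption generation. The sketch below turns a `word_timestamps` array into SubRip (SRT) cues. It assumes `start` and `end` are offsets in seconds from the beginning of the audio, which the example values suggest but this section does not state. The function names and the `words_per_cue` grouping are illustrative choices.

```python
def format_srt_time(seconds):
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(word_timestamps, words_per_cue=5):
    """Group consecutive words into cues and return the SRT text."""
    cues = []
    for i in range(0, len(word_timestamps), words_per_cue):
        group = word_timestamps[i:i + words_per_cue]
        text = " ".join(item["word"] for item in group)
        # A cue spans from its first word's start to its last word's end.
        start = format_srt_time(group[0]["start"])
        end = format_srt_time(group[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


response = {
    "url": "https://cdn.sonna.web.id/api-generated/user_id/audio_id.mp3",
    "word_timestamps": [
        {"word": "Hello", "start": 0.1, "end": 0.4},
        {"word": "World", "start": 0.5, "end": 0.9},
    ],
    "cache": "MISS",
}
print(words_to_srt(response["word_timestamps"]))
```

For the sample response this produces a single cue running from 00:00:00,100 to 00:00:00,900 with the text "Hello World".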

Credits Consumption

Each character in the input (including spaces) consumes 1 character credit. If you have insufficient credits, the API returns a 402 Payment Required error.
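
The credit rule can be checked on the client before a request is sent. The sketch below estimates the cost of an input and maps a 402 status to a dedicated exception. It assumes the API counts characters the way Python's `len` does (Unicode code points), which this section does not specify. `InsufficientCreditsError`, `credits_needed`, and `raise_for_credits` are hypothetical names, not part of the API.

```python
class InsufficientCreditsError(Exception):
    """Raised when the API answers 402 Payment Required."""


def credits_needed(text):
    """Estimate the cost of a request: one credit per character, spaces included."""
    # Assumption: the server counts code points, matching len().
    return len(text)


def raise_for_credits(status_code, text):
    """Raise InsufficientCreditsError if the HTTP status code is 402."""
    if status_code == 402:
        raise InsufficientCreditsError(
            f"request needs {credits_needed(text)} credits, "
            "but the account balance is too low"
        )


print(credits_needed("Hello World"))  # 11: ten letters plus one space
```

Calling `raise_for_credits` with the status code of a failed request lets the caller tell a credit shortfall apart from other errors and report how many credits the input would have consumed.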