API Documentation


Generate natural-sounding speech audio from text. This endpoint supports multiple models and voices, plus optional word-level timestamps.

Endpoint

POST https://sonna.web.id/api/v1/audio/speech

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `input` | string | Yes | The text to generate audio for. Max 4096 characters. |
| `voice` | string | Yes | The ID of the voice to use (e.g., `amanda`, `pale-perkasa`). |
| `model` | string | No | The model to use. Default is `tts-1`. |
| `speed` | float | No | Speed of speech from `0.5` to `4.0`. Default is `1.0`. |
| `language` | string | No | Language hint (e.g., `id`, `en-us`). Default is `en-us`. |
| `word_timestamps` | boolean | No | If `true`, returns word-level timing data. Default is `false`. |
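
The parameters above can be assembled into a request as in the following minimal Python sketch. It is illustrative only: it assumes the endpoint accepts a JSON body and a bearer token in the `Authorization` header, neither of which is stated in this section, and `build_payload` and `synthesize` are hypothetical helper names, not part of any official client.

```python
import json
import urllib.request

API_URL = "https://sonna.web.id/api/v1/audio/speech"
MAX_INPUT_CHARS = 4096  # documented limit for `input`


def build_payload(text, voice, model="tts-1", speed=1.0,
                  language="en-us", word_timestamps=False):
    """Check arguments against the documented limits and return the request body."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if not 0.5 <= speed <= 4.0:
        raise ValueError("speed must be between 0.5 and 4.0")
    return {
        "input": text,
        "voice": voice,
        "model": model,
        "speed": speed,
        "language": language,
        "word_timestamps": word_timestamps,
    }


def synthesize(payload, api_key):
    """POST the payload and return the decoded JSON response.

    Assumption: JSON request body and Bearer-token auth; adjust to the real scheme.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed header format
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Under these assumptions, `build_payload("Hello World", voice="amanda", word_timestamps=True)` produces a body that `synthesize` can send. Validating locally avoids spending a round trip on a request that exceeds the documented limits.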

Response

The API returns a JSON object containing a URL to the generated audio file, which is hosted on our CDN.

Standard Response

json

{
  "url": "https://cdn.sonna.web.id/api-generated/user_id/audio_id.mp3",
  "cache": "MISS"
}

Response with Timestamps

If `word_timestamps` is set to `true`:

json

{
  "url": "https://cdn.sonna.web.id/api-generated/user_id/audio_id.mp3",
  "word_timestamps": [
    { "word": "Hello", "start": 0.1, "end": 0.4 },
    { "word": "World", "start": 0.5, "end": 0.9 }
  ],
  "cache": "MISS"
}
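
As an illustration of consuming the timestamped response, the Python sketch below parses the example JSON into simple records. The field names are taken from the example; `WordTiming` and `parse_speech_response` are hypothetical names. The sketch assumes that `word_timestamps` is simply absent when it was not requested and that times are in seconds, since this page does not state the unit.

```python
import json
from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float  # assumed: seconds from the start of the audio
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_speech_response(body: str):
    """Return (audio_url, cache_status, word_timings) from a JSON response body."""
    data = json.loads(body)
    timings = [
        WordTiming(item["word"], item["start"], item["end"])
        for item in data.get("word_timestamps", [])  # missing key gives an empty list
    ]
    return data["url"], data.get("cache"), timings


sample = """
{
  "url": "https://cdn.sonna.web.id/api-generated/user_id/audio_id.mp3",
  "word_timestamps": [
    { "word": "Hello", "start": 0.1, "end": 0.4 },
    { "word": "World", "start": 0.5, "end": 0.9 }
  ],
  "cache": "MISS"
}
"""

url, cache, timings = parse_speech_response(sample)
for t in timings:
    print(f"{t.word}: {t.start:.1f}s to {t.end:.1f}s")
```

Records like these could drive captions or word highlighting during playback. The same function also handles the standard response, in which case the list of timings is empty.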

Credits Consumption

Each character in `input`, including spaces, consumes 1 character credit. If your account has insufficient credits, the API returns a `402 Payment Required` error.
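
Because the cost is one credit per character, it can be estimated on the client before sending a request. The Python sketch below is illustrative: `credits_needed` and `can_afford` are hypothetical helpers, and it assumes that Python's `len` (which counts Unicode code points) matches how the service counts characters, which this page does not specify.

```python
def credits_needed(text: str) -> int:
    """One credit per character, spaces included (assumes code-point counting)."""
    return len(text)


def can_afford(text: str, available_credits: int) -> bool:
    """True if the balance covers the request; otherwise a 402 response is likely."""
    return credits_needed(text) <= available_credits


print(credits_needed("Hello World"))  # 11: ten letters plus one space
```

If a request is sent with Python's `urllib.request` and the balance is too low, the 402 response surfaces as a `urllib.error.HTTPError` whose `code` attribute is `402`.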
