Word-level timestamps in milliseconds

The fastest, developer-first WhatsApp Audio API.

Convert raw .ogg voice notes into structured JSON with precise word-level timing, clean API responses, and prepaid credits that stay valid for 3, 6, 9, or 12 months depending on the plan.

600 seconds included
No subscriptions
Simple checkout
Start with 600 seconds before you upgrade.

Audiex Pro overview

Clean API, predictable billing, production-ready output

Input

Raw WhatsApp .ogg audio, posted as multipart/form-data or uploaded from your app.

Output

Structured JSON with transcript text, segments, and word-level timestamps in milliseconds.

Billing

Pay once, consume credits only when you transcribe, and keep the balance active for the plan's validity window.

Access

Account access stays tied to Google sign-in for now.

Response integrity: milliseconds preserved
text: "Let's ship the release after lunch."
words: 26 aligned tokens with start_ms and end_ms.

Built for

WhatsApp AI Agents, CRM Automation, Customer Support, Voice Forms, Lead Qualification, and n8n Workflows.

Turn voice notes into structured data for support tickets, lead capture, and workflow automations without manually parsing audio.

Live Playground

Ship against a response your backend can trust.

One request in, structured JSON out. The layout below mirrors the requests and responses your production integration will use.

Request

Common SDKs and automation patterns

POST /v1/transcribe

cURL request with auto language detection

curl -X POST \
  "https://audiex.pro/v1/transcribe" \
  -H "Authorization: Bearer sk_live_xxx" \
  -F "file=@whatsapp-voice-note.ogg" \
  -F "language=auto"

Authorization: Bearer token
Body: multipart/form-data
Audio: .ogg from WhatsApp

Response

Structured JSON with timestamps in milliseconds and remaining seconds

200 OK
{
  "text": "Let's ship the release.",
  "language": "en",
  "duration_ms": 1150,
  "billed_seconds": 2,
  "remaining_seconds": 17842,
  "segments": [
    { "text": "Let's ship the release.", "start_ms": 0, "end_ms": 1150 }
  ],
  "words": [
    { "word": "Let's", "start_ms": 220, "end_ms": 460 },
    { "word": "ship", "start_ms": 468, "end_ms": 690 },
    { "word": "the", "start_ms": 698, "end_ms": 784 },
    { "word": "release.", "start_ms": 792, "end_ms": 1130 }
  ]
}
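
A minimal Python sketch of consuming this response, assuming only the field names shown above (words, start_ms, end_ms). The Word class and the parse_words and format_ms helpers are illustrative names, not part of any Audiex Pro SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    word: str
    start_ms: int
    end_ms: int


def parse_words(payload: str) -> list[Word]:
    """Parse a response body and return its word-level timings."""
    data = json.loads(payload)
    return [Word(w["word"], w["start_ms"], w["end_ms"]) for w in data.get("words", [])]


def format_ms(ms: int) -> str:
    """Render integer milliseconds as MM:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


# A trimmed-down payload using the same field names as the example response.
sample = """
{
  "text": "Let's ship",
  "words": [
    { "word": "Let's", "start_ms": 220, "end_ms": 460 },
    { "word": "ship", "start_ms": 468, "end_ms": 690 }
  ]
}
"""

for w in parse_words(sample):
    print(format_ms(w.start_ms), format_ms(w.end_ms), w.word)
```

Keeping timestamps as integers until display time avoids floating-point drift when you later align words with other events.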

Pricing

Prepaid credits built for teams that want predictable spend.

Pay as you go: credits stay valid for 3, 6, 9, or 12 months depending on the plan, and there are no subscriptions hiding in the fine print.

Prepaid only
By purchasing a plan, you agree to our Terms of Service and Refund Policy.

Starter: $7.99
18,000 seconds included (5 hours). Credits valid for 3 months.
For product validation, small integrations, and early experiments.

Growth (most popular): $19.99
54,000 seconds included (15 hours). Credits valid for 6 months.
Ideal for a real feature launch, active users, and weekly processing.

Agency: $49.99
162,000 seconds included (45 hours). Credits valid for 9 months.
Built for client work, recurring operations, and multiple projects.

Enterprise: $99.99
360,000 seconds included (100 hours). Credits valid for 12 months.
For high-volume teams that want room to grow without re-buying each month.
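
To compare plans on a like-for-like basis, the effective price per hour of audio can be worked out from the listed prices and included seconds. A short Python sketch; the plan figures are copied from the pricing list, while PLANS and price_per_hour are illustrative names.

```python
# (price in USD, included seconds), copied from the pricing list.
PLANS = {
    "Starter": (7.99, 18_000),
    "Growth": (19.99, 54_000),
    "Agency": (49.99, 162_000),
    "Enterprise": (99.99, 360_000),
}


def price_per_hour(price_usd: float, seconds: int) -> float:
    """Effective USD cost of one hour (3,600 seconds) of audio."""
    return price_usd / (seconds / 3600)


for name, (price, seconds) in PLANS.items():
    print(f"{name}: ${price_per_hour(price, seconds):.2f} per hour")
```

Rounded to cents, this works out to roughly $1.60, $1.33, $1.11, and $1.00 per hour for Starter, Growth, Agency, and Enterprise respectively.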

API Documentation

A single endpoint designed to feel boring in the best way.

POST /v1/transcribe accepts multipart audio, checks your token, and returns structured data ready for product use.

Request overview

Headers
Authorization: Bearer sk_live_...
Content-Type is set automatically by multipart/form-data clients.
Body
file (required): .ogg voice note from WhatsApp.
language (optional, default auto): BCP-47 code such as en, es, pt-BR.
Response
Transcript text, language, duration, segments, and words with timestamps in milliseconds.

Performance recommendation: if your bot will always receive one language, pin it in the request. For example, use language=es or language=ja. That skips auto-detection, can reduce latency by a few milliseconds, and avoids unnecessary inference work.
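
The request described here can be sketched in Python with only the standard library. This is an illustration under assumptions, not an official client: build_request is a hypothetical helper, the audio/ogg part type is assumed, and the audio bytes and token are placeholders. The endpoint URL and the file and language field names come from the cURL example.

```python
import urllib.request
import uuid

API_URL = "https://audiex.pro/v1/transcribe"  # endpoint from the cURL example


def build_request(audio: bytes, token: str, language: str = "auto",
                  filename: str = "voice-note.ogg") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for /v1/transcribe."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="language"',
        b"",
        language.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: audio/ogg",  # assumed MIME type for WhatsApp voice notes
        b"",
        audio,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


# Pin the language when the bot only ever receives Spanish audio.
# b"OggS..." stands in for real file contents; sk_live_xxx is a placeholder token.
req = build_request(b"OggS...", token="sk_live_xxx", language="es")

# Sending is left commented out so the sketch runs without network access:
# import json
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it makes the multipart body easy to inspect in tests before any credits are spent.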

Standard errors

Predictable failures
401 Unauthorized (auth issue): Missing or invalid bearer token.
402 Payment required (billing issue): No credits left or billing balance is exhausted.
400 Bad request (payload issue): Empty payload, unreadable audio, or malformed form data.
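
One way a client might route these three status codes is to map each to its own exception type, so that auth, billing, and payload problems can be handled separately. A minimal Python sketch; every class and function name here is hypothetical, and the error body format is not specified by the documentation above.

```python
class AudiexError(Exception):
    """Base class for the hypothetical error types in this sketch."""


class AuthError(AudiexError):
    """401: missing or invalid bearer token."""


class BillingError(AudiexError):
    """402: no credits left."""


class PayloadError(AudiexError):
    """400: empty payload, unreadable audio, or malformed form data."""


ERRORS_BY_STATUS = {400: PayloadError, 401: AuthError, 402: BillingError}


def raise_for_status(status: int, body: str = "") -> None:
    """Raise a typed error for non-2xx statuses; return None on success."""
    if 200 <= status < 300:
        return
    # Statuses not listed in the documentation fall back to the base class.
    exc_type = ERRORS_BY_STATUS.get(status, AudiexError)
    raise exc_type(f"HTTP {status}: {body}")


try:
    raise_for_status(402, "No credits left")
except BillingError as err:
    print("Top up credits before retrying:", err)
```

None of the three documented errors is worth retrying unchanged: each needs a new token, more credits, or a corrected payload first.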

Example response shape

Transcript text, language, duration, remaining balance, and segments or words with timestamps in milliseconds.

{
  "text": "Let's ship the release.",
  "language": "en",
  "duration_ms": 1150,
  "billed_seconds": 2,
  "remaining_seconds": 17842,
  "segments": [
    { "text": "Let's ship the release.", "start_ms": 0, "end_ms": 1150 }
  ],
  "words": [
    { "word": "Let's", "start_ms": 220, "end_ms": 460 },
    { "word": "ship", "start_ms": 468, "end_ms": 690 },
    { "word": "the", "start_ms": 698, "end_ms": 784 },
    { "word": "release.", "start_ms": 792, "end_ms": 1130 }
  ]
}
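
In this example a 1,150 ms clip is billed as 2 seconds, which is consistent with rounding up to the next whole second. The rounding rule is not stated anywhere on this page, so the Python sketch below treats it as an assumption; estimate_billed_seconds and requests_left are illustrative helpers for budgeting, not API features.

```python
def estimate_billed_seconds(duration_ms: int) -> int:
    """Round a duration up to whole seconds.

    ASSUMPTION: inferred from the example (1150 ms billed as 2 s),
    not a documented billing rule.
    """
    return (duration_ms + 999) // 1000


def requests_left(remaining_seconds: int, typical_duration_ms: int) -> int:
    """Estimate how many clips of a typical length the balance still covers."""
    per_request = estimate_billed_seconds(typical_duration_ms)
    if per_request == 0:
        raise ValueError("typical_duration_ms must be positive")
    return remaining_seconds // per_request


print(estimate_billed_seconds(1150))   # matches billed_seconds in the example
print(requests_left(17_842, 30_000))   # 30-second voice notes on this balance
```

Comparing the estimate against the billed_seconds field of real responses is a quick way to confirm or reject the assumed rounding rule.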

Account access

Google sign-in will unlock account access.

The product already supports a clean API-first flow, and the social login layer can be enabled without changing the billing model.

1. Connect your identity
Use Google once sign-in is enabled on the account portal.

2. Claim trial access
The 10-minute trial stays tied to a real account instead of disposable email addresses.

3. Manage billing and usage
One account keeps credits, invoices, and API access in the same place.

Disposable and temporary emails stay blocked from the public onboarding flow.

Account sign-in

Use Google to link your account and keep access in one place.

Terms of Service

Audiex Pro is a prepaid transcription service. Usage is measured against processed audio duration, credits are deducted when a request is accepted and processed, and unused balances expire after 3, 6, 9, or 12 months depending on the plan purchased.

Refund Policy

If the service does not meet your expectations, we offer a 14-day refund for unused balance. Requests are reviewed against account activity and the remaining credit balance.