Video Transcription API

Submit a video or audio source and receive a word-level transcript with speaker diarization (each segment is labelled with the detected speaker). The API accepts URLs from major video platforms as well as uploaded video or audio files, and lets you set the source and target languages.

Supported Video Sources

The API accepts URLs from the following platforms: YouTube, Vimeo, Dailymotion, Kick, Twitch, TikTok, Facebook, Zoom, Rumble and more.

You can also transcribe local audio or video files that you upload. Local upload requires a Standard plan or above.

Workflow

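1. Submit a transcription task from a video or audio URL (or an uploaded file identifier).
2. Poll the results endpoint until status is SUCCEEDED (or FAILED, in which case error_message gives the reason).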
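
The two steps above can be sketched in Python as follows. This is a minimal sketch rather than an official client: the function name `run_transcription`, the `http` parameter, and the polling `interval` and `timeout` values are illustrative assumptions. Only the endpoints, headers, and status values are taken from this page. `http` is expected to be the `requests` module, a `requests.Session`, or any object with compatible `post()` and `get()` methods.

```python
import time

BASE_URL = "https://wayinvideo-api.wayin.ai/api/v2/transcripts"


def run_transcription(http, api_key, video_url, interval=5.0, timeout=600.0,
                      sleep=time.sleep):
    """Submit a task, then poll until it reaches SUCCEEDED or FAILED.

    `interval` and `timeout` are in seconds and are assumed values,
    not documented limits of the API.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "x-wayinvideo-api-version": "v2",
    }

    # Step 1: submit the task and keep the returned task id.
    created = http.post(BASE_URL, headers=headers, json={"video_url": video_url})
    task_id = created.json()["data"]["id"]

    # Step 2: poll the results endpoint until a terminal status is reported.
    waited = 0.0
    while True:
        result = http.get(f"{BASE_URL}/results/{task_id}", headers=headers)
        data = result.json()["data"]
        if data["status"] == "SUCCEEDED":
            return data["transcript"]
        if data["status"] == "FAILED":
            raise RuntimeError(data.get("error_message", "transcription failed"))
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} is still {data['status']}")
        sleep(interval)
        waited += interval
```

With the `requests` library installed, calling `run_transcription(requests, "YOUR_API_KEY", "https://www.youtube.com/watch?v=example")` would issue the real calls. Passing the HTTP client in as a parameter keeps the loop easy to exercise against a stub.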

Submit Transcription Task

Submit a new transcription task from a video or audio URL.

POST https://wayinvideo-api.wayin.ai/api/v2/transcripts

Request Body

Parameter	Type	Required	Default	Description
`video_url`	string	Yes	—	The source video/audio URL or uploaded file identifier
`source_lang`	string	No	`null`	Source language of the video (see Supported Languages). When `null`, the system auto-detects the original language.
`target_lang`	string	No	`null`	Target language for the transcript (see Supported Languages). When `null`, no translation is applied. If `target_lang` differs from the video's original language, the transcript will be automatically translated into the target language.
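
As an illustration of the parameters above, the request body could be assembled with a small helper. `build_payload` is a hypothetical name, and the sketch assumes that omitting an optional field is equivalent to sending it as `null` (the documented default).

```python
def build_payload(video_url, source_lang=None, target_lang=None):
    """Build the JSON body for POST /api/v2/transcripts.

    Optional fields are left out when None; this sketch assumes the API
    treats a missing field the same as an explicit null.
    """
    if not video_url:
        raise ValueError("video_url is required")
    payload = {"video_url": video_url}
    if source_lang is not None:
        payload["source_lang"] = source_lang  # skip auto-detection
    if target_lang is not None:
        payload["target_lang"] = target_lang  # request a translated transcript
    return payload
```

For example, `build_payload("https://www.youtube.com/watch?v=example", target_lang="en")` returns a dictionary with only `video_url` and `target_lang`, which leaves the source language to auto-detection.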

Example Request

curl -X POST https://wayinvideo-api.wayin.ai/api/v2/transcripts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-wayinvideo-api-version: v2" \
  -d '{
    "video_url": "https://www.youtube.com/watch?v=example",
    "target_lang": "en"
  }'

import requests

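# json= serializes the body and sets the Content-Type header automatically.
response = requests.post(
    "https://wayinvideo-api.wayin.ai/api/v2/transcripts",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "x-wayinvideo-api-version": "v2",
    },
    json={
        "video_url": "https://www.youtube.com/watch?v=example",
        "target_lang": "en",
    },
)
response.raise_for_status()  # surface HTTP errors early
task_id = response.json()["data"]["id"]  # needed to retrieve the results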

await fetch("https://wayinvideo-api.wayin.ai/api/v2/transcripts", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "x-wayinvideo-api-version": "v2",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    video_url: "https://www.youtube.com/watch?v=example",
    target_lang: "en",
  }),
});

Response

{
  "data": {
    "id": "trans_proj_001",
    "name": "sample project name",
    "status": "CREATED"
  }
}

Field	Type	Description
`id`	string	Task identifier (used to retrieve results)
`name`	string	Task name
`status`	string	Task status: one of `CREATED`, `QUEUED`, `ONGOING`, `SUCCEEDED`, `FAILED`

Examples

Common transcription scenarios. Replace YOUR_API_KEY with a key from the API Dashboard.

Transcribe a YouTube video

curl -X POST https://wayinvideo-api.wayin.ai/api/v2/transcripts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-wayinvideo-api-version: v2" \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://www.youtube.com/watch?v=EXAMPLE"}'

Transcribe a podcast with multiple speakers

Pass any podcast audio URL (or an uploaded file identifier) — speaker diarization is automatic; each segment in the response carries a speaker label.

curl -X POST https://wayinvideo-api.wayin.ai/api/v2/transcripts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-wayinvideo-api-version: v2" \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://www.youtube.com/watch?v=EXAMPLE"}'

Translate a non-English transcript to English

Set target_lang to have the transcript translated. Also set source_lang if you already know the source language; otherwise it is auto-detected.

curl -X POST https://wayinvideo-api.wayin.ai/api/v2/transcripts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-wayinvideo-api-version: v2" \
  -H "Content-Type: application/json" \
  -d '{
    "video_url": "https://www.youtube.com/watch?v=EXAMPLE",
    "source_lang": "ja",
    "target_lang": "en"
  }'

Get Transcription Results

Retrieve the transcript with word-level timestamps and speaker labels. Poll until status is SUCCEEDED; stop if it becomes FAILED and check error_message.

GET https://wayinvideo-api.wayin.ai/api/v2/transcripts/results/{id}

Path Parameters

Parameter	Type	Required	Description
`id`	string	Yes	The task ID returned by the submit endpoint

Example Request

curl -X GET https://wayinvideo-api.wayin.ai/api/v2/transcripts/results/trans_proj_001 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-wayinvideo-api-version: v2"

Response

{
  "data": {
    "status": "SUCCEEDED",
    "cost_usage": 27.0,
    "transcript": [
      {
        "text": "Welcome to today's presentation",
        "language": null,
        "start": 200,
        "end": 4500,
        "speaker": "Speaker 1"
      },
      {
        "text": "Thanks for coming",
        "language": null,
        "start": 5000,
        "end": 8200,
        "speaker": "Speaker 2"
      }
    ]
  }
}

Response Fields

Field	Type	Description
`status`	string	Task status: one of `CREATED`, `QUEUED`, `ONGOING`, `SUCCEEDED`, `FAILED`
`error_message`	string	Error reason (only present when `status` is `FAILED`)
`cost_usage`	number	API units consumed for this request
`transcript`	array	List of transcript segments (see below)
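
One way to work with these fields is to map the response onto a small typed structure. The names `TranscriptResult`, `parse_result`, and `is_terminal` are hypothetical, and treating `SUCCEEDED` and `FAILED` as the only final statuses is an assumption based on the status list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptResult:
    status: str
    cost_usage: Optional[float] = None
    error_message: Optional[str] = None  # expected only when status is FAILED
    transcript: list = field(default_factory=list)

    @property
    def is_terminal(self) -> bool:
        # Assumption: every other status means the task is still in progress.
        return self.status in ("SUCCEEDED", "FAILED")


def parse_result(body: dict) -> TranscriptResult:
    """Convert the decoded JSON response into a TranscriptResult."""
    data = body["data"]
    return TranscriptResult(
        status=data["status"],
        cost_usage=data.get("cost_usage"),
        error_message=data.get("error_message"),
        transcript=data.get("transcript") or [],
    )
```

Fields that are absent from a response (for example `transcript` while the task is still queued) fall back to `None` or an empty list instead of raising a `KeyError`.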

Transcript Segment

Field	Type	Description
`text`	string	Transcribed text
`language`	string \| null	Detected language code, or `null` if not detected
`start`	integer	Start time in milliseconds
`end`	integer	End time in milliseconds
`speaker`	string	Speaker label (e.g. `"Speaker 1"`)
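
Since `start` and `end` are integer milliseconds, a readable transcript can be produced with a little arithmetic. The sketch below is illustrative; `format_ms` and `render_transcript` are hypothetical helper names, and the output layout is just one possible choice.

```python
def format_ms(ms):
    """Convert integer milliseconds to HH:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def render_transcript(segments):
    """Render one line per segment: [start - end] speaker: text."""
    lines = []
    for seg in segments:
        span = f"{format_ms(seg['start'])} - {format_ms(seg['end'])}"
        lines.append(f"[{span}] {seg['speaker']}: {seg['text']}")
    return "\n".join(lines)
```

Applied to the first segment of the sample response, this yields `[00:00:00.200 - 00:00:04.500] Speaker 1: Welcome to today's presentation`.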

FAQ

What is the maximum video length?

There is no hard length limit. The API supports both short clips and long-form video or audio content across supported source platforms.

Does the API return word-level timestamps?

Yes. Each transcript segment includes start and end timestamps in milliseconds, the transcribed text, the detected language, and the assigned speaker label from speaker diarization.

How does speaker diarization work?

Speakers are auto-detected and labelled (Speaker 1, Speaker 2, …) per segment. No configuration is required — diarization runs on every transcription task.
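
Because every segment carries a speaker label, per-speaker statistics can be derived on the client side. As a sketch (the helper name `talk_time_ms` is hypothetical), total speaking time per speaker could be computed like this:

```python
from collections import defaultdict


def talk_time_ms(segments):
    """Sum segment durations (end - start, in milliseconds) per speaker label."""
    totals = defaultdict(int)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)
```

For the sample response shown earlier, this gives 4300 ms for Speaker 1 and 3200 ms for Speaker 2.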

Which audio and video formats are supported?

Source URLs are supported from YouTube, Vimeo, Dailymotion, Kick, Twitch, TikTok, Facebook, Zoom, Rumble, and more. For local uploads, send mp4, mov, webm, or avi (audio-only files can be muxed into one of these containers).
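
Before uploading a local file, a client could check the extension against the containers listed above. This is only a client-side convenience sketch: `is_supported_upload` is a hypothetical helper, it looks at the file name rather than the actual container, and it does not reflect how the API itself validates uploads.

```python
from pathlib import PurePath

# Containers listed for local uploads on this page.
UPLOAD_CONTAINERS = {".mp4", ".mov", ".webm", ".avi"}


def is_supported_upload(filename):
    """Return True if the file extension is one of the listed containers."""
    return PurePath(filename).suffix.lower() in UPLOAD_CONTAINERS
```

A file such as `episode.mp3` would be rejected by this check, signalling that it needs to be muxed into one of the listed containers first.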

Can I translate the transcript into another language?

Yes — pass the target_lang parameter. The transcript is translated when target_lang differs from the source language. See Supported Languages for the full list of language codes.