Speech, transcribed.
Twenty-eight hours for one staff stenographer. Two minutes for our pipeline. Audio in, correctly spelled text out, with a timestamp on every word. Thirty languages. One export.
Drop a file — get a transcript.
Up to 5 minutes per sample. MP3 · WAV · M4A · MP4 · MOV. Auto language detection. No card, no account, no watermark — for this issue only.
Datasheet of the instrument.
Four operations from file to transcript.
Drag your file in or paste a URL. We accept MP4, MOV, MKV, MP3, WAV, FLAC, M4A, OGG, OPUS, AAC. Up to eight hours per pass.
Audio is extracted, normalized to −23 LUFS, denoised and segmented. Every phoneme is timestamped to the millisecond.
Diarization labels each turn A/B/C. Names are editable in one click. Overlaps are annotated on a dedicated track.
Nine formats with one command. Defaults: SRT for video, DOCX for journalists, JSON for engineers. Clean, no marks.
Recognized sources.
Nine export formats.
| FORMAT | EXT | DOMAIN | SPEC |
|---|---|---|---|
| SRT | .srt | Subtitles — video players | SubRip · UTF-8 · 1-indexed · CR/LF |
| VTT | .vtt | HTML5 web players | WebVTT · style blocks optional |
| TXT | .txt | Clean printout | no timestamps · paragraphs by speaker |
| JSON | .json | Engineers · pipeline | word-level array · ms precision |
| PDF | .pdf | Lawyers · archive | A4 · PDF/A-1b · signature optional |
| DOCX | .docx | Journalists · editors | Office Open XML · speaker styles |
| SBV | .sbv | Legacy YouTube · upload | YouTube subtitle format |
| ASS | .ass | Karaoke · stylization | Advanced Substation · layered |
| CSV | .csv | Analysis · spreadsheets | one row per segment · ; separator |
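As an illustration of the SRT row above, here is a minimal sketch that turns word-level timestamps into SubRip cues with 1-indexed numbering and CR/LF line endings. The `Word` structure and the seven-words-per-cue grouping are assumptions made for the example; the actual JSON schema and the segmentation rules used by the service are not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical shape of one word-level entry (ms precision).
    text: str
    start_ms: int
    end_ms: int


def srt_time(ms: int) -> str:
    """Format milliseconds as a SubRip timestamp: HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and render SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1  # SRT cues are 1-indexed
        timing = f"{srt_time(chunk[0].start_ms)} --> {srt_time(chunk[-1].end_ms)}"
        text = " ".join(w.text for w in chunk)
        cues.append("\r\n".join([str(index), timing, text]))
    # Cues are separated by a blank line; all line endings are CR/LF.
    return "\r\n\r\n".join(cues) + "\r\n"


if __name__ == "__main__":
    demo = [Word("Hello", 0, 400), Word("world", 450, 900)]
    print(words_to_srt(demo))
```

When writing the result to disk, encode it as UTF-8 to match the spec column.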
Four tariffs. One counter.
- 01 4,000 credits · one-time
- 02 ≈ 130 minutes of transcription
- 03 all 9 export formats
- 04 «transcript.pt» mark in PDF/DOCX
- 05 queue priority: standard

- 01 142,000 credits · monthly
- 02 ≈ 4,700 minutes of transcription
- 03 no watermark
- 04 dubbing in 30+ languages
- 05 API token · sandbox tier

- 01 260,000 credits · monthly
- 02 ≈ 8,600 minutes of transcription
- 03 everything in Growth
- 04 API + webhooks · production tier
- 05 priority queue · dedicated support

- 01 600,000 credits · monthly
- 02 ≈ 20,000 minutes of transcription
- 03 everything in Pro
- 04 extended commercial license
- 05 dedicated session with engineer
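The credit and minute figures listed for the four tiers work out to roughly 30 credits per transcription minute. That rate is not stated anywhere on the page; it is inferred from the published numbers, and the flat-rate estimator below is an assumption for illustration only.

```python
# (credits, approximate minutes) as listed for the four tiers above.
TIERS = [
    (4_000, 130),
    (142_000, 4_700),
    (260_000, 8_600),
    (600_000, 20_000),
]


def credits_per_minute(credits: int, minutes: int) -> float:
    """Implied cost of one transcription minute, in credits."""
    return credits / minutes


def estimate_minutes(credits: int, rate: float = 30.0) -> int:
    """Rough minutes for a credit balance, assuming a flat rate (inferred)."""
    return int(credits // rate)


for credits, minutes in TIERS:
    rate = credits_per_minute(credits, minutes)
    print(f"{credits:>7,} credits / {minutes:>6,} min = {rate:.1f} credits/min")
```

The small differences between tiers come from the rounding in the "≈" minute figures.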
Benchmarks against the field.
| METRIC | TRANSCRIPT.PT | OTTER.AI | REV.COM | DESCRIPT |
|---|---|---|---|---|
| WER (avg) | 3.6% | 6.1% | 4.0% (human) | 5.4% |
| Languages | 30+ | < 7 | ~ 35 (EN AI only) | ~ 23 |
| Max file | 8 h | 4 h | 5 h | 10 h |
| Word timestamps | Yes · ms | Yes · ms | Pro tier only | Yes · ms |
| Spike throughput | 90 min/min | ~ 10 min/min | human — N/A | ~ 30 min/min |
| EUR billing | Yes · with NIF | USD only | USD only | USD only |
| No watermark | Paid tiers | Paid tiers | Yes | Paid tiers |
| Free tier | Yes · 130 min | Yes · 300 min/mo | None | Yes · 60 min/mo |
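For readers unfamiliar with the WER row: word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the transcript into the reference, divided by the reference length. The sketch below computes it with a standard word-level edit distance. How the figures in the table were measured (test set, text normalization) is not stated, so this shows the general metric only, not the benchmark procedure.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(wer("the quick brown fox", "the quick red fox"))
```

One substituted word out of four gives a WER of 0.25, that is 25%.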
Ten frequently asked, by incidence.
What does transcript.pt actually do?
It converts spoken audio into synchronized written text. A file goes in; out comes a readable transcript with word-level timestamps and speaker labels. No camera, no hands, no copyediting.
Which input formats do you accept?
MP4, MOV, MKV, MP3, WAV, FLAC, M4A, OGG, OPUS, AAC. Plus URLs from S3, Vimeo, YouTube, Google Drive. Video is split into tracks; the extracted audio enters the STT pipeline.
How long does an hour-long file take?
Twenty to thirty seconds on the standard queue. About ninety seconds under load. The Pro plan runs on a priority queue regardless of traffic.
Do you support European Portuguese, not only Brazilian?
Yes. pt-PT and pt-BR are separate dialect models. Auto-detect picks by phonetics; force it with --lang=pt-PT.
How do you label speakers?
Diarization runs in Grok STT v3.1. Default labels are A/B/C/... Names are editable from the UI in one click; the export is re-saved automatically.
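Renaming a speaker amounts to a relabeling pass over the diarized turns. The sketch below assumes a hypothetical `Turn` record and a label-to-name mapping; neither reflects the service's actual data model.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Turn:
    # Hypothetical shape of one diarized speaker turn.
    speaker: str
    start_ms: int
    end_ms: int
    text: str


def rename_speakers(turns: List[Turn], names: Dict[str, str]) -> List[Turn]:
    """Return new turns with default labels replaced where a name is given."""
    return [replace(t, speaker=names.get(t.speaker, t.speaker)) for t in turns]


if __name__ == "__main__":
    turns = [
        Turn("A", 0, 1_200, "Good morning."),
        Turn("B", 1_200, 2_500, "Morning."),
    ]
    for t in rename_speakers(turns, {"A": "Ana"}):
        print(f"{t.speaker}: {t.text}")
```

Labels without an entry in the mapping keep their default letter, so a partially named transcript stays valid.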
Are exports watermarked?
On the free tier, a discreet «transcript.pt» line appears in the footer of PDF and DOCX. All paid tiers export clean.
What is your retention policy?
Files stay in your account until you delete them. Storage is AWS S3 in the eu-central-1 region; billing runs through Stripe in EUR. Deletion on request is final and logged.
Can I get dubbing and lip-sync after the transcription?
Yes. One click in the same session — we translate to 30+ languages and pipe to the lip-sync engine. Dubbing is billed by output-video minutes.
Is there a developer API?
There is. POST /v1/transcribe for synchronous calls; webhooks for async. Sandbox token on Growth, production tier from Pro upward. Documentation at /docs.
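A minimal sketch of what a call to that endpoint might look like, using only the Python standard library. Only the `POST /v1/transcribe` path comes from the answer above. The host, the bearer-token header and the `url` and `lang` body fields are placeholders assumed for illustration; the documentation at /docs is the authority on the real request shape.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def build_transcribe_request(
    token: str, media_url: str, lang: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/transcribe request."""
    payload = {"url": media_url}  # hypothetical field name
    if lang is not None:
        payload["lang"] = lang    # hypothetical field, e.g. "pt-PT"
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcribe_request(
        "YOUR_TOKEN", "https://example.com/interview.mp3", lang="pt-PT"
    )
    print(req.get_method(), req.full_url)
    # Sending is left out on purpose: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch testable without network access.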
Who runs transcript.pt?
SPACEFOX UNIPESSOAL LDA, registered in Fernão Ferro, Portugal. The same team has run doitong.com since 2023. We read hello@transcript.pt and reply in the language we receive.
Five entries from real work.
— Six hours of source recording. Transcribed in eleven minutes, speakers labeled correctly. I tagged the names and went straight into the text. What used to take two nights took a lunch break.
— Parliamentary hearing — four and a half hours. Transcript with a timestamp on every word. Quoted in my thesis to the millisecond. My supervisor asked how — I didn't say.
— I publish episodes weekly. The transcript becomes the show notes; I edit in DOCX, save back — captions re-sync themselves. A six-hour workflow shrank to forty-five minutes.
— I've been recording lectures on my phone for a year and a half. Uploaded them in series — got a word-searchable archive back. Students prepare faster now. So do I.
— Used for witness depositions. Signed PDF/A-1b drops straight into the case file. Courts accept without challenge. Cost per case is below half an hour of paralegal time.
Open the lab.
A starter pack lands in your account on signup. No card. No commitment. The cancel button sits one click away from the dashboard.