[002] · SPECIMEN FROM ARCHIVE · FIG. 01 · INTERVIEW TRANSCRIPT · NATURE: MULTIVOICE

Speech, transcribed.

Twenty-eight hours for one staff stenographer. Two minutes for our pipeline. Audio in, correctly spelled text out, with a timestamp on every word. Thirty languages. One export.

[001A] · GUEST SESSION · NO SIGNUP

Drop a file — get a transcript.

Up to 5 minutes per sample. MP3 · WAV · M4A · MP4 · MOV. Auto language detection. No card, no account, no watermark — for this issue only.

INPUT interview-2026.04-MD.mp4

LANG EN-US (AUTO)

MODEL grok-stt-v3.1

DURATION 00:34:18

SPK A

[00:00:01.240]

OK. Recording running. Confirm.

SPK B

[00:00:03.880]

Confirmed. For the record — fourteen oh-eight hours.

SPK A

[00:00:08.110]

Noted. Tell me when you first realized the acceleration in the current system was non-linear.

SPK B

[00:00:14.620]

It happened on the second week of calibration. We attributed the drift to thermal noise at first, but —

SPK B

[00:00:21.040]

— by the third iteration it was clear the anomaly had a phase structure.

FIG. 01 · INTERVIEW TRANSCRIPT · NATURE: MULTIVOICE

[003] · INSTRUMENT SPEC

Datasheet of the instrument.

LANGUAGES

30+

recognition · auto-detect

SPEED

< 2 s / min

cloud pipeline · streaming

ACCURACY

96.4%

word accuracy (100% − WER) on clean speech · LibriSpeech reference

EXPORTS

× 9

SRT · VTT · TXT · JSON · PDF · DOCX · SBV · ASS · CSV

[004] · METHODOLOGY

Four operations from file to transcript.

INTAKE

Upload the specimen

Drag your file in or paste a URL. We accept MP4, MOV, MKV, MP3, WAV, FLAC, M4A, OGG, OPUS, AAC. Up to eight hours per pass.

$ tx upload --in=interview.mp4 --lang=auto
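A client could check a file locally before uploading it. The sketch below is a minimal pre-flight check that assumes only the format list and the eight-hour limit stated in this step. The helper name `can_upload` and the choice to test the file extension are illustrative assumptions, not part of the transcript.pt tooling.

```python
from pathlib import Path

# Formats and limit as listed in the intake step.
ACCEPTED_EXTENSIONS = {
    ".mp4", ".mov", ".mkv", ".mp3", ".wav",
    ".flac", ".m4a", ".ogg", ".opus", ".aac",
}
MAX_DURATION_SECONDS = 8 * 60 * 60  # "up to eight hours per pass"


def can_upload(filename: str, duration_seconds: float) -> bool:
    """Hypothetical helper: True if extension and duration fit the stated limits."""
    extension = Path(filename).suffix.lower()
    return (
        extension in ACCEPTED_EXTENSIONS
        and 0 < duration_seconds <= MAX_DURATION_SECONDS
    )
```

An extension check is only a first filter; the service itself decides what it can decode.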

PARSE

Distill the speech

Audio is extracted, normalized to −23 LUFS, denoised and segmented. Every phoneme is timestamped to the millisecond.

$ tx denoise --target=-23 --segment
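Normalizing to a loudness target comes down to applying one gain value: the target minus the measured integrated loudness. The sketch below shows only that arithmetic. It assumes the measured loudness is already available from a loudness meter, and the function names are hypothetical rather than taken from the pipeline.

```python
TARGET_LUFS = -23.0  # target stated in the parse step


def gain_db(measured_lufs: float, target_lufs: float = TARGET_LUFS) -> float:
    """Gain in dB that moves the measured loudness to the target."""
    return target_lufs - measured_lufs


def gain_factor(measured_lufs: float, target_lufs: float = TARGET_LUFS) -> float:
    """Linear amplitude multiplier equivalent to gain_db (20*log10 convention)."""
    return 10 ** (gain_db(measured_lufs, target_lufs) / 20)
```

A recording measured at −17 LUFS would need −6 dB of gain, roughly halving the sample amplitude.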

SYNC

Assign the speakers

Diarization labels each turn A/B/C. Names are editable in one click. Overlaps are annotated on a dedicated track.

$ tx diarize --speakers=auto
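The sample transcript shows two consecutive SPK B entries, which a client may want to read as one turn. The sketch below merges adjacent segments from the same speaker and then swaps labels for names. The `Segment` structure and both function names are assumptions made for illustration; the real export schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # diarization label such as "A" or "B"
    start_ms: int  # start time in milliseconds
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Join consecutive segments that share a speaker into a single turn."""
    turns: list[Segment] = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            turns[-1].text = f"{turns[-1].text} {seg.text}"
        else:
            # Copy so the caller's segments are never modified.
            turns.append(Segment(seg.speaker, seg.start_ms, seg.text))
    return turns


def rename_speakers(segments: list[Segment], names: dict[str, str]) -> list[Segment]:
    """Replace labels with names; labels without a mapping are kept as-is."""
    return [
        Segment(names.get(s.speaker, s.speaker), s.start_ms, s.text)
        for s in segments
    ]
```

Each merged turn keeps the start time of its first segment, so the timestamp still points at where the speaker began.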

ISSUE

Export the artifacts

Nine formats with one command. Defaults: SRT for video, DOCX for journalists, JSON for engineers. Clean, no marks.

$ tx export --fmt=srt,vtt,docx,json

[005] · LANGUAGE INDEX

Recognized sources.

Thirty-plus languages. ISO 639 codes, two-letter ISO 639-1 where one exists. Auto-detect is on by default; force a language with the --lang flag.

English en

Portuguese pt

Brazilian PT pt-BR

Spanish es

French fr

German de

Italian it

Russian ru

Ukrainian uk

Polish pl

Dutch nl

Swedish sv

Norwegian no

Danish da

Finnish fi

Turkish tr

Arabic ar

Hebrew he

Hindi hi

Bengali bn

Japanese ja

Korean ko

Mandarin zh

Cantonese zh-yue

Indonesian id

Malay ms

Thai th

Vietnamese vi

Filipino fil

Catalan ca

Greek el

Hungarian hu

Romanian ro

[006] · OUTPUT MANIFEST

Nine export formats.

FORMAT	EXT	DOMAIN	SPEC
SRT	.srt	Subtitles — video players	SubRip · UTF-8 · 1-indexed · CR/LF
VTT	.vtt	HTML5 web players	WebVTT · style blocks optional
TXT	.txt	Clean printout	no timestamps · paragraphs by speaker
JSON	.json	Engineers · pipeline	word-level array · ms precision
PDF	.pdf	Lawyers · archive	A4 · PDF/A-1b · signature optional
DOCX	.docx	Journalists · editors	Office Open XML · speaker styles
SBV	.sbv	Legacy YouTube · upload	YouTube subtitle format
ASS	.ass	Karaoke · stylization	Advanced SubStation Alpha · layered
CSV	.csv	Analysis · spreadsheets	one row per segment · ; separator
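To show how the JSON and SRT rows relate, the sketch below turns a word-level array into SubRip cues with 1-indexed numbering and CR/LF line endings, as the table specifies. The field names `word`, `start_ms` and `end_ms` and the fixed cue size are assumptions for illustration; the actual JSON schema is not given here.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as HH:MM:SS,mmm (SubRip puts a comma before the millis)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: list[dict], words_per_cue: int = 7) -> str:
    """Group words into fixed-size cues and render them as SRT text."""
    cues = []
    for index in range(0, len(words), words_per_cue):
        chunk = words[index:index + words_per_cue]
        number = index // words_per_cue + 1  # SRT cues are 1-indexed
        start = srt_timestamp(chunk[0]["start_ms"])
        end = srt_timestamp(chunk[-1]["end_ms"])
        text = " ".join(w["word"] for w in chunk)
        cues.append("\r\n".join([str(number), f"{start} --> {end}", text]))
    if not cues:
        return ""
    # Cues are separated by a blank line; the file ends with a line break.
    return "\r\n\r\n".join(cues) + "\r\n"
```

A production exporter would more likely break cues on pauses, punctuation and line length than on a fixed word count. The fixed size keeps the sketch short.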

[007] · QUOTATION

Four tariffs. One counter.

All tariffs share one credit currency. One minute of transcription costs 30 credits. EUR billing. NIF invoice for businesses. One-click cancellation.

QUOTATION N.º 0001/2026 · A

FREE

Lab trial

€0/mo

01 · 4,000 credits · one-time
02 · ≈ 130 minutes of transcription
03 · all 9 export formats
04 · «transcript.pt» mark in PDF/DOCX
05 · queue priority: standard

QUOTATION N.º 0001/2026 · B

GROWTH

Solo desk

€27/mo

€19/mo billed yearly

01 · 142,000 credits · monthly
02 · ≈ 4,700 minutes of transcription
03 · no watermark
04 · dubbing in 30+ languages
05 · API token · sandbox tier

MOST CHOSEN

QUOTATION N.º 0001/2026 · C

PRO

Team

€49/mo

€34/mo billed yearly

01 · 260,000 credits · monthly
02 · ≈ 8,600 minutes of transcription
03 · everything in Growth
04 · API + webhooks · production tier
05 · priority queue · dedicated support

QUOTATION N.º 0001/2026 · D

MASSIVE

Studio

€120/mo

€84/mo billed yearly

01 · 600,000 credits · monthly
02 · ≈ 20,000 minutes of transcription
03 · everything in Pro
04 · extended commercial license
05 · dedicated session with engineer
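The minute counts quoted for each tariff follow from the 30-credits-per-minute rate. The sketch below reproduces that arithmetic from the listed credit allowances. The names `TARIFF_CREDITS` and `minutes_for` are illustrative, and rounding down to whole minutes is an assumption about how a balance would be read.

```python
CREDITS_PER_MINUTE = 30  # rate stated in the tariff introduction

# Credit allowances as quoted for each tariff.
TARIFF_CREDITS = {
    "FREE": 4_000,
    "GROWTH": 142_000,
    "PRO": 260_000,
    "MASSIVE": 600_000,
}


def minutes_for(credits: int) -> int:
    """Whole minutes of transcription that a credit balance covers."""
    return credits // CREDITS_PER_MINUTE


if __name__ == "__main__":
    for tariff, credits in TARIFF_CREDITS.items():
        print(f"{tariff}: {minutes_for(credits)} min")
```

The results are 133, 4,733, 8,666 and 20,000 minutes, which the listings round to ≈ 130, ≈ 4,700, ≈ 8,600 and ≈ 20,000.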

[008] · REFERENCE TABLE

Benchmarks against the field.

Numbers pulled from public provider pages and our own measurements on the Common Voice corpus. Verify yourself — every row is open to scrutiny.

METRIC	TRANSCRIPT.PT	OTTER.AI	REV.COM	DESCRIPT
WER (avg)	3.6%	6.1%	4.0% (human)	5.4%
Languages	30+	< 7	~ 35 (EN AI only)	~ 23
Max file	8 h	4 h	5 h	10 h
Word timestamps	Yes · ms	Yes · ms	Pro tier only	Yes · ms
Spike throughput	90 min/min	~ 10 min/min	human — N/A	~ 30 min/min
EUR billing	Yes · with NIF	USD only	USD only	USD only
No watermark	Paid tiers	Paid tiers	Yes	Paid tiers
Free tier	Yes · 130 min	Yes · 300 min/mo	None	Yes · 60 min/mo
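Anyone re-checking the WER row needs the metric itself: substitutions, deletions and insertions divided by the number of reference words. The sketch below is a plain word-level edit-distance implementation. It assumes whitespace tokenization and no normalization of case or punctuation, which may differ from how the figures in the table were measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j]: edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

A WER of 3.6% corresponds to the 96.4% accuracy figure on the datasheet, since accuracy is reported there as 100% minus WER.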

[009] · QUESTIONS DOSSIER

Ten frequently asked, by incidence.

Q01 / A01

What does transcript.pt actually do?

It converts spoken audio into synchronized written text. A file goes in; out comes a readable transcript with word-level timestamps and speaker labels. No camera, no hands, no copyediting.

Q02 / A02

Which input formats do you accept?

MP4, MOV, MKV, MP3, WAV, FLAC, M4A, OGG, OPUS, AAC. Plus URLs from S3, Vimeo, YouTube, Google Drive. Video is split into tracks; the extracted audio enters the STT pipeline.

Q03 / A03

How long does an hour-long file take?

Twenty to thirty seconds on the standard queue. About ninety seconds under load. Pro plan runs on a priority queue regardless of traffic.

Q04 / A04

Do you support European Portuguese, not only Brazilian?

Yes. pt-PT and pt-BR are separate dialect models. Auto-detect picks by phonetics; force it with --lang=pt-PT.

Q05 / A05

How do you label speakers?

Diarization in Grok STT v3.1. Default labels A/B/C/... Names are editable from the UI in one click; the export is re-saved automatically.

Q06 / A06

Are exports watermarked?

On the free tier, a discreet «transcript.pt» line appears in the footer of PDF and DOCX. All paid tiers export clean.

Q07 / A07

What is your retention policy?

Files stay in your account until you delete them. AWS S3, eu-central-1 region. Stripe EUR billing. Deletion-on-request is final and logged.

Q08 / A08

Can I get dubbing and lip-sync after the transcription?

Yes. One click in the same session — we translate to 30+ languages and pipe to the lip-sync engine. Dubbing is billed by output-video minutes.

Q09 / A09

Is there a developer API?

There is. POST /v1/transcribe for synchronous calls; webhooks for async. Sandbox token on Growth, production tier from Pro upward. Documentation at /docs.
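A synchronous call could be assembled roughly as follows. Only the method and path, POST /v1/transcribe, come from the answer above. The host, the bearer-token header and the JSON body fields `url` and `lang` are placeholders assumed for illustration, so the documentation at /docs remains the authority on the real contract.

```python
import json
import urllib.request

# Assumptions for illustration only: the host, the bearer-token header and
# the body fields are not documented in this text.
BASE_URL = "https://api.example.invalid"


def build_transcribe_request(
    token: str, media_url: str, lang: str = "auto"
) -> urllib.request.Request:
    """Build (but do not send) a hypothetical synchronous transcription request."""
    body = json.dumps({"url": media_url, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/transcribe",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` once the real host and field names are known.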

Q10 / A10

Who runs transcript.pt?

SPACEFOX UNIPESSOAL LDA, registered in Fernão Ferro, Portugal. The same team has run doitong.com since 2023. We read hello@transcript.pt and reply in the language we receive.

[010] · FIELD REPORTS

Five entries from real work.

Sources verified. Quotes lightly edited for length. None of the people below were compensated for their words.

REPORT N.º 0142 2026-04-12

Ana C. · Investigative journalist · Lisbon

— Six hours of source recording. Transcribed in eleven minutes, speakers labeled correctly. I tagged the names and went straight into the text. What used to take two nights took a lunch break.

REPORT N.º 0156 2026-04-19

Paulo M. · Law student · Coimbra

— Parliamentary hearing — four and a half hours. Transcript with a timestamp on every word. Quoted in my thesis to the millisecond. My supervisor asked how — I didn't say.

REPORT N.º 0181 2026-04-23

Mariana R. · Podcaster · Porto

— I publish episodes weekly. The transcript becomes the show notes; I edit in DOCX, save back — captions re-sync themselves. A six-hour workflow shrunk to forty-five minutes.

REPORT N.º 0204 2026-04-25

João D. · History teacher · Évora

— I've been recording lectures on my phone for a year and a half. Uploaded them in series — got a word-searchable archive back. Students prepare faster now. So do I.

REPORT N.º 0218 2026-04-28

Sofia T. · Lawyer · Braga

— Used for witness depositions. Signed PDF/A-1b drops straight into the case file. Courts accept without challenge. Cost per case is below half an hour of paralegal time.

[011] · CALL

Open the lab.

A starter pack lands in your account on signup. No card. No commitment. The cancel button sits one click away from the dashboard.