Split Translate and Speech into separate stages with TTS word timestamps #268

Merged
nicpottier merged 5 commits into main from elasticsounds/split-translate-speech
Apr 8, 2026

Conversation

@elasticsounds
Contributor

Summary

  • Split Translate/Speech stages: The monolithic Translate stage is now two independent pipeline stages (Translate and Speech) with separate DAG nodes, settings panels, and run controls
  • Word-level timestamps: Whisper-based word timestamp generation with inline playback highlighting, editable multi-column timecode tables with custom TimecodeInput controls, and background task queue with progress tracking
  • Whisper accuracy: Source text passed as prompt parameter to Whisper API for better transcription alignment
  • Language picker: Country lists updated to show only linguistically relevant countries when browsing; all countries still searchable by typing
  • UI polish: Visual separator between run card controls and config sections; task progress messages displayed in sidebar
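
As an illustration of the "Whisper accuracy" item, here is a minimal sketch of the request fields such a call might carry. The field names (`prompt`, `response_format`, `timestamp_granularities`) follow OpenAI's audio transcription endpoint; the helper name `buildTranscriptionFields` and the `whisper-1` model choice are assumptions for this example, not code from the PR.

```typescript
// Hypothetical helper: the PR's actual request-building code is not shown
// in this conversation, so the function name and model are assumptions.
interface TranscriptionFields {
  model: string;
  prompt: string;
  response_format: "verbose_json";
  timestamp_granularities: Array<"word" | "segment">;
}

function buildTranscriptionFields(sourceText: string): TranscriptionFields {
  return {
    model: "whisper-1",
    // Passing the known source text as the prompt nudges Whisper toward
    // the expected wording and spelling of the synthesized audio.
    prompt: sourceText.trim(),
    // Timestamp granularities require the verbose JSON response format.
    response_format: "verbose_json",
    timestamp_granularities: ["word"],
  };
}
```

These fields would be sent together with the audio file; how the file is attached depends on the HTTP client or SDK in use.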

Test plan

  • Verify Translate and Speech stages appear independently in the sidebar and can be run separately
  • Run Speech generation, then calculate timestamps — confirm background task shows progress (e.g. 42/632)
  • Play audio and verify word-by-word highlighting syncs with playback
  • Expand timestamp viewer, edit timecodes, save — confirm edits persist
  • Re-run translations and verify pipeline completes without errors
  • Open language picker and confirm only relevant countries appear per language

…estamps and word highlighting

Separate the monolithic Translate stage into distinct Translate and Speech stages
with independent settings, DAG dependencies, and UI views. Add Whisper-based word
timestamp generation with inline playback highlighting, editable multi-column
timecode tables, and background task queue for batch transcription. Improve language
picker to show only linguistically relevant countries, add visual separation to
stage run cards, and pass source text as Whisper prompt for better accuracy.

Write timestamps incrementally during batch transcription instead of
accumulating in a stale snapshot, preventing concurrent user edits from
being silently overwritten. Show human-readable language name (e.g.
"English") instead of locale code in the generate timestamps confirmation.

Fixes CI lint failure where `inputClass` (a Tailwind class string variable)
was flagged as an unlocalized string.
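
To make the "write incrementally" point concrete, here is a minimal sketch of a batch loop that writes each result straight into the live store rather than copying the store up front and saving the whole copy at the end. The `Store` type, `transcribeBatch`, and the injected `transcribe` callback are all hypothetical stand-ins; the PR's real storage and task-queue code is not shown in this conversation.

```typescript
type Timing = { word: string; start: number; end: number };
// Hypothetical live store keyed by item id (for example, a text segment).
type Store = Map<string, Timing[]>;

async function transcribeBatch(
  store: Store,
  itemIds: string[],
  transcribe: (id: string) => Promise<Timing[]>,
  onProgress: (done: number, total: number) => void = () => {},
): Promise<void> {
  let done = 0;
  for (const id of itemIds) {
    const timings = await transcribe(id);
    // Write only this item's entry. Entries the user edited while the
    // batch was running are never touched, so they cannot be clobbered
    // by data captured before the edit.
    store.set(id, timings);
    done += 1;
    onProgress(done, itemIds.length);
  }
}
```

The `onProgress` callback is one way a sidebar could show a counter such as 42/632 while the batch runs.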
@gbergengruen
Contributor

@elasticsounds, I see that the language and speech steps are independent now, but you still need to run the language step before the speech step, as it seems to perform a step that TTS generation requires. The timestamps are amazing. I loved it.

tts-timestamps node data was not included in clearNodesByType calls
alongside tts, leaving stale word-timestamp data after speech deletion
or upstream page/caption edits.
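
One way to guard against this class of bug is to declare derived node types once and expand them before clearing. The sketch below is a hypothetical helper, not the fix in this PR (which, per the message above, added `tts-timestamps` to the existing `clearNodesByType` calls); the `DERIVED_NODE_TYPES` table and `withDerivedTypes` function are assumed names.

```typescript
// Hypothetical table: node types whose data is derived from another type
// and must be cleared whenever the parent type is cleared.
const DERIVED_NODE_TYPES: Record<string, string[]> = {
  tts: ["tts-timestamps"],
};

// Expands a list of node types to include everything derived from them,
// preserving first-seen order and skipping duplicates.
function withDerivedTypes(types: string[]): string[] {
  const result = new Set<string>();
  const pending = [...types];
  while (pending.length > 0) {
    const current = pending.shift() as string;
    if (result.has(current)) continue;
    result.add(current);
    pending.push(...(DERIVED_NODE_TYPES[current] ?? []));
  }
  return Array.from(result);
}
```

A caller would pass the expanded list to whatever clearing routine it uses, so that clearing `tts` always takes `tts-timestamps` with it.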
@nicpottier merged commit f9055bd into main Apr 8, 2026
3 checks passed
@nicpottier deleted the elasticsounds/split-translate-speech branch April 8, 2026 16:11
3 participants