
Video content is dominant right now, but here is a statistic that matters: over 80% of social media videos are watched on mute. If you aren't burning subtitles into your videos, you are losing the vast majority of your audience.

If you have researched this topic—perhaps scrolling through Reddit threads on r/ffmpeg or r/node—you likely ran into the same common pain points:

  1. FFmpeg is complex: "Filter complex", "stream mapping", and syntax errors are nightmares.
  2. Quality Loss: Hardcoding subtitles requires re-encoding the video. If you don't know what CRF or bitrate to set, your crystal clear 4K video turns into a blocky mess.
  3. Styling Limitations: Standard SRT files are ugly. Making them look like "TikTok style" captions requires advanced libass configs or custom drawing filters.

In this guide, we will look at two ways to solve this: the Automated API method (using Orshot) and the Manual Node.js method (building it yourself).


Method 1: The Automated Way (Orshot API)

If you want to skip the engineering headaches of server-side FFmpeg, GPUs, and file storage, Orshot handles the entire pipeline in a single API request.

Orshot's Video Generation API automatically:

  • Extracts audio from your video.
  • Transcribes it using high-quality AI.
  • Generates the subtitles with perfect sync.
  • Burns them in with custom styling (colors, fonts, backgrounds).
  • Returns the final .mp4.

Example: Auto-Generate Subtitles

You simply pass your video URL and styling preferences. You can even use Google Fonts instantly.

const response = await fetch("https://api.orshot.com/v1/studio/render", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer YOUR_API_KEY",
  },
  body: JSON.stringify({
    templateId: "your-video-template",
    response: { format: "mp4" },
    videoOptions: {
      // The magic happens here
      subtitleSource: "https://your-bucket.com/video.mp4",
      subtitleColor: "#FFFFFF",
      subtitleBackground: "rgba(0,0,0,0.6)", // Semi-transparent box
      subtitleFontFamily: "Inter", // Auto-loads Google Font
      subtitleFontSize: "32px",
      subtitleBottom: "80px", // Position from bottom
    },
  }),
});

That's it. You get a finished URL back. No servers to manage, no fluent-ffmpeg callbacks to debug.


Method 2: The DIY Way (Node.js + OpenAI Whisper)

If you prefer to build this pipeline yourself, we will use Node.js, fluent-ffmpeg, and OpenAI's Whisper API.

Prerequisites

Before starting, make sure you have Node.js installed, the FFmpeg binary available on your PATH (fluent-ffmpeg is just a wrapper around the binary, not a bundled copy of it), and an OpenAI API key exported as OPENAI_API_KEY.

Step 1: Install Dependencies

npm install fluent-ffmpeg openai

(fs is built into Node.js and does not need to be installed.)

Step 2: Extract Audio

OpenAI's Whisper API accepts audio files. First, we need to strip the audio track from your video to reduce upload size and processing time.

const ffmpeg = require("fluent-ffmpeg");

function extractAudio(videoPath, audioPath) {
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .output(audioPath)
      .noVideo()
      .audioCodec("libmp3lame")
      .on("end", resolve)
      .on("error", reject)
      .run();
  });
}

Step 3: Transcribe with OpenAI Whisper

We upload the audio to OpenAI. The pricing is $0.006 per minute. That means a 10-minute video costs about $0.06 to transcribe.

const OpenAI = require("openai");
const fs = require("fs");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function transcribe(audioPath) {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
    response_format: "srt", // Request SRT format directly
  });
  return transcription; // Returns standard SRT string
}
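To budget longer batches, the cost math from above is just minutes times the per-minute rate. A tiny illustrative helper (the helper name is ours, not part of the OpenAI SDK):

```javascript
// Illustrative helper: estimate Whisper transcription cost.
const WHISPER_RATE_PER_MIN = 0.006; // USD per minute for whisper-1

function transcriptionCostUSD(durationMinutes) {
  // Round to 4 decimals to avoid floating-point noise
  return Number((durationMinutes * WHISPER_RATE_PER_MIN).toFixed(4));
}

console.log(transcriptionCostUSD(10)); // → 0.06 (a 10-minute video)
```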

Step 4: Burn Subtitles into Video (Hardcoding)

This is the tricky part. We need to take that SRT string, save it to a file, and use FFmpeg's subtitles filter to burn it in.
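For reference, the SRT Whisper returns is plain text: a numbered cue, a start --> end timestamp line (SRT uses a comma as the decimal separator), the caption text, and a blank line between cues. The cue text below is invented for illustration:

```srt
1
00:00:00,000 --> 00:00:02,400
Welcome back to the channel!

2
00:00:02,400 --> 00:00:05,100
Today we're burning subtitles with FFmpeg.
```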

Critical Note on Quality: Hardcoding subtitles always forces a full re-encode of the video stream, so the encoder settings matter. To control quality we use the -crf (Constant Rate Factor) flag: lower values mean higher quality. CRF 23 is the x264 default and a sensible standard; 18 is nearly visually lossless.

function burnSubtitles(videoPath, subtitlePath, outputPath) {
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .outputOptions([
        "-vf subtitles=" + subtitlePath, // The subtitles video filter
        "-crf 23", // Maintain visual quality
        "-preset fast", // Encoding speed
      ])
      .on("end", () => {
        console.log("Subtitles burned successfully!");
        resolve();
      })
      .on("error", reject)
      .save(outputPath);
  });
}
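Two practical gotchas with the subtitles filter: paths containing : or \ (e.g. Windows drive letters) break the filter parser, and styling beyond the defaults goes through the filter's real force_style option, which takes ASS style fields. A minimal sketch of both, assuming typical FFmpeg filter escaping rules (escapeSubtitlePath and subtitlesFilter are illustrative helpers of ours, not part of fluent-ffmpeg, and exact escaping can vary by FFmpeg version and shell):

```javascript
// Illustrative helpers for building a safer subtitles filter string.
function escapeSubtitlePath(p) {
  return p
    .replace(/\\/g, "/")   // Windows backslashes -> forward slashes
    .replace(/:/g, "\\:"); // escape colons for the filter parser
}

// force_style accepts ASS style fields: FontName, FontSize, Outline,
// PrimaryColour (in &HAABBGGRR form), and so on.
function subtitlesFilter(srtPath, style = {}) {
  const styleStr = Object.entries(style)
    .map(([key, value]) => `${key}=${value}`)
    .join(",");
  const base = `subtitles=${escapeSubtitlePath(srtPath)}`;
  return styleStr ? `${base}:force_style='${styleStr}'` : base;
}

console.log(
  subtitlesFilter("C:\\videos\\subs.srt", { FontName: "Inter", FontSize: 28, Outline: 2 })
);
// → subtitles=C\:/videos/subs.srt:force_style='FontName=Inter,FontSize=28,Outline=2'
```

You would then pass the returned string to -vf instead of concatenating the raw path.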

The Full Pipeline

Putting it all together, here is a complete script you can run. This handles audio extraction, transcription, and subtitle burning in one flow.

/* generate-subs.js */
const ffmpeg = require("fluent-ffmpeg");
const OpenAI = require("openai");
const fs = require("fs");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function extractAudio(videoPath, audioPath) {
  console.log("🔊 Extracting audio...");
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .output(audioPath)
      .noVideo()
      .audioCodec("libmp3lame")
      .on("end", resolve)
      .on("error", reject)
      .run();
  });
}

async function transcribe(audioPath) {
  console.log("🎙️ Transcribing with OpenAI Whisper...");
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
    response_format: "srt",
  });
  return transcription;
}

async function burnSubtitles(videoPath, subtitlePath, outputPath) {
  console.log("🔥 Burning subtitles (this may take a while)...");
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .outputOptions([
        `-vf subtitles=${subtitlePath}`,
        "-crf 23", // Balanced quality
        "-preset fast", // Faster encoding
      ])
      .on("end", resolve)
      .on("error", reject)
      .save(outputPath);
  });
}

(async () => {
  try {
    const INPUT_VIDEO = "input.mp4";
    const TEMP_AUDIO = "temp_audio.mp3";
    const TEMP_SRT = "temp_subs.srt";
    const OUTPUT_VIDEO = "output.mp4";

    await extractAudio(INPUT_VIDEO, TEMP_AUDIO);

    const srtData = await transcribe(TEMP_AUDIO);
    fs.writeFileSync(TEMP_SRT, srtData);

    await burnSubtitles(INPUT_VIDEO, TEMP_SRT, OUTPUT_VIDEO);

    console.log("✅ Done! Saved to", OUTPUT_VIDEO);

    // Cleanup temp files
    fs.unlinkSync(TEMP_AUDIO);
    fs.unlinkSync(TEMP_SRT);
  } catch (error) {
    console.error("❌ Error:", error);
  }
})();

Addressing Common Issues

  • Filter errors on Windows: the subtitles filter treats : and \ as special characters, so a path like C:\subs.srt must be escaped (or use forward slashes and relative paths).
  • Whisper upload limit: the OpenAI transcription endpoint caps uploads at 25 MB. Extracting audio as MP3 (as we do above) keeps most videos under that, but very long recordings may need to be split into chunks.
  • Missing or wrong fonts: FFmpeg renders subtitles via libass, which resolves fonts through fontconfig. If your chosen font isn't installed on the server, libass silently falls back to a default.


Conclusion

Building your own subtitle generator is a great learning project. You get granular control and only pay for the OpenAI credits ($0.006/min).

However, you inherit the maintenance of an FFmpeg pipeline, which includes managing file storage, server scaling for concurrent renders, and handling weird video formats.

If you need a reliable, scalable solution that just works, give the Orshot Video Generation API a try.
