
Video content is dominant right now, but here is a statistic that matters: over 80% of social media videos are watched on mute. If you aren't burning subtitles into your videos, you are losing the vast majority of your audience.

If you have researched this topic—perhaps scrolling through Reddit threads on r/ffmpeg or r/node—you likely ran into the same common pain points:

  1. FFmpeg is complex: "Filter complex", "stream mapping", and syntax errors are nightmares.
  2. Quality Loss: Hardcoding subtitles requires re-encoding the video. If you don't know what CRF or bitrate to set, your crystal clear 4K video turns into a blocky mess.
  3. Styling Limitations: Standard SRT files are ugly. Making them look like "TikTok style" captions requires advanced libass configs or custom drawing filters.

In this guide, we will look at two ways to solve this: the Automated API method (using Orshot) and the Manual Node.js method (building it yourself).


Method 1: The Automated Way (Orshot API)

If you want to skip the engineering headaches of server-side FFmpeg, GPUs, and file storage, Orshot handles the entire pipeline in a single API request.

Orshot's Video Generation API automatically:

  • Extracts audio from your video.
  • Transcribes it using high-quality AI.
  • Generates the subtitles with perfect sync.
  • Burns them in with custom styling (colors, fonts, backgrounds).
  • Returns the final .mp4.

Example: Auto-Generate Subtitles

You simply pass your video URL and styling preferences. You can even use Google Fonts instantly.

const response = await fetch("https://api.orshot.com/v1/studio/render", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer YOUR_API_KEY",
  },
  body: JSON.stringify({
    templateId: "your-video-template",
    response: { format: "mp4" },
    videoOptions: {
      // The magic happens here
      subtitleSource: "https://your-bucket.com/video.mp4",
      subtitleColor: "#FFFFFF",
      subtitleBackground: "rgba(0,0,0,0.6)", // Semi-transparent box
      subtitleFontFamily: "Inter", // Auto-loads Google Font
      subtitleFontSize: "32px",
      subtitleBottom: "80px", // Position from bottom
    },
  }),
});

That's it. You get a finished URL back. No servers to manage, no fluent-ffmpeg callbacks to debug.


Method 2: The DIY Way (Node.js + OpenAI Whisper)

If you prefer to build this pipeline yourself, we will use Node.js, fluent-ffmpeg, and OpenAI's Whisper API.

Prerequisites

Before starting, make sure you have Node.js installed, the FFmpeg binary available on your PATH (fluent-ffmpeg is just a wrapper around the binary, not a bundled copy of it), and an OpenAI API key exported as OPENAI_API_KEY.

Step 1: Install Dependencies

npm install fluent-ffmpeg openai

(fs is built into Node.js and does not need to be installed.)

Step 2: Extract Audio

OpenAI's Whisper API accepts audio files. First, we need to strip the audio track from your video to reduce upload size and processing time.

const ffmpeg = require("fluent-ffmpeg");

function extractAudio(videoPath, audioPath) {
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .output(audioPath)
      .noVideo()
      .audioCodec("libmp3lame")
      .on("end", resolve)
      .on("error", reject)
      .run();
  });
}

Step 3: Transcribe with OpenAI Whisper

We upload the audio to OpenAI. The pricing is $0.006 per minute. That means a 10-minute video costs about $0.06 to transcribe.

const OpenAI = require("openai");
const fs = require("fs");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function transcribe(audioPath) {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
    response_format: "srt", // Request SRT format directly
  });
  return transcription; // Returns standard SRT string
}
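To budget longer batches, the cost math from above is just minutes times the per-minute rate. A tiny illustrative helper (the helper name is ours, not part of the OpenAI SDK):

```javascript
// Illustrative helper: estimate Whisper transcription cost.
const WHISPER_RATE_PER_MIN = 0.006; // USD per minute for whisper-1

function transcriptionCostUSD(durationMinutes) {
  // Round to 4 decimals to avoid floating-point noise
  return Number((durationMinutes * WHISPER_RATE_PER_MIN).toFixed(4));
}

console.log(transcriptionCostUSD(10)); // → 0.06 (a 10-minute video)
```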

Step 4: Burn Subtitles into Video (Hardcoding)

This is the tricky part. We need to take that SRT string, save it to a file, and use FFmpeg's subtitles filter to burn it in.
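For reference, the SRT Whisper returns is plain text: a numbered cue, a start --> end timestamp line (SRT uses a comma as the decimal separator), the caption text, and a blank line between cues. The cue text below is invented for illustration:

```srt
1
00:00:00,000 --> 00:00:02,400
Welcome back to the channel!

2
00:00:02,400 --> 00:00:05,100
Today we're burning subtitles with FFmpeg.
```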

Critical Note on Quality: Hardcoding subtitles always forces a full re-encode of the video stream, so the encoder settings matter. To control quality we use the -crf (Constant Rate Factor) flag: lower values mean higher quality. CRF 23 is the x264 default and a sensible standard; 18 is nearly visually lossless.

function burnSubtitles(videoPath, subtitlePath, outputPath) {
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .outputOptions([
        "-vf subtitles=" + subtitlePath, // The subtitles video filter
        "-crf 23", // Maintain visual quality
        "-preset fast", // Encoding speed
      ])
      .on("end", () => {
        console.log("Subtitles burned successfully!");
        resolve();
      })
      .on("error", reject)
      .save(outputPath);
  });
}
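Two practical gotchas with the subtitles filter: paths containing : or \ (e.g. Windows drive letters) break the filter parser, and styling beyond the defaults goes through the filter's real force_style option, which takes ASS style fields. A minimal sketch of both, assuming typical FFmpeg filter escaping rules (escapeSubtitlePath and subtitlesFilter are illustrative helpers of ours, not part of fluent-ffmpeg, and exact escaping can vary by FFmpeg version and shell):

```javascript
// Illustrative helpers for building a safer subtitles filter string.
function escapeSubtitlePath(p) {
  return p
    .replace(/\\/g, "/")   // Windows backslashes -> forward slashes
    .replace(/:/g, "\\:"); // escape colons for the filter parser
}

// force_style accepts ASS style fields: FontName, FontSize, Outline,
// PrimaryColour (in &HAABBGGRR form), and so on.
function subtitlesFilter(srtPath, style = {}) {
  const styleStr = Object.entries(style)
    .map(([key, value]) => `${key}=${value}`)
    .join(",");
  const base = `subtitles=${escapeSubtitlePath(srtPath)}`;
  return styleStr ? `${base}:force_style='${styleStr}'` : base;
}

console.log(
  subtitlesFilter("C:\\videos\\subs.srt", { FontName: "Inter", FontSize: 28, Outline: 2 })
);
// → subtitles=C\:/videos/subs.srt:force_style='FontName=Inter,FontSize=28,Outline=2'
```

You would then pass the returned string to -vf instead of concatenating the raw path.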

The Full Pipeline

Putting it all together, here is a complete script you can run. This handles audio extraction, transcription, and subtitle burning in one flow.

/* generate-subs.js */
const ffmpeg = require("fluent-ffmpeg");
const OpenAI = require("openai");
const fs = require("fs");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function extractAudio(videoPath, audioPath) {
  console.log("🔊 Extracting audio...");
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .output(audioPath)
      .noVideo()
      .audioCodec("libmp3lame")
      .on("end", resolve)
      .on("error", reject)
      .run();
  });
}

async function transcribe(audioPath) {
  console.log("🎙️ Transcribing with OpenAI Whisper...");
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
    response_format: "srt",
  });
  return transcription;
}

async function burnSubtitles(videoPath, subtitlePath, outputPath) {
  console.log("🔥 Burning subtitles (this may take a while)...");
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .outputOptions([
        `-vf subtitles=${subtitlePath}`,
        "-crf 23", // Balanced quality
        "-preset fast", // Faster encoding
      ])
      .on("end", resolve)
      .on("error", reject)
      .save(outputPath);
  });
}

(async () => {
  try {
    const INPUT_VIDEO = "input.mp4";
    const TEMP_AUDIO = "temp_audio.mp3";
    const TEMP_SRT = "temp_subs.srt";
    const OUTPUT_VIDEO = "output.mp4";

    await extractAudio(INPUT_VIDEO, TEMP_AUDIO);

    const srtData = await transcribe(TEMP_AUDIO);
    fs.writeFileSync(TEMP_SRT, srtData);

    await burnSubtitles(INPUT_VIDEO, TEMP_SRT, OUTPUT_VIDEO);

    console.log("✅ Done! Saved to", OUTPUT_VIDEO);

    // Cleanup temp files
    fs.unlinkSync(TEMP_AUDIO);
    fs.unlinkSync(TEMP_SRT);
  } catch (error) {
    console.error("❌ Error:", error);
  }
})();

Addressing Common Issues

  • Filter errors on Windows: the subtitles filter treats : and \ as special characters, so a path like C:\subs.srt must be escaped (or use forward slashes and relative paths).
  • Whisper upload limit: the OpenAI transcription endpoint caps uploads at 25 MB. Extracting audio as MP3 (as we do above) keeps most videos under that, but very long recordings may need to be split into chunks.
  • Missing or wrong fonts: FFmpeg renders subtitles via libass, which resolves fonts through fontconfig. If your chosen font isn't installed on the server, libass silently falls back to a default.


Conclusion

Building your own subtitle generator is a great learning project. You get granular control and only pay for the OpenAI credits ($0.006/min).

However, you inherit the maintenance of an FFmpeg pipeline, which includes managing file storage, server scaling for concurrent renders, and handling weird video formats.

If you need a reliable, scalable solution that just works, give the Orshot Video Generation API a try.
