Captioning

I recently started captioning movies as an occasional hobby and figured I'd put together my notes, both in general and for specific files.

Note: These pages are super rough because the site as a whole isn't ready. I had to get something online anyway: I've been sitting on a much-improved copy of my original Radioactive Dreams captions for a long time, and I didn't want to push it to OpenSubtitles until there was something up at cdvr.org, since my opening and closing notes reference this site. The old WIP has some significant errors, so this isn't just pedantry or perfectionism. Anyway, that's why this page is so rough in content and has default browser styles.

Projects

VTT (a.k.a. WebVTT) vs SRT?

These are two different but very similar caption formats. The main differences are:

I prefer to write in VTT, but provide both files for each project, since I've had minor issues with VTT in VLC (specifically, with overlapping captions).

Principles

I try to stay as accurate to the dialogue as I can. I know this contradicts the received wisdom that captions should be paraphrased for efficient reading, but paraphrasing feels way too utilitarian, too at risk of flattening out interesting or clever or fun or insightful or interconnected dialogue. For instance, what if there's a connection between a short line at one point and a longer line somewhere else that's lost because the longer line is paraphrased? Besides, people read at different speeds and no captions file needs to be the definitive version.

I leave at least the time of one frame (at 24fps) between each line and the next, so there's a clear visual change as one line blinks out and another pops in.
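
To put a number on that: one frame at 24fps is 1000 / 24 ≈ 41.67 ms, so rounding up gives a gap of 42 ms. Here's a minimal Python sketch of enforcing it, assuming cues are (start_ms, end_ms, text) tuples sorted by start time and not deliberately overlapping. The name enforce_gap and the tuple layout are made up for the example.

```python
import math

# One frame at 24 fps is about 41.67 ms; round up so the gap is never shorter.
FRAME_MS = math.ceil(1000 / 24)


def enforce_gap(cues, gap_ms=FRAME_MS):
    """Pull each cue's end time back so at least gap_ms separates it
    from the start of the next cue. Start times are never moved."""
    fixed = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            end = min(end, next_start - gap_ms)
        # Guard against a negative duration if two cues start very close together.
        fixed.append((start, max(end, start), text))
    return fixed
```

With this, a cue ending at 1000 ms right before one starting at 1000 ms gets trimmed to end at 958 ms.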

I stick to a maximum of 60 characters per line, and divide longer individual captions into lines of roughly similar length to get under that limit while avoiding breaking linguistic units across lines. However, I try to stay well under 60 characters, ideally going with half that or even less.
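
The mechanical half of that rule can be sketched in Python. This is only an approximation: it finds the fewest lines that fit under the limit and then the narrowest wrap width that still gives that many lines, which tends to even the lines out. It knows nothing about linguistic units, so that part stays a judgment call. The name balanced_lines is made up for the example; textwrap is from the Python standard library.

```python
import math
import textwrap


def balanced_lines(text, limit=60):
    """Wrap text into the fewest lines of at most `limit` characters,
    using the narrowest width that still fits that line count."""
    words = text.split()
    if not words:
        return []
    joined = " ".join(words)
    fewest = max(1, math.ceil(len(joined) / limit))
    for count in range(fewest, len(words) + 1):
        # Start from the ideal even width and widen until the text fits.
        for width in range(math.ceil(len(joined) / count), limit + 1):
            lines = textwrap.wrap(joined, width=width,
                                  break_long_words=False,
                                  break_on_hyphens=False)
            if len(lines) <= count:
                return lines
    # Only reached if a single word is longer than the limit.
    return textwrap.wrap(joined, width=limit,
                         break_long_words=False, break_on_hyphens=False)
```

Passing a smaller limit, such as 30, matches the preference for lines well under the maximum.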

I try to keep each caption on screen for at least a second, and normally group bits of related dialogue into captions spanning 3–5 seconds. If a piece of dialogue is very quick, I leave its caption up for a total of 1 second or so.
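
The one-second minimum is easy to check mechanically. Here's a small Python sketch, again assuming cues are (start_ms, end_ms, text) tuples sorted by start time, with a 42 ms gap (one frame at 24fps, rounded up) kept before the next cue. The name extend_short_cues and its parameters are made up for the example.

```python
def extend_short_cues(cues, min_ms=1000, gap_ms=42):
    """Stretch cues shorter than min_ms where there is room,
    without running into the next cue. Cues are never shortened."""
    out = []
    for i, (start, end, text) in enumerate(cues):
        wanted = max(end, start + min_ms)
        if i + 1 < len(cues):
            # Leave gap_ms clear before the next cue starts.
            wanted = min(wanted, cues[i + 1][0] - gap_ms)
        out.append((start, max(end, wanted), text))
    return out
```

When the next cue starts too soon, the short cue just gets as much extra time as fits.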

I caption even quiet dialogue. I only skip lines that blend into the background sound, and even then I caption them if they're only buried because of obviously bad sound mixing.

I prefer to write full captions (including sound effects, song lyrics, etc.), not just subtitles, firstly because sound design is a key part of much cinema/TV/etc., secondly because they're rarer than sub files, and thirdly (and more selfishly) because in translating soundscapes and individual sound effects to text there's room for my own expression.

I match my style to the film. For instance, for Radioactive Dreams I went with a maximalist style, adding lots of expressive SFX notes with a kinda jokey approach. I wouldn't take that exact approach for something like, say, Angel's Egg.

Formatting

Here's how I format:

Dialogue
[SFX and instrumental music notes]
{significant on-screen text}
♪ music lyrics (backup vocals) ♪
♪ repeated lyrics (×2) ♪
<i>voiceovers, flashbacks, etc.</i>

♪ All wrap formatting, like square brackets,
braces, music notes, and italics tags,
can extend over multiple lines ♪
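
Since wrap formatting can span lines, it's easy to forget a closing mark. Here's a rough Python sketch of a check for that, run on the full text of one caption. It only counts openers and closers, so it won't catch wrappers that are in the wrong order. The name unbalanced_wrappers is made up for the example.

```python
def unbalanced_wrappers(cue_text):
    """Return the wrapper pairs in one caption's text whose opening
    and closing marks don't match in number."""
    problems = []
    for opener, closer in (("[", "]"), ("{", "}"), ("<i>", "</i>")):
        if cue_text.count(opener) != cue_text.count(closer):
            problems.append(opener + closer)
    # Music notes open and close with the same symbol, so expect an even count.
    if cue_text.count("♪") % 2:
        problems.append("♪")
    return problems
```

An empty list means every wrapper that was opened in the caption was also closed.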

Extra formatting stuff:

When two people on-screen talk in the same caption, each line of dialogue starts with an en dash (–).

Off-screen and voiceover dialogue starts with the name of the character speaking, in all-caps (e.g. PHIL, VOICE UPSTAIRS). Since this makes it clear who says what, I don't use en dashes in these cases.

I ignore backup vocals in lyrics except where they differ from the main vocals. For example, Eat You Alive (from the Radioactive Dreams soundtrack) has the refrain “Gonna eat (eat) you (you), eat you alive”, where the backups just echo the lead and don't add meaning, so I left them out.

“Best” practice is wildly inconsistent

By “best practice” I mean what's publicly reported as such. I don't have any textbooks on subtitles or captioning (if there are any; I'm guessing there are), I don't follow any expert bloggers, I've never taken a training course, and I don't do it professionally. What I really mean by “best practice” is basically advice pushed out by slop blogs and social media accounts.

The issue is that every major question has multiple given answers, all presented as the One True Way. For example: When you have to split a sub/caption into multiple lines of text, where should you split it? Should you make the parts as even in width as possible? Divide the text at linguistic units? By dialogue flow? Everyone seems to disagree, and some people even advise you to use more than one of these rules even though they're almost always incompatible. You'll almost never be able to split text evenly and split it at clauses/subclauses, or split it evenly and split it at a natural pause or breath in the dialogue. That's just not how dialogue or real speech works!