NCDAE Tips and Tools: Web Captioning

Created: December 2006

The following is a brief introduction to the principles and potential challenges of captioning for the web. It is meant to be a starting point, not a definitive guide to captioning. If you are interested in learning more, read the NCDAE captioning article and the several captioning resources provided by WebAIM. WebAIM is a partner with NCDAE.

What are Captions?

Captions are text equivalents of the spoken word and other audio content. They make the audio content of web multimedia accessible to those who do not have access to audio, primarily people who are Deaf or hard of hearing. Captioning can be expensive, and a little daunting at first, but it is also a very important part of accessible design.

Common web accessibility guidelines indicate that captions should be:

Synchronized - the text content should appear at approximately the same time as the corresponding audio
Equivalent - content provided in captions should be equivalent to that of the spoken word
Accessible - caption content should be readily accessible and available to those who need it

Captioning media

There are five main steps to captioning a file for the web.

1. Create or obtain a transcript

There are a few ways that a transcript can be created, and each has its advantages and disadvantages.

From production script: If a script is available, you may already have a transcript.
Generated by stenographer: A stenographer can create a transcript in a very short time, but this process can be extremely expensive, usually $75-$100 per hour.
Typed by hand: This is usually the most time-consuming way to create a transcript, but it may be the most cost-effective if fast production time is not critical. A fast typist can create a transcript at a much lower cost than a stenographer.
Created using voice recognition: Although voice recognition is an exciting alternative, the technology is far from perfect. In order for voice recognition to be reliable, a person must train the software and speak very clearly. Some people take advantage of voice recognition through a process called "shadow speaking," in which a person who has trained the software repeats the live or recorded speech into it.

2. Segment into individual caption displays and add speaker names

Before captions can be created, text must be chunked into smaller units of one or two short sentences. This is usually accomplished by adding manual line breaks between units (hit Enter twice). New speakers or a change in speaker should also be identified by starting the line with the person's name, a colon and a space. Sometimes you will see the speaker identified in a separate line, but this is usually a waste of space.

Note: This step can be combined with Step 1.
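
The segmenting described in Step 2 can be roughed out in a few lines of Python. This is only a sketch under some assumptions that are not part of the original guidance: each paragraph of the transcript starts with the speaker's name, a colon and a space; sentences end with a period, question mark or exclamation point; and the function name segment_transcript is hypothetical. A human should still review the result, since the sentence splitting is naive and any line containing a colon and a space will be read as having a speaker label.

```python
import re

def segment_transcript(text, max_sentences=2):
    """Split a transcript into caption units of at most max_sentences sentences.

    Assumes one speaker turn per line, optionally starting with "Name: ".
    """
    units = []
    for paragraph in text.strip().split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Pull off a leading speaker label such as "Maria: " if present.
        match = re.match(r"^([A-Z][\w .'-]*): (.*)$", paragraph)
        if match:
            speaker, speech = match.group(1), match.group(2)
        else:
            speaker, speech = None, paragraph
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", speech)
        for i in range(0, len(sentences), max_sentences):
            chunk = " ".join(sentences[i:i + max_sentences])
            # Name the speaker only on the first unit of a speaker turn.
            if speaker and i == 0:
                chunk = f"{speaker}: {chunk}"
            units.append(chunk)
    return units

transcript = """Maria: Welcome to the course. Today we cover captions. They matter a lot.
John: Thanks, Maria. Let's begin."""

captions = segment_transcript(transcript)
# Separate the units with a blank line, as you would by hitting Enter twice.
print("\n\n".join(captions))
```

The speaker's name stays on the same line as the text rather than on a separate line, which matches the space-saving convention recommended above.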

3. Assign timecode for each caption to synchronize with audio

Several programs exist to help people synchronize text transcripts with media. The two most popular tools are MAGpie (a free tool) and Hi-Caption. For more information on using these tools, see the following tutorials.

4. Create appropriate caption files (QTtext, RealText, SAMI)

Each media player uses its own format for caption files. This can be frustrating if your media files exist in more than one format, but many tools, including those listed in Step 3, can create files in these different formats. The following is a list of the most common file types.

SAMI (Synchronized Accessible Media Interchange) – The file that contains caption data with timing information for Windows Media Player.
Quicktime Text Track – The file that contains captions and timing information for Quicktime media.
RealText – The file that contains caption and timecode data for RealPlayer.
SMIL (Synchronized Multimedia Integration Language) – The layout language used by Quicktime and RealPlayer.

There are also some tools that allow you to create captions for Adobe Flash content, although there is not currently a single specified format for captions in Flash.
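
To give a feel for what one of these caption files contains, the following Python sketch writes timed captions into a minimal SAMI-style document. It is an illustration only: the function name to_sami, the class name ENUSCC and the style rules are assumptions chosen for the example, and the exact markup a given version of Windows Media Player accepts should be checked against its documentation or against the output of a captioning tool such as those in Step 3.

```python
from html import escape

def to_sami(captions):
    """Build a minimal SAMI-style document from (start_ms, text) pairs.

    start_ms is the time, in milliseconds, at which the caption should appear.
    """
    lines = [
        "<SAMI>",
        "<HEAD>",
        '<STYLE TYPE="text/css"><!--',
        # Illustrative styling: white sans serif text on a black background.
        "P { font-family: Arial; color: white; background-color: black; }",
        # Assumed class describing an English closed-caption track.
        ".ENUSCC { Name: English; lang: en-US; SAMIType: CC; }",
        "--></STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for start_ms, text in captions:
        # Escape &, < and > so caption text cannot break the markup.
        safe_text = escape(text, quote=False)
        lines.append(f"<SYNC Start={start_ms}><P Class=ENUSCC>{safe_text}</P></SYNC>")
    lines += ["</BODY>", "</SAMI>"]
    return "\n".join(lines)

sami = to_sami([
    (0, "Maria: Welcome to the course."),
    (1500, "Today we cover captions & transcripts."),
])
print(sami)
```

Each caption becomes one SYNC entry whose Start value carries the timecode assigned in Step 3. QuickTime Text Track and RealText files hold the same two pieces of information, text and timing, in their own syntax.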

5. Combine with media and distribute the captioned media

There is no easy way to learn how to combine media and caption files. It can be a difficult process. If you are interested in captioning for a specific format, the following WebAIM tutorials might be helpful.

Captioning Accessibility Challenges and Solutions

The following table lists common challenges associated with captions, the people with disabilities that might be impacted and possible solutions to these challenges.

Accessibility challenge	Disability type(s)	Solution(s)
A person cannot hear or easily understand audio or video content.	Deaf, Cognitive, Low literacy, non-native language, All	Provide captions for all video content. Provide captions for all live audio and video content. Provide transcripts for recorded audio content.
Captions may be too long, causing part of the caption to be hidden, or making it difficult to read.	All, Cognitive	Each caption should be no more than two lines long.
Captioned media is not accessible to a person relying on a refreshable Braille device.	Deaf Blind	Provide a text transcript in addition to captions.
In a video, there may be important content conveyed visually that is not included in the captions.	Blind	Provide Audio Descriptions. Ensure that all visual content is part of the audio as well. For example, in a video of a PowerPoint presentation, make sure all the slide content is read by the presenter.
Embedded media players may not be keyboard-accessible.	Blind, Cannot use a mouse	Avoid using embedded media players. If an embedded player is necessary, ensure that it is keyboard-accessible.
Small videos may be hard to view	Low Vision	When possible, offer the option of a higher-resolution video.
Large videos may be very difficult to view by someone with a slow internet connection	All users	When possible, offer the option of a lower-resolution video. Optimize the video so that the file is as small as possible.
Small font size and poor fonts may make a caption unreadable	Low vision, all users	14 pt. font size is ideal. White text on a black background is usually best. Use a clear sans serif font such as Helvetica or Arial.
It may be difficult for some Deaf people to read captions, as English is not the first language for many Deaf people (it is ASL or another sign language).	Deaf	It may be appropriate to provide a video with an ASL (in the U.S.) interpreter in addition to a captioned video. This approach is not always recommended, because it can be very expensive, and because an ASL alternative will not benefit as many people as a captioned video.
Nonverbal audio cues are not always included in captions.	Deaf, Anyone who is unable to hear the audio	Ensure that captions include all important audio cues including non-verbal sounds and change of speaker.
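
The two-line guideline in the table can be checked automatically before caption files are created. The Python sketch below is one possible approach, not a prescribed method. The 32-character line width is an assumption made for the example, since the real width depends on the player, font size and video dimensions, and the names caption_lines and is_too_long are hypothetical.

```python
import textwrap

MAX_CHARS_PER_LINE = 32  # assumed display width; adjust for your player
MAX_LINES = 2            # from the guideline: no more than two lines

def caption_lines(text, width=MAX_CHARS_PER_LINE):
    """Wrap a caption the way a player with the given line width might."""
    return textwrap.wrap(text, width=width)

def is_too_long(text, width=MAX_CHARS_PER_LINE, max_lines=MAX_LINES):
    """Return True if the caption needs more than max_lines lines."""
    return len(caption_lines(text, width)) > max_lines

short_caption = "They matter a lot."
long_caption = ("This caption keeps going well past what fits "
                "comfortably on two short lines of text.")

for caption in (short_caption, long_caption):
    status = "split this caption" if is_too_long(caption) else "ok"
    print(f"{status}: {caption}")
```

Captions that are flagged can be split into smaller units during Step 2.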

Real-time Captions

Real-time, streaming web multimedia introduces additional challenges for captioning: 1) audio information must be converted into text in real time, and 2) the text captions must be delivered to the end user so that they are synchronized with the audio. Both requirements are difficult to meet for live web multimedia.