Web Captioning and Education
This article was written to set the framework for the Web Captioning and its uses in Education webcast. It provides an overview of captioning technologies, implications for education, and ideas for future development.
What is Captioning?
Captions are text versions of the spoken word. Captions allow the audio content of web multimedia to be accessible to those who do not have access to audio, primarily the Deaf and hard-of-hearing.
Common web accessibility guidelines indicate that captions should be:
- Synchronized - the text content should appear at approximately the same time that audio would be available
- Equivalent - content provided in captions should be equivalent to that of the spoken word
- Accessible - caption content should be readily accessible and available to those who need it
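The "synchronized" requirement can be made concrete with a small sketch. It assumes captions are stored as timed cues and looks up which caption should be visible at a given playback time. The names Cue and active_caption are hypothetical and used only for illustration; real media players read format-specific caption files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str     # caption text, including non-speech sounds where relevant


def active_caption(cues: List[Cue], t: float) -> Optional[str]:
    """Return the caption text that should be on screen at time t, if any."""
    for cue in cues:
        # A cue is visible from its start time up to, but not including, its end time.
        if cue.start <= t < cue.end:
            return cue.text
    return None


cues = [
    Cue(0.0, 2.5, "Welcome to the course."),
    Cue(2.5, 6.0, "[door closes]"),
]
```

Calling active_caption(cues, 1.0) returns the first caption, and any time after 6.0 seconds returns None, meaning no caption is displayed.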
Captions should not be mistaken for subtitles. Captions are in the same language as the audio, whereas subtitles usually provide a translation of the audio or other visual language. Because captions are primarily targeted to the Deaf and hard-of-hearing, captions are usually closed (meaning they can be turned on and off), whereas subtitles, which are intended for everyone, are usually open (they cannot be turned off). The primary distinction is that captions are vital to ensure accessibility to the Deaf and hard-of-hearing and, as such, they provide a verbatim, textual equivalent of all necessary auditory information. Subtitles, on the other hand, can provide additional, clarifying information that may not be strictly necessary for accessibility.
Web Captioning Technologies
There are innumerable multimedia technologies on the web, from streaming video to teleconferencing to podcasts. Many of these technologies support captioning; many do not. Web multimedia can be placed into two categories: real-time and archival.
For archival media (meaning anything that is not live or real-time), the primary media players on the web - Windows Media Player, QuickTime, RealPlayer, and Adobe Flash - all support captioning. However, the technologies for providing captions in these media players vary greatly. Windows Media Player technology uses the Synchronized Accessible Media Interchange (SAMI) format for defining the timing and display of captions. QuickTime uses QuickTime Text Track. RealPlayer uses RealText. SAMI, QuickTime Text Track, and RealText are all proprietary formats and are not compatible with other media players. QuickTime and RealPlayer also use Synchronized Multimedia Integration Language (SMIL) to define the positioning and layout of the captions within the media player. Adobe Flash supports captions in a variety of ways, though primarily the captions are either defined directly within the multimedia timeline or by using an XML format that defines when the captions should display. Few tools natively support real-time captions.
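As an illustration of what a caption file for one of these players looks like, the following sketch builds a minimal SAMI document from a list of timed captions. The function name to_sami is hypothetical, and the style block and the ENUSCC class name are assumptions modeled on commonly published SAMI samples. Start times are given in milliseconds.

```python
from html import escape
from typing import List, Tuple


def to_sami(cues: List[Tuple[int, str]]) -> str:
    """Build a minimal SAMI document from (start_ms, text) pairs."""
    lines = [
        "<SAMI>",
        "<HEAD>",
        '<STYLE TYPE="text/css"><!--',
        "P { font-family: Arial; color: white; }",
        # Assumed class name identifying the English caption track.
        ".ENUSCC { Name: English; lang: en-US; }",
        "--></STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for start_ms, text in cues:
        # Each SYNC element marks the time at which its caption appears.
        lines.append(
            f"<SYNC Start={start_ms}><P Class=ENUSCC>{escape(text)}</P></SYNC>"
        )
    lines += ["</BODY>", "</SAMI>"]
    return "\n".join(lines)


sami_doc = to_sami([(0, "Welcome to the course."), (2500, "[door closes]")])
```

The resulting text would be saved alongside the media file. The equivalent QuickTime or RealPlayer caption file would need a different syntax, which is the incompatibility described above.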
The World Wide Web Consortium is developing a Timed Text specification to help address the incompatible formats currently used on the web for captioning and subtitling.
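For comparison, a caption file in the Timed Text format might be generated as in the sketch below. It assumes the element names (tt, body, div, p), the begin and end attributes, and the namespace of the Timed Text Markup Language recommendation as later published by the W3C, so details may differ from the drafts available when this article was written. The function name to_timed_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the published Timed Text Markup Language recommendation (assumed here).
TT_NS = "http://www.w3.org/ns/ttml"


def to_timed_text(cues):
    """Build a minimal Timed Text document from (begin, end, text) triples."""
    # Register the namespace as the default so elements are written without a prefix.
    ET.register_namespace("", TT_NS)
    tt = ET.Element(f"{{{TT_NS}}}tt")
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    for begin, end, text in cues:
        # Each p element carries one caption and the clock times between which it displays.
        p = ET.SubElement(div, f"{{{TT_NS}}}p", {"begin": begin, "end": end})
        p.text = text
    return ET.tostring(tt, encoding="unicode")


tt_doc = to_timed_text([
    ("00:00:00.000", "00:00:02.500", "Welcome to the course."),
    ("00:00:02.500", "00:00:06.000", "[door closes]"),
])
```

Because the format is plain XML, any player or tool that adopts it can read the same file, which is the compatibility benefit the specification is meant to provide.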
Several tools are available for the generation and delivery of web captioning. Examples include MAGpie and Hi-Caption, which can be used to generate captions for archived multimedia.
Support for captioning in other web technologies, such as video conferencing, VoIP (Voice-over-Internet Protocol), and podcasting, is rather limited. In many cases, external captioning services and technologies can be used in conjunction with these technologies to provide the necessary accessibility. However, native support for captioning in these technologies would do much to increase accessibility for persons with disabilities.
Real-time, streaming web multimedia introduces an additional challenge for captioning. Generating real-time captions involves two difficulties: 1) audio information must be converted into text in real time, and 2) the text captions must be delivered to the end user so that they are synchronized with the audio.
Captions have also been found to provide additional benefit to end users who may not be Deaf or hard-of-hearing, including many with cognitive disabilities, language deficits, and even those whose computer hardware may not support audio or have audio enabled (such as computers in a public library).
Implications for Education
There is no question that the use of web multimedia in education is increasing. Many online education courses are provided entirely using web multimedia. The education system has an ethical and, in many cases, legal obligation to provide equivalent access to individuals who are Deaf and hard-of-hearing. Despite this, the vast majority of educational web multimedia is not captioned. There are many reasons why this may be the case:
- Lack of awareness. Those producing web multimedia do not understand the accessibility implications.
- Lack of policies or standards that require or even suggest accessibility.
- Captioning is considered cost prohibitive.
- Lack of technical knowledge to provide captions and/or tools to generate captions.
- Use of technologies that do not natively support captions (and no method for providing captions using alternative technologies).
Increasing the accessibility of web multimedia should be a top priority for all in education. The issues listed above are all ones that can be overcome to ensure accessibility to the Deaf and hard-of-hearing.
The Future
Voice recognition has long been regarded as a 'silver bullet' that will greatly ease the development of captions. While voice recognition is being used in controlled environments (primarily one speaker with 'trained' voice recognition software) to provide captions, it is not yet a viable solution for optimal accessibility of web multimedia. Of great interest should be technologies that can convert digitized audio into text and perhaps provide captions for any digitized multimedia format in real time. (Of note is Automatic Sync Technologies, which provides a service that generates synchronized captions by syncing text from a provided transcript with the audio portion of multimedia.)
The Timed Text standard needs to be refined and, when complete, needs to be adopted and supported by web multimedia technologies. This standard could then provide a format for captioning that is compatible with many technologies. It would also provide a standard methodology for the creation and delivery of captions.
Most of all, there needs to be increased awareness and training in web captioning technologies. When education providers understand the issues of accessibility and know how to address them, they typically provide the accessibility that is vital for many, primarily those who are Deaf and hard-of-hearing.