Workshop Proposal for WWW9 – Amsterdam – May 2000

Applying a Navigation Layer to Digital Talking Books: SMIL, XML and NCX

Markku Hakkinen (Productivity Works, Inc.) and George Kerscher (Recording for the Blind and Dyslexic and the Daisy Consortium)

Digital Talking Books (DTBs) are an example of a SMIL 1.0 application that incorporates text and audio synchronization, in combination with a navigation structure to allow non-visual readers to easily locate and navigate significant structural elements in a book. A typical DTB consists of a source document (HTML or XML) that contains the full text (and possibly includes images or other content), a full audio narration (encoded in MP3, for example), a SMIL file that synchronizes the audio to the corresponding text elements in the source document, and, finally, a navigation structure definition file.

The Navigation Structure defines, external to both the source document and SMIL, what might simply be considered a table of contents. The table of contents contains a hierarchical extraction of the navigationally significant elements in the “book” and points to the SMIL files that correspond to the presentation start points.

The impetus behind the Navigation Structure was to give thin playback devices (e.g., dedicated digital talking book readers) an efficient way to let users explore and navigate an audio book. Because SMIL itself did not provide the semantics needed to encode the navigation elements, and because we could not expect thin devices to implement the DOM or to parse an entire book to determine a navigation structure, an external structure definition file was created.

Our first-generation Navigation Structure, the NCC (Navigation Control Center), was developed as part of the Daisy Consortium’s Daisy 2.0 DTB format. The Daisy 2.0 format is currently in use internationally by a variety of libraries and agencies serving the needs of the print disabled.

Subsequently, the US Library of Congress began working with Daisy on a formal NISO DTB standard. Out of that work, which has tracked developments in areas such as SMIL Boston and XHTML, we have developed an advanced model of the NCC, called NCX.

The NCX is a formal, extensible XML application that encodes a navigation layer on top of digital talking book content. A DTB producer, who may or may not be the original content author, uses DTB authoring tools to define which elements in the source content are navigationally significant. These elements become the exposed entry points into the DTB presentation, and are structured into a hierarchy that reflects the structure of the source content. Because source styles for DTBs are diverse, ranging from well-structured XML files, to flat, unstructured legacy HTML, to books that consist of audio only, the NCX can be used to provide a structured navigation flow over non-structured content.
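The hierarchy of entry points described above can be pictured with a small sketch. This is an illustrative model only, not the NCX schema: the NavPoint class, its field names, and the SMIL file names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class NavPoint:
    """One navigationally significant element (hypothetical model, not the NCX schema)."""
    label: str
    smil_target: str  # illustrative file name and fragment, e.g. "act1.smil#scene1"
    children: List["NavPoint"] = field(default_factory=list)


def walk(points: List[NavPoint], depth: int = 1) -> Iterator[Tuple[int, NavPoint]]:
    """Yield (depth, point) pairs in reading order (depth-first, pre-order)."""
    for point in points:
        yield depth, point
        yield from walk(point.children, depth + 1)


# A producer-defined hierarchy over the source content (invented sample data).
book = [
    NavPoint("Act I", "act1.smil#start", [
        NavPoint("Scene 1", "act1.smil#scene1"),
        NavPoint("Scene 2", "act1.smil#scene2"),
    ]),
    NavPoint("Act II", "act2.smil#start", [
        NavPoint("Scene 1", "act2.smil#scene1"),
    ]),
]

# A thin device could hold just this flat outline instead of parsing the book.
outline = [(depth, p.label, p.smil_target) for depth, p in walk(book)]
```

Each outline entry carries only a label, a level, and a pointer into a SMIL file, which is roughly the information a playback device would need to present the entry points without reading the source document.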

Within the DTB model using the NCX, we also have a notion of global vs. local navigation. Global navigation is defined as navigation to and between the elements exposed in the NCX. Local navigation is defined as reading at the granularity of elements in the source document, essentially everything below the NCX elements in terms of navigational significance, such as sentences or words.

For example, an NCX element may identify a scene in a play. Using an NCX-aware playback device, a user navigates to the selected scene and begins its presentation. The user, unable to understand a word spoken by a character, pauses playback, navigates among the lines spoken, and then requests that a word be spelled. The line elements, though navigable, are not exposed in the global structure of the NCX.
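The distinction between global and local navigation in this scenario can be sketched as follows. The Reader class, its method names, and the sample text are hypothetical; an actual playback device would resolve scenes through the NCX and lines through the SMIL and source document.

```python
from typing import Dict, List

# Global entry points (exposed in the navigation layer) and the local
# elements beneath each one. All content here is invented placeholder text.
SCENES: List[str] = ["Scene 1", "Scene 2"]
LINES: Dict[str, List[str]] = {
    "Scene 1": ["First line.", "Second line here."],
    "Scene 2": ["Another scene begins."],
}


class Reader:
    """Hypothetical reading position: one global index and one local index."""

    def __init__(self) -> None:
        self.scene_index = 0
        self.line_index = 0

    def next_global(self) -> str:
        """Global navigation: jump to the next exposed entry point (stops at the last)."""
        self.scene_index = min(self.scene_index + 1, len(SCENES) - 1)
        self.line_index = 0  # a new scene starts at its first line
        return SCENES[self.scene_index]

    def next_local(self) -> str:
        """Local navigation: step to the next line within the current scene."""
        current = LINES[SCENES[self.scene_index]]
        self.line_index = min(self.line_index + 1, len(current) - 1)
        return current[self.line_index]

    def spell(self, word_position: int) -> List[str]:
        """Spell one word of the current line, ignoring trailing punctuation."""
        line = LINES[SCENES[self.scene_index]][self.line_index]
        word = line.split()[word_position].strip(".,;:!?")
        return list(word)


reader = Reader()
second_line = reader.next_local()   # local step within Scene 1
letters = reader.spell(0)           # spell the first word of that line
next_scene = reader.next_global()   # global jump to Scene 2
```

The lines never appear in the list of global entry points, yet they remain reachable once the user has entered a scene.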

One advantage of the NCX is that it need not represent the same structure as the original author’s table of contents. Because a content producer (for example, an editor, anthologist, or educator) can determine the navigational significance of the source, alternate views are easy to create without modifying the source content. For example, a teacher can create an NCX that highlights only relevant quotations from a digital version of the complete works of Shakespeare. A student can follow the teacher’s view, but also revert to the navigation of the full content. We can envision complex presentations being shipped with multiple NCXs, providing different views or tours through the content.
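As a rough illustration of several views over a single unmodified source, the sketch below keeps one fixed set of entry points and two navigation lists that point into it. The view names, labels, and SMIL targets are assumptions for the example and are not drawn from any DTB specification.

```python
from typing import Dict, List, Tuple

# Entry points available in the (unmodified) presentation. Hypothetical targets.
ENTRY_POINTS = frozenset({
    "play.smil#act1",
    "play.smil#act1_scene1",
    "play.smil#quote_a",
    "play.smil#quote_b",
})

View = List[Tuple[str, str]]  # (label, SMIL target)

# Two navigation views shipped with the same content.
views: Dict[str, View] = {
    "full": [("Act I", "play.smil#act1"), ("Scene 1", "play.smil#act1_scene1")],
    "teacher": [("Quotation A", "play.smil#quote_a"), ("Quotation B", "play.smil#quote_b")],
}


def missing_targets(view: View) -> List[str]:
    """Return the targets in a view that do not exist in the content."""
    return [target for _, target in view if target not in ENTRY_POINTS]


def switch_view(name: str) -> View:
    """Select a view by name, refusing any view that points outside the content."""
    view = views[name]
    unresolved = missing_targets(view)
    if unresolved:
        raise ValueError(f"view {name!r} has unresolved targets: {unresolved}")
    return view


teacher_labels = [label for label, _ in switch_view("teacher")]
full_labels = [label for label, _ in switch_view("full")]
```

Switching views changes only which list of entry points is presented; the content they point into stays the same.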

We propose that the NCX model has a more general applicability to SMIL-based content, providing both accessibility and general usability benefits. This presentation will describe the DTB navigation issues and describe how the NCX provides a solution.

Acknowledgements:

We thank members of the NISO Digital Talking Book Committee, chaired by Michael Moodie of the Library of Congress National Library Service, for a spirited review of this paper.

References:

Daisy Consortium. Daisy 2.0 Specification. http://www.daisy.org/dtbook/spec/2/final/daisy-2.html

National Information Standards Organization. Digital Talking Book Standard (AQ). http://www.niso.org/commitaq.html