David Shrimpton, University of Kent at Canterbury
Christopher Dobbyn, Oxford Brookes University

There is now a general consensus that Digital TV and the World Wide Web are
converging technologies, in that Digital TV users will shortly be able to
access, present and interact with WWW documents by means of their TV
service, combining these documents seamlessly with TV pictures and sound.
However, no agreed universal models or mechanisms exist for the combination
of the two technologies. We are currently researching models and
mechanisms for the integration of the existing interactive Digital TV
technologies with the WWW. We are developing models for an integrated
service and investigating extensions to the MHEG-5 component of the DAVIC
specification for Digital TV and to the various WWW systems, by means of
which the two may be interfaced. The project objectives are:

1. To research and implement a model for integration of the digital
television standards with the current and emerging World Wide Web standards.

2. To provide a set of tools to be used by content providers that will
enable digital television users to access, present and interact with WWW
documents.

3. To contribute specific proposals to, and take part in the general
discussions of, the appropriate standards bodies (e.g. ISO) and industrial
consortia (e.g. DAVIC).

Future web browsers and the software for interactive TV set-top boxes (STBs) have similar
requirements: obviously both are concerned with multimedia presentation;
for both, interaction with, and navigation around, documents are essential;
and for both, standardisation is required. Moreover, it is clearly
desirable for users to be able to move seamlessly between the TV and Web
environments: for example a user might wish to download a form from her TV
to a browser on a PDA, work on it offline, and then upload the form back to
the TV for back-transmission. This is not possible with many of the current
interactive TV standards: for example, DAVIC MHEG-5 documents can only be
handled by special MHEG engines. To render an MHEG presentation on an
existing browser technology, either the browser must be made MHEG-aware or
the MHEG presentation must somehow use a tag set that a browser can
understand. A common model for interactive TV and the WWW is therefore a
clear necessity.

The proposed ISO MHEG-8 standard describes document structure and rendering
information in terms of XML tags, defined in the MHEG-8 DTD. The majority
of the MHEG-8 tags can be mapped onto XHTML tags using XSLT; however,
there are many aspects of MHEG-8 presentation tags whose semantics cannot
be expressed in XHTML. For example, XHTML has no notion of object ordering
and it is not possible to specify that one object is in front of or behind
another. Nor can opacity, which is expressible in MHEG, be encoded in
XHTML. In addition to the document structure, MHEG-8 also describes the
MHEG event model by means of tags. There is no equivalent to this in XHTML,
as tags are mainly associated with structure.
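
The split between what can and cannot be carried over to XHTML can be sketched as follows. This is only an illustration, written in JavaScript rather than XSLT so that it runs standalone; the element and attribute names (scene, bitmap, zOrder and so on) are placeholders and are not taken from the MHEG-8 DTD.

```javascript
// Hypothetical mapping table: MHEG-8-style element names -> XHTML element names.
// None of these names come from the MHEG-8 DTD; they are placeholders.
const TAG_MAP = { scene: "div", text: "p", bitmap: "img" };

// Presentation properties assumed to have no XHTML equivalent
// (stacking order and opacity, as noted in the text).
const UNMAPPABLE = new Set(["zOrder", "opacity"]);

// Split one source element into the part XHTML can carry and the residue
// that some other component would have to interpret.
function mapElement(el) {
  const tag = TAG_MAP[el.name];
  if (tag === undefined) {
    // E.g. event-model elements, which have no structural counterpart.
    return { xhtml: null, residue: { name: el.name, attrs: { ...el.attrs } } };
  }
  const attrs = {};
  const leftover = {};
  for (const [key, value] of Object.entries(el.attrs)) {
    if (UNMAPPABLE.has(key)) leftover[key] = value;
    else attrs[key] = value;
  }
  const hasLeftover = Object.keys(leftover).length > 0;
  return {
    xhtml: { tag, attrs },
    residue: hasLeftover ? { name: el.name, attrs: leftover } : null,
  };
}

const mappedBitmap = mapElement({
  name: "bitmap",
  attrs: { src: "logo.png", zOrder: 2, opacity: 0.5 },
});
const unmappedLink = mapElement({ name: "link", attrs: { event: "userInput" } });
```

Here the bitmap becomes an img element carrying only its src attribute, while its ordering and opacity, and the whole of the link element, are left as residue that XHTML alone cannot express.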

One solution to this problem is for multimedia documents to be loaded into a
DOM-capable browser in the form of an XHTML document, with script tags
referring to a .jar file containing MHEG and other support classes encoded
in JavaScript; these classes constitute a small-footprint Document
Interpreter. The document containing these tags can be loaded from the Web
or, since TV users expect instant access, wired into the STB as a start-up
document. Tags later in the document, and the tags of documents subsequently
loaded into the browser, make calls to the methods of these classes. As the
browser software processes these tags, they make calls to the Document
Interpreter, which establishes MHEG links between the constituent objects
of the document; these links are represented by JavaScript objects. The
Interpreter maps the links onto the DOM event-handling model by creating
and registering appropriate event listeners through the DOM API. Events
arising from user interaction with document elements are handled by
JavaScript demons associated with each tag for which user interaction is
enabled; these demons make calls to the API of the Document Interpreter.
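
A minimal sketch of this arrangement is given below. It is not the project's implementation: the class names, the addLink method and the shape of a link object are assumptions made for illustration, and StubElement merely stands in for a DOM element so that the sketch runs outside a browser.

```javascript
// Stand-in for a DOM element so the sketch runs outside a browser; in a
// browser, real elements already provide addEventListener.
class StubElement {
  constructor(id) {
    this.id = id;
    this.listeners = new Map(); // event type -> array of callbacks
  }
  addEventListener(type, callback) {
    if (!this.listeners.has(type)) this.listeners.set(type, []);
    this.listeners.get(type).push(callback);
  }
  dispatchEvent(event) {
    for (const callback of this.listeners.get(event.type) || []) callback(event);
  }
}

// Hypothetical Document Interpreter: keeps links as plain objects and
// realises each one as a listener registered on its source element.
class DocumentInterpreter {
  constructor() {
    this.links = [];
  }
  // A link applies `action` to `target` when `eventType` occurs on `source`.
  addLink(source, eventType, target, action) {
    const link = { source, eventType, target, action };
    this.links.push(link);
    // The listener plays the role of the "demon": it only calls back
    // into the interpreter's API.
    source.addEventListener(eventType, () => this.fire(link));
    return link;
  }
  fire(link) {
    link.action(link.target);
  }
}

const buttonElement = new StubElement("button");
const videoElement = new StubElement("video");
const interpreter = new DocumentInterpreter();

// Illustrative link: selecting the button marks the video as running.
interpreter.addLink(buttonElement, "click", videoElement, (el) => {
  el.running = true;
});
buttonElement.dispatchEvent({ type: "click" });
```

After the simulated click, videoElement.running is true. In a browser the same addLink call could be made from a script tag in the document, with a real element in place of the stub.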

Although such a system provides an event-handling model, it does not
provide any equivalent of the model for synchronisation and timing of
objects that is defined in MHEG; this, however, is the domain of SMIL. SMIL
provides tags that can be used to express, for example, the parallel
rendering of real-time synchronised streams. Such tags can also be
integrated into the XHTML/XML documents that are loaded into the browser.

The equivalent of the MHEG engine functionality thus becomes spread across
the DOM model and the SMIL functionality that will reside in future
browsers, together with the JavaScript library that is associated with the
MHEG presentation and forms the Document Interpreter.
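
The kind of timing that SMIL containers express can be sketched in a few lines. The example assumes the simplest case only: children of a seq container play one after another, children of a par container begin together, and a par container ends when its last child ends. The node structure and the scheduleNode function are hypothetical and are not part of any SMIL API.

```javascript
// Walk a tree of seq/par containers and media items, recording when each
// media item begins and ends. Returns the end time of the node.
function scheduleNode(node, begin, timeline) {
  if (node.type === "media") {
    timeline.push({ id: node.id, begin, end: begin + node.dur });
    return begin + node.dur;
  }
  let end = begin;
  for (const child of node.children) {
    // seq: each child begins when the previous one ended.
    // par: every child begins when the container begins.
    const childBegin = node.type === "seq" ? end : begin;
    end = Math.max(end, scheduleNode(child, childBegin, timeline));
  }
  return end;
}

// Illustrative presentation: video and audio in parallel, then a caption.
const presentation = {
  type: "seq",
  children: [
    {
      type: "par",
      children: [
        { type: "media", id: "video", dur: 10 },
        { type: "media", id: "audio", dur: 8 },
      ],
    },
    { type: "media", id: "caption", dur: 5 },
  ],
};

const timeline = [];
const totalDuration = scheduleNode(presentation, 0, timeline);
```

With these durations the video and audio both begin at time 0, the caption begins at 10 when the parallel group has finished, and the whole presentation lasts 15 time units.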

A similar example to MHEG is the Japanese Broadcast Markup Language
(BML). BML is an XML-based representation incorporating XHTML 1.0,
ECMAScript, CSS1/2, and DOM Level 1 with extensions. New tags are defined
to handle synchronisation and dynamic lists; there are extensions to CSS to
encode navigation, resolution and colour information; and ECMAScript
classes are defined to handle transmission streams, document switching and
persistent memory functions.

We argue that the W3C standards described above provide the basis for a
common model of representation and processing of WWW and Interactive TV
documents, in which the core functionality of architectures such as MHEG
can be captured, but which general browsers can use without the need for
translation or a specialised engine independent of the browser.