Tutorial at WWW 2008


Title: A Semantic Multimedia Web: Create, Annotate, Present and Share your Media


Download the slides in PDF.


Tutorial Abstract

The success of content-centered (social) Web 2.0 services contributes to an ever-growing amount of digital multimedia content available on the Web. Video advertising is becoming more and more popular, and films, music and video clips are largely consumed from legacy commercial databases. Reusing such multimedia material, however, is still a hard problem. Why is it so difficult to find appropriate multimedia content, to reuse and repurpose previously published content, and to adapt interfaces to this content according to different user needs?

This tutorial addresses these questions. Based on established media workflow practices, we describe a small number of fundamental processes of media production. We explain how multimedia metadata can be represented, how it can be attached to the content it describes, and how it can benefit from a Web that contains more and more formalized knowledge. We show how web applications can benefit from semantic metadata for creating, searching and presenting multimedia content.


Learning Objectives, Scope and Target Audience

This tutorial is designed for practitioners, researchers and PhD students who work on creating, searching and presenting multimedia content to be exchanged and shared over the Web. Participants will learn how to understand the semantics of various media, how to describe them, and how to make use of such descriptions throughout the multimedia creation process, including management, distribution, delivery and reuse. The tutorial also targets multimedia content providers, such as TV broadcasters and news agencies, who want to sell and expose their content on the web, and companies that supply added-value services in content enrichment and organization.

While the tutorial is focused on Multimedia Semantics on the Web, it should also be of interest to people working in: Multimedia Ontology Engineering, Multimedia on the Web, Multimedia User Interface Design, Content-Based Indexing and Retrieval, TREC Video Retrieval and Multimedia Information Retrieval.
The tutorial will include lectures, use cases and demonstrations. As it is partially funded by the EU K-Space Network of Excellence, the tutorial will be widely advertised on mailing lists and among related EU research projects to maximize participation.


Tutorial Full Description

Working with multimedia assets involves their capture, annotation, editing, authoring and/or transfer to other applications for publication and distribution. There is substantial support within the multimedia research community for the collection of machine-processable semantics during established media workflow practices. An essential aspect of these approaches is that a media asset gains value by the inclusion of information (i.e. metadata) about how or when it is created or used, what it represents, and how it is manipulated and organized. For example, users sharing photos on Flickr or Picasa Web would like to keep control of the tags and metadata associated with the media in order to automatically generate digital photo books for a specific event. Semantic search of news requires new models and interfaces that can aggregate media from several sources and personalize the news according to the user's interests and location.
In this tutorial, we consider the use of Semantic Web technologies for improving the multimedia user experience on the Web. We explain how multimedia metadata can be represented, how it can be attached to the content it describes, and how it can benefit from a Web that contains more and more formalized knowledge. We show how web applications can benefit from semantic metadata for creating, searching and presenting multimedia content.

While many multimedia systems allow the association of semantic annotations with media assets, there is no agreed-upon way of sharing these among systems. As an initial step, and based on established media workflow practices, we identify a small number of fundamental processes of media production, which we term canonical processes (see Figure 1). The tutorial introduces these processes, defined in terms of their inputs and outputs and regardless of whether these processes can, or should, be carried out by a human or a machine. We illustrate these processes with two systems coming from both academic and industrial research communities: an online photobook creation web application and Vox Populi, a system for automatic generation of argumentation-based video sequences.

Semantic descriptions of non-textual media can be used to facilitate retrieval and presentation of media assets and documents containing them. Existing multimedia metadata standards, such as MPEG-7, provide a means of associating semantics with particular sections of audio-visual material. While technologies for multimedia semantic descriptions already exist, there is as yet no formal description of a high-quality multimedia ontology that is compatible with existing (Semantic) Web technologies. We therefore present four proposals for MPEG-7-based ontologies and describe in detail COMM, a Core Ontology of MultiMedia for annotation that extends the DOLCE upper ontology. We explain how semantic multimedia metadata can be represented, attached to the media itself and linked to other vocabularies defined in the Semantic Web. We demonstrate a semi-automatic ontology-based annotation tool for producing semantic annotations of image, audio and video content.

COMM has been designed for representing multimedia metadata. However, different media (such as text, image, video and audio) and different applications (such as news or cultural heritage) each come with their own specific metadata standards and vocabularies, and the result today is a Web hosting a plethora of formats. For still images alone, for example, we find standards ranging from EXIF headers in photographs and MPEG-7 image descriptors to XMP/IPTC semantic information or simple user-defined tags from a Web 2.0 application. This makes life difficult for end users and application developers. We show through several use cases how web applications benefit from using multiple metadata formats, and we explain how metadata interoperability can be achieved by using Semantic Web technologies to combine and leverage existing multimedia metadata standards.
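To make the interoperability idea a little more concrete, here is a minimal Python sketch of lifting records from several still-image formats into shared (subject, property, value) triples and merging them. It assumes a hypothetical, hand-written mapping (FIELD_MAP) from format-specific field names (EXIF Artist, XMP dc:creator and dc:subject, IPTC By-line and Keywords, user tags) to Dublin Core-style property names; this mapping, the helper names and the sample data are illustrative only and are not the COMM ontology or any published alignment. A real system would use an RDF toolkit and a curated vocabulary mapping.

```python
"""Sketch: lift fields from several image-metadata formats into shared
triples so one query works regardless of the source format.
FIELD_MAP is a hypothetical, illustrative mapping only."""

from typing import Dict, Iterable, List, Set, Tuple, Union

Triple = Tuple[str, str, str]
Value = Union[str, List[str]]

# Hypothetical mapping from (source format, field name) to a shared property.
# Dublin Core-style names are used purely for illustration.
FIELD_MAP: Dict[Tuple[str, str], str] = {
    ("exif", "Artist"): "dc:creator",
    ("xmp", "dc:creator"): "dc:creator",
    ("iptc", "By-line"): "dc:creator",
    ("xmp", "dc:subject"): "dc:subject",
    ("iptc", "Keywords"): "dc:subject",
    ("tags", "tag"): "dc:subject",  # user-defined Web 2.0 tags
}


def to_triples(resource: str, fmt: str, record: Dict[str, Value]) -> Set[Triple]:
    """Convert one format-specific record into (subject, property, value) triples."""
    triples: Set[Triple] = set()
    for field, value in record.items():
        prop = FIELD_MAP.get((fmt, field))
        if prop is None:
            continue  # fields without a mapping are ignored in this sketch
        for v in (value if isinstance(value, list) else [value]):
            triples.add((resource, prop, v))
    return triples


def merge(resource: str, records: Iterable[Tuple[str, Dict[str, Value]]]) -> Set[Triple]:
    """Union the triples from every source; identical statements collapse."""
    merged: Set[Triple] = set()
    for fmt, record in records:
        merged |= to_triples(resource, fmt, record)
    return merged


if __name__ == "__main__":
    photo = "http://example.org/photo/42"  # placeholder URI
    graph = merge(photo, [
        ("exif", {"Artist": "Alice", "Model": "X100"}),
        ("xmp", {"dc:creator": "Alice", "dc:subject": ["conference"]}),
        ("iptc", {"Keywords": ["Beijing"]}),
        ("tags", {"tag": ["tutorial"]}),
    ])
    for triple in sorted(graph):
        print(triple)
```

Because the EXIF and XMP creator fields map to the same property, the two "Alice" statements collapse into one triple, and keywords from XMP, IPTC and user tags become answerable through a single dc:subject lookup.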

Multimedia metadata are therefore heterogeneous in format and type, and Semantic Web technologies help to integrate them semantically. The underlying technologies are, however, insufficient in their own right: users require interfaces to access these more complex data. Facet browsing and auto-completion have become popular as user-friendly interfaces to data repositories. Users should be able to select and navigate through facets of resources of any type, and to make selections based on properties of other, semantically related, types. We present various facet browser interfaces developed within academic research projects and increasingly deployed in commercial web applications. We show novel search and presentation techniques that exploit interoperability between the data and between the vocabularies, using two demonstrators in the Cultural Heritage and News domains.
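As a rough illustration of a facet that relies on a semantically related type, the Python sketch below filters artworks by the nationality of their creator, a property of a person rather than of the artwork itself, and recomputes facet counts over the narrowed set. The tiny triple set, the property names and the helper functions (values, facet_counts, select) are invented for this example; they do not come from the Cultural Heritage or News demonstrators and are not the algorithm of any particular facet browser.

```python
"""Sketch of faceted browsing over a tiny triple set, including a facet
that follows a link to a related resource (artwork -> creator -> nationality).
Data and property names are invented for illustration."""

from collections import Counter
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("art:1", "type", "Painting"), ("art:1", "creator", "p:rembrandt"),
    ("art:2", "type", "Etching"), ("art:2", "creator", "p:rembrandt"),
    ("art:3", "type", "Painting"), ("art:3", "creator", "p:monet"),
    ("p:rembrandt", "nationality", "Dutch"),
    ("p:monet", "nationality", "French"),
]


def values(subject: str, path: Tuple[str, ...], triples: List[Triple]) -> Set[str]:
    """Follow a property path such as ("creator", "nationality") from a subject."""
    current = {subject}
    for prop in path:
        current = {o for (s, p, o) in triples if s in current and p == prop}
    return current


def facet_counts(items: Set[str], path: Tuple[str, ...], triples: List[Triple]) -> Counter:
    """Count how many items have each value for the facet defined by path."""
    counts: Counter = Counter()
    for item in items:
        for v in values(item, path, triples):
            counts[v] += 1
    return counts


def select(items: Set[str], path: Tuple[str, ...], value: str,
           triples: List[Triple]) -> Set[str]:
    """Keep only the items whose facet (reached via path) includes value."""
    return {i for i in items if value in values(i, path, triples)}


if __name__ == "__main__":
    artworks = {s for (s, p, o) in TRIPLES if p == "type"}
    print(facet_counts(artworks, ("type",), TRIPLES))
    dutch = select(artworks, ("creator", "nationality"), "Dutch", TRIPLES)
    print(sorted(dutch))
    print(facet_counts(dutch, ("type",), TRIPLES))
```

Selecting "Dutch" on the creator-nationality facet narrows the set to the two works by the same creator, and the type facet is then recomputed over that narrowed set, which is the interaction pattern users expect from a facet browser.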

Figure 1: The 9 canonical processes illustrated.

The schedule of the tutorial is as follows:

  1. Welcome, Introduction, and Overview (5 minutes)
    Welcome participants, find out who they are and what they want, provide overview of tutorial goals and schedule.
  2. Understanding Multimedia Applications Workflow (55 minutes)
  3. Semantic Annotation of Multimedia Content (30 minutes)
  4. Coffee Break (30 minutes)
  5. Semantic Annotation of Multimedia Content (cont.) (30 minutes)
  6. Semantic Search and Presentation of Multimedia Content (55 minutes)
    • Link your data!
    • Facet Browsing interfaces, auto-completion search and ranking algorithms
    • Browsing multimedia datasets: the eCulture and the News domain
  7. Wrap up, Conclusion and Q/A (5 minutes)

History and References

Tutorial history:

This tutorial follows the lectures Providing Flexible Interfaces to Annotated Multimedia Repositories and Multimodal Interaction, given during the K-Space Summer School on Multimedia Semantics (SSMS) in Chalkidiki, Greece (2006) and Glasgow, UK (2007), respectively. It builds on a number of real applications and use cases developed within the W3C Multimedia Semantics Incubator Group (June 2006 - August 2007). The tutorial includes theoretical work presented at the Workshop on Multimedia for Human Communication - From Capture to Convey at ACM Multimedia 2005 and discussed during the Panel on The role of multimedia metadata standards in a (Semantic) Web 3.0 at WWW 2007. It reuses material from the Tutorial on Understanding Media Semantics given at ACM Multimedia 2003 and ACM Multimedia 2004.


Acknowledgments:

This tutorial is partially supported by the European Commission under contract FP6-027026, K-Space: Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content.


Biography of the Lecturers


Raphaël Troncy obtained his Master's degree with honors in computer science at the University Joseph Fourier of Grenoble, France, after spending one year at the University of Montreal, Canada. He benefited from a PhD fellowship at the National Audio-Visual Institute (INA) in Paris, where he received his PhD with honors in 2004. During his PhD, he taught undergraduate courses at the University René Descartes, Paris 5 (FR), and gave lectures on audio-visual documentation and databases in the INTD Bachelor of documentation.

He was an ERCIM postdoctoral research associate at the National Research Council (CNR) in Pisa, Italy, in 2005, and at the Centre for Mathematics and Computer Science (CWI) in Amsterdam, the Netherlands, in 2006, where he is currently employed. Raphaël Troncy is co-chair of the W3C Incubator Group on Multimedia Semantics and an active participant in the EU K-Space Network of Excellence.

His research interests include Semantic Web and Multimedia Technologies, Knowledge Representation, and Ontology Modeling and Alignment. Raphaël Troncy is an expert in audio-visual metadata and in combining existing metadata standards (such as MPEG-7) with current Semantic Web technologies. He also works closely with the IPTC standardization body on the relationship between the NewsML language and the Semantic Web.


Lynda Hardman heads the Semantic Media Interfaces group at CWI and is a part-time full professor at the Technical University of Eindhoven. She obtained her PhD from the University of Amsterdam in 1998, having graduated in mathematics and physics from Glasgow University in 1982. During several years in the software industry, she was the development manager for Guide, the first hypertext authoring system for personal computers (1986). She was a member of the W3C working group that developed the first SMIL recommendation.

The research projects she currently leads focus on different aspects of the automated generation of hypermedia presentations, with emphasis on aspects of discourse and design and on underlying (Semantic) Web technologies. She is a member of the EU K-Space Network of Excellence and of the MultimediaN E-Culture project, which won first prize at the Semantic Web Challenge at the 5th International Semantic Web Conference, held in Athens, Georgia, USA, in November 2006.

She is a member of the editorial boards of the Journal of Web Semantics and the New Review of Hypermedia and Multimedia, and has given numerous tutorials on SMIL.