Towards Semantic Web Document Engineering

Jacco van Ossenbruggen
Centrum voor Wiskunde en Informatica (CWI), Amsterdam
Jacco.van.Ossenbruggen@cwi.nl

Abstract:

Web publishing systems have to take into account a plethora of Web-enabled devices, user preferences and abilities. Technologies generating these presentations will need to be explicitly aware of the context in which the information is being presented. Semantic Web technology can be a fundamental part of the solution to this problem by explicitly modeling the knowledge needed to adapt presentations to a specific delivery context. We propose the development of a Smart Style layer which is able to use metadata to improve the presentation of content to human users. We discuss different uses of metadata and suggest extensions to current Web technology.

Introduction

As the Web continues to grow not only in size but also in complexity, the increasingly varied needs of its audience mark the end of the "one size fits all" era. Delivery contexts [1] can be characterized in terms of user preferences and abilities, the capabilities of the access device, and the available network resources. Given this heterogeneity, any single message needs to be adapted to a particular set of circumstances. As a minimum requirement, the author's intended message needs to be conveyed to the user within the constraints imposed by the access device. In addition, the generated presentation should conform as much as possible to the preferences of both the user and the author [2]. Together, these two types of adaptation may lead to an explosion in the number of potential delivery contexts, which current stylesheet technology is unable to deal with.

Our prototype multimedia presentation generation system Cuypers [3] generates multimedia presentations adapted to the constraints of a specific delivery context. We claim that the particular solutions deployed within Cuypers realize a level of adaptivity that should become generally available on the Web. This introduces new challenges since the solutions need to be embedded within the current Web infrastructure. In this paper, we introduce the concept of Smart Style: an intelligent presentation adaptation layer for the Web that builds upon two fundamental technologies:

  1. Web document engineering technology, including delivery formats such as HTML [4], SMIL [5], SVG [6] and XSL [7], and style and transformation languages such as CSS [8] and XSLT [9].
  2. Semantic Web knowledge representation and metadata technology, including RDF [10], RDF Schema [11], DAML+OIL [12] and CC/PP [13].

Currently, Semantic Web technology is primarily deployed to improve Web-based information gathering and brokerage. Our vision, however, is that the Semantic Web infrastructure should also play a key role in presenting information in the most appropriate way to each individual reader. Meanwhile, document engineering technology is developing relatively independently of the Semantic Web. We argue that device-independent Web content engineering requires a large amount of knowledge that needs to be, and could be, made explicit using Semantic Web technology. Our proposed Smart Style layer would deploy Semantic Web technology to improve presentation adaptation, aiming for a presentation design that suits the specific requirements of the user's delivery context.


Ingredients for a Smart Style Layer

To build a Smart Style layer on top of the existing Web infrastructure, four ingredients are needed: a way of communicating delivery contexts, support for content descriptions, processing of delivery contexts, and processing of content descriptions.

Communicating delivery contexts

Assuming that at least part of the adaptation will take place on the server, it is essential to standardize the communication of delivery contexts: clients need to be able to send this information in a form that the server understands. A machine-readable description of a delivery context that can be sent to the server is often called a profile. CC/PP [13] provides an RDF-based framework for defining the vocabularies needed to write profiles, together with a small vocabulary that can be reused across different profiles. The WAP Forum's User Agent Profile (UAProf) specification [14] provides a commonly agreed upon mechanism for communicating the (technical) capabilities of mobile phones to servers and proxies. The CC/PP framework, however, is flexible enough to also allow profiles that focus on more user-centered aspects of a delivery context, such as language or media preferences.
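To make the idea of a profile concrete, the following Python sketch shows how a server might read a small RDF/XML profile into a lookup table. It is a simplification under stated assumptions: the `prf:` vocabulary namespace and its attribute names (`ScreenWidth`, `ColorCapable`, `Language`) are hypothetical stand-ins for a CC/PP-based vocabulary such as UAProf, the component names are invented, and the actual CC/PP component structure and the many RDF/XML variants are not handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical vocabulary namespace; not a published CC/PP or UAProf URI.
PRF = "http://example.org/profile#"

PROFILE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:prf="{PRF}">
  <rdf:Description rdf:about="#HardwarePlatform">
    <prf:ScreenWidth>640</prf:ScreenWidth>
    <prf:ColorCapable>No</prf:ColorCapable>
  </rdf:Description>
  <rdf:Description rdf:about="#UserPreferences">
    <prf:Language>nl</prf:Language>
  </rdf:Description>
</rdf:RDF>"""

def read_profile(xml_text):
    """Flatten a simple RDF/XML profile into {component: {attribute: value}}.

    Only rdf:Description elements with literal-valued properties are
    handled; a real CC/PP processor would need a full RDF parser.
    """
    root = ET.fromstring(xml_text)
    profile = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        component = desc.get(f"{{{RDF}}}about").lstrip("#")
        attrs = profile.setdefault(component, {})
        for prop in desc:
            name = prop.tag.split("}", 1)[1]  # drop the namespace URI
            attrs[name] = prop.text.strip()
    return profile

profile = read_profile(PROFILE)
screen_width = int(profile["HardwarePlatform"]["ScreenWidth"])  # 640
```

All values arrive as strings, so a style engine would still need vocabulary knowledge (here simply an `int` conversion) to interpret them.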

Supporting metadata for content description

Clients need to be able to communicate delivery contexts, but in itself this is insufficient. Many design decisions will also depend on information that is only available at the server. Even when this information is not intended to be published on the Web, having commonly used and standardized solutions for describing and processing it will greatly reduce the development effort needed to implement a smart, adaptive Web site.

Intelligent adaptation systems will need some knowledge of the function of the content they are adapting. To make this type of knowledge explicit, appropriate use of metadata will be of key importance. Both within and outside the W3C, a large amount of work on metadata standardization is currently in progress, and in most of this work RDF, RDF Schema and DAML+OIL (as well as the ontology language being specified by the WebOnt working group) play a central role.

For example, suppose an online museum has developed an RDF Schema¹ for the metadata² used to annotate its Web site. Also suppose the site features an HTML page describing a work by the painter Rembrandt van Rijn, focusing on his use of chiaroscuro (the painting technique that uses strong contrasts of light and dark). Figure 1 shows an example fragment of the page.

Figure 1: Example XHTML 1.0 fragment from a page about a Rembrandt painting.
<div id="allegory">
  <h1>Musical Allegory</h1>
  <img src="allegory.jpg" alt="Musical Allegory"/>
  <p>This is hardly just an ordinary group of musicians.
     The figures are too exotically dressed in oriental
     ...
  </p>
</div>

From an XML/HTML markup perspective, all we know is that the fragment contains a first-level heading, an image and a text paragraph. The underlying semantics, however, can be made explicit using RDF metadata, as shown in Figure 2.

Figure 2: RDF metadata of XHTML 1.0 fragment.
<museum:Painter rdf:ID="Rembrandt">
  <museum:fname>Rembrandt</museum:fname>
  <museum:lname>Harmenszoon van Rijn</museum:lname>
  <museum:painted rdf:resource="#allegory" />
</museum:Painter>

<museum:Painting rdf:about="#allegory">
  <museum:title>Musical Allegory</museum:title>
  <museum:technique>Chiaroscuro</museum:technique>
</museum:Painting>

This explicitly states that our HTML fragment (linked to the metadata through the shared identifier allegory) is an instance of the class Painting, with the title "Musical Allegory" and the technique "Chiaroscuro", and that there is a Painter instance with a painted relation to this painting. The question is: can we exploit the knowledge provided by the metadata to improve our stylesheets and other adaptation technology?
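As a first, purely syntactic step, the Python sketch below wraps the metadata of Figure 2 in an rdf:RDF element, reads it into a set of triples and asks which resource painted "#allegory". The namespace URI for the museum: prefix is an assumption (borrowed from the schema URI used in Figure 6), and the toy parser handles only the RDF/XML pattern of Figure 2, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSEUM = "http://www.museum.com/schema.rdf#"  # assumed namespace URI

# Figure 2, wrapped in rdf:RDF with namespace declarations so that it parses.
FIGURE_2 = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:museum="{MUSEUM}">
  <museum:Painter rdf:ID="Rembrandt">
    <museum:fname>Rembrandt</museum:fname>
    <museum:lname>Harmenszoon van Rijn</museum:lname>
    <museum:painted rdf:resource="#allegory"/>
  </museum:Painter>
  <museum:Painting rdf:about="#allegory">
    <museum:title>Musical Allegory</museum:title>
    <museum:technique>Chiaroscuro</museum:technique>
  </museum:Painting>
</rdf:RDF>"""

def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a single URI string."""
    return tag[1:].replace("}", "", 1)

def triples(xml_text):
    """Yield (subject, predicate, object) triples from simple RDF/XML.

    Handles only typed node elements carrying rdf:ID or rdf:about, whose
    properties are literals or rdf:resource references. rdf:ID values are
    kept as same-document fragment identifiers ("#Rembrandt").
    """
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF}}}about") or "#" + node.get(f"{{{RDF}}}ID")
        yield subject, RDF + "type", uri(node.tag)
        for prop in node:
            ref = prop.get(f"{{{RDF}}}resource")
            yield subject, uri(prop.tag), ref if ref is not None else prop.text

graph = set(triples(FIGURE_2))

# Which resources have a museum:painted relation to the fragment "#allegory"?
painters = {s for s, p, o in graph if p == MUSEUM + "painted" and o == "#allegory"}
```

Once the metadata is available as triples, facts such as the technique or the painter can be looked up by meaning rather than by element position.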

While the current focus of this type of Semantic Web technology is on using metadata to achieve more intelligent Web-based information retrieval (e.g. improving search engines), the use of metadata in our Cuypers system shows that there is also a huge potential in applying this technology to improve the adaptation and presentation process. When metadata makes the intended semantics and function of the content explicit, adaptation systems should be able to make informed decisions during the design process.

This requires an adaptation process that can also take presentation-related metadata into account. Based on our experience with Cuypers, we found that most metadata is geared towards information retrieval rather than information presentation. Presentation-related metadata provides information about the properties of the content in the context of its presentation to the user. Examples include information about the intended audience (e.g. suitability for presentation to children), the role of the content (e.g. suitability for a specific presentation role, such as introductory material or in-depth explanation), and the transformations allowed (e.g. to what extent images may be scaled, in terms of minimum/maximum scaling factors and aspect ratios, or to what extent images can be displayed in grayscale while still communicating the intended message).
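The following Python sketch illustrates how presentation-related metadata of the kinds listed above could drive a simple adaptation decision for a single image. The `ImageMetadata` fields and the `adapt_image` function are hypothetical; they do not correspond to an existing metadata vocabulary or to the Cuypers implementation. Scaling is uniform, so the aspect ratio is always preserved.

```python
from dataclasses import dataclass

@dataclass
class ImageMetadata:
    """Hypothetical presentation-related metadata for one image."""
    width: int           # intrinsic width in pixels
    height: int          # intrinsic height in pixels
    min_scale: float     # smallest scale factor that still conveys the message
    max_scale: float     # largest scale factor the author allows
    grayscale_ok: bool   # may the image be shown without color?
    audience: str        # e.g. "general" or "adult"

def adapt_image(meta, max_width, color_capable, child_user):
    """Return the (width, height) to render, or None if the image must be dropped."""
    if child_user and meta.audience != "general":
        return None      # unsuitable for the intended audience
    if not color_capable and not meta.grayscale_ok:
        return None      # grayscale would lose the intended message
    scale = min(meta.max_scale, max_width / meta.width)
    if scale < meta.min_scale:
        return None      # would need more shrinking than the author allows
    return round(meta.width * scale), round(meta.height * scale)

painting = ImageMetadata(800, 600, 0.5, 1.0, True, "general")
print(adapt_image(painting, 640, True, False))  # (640, 480)
```

Returning None leaves it to the rest of the adaptation process to choose an alternative, for instance a textual description of the painting.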

Processing delivery contexts

Assuming that the information upon which we base our design decisions will be available on the Web through standard Semantic Web technologies such as CC/PP and RDF, the next ingredient needed for building a Smart Style layer is a set of efficient tools that can take this type of information into account during the adaptation process. A first step is to make current-generation, presentation-oriented Web technology interoperable with next-generation Semantic Web technology. For example, CSS stylesheets currently cannot take CC/PP profiles into account. CSS does, however, have a feature that is closely related to CC/PP and allows the specification of device-dependent style rules: the @media rule. Figure 3 shows an example³ of a stylesheet that uses bigger fonts on computer screens than on paper printouts of the same document.

Figure 3: Device dependent style rules as already supported in CSS2.
@media print {
  body { font-size: 10pt }
}
@media screen {
  body { font-size: 12pt }
}

A first step towards a CSS syntax that allows more detailed queries is suggested in [17]. This syntax allows queries on specific device features. For example, the CSS media rule for screen display above could be further refined by adding constraints on the minimum width of the screen, as shown in Figure 4. Using such constraints, stylesheets could take into account the kind of information provided by profiles, such as the screen width of the access device.

Figure 4: Detailed media queries using a CSS3 extension (work in progress).
@media screen and (min-width: 640px) {
  body { font-size: 14pt }
}
@media screen and (min-width: 800px) {
  body { font-size: 16pt }
}
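A style engine could evaluate the rules of Figures 3 and 4 against a device's screen width roughly as in the Python sketch below; in a Smart Style setting, the width would come from a profile. The sketch assumes that the two figures form a single stylesheet, models only the media type and the min-width feature, and relies on the CSS cascade rule that, among equally specific rules, the last matching one wins.

```python
# Each rule: (media type, minimum screen width in px or None, declarations).
RULES = [
    ("print",  None, {"font-size": "10pt"}),   # Figure 3
    ("screen", None, {"font-size": "12pt"}),   # Figure 3
    ("screen", 640,  {"font-size": "14pt"}),   # Figure 4
    ("screen", 800,  {"font-size": "16pt"}),   # Figure 4
]

def body_style(media, width_px):
    """Apply the body rules in document order; later matching rules override earlier ones."""
    style = {}
    for rule_media, min_width, declarations in RULES:
        if rule_media == media and (min_width is None or width_px >= min_width):
            style.update(declarations)
    return style

print(body_style("screen", 700))   # {'font-size': '14pt'}
```

A 1024-pixel screen matches all three screen rules and ends up with 16pt, because the last matching rule wins.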

Even with this extended CSS syntax, however, fully CC/PP-aware style engines are still a long way off. CC/PP features that will affect style application include the ability to define new profile vocabularies, inheritance mechanisms for specifying default values, and the description of the capabilities of transcoding proxies. Style engines need to be able to deal with these features in order to take full advantage of the information specified in CC/PP profiles.
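Inheritance of default values is one of the CC/PP features a style engine would have to resolve before it even knows the screen width that a media rule like the one in Figure 4 is evaluated against. The Python sketch below shows the idea for a single profile component represented as a plain dictionary. The defaults URI and attribute names are hypothetical, and real CC/PP defaults are RDF descriptions rather than dictionaries.

```python
# Hypothetical URI of a shared defaults description for a class of phones.
GENERIC_PHONE = "http://example.org/defaults/generic-phone"

DEFAULTS = {
    GENERIC_PHONE: {"ScreenWidth": 96, "ColorCapable": False, "Language": "en"},
}

def resolve(component, defaults_by_uri):
    """Merge a profile component with the defaults it refers to.

    Attributes stated directly in the component override the referenced
    defaults, mirroring the CC/PP rule that explicit values take precedence.
    """
    resolved = dict(defaults_by_uri.get(component.get("defaults"), {}))
    resolved.update({k: v for k, v in component.items() if k != "defaults"})
    return resolved

# A device that differs from the generic phone only in its screen width.
device = {"defaults": GENERIC_PHONE, "ScreenWidth": 128}
resolved = resolve(device, DEFAULTS)
```

Sending only the differences from a shared defaults description keeps profiles small, at the cost of requiring the server to fetch and merge the defaults.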

Note that the need to take CC/PP information into account also applies to XSLT transformation engines. While the full details of how this could affect future versions of XSLT are beyond the scope of this paper, one could, for example, imagine an extension⁴ of XSLT's mode concept, so that transformation rules are selected in a way similar to the media rules in CSS. In such a hypothetical extension (see Figure 5), one could define a rule that creates a two-column layout only if the output medium is print and the paper is at least 17cm wide.

Figure 5: Device dependent rules by extending XSLT modes (tentative syntax).
<xsl:template match="body"
              mode="print and (min-width: 17cm)">
  ...
  <fo:region-body column-count="2"/>
  ...
</xsl:template>
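To illustrate how such a tentative mode condition might be evaluated, the Python sketch below parses conditions of the form used in Figure 5 and picks a template for a given output context. Like the syntax itself, everything here is hypothetical: the context dictionary, the template names, and the convention that the first matching rule wins (real XSLT resolves conflicts through template priorities instead).

```python
import re

# Length units converted to centimetres (only the units needed here).
TO_CM = {"cm": 1.0, "mm": 0.1, "in": 2.54}

def matches(condition, context):
    """Evaluate a condition such as 'print and (min-width: 17cm)'.

    Supports one media type followed by optional 'and (min-width: N<unit>)'
    clauses, mirroring the tentative syntax of Figure 5.
    """
    media, *clauses = [part.strip() for part in condition.split(" and ")]
    if media != context["media"]:
        return False
    for clause in clauses:
        m = re.fullmatch(r"\(min-width:\s*([\d.]+)(cm|mm|in)\)", clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause}")
        if context["width_cm"] < float(m.group(1)) * TO_CM[m.group(2)]:
            return False
    return True

# Hypothetical templates, most specific first.
TEMPLATES = [
    ("print and (min-width: 17cm)", "two-column body"),
    ("print", "one-column body"),
]

def select_template(context):
    """Pick the first template whose mode condition matches the context."""
    for condition, template in TEMPLATES:
        if matches(condition, context):
            return template
    return None

print(select_template({"media": "print", "width_cm": 21.0}))  # two-column body
```

An A5 page (14.8cm wide) would fall through to the one-column template, while a screen context matches neither rule.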

Processing content descriptions

In addition to taking information about delivery contexts into account, stylesheets also need to take into account the semantic information that is contained in the metadata associated with the content. Currently, style selector mechanisms only match on the syntactic properties of the underlying (XML) document hierarchy. This applies both to the selector mechanism used by CSS and to the XPath [18] selectors used by XSLT.

In all the examples above, the rules were intended to match the <body> element of an HTML document. Similar rules could be written to match on the syntactic properties of metadata, i.e. on the XML element and attribute names used to encode the RDF statements of Figure 2. With current-generation CSS and XSLT engines, however, it is not practical to match on the semantic properties of metadata: for CSS and XSLT processors, RDF is just XML. As a result, it is very hard to write, for example, a rule that matches all the alternative XML serializations that RDF allows for the same statement. A more serious problem is that it is impossible to write CSS or XSLT rules that make use of the structural relations of RDF and RDF Schema, for instance a style rule that applies to all objects that are instances of a specific RDFS (sub)class. Nor is it possible to write rules for all objects that have a certain DAML+OIL-defined ontological relation.
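The serialization problem can be shown in a few lines of Python. The sketch below encodes the same statement ("#allegory" is a Painting) in two valid RDF/XML forms. An ElementTree path query, standing in for an XPath selector, finds only one of them, whereas first normalizing both documents to (subject, class) pairs finds both. The museum namespace URI is assumed, and the normalization covers only these two forms.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSEUM = "http://www.museum.com/schema.rdf#"  # assumed namespace URI
NS = {"rdf": RDF, "museum": MUSEUM}

# Two RDF/XML serializations of one statement: "#allegory" rdf:type Painting.
TYPED_NODE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:museum="{MUSEUM}">
  <museum:Painting rdf:about="#allegory"/>
</rdf:RDF>"""

DESCRIPTION = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description rdf:about="#allegory">
    <rdf:type rdf:resource="{MUSEUM}Painting"/>
  </rdf:Description>
</rdf:RDF>"""

def syntactic_match(xml_text):
    """What a syntax-based selector sees: elements named museum:Painting."""
    root = ET.fromstring(xml_text)
    return [e.get(f"{{{RDF}}}about") for e in root.findall("museum:Painting", NS)]

def rdf_types(xml_text):
    """Normalize both forms to (subject, class URI) pairs."""
    pairs = set()
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":  # typed node element
            pairs.add((subject, node.tag[1:].replace("}", "", 1)))
        for t in node.findall("rdf:type", NS):   # explicit rdf:type property
            pairs.add((subject, t.get(f"{{{RDF}}}resource")))
    return pairs

def semantic_match(xml_text, class_uri):
    """Match on the RDF statement, whatever its XML serialization."""
    return sorted(s for s, c in rdf_types(xml_text) if c == class_uri)
```

Both documents say the same thing, yet `syntactic_match` returns `['#allegory']` for the first and an empty list for the second; `semantic_match` returns `['#allegory']` for both.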

Future, Semantic Web-aware selector mechanisms could allow style rules to be specified in terms of the RDF semantics expressed in the metadata. This would extend the current CSS and XPath selectors, which operate on the XML syntax that encodes these semantics. Consider the extended XSLT rule in Figure 6, which uses the RDF-aware query language RQL [15] instead of XPath for its selector.

Figure 6: Semantic matching of XSLT rules using RQL selectors (tentative syntax).
<xsl:template match=
  "RQL(http://www.museum.com/schema.rdf#Artifact)">
  ...
</xsl:template>

This rule matches all resources that are instances of the RDF class Artifact or of any of its subclasses. Given that our RDF Schema would define Painting as a subclass of Artifact, the rule would also match the HTML fragment of Figure 1. Rules that employ the semantic relations defined in the metadata in this way are currently impossible to write in XSLT.
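The inference that the RQL selector of Figure 6 relies on can be sketched as a transitive closure over rdfs:subClassOf statements, as in the Python code below. This is not RQL; it only illustrates the schema knowledge a selector needs and that XPath cannot see. The class hierarchy (including the hypothetical Artist class) and the namespace URI are assumptions loosely following the museum example.

```python
MUSEUM = "http://www.museum.com/schema.rdf#"  # assumed namespace URI

# rdfs:subClassOf statements from a hypothetical museum schema.
SUBCLASS_OF = {
    MUSEUM + "Painting": {MUSEUM + "Artifact"},
    MUSEUM + "Painter": {MUSEUM + "Artist"},
}

# rdf:type statements taken from the metadata of Figure 2.
TYPES = {
    "#allegory": {MUSEUM + "Painting"},
    "#Rembrandt": {MUSEUM + "Painter"},
}

def superclasses(cls):
    """All classes that cls is (transitively) a subclass of, including itself."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(SUBCLASS_OF.get(c, ()))
    return seen

def instances_of(cls):
    """Resources typed with cls or with any of its subclasses."""
    return sorted(r for r, types in TYPES.items()
                  if any(cls in superclasses(t) for t in types))
```

Here `instances_of(MUSEUM + "Artifact")` yields `['#allegory']` even though no statement says so directly; a transformation engine could then apply the matching template to the XHTML element whose id is allegory.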

Conclusions

This paper sketches the requirements for an ambitious goal: automatic adaptation of dynamic text and multimedia content to the requirements of an individual user's delivery context, while respecting the integrity of the semantics of the content. Even if we lower our ambitions and "only" aim to take delivery context information into account, this alone would have major consequences. To prevent CC/PP from becoming a stand-alone W3C recommendation that can only be processed with proprietary tools, we need to clearly define how other recommendations, including CSS, XSLT, XHTML, SMIL and SVG, operate in the context of CC/PP. Beyond CC/PP-aware Web transformations, a further step is required towards Semantic Web-aware transformations that also take metadata semantics into account. Given the amount of knowledge involved in adapting Web resources, we need to integrate the document engineering layers of the Web with the knowledge engineering layers of the Semantic Web. This will require tools that can abstract from the underlying XML syntax and operate directly on the semantics of languages such as RDF, RDFS and DAML+OIL.

The examples given in this paper serve only to illustrate the discussion and should by no means be regarded as ready-to-use syntactic solutions for achieving the required interoperability. Realizing such interoperability among W3C Recommendations, and making the current Web infrastructure interoperate seamlessly with the upcoming Semantic Web, will be a huge challenge and a long-term effort.

Bibliography

[1] W3C, "Device Independence Principles." W3C Working Draft (work in progress), 18 September 2001. Edited by Roger Gimson; co-edited by Shlomit Ritz Finkelstein, Stéphane Maes and Lalitha Suryanarayana. Available at http://www.w3.org/TR/.

[2] D. Bulterman, L. Rutledge, L. Hardman, and J. van Ossenbruggen, "Supporting Adaptive and Adaptable Hypermedia Presentation Semantics," in The 8th IFIP 2.6 Working Conference on Database Semantics (DS-8): Semantic Issues in Multimedia Systems, Rotorua, New Zealand, 5-8 January 1999.

[3] J. van Ossenbruggen, J. Geurts, F. Cornelissen, L. Rutledge, and L. Hardman, "Towards Second and Third Generation Web-Based Multimedia," in The Tenth International World Wide Web Conference, Hong Kong, 1-5 May 2001, pp. 479-488. IW3C2.

[4] W3C, "XHTML 1.1 - Module-based XHTML." W3C Recommendation, 31 May 2001. Edited by Murray Altheim and Shane McCarron. Available at http://www.w3.org/TR/.

[5] W3C, "Synchronized Multimedia Integration Language (SMIL 2.0) Specification." W3C Recommendation, 7 August 2001. Edited by Aaron Cohen. Available at http://www.w3.org/TR/.

[6] J. Ferraiolo, "Scalable Vector Graphics (SVG) 1.0 Specification." W3C Recommendation, 4 September 2001. Available at http://www.w3.org/TR/.

[7] W3C, "Extensible Stylesheet Language (XSL) Version 1.0." W3C Recommendation, 15 October 2001. Available at http://www.w3.org/TR/.

[8] B. Bos, H. W. Lie, C. Lilley, and I. Jacobs, "Cascading Style Sheets, level 2: CSS2 Specification." W3C Recommendation, 12 May 1998. Available at http://www.w3.org/TR/.

[9] J. Clark, "XSL Transformations (XSLT) Version 1.0." W3C Recommendation, 16 November 1999. Available at http://www.w3.org/TR/.

[10] W3C, "Resource Description Framework (RDF) Model and Syntax Specification." W3C Recommendation, 22 February 1999. Edited by Ora Lassila and Ralph R. Swick. Available at http://www.w3.org/TR/.

[11] W3C, "Resource Description Framework (RDF) Schema Specification 1.0." W3C Candidate Recommendation, 27 March 2000. Edited by Dan Brickley and R.V. Guha. Available at http://www.w3.org/TR/.

[12] F. van Harmelen, P. F. Patel-Schneider, and I. Horrocks, "Reference description of the DAML+OIL (March 2001) ontology markup language." http://www.daml.org/2001/03/reference.html. Contributors: Tim Berners-Lee, Dan Brickley, Dan Connolly, Mike Dean, Stefan Decker, Pat Hayes, Jeff Heflin, Jim Hendler, Ora Lassila, Deb McGuinness, Lynn Andrea Stein, ...

[13] W3C, "Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies." W3C Working Draft (work in progress), 15 March 2001. Edited by Graham Klyne, Franklin Reynolds, Chris Woodrow and Hidetaka Ohto. Available at http://www.w3.org/TR/.

[14] Wireless Application Group, "WAP-174: WAG UAPROF User Agent Profile Specification," 1999.

[15] G. Karvounarakis, V. Christophides, D. Plexousakis, and S. Alexaki, "Querying Community Web Portals." http://www.ics.forth.gr/proj/isst/RDF/RQL/rql.html.

[16] J. van Ossenbruggen, L. Hardman, and L. Rutledge, "Hypermedia and the Semantic Web: A Research Agenda," Tech. Rep. INS-R0105, CWI, 2001.

[17] H. W. Lie and T. Celik, "Media Queries." W3C Working Draft (work in progress), 17 March 2001. Available at http://www.w3.org/TR/.

[18] J. Clark and S. DeRose, "XML Path Language (XPath) Version 1.0." W3C Recommendation, 16 November 1999. Available at http://www.w3.org/TR/.


Footnotes

¹ Museum schema example adapted from [15].
² Metadata example adapted from [16].
³ Example taken from the CSS2 Specification [8].
⁴ We are not advocating a specific syntax; we only claim that future XSLT transformations need to be able to take CC/PP-like information into account.