KESO - Knowledge Extraction for Statistical Offices

Start: 01.01.1996
End: 31.12.1999

Project code: KESO
Research group: Database Architectures and Information Access (INS1)

Project coordinator: Arno Siebes

KESO (Knowledge Extraction for Statistical Offices) is an ESPRIT-IV project under Eurostat/DOSIS. The project is scheduled for three years and started on 1 January 1996.

The goal of the KESO project is to construct a versatile, efficient, industrial-strength data mining system prototype that satisfies the needs of providers of large-scale databases. The development will be guided by continuous assessment by the participating public and private Statistical Offices, covering both the applicability of the system and its added value for their complex datasets.

To bring feedback from the users to the developers, the intermediate results of the project include three releases of the system, the first of which was released in February 1997. In parallel with the development of the first KESO release, large-scale applications have been selected and prepared by the Statistical Offices.

The first KESO release offers a new, modular, multi-process system architecture for data mining, centered around a common, persistently stored search space that serves as the central point of interaction between the different system processes and the user. In the normal course of processing, a search module inspects the current state of the search and then does one of two things: it asks its description generator to generate new hypotheses, or it asks the quality computer module to test a hypothesis against the mining database and compute its quality.
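The decision step of a search module can be illustrated with a minimal sketch. It is not the KESO implementation: the names `Hypothesis`, `SearchSpace` and `search_step` are hypothetical, the search space is an in-memory list rather than a persistent store, and the policy of testing pending hypotheses before generating new ones is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hypothesis:
    description: tuple                 # e.g. (("region", "north"),)
    quality: Optional[float] = None    # None until it has been tested


@dataclass
class SearchSpace:
    """In-memory stand-in for the shared, persistently stored search space."""
    hypotheses: list = field(default_factory=list)

    def untested(self):
        return [h for h in self.hypotheses if h.quality is None]


def search_step(space, generate, compute_quality):
    """One decision of a search module.

    If untested hypotheses exist, ask the quality computer to score them;
    otherwise ask the description generator for new hypotheses.
    """
    pending = space.untested()
    if pending:
        for h in pending:
            h.quality = compute_quality(h.description)
        return "tested"
    new = generate(space)
    space.hypotheses.extend(new)
    return "generated" if new else "done"
```

Calling `search_step` repeatedly until it returns `"done"` alternates between generation and testing, with all state kept in the shared `SearchSpace` object so that other processes (or the user) could inspect it between steps.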

The mining conductor is responsible for instantiating search modules with particular search tasks, so that multiple search tasks can run in parallel, and for deciding what to do next when a search task has completed or reached an impasse. For example, when a search task has suspended itself for lack of discretized attributes, the conductor can start an attribute discretization task and then restart the original task. The mining server is the component responsible for actually answering statistical queries about the database; the quality computer bases its quality computation on the answers to these queries.
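The suspend, discretize and restart cycle described above might look roughly like the following sketch. Everything here is an assumption for illustration: the function names, the task dictionaries, the equal-width discretization and the sequential queue (the real system runs tasks as separate processes, possibly in parallel).

```python
from collections import deque


def discretize(values, bins=2):
    """Equal-width discretization (an assumed method): one bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0    # avoid division by zero for constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]


def run_search_task(task, discretized):
    """Hypothetical search task: suspends if its attribute is not discretized yet."""
    if task["attribute"] not in discretized:
        return "suspended"
    return "completed"


def conduct(tasks, columns):
    """Run tasks in order; on suspension, discretize and restart the same task."""
    discretized = {}
    queue = deque(tasks)
    log = []
    while queue:
        task = queue.popleft()
        status = run_search_task(task, discretized)
        log.append((task["name"], status))
        if status == "suspended":
            attr = task["attribute"]
            discretized[attr] = discretize(columns[attr])
            log.append(("discretize:" + attr, "completed"))
            queue.appendleft(task)     # restart the original task next
    return log
```

For a single task on a numeric column `age`, the returned log shows the task suspending, the discretization task completing, and the original task completing on its second run.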

In the KESO system, the Monet database system is used to store the large search space as well as the mining database. The current version supports the discovery of interesting subsets, both with beam search and broadview, including discretization. Moreover, the system supports the discovery of association rules.
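As an illustration of beam search for interesting subsets, the sketch below searches over conjunctions of attribute-value conditions. It makes several assumptions that are not taken from KESO: the data is a list of dictionaries rather than a Monet database, the target is a 0/1 column, and quality is measured by weighted relative accuracy, a commonly used measure chosen here only as an example. The function names are hypothetical.

```python
def wracc(rows, description, target):
    """Weighted relative accuracy: coverage * (target rate in subset - overall rate)."""
    covered = [r for r in rows if all(r[a] == v for a, v in description)]
    if not covered:
        return 0.0
    overall = sum(r[target] for r in rows) / len(rows)
    inside = sum(r[target] for r in covered) / len(covered)
    return len(covered) / len(rows) * (inside - overall)


def beam_search(rows, attributes, target, beam_width=2, depth=2):
    """Level-wise beam search; returns (quality, description) pairs, best first."""
    conditions = sorted({(a, r[a]) for r in rows for a in attributes})
    beam = [()]                        # start from the empty description (all rows)
    seen = {}
    for _ in range(depth):
        candidates = {}
        for desc in beam:
            used = {a for a, _ in desc}
            for cond in conditions:
                if cond[0] in used:    # at most one condition per attribute
                    continue
                new = tuple(sorted(desc + (cond,)))
                candidates[new] = wracc(rows, new, target)
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        beam = [d for d, _ in ranked[:beam_width]]   # keep only the best few
        seen.update(candidates)
    return sorted(((q, d) for d, q in seen.items()), key=lambda qd: (-qd[0], qd[1]))
```

Only the `beam_width` best descriptions at each level are refined further, which keeps the number of quality computations (and hence database queries) bounded at the cost of possibly missing subsets whose parents scored poorly.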

The Statistical Offices are currently testing the first version on the applications they selected in the first year. Development is now focussed on optimization, on extending the set of mining algorithms and, most importantly, on the user interface.

Members
Robert Castelo Valdueza, Martin Kersten, Donald Kwakkel, Arno Siebes

Key publications
Key publications of KESO

Cooperation