Technical reports 2006

A-2006-1 Timo Niemi and Kalervo Järvelin, Another look at XML. June 2006.
Abstract. Because XML originated as a markup language for documents, XML documents came to be modeled as directed labeled graphs (ordered trees). The use of XML then expanded rapidly to other purposes, especially as a data format for exchanging and sharing data on the Web. However, modeling XML documents as directed labeled graphs has led to several undesirable features, such as complex path-oriented XML query languages, difficulty in supporting both document-centric and data-centric manipulation appropriately, and a mismatch between conventional (e.g. relational) databases and XML data. To remove such disadvantages, we develop a novel representation for XML documents. In the present paper we introduce a constructor algebra that yields an XML relation with the schema D(C, T, I), where D is the name of an XML document, C describes each meaningful component of D (an attribute name, an element name, an attribute or element value, or a word in an attribute or element value consisting of words), T describes its type, and I is its index, which identifies its exact location in D. We analyze the similarities and differences between this representation and the notion of a relation in the conventional relational model. In addition, we demonstrate how this representation supports the treatment of XML documents whose contents and structures are unknown to the user.
Keywords: XML, semi-structured data, semi-structured query language, data integration.
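As a rough illustration of the D(C, T, I) representation, the sketch below flattens a small XML document into (component, type, index) tuples. The type labels ("element", "attribute", "value", "word") and the Dewey-style index scheme are assumptions made for this example; the report's constructor algebra defines its own.

```python
import xml.etree.ElementTree as ET

def xml_relation(xml_text):
    """Flatten an XML document into (C, T, I) tuples of the relation D.

    Hypothetical scheme for illustration: I is a Dewey-style tuple of
    positions, and attributes, values and words are numbered as extra
    children of the element they belong to.
    """
    rows = []

    def add_value(text, index):
        rows.append((text, "value", index))
        words = text.split()
        if len(words) > 1:
            # A value consisting of several words also yields one component per word.
            for w_pos, word in enumerate(words, start=1):
                rows.append((word, "word", index + (w_pos,)))

    def visit(elem, index):
        rows.append((elem.tag, "element", index))
        pos = 0
        for name, value in elem.attrib.items():
            pos += 1
            attr_index = index + (pos,)
            rows.append((name, "attribute", attr_index))
            add_value(value, attr_index + (1,))
        text = (elem.text or "").strip()
        if text:
            pos += 1
            add_value(text, index + (pos,))
        # Child elements follow; their tail text (mixed content) is ignored for brevity.
        for child in elem:
            pos += 1
            visit(child, index + (pos,))

    visit(ET.fromstring(xml_text), (1,))
    return rows

# The relation is named after the document, here "D".
D = xml_relation('<book year="2006"><title>Another look at XML</title></book>')
for row in D:
    print(row)
```

Because every component carries its full index, structural relationships (such as which title belongs to which book) can be recovered from the index values alone, without navigating a tree.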

A-2006-2 Janne Jämsen, Timo Niemi and Kalervo Järvelin, Using derived types in discovering and analysing semantic associations. September 2006.
Abstract. Semantic associations are direct or indirect linkages between two entities that are construed from existing associations among entities. In this paper we extend our previous query language approach for discovering semantic associations with an ability to retrieve semantic associations that, besides explicitly stated (base) associations, may contain associations derived using logic-based derivation rules. As will be shown, this makes it possible to find semantic associations that are both compact and intuitive. To implement this new feature, we introduce a rewriting principle that utilizes derived associations to reduce the resulting semantic associations where possible. Other proposed means to assist the interpretation of query results include answer expansion and the ordering of answers. The incorporated answer expansion feature lets the user investigate rewritten semantic associations in a query result at the desired level of detail. The ordering of answers is based on the lengths of the resulting semantic associations, whereby priority is given to shorter semantic associations, which often express close and relevant relationships.
Keywords: semantic association, query language, logic, derivation rule, deductive database, answer expansion.
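The idea that derived associations can yield shorter semantic associations, and that answers are ranked by length, can be sketched as below. This is a simplified stand-in rather than the report's rewriting principle: derived triples are simply added to the graph before searching, and the `coauthorOf` rule, the entity names and the depth-first path search are hypothetical.

```python
from collections import defaultdict

def derive_coauthors(triples):
    """Hypothetical derivation rule: X coauthorOf Y if X and Y authored the same paper."""
    authors = defaultdict(set)
    for s, rel, o in triples:
        if rel == "authorOf":
            authors[o].add(s)
    derived = set()
    for group in authors.values():
        for x in group:
            for y in group:
                if x < y:  # store each symmetric pair once; edges are walked both ways below
                    derived.add((x, "coauthorOf", y))
    return derived

def associations(triples, start, goal, max_len=4):
    """Enumerate simple paths from start to goal, shortest first."""
    adj = defaultdict(list)
    for s, rel, o in triples:
        adj[s].append((rel, o))
        adj[o].append((rel + "^-1", s))  # allow traversing an association backwards
    results = []

    def dfs(node, path, visited):
        if node == goal:
            results.append(path)
            return
        if len(path) == max_len:
            return
        for rel, nxt in adj[node]:
            if nxt not in visited:
                dfs(nxt, path + [(node, rel, nxt)], visited | {nxt})

    dfs(start, [], {start})
    return sorted(results, key=len)  # shorter associations are ranked first

base = [
    ("alice", "authorOf", "p1"),
    ("bob", "authorOf", "p1"),
    ("bob", "memberOf", "lab"),
]
paths = associations(base + sorted(derive_coauthors(base)), "alice", "lab")
for path in paths:
    print(len(path), path)
```

With base associations only, alice reaches lab through a three-step path via the shared paper; the derived `coauthorOf` association adds a two-step path, which the length-based ordering places first.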

A-2006-3 Jaakko Hakulinen, Markku Turunen and Kari-Jouko Räihä, Tutoring in a Spoken Language Dialogue System. September 2006.
Abstract. We have developed interactive software tutors to teach users how to use a spoken dialogue timetable system. The tutors teach the functionality and interaction style of the telephone-based timetable system to new users by guiding users and monitoring their interaction. The primary modality of the tutors is graphics, and they feature a visual representation of the spoken dialogue between a user and the system. Two different versions of tutoring were compared to a static web manual with the same information in a between-subjects experiment with 27 participants. Participants’ evaluations of the guidance materials were most positive toward a tutor featuring a graphical interface representation of the timetable query. An otherwise similar tutor, which did not have the graphical user interface representation, received the weakest evaluations. Error rate variances suggest that tutoring is better than static guidance, especially for those who most need guidance.

A-2006-4 Turkka Näppilä, Kalervo Järvelin and Timo Niemi, Construction of data cubes from structurally heterogeneous XML document collections. October 2006.
Abstract. Advanced data analysis methods, such as OLAP, Knowledge Discovery, and Data Mining, try to meet the information needs of modern organizations. OLAP (Online Analytical Processing) is a powerful tool for analyzing multidimensional data cubes, often on an ad hoc basis. As organizations today are increasingly connected through the Web, the data for OLAP data cubes must often be integrated from distributed and autonomous information sources. In this case, severe problems arise from semantic, syntactic, and structural data heterogeneity. While XML provides a standard data exchange format, the problems of heterogeneity remain. Popular path-oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in considerable detail. This paper demonstrates that such XML query languages are laborious and troublesome to use in data integration. It also argues that the improvements proposed for them, Lowest Common Ancestor (LCA)-based query evaluation strategies, are insufficient. The paper introduces both a novel high-level data extraction primitive utilizing the developed Smallest Possible Context (SPC) evaluation strategy and an advanced OLAP data cube construction operation. The paper demonstrates, through a system prototype and a sample application in Informetrics, that the approach is a real improvement in data integration.
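The report's cube construction operates on data extracted with its SPC primitive, whose details the abstract does not give. The sketch below therefore assumes the facts have already been extracted into flat records and shows only the aggregation step of the classic CUBE operator, which computes a total for every combination of dimensions. The dimension names, the measure and the data are hypothetical, and this is not the report's own operation.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away

def data_cube(facts, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (the classic CUBE operator)."""
    cube = defaultdict(int)
    for k in range(len(dimensions) + 1):
        for group in combinations(dimensions, k):
            for fact in facts:
                # Dimensions outside the current grouping collapse to ALL.
                key = tuple(fact[d] if d in group else ALL for d in dimensions)
                cube[key] += fact[measure]
    return dict(cube)

# Hypothetical informetric facts, assumed to be already extracted from the XML sources.
facts = [
    {"author": "A", "year": 2005, "citations": 3},
    {"author": "A", "year": 2006, "citations": 5},
    {"author": "B", "year": 2006, "citations": 2},
]
cube = data_cube(facts, ["author", "year"], "citations")
print(cube[(ALL, ALL)])   # 10
print(cube[("A", ALL)])   # 8
print(cube[(ALL, 2006)])  # 7
```

The hard part the report addresses lies before this step: producing uniform records like these from structurally heterogeneous documents.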

A-2006-5 Marko Junkkari, Paavo Arvola and Jaana Kekäläinen, Grammatical Approach to XML Information Retrieval Query Languages. November 2006.
Abstract. The formal syntax and semantics of XIL, a structured query language for document-oriented XML retrieval, are given. The syntax is represented by a context-free grammar and the semantics by an attribute grammar. Dewey-like structural indices and traditional set-theoretic operations for manipulating them are used as a meta-language for the attribute grammar. These indices allow effective construction of relevance-based views for result representation. Although we follow the syntax of SQL, our aim is not to present an extension to any existing query language. Instead, we show how an SQL-like query language can be applied to XML retrieval without any additional constructors, not even those that have been established in SQL for flat structures. Further, the language does not involve difficult constructs such as shared variables, but it still has the restructuring power typical of SQL. Structure and structural conditions are given with path expressions. Simple content-only queries, as well as querying in heterogeneous structures, are also supported.
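How Dewey-like indices and plain set operations can express structural conditions may be sketched as follows. Indices are represented as integer tuples, containment is a prefix test, and conditions are combined with set intersection and union. The function names and the index sets are hypothetical, and this does not reproduce XIL's attribute grammar.

```python
def is_ancestor_or_self(a, d):
    """Dewey index a equals d or is one of its ancestors exactly when a is a prefix of d."""
    return len(a) <= len(d) and d[:len(a)] == a

def containing(candidates, matches):
    """Keep the candidate elements that contain at least one matching element."""
    return {c for c in candidates if any(is_ancestor_or_self(c, m) for m in matches)}

def document_order(indices):
    """Lexicographic order of Dewey tuples is document (pre-)order."""
    return sorted(indices)

# Hypothetical index sets: section elements, and elements whose text contains a term.
sections = {(1, 1), (1, 2), (1, 3)}
has_xml = {(1, 1, 2), (1, 3, 1, 4)}
has_retrieval = {(1, 3, 2)}

# Sections about both terms (intersection) and about either term (union).
both = containing(sections, has_xml) & containing(sections, has_retrieval)
either = containing(sections, has_xml) | containing(sections, has_retrieval)
print(document_order(both))    # [(1, 3)]
print(document_order(either))  # [(1, 1), (1, 3)]
```

Since the tuples themselves encode ancestry and document order, no tree traversal is needed once the index sets for the individual conditions are known.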
