Probabilistic Reasoning and Learning for the Semantic Web

By Riccardo Zese,
Dipartimento di Ingegneria, Università di Ferrara
April 2016

Abstract

The Semantic Web introduced a new vision of the World Wide Web where the information resources published on the Internet are readable and understandable by machines. However, incompleteness and/or uncertainty are intrinsic to much information, specially when it is collected from different sources. Thus we need a way to manage this kind of data.

In this thesis we address this problem and we present a complete framework for handling uncertainty in the Semantic Web. Description Logics (DLs) are the basis of the Semantic Web. DL knowledge bases (KBs) contains both assertional and terminological information regarding individuals, classes of individuals and relationships among them. We first defined a probabilistic semantics for DLs, called DISPONTE. It is inspired by the distribution semantics, a well known approach in probabilistic logic programming. DISPONTE permits to associate degrees of belief to pieces of information and to compute the proba- bility of queries to KBs.

The thesis then proposes a suite of algorithms for reasoning with KBs following DISPONTE:

BUNDLE, for “Binary decision diagrams for Uncertain reasoNing on Description Logic thEories”, computes the probability of queries w.r.t. DISPONTE KBs by means of the tableau algorithm and knowledge compilation. BUNDLE is based on Pellet, a state of the art reasoner, and is written in Java.
TRILL, for “Tableau Reasoner for descrIption Logics in Prolog”, performs inference over DISPONTE KBs with the tableau algorithm implemented in the declarative Prolog language. Prolog is useful for managing the non-determinism of the reasoning process.
TRILLP, for “TRILL powered by Pinpointing formulas”, differs from TRILL because it encodes the set of all explanations for queries with a more compact Boolean formula.

A second problem to address is the fact that the probability values are difficult to set for humans. However, usually information is available which can be leveraged for tuning these parameters. Moreover, terminological information in KBs may be incomplete or poorly structured. We thus need of learning systems able to cope with these problems. We present two learning systems, one for each problem:

EDGE, for “Em over bDds for description loGics paramEter learning”, learns the parameters of a DISPONTE KB.
LEAP, for “LEArning Probabilistic description logics”, learns termino- logical axioms together with their parameters by using EDGE.

However, the size of the data is constantly increasing, leading to the so-called Bid Data, Dataset are often too huge to be handled by a single machine in a reasonable time. Modern computing infrastructures such as clusters and clouds must be used where the work is divided among different machines. We thus extended both EDGE and LEAP to exploit these facilities by imple- menting EDGEMR and LEAPMR that distribute the work using a MapReduce approach.

All systems were tested on real life problems and their performances was comparable or superior to the state of the art.

PUBLISHED FULL TEXT