A new SPARQL specification and why it is relevant to the Logic Programming community

By
Axel Polleres,
Siemens AG Österreich

The World Wide Web Consortium (W3C) advanced 8 of its 11 SPARQL 1.1 specifications to Proposed Recommendation status (the last stage before becoming an official standard recommendation) earlier in November; the remaining 3 specification documents will advance to Proposed Recommendation soon.

The SPARQL 1.1 specifications

SPARQL, the “SPARQL Protocol and RDF Query Language”, is the W3C’s standard query language for RDF (the Resource Description Framework, an emerging data format on the growing Web of Data). After SPARQL’s first edition became a standard in 2008, the community and implementers requested a variety of additional features, which the SPARQL 1.1 working group took as a starting point in 2009 for shaping the next version of the standard.

In particular, the following features were identified as missing from the query language:

  • Aggregate functions: These allow operations on the query engine side such as counting, numerical min/max/average and so on, by operating over columns of results.
  • Subqueries: This feature allows nesting the results of a SPARQL query within another query.
  • Project expressions: This feature allows one to compute values from expressions within queries, rather than just returning terms appearing in the queried RDF.
  • Property paths: Many classes of queries over RDF graphs require traversing hierarchical data structures and involve arbitrary-length paths, a feature also known as regular path queries.
  • Inferred results under different entailment regimes: The original specification did not define which additional results a SPARQL query should return when taking into account ontological knowledge encoded in RDF Schema, OWL, or in the form of rules (such as those specified in the Rule Interchange Format, RIF).
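To make the property path and aggregate features more concrete for readers with a Datalog background, here is a minimal Python sketch. It is an illustration only, not how any SPARQL engine is implemented. The toy triples, the ex: names and the helper functions are all assumptions made up for this example. The sketch mimics what a query such as SELECT ?x (COUNT(?y) AS ?n) WHERE { ?x ex:subClassOf+ ?y } GROUP BY ?x would return, by computing the "one or more" path as a naive bottom-up fixpoint and then counting per subject.

```python
# Toy RDF graph as a set of (subject, predicate, object) triples.
# All names (ex:Cat, ex:subClassOf, ...) are hypothetical illustration data.
TRIPLES = {
    ("ex:Cat", "ex:subClassOf", "ex:Mammal"),
    ("ex:Dog", "ex:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "ex:subClassOf", "ex:Animal"),
    ("ex:Animal", "ex:subClassOf", "ex:LivingThing"),
}


def one_or_more(triples, predicate):
    """Return all pairs (x, y) linked by one or more `predicate` edges.

    This is a naive bottom-up fixpoint for the Datalog program
        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    edges = {(s, o) for (s, p, o) in triples if p == predicate}
    path = set(edges)
    while True:
        # Extend every known path by one more edge.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if derived <= path:  # nothing new: fixpoint reached
            return path
        path |= derived


def count_by_subject(pairs):
    """Group the pairs by their first component and count each group."""
    counts = {}
    for x, _ in pairs:
        counts[x] = counts.get(x, 0) + 1
    return counts


ancestors = one_or_more(TRIPLES, "ex:subClassOf")
ancestor_counts = count_by_subject(ancestors)
print(ancestor_counts)
```

With this toy data, ex:Cat and ex:Dog each reach three superclasses, ex:Mammal two, and ex:Animal one.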

In addition, the first edition of SPARQL lacked a data manipulation language for updates, the ability to query data from several remote SPARQL endpoints across the Web that support the SPARQL protocol (federation), and a means to describe such endpoints semantically.

The upcoming SPARQL 1.1 specification addresses all of these points and additionally provides specifications for several popular formats for exchanging SPARQL query results, and for manipulating RDF data directly via HTTP.

Overall, the SPARQL 1.1 specification consists of 11 documents:

  1. SPARQL 1.1 Overview – Overview of SPARQL 1.1 and the SPARQL 1.1 documents
  2. SPARQL 1.1 Query Language – A query language for RDF data.
  3. SPARQL 1.1 Update – Specifies additions to the query language to allow clients to update stored data
  4. SPARQL 1.1 Query Results JSON Format – How to use JSON for SPARQL query results
  5. SPARQL 1.1 Query Results CSV and TSV Formats – How to use comma-separated values (CSV) and tab-separated values (TSV) for SPARQL query results
  6. SPARQL Query Results XML Format – How to use XML for SPARQL query results. (This contains only minor, editorial updates from SPARQL 1.0, and is actually a Proposed Edited Recommendation.)
  7. SPARQL 1.1 Federated Query – An extension of the SPARQL 1.1 Query Language for executing queries distributed over different SPARQL endpoints.
  8. SPARQL 1.1 Service Description – A method for discovering and a vocabulary for describing SPARQL services.
  9. SPARQL 1.1 Entailment Regimes – Defines the semantics of SPARQL queries under entailment regimes such as RDF Schema, OWL, or RIF.
  10. SPARQL 1.1 Protocol for RDF – A protocol defining means for conveying arbitrary SPARQL queries and update requests to a SPARQL service.
  11. SPARQL 1.1 Graph Store HTTP Protocol – As opposed to the full SPARQL protocol, this specification defines minimal means for managing RDF graph content directly via common HTTP operations.
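The difference between the two protocol documents can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two service addresses, the graph IRI and the helper function names are hypothetical, and no request is actually sent. The sketch relies on the SPARQL Protocol carrying a query in the `query` parameter of an HTTP GET request, and on the Graph Store HTTP Protocol addressing a named graph through a `graph` parameter and manipulating it with ordinary HTTP methods such as PUT.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical service addresses, made up for this example.
SPARQL_ENDPOINT = "http://example.org/sparql"
GRAPH_STORE = "http://example.org/rdf-graph-store"

QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"


def query_url(endpoint, query):
    """Build the URL for a SPARQL Protocol query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


def graph_url(store, graph_iri):
    """Build the URL that addresses one named graph in a graph store."""
    return store + "?" + urlencode({"graph": graph_iri})


q_url = query_url(SPARQL_ENDPOINT, QUERY)
g_url = graph_url(GRAPH_STORE, "http://example.org/graphs/people")

# Decoding the URL again recovers the original query string.
decoded = parse_qs(urlsplit(q_url).query)["query"][0]

# A request object that would replace the graph's content (built, not sent).
put_request = Request(
    g_url,
    data=b"<http://example.org/a> <http://example.org/b> <http://example.org/c> .",
    headers={"Content-Type": "text/turtle"},
    method="PUT",
)

print(q_url)
print(g_url)
print(put_request.get_method())
```

The point of the contrast is that the first URL carries an arbitrary query to be evaluated by the service, while the second simply names a graph to be read or written as a whole.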

SPARQL 1.1 and Logic Programming

At the recent Datalog 2.0 workshop in Vienna I had the opportunity to present a tutorial on how SPARQL interplays with Logic Programming, particularly with Datalog. The tutorial, whose slides are available on the workshop Web page, summarizes the SPARQL standard and its relation to Logic Programming from a Datalog point of view, as well as the interplay of SPARQL with another W3C standard closely related to Logic Programming, the Rule Interchange Format (RIF). The material, which also contains a list of references to related academic work around these standards, can hopefully serve as an entry point for interested researchers from the Logic Programming community to dig further into this new technology and make use of it within Logic Programming based tools and applications.