RDF Normalization

Project Members

Richard Chbeir (UPPA – France)
Joe Tekli (LAU – Lebanon)
Sébastien Laborie (UPPA – France)
Regina Ticona (UPPA – France)
Irvin Dongo (UPPA – France)
Renato Guzmán (UCSP – Peru)

Abstract

Linked Open Data is a way of publishing and sharing structured data that allows descriptions to be connected and enriched. Several datasets are currently available online (e.g., DBpedia, LinkedGeoData and New York Times). These datasets are most often built upon RDF, i.e., a graph-based W3C standard model for data interchange on the Web. Governments, universities, and companies (e.g., BBC, CNN) are producing huge collections of RDF graphs and exchanging them through different serialization formats (e.g., RDF/XML, Turtle, N-Triples). However, many available RDF graphs and serializations are noisy in terms of structure and syntax, and are thus inefficient to exploit (e.g., regarding storage, processing time, loading time, similarity measuring, mapping, alignment and versioning). In this article, we propose to clean RDF descriptions (i.e., graphs and serializations). For that purpose, we have defined a framework entitled R2NR which normalizes different RDF descriptions pertaining to the same information into one normalized representation, which can then be tuned both at the graph level and at the serialization level, depending on the target application. In order to validate our framework, we have implemented an online service and evaluated it through extensive experiments. Results show significant improvements over existing approaches, namely regarding loading time and file size, while preserving all the information from the original description.

Contributions

The contributions of our approach can be summarized as follows:

i) Producing a normalized (output) RDF representation that preserves all the information in the source (input) RDF descriptions,
ii) Eliminating redundancies and disparities in the normalized RDF descriptions, both at the logical (graph) and physical (serialization) levels,
iii) Computing a compact RDF serialization output adapted to the target application requirements (faster loading, better storage, etc.),
iv) Providing a mathematical formalization of the normalization process with dedicated normalization functions, operators, and rules with provable properties, and
v) Providing an online Web application in order to test and to evaluate the approach’s efficiency.
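
To make contributions ii) and iii) more concrete, the following is a minimal sketch, not the R2NR implementation, of what removing redundancy at the graph level and at the serialization level can look like. It assumes triples are plain tuples of already-prefixed strings (the `ex:` names are hypothetical), ignores blank nodes and datatype handling, and uses two illustrative helper names, `normalize` and `serialize_compact`, that do not come from the framework.

```python
from collections import defaultdict

# A triple is (subject, predicate, object), each already written as a
# prefixed name or a quoted literal. This representation is an assumption
# made for illustration only.
Triple = tuple[str, str, str]


def normalize(triples: list[Triple]) -> list[Triple]:
    """Graph level: drop duplicate statements and fix a deterministic order."""
    return sorted(set(triples))


def serialize_compact(triples: list[Triple]) -> str:
    """Serialization level: write each subject and predicate only once,
    using Turtle-style ';' (same subject) and ',' (same predicate) grouping."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in normalize(triples):
        grouped[s][p].append(o)

    lines = []
    for s, predicates in grouped.items():
        parts = [f"{p} {', '.join(objects)}" for p, objects in predicates.items()]
        lines.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:alice", "ex:name", '"Alice"'),
        ("ex:alice", "ex:knows", "ex:bob"),  # duplicate statement
        ("ex:alice", "ex:knows", "ex:carol"),
    ]
    print(serialize_compact(sample))
```

Because the statements are deduplicated and sorted before grouping, two inputs that contain the same statements in a different order, or with repetitions, produce the same output text. A real normalizer would additionally have to handle blank nodes, prefix declarations, and literal datatypes, which this sketch leaves out.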

Prototype

Architecture of R2NR framework

[Figure: architecture of the R2NR framework]