RDF Normalization
Project Members
Richard Chbeir (UPPA – France)
Joe Tekli (LAU – Lebanon)
Sébastien Laborie (UPPA – France)
Regina Ticona (UPPA – France)
Irvin Dongo (UPPA – France)
Renato Guzmán (UCSP – Peru)
Abstract
Linked Open Data is a way of publishing and sharing structured data that allows descriptions to be connected and enriched. Several datasets are now available online (e.g., DBpedia, LinkedGeoData and New York Times). These datasets are mostly built upon RDF, a graph-based W3C standard model for data interchange on the Web. Governments, universities and companies (e.g., BBC, CNN) are producing huge collections of RDF graphs and exchanging them through different serialization formats (e.g., RDF/XML, Turtle, N-Triples). However, many available RDF graphs and serializations are noisy in terms of structure and syntax, and are thus inefficient to exploit (e.g., with respect to storage, processing time, loading time, similarity measuring, mapping, alignment and versioning). In this article, we propose to clean RDF descriptions (i.e., graphs and serializations). For that purpose, we have defined a framework entitled R2NR, which normalizes different RDF descriptions pertaining to the same information into one normalized representation. This representation can then be tuned both at the graph level and at the serialization level, depending on the target application. To validate our framework, we have implemented an online service and evaluated it through extensive experiments. Results show significant improvements over existing approaches, namely regarding loading time and file size, while preserving all the information from the original description.
Contributions
The contributions of our approach can be summarized as follows:
i) Producing a normalized (output) RDF representation that preserves all the information in the source (input) RDF descriptions,
ii) Eliminating redundancies and disparities in the normalized RDF descriptions, both at the logical (graph) and physical (serialization) levels,
iii) Computing a compact RDF serialization output adapted to the target application requirements (faster loading, better storage, etc.),
iv) Providing a mathematical formalization of the normalization process with dedicated normalization functions, operators, and rules with provable properties, and
v) Providing an online Web application in order to test and to evaluate the approach’s efficiency.
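To make the distinction between the logical (graph) and physical (serialization) levels more concrete, the following minimal Python sketch removes a duplicated statement and then groups the remaining triples by subject and predicate in a Turtle-like form. It is only an illustration under simplifying assumptions and is not the R2NR normalization functions, operators, or rules: the sample triples, the ex: prefix (assumed to be declared elsewhere), and the function names deduplicate, serialize_flat and serialize_compact are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: triples as (subject, predicate, object) strings.
# The "ex:" prefix is assumed to be declared elsewhere.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:knows", "ex:bob"),    # duplicated statement
    ("ex:alice", "ex:knows", "ex:carol"),
]


def deduplicate(triples):
    """Logical level: an RDF graph is a set of triples, so a repeated
    statement carries no extra information and can be dropped."""
    return sorted(set(triples))


def serialize_flat(triples):
    """One statement per line, N-Triples-like (prefixed names kept for brevity)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def serialize_compact(triples):
    """Physical level: write each subject once and each predicate once per
    subject, Turtle-style, in a deterministic (sorted) order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in deduplicate(triples):
        grouped[s][p].append(o)
    lines = []
    for s in sorted(grouped):
        parts = [f"{p} {', '.join(objs)}" for p, objs in sorted(grouped[s].items())]
        lines.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(serialize_flat(triples))
    print()
    print(serialize_compact(triples))
```

In this toy case the compact output states the same three distinct triples as the flat one, in fewer characters. A real normalizer would also have to handle cases this sketch ignores, such as blank nodes, literals with datatypes or language tags, and namespace declarations.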
Architecture of the R2NR framework