Data harmonization pipelines
The data harmonization pipelines (also known as Data Preparation and Integration pipelines, DPI) implement an approach that uses Linked Data as a federated layer, combined with knowledge graph technologies, in which data is made available and combined according to common ontologies and vocabularies. The pipelines carry out the tasks needed to generate and publish Linked Data in line with best practices and guidelines. A key input for the pipelines is the selected set of target ontologies and vocabularies; in the case of Iliad, this is the Ocean Information Model (OIM).
The DPI software connects different tools and hides their interfaces and implementation details behind simple-to-use interfaces. The DPI thus exposes the pipelines’ underlying components through a homogeneous layer, enabling users or other components to launch a full pipeline or individual steps. The harmonized data generated by the DPI can be accessed via different mechanisms, including:
- SPARQL queries directly over the triplestore
- OGC APIs, which may be generated (semi-)automatically and expose the generated data via standard APIs that are more convenient and require less effort for developers and client applications to use
- custom APIs that expose pre-defined queries as API methods. This allows users and developers not only to execute pre-defined methods but also to define their own queries that are converted on the fly to API methods.
- Web UIs, which can provide customized user interfaces, e.g., dashboards, to visualize the data via visual components like maps, graphs, etc.
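As an illustration of the first access mechanism, the following minimal sketch sends a query to a triplestore using the standard SPARQL 1.1 Protocol and reads the standard SPARQL JSON results format. The endpoint URL and the query are placeholders, not actual DPI or Iliad values, and the function names are introduced here only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL (an assumption); replace it with the SPARQL
# endpoint of the triplestore that holds the harmonized data.
ENDPOINT = "https://example.org/sparql"

# Generic placeholder query that lists a few triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query request (POST with URL-encoded parameters)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def flatten_bindings(results):
    """Turn a SPARQL JSON results document into a list of {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the flattened rows (requires network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return flatten_bindings(json.load(response))
```

Calling `run_query(ENDPOINT, QUERY)` against a real endpoint would return one dictionary per result row. Authentication, if the triplestore requires it, is not covered by this sketch.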
The DPI supports multiple input data formats (e.g., CSV, Shapefile, NetCDF, JSON, RDBMS), and different mapping formats (RML, YARRRML, SPARQL, CSVW). The DPI is available as:
- CLI tool: ETL software written in Python that carries out the tasks of fetching, extracting, preprocessing, transforming, and post-processing data, and loading the resulting linked data into the triplestore. It supports pre-defined pipelines, which execute full pipelines for specific datasets (FADN, LPIS), and generic pipelines, which enable the execution of individual tasks or flexible combinations of tasks on different types of data sources. A key feature of the generic pipeline is its mapping generator, which can create mappings for the pipeline from scratch based on a simple YAML configuration file provided by the user.
- Web Service: wraps the CLI tool and exposes its functionality through a RESTful API. For Iliad, the service has been deployed in a PaaS environment, along with other containers that provide storage, security, queue management and other functions. The service also relies on a semantic triplestore where the generated data is stored. On top of this service, a few other components provide simpler user interfaces (GUI) as well as standard programmatic access to the harmonized data generated by the pipelines.
- GUI client application: provides a simple user interface to the DPI functionality, allowing users, such as data and service providers, to create and configure their pipelines, execute them and access the results. The application also includes an interface for creating and editing mapping specifications in YAML.
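To make the sequence of ETL tasks more concrete, here is a minimal, self-contained sketch of a staged pipeline that turns CSV rows into N-Triples and can be run either in full or step by step. It is not the DPI implementation. The function names, the `CONFIG` dictionary (standing in for a user-provided YAML configuration), the example IRIs, and the in-memory list used in place of a triplestore are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical configuration, standing in for the user-provided YAML file.
CONFIG = {
    "base_iri": "http://example.org/station/",
    "id_column": "id",
    "class_iri": "http://example.org/ontology#Station",
    "properties": {
        "name": "http://example.org/ontology#name",
        "depth_m": "http://example.org/ontology#depth",
    },
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def fetch(source):
    """Fetch raw data; in this sketch the source is CSV text already in memory."""
    return source


def extract(raw):
    """Parse CSV text into a list of row dictionaries (assumes well-formed rows)."""
    return list(csv.DictReader(io.StringIO(raw)))


def preprocess(rows):
    """Strip whitespace from values and drop rows that have no identifier."""
    cleaned = [{k: (v or "").strip() for k, v in row.items()} for row in rows]
    return [row for row in cleaned if row[CONFIG["id_column"]]]


def transform(rows):
    """Map each row to N-Triples lines according to CONFIG."""
    triples = []
    for row in rows:
        subject = f"<{CONFIG['base_iri']}{row[CONFIG['id_column']]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{CONFIG['class_iri']}> .")
        for column, predicate in CONFIG["properties"].items():
            if row.get(column):
                # Escape backslashes and quotes (minimal escaping for this sketch).
                literal = row[column].replace("\\", "\\\\").replace('"', '\\"')
                triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples


def postprocess(triples):
    """Remove duplicate triples while keeping their order."""
    return list(dict.fromkeys(triples))


def load(triples, store):
    """Append triples to an in-memory list that stands in for the triplestore."""
    store.extend(triples)
    return store


STEPS = [fetch, extract, preprocess, transform, postprocess]


def run_pipeline(source, store, steps=STEPS):
    """Run the given steps in order, then load the result into the store."""
    data = source
    for step in steps:
        data = step(data)
    return load(data, store)
```

For example, `run_pipeline("id,name,depth_m\n1,Alpha,12.5\n", [])` returns three triples for station 1, while calling `extract` or `transform` directly runs a single step. Keeping each task as a separate function is what allows both modes in this sketch.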
DPI CLI source code: https://gitlab.pcss.pl/daisd-public/dpi-pipelines/pipelines
DPI Web service live instances:
- Production: https://dpi-enabler-dpi-enabler.apps.dcw1.paas.psnc.pl/api/ (Swagger: https://dpi-enabler-dpi-enabler.apps.dcw1.paas.psnc.pl/api/swagger)
- Development: https://dpi-enabler-demeter.apps.paas-dev.psnc.pl/api/ (Swagger: https://dpi-enabler-demeter.apps.paas-dev.psnc.pl/api/swagger/)
DPI GUI: https://dpi-enabler-ui-test.apps.paas-dev.psnc.pl/
Additional Details
Type: Container Images, SaaS and APIs, Web Applications
Theme: Research, Iliad: Datasets & Data Management
Iliad Project Component
Language(s): English
Contact Information: Dr Raul Palma, PSNC