Framework to Automatically Determine the Quality of Open Data Catalogs

Repository Overview: This repository provides a framework for automatically assessing the quality of open data catalogs, based on the paper of the same title by Jorge Martinez-Gil.

🌟 Introduction

Data catalogs are central to data-driven decision making: they streamline the discovery, understanding, and use of diverse data assets. This framework provides an automated approach to evaluating the quality of open data catalogs. It is designed to increase confidence in the data organizations rely on, so that decisions can be based on accurate, complete, and timely information.

📊 Core Quality Dimensions

The framework evaluates data catalogs across multiple dimensions:

Accuracy: Checks data correctness and precision.
Completeness: Measures how much of the expected data is available.
Consistency: Checks coherence across data sources.
Scalability: Assesses the catalog's ability to manage growing data volumes.
Timeliness: Checks that data is relevant and up to date.
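
To make one of these dimensions concrete, here is a minimal sketch of a completeness score, computed as the share of expected metadata properties that carry a non-empty value. This is an illustration only and not necessarily how check_completeness.py works: the `EXPECTED_PROPERTIES` tuple, the `completeness` function, and the dictionary representation of a dataset are all assumptions made for this example.

```python
from typing import Mapping, Sequence

# Hypothetical set of expected properties (an assumption for this sketch;
# the repository's scripts may check a different set).
EXPECTED_PROPERTIES = (
    "dct:title",
    "dct:description",
    "dcat:keyword",
    "dct:license",
    "dct:modified",
)


def completeness(record: Mapping[str, object],
                 expected: Sequence[str] = EXPECTED_PROPERTIES) -> float:
    """Return the fraction of expected properties that have a non-empty value."""
    if not expected:
        return 1.0
    # A property counts as present only if it is neither missing nor empty.
    present = sum(
        1 for prop in expected if record.get(prop) not in (None, "", [], ())
    )
    return present / len(expected)


# Illustrative record: description is empty and license is missing.
dataset = {
    "dct:title": "Air quality measurements",
    "dct:description": "",
    "dcat:keyword": ["air", "pollution"],
    "dct:license": None,
    "dct:modified": "2023-07-01",
}
score = completeness(dataset)
print(f"completeness = {score:.2f}")  # 3 of 5 properties present -> 0.60
```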

📈 Non-Core Quality Dimensions

Beyond the core dimensions, we assess:

Provenance: Traces the origin and history of data.
Readability: Assesses how clear and understandable data descriptions are.
Licensing: Checks data usage rights and restrictions.
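
Readability is scored in this repository with the Flesch-Kincaid Grade Level (see the Usage section). The standard formula is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below applies it to a piece of text using only the standard library. The syllable counter is a rough vowel-group heuristic chosen for this example, and the function names are hypothetical; check_readability.py may count syllables differently.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count groups of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of `text` using the standard coefficients."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


description = "This dataset contains hourly air quality measurements. It is updated daily."
print(f"grade level = {flesch_kincaid_grade(description):.2f}")
```

Lower scores indicate text that is easier to read; very short, simple sentences can even yield negative values.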

🔄 Compatibility and Similarity Assessment

The framework also assesses compatibility and similarity between data catalogs, which helps identify complementary data assets.
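
As a rough illustration of what a similarity score between two catalogs can look like, the sketch below computes the Jaccard similarity of their keyword sets. This is an assumption made for the example: the README does not state which measure check_similarity.py uses, and the keyword sets and the `jaccard_similarity` function are hypothetical.

```python
from typing import Iterable


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical keyword sets extracted from two catalogs.
catalog_a_keywords = {"air", "pollution", "climate"}
catalog_b_keywords = {"climate", "weather", "air"}

similarity = jaccard_similarity(catalog_a_keywords, catalog_b_keywords)
print(f"similarity = {similarity:.2f}")  # 2 shared of 4 distinct keywords -> 0.50
```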

🛠️ Installation

pip install -r requirements.txt

⚙️ Usage

A suite of commands to evaluate different aspects of a data catalog:

python check_accuracy.py example001.ttl  # Check the accuracy of a DCAT data catalog.

python check_compatibility.py example001.ttl example002.ttl  # Check the compatibility of two DCAT data catalogs.

python check_completeness.py example001.ttl  # Check the completeness of a DCAT data catalog.

python check_consistency.py example001.ttl entity_type  # Check the consistency of a DCAT data catalog, where entity_type is one of catalog, dataset, or distribution.

python check_licensing.py example001.ttl  # Check the licensing of a DCAT data catalog.

python check_lineage_provenance.py example001.ttl  # Check the lineage and provenance of a DCAT data catalog.

python check_readability.py example001.ttl  # Check the readability of a DCAT data catalog according to the Flesch-Kincaid Grade Level.

python check_scalability.py example001.ttl  # Check the scalability of a DCAT data catalog.

python check_similarity.py example001.ttl example002.ttl  # Check the similarity of two DCAT data catalogs.

python check_timeliness.py example001.ttl  # Check the timeliness of a DCAT data catalog.
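
To run several of the checks above in one go, a small driver script along the following lines could be used. It is a sketch rather than part of the repository: `build_commands` and `run_all` are hypothetical helpers, and the script assumes it is started from the repository root and that each check accepts its arguments exactly as shown above.

```python
import subprocess
import sys
from typing import Dict, List, Optional

# Script names taken from the usage list; each takes a single catalog file.
SINGLE_CATALOG_CHECKS = [
    "check_accuracy.py",
    "check_completeness.py",
    "check_licensing.py",
    "check_lineage_provenance.py",
    "check_readability.py",
    "check_scalability.py",
    "check_timeliness.py",
]
# Scripts that compare two catalog files.
PAIRWISE_CHECKS = ["check_compatibility.py", "check_similarity.py"]


def build_commands(catalog: str,
                   other: Optional[str] = None,
                   entity_type: Optional[str] = None) -> List[List[str]]:
    """Build one argument list per check, using the current Python interpreter."""
    commands = [[sys.executable, script, catalog] for script in SINGLE_CATALOG_CHECKS]
    if entity_type is not None:
        # The consistency check needs an entity type: catalog, dataset, or distribution.
        commands.append([sys.executable, "check_consistency.py", catalog, entity_type])
    if other is not None:
        commands += [[sys.executable, script, catalog, other] for script in PAIRWISE_CHECKS]
    return commands


def run_all(commands: List[List[str]]) -> Dict[str, int]:
    """Run each command in turn and map the script name to its exit code."""
    results = {}
    for cmd in commands:
        completed = subprocess.run(cmd, check=False)
        results[cmd[1]] = completed.returncode
    return results
```

Calling `run_all(build_commands("example001.ttl", "example002.ttl", "dataset"))` would then run all ten checks in sequence and return each script's exit code.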

📚 Citation

Please cite our work if you find it useful:

@article{martinez2023d,
  author     = {Jorge Martinez-Gil},
  title      = {Framework to Automatically Determine the Quality of Open Data Catalogs},
  journal    = {CoRR},
  volume     = {abs/2307.15464},
  year       = {2023},
  url        = {https://arxiv.org/abs/2307.15464},
  doi        = {10.48550/arXiv.2307.15464},
  eprinttype = {arXiv},
  eprint     = {2307.15464}
}

📄 License

This project is available under the MIT License.
