Boilerplate code for dataset restification, i.e. exposing datasets through a REST API.
The restification procedure uses the following steps:
- Register the dataset and retrieve schema.
- Create a REST service that supports the identified schema.
- Use the service to populate a MongoDB database with the raw data and expose it through the API.
To register a dataset, follow these steps:
- Create a folder named after the dataset (e.g. metrics) inside the datasets/ directory and place all the raw data files in it.
- Create the data schema, which the restification service requires. To do so, you can use the following code:
# Initialize datasets handler
dH = datasetsHandler('datasets')
# Get the data files of the first dataset
data_files_path = dH.get_datasets_files_path(dH.datasets_to_import[0])
# Create the schema and store it in schemas directory
df = dH.read_data(data_files_path[0], columns='all')
dH.schema_extractor(df, 'schemas/class-metrics.schema')
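The internals of schema_extractor are not shown above. As a rough illustration of what schema inference for an Eve service can look like, the sketch below derives Cerberus-style field types from CSV text using only the standard library. The helpers infer_type and infer_schema and the sample columns are hypothetical and are not part of this repository; the actual extractor may work differently.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest Cerberus type name that fits every non-empty value."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("float", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Build an Eve-style schema dict ({field: {'type': ...}}) from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: {"type": infer_type([row[name] for row in rows])}
        for name in (reader.fieldnames or [])
    }


# Hypothetical sample data, used only for illustration
sample = "class_name,loc,complexity\nFoo,120,3.5\nBar,80,2.0\n"
schema = infer_schema(sample)
print(schema)
```

The resulting dictionary has the shape Eve expects under a resource's schema key, so it could be stored and later loaded into the service configuration.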
To run the REST service based on the schemas you created, simply run service.py. The configuration details are documented in settings.py.
The service is based on the Python REST API framework Eve.
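For orientation, an Eve settings module typically looks something like the sketch below. The setting names (MONGO_HOST, URL_PREFIX, API_VERSION, DOMAIN and so on) are standard Eve settings, but every value here, including the database name, the metrics resource and its fields, is an assumption chosen for illustration and may not match the settings.py shipped with this code.

```python
# Hypothetical Eve settings; all values are placeholders.

# MongoDB connection
MONGO_HOST = "localhost"
MONGO_PORT = 27017
MONGO_DBNAME = "datasets"

# With these two settings, resources are served under /api/v1/
URL_PREFIX = "api"
API_VERSION = "v1"

# Allow reading collections and inserting new documents
RESOURCE_METHODS = ["GET", "POST"]
ITEM_METHODS = ["GET"]

# One resource per dataset, each with its own schema
DOMAIN = {
    "metrics": {
        "schema": {
            "class_name": {"type": "string"},
            "loc": {"type": "integer"},
        }
    }
}
```

An Eve application is usually created with Eve(settings=...) and started with app.run(), which by default listens on port 5000.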
Once the REST service is running, you can import the data using the following commands:
# Initialize the datasets handler
dH = datasetsHandler('datasets')
# Initialize the data importer
dI = dataImporter.dataImporter(dH)
# Import data (change the URL according to your configuration)
dI.import_data('http://localhost:5000/api/v1/')
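How import_data talks to the service is not shown here. One plausible approach, sketched below under that assumption, is to send the records to the resource endpoint in batches, since Eve accepts a JSON list in a POST body as a bulk insert. The helpers chunked, resource_url and post_records, the metrics resource name and the batch size are all hypothetical.

```python
import json
import urllib.request


def chunked(records, size):
    """Yield successive slices of `records` with at most `size` items each."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def resource_url(base_url, resource):
    """Join the API base URL and a resource name with exactly one slash."""
    return base_url.rstrip("/") + "/" + resource


def post_records(base_url, resource, records, batch_size=500):
    """POST records to a resource endpoint in batches (bulk insert)."""
    for batch in chunked(records, batch_size):
        request = urllib.request.Request(
            resource_url(base_url, resource),
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
        with urllib.request.urlopen(request) as response:
            response.read()


# Example call (requires a running service, so it is left commented out):
# post_records("http://localhost:5000/api/v1/", "metrics",
#              [{"class_name": "Foo", "loc": 120}])
```

Batching keeps each request small, which helps when a dataset contains many rows.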