This is a template for creating datapackage repositories for Government of Maharashtra orders/GRs (Government Resolutions) found at https://gr.maharashtra.gov.in/1145/Government-Resolutions. If you are looking for a generic datapackage template, visit http://github.com/orgpedia/template.datapackage.
You need to have the copier package installed to use this template.
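Copier is distributed as a Python package, so one common way to install it (assuming pip is available; pipx works as well) is:
pip install copier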
- Create an empty repository in GitHub. Make sure you choose a short name <org_code> that begins with mah, for example mahedu or mahhome.
- Clone the newly created repository using the command given below. If you are on Windows, you should enable sym-links; check this link to know more. Note that since there are a lot of PDFs, the PDF files are not checked in.
  git clone http://github.com/orgpedia/<org_code>
- Populate the <org_code> directory using this template by running the command:
  copier https://github.com/orgpedia/template.mahdept.git path/to/<org_code>
- Commit the changes in the git repository (the full sequence of commands is sketched below).
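Put together, a typical setup session might look like this (mahedu is only an illustrative org code; adjust the path to wherever you cloned the repository):
git clone http://github.com/orgpedia/mahedu
copier https://github.com/orgpedia/template.mahdept.git mahedu
cd mahedu
git add .
git commit -m "Populate repository from template.mahdept"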
The directory layout of this repository follows the ideas mentioned in this video: Principled Data Processing by Patrick Ball. The idea is to follow a fixed and well understood layout to minimize the need for documentation.
There are 3 main directories: import, flow and export. A simple makefile orchestrates the document flow across these folders; run make help to find out more about the commands.
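A typical end-to-end run, assuming the targets are invoked in the order listed below (see the make help output at the end of this section), would be:
$ make install
$ make import
$ make flow
$ make export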
The import directory contains the documents you want to process and other artifacts that are needed for executing the document flow. Most of this content is either on GitHub LFS or imported at the start and never checked in.
$ tree import
import
├── data_packages # Data (orders, officer_infos) imported from other data-packages
│ └── README.md
├── docker # Generated Docker images and mounted dirs
│ └── README.md
├── documents # Documents that are processed, with normalized names
│ ├── README.md
│ ├── dept-1.pdf -> ../.../upload_3456_12022023.pdf # short name
│ ├── dept-2.pdf -> ../.../upload_3456_12022024.pdf
│ └── documents.yml
├── models # Models needed for this data package
│ ├── huggingface
│ │ └── README.md
│ └── models.yml
├── src # code for downloading docs
│ └── README.md
└── websites # Actual document pdfs (stored on GitHub LFS)
├── README.md
└── dept.gov.in
└── uploads
├── upload_3456_12022023.pdf
└── upload_3456_12022024.pdf
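For illustration, the normalized names under import/documents are just sym-links back to the raw uploads under import/websites. Creating one by hand would look something like this (in practice the import scripts handle the naming, so treat this only as a sketch):
$ cd import/documents
$ ln -s ../websites/dept.gov.in/uploads/upload_3456_12022023.pdf dept-1.pdf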
The flow directory contains a series of task directories that process the documents and generate the data; this is the main part of the repository. Note that the data (both intermediate and final) is checked into the repository.
$ tree flow
flow
├── subFlows # Generate supporting data/conf files
│ └── README.md
└── task_ # Main processing directory for document-flow
├── conf # Contains configuration for the files in task
├── input # contains input to the task
│ ├── dept-1.pdf -> ../../../import/documents/dept-1.pdf
│ └── dept-2.pdf -> ../../../import/documents/dept-2.pdf
├── logs # logs generated in the task
├── output # output generated by this task
│ ├── dept-1.order.json
│ ├── dept-2.order.json
│ ├── orders.json
│ └── tenures.json
└── src # code for this task
└── gen_order.py
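Each task is self-contained: the code in src reads from input and conf and writes to output and logs. Tasks are normally executed via make flow, but during debugging you could run one directly, roughly along these lines (the actual script name and arguments may differ from task to task; this is only a sketch):
$ cd flow/task_
$ python src/gen_order.py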
The generated data is stored in the export/data folder as sym-links. The export directory also contains other intermediate data folders that are needed. Most of these directories are not checked into the repository.
$ tree export
export
├── data # This contains the final data generated
│ ├── README.md
│ ├── orders
│ │ ├── README.md
│ │ ├── dept-1.order.json -> .../flow/task_/output/dept-1.order.json
│ │ └── dept-2.order.json -> .../flow/task_/output/dept-2.order.json
│ ├── orders.json -> ../../flow/task_/output/orders.json
│ ├── schema
│ │ └── README.md
│ └── tenures.json -> ../../flow/task_/output/tenures.json
├── html # Temporary folder for HTML used while debugging
│ └── README.md
├── orgpedia_{{org_code}} # Folder for creating python package
│ ├── __init__.py
│ └── datapackage.json.jinja
└── page_images # images of pages for vision algorithms
└── README.md
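Once make export has run, the sym-links under export/data point at the latest task outputs, so the results can be inspected directly. For example, assuming orders.json is plain JSON as the layout above suggests:
$ ls -l export/data
$ python -m json.tool export/data/orders.json | head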
Check the makefile to know more about the commands:
$ make help
Please use 'make <target>', where <target> is one of
install install packages and prepare software environment
import import data required for processing document flow
flow execute the tasks in the document flow
export export the data generated by the document flow
check check if files are correctly laid in the flow directory
readme generate the readme for the flow/task directories
lint run the code linters
format reformat code
pre-commit run pre-commit checks, runs yaml lint, you need pre-commit
Check the makefile to know exactly what each target is doing.