# Welcome to the Code Improvement Commission
This repository was created by UniCourt on behalf of Public.Resource.Org. All this work is in the public domain and there are NO RIGHTS RESERVED.
This repository contains software that transforms official state codes from ugly .rtf files into nice-looking, accessible HTML. We use `textutil` on a Mac to convert the .rtf files into rough HTML; the code in this repository then does the heavy lifting.
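
Below is a minimal Python sketch of that first conversion step. It assumes the standard macOS `textutil -convert html -output <file>` invocation, since this README does not give the exact command the maintainers run. `build_textutil_command` and `convert_rtf_to_html` are hypothetical helpers, not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_textutil_command(rtf_path: Path, html_path: Path) -> List[str]:
    """Build a textutil command line that converts one .rtf file to HTML."""
    # Options precede the input file, matching textutil's documented usage.
    return ["textutil", "-convert", "html", "-output", str(html_path), str(rtf_path)]


def convert_rtf_to_html(rtf_path: Path, html_path: Path) -> None:
    """Convert rtf_path to html_path with textutil (available only on macOS)."""
    if shutil.which("textutil") is None:
        raise RuntimeError("textutil not found; it ships with macOS")
    html_path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if textutil exits with an error.
    subprocess.run(build_textutil_command(rtf_path, html_path), check=True)
```

For example, `convert_rtf_to_html(Path("title_01.rtf"), Path("title_01.html"))` writes the rough HTML for a single title.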
Currently, this code supports the following states:

### Georgia (GA)
- Code repo: https://github.com/UniCourt/cic-code-ga
- Code pages: https://unicourt.github.io/cic-code-ga
- Original RTF: https://archive.org/download/gov.ga.ocga.2018

### Arkansas (AR)
- Code repo: https://github.com/UniCourt/cic-code-ar
- Code pages: https://unicourt.github.io/cic-code-ar
- Original RTF: https://archive.org/download/gov.ar.code

### Mississippi (MS)
- Code repo: https://github.com/UniCourt/cic-code-ms
- Code pages: https://unicourt.github.io/cic-code-ms
- Original RTF: https://archive.org/download/gov.ms.code.ann.2018

### Tennessee (TN)
- Code repo: https://github.com/UniCourt/cic-code-tn
- Code pages: https://unicourt.github.io/cic-code-tn
- Original RTF: https://archive.org/details/gov.tn.tca

### Kentucky (KY)
- Code repo: https://github.com/UniCourt/cic-code-ky
- Code pages: https://unicourt.github.io/cic-code-ky
- Original RTF: https://archive.org/details/gov.ky.code

### Colorado (CO)
- Code repo: https://github.com/UniCourt/cic-code-co
- Code pages: https://unicourt.github.io/cic-code-co
- Original RTF: https://archive.org/download/gov.co.crs.bulk

### Idaho (ID)
- Code repo: https://github.com/UniCourt/cic-code-id
- Code pages: https://unicourt.github.io/cic-code-id
- Original files: https://archive.org/details/govlaw?and%5B%5D=subject%3A%22idaho.gov%22+AND+subject%3A%222020+Code%22&sin=&sort=titleSorter

### Virginia (VA)
- Code repo: https://github.com/UniCourt/cic-code-va
- Code pages: https://unicourt.github.io/cic-code-va
- Original RTF: https://archive.org/download/gov.va.code/

### Vermont (VT)
- Code repo: https://github.com/UniCourt/cic-code-vt
- Code pages: https://unicourt.github.io/cic-code-vt
- Original RTF: https://archive.org/download/gov.vt.code

### Wyoming (WY)
- Code repo: https://github.com/UniCourt/cic-code-wy
- Code pages: https://unicourt.github.io/cic-code-wy
- Original RTF: https://archive.org/details/gov.wy.code/

In the coming months, we intend to add two more features:
- Extend the code to handle the official codes of Colorado and Idaho.
- Add a "redline" capability to show diffs.

## REQUIREMENTS AND INSTALLATION
- BeautifulSoup 4: https://www.crummy.com/software/BeautifulSoup/
- lxml: https://lxml.de/

To set up the project:

1. Create a new folder named `transforms`.
2. Inside it, create a folder for the state: `transforms/{state_name}` (for example, `transforms/ga`).
3. Inside the state folder, create a folder for the release, `ocga/{release}`. It will contain the raw files (the `textutil` output files).

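The folder setup above can be scripted. The following Python sketch assumes the `ocga` folder name from the Georgia example applies as written, and uses `Release-75`, the release label from the usage example, for illustration. `release_folder_path` and `create_release_folder` are hypothetical helpers, not part of `html_parser`; they only create folders, so the raw `textutil` output still has to be copied in.

```python
from pathlib import Path


def release_folder_path(root: Path, state_name: str, release: str) -> Path:
    """Return root/transforms/{state_name}/ocga/{release}."""
    return root / "transforms" / state_name / "ocga" / release


def create_release_folder(root: Path, state_name: str, release: str) -> Path:
    """Create the release folder (and any missing parents) and return its path."""
    path = release_folder_path(root, state_name, release)
    # parents=True also creates transforms/ and transforms/{state_name}/;
    # exist_ok=True makes repeated runs harmless.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, `create_release_folder(Path("."), "ga", "Release-75")` creates `transforms/ga/ocga/Release-75` under the current directory.
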
Example folder structure:

```
project
│   README.md
│   requirements.txt
│
└───html_parser
│   │   file011.py
│   │   file012.py
│
└───transforms
    └───ga
        └───ocga
            └───raw
                │   title_01.html
```

Python 3.8 should be installed in the development environment to run this project.

Run `pip install -r requirements.txt` to install all the required packages.
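
Before running the parser, a quick check like the following Python sketch can confirm the interpreter version and that the two packages listed under REQUIREMENTS AND INSTALLATION are importable. It assumes "Python 3.8" means 3.8 or newer; `missing_modules` and `check_environment` are hypothetical helpers, not part of this repository.

```python
import importlib.util
import sys
from typing import Dict, List

# Packages listed under REQUIREMENTS AND INSTALLATION, keyed by import name:
# BeautifulSoup 4 is imported as "bs4", lxml as "lxml".
REQUIRED_MODULES: Dict[str, str] = {"bs4": "BeautifulSoup 4", "lxml": "lxml"}


def missing_modules(module_names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def check_environment() -> bool:
    """Print any problems found and return True only if none were found."""
    ok = True
    # Assumption: "Python 3.8" is read as "3.8 or newer".
    if sys.version_info < (3, 8):
        print(f"Python 3.8+ expected, found {sys.version.split()[0]}")
        ok = False
    for name in missing_modules(list(REQUIRED_MODULES)):
        print(f"Missing {REQUIRED_MODULES[name]} (import name '{name}'); "
              "run: pip install -r requirements.txt")
        ok = False
    return ok
```

Calling `check_environment()` prints nothing and returns `True` when the environment looks ready.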

Usage: `python html_parser/html_parse_runner.py`, with the following arguments:

- `--state_key`: the state code, e.g. `GA`
- `--release_label`: the release label, e.g. `Release-75`
- `--release_date`: the release date, in `DD-MM-YYYY` format
- `--input_file_name`: optional, e.g. `gov.ga.ocga.title.01.html`. If this argument is not passed, all the files for the provided release label are parsed.
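
If you drive the runner from another Python script, a sketch like the one below assembles the documented arguments and leaves out `--input_file_name` when it is not given. It assumes each flag takes its value as a separate argument (the usual argparse convention) and that it is run from the project root; `build_runner_args` and `run_parser` are hypothetical helpers, and the date check only enforces the day-month-year order shown above.

```python
import subprocess
import sys
from datetime import datetime
from typing import List, Optional

RUNNER = "html_parser/html_parse_runner.py"


def build_runner_args(state_key: str, release_label: str, release_date: str,
                      input_file_name: Optional[str] = None) -> List[str]:
    """Assemble a command line for the runner from the documented options."""
    # Reject dates that do not parse as day-month-year (raises ValueError).
    datetime.strptime(release_date, "%d-%m-%Y")
    args = [sys.executable, RUNNER,
            "--state_key", state_key,
            "--release_label", release_label,
            "--release_date", release_date]
    # Omitting --input_file_name asks the runner to parse every file in the release.
    if input_file_name is not None:
        args += ["--input_file_name", input_file_name]
    return args


def run_parser(state_key: str, release_label: str, release_date: str,
               input_file_name: Optional[str] = None) -> None:
    """Run the parser; raise CalledProcessError if it exits with an error."""
    subprocess.run(build_runner_args(state_key, release_label, release_date,
                                     input_file_name), check=True)
```

For example, `run_parser("GA", "Release-75", "01-01-2021", "gov.ga.ocga.title.01.html")` parses a single Georgia title; the date there is only a placeholder.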