Merge pull request #59 from uktrade/docs/link-to-more-complete-set-of-repos

docs: link to a more complete set of repos
michalc authored Mar 2, 2024
2 parents f044575 + e7eb8cb commit 7eb0b20
Showing 1 changed file with 59 additions and 0 deletions.
59 changes: 59 additions & 0 deletions README.md
@@ -78,6 +78,65 @@ Some of the components of Data Workspace are lower level, and less Data Workspac

A CLI script to make bulk updates to Amazon Quicksight datasets


#### Ingesting data

These components are usually used to ingest data into the PostgreSQL database that's the core of Data Workspace.

- [pg-bulk-ingest](https://github.com/uktrade/pg-bulk-ingest)<br>
[pg-force-execute](https://github.com/uktrade/pg-force-execute)

  Used to ingest large amounts of data into the PostgreSQL database.

- [to-file-like-obj](https://github.com/uktrade/to-file-like-obj)

  Used in several ways to convert iterables of bytes to a file-like object for memory-efficient data ingestion, for example when parsing CSVs.

- [iterable-subprocess](https://github.com/uktrade/iterable-subprocess)

Used to extract data from archives in a format that requires running an external program.

- [stream-read-ods](https://github.com/uktrade/stream-read-ods)

Used to extract data from Open Document Spreadsheet (ODS) files in a memory-efficient and disk-efficient way.

- [stream-unzip](https://github.com/uktrade/stream-unzip)

Used to extract data from ZIP files in a memory-efficient and disk-efficient way.

- [stream-read-xbrl](https://github.com/uktrade/stream-read-xbrl)

Used to ingest data from Companies House.

- [sqlite-s3vfs](https://github.com/uktrade/sqlite-s3vfs)

Used to generate large and complex SQLite files that are then ingested into the Data Workspace PostgreSQL database.

- [s3-dropbox](https://github.com/uktrade/s3-dropbox)

  Used to power a simple API that accepts incoming data files in any format and drops them in S3, from where they are subsequently ingested into Data Workspace.
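
As a rough illustration of the memory-efficient pattern several of these components rely on, the sketch below adapts an iterable of bytes into a file-like object and parses it as CSV without loading the whole input. It uses only the Python standard library and is not the API or implementation of to-file-like-obj; the names `IterableReader` and `csv_rows` are hypothetical.

```python
import csv
import io


class IterableReader(io.RawIOBase):
    """Hypothetical adapter exposing an iterable of bytes as a readable file-like object."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull chunks until there is data to return or the iterable is exhausted
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # signals end of stream
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def csv_rows(chunks, encoding="utf-8"):
    """Yield CSV rows one at a time from an iterable of bytes."""
    raw = IterableReader(chunks)
    text = io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding, newline="")
    yield from csv.reader(text)


# Chunk boundaries do not need to line up with row or field boundaries
for row in csv_rows([b"id,na", b"me\n1,Al", b"ice\n2,Bob\n"]):
    print(row)
```

Only the current chunk and the reader's internal buffers are held in memory at any time, which is what makes the approach suitable for large files.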


#### Publishing data

These components are used when publishing data from Data Workspace.

- [public-data-api](https://github.com/uktrade/public-data-api)

Makes data available to the public.

- [stream-zip](https://github.com/uktrade/stream-zip)

Creates ZIP files in a memory-efficient and disk-efficient way.

- [stream-write-ods](https://github.com/uktrade/stream-write-ods)

Creates Open Document Spreadsheet (ODS) files in a memory-efficient and disk-efficient way.

- [postgresql-proxy](https://github.com/uktrade/postgresql-proxy)

Part of the system that makes data available to other internal applications.
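
To give a sense of what creating a ZIP file "in a memory-efficient and disk-efficient way" can mean, the sketch below yields the bytes of a ZIP archive as they are produced, using only the standard library `zipfile` module writing to a non-seekable buffer. This is an assumption-laden illustration, not the API or implementation of stream-zip; `_Buffer` and `zip_stream` are hypothetical names.

```python
import zipfile


class _Buffer:
    """Hypothetical write-only sink: collects the bytes zipfile writes until drained."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass

    def drain(self):
        chunks, self.chunks = self.chunks, []
        return chunks


def zip_stream(members):
    """Yield ZIP-encoded bytes for an iterable of (name, iterable_of_bytes) pairs."""
    buffer = _Buffer()
    # The buffer has no tell() or seek(), so zipfile treats it as an unseekable stream
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, chunks in members:
            with archive.open(name, mode="w") as member:
                for chunk in chunks:
                    member.write(chunk)
                    yield from buffer.drain()
            # Bytes written when the member is closed
            yield from buffer.drain()
    # Central directory, written when the archive is closed
    yield from buffer.drain()


members = [("a.txt", [b"hello ", b"world"]), ("b.txt", [b"x" * 1000])]
total = sum(len(chunk) for chunk in zip_stream(members))
print(f"streamed {total} bytes")
```

Because output is yielded as soon as it is available, the archive never has to exist in full in memory or on disk, so it can be sent directly in an HTTP response or uploaded in parts.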

---

### Contents of this repository