You need data; Air Supply will get it to you.
Air Supply is a versatile library to handle getting data from multiple sources in a sane way to use in your application build, data analysis, or whatever else requires fetching some data.
Air Supply aims to address the need to bring in various data sources when making small or mid-size, self-contained projects. Air Supply was originally conceived while working at the Star Tribune, where we often create small projects and most of them need different, non-dynamic data sources. Air Supply essentially pulls together, behind a consistent interface, lots of one-off code for getting and parsing data that was written and used over many years.
- Can handle many sources of data, such as local files and directories, HTTP(S) sources, Google Docs and Sheets, many SQL sources, AirTable, and more. See Packages.
- Can easily parse and transform data such as CSV-ish, MS Excel, YAML, Shapefiles, ArchieML, zip files, and more. See Parsers.
- Caches by default.
- Aimed at simple uses by just writing a JSON config, as well as more advanced transformations.
- Loads dependency modules as needed and allows for overriding.
- Not focused on performance (yet). The caching mitigates a lot of issues here, but the goal would be to use streams for everything where possible.
- Not meant for very complex data pipelines. For instance, if you have to scrape a thousand pages, Air Supply doesn't currently fit well, but could still be used to pull the processed data into your application.
These projects do roughly similar things, but not to the same degree:
npm install air-supply --save
By default, Air Supply only installs the most common dependencies for its packages and parsers. This means that if you need more specific parsers and packages, you will need to install their modules as well. For instance:
npm install googleapis archieml
If you just want to use the command-line tool, install globally like:
npm install -g air-supply
If you plan to use a number of the packages and parsers, it could be easier (though it uses more disk space) to install all the "dev dependencies", which include all the package and parser dependencies:
NODE_ENV=dev npm install -g air-supply
Air Supply can be used as a regular Node library, or it can use config files that are run via the command-line tool or through Node.
Basic usage in Node means defining the Packages when creating the AirSupply object:
const { AirSupply } = require('air-supply');
// Create new AirSupply object and tell it about
// the packages it needs
let air = new AirSupply({
packages: {
remoteJSONData: 'http://example.com/data.json',
// To use Google Sheet package, make sure to install
// the googleapis module:
// npm install googleapis
googleSheetData: {
source: 'XXXXXXX',
type: 'google-sheet'
}
}
});
// Get the data, caching will happen by default
let data = await air.supply();
// Data will look something like this
{
remoteJSONData: { ... },
googleSheetData: [
{ column1: 'abc', column2: 234 },
...
]
}
The command line tool will look for configuration in multiple places. See Configuration files below. You can simply call it with:
air-supply
A configuration file, such as a .air-supply.json, will be loaded and run through Air Supply, outputting the fetched and transformed data to the command line (stdout):
{
"packages": {
"cliData": "some-file.yml"
}
}
You can also point the command-line tool to a specific file if you want:
air-supply -c air-supply.rc > data.json
Any AirSupply options are passed down to each Package, so we can define a custom ttl (cache time) on AirSupply and then override it for each package.
const { AirSupply } = require("air-supply");
// Since we are using the YAML parser, make sure module is installed
// npm install js-yaml
let air = new AirSupply({
ttl: 1000 * 60 * 60,
packages: {
// This data will probably not change during our project
unchanging: {
ttl: 1000 * 60 * 60 * 24 * 30,
source: "http://example.com/data.json"
},
defaultChanging: "https://example/data.yml"
}
});
await air.supply();
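The way top-level options flow down to each Package can be pictured with a small sketch. This is not Air Supply's actual code: `resolvePackageOptions` is a hypothetical helper, and it assumes that a bare string Package is shorthand for `{ source }`, as the examples above suggest.

```javascript
// Hypothetical helper, not part of the air-supply API: merge top-level
// options into one package definition, letting the package win.
function resolvePackageOptions(globalOptions, packageConfig) {
  // Everything except `packages` is treated as a shared default
  const { packages, ...defaults } = globalOptions;
  // Assumption: a bare string is shorthand for { source: "..." }
  const config =
    typeof packageConfig === "string"
      ? { source: packageConfig }
      : packageConfig;
  return { ...defaults, ...config };
}

const hour = 1000 * 60 * 60;
const options = {
  ttl: hour,
  packages: {
    unchanging: { ttl: hour * 24 * 30, source: "http://example.com/data.json" },
    defaultChanging: "https://example/data.yml"
  }
};

const resolved = {};
for (const [name, config] of Object.entries(options.packages)) {
  resolved[name] = resolvePackageOptions(options, config);
}
console.log(resolved.unchanging.ttl); // 2592000000 (30 days)
console.log(resolved.defaultChanging.ttl); // 3600000 (1 hour, inherited)
```

The package-level `ttl` wins where it is set, and the top-level value fills in everywhere else.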
Each Package can be given a transform function to transform data. We can also alter when the caching happens. This is helpful here because caching after the transform means we don't repeat an expensive task like parsing HTML.
// Cheerio: https://cheerio.js.org/
const cheerio = require("cheerio");
const { AirSupply } = require("air-supply");
let air = new AirSupply({
packages: {
htmlData: {
// Turn off any parsing, since we will be using cheerio
parser: false,
source: "http://example.com/html-table.html",
// Transform function
transform(htmlData) {
const $ = cheerio.load(htmlData);
let data = [];
$("table.example tbody tr").each(function(i, tr) {
// Wrap the raw element so we can use cheerio methods on it
const $tr = $(tr);
data.push({
column1: $tr.find("td.col1").text(),
columnNumber: parseInt($tr.find("td.col2").text(), 10)
});
});
return data;
},
// Alter the cachePoint so that AirSupply will cache this
// after the transform
cachePoint: "transform"
}
}
});
await air.supply();
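To see why moving the cache point matters, here is a minimal sketch of a fetch-then-transform pipeline with a configurable cache point. The `runPackage` function, the `Map` cache, and the `fetch` stand-in are all hypothetical and only model the idea; they are not Air Supply internals.

```javascript
// Hypothetical model of the fetch -> transform pipeline; the names and
// cache shape are illustrative, not air-supply internals.
const cache = new Map();
let transformRuns = 0;

function runPackage(name, { fetch, transform, cachePoint = "fetch" }) {
  // With cachePoint "transform", a cache hit is already the final data
  if (cachePoint === "transform" && cache.has(name)) {
    return cache.get(name);
  }
  // With cachePoint "fetch", only the raw data is reused
  let raw;
  if (cachePoint === "fetch" && cache.has(name)) {
    raw = cache.get(name);
  } else {
    raw = fetch();
    if (cachePoint === "fetch") cache.set(name, raw);
  }
  const data = transform(raw);
  if (cachePoint === "transform") cache.set(name, data);
  return data;
}

const htmlPackage = {
  fetch: () => "<td>1</td><td>2</td>",
  transform(html) {
    transformRuns += 1; // Stand-in for an expensive HTML parse
    return html.match(/\d+/g).map(Number);
  },
  cachePoint: "transform"
};

const first = runPackage("htmlData", htmlPackage);
const second = runPackage("htmlData", htmlPackage);
// transformRuns is 1: the second call reused the cached, transformed data
console.log(first, second, transformRuns);
```

With the default cache point in this model, the transform would run on every call, even though the raw data came from the cache.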
You can easily read a directory of files. If you just give it a path to a directory, it will assume you mean a glob of **/* in that directory.
const { AirSupply } = require("air-supply");
let air = new AirSupply({
packages: {
directoryData: "path/to/directory/"
}
});
await air.supply();
Reading a whole directory this way can be a problem, since it reads every file in that directory recursively. So, it may be helpful to be more specific and define a glob to use. This requires being explicit about the type of Package. We can also use specific parserOptions to define how to parse files.
// In this example we are using the csv and yaml parsers, so make sure to:
// npm install js-yaml csv-parse
const { AirSupply } = require("air-supply");
let air = new AirSupply({
packages: {
directoryData: {
source: "path/to/directory/**/*.{json,yml,csv,custom-ext}",
type: "directory",
// The Directory Package type will define the `parser` option as
// { multiSource: true } which will tell the parser to treat it
// as an object where each key is a source. This means, we can
// define specific options for specific files.
parserOptions: {
"file.custom-ext": {
parser: "yaml"
}
}
}
}
});
await air.supply();
You can also achieve something similar by just overriding the parser configuration to handle other extensions. Here we will update the YAML matching for another extension.
// In this example we are using the csv and yaml parsers, so make sure to:
// npm install js-yaml csv-parse
const { AirSupply } = require("air-supply");
let air = new AirSupply({
parserMethods: {
yaml: {
match: /(yaml|yml|custom-ext)$/i
}
},
packages: {
directoryData: {
source: "path/to/directory/**/*.{json,yml,csv,custom-ext}",
type: "directory"
}
}
});
await air.supply();
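The effect of overriding a parser's match pattern can be illustrated with a small lookup sketch. The default patterns below follow the source-match column of the parser table in this document; the `guessParser` function and its merge logic are assumptions made for illustration, not the library's implementation.

```javascript
// Hypothetical sketch of matching a source to a parser by regular
// expression. Only a few parsers are listed to keep the example short.
const defaultParserMethods = {
  json: { match: /json5?$/i },
  csv: { match: /csv$/i },
  yaml: { match: /(yml|yaml)$/i }
};

function guessParser(source, overrides = {}) {
  // Assumption: an override replaces the default entry of the same name
  const methods = { ...defaultParserMethods, ...overrides };
  const found = Object.entries(methods).find(([, method]) =>
    method.match.test(source)
  );
  return found ? found[0] : undefined;
}

// Without the override, the custom extension matches nothing
console.log(guessParser("path/to/file.custom-ext")); // undefined

// With the override, it is treated as YAML
const overrides = { yaml: { match: /(yaml|yml|custom-ext)$/i } };
console.log(guessParser("path/to/file.custom-ext", overrides)); // "yaml"
```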
Here is an example that gets a shapefile from an FTP source, reprojects it, and converts it to TopoJSON:
const { AirSupply } = require("air-supply");
let air = new AirSupply({
cachePath: defaultCachePath,
packages: {
mnCounties: {
// The FTP Package requires the ftp module
// npm install ftp
source:
"ftp://ftp.commissions.leg.state.mn.us/pub/gis/shape/county2010.zip",
// We need to ensure that the Package will pass the data as a buffer
fetchOptions: {
type: "buffer"
},
parsers: [
// The shapefile parser requires specific modules
// npm install shapefile adm-zip
"shapefile",
// We then reproject the geo data and need some more modules
// npm install reproject epsg
{
parser: "reproject",
parserOptions: {
sourceCrs: "EPSG:26915",
targetCrs: "EPSG:4326"
}
},
// Finally, we make the data more compact with topojson
// npm install topojson
{
parser: "topojson",
name: "mnCounties"
}
]
}
}
});
await air.supply();
Air Supply will look for config files based on cosmiconfig rules with a little customization. So, it will read the first of any of these files that it finds as it goes up the directory tree:
package.json # An 'air-supply' property
.air-supply
.air-supply.json
.air-supply.json5
.air-supply.yaml
.air-supply.yml
.air-supply.js
air-supply.config.js
Note that any JSON will be read by the json5 module.
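The upward search can be sketched with plain Node modules. This is only an approximation of the lookup described above, not cosmiconfig itself: `findConfig` is a hypothetical helper, it only locates the file without loading it, and it assumes the names are checked in the order listed.

```javascript
const fs = require("fs");
const path = require("path");

// File names from the list above, in the order given
const configNames = [
  "package.json",
  ".air-supply",
  ".air-supply.json",
  ".air-supply.json5",
  ".air-supply.yaml",
  ".air-supply.yml",
  ".air-supply.js",
  "air-supply.config.js"
];

// Hypothetical helper: return the first matching config path, or null
function findConfig(startDir) {
  let dir = path.resolve(startDir);
  while (true) {
    for (const name of configNames) {
      const candidate = path.join(dir, name);
      if (!fs.existsSync(candidate)) continue;
      // package.json only counts if it has an 'air-supply' property
      if (name === "package.json") {
        const pkg = JSON.parse(fs.readFileSync(candidate, "utf8"));
        if (!("air-supply" in pkg)) continue;
      }
      return candidate;
    }
    const parent = path.dirname(dir);
    if (parent === dir) return null; // Reached the filesystem root
    dir = parent;
  }
}

console.log(findConfig(process.cwd()));
```

A `package.json` without an `air-supply` property is skipped, so the search keeps going up the tree until it finds a real configuration or runs out of directories.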
Packages are the methods that define how to get raw data from sources. The following are the available packages; see the full API documentation for all the specific options available.
Packages get passed any options from the AirSupply object that is using them, and they also share some common options and usage:
Note that many packages require specific modules to be installed separately.
AirSupply({
ttl: 1000 * 60 * 10,
packages: {
things: {
// Type is the kebab case of the package class name, i.e.
// the package class name here would be PackageName.
//
// AirSupply will try to guess this given a source
type: "package-name",
// Almost all packages use the source option as their
// main option to get data
source: "the main source option for this package",
// Depending on the package, any options for the
// fetching of data are usually managed in fetchOptions
fetchOptions: {
fetchEverything: true
},
// Can override any defaults from the AirSupply object
ttl: 1000 * 60 * 60,
// Parsers are simple functions to transform the raw data.
// This can be a string defining which parser to use,
// an object of configuration, or an array of either if
// you want to do multiple parsers. The package
// will guess what kind of parser is needed based on the source.
parsers: ["zip", { multiSource: true }],
// Custom transform function that will happen after parsing.
transform(data) {
return expensiveAlterFunction(data);
},
// Custom transform function that will happen after getting
// all packages.
bundle(allPackages) {
return alterPackages(allPackages);
},
// By default, caching will happen after fetching the raw data and
// any of the built-in parsing. But, you can cache after the 'transform'
// or after the 'bundle'.
//
// Overall, this is only needed if you have expensive transformations
cachePoint: "transform",
// Use the output option to save the fully loaded data
// to the filesystem. This is useful if you need to save files
// that will get loaded into the client (asynchronously).
output: "things.json"
}
}
});
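The naming rule for `type` (the kebab case of the package class name) can be shown with a short conversion sketch. `typeFromClassName` is a hypothetical helper written for this illustration; how Air Supply actually maps types to classes is not shown here.

```javascript
// Hypothetical helper: derive the `type` string from a package class name
// by inserting a hyphen before each inner capital and lowercasing.
function typeFromClassName(className) {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

console.log(typeFromClassName("PackageName")); // "package-name"
console.log(typeFromClassName("GoogleSheet")); // "google-sheet"
console.log(typeFromClassName("Directory")); // "directory"
```

The last two results line up with the `type` values used in the earlier examples.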
Package | Description | Docs | Dependencies |
---|---|---|---|
AirTable | Get data from an AirTable table. | API | npm install airtable |
Data | Just pass JS data through. | API | |
Directory | Read files from a directory and parse each one if it can. | API | |
File | Read a file from the filesystem. | API | |
Ftp | Get a file from an FTP source. | API | npm install ftp |
GoogleDoc | Get plain text version of a Google Doc and by default parse with ArchieML. Can be a "Published to the web" URL, or if given an ID will use Google's authentication. | API | npm install googleapis |
GoogleSheet | Get tabular data from a Google Sheet and assumes first row is headers by default. Uses Google's authentication; if you want to use a public "Published to the web" CSV, just use the Http package with a CSV parser. | API | npm install googleapis |
Http | Get data from an HTTP source. | API | |
Sql | Get data from SQL sources that are supported by sequelize. | API | npm install sequelize |
Parsers are simple functions to transform common data; mostly these are used to transform the raw data to more meaningful JSON data.
Note that most parsers require specific modules to be installed separately.
The parsers option can be defined a few different ways:
- If it is undefined, the package will try to determine which parser to use by looking at the source.
- If it is false, then no parsing will happen.
- If it is a string, such as 'csv', then it will use that parser with any default options.
- If it is a function, then it will simply run the data through that function.
- If it is an object, it should have a parser key which is one of the above options, and optionally a parserOptions that will get passed to the parser function. Or it can just be { multiSource: true } which will assume the data coming in is an object where each key is a source.
- If it is an array, it is assumed to be multiple parsers with the above options.
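The accepted forms of the parsers option can be summarized as a normalization step that turns any of them into a uniform list. This is a sketch under assumptions: `normalizeParsers` and its `guessed` argument (standing in for whatever parser was detected from the source) are hypothetical and not part of the Air Supply API.

```javascript
// Hypothetical sketch: turn the `parsers` option into a list of steps.
function normalizeParsers(parsers, guessed) {
  // undefined: fall back to a parser guessed from the source, if any
  if (parsers === undefined) {
    return guessed === undefined ? [] : normalizeParsers(guessed);
  }
  // false: no parsing at all
  if (parsers === false) return [];
  // array: each entry is itself one of the other forms
  if (Array.isArray(parsers)) {
    return parsers.flatMap(entry => normalizeParsers(entry));
  }
  // string or function: use it with default (empty) options
  if (typeof parsers === "string" || typeof parsers === "function") {
    return [{ parser: parsers, parserOptions: {} }];
  }
  // object: either { multiSource: true } or { parser, parserOptions }
  if (parsers.multiSource) return [{ multiSource: true }];
  return [{ parserOptions: {}, ...parsers }];
}

console.log(normalizeParsers(false)); // []
console.log(normalizeParsers("csv")); // [ { parser: 'csv', parserOptions: {} } ]
console.log(normalizeParsers(["zip", { multiSource: true }]));
```

Normalizing up front means the code that runs the parsers only has to handle one shape, whichever form the configuration used.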
The following parsers are available by default.
Parser | Description | Source match | Docs | Dependencies |
---|---|---|---|---|
archieml | Uses archieml. | /aml$/i | API | npm install archieml |
csv | Uses csv-parse. Can be used for any delimited data. | /csv$/i | API | npm install csv-parse |
gpx | Uses togeojson. | /gpx$/i | API | npm install @mapbox/togeojson |
json | Uses json5. | /json5?$/i | API | |
kml | Uses togeojson. | /kml$/i | API | npm install @mapbox/togeojson |
reproject | Reprojects GeoJSON using reproject. | NA | API | npm install reproject epsg |
shapefile | Parses a Shapefile as a .zip or .shp file into GeoJSON using shapefile. | `/(shp.*zip\|shp)$/i` | API | npm install shapefile adm-zip |
topojson | Transforms GeoJSON to TopoJSON using topojson. | /geo.?json$/i | API | npm install topojson |
xlsx | Parses MS Excel and others (.xlsx, .xls, .dbf, .ods) using xlsx. | `/(xlsx\|xls\|dbf\|…` | | |
xml | Parses XML using xml2js. | /xml$/i | API | npm install xml2js |
yaml | Uses js-yaml. | `/(yml\|yaml)$/i` | API | npm install js-yaml |
zip | Turns a zip file into an object where each key is the file name and the value is the text contents of that file using adm-zip. | /zip$/i | API | npm install adm-zip |
Full API documentation can be found at zzolo.org/air-supply.
Use npm run docs:preview and open localhost:4001 in a browser.
Run tests with: npm run test
- Bump version in package.json and run npm install.
- Commit.
- Tag: git tag X.X.X
- Push up: git push origin master --tags
- Run npm publish
Build and publish to Github Pages (after NPM publish): npm run docs:publish