Lambda wrapper over AWS S3 Select that makes it easy to create READ microservices based on CSV or JSON files

davidayalas/aws-s3-select-lambda

AWS S3 Select with AWS Lambda

Context

Requirements to deploy this demo

Description

This service exposes:

  • An API Gateway Lambda endpoint: index.js
    • It uses a small library that wraps S3 Select to query S3 objects: s3select.js
    • It supports CSV, JSON and Parquet as input formats
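
As a rough illustration of what wrapping S3 Select involves, the sketch below builds the parameter object that the AWS SDK's `selectObjectContent` operation accepts, with one `InputSerialization` shape per supported format. The helper name `buildSelectParams`, the bucket name and the object key are assumptions made for this example; they are not taken from s3select.js, and nothing here calls AWS.

```javascript
// Hypothetical helper (not part of s3select.js): assembles the request
// parameters for the AWS SDK selectObjectContent operation.
function buildSelectParams({ bucket, key, expression, inputSerialization }) {
  return {
    Bucket: bucket,
    Key: key,
    ExpressionType: "SQL",
    Expression: expression,
    InputSerialization: inputSerialization,
    // Ask S3 Select to return one JSON record per line.
    OutputSerialization: { JSON: { RecordDelimiter: "\n" } },
  };
}

// One InputSerialization shape per supported input format.
const inputFormats = {
  CSV: { CompressionType: "NONE", CSV: { FileHeaderInfo: "USE" } },
  JSON: { CompressionType: "NONE", JSON: { Type: "LINES" } },
  Parquet: { Parquet: {} },
};

const params = buildSelectParams({
  bucket: "my-demo-bucket", // placeholder bucket name
  key: "worldcities.csv", // placeholder object key
  expression: "select s.city from s3object s limit 5",
  inputSerialization: inputFormats.CSV,
});

console.log(JSON.stringify(params, null, 2));
```

The resulting object could then be passed to an S3 client; the actual library may structure this differently.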

You can create as many endpoints as you want and set environment variables to adapt the behavior of each one.

The example Serverless setup file, setup.demo.json, contains four configurations: CSV, JSON and Parquet (GET requests), plus an extra one for POST queries.

Lambda function setup (env vars)

  • METHOD: default "GET". Other values: "POST".

    • With the "POST" method you can send a "QUERY" field in the body containing a SQL statement more complex than what query-string parameters alone allow. Example for the "postQuery" endpoint:

            select s.city from s3object s where CAST(s.lat AS FLOAT)>40.0 and CAST(s.lng AS FLOAT)>-3.0
      
  • BUCKET

  • FILE

  • QUERY (only for the GET method): the SQL query. Interpolate the query-string parameters you want to use to select objects as '{param}' (note the single quotes)

  • Values to set up the input serialization:

    • COMPRESSION_TYPE: default "NONE". Other values: "GZIP", "BZIP2"
    • TYPE: default "CSV". Other values: "JSON", "Parquet"

      • If "CSV", other vars:
        • CSV_FILE_HEADER: default "USE". Other values: "NONE", "IGNORE"
        • CSV_FIELD_DELIMITER: A single character used to separate individual fields in a record. You can specify an arbitrary delimiter.
        • CSV_COMMENTS: A single character used to indicate that a row should be ignored when the character is present at the start of that row
        • CSV_QUOTE_CHARACTER: A single character used for escaping when the field delimiter is part of the value

      • If "JSON", other vars:
        • JSON_TYPE: default "LINES". Other values: "DOCUMENT"
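
The variables above can be sketched as a mapping onto an S3 Select `InputSerialization` object, together with the `'{param}'` interpolation used for GET queries. This is a minimal sketch under stated assumptions: the function names `inputSerializationFromEnv` and `interpolateQuery` are hypothetical, and doubling single quotes in parameter values is a precaution added here, not something the source confirms the library does.

```javascript
// Hypothetical helper: maps the documented env vars (with their documented
// defaults) onto an S3 Select InputSerialization object.
function inputSerializationFromEnv(env) {
  const type = env.TYPE || "CSV";
  const serialization = { CompressionType: env.COMPRESSION_TYPE || "NONE" };
  if (type === "CSV") {
    const csv = { FileHeaderInfo: env.CSV_FILE_HEADER || "USE" };
    if (env.CSV_FIELD_DELIMITER) csv.FieldDelimiter = env.CSV_FIELD_DELIMITER;
    if (env.CSV_COMMENTS) csv.Comments = env.CSV_COMMENTS;
    if (env.CSV_QUOTE_CHARACTER) csv.QuoteCharacter = env.CSV_QUOTE_CHARACTER;
    serialization.CSV = csv;
  } else if (type === "JSON") {
    serialization.JSON = { Type: env.JSON_TYPE || "LINES" };
  } else {
    serialization.Parquet = {};
  }
  return serialization;
}

// Hypothetical helper: replaces each {param} placeholder in the QUERY
// template with the matching query-string value. Doubling single quotes is
// an assumption of this sketch, so a value cannot close the SQL string early.
function interpolateQuery(template, queryParams) {
  return template.replace(/\{(\w+)\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(queryParams, name)) {
      return match; // leave unknown placeholders untouched
    }
    return String(queryParams[name]).replace(/'/g, "''");
  });
}

// Example with a plain object standing in for process.env.
const env = {
  TYPE: "JSON",
  QUERY: "select s.lat, s.lng from s3object s where s.city = '{city}'",
};
const serialization = inputSerializationFromEnv(env);
const expression = interpolateQuery(env.QUERY, { city: "L'Hospitalet" });

console.log(serialization);
console.log(expression);
```

In a real handler, `process.env` would take the place of the `env` object and the API Gateway query-string parameters would take the place of the literal `{ city: ... }` object.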

Sample data

  • Sample data is from https://simplemaps.com/data/world-cities

  • The CSV data is converted into JSON and Parquet with a simple script:

      $ npm install && node conversor
    
  • The same script compresses the CSV and JSON files into gzip and bzip2 for testing.

  • See the data directory for more information

Test

Requirements for test:

  • Upload the contents of data/files to the bucket specified in setup.demo.json

  • Set up AWS credentials

  • Then run:

      $ cd test
      $ npm test
    
