Commit

Prepare for 0.6 (#513)
* Update README

* Update changelog
kylebarron authored Apr 21, 2024
1 parent 4a0f504 commit 77bdab6
Showing 3 changed files with 139 additions and 8 deletions.
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,44 @@
# Changelog

## [0.6.0] - 2024-04-21

### New! :sparkles:

- Class-based API + concurrent streams + column selections + File reader by @H-Plus-Time in https://github.com/kylebarron/parquet-wasm/pull/407. This added a new `ParquetFile` API for working with files at remote URLs without downloading them first.
- Conditional exports in `package.json`. This should make it easier to use across Node and browser.
- Improved documentation for how to use different entry points.

### Breaking Changes

- The arrow2- and parquet2-based implementation has been removed.
- The file layout has changed, so your import paths may need to change.
- Imports are now `parquet-wasm`, `parquet-wasm/esm`, `parquet-wasm/bundler`, and `parquet-wasm/node`.

## What's Changed

- Add conditional exports by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/382
- CI production build size summary by @H-Plus-Time in https://github.com/kylebarron/parquet-wasm/pull/401
- Remove arrow2 implementation by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/446
- feat: add lz4_raw support for `arrow1` by @fspoettel in https://github.com/kylebarron/parquet-wasm/pull/466
- Highlight that esm entry point needs await of default export by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/487
- Fixes for both report builds and PR comment workflow by @H-Plus-Time in https://github.com/kylebarron/parquet-wasm/pull/495
- fix package exports by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/414
- Object store wasm usage by @H-Plus-Time in https://github.com/kylebarron/parquet-wasm/pull/490
- Set Parquet key-value metadata by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/503
- Read parquet with options by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/506
- Documentation updates for 0.6 by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/507
- Avoid bigint for metadata queries by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/508
- Update async API by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/510
- Add test to read empty file by @kylebarron in https://github.com/kylebarron/parquet-wasm/pull/512
- bump arrow libraries to version 51 by @jdoig in https://github.com/kylebarron/parquet-wasm/pull/496

## New Contributors

- @fspoettel made their first contribution in https://github.com/kylebarron/parquet-wasm/pull/466
- @jdoig made their first contribution in https://github.com/kylebarron/parquet-wasm/pull/496

**Full Changelog**: https://github.com/kylebarron/parquet-wasm/compare/v0.5.0...v0.6.0

## [0.5.0] - 2023-10-21

## What's Changed
103 changes: 95 additions & 8 deletions README.md
@@ -22,20 +22,107 @@ npm install parquet-wasm

## API

Parquet-wasm has both a synchronous and an asynchronous API. The sync API is simpler but requires fetching the entire Parquet buffer in advance, which is often prohibitive for large or remote files.
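
The asynchronous API can avoid that up-front download because a Parquet file keeps its metadata in a footer at the end of the file: the last 8 bytes are a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The sketch below shows the first step a range-based reader might take. It illustrates the file format only and is not parquet-wasm's implementation; the helper names `readMetadataLength` and `metadataRangeHeader` are hypothetical.

```js
// Parquet files end with: <file metadata> <4-byte little-endian metadata length> "PAR1".
const FOOTER_TAIL_BYTES = 8;

// Hypothetical helper: read the metadata length from (at least) the last 8 bytes of a file.
function readMetadataLength(tail) {
  if (tail.byteLength < FOOTER_TAIL_BYTES) {
    throw new Error("Need at least the last 8 bytes of the file");
  }
  const view = new DataView(
    tail.buffer,
    tail.byteOffset + tail.byteLength - FOOTER_TAIL_BYTES,
    FOOTER_TAIL_BYTES,
  );
  const magic = String.fromCharCode(
    view.getUint8(4), view.getUint8(5), view.getUint8(6), view.getUint8(7),
  );
  if (magic !== "PAR1") {
    throw new Error(`Not a Parquet file (trailing magic was ${JSON.stringify(magic)})`);
  }
  return view.getUint32(0, true); // true = little-endian
}

// Hypothetical helper: HTTP Range header value covering only the metadata bytes.
function metadataRangeHeader(fileSize, metadataLength) {
  const start = fileSize - FOOTER_TAIL_BYTES - metadataLength;
  const endInclusive = fileSize - FOOTER_TAIL_BYTES - 1;
  return `bytes=${start}-${endInclusive}`;
}

// Synthetic tail instead of a real file: metadata length 1234, then "PAR1".
const tail = new Uint8Array([0xd2, 0x04, 0x00, 0x00, 0x50, 0x41, 0x52, 0x31]);
const metadataLength = readMetadataLength(tail);
console.log(metadataLength); // 1234
console.log(metadataRangeHeader(10000, metadataLength)); // bytes=8758-9991
```

In a browser, the 8-byte tail could be requested with a suffix range such as `fetch(url, { headers: { Range: "bytes=-8" } })`, assuming the server supports HTTP range requests; the total file size is then available from the response's `Content-Range` header.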

### Sync API

Refer to these functions:

- [`readParquet`](https://kylebarron.dev/parquet-wasm/functions/esm_parquet_wasm.readParquet.html): Read a Parquet file synchronously.
- [`readSchema`](https://kylebarron.dev/parquet-wasm/functions/esm_parquet_wasm.readSchema.html): Read an Arrow schema from a Parquet file synchronously.
- [`writeParquet`](https://kylebarron.dev/parquet-wasm/functions/esm_parquet_wasm.writeParquet.html): Write a Parquet file synchronously.

### Async API

- [`readParquetStream`](https://kylebarron.dev/parquet-wasm/functions/esm_parquet_wasm.readParquetStream.html): Create a [ReadableStream](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream) that emits Arrow RecordBatches from a Parquet file.
- [`ParquetFile`](https://kylebarron.dev/parquet-wasm/classes/esm_parquet_wasm.ParquetFile.html): A class for reading portions of a remote Parquet file. Use [`fromUrl`](https://kylebarron.dev/parquet-wasm/classes/esm_parquet_wasm.ParquetFile.html#fromUrl) to construct from a remote URL or [`fromFile`](https://kylebarron.dev/parquet-wasm/classes/esm_parquet_wasm.ParquetFile.html#fromFile) to construct from a [`File`](https://developer.mozilla.org/en-US/docs/Web/API/File) handle. Note that when you're done using this class, you'll need to call [`free`](https://kylebarron.dev/parquet-wasm/classes/esm_parquet_wasm.ParquetFile.html#free) to release any memory held by the ParquetFile instance itself.
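
Because `free` should be called even when a read fails, one option is to wrap the instance in `try`/`finally`. The helper below is a hypothetical convenience, not part of parquet-wasm: it calls `free()` on the object it is given once the callback finishes, including after an asynchronous callback settles. It is demonstrated with a stand-in object so that it runs without the library.

```js
// Hypothetical helper (not part of parquet-wasm): run `fn` with `resource`, then call
// `resource.free()` exactly once, whether `fn` returns, throws, or returns a Promise.
function withFree(resource, fn) {
  let result;
  try {
    result = fn(resource);
  } catch (err) {
    resource.free();
    throw err;
  }
  if (result !== null && typeof result === "object" && typeof result.then === "function") {
    // Async callback: free only after the Promise settles.
    return Promise.resolve(result).finally(() => resource.free());
  }
  resource.free();
  return result;
}

// Stand-in object with a `free` method, used here instead of a real ParquetFile.
const demoResource = {
  freed: false,
  free() {
    this.freed = true;
  },
};
console.log(withFree(demoResource, () => "done"), demoResource.freed); // done true

// With parquet-wasm (assumed usage; check the ParquetFile docs for the exact methods):
// const file = await ParquetFile.fromUrl(url);
// const table = await withFree(file, (f) => f.read());
```

Only the object passed in is freed; anything returned from the callback that also lives in Wasm memory is a separate object with its own lifetime.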


Both sync and async functions return or accept a [`Table`](https://kylebarron.dev/parquet-wasm/classes/bundler_parquet_wasm.Table.html) class, an Arrow table in WebAssembly memory. Refer to its documentation for moving data into/out of WebAssembly.

## Entry Points


| Entry point | Description | Documentation |
| ------------------------------------------------------------------------- | ------------------------------------------------------- | -------------------- |
| `parquet-wasm`, `parquet-wasm/esm`, or `parquet-wasm/esm/parquet_wasm.js` | ESM, to be used directly from the Web as an ES Module | [Link][esm-docs] |
| `parquet-wasm/bundler` | "Bundler" build, to be used in bundlers such as Webpack | [Link][bundler-docs] |
| `parquet-wasm/node`                                                       | Node build, to be used with synchronous `require` in Node.js | [Link][node-docs]    |

[bundler-docs]: https://kylebarron.dev/parquet-wasm/modules/bundler_parquet_wasm.html
[node-docs]: https://kylebarron.dev/parquet-wasm/modules/node_parquet_wasm.html
[esm-docs]: https://kylebarron.dev/parquet-wasm/modules/esm_parquet_wasm.html
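
These entry points are selected through the `exports` field of `package.json`. As a rough illustration of how a resolver picks a file, here is a simplified sketch of conditional-exports resolution; it ignores subpath patterns, array fallbacks, and other details of Node's actual algorithm. Only the `./esm/parquet_wasm.js` entry is copied from this commit's `templates/package.json`; the `.` entry is a hypothetical, abridged stand-in, since its full contents are not shown here.

```js
// Simplified sketch of conditional-exports resolution. It handles exact subpath keys and
// nested condition objects only; Node's real algorithm also supports patterns and arrays.
function resolveTarget(target, conditions) {
  if (typeof target === "string") return target;
  // Condition keys are checked in the order they appear in package.json.
  for (const [condition, value] of Object.entries(target)) {
    if (condition === "default" || conditions.has(condition)) {
      const resolved = resolveTarget(value, conditions);
      if (resolved !== null) return resolved;
    }
  }
  return null;
}

function resolveExport(exportsMap, subpath, conditions) {
  if (!(subpath in exportsMap)) throw new Error(`${subpath} is not exported`);
  const resolved = resolveTarget(exportsMap[subpath], conditions);
  if (resolved === null) throw new Error(`No matching condition for ${subpath}`);
  return resolved;
}

const exportsMap = {
  // Copied from templates/package.json in this commit.
  "./esm/parquet_wasm.js": {
    types: "./esm/parquet_wasm.d.ts",
    default: "./esm/parquet_wasm.js",
  },
  // Hypothetical, abridged "." entry for illustration only.
  ".": {
    node: { types: "./node/parquet_wasm.d.ts", default: "./node/parquet_wasm.js" },
    default: { types: "./esm/parquet_wasm.d.ts", default: "./esm/parquet_wasm.js" },
  },
};

console.log(resolveExport(exportsMap, ".", new Set(["node", "require"]))); // ./node/parquet_wasm.js
console.log(resolveExport(exportsMap, ".", new Set(["import"]))); // ./esm/parquet_wasm.js
console.log(resolveExport(exportsMap, "./esm/parquet_wasm.js", new Set(["types", "import"]))); // ./esm/parquet_wasm.d.ts
```

Because condition keys are tried in order, `types` is listed first in each entry so that TypeScript, which matches the `types` condition, picks up the declaration file before reaching `default`.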

### ESM

The `esm` entry point is the primary entry point. It is the default export from `parquet-wasm`, and is also accessible at `parquet-wasm/esm` and `parquet-wasm/esm/parquet_wasm.js` (for symmetric imports [directly from a browser](#using-directly-from-a-browser)).

**Note that when using the `esm` bundles, you must manually initialize the WebAssembly module before using any APIs**. Otherwise, you'll get an error `TypeError: Cannot read properties of undefined`. There are multiple ways to initialize the WebAssembly code:

#### Asynchronous initialization

The primary way to initialize is by awaiting the default export.

```js
import wasmInit, {readParquet} from "parquet-wasm";

await wasmInit();
```

Without any parameter, this will try to fetch a file named `parquet_wasm_bg.wasm` from the same directory as the `parquet-wasm` JavaScript module; the default location is resolved with `new URL('parquet_wasm_bg.wasm', import.meta.url)`.

You can also pass in a custom URL if you want to host the `.wasm` file elsewhere, such as on your own server or a CDN:

```js
import wasmInit, {readParquet} from "parquet-wasm";

// Update this version to match the version you're using.
const wasmUrl = "https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/parquet_wasm_bg.wasm";
await wasmInit(wasmUrl);
```
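
The default lookup resolves the Wasm file name relative to the URL of the JavaScript module itself, so the `.wasm` file must be served next to wherever that module ends up. The sketch below uses a hypothetical helper, `defaultWasmUrl`, to show that resolution with the standard `URL` constructor; the example module URLs other than the jsDelivr one are made up.

```js
// Hypothetical helper, for illustration only: resolves the Wasm file name against a module
// URL, the same way `new URL('parquet_wasm_bg.wasm', import.meta.url)` does inside the module.
function defaultWasmUrl(moduleUrl) {
  return new URL("parquet_wasm_bg.wasm", moduleUrl).href;
}

// Loaded from a CDN: the .wasm file is fetched from the same directory on the CDN.
console.log(defaultWasmUrl("https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/parquet_wasm.js"));
// https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/parquet_wasm_bg.wasm

// Copied into a build output directory under a different name: the .wasm file
// must still be served from that same directory.
console.log(defaultWasmUrl("https://example.com/assets/index-1a2b3c.js"));
// https://example.com/assets/parquet_wasm_bg.wasm
```

If the JavaScript and the `.wasm` file cannot sit side by side, passing an explicit URL to the default export avoids relying on this lookup.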

#### Synchronous initialization

The `initSync` named export initializes the module synchronously from a buffer containing the contents of the `.wasm` file:

```js
import {initSync, readParquet} from "parquet-wasm";

// The contents of esm/parquet_wasm_bg.wasm in an ArrayBuffer
const wasmBuffer = new ArrayBuffer(...);

// Initialize the Wasm synchronously
initSync(wasmBuffer);
```

Async initialization should be preferred over downloading the Wasm buffer and then initializing it synchronously, as [`WebAssembly.instantiateStreaming`](https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/instantiateStreaming_static) is the most efficient way to both download and initialize Wasm code.
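
For a rough picture of the difference, the sketch below uses the plain `WebAssembly` JavaScript API on a tiny hand-assembled module exporting `add(i32, i32)`, standing in for `parquet_wasm_bg.wasm`. It is not parquet-wasm's generated glue code; it only contrasts compiling on the calling thread with `new WebAssembly.Module` (the kind of step a synchronous initializer performs) against the Promise-based `WebAssembly.instantiate`, the non-streaming sibling of `instantiateStreaming`.

```js
// Minimal hand-assembled Wasm module: (func (export "add") (param i32 i32) (result i32)).
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add
]);

// Synchronous path: compile and instantiate on the calling thread, blocking until done.
function instantiateSync(bytes) {
  const module = new WebAssembly.Module(bytes);
  return new WebAssembly.Instance(module, {});
}

// Asynchronous path: compilation happens off the critical path and resolves later.
function instantiateAsync(bytes) {
  return WebAssembly.instantiate(bytes, {}).then(({ instance }) => instance);
}

const syncInstance = instantiateSync(ADD_WASM);
console.log(syncInstance.exports.add(2, 3)); // 5

instantiateAsync(ADD_WASM).then((instance) => {
  console.log(instance.exports.add(2, 3)); // 5
});
```

Some browsers also refuse synchronous compilation of large modules on the main thread, which is a further reason to prefer the asynchronous path in web pages.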

### Bundler

The `bundler` entry point doesn't require manual initialization of the WebAssembly blob, but needs setup with whatever bundler you're using. [Refer to the Rust Wasm documentation for more info](https://rustwasm.github.io/docs/wasm-bindgen/reference/deployment.html#bundlers).

### Node

The `node` entry point can be loaded synchronously from Node.

```js
const {readParquet} = require("parquet-wasm");

const wasmTable = readParquet(...);
```

### Using directly from a browser

You can load the `esm/parquet_wasm.js` file directly from a CDN:

```js
const parquet = await import(
  "https://cdn.jsdelivr.net/npm/parquet-wasm@0.6.0/esm/parquet_wasm.js"
);
await parquet.default();

const wasmTable = parquet.readParquet(...);
```

### Debug functions

5 changes: 5 additions & 0 deletions templates/package.json
@@ -20,6 +20,7 @@
"webassembly",
"arrow"
],
"$comment": "We export ./esm/parquet_wasm.js so that code can work the same bundled and directly on the frontend",
"exports": {
"./bundler": {
"types": "./bundler/parquet_wasm.d.ts",
@@ -33,6 +34,10 @@
"types": "./node/parquet_wasm.d.ts",
"default": "./node/parquet_wasm.js"
},
"./esm/parquet_wasm.js": {
"types": "./esm/parquet_wasm.d.ts",
"default": "./esm/parquet_wasm.js"
},
".": {
"node": {
"types": "./node/parquet_wasm.d.ts",