Write files as Parquet (ideally to S3) #9265

Open
jvilhuber opened this issue Aug 22, 2024 · 1 comment

@jvilhuber

Is your feature request related to a problem? Please describe.

Parquet is a popular format for data analysis, and I need to convert some JSON-formatted logs to Parquet and write them to S3 for consumption by other services and projects. Fluentd seems to support this via the store_as option of its S3 plugin: https://github.com/fluent/fluent-plugin-s3/blob/master/docs/output.md#store_as

It would be very handy if Fluent Bit could do this as well.

Describe the solution you'd like

I would like to be able to specify Parquet as the output file format, optionally with Parquet parameters such as compression. It's fine to assume that field names and field types match the incoming JSON.
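To make the "field names and types match the incoming JSON" assumption concrete, here is a minimal sketch of how a column schema could be derived from newline-delimited JSON logs. It is only an illustration, not how Fluent Bit or columnify actually does it; the `infer_schema` function, the type names, and the widening rules are all assumptions made for this example.

```python
import json

# Hypothetical mapping from JSON scalar types to Parquet-style type names.
_TYPE_NAMES = {bool: "boolean", int: "int64", float: "double", str: "string"}


def infer_schema(lines):
    """Return {field_name: type_name} inferred from newline-delimited JSON records."""
    schema = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        for name, value in record.items():
            if value is None:
                # Remember the field, but let a later non-null value decide its type.
                schema.setdefault(name, "null")
                continue
            # Nested objects and arrays fall back to "string" in this sketch.
            type_name = _TYPE_NAMES.get(type(value), "string")
            previous = schema.get(name)
            if previous in (None, "null"):
                schema[name] = type_name
            elif previous != type_name:
                # Assumed widening rule: int64 + double -> double, anything else -> string.
                if {previous, type_name} == {"int64", "double"}:
                    schema[name] = "double"
                else:
                    schema[name] = "string"
    return schema
```

A real implementation would also have to decide how to handle nested objects, timestamps, and fields that are missing from some records.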

Describe alternatives you've considered

Alternatives are:

  1. Use an AWS Lambda function to watch for new S3 files and convert them from JSON to Parquet.
  2. Use the File, NATS, or TCP output plugins to pass the data to some other (local) service that converts the files and pushes them to S3.

Both of these incur the extra complexity of additional components that I would have to write, monitor, and keep up to date.
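For a sense of what the first alternative involves, here is a rough sketch of such a Lambda handler. It assumes the uploaded objects are newline-delimited JSON with keys ending in `.json`, that `pyarrow` and `boto3` are available in the Lambda environment, and that writing the Parquet file next to the source object is acceptable. The function names are hypothetical; this is not a tested deployment.

```python
import os
import tempfile
from urllib.parse import unquote_plus


def parquet_key_for(json_key):
    """Map an object key such as logs/a.json to logs/a.parquet."""
    base, _ext = os.path.splitext(json_key)
    return base + ".parquet"


def objects_in_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Object keys in S3 event notifications are URL-encoded.
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def convert_ndjson_to_parquet(src_path, dst_path, compression="snappy"):
    """Read newline-delimited JSON and write it as a Parquet file."""
    # Third-party imports are kept local so the helpers above work without pyarrow.
    import pyarrow.json as pa_json
    import pyarrow.parquet as pq

    table = pa_json.read_json(src_path)  # column names and types are inferred
    pq.write_table(table, dst_path, compression=compression)


def handler(event, context):
    """Lambda entry point: convert each newly created .json object to Parquet."""
    import boto3

    s3 = boto3.client("s3")
    for bucket, key in objects_in_event(event):
        if not key.endswith(".json"):
            continue  # also keeps the .parquet output from re-triggering a conversion
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "in.json")
            dst = os.path.join(tmp, "out.parquet")
            s3.download_file(bucket, key, src)
            convert_ndjson_to_parquet(src, dst)
            s3.upload_file(dst, bucket, parquet_key_for(key))
```

Even this small version needs packaging, IAM permissions, an S3 trigger, and monitoring, which is the overhead the request is trying to avoid.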

Additional context

@cosmo0920
Contributor

We already have a PR that writes Parquet using columnify, similar to what Fluentd's plugin does: #8837
We're just waiting for a response from the AWS folks.
