Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lambda][rule] some s3 optimizations #126

Merged
merged 2 commits into from
Apr 28, 2017

Conversation

ryandeivert
Copy link
Contributor

to @airbnb/streamalert-maintainers
size: small

changes

  • Will now yield and parse lines from an s3 file as they are read instead of reading once and then iterating over the lines a second time to parse.

Copy link
Contributor

@austinbyers austinbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest benefit will actually be improving memory efficiency. Looks good, but you still have to sadly truncate the files.

yield line.rstrip()

# remove the file
os.remove(downloaded_s3_object)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work as expected in Lambda - you have to truncate the file first to avoid using all of the disk space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that's strange and doesn't make much sense, but okay. I will add the truncate back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Jack and I both independently observed the phenomenon. os.remove is really os.unlink, and presumably Lambda's version of unlinking is not the same as a traditional OS

Args:
downloaded_s3_object (string): A full path to the downloaded file.

Returns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yields: (not returns)

@ryandeivert ryandeivert force-pushed the ryandeivert-s3-optimizations branch 2 times, most recently from 140b654 to d3611db Compare April 28, 2017 21:09
Copy link
Contributor

@austinbyers austinbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jacknagz
Copy link
Contributor

lgtm-wally

@ryandeivert ryandeivert force-pushed the ryandeivert-s3-optimizations branch from d3611db to ce14db3 Compare April 28, 2017 22:13
@ryandeivert ryandeivert merged commit 51b4cc5 into master Apr 28, 2017
@ryandeivert ryandeivert deleted the ryandeivert-s3-optimizations branch April 28, 2017 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants