
Compress the training data with Google Snappy for higher IO throughput #1533

Closed

futurely opened this issue Dec 5, 2014 · 1 comment

@futurely commented Dec 5, 2014

When computation is faster than data IO, device utilization drops. To narrow the throughput gap, data is usually compressed in many distributed computation and storage frameworks such as Hadoop, HBase, Hive, and Kafka. Among the many compression libraries available for this purpose, Google Snappy is very widely used. It supports many languages and strikes a good balance between compression ratio and speed.
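A minimal sketch of the compress-before-write / decompress-after-read pattern the proposal describes. Snappy itself would need the `python-snappy` bindings; here the standard library's `zlib` at a low compression level stands in as an assumption, since the IO pattern is identical regardless of codec:

```python
import zlib

# Simulated training record batch (in practice, serialized images or protos).
batch = b"pixel-data " * 10_000

# Compress before writing: fewer bytes hit the disk, narrowing the gap
# between compute speed and IO throughput. level=1 approximates the
# fast, moderate-ratio trade-off that Snappy targets.
compressed = zlib.compress(batch, level=1)

with open("batch.z", "wb") as f:
    f.write(compressed)

# The reader performs a smaller (hence faster) read, then decompresses.
with open("batch.z", "rb") as f:
    restored = zlib.decompress(f.read())

assert restored == batch
print(f"raw={len(batch)} bytes, on disk={len(compressed)} bytes")
```

Decompression for codecs in this class is typically much faster than disk reads, which is why the round trip can still be a net win for IO-bound training.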

@shelhamer (Member) commented
Closing, as this is not a grievous bottleneck in many cases, and because it can be handled by specialized layers relevant to the given use case that don't necessarily have general relevance.
