Skip to content

Optimized generation of public and presigned AWS S3 GET URLs in ruby faster

License

Notifications You must be signed in to change notification settings

prullmann/faster_s3_url

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

47 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FasterS3Url

Generate public and presigned AWS S3 GET URLs faster in ruby

Gem Version CI Status

The official ruby AWS SDK is actually quite slow and unoptimized when generating URLs to access S3 objects. If you are only creating a couple S3 URLs at a time this may not matter. But it can matter on the order of even two or three hundred at a time, especially when creating presigned URLs, for which the AWS SDK is especially un-optimized.

This gem provides a much faster implementation, by around an order of magnitude, for both public and presigned S3 GET URLs. Additional S3 params such as response-content-disposition are supported for presigned URLs.

Usage

signer = FasterS3Url::Builder.new(
  bucket_name: "my-bucket",
  region: "us-east-1",
  access_key_id: ENV['AWS_ACCESS_KEY'],
  secret_access_key: ENV['AWS_SECRET_KEY']
)

signer.public_url("my/object/key.jpg")
  #=> "https://my-bucket.s3.amazonaws.com/my/object/key.jpg"
  # does handle URI-escaping keys properly
/
signer.presigned_url("my/object/key.jpg")
  # => https://my-bucket.s3.amazonaws.com/my/object/key.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=fakeExampleAccessKeyId%2F20201001%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20201001T183223Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=b5d2ec17cd7c9d4cc45f446f87b17522f18d3776f06d051bef73fb8329a2e605
  # Also can handle additional query params such as response_content_disposition

You can re-use a signer object for convenience or slighlty improved performance. It should be concurrency-safe to share globally between threads.

If you are using S3 keys that need to be escaped in the URLs, this gem will escpae them properly.

When presigning URLs, you can pass the query parameters supported by S3 to control subsequent response headers. You can also supply a version_id for a URL to access a specific version.

signer.presigned_url("my/object/key.jpg",
  response_cache_control: "public, max-age=604800, immutable",
  response_content_disposition: "attachment",
  response_content_language: "de-DE, en-CA",
  response_content_type: "text/html; charset=UTF-8",
  response_content_encoding: "deflate, gzip",
  response_expires: "Wed, 21 Oct 2030 07:28:00 GMT",
  version_id: "BspIL8pXg_52rGXELmqZ7cgmn7u4XJgS"
)

Use a CNAME or CDN or any other hostname variant other than the default this gem will come up with? Just pass in a host argument to initializer. Will work with both public and presigned GET URLs. You can also prefix the hostname with http://, eg for development setups.

builder = FasterS3Url::Builder.new(
  bucket_name: "my-bucket.example.com",
  host: "my-bucket.example.com",
  region: "us-east-1",
  access_key_id: ENV['AWS_ACCESS_KEY'],
  secret_access_key: ENV['AWS_SECRET_KEY']
)

Fancy using some other S3 compatible servers like Minio? The usual endpoint and force_path_style options are supported here, too. NOTE: While host will override endpoint for GET URL generation, endpoint can still be used by upload logic that uses the same s3 config (eg Shrine, see below).

builder = FasterS3Url::Builder.new(
  bucket_name: "my-bucket",
  region: "my-place-1",
  endpoint: "https://my-store.example.com",
  force_path_style: true,
  access_key_id: ENV['AWS_ACCESS_KEY'],
  secret_access_key: ENV['AWS_SECRET_KEY']
)

Cache signing keys for further performance

Under most usage patterns, the presigend URLs you generate will all use a time with the same UTC date. In this case, a performance advantage can be had by asking the Builder to cache and re-use AWS signing keys, which only vary with calendar date of time arg, not time, or S3 key, or other args. It will actually cache the 5 most recently used signing keys. This can result in around a 50% performance improvement with a re-used Builder used for generating presigned keys.

NOTE WELL: This will technically make the Builder object no longer concurrency-safe under multiple threads. Although you might get away with it under MRI. This is one reason it is not on by default.

builder = FasterS3Url::Builder.new(
  bucket_name: "my-bucket.example.com",
  region: "us-east-1",
  access_key_id: ENV['AWS_ACCESS_KEY'],
  secret_access_key: ENV['AWS_SECRET_KEY'],
  cache_signing_keys: true
)
builder.presign_url(key) # performance enhanced

Automatic AWS credentials lookup?

Right now, you need to explicitly supply access_key_id and secret_access_key, in part to avoid a dependency on the AWS SDK (This gem doesn't have such a dependency!). Let us know if this makes you feel a certain kind of way.

If you want to look up key/secret/region using the standard SDK methods of checking various places, in order to supply them to the FasterS3Url::Builder, you can try this (is there a better way? Cause this is kind of a mess!)

require 'aws-sdk-s3'
client = Aws::S3::Client.new
credentials = client.config.credentials
credentials = credentials.credentials if credentials.respond_to?(:credentials)

access_key_id     = credentials.access_key_id
secret_access_key = credentials.secret_access_key
region            = client.config.region

Shrine Storage

Use shrine? We do and love it. This gem provides a storage that can be a drop-in replacement to Shrine::Storage::S3 (shrine 3.x required), but with faster URL generation.

# Where you might have done:

require "shrine/storage/s3"

s3 = Shrine::Storage::S3.new(
  bucket: "my-app", # required
  region: "eu-west-1", # required
  access_key_id: "abc",
  secret_access_key: "xyz",
)

# instead do:

require "faster_s3_url/shrine/storage"

s3 = FasterS3Url::Shrine::Storage.new(
  bucket: "my-app", # required
  region: "eu-west-1", # required
  access_key_id: "abc", # required
  secret_access_key: "xyz", # required
)

A couple minor differences, let me know if they disrupt you:

  • We don't support the signer initializer argument, not clear to me why you'd want to use this gem if you are using it.
  • We support a host arg in initializer, but not in #url method.

Performance Benchmarking

Benchmarks were done using scripts checked into repo at ./perf (which use benchmark-ips with mode :stats => :bootstrap, :confidence => 95), on my 2015 Macbook Pro, using ruby MRI 2.6.6., on faster_s3_url version 0.1.0. Benchmarking is never an exact science, hopefully this is reasonable.

In my narrative, I normalize to how many iterations can happen in 10ms to have numbers closer to what might be typical use cases.

Public URLs

aws-sdk-s3 can create about 180 public URLs in 10ms, not horrible, but for how simple it seems the operation should be? FasterS3Url can do 2,200 public URLs in 10ms, that's a lot better.

$ bundle exec ruby perf/public_bench.rb
Warming up --------------------------------------
          aws-sdk-s3     1.265k i/100ms
         FasterS3Url    24.414k i/100ms
Calculating -------------------------------------
          aws-sdk-s3     18.701k (± 3.2%) i/s -     92.345k in   5.048062s
         FasterS3Url    222.938k (± 3.2%) i/s -      1.123M in   5.106971s
                   with 95.0% confidence

Presigned URLs

Here's where it really starts to matter.

aws-sdk-s3 can only generate about 10 presigned URLs in 10ms, painful. FasterS3URL, with a re-used Builder object, can generate about 220 presigned URLs in 10ms, much better, and actually faster than aws-sdk-s3 can generate public urls! Even if we re-instantiate a Builder each time, we can generate 180 presigned URLs in 10ms, don't lose too much performance that way.

If we re-use the Builder and turn on the (not thread-safe) cached_signing_keys option, we can get up to 300 presigned URLs generated in 10ms.

FasterS3URL supports supplying custom query params to instruct s3 HTTP response headers. This does slow things down since they need to be URI-escaped and constructed. Using this feature with aws-sdk-s3, it doesn't lose much speed, down to 9 instead of 10 URLs in 10ms. FasterS3URL goes down from 210 to 180 URLs generated in 10ms (without using cached_signing_keys option).

We can compare to the ultra-fast wt_s3_signer gem, which, with a re-used signer object (that assumes the same time for all URLs, unlike us; and does not support per-url custom query params) can get all the way up to 680 URLs generated in 10ms, over twice as fast as we can do even with cached_signing_keys. If the restrictions and API of wt_s3_signer are amenable to your use case, it's definitely the fastest. But FasterS3URL is in the ballpark, and still more than an order of magnitude faster than aws-sdk-s3.

$ bundle exec ruby perf/presigned_bench.rb
Warming up --------------------------------------
          aws-sdk-s3   113.000  i/100ms
aws-sdk-s3 with custom headers
                        95.000  i/100ms
 re-used FasterS3Url     1.820k i/100ms
re-used FasterS3Url with cached signing keys
                         2.920k i/100ms
re-used FasterS3URL with custom headers
                         1.494k i/100ms
new FasterS3URL Builder each time
                         1.977k i/100ms
re-used WT::S3Signer     7.985k i/100ms
new WT::S3Signer each time
                         1.611k i/100ms
Calculating -------------------------------------
          aws-sdk-s3      1.084k (± 4.2%) i/s -      5.311k in   5.003981s
aws-sdk-s3 with custom headers
                        918.315  (± 4.8%) i/s -      4.560k in   5.118770s
 re-used FasterS3Url     21.906k (± 4.0%) i/s -    107.380k in   5.046561s
re-used FasterS3Url with cached signing keys
                         29.756k (± 3.6%) i/s -    146.000k in   4.999910s
re-used FasterS3URL with custom headers
                         18.062k (± 4.3%) i/s -     85.158k in   5.025685s
new FasterS3URL Builder each time
                         18.312k (± 3.9%) i/s -     90.942k in   5.098636s
re-used WT::S3Signer     68.275k (± 3.5%) i/s -    343.355k in   5.109088s
new WT::S3Signer each time
                         22.425k (± 2.8%) i/s -    111.159k in   5.036814s
                   with 95.0% confidence

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install local unreleased source of this gem onto your local machine for development, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Sources/Acknowledgements

wt_s3_signer served as a proof of concept, but was optimized for their particular use case, assuming batch processing where all signed S3 URLs share the same "now" time. I needed to support cases that didn't assume this, and also support custom headers like response_content_disposition. This code is also released with a different license. But if the API and use cases of wt_s3_signer meet your needs, it is even faster than this code.

I tried to figure out how to do the S3 presigned request from some AWS docs:

But these don't really have all info you need for generating an S3 signature. So also ended up debugger reverse engineering the ruby aws-sdk code when generating an S3 presigned url, for instance:

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/jrochkind/faster_s3_url.

Is there a feature missing that you need? I may not be able to provide it, but I would love to hear from you!

License

The gem is available as open source under the terms of the MIT License.

About

Optimized generation of public and presigned AWS S3 GET URLs in ruby faster

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Ruby 99.7%
  • Shell 0.3%