Support binary type natively #822

Merged 3 commits on Nov 22, 2024
2 changes: 1 addition & 1 deletion .rubocop_thread_safety.yml
@@ -2,5 +2,5 @@
 # TODO: Comment out the following to see code needing to be refactored for thread safety!
 ThreadSafety/ClassAndModuleAttributes:
   Enabled: false
-ThreadSafety/InstanceVariableInClassMethod:
+ThreadSafety/ClassInstanceVariable:
   Enabled: false
21 changes: 21 additions & 0 deletions README.md
@@ -342,6 +342,24 @@ class Document
 end
 ```

+#### Note on binary type
+
+By default, binary fields are persisted as DynamoDB String values
+encoded in Base64. DynamoDB also supports binary data natively. To
+store a field that way instead, set the `store_as_native_binary`
+field option:
+
+```ruby
+class Document
+  include Dynamoid::Document
+
+  field :image, :binary, store_as_native_binary: true
+end
+```
+
+There is also a global config option, `store_binary_as_native`, that
+is used when the field option is not set. It is `false` by default.
+
 #### Magic Columns

 You get magic columns of `id` (`string`), `created_at` (`datetime`), and
@@ -1138,6 +1156,9 @@ Listed below are all configuration options.
 * `store_boolean_as_native` - if `true` Dynamoid stores boolean fields
   as native DynamoDB boolean values. Otherwise boolean fields are stored
   as string values `'t'` and `'f'`. Default is `true`
+* `store_binary_as_native` - if `true` Dynamoid stores binary fields
+  as native DynamoDB binary values. Otherwise binary fields are stored
+  as Base64 encoded string values. Default is `false`
 * `backoff` - is a hash: key is a backoff strategy (symbol), value is
   parameters for the strategy. Is used in batch operations. Default is
   `nil`
1 change: 1 addition & 0 deletions lib/dynamoid/config.rb
@@ -50,6 +50,7 @@ module Config
     option :store_date_as_string, default: false # store Date fields in ISO 8601 string format
     option :store_empty_string_as_nil, default: true # store attribute's empty String value as null
     option :store_boolean_as_native, default: true
+    option :store_binary_as_native, default: false
     option :backoff, default: nil # callable object to handle exceeding of table throughput limit
     option :backoff_strategies, default: {
       constant: BackoffStrategies::ConstantBackoff,
18 changes: 16 additions & 2 deletions lib/dynamoid/dumping.rb
@@ -297,10 +297,24 @@ def process(value)
       end
     end

-    # string -> string
+    # string -> StringIO
     class BinaryDumper < Base
       def process(value)
-        Base64.strict_encode64(value)
+        store_as_binary = if @options[:store_as_native_binary].nil?
+                            Dynamoid.config.store_binary_as_native
+                          else
+                            @options[:store_as_native_binary]
+                          end
+
+        if store_as_binary
+          if value.is_a?(StringIO) || value.is_a?(IO)
+            value
+          else
+            StringIO.new(value)
+          end
+        else
+          Base64.strict_encode64(value)
+        end
andrykonchin (Member), Nov 22, 2024:

I suppose that new values of a binary field will now be stored in DynamoDB as binary, while all existing values are stored as strings. That might be surprising for end users.

The common approach is to migrate users step by step from the current behaviour to the new one. We can introduce a new field option to either keep the old behaviour (by default, so there is no breaking change in a minor release) or switch to the new one (for new fields, new tables, or new projects).

An example is the boolean type, which DynamoDB did not support initially, so Dynamoid's boolean fields were stored as `f` and `t`. Later DynamoDB added support for the boolean type and a new option, `store_as_native_boolean`, was added:

`field :active, :boolean, store_as_native_boolean: false`

In the next major release such migration options can be removed, and we can point users to custom types if they want to keep the old storage formats.

So I propose adding a new field option, `store_as_native_binary`, that is `false` by default, and checking it at casting, dumping and undumping.
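A minimal sketch of the proposal (a per-field flag with a global fallback), using only the standard library. `GLOBAL_CONFIG`, `native_binary?` and `dump_binary` are hypothetical names for illustration and are not part of Dynamoid. The resolution rule, where an explicit `false` wins and only `nil` falls back to the global setting, mirrors the `nil?` check in `BinaryDumper`.

```ruby
require 'base64'
require 'stringio'

# Hypothetical stand-in for a global configuration object.
GLOBAL_CONFIG = { store_binary_as_native: false }.freeze

# The field-level option wins whenever it is set explicitly (even to false);
# only nil falls back to the global default.
def native_binary?(field_options, config = GLOBAL_CONFIG)
  flag = field_options[:store_as_native_binary]
  flag.nil? ? config[:store_binary_as_native] : flag
end

# Dump a binary String either as a StringIO (the representation this PR uses
# for native binary) or as Base64 text (the legacy format).
def dump_binary(value, field_options = {})
  if native_binary?(field_options)
    value.is_a?(StringIO) ? value : StringIO.new(value)
  else
    Base64.strict_encode64(value)
  end
end

bytes = "\x00\x88\xFF".b
legacy = dump_binary(bytes)                                    # Base64 String
native = dump_binary(bytes, { store_as_native_binary: true })  # StringIO
```

Treating `nil` as "not set" is what lets a single field opt out with `store_as_native_binary: false` even after the global default has been switched on.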

Contributor Author:

I agree. It could cause issues for integrators, even though they might not have noticed that binary values aren't stored with the binary type in DynamoDB.

I've added the `store_as_native_binary` option, which should be a safe workaround. There is still a risk that the change goes unnoticed in the next major release, since there will be no breaking code change.

This concerns me, along with the lack of alignment with DynamoDB's single-table design, which is why I've started exploring `Aws::DynamoDB::Client` directly for my project.

Let me know if you want any other changes to this PR; otherwise I'll leave it to you to decide how to get it out.

Member:

The only way I see to prevent breaking changes from going unnoticed is to add deprecation warnings in the last minor release before the major one.
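One way the deprecation-warning idea could look, as a sketch only. `resolve_native_binary` and the message text are hypothetical, and nothing in this PR implements them. The helper warns only when a field relies on the implicit default.

```ruby
require 'stringio'

# Hypothetical helper: returns the effective flag and writes a warning when
# the field relies on the implicit default that a major release might change.
def resolve_native_binary(field_name, field_options, io = $stderr)
  flag = field_options[:store_as_native_binary]
  return flag unless flag.nil?

  io.puts "DEPRECATION: field :#{field_name} does not set store_as_native_binary; " \
          'the default storage format may change in the next major release'
  false # current default: Base64-encoded string
end
```

Fields that set the option explicitly, to either value, stay silent, so the warning reaches only code that would be affected by a changed default.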

Contributor Author:

We could also require that the `store_as_native_binary` option be set, and raise an error otherwise.
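The stricter alternative could be sketched as follows. `MissingBinaryStorageOption` and `require_native_binary_option!` are hypothetical names, not Dynamoid API, and the sketch assumes the check would run when the field is declared.

```ruby
# Hypothetical error raised when a binary field does not choose a format.
class MissingBinaryStorageOption < ArgumentError; end

# Returns the explicit flag, or raises if the field never set one.
def require_native_binary_option!(field_name, field_options)
  return field_options[:store_as_native_binary] if field_options.key?(:store_as_native_binary)

  raise MissingBinaryStorageOption,
        "field :#{field_name} must set store_as_native_binary to true or false"
end
```

Unlike a warning, this breaks existing models loudly on upgrade, which fits a major release better than a minor one.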

andrykonchin (Member), Nov 22, 2024:

Regarding single-table design: it seems to me it's out of scope for Dynamoid, or at least out of its initial scope, which is to implement the Active Record pattern.

I am not opposed to adding functionality related to this "pattern" to the gem. But I haven't found any common, widely accepted "interface" or API, so I postponed this.

There is issue #568, but all I heard there were either generic thoughts or requests for somebody's project-specific features, so there was nothing I could work with.

       end
     end

4 changes: 3 additions & 1 deletion lib/dynamoid/type_casting.rb
@@ -288,7 +288,9 @@ def process(value)

     class BinaryTypeCaster < Base
       def process(value)
-        if value.is_a? String
+        if value.is_a?(StringIO) || value.is_a?(IO)
+          value
+        elsif value.is_a?(String)
           value.dup
         else
           value.to_s
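The casting rule in this hunk can be tried in isolation. The sketch below restates it as a free-standing method; `cast_binary` is a hypothetical name, and the real logic lives in `BinaryTypeCaster#process`.

```ruby
require 'stringio'

# Hypothetical stand-alone version of the casting rule from the hunk.
def cast_binary(value)
  if value.is_a?(StringIO) || value.is_a?(IO)
    value      # streams pass through as the same object
  elsif value.is_a?(String)
    value.dup  # copy, so the attribute does not share the caller's string
  else
    value.to_s # anything else is stringified
  end
end
```

Streams are returned by identity rather than copied, which is what the new `equal(value)` expectations in `type_casting_spec.rb` check.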
12 changes: 11 additions & 1 deletion lib/dynamoid/undumping.rb
@@ -284,7 +284,17 @@ def process(value)

     class BinaryUndumper < Base
       def process(value)
-        Base64.strict_decode64(value)
+        store_as_binary = if @options[:store_as_native_binary].nil?
+                            Dynamoid.config.store_binary_as_native
+                          else
+                            @options[:store_as_native_binary]
+                          end
+
+        if store_as_binary
+          value.string # expect StringIO here
+        else
+          Base64.strict_decode64(value)
+        end
       end
     end

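The undumping branch assumes the stored format matches the flag. The sketch below, with the hypothetical name `undump_binary`, reproduces that branch using only the standard library. It also shows the mismatch raised in the review thread: a legacy Base64 `String` read with the native flag on has no `#string` method.

```ruby
require 'base64'
require 'stringio'

# Hypothetical stand-alone version of the undumping branch from the hunk.
def undump_binary(raw, native:)
  native ? raw.string : Base64.strict_decode64(raw)
end

bytes = "\x00\x88\xFF".b

# Matching formats round-trip to the original bytes.
from_native = undump_binary(StringIO.new(bytes), native: true)
from_legacy = undump_binary(Base64.strict_encode64(bytes), native: false)

# A value written in the legacy format but read with the native flag fails,
# because String does not respond to #string.
begin
  undump_binary(Base64.strict_encode64(bytes), native: true)
rescue NoMethodError => e
  mismatch_error = e.class
end
```

This is why switching the flag on a field that already has stored items needs a data migration or a format-tolerant reader, neither of which is part of this PR.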
85 changes: 76 additions & 9 deletions spec/dynamoid/dumping_spec.rb
@@ -1606,20 +1606,87 @@ def self.dynamoid_field_type
   end

   describe 'Binary field' do
-    let(:klass) do
-      new_class do
-        field :image, :binary
+    let(:unfrozen_string) { +"\x00\x88\xFF" }
+    let(:binary_value) { unfrozen_string.force_encoding('ASCII-8BIT') }
+
+    context 'default non-native binary' do
+      let(:klass) do
+        new_class do
+          field :image, :binary
+        end
       end
+
+      it 'encodes a string in base64-encoded format' do
+        obj = klass.create(image: binary_value)
+
+        expect(reload(obj).image).to eql(binary_value)
+        expect(raw_attributes(obj)[:image]).to eql(Base64.strict_encode64(binary_value))
+      end
     end

-    let(:unfrozen_string) { +"\x00\x88\xFF" }
-    let(:binary_value) { unfrozen_string.force_encoding('ASCII-8BIT') }
+    context 'native binary' do
+      let(:klass) do
+        new_class do
+          field :image, :binary, store_as_native_binary: true
+        end
+      end
+
+      it 'converts string to StringIO object' do
+        obj = klass.create(image: binary_value)
+
+        expect(reload(obj).image).to eql(binary_value)
+        expect(raw_attributes(obj)[:image].class).to eql(StringIO)
+        expect(raw_attributes(obj)[:image].string).to eql(binary_value)
+      end

-    it 'encodes a string in base64-encoded format' do
-      obj = klass.create(image: binary_value)
+      it 'accepts StringIO object' do
+        image = StringIO.new(binary_value)
+        obj = klass.create(image: image)

-      expect(reload(obj).image).to eql(binary_value)
-      expect(raw_attributes(obj)[:image]).to eql(Base64.strict_encode64(binary_value))
+        expect(reload(obj).image).to eql(binary_value)
+        expect(raw_attributes(obj)[:image].class).to eql(StringIO)
+        expect(raw_attributes(obj)[:image].string).to eql(binary_value)
+      end
+
+      it 'accepts IO object' do
+        Tempfile.create('image') do |image|
+          image.write(binary_value)
+          image.rewind
+
+          obj = klass.create(image: image)
+
+          expect(reload(obj).image).to eql(binary_value)
+          expect(raw_attributes(obj)[:image].class).to eql(StringIO)
+          expect(raw_attributes(obj)[:image].string).to eql(binary_value)
+        end
+      end
+    end
+
+    context 'store_binary_as_native config option' do
+      it 'is stored as binary if store_binary_as_native config option is true',
+         config: { store_binary_as_native: true } do
+        klass = new_class do
+          field :image, :binary
+        end
+
+        obj = klass.create(image: binary_value)
+
+        expect(reload(obj).image).to eql(binary_value)
+        expect(raw_attributes(obj)[:image].class).to eql(StringIO)
+        expect(raw_attributes(obj)[:image].string).to eql(binary_value)
+      end
+
+      it 'is not stored as binary if store_binary_as_native config option is false',
+         config: { store_binary_as_native: false } do
+        klass = new_class do
+          field :image, :binary
+        end
+
+        obj = klass.create(image: binary_value)
+
+        expect(reload(obj).image).to eql(binary_value)
+        expect(raw_attributes(obj)[:image]).to eql(Base64.strict_encode64(binary_value))
+      end
     end
   end
 end
19 changes: 14 additions & 5 deletions spec/dynamoid/type_casting_spec.rb
@@ -274,7 +274,7 @@

       obj = klass.new(values: Set.new([1, 1.5, '2'.to_d]))

-      expect(obj.values).to eql(Set.new([1, 1, 2]))
+      expect(obj.values).to eql(Set.new([1, 2]))
     end

     it 'type casts numbers' do
@@ -671,12 +671,21 @@ def settings.to_hash
       expect(obj.image).to eql('string representation')
     end

-    it 'dups a string' do
-      value = 'foo'
+    it 'does not convert StringIO objects' do
+      value = StringIO.new('foo')
       obj = klass.new(image: value)

-      expect(obj.image).to eql(value)
-      expect(obj.image).not_to equal(value)
+      expect(obj.image).to equal(value)
+    end
+
+    it 'does not convert IO objects' do
+      Tempfile.create('image') do |value|
+        value.write('foo')
+        value.rewind
+
+        obj = klass.new(image: value)
+        expect(obj.image).to equal(value)
+      end
     end
   end
