Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support binary type natively #822

Merged
merged 3 commits into from
Nov 22, 2024
Merged

Support binary type natively #822

merged 3 commits into from
Nov 22, 2024

Conversation

dalibor
Copy link
Contributor

@dalibor dalibor commented Nov 19, 2024

aws-sdk-dynamodb expects StringIO or IO object for binary and returns back StringIO for binary values.

Current implementation incorrectly converts and stores binary values as strings that due to Base64 encoding use 33% more storage and read/write capacity.

Request Before:

{
  "TableName": "dynamoid_tests_documents_1732010637_594",
  "Item": {
    "image": {
      "S": "AIj/"
    },
    ...
}

Request After:

{
  "TableName": "dynamoid_tests_documents_1732010679_801",
  "Item": {
    "image": {
      "B": "AIj/"
    },
    ...
}

Copy link
Member

@andrykonchin andrykonchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Could you please take a look at the comments?

lib/dynamoid/type_casting.rb Outdated Show resolved Hide resolved
lib/dynamoid/undumping.rb Outdated Show resolved Hide resolved
spec/dynamoid/dumping_spec.rb Outdated Show resolved Hide resolved
@andrykonchin
Copy link
Member

Started a discussion aws/aws-sdk-ruby#3146.

value
else
StringIO.new(value)
end
Copy link
Member

@andrykonchin andrykonchin Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that now new values of a binary field will be stored in DynamoDB as binary but all existing values are stored as string. It might be a bit surprising for end users.

The common approach is to step-by-step migrate users from the current behaviour to the new one. We can introduce a new field's option to either keep old behaviour (by default, so there is no a breaking change in a minor release) or to switch to the new one (for new fields or new tables or in new projects).

Example is a boolean type that wasn't supported by DynamoDB initially so Dynamoid's boolean fields were stored as f and t. Later DynamoDB added support of the boolean type and a new option store_as_native_boolean was added:

field :active, :boolean, store_as_native_boolean: false

Later in the next major release such migration options can be removed and we can propose users to custom types to keep old storage formats.

So I propose to add a new field's option store_as_native_binary that is by default false. And check it at casting, dumping and undumping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It could cause issues for integrators even though they might not have noticed that the binary values aren't stored with binary type in DynamoDB.

I've added the store_as_native_binary option that should be a safe work-around, but there's still a risk with this change to not get noticed with the new major release since there will not be a breaking code change.

This concerns me along with the lack of alignment with DynamoDB's single table design which is why I've started exploring Aws::DynamoDB::Client directly for my project.

Let me know if you want me to do any other changes to this PR, or I'll leave it to you for how you want to get it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way to prevent breaking changes being unnoticed that I see is to add deprecation warnings in the last minor release before the major one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also require the store_as_native_binary option be set, otherwise raise an error.

Copy link
Member

@andrykonchin andrykonchin Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the single-table design - it seems to me it's out of scope of Dynamoid. Or at least of the initial scope to implement the Active Record pattern.

I am not opposed to including the new functionality into the gem related to this "pattern". But I haven't found any common and widely accepted "interface" or API. So I postponed this.

There is an issue #568 but all I heard there were either some generic thoughts or requesting somebody's project specific features. So there was nothing I can work with.

andrykonchin
andrykonchin previously approved these changes Nov 22, 2024
Copy link
Member

@andrykonchin andrykonchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Thank you.

@dalibor
Copy link
Contributor Author

dalibor commented Nov 22, 2024

There's an issue with Code Climate:

File undumping.rb has 252 lines of code (exceeds 250 allowed).

I could fix it with this change, although not much ideal:

--- a/lib/dynamoid/undumping.rb
+++ b/lib/dynamoid/undumping.rb
@@ -267,11 +267,9 @@ module Dynamoid
       STRING_VALUES = %w[t f].freeze

       def process(value)
-        store_as_boolean = if @options[:store_as_native_boolean].nil?
-                             Dynamoid.config.store_boolean_as_native
-                           else
-                             @options[:store_as_native_boolean]
-                           end
+        store_as_boolean = @options[:store_as_native_boolean]
+        store_as_boolean = Dynamoid.config.store_boolean_as_native if store_as_boolean.nil?
+
         if store_as_boolean
           !!value
         elsif STRING_VALUES.include?(value)
@@ -284,11 +282,8 @@ module Dynamoid

     class BinaryUndumper < Base
       def process(value)
-        store_as_binary = if @options[:store_as_native_binary].nil?
-                            Dynamoid.config.store_binary_as_native
-                          else
-                            @options[:store_as_native_binary]
-                          end
+        store_as_binary = @options[:store_as_native_binary]
+        store_as_binary = Dynamoid.config.store_binary_as_native if store_as_binary.nil?

         if store_as_binary
           value.string

Code coverage seems to have already been broken in master. It might need a later ruby version?

@andrykonchin
Copy link
Member

andrykonchin commented Nov 22, 2024

Code coverage seems to have already been broken in master. It might need a later ruby version?

Yeah, I didn't have a chance to fix this annoying issue. Please don't bother with it. With code climate as well - its rules aren't ideal.

The only thing left is to squash working commits. I can do it on my own today later.

@dalibor
Copy link
Contributor Author

dalibor commented Nov 22, 2024

Cool - feel free to squash and merge when you'd like. Thank you!

@andrykonchin andrykonchin merged commit d0662db into Dynamoid:master Nov 22, 2024
56 of 58 checks passed
@dalibor dalibor deleted the binary branch November 26, 2024 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants