Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

401 if an unauthorized user attempts to download a workflow restricted file #5921

Merged
merged 7 commits into from
Dec 7, 2022

Conversation

bbpennel
Copy link
Contributor

@bbpennel bbpennel commented Nov 30, 2022

Fixes #5913
I've confirmed this bug in 3.x and 2.x, and it was confirmed in nurax.

Files are currently downloadable by unauthorized users if they are restricted by a workflow and the user knows the download URL. See the issue for more details.

Changes proposed in this pull request:

  • Adds a workflow_restriction? check of the FileSets parent to the authorize_download! method in DownloadsController

Guidance for testing, such as acceptance criteria or new user interface behaviors:

  • Direct download links to workflow restricted files (such as in the case of a mediated deposit that has not been approved) should respond with a not authorized response, just like with regular access restrictions
  • Other requests to download should behave as before.

@samvera/hyrax-code-reviewers

@bbpennel bbpennel force-pushed the workflow-restricted-downloads branch from 4e81246 to f0a0ee0 Compare November 30, 2022 16:18
…dora, since this seems to be required in order to verify whether it is in a workflow
@dlpierce
Copy link
Contributor

dlpierce commented Dec 1, 2022

Thank you for the work on this!

  1. If at all possible, we should try to retain the behavior of not making a request to fedora here. It seems like the information needed should be available via the solr documents.

  2. Can the code that checks the download permission be moved into Hyrax::Ability? That seems like a better place for it to live.

@bbpennel
Copy link
Contributor Author

bbpennel commented Dec 1, 2022

Thanks for the feedback.

Would you be able to point me to an example of getting the FileSet only via a solr document? I'm guessing it is find_by_alternate_identifier which is calling fedora?

If I'm understanding the suggestion correctly, the code that checks for download permissions wasn't added in this PR (authorize! :download, params[asset_param_key]). I see authorize! calls in a number of the other controllers too. Or do you mean move the whole authorize_download! method?

@bbpennel
Copy link
Contributor Author

bbpennel commented Dec 6, 2022

@dlpierce
I updated it to use ActiveFedora::Base.where instead of ActiveFedora::Base.find, so it should be using solr instead of fedora to pull the fileset object. This works in my local hyrax 3 based app (including a duplicate of the thumbnail not contacting fedora test), but here in main the 'retrieves the thumbnail without contacting Fedora' fails inside of ActiveFedora::Base.where all the way down in permissions.rb.

After digging around with byebug, I can see that at this line:
https://github.com/samvera/hydra-head/blob/v12.0.2/hydra-access-controls/app/models/concerns/hydra/access_controls/permissions.rb#L420
is receiving a permissions object that looks like the following:

#<Hydra::AccessControl::CollectionRelationship:0x00007f81811ed430
 @owner=#<Hydra::AccessControl id: "38124053-e310-4b3f-838b-1db35d1a9e93">,
 @relationship=[nil]>
#<Hydra::AccessControl::CollectionRelationship:0x00007f81811ed430 @owner=#<Hydra::AccessControl id: "38124053-e310-4b3f-838b-1db35d1a9e93">, @relationship=[nil]>

So when .to_a is call on it, it produces a list containing [[nil], [nil]] since it is putting the relationship value into it, which ultimately ends in the error in circleci

     NoMethodError:
       undefined method `agent' for nil:NilClass
          # ./vendor/bundle/ruby/2.7.0/gems/hydra-access-controls-12.0.2/app/models/concerns/hydra/access_controls/permissions.rb:421:in `block in search_by_mode'
     # ./vendor/bundle/ruby/2.7.0/gems/hydra-access-controls-12.0.2/app/models/concerns/hydra/access_controls/permissions.rb:420:in `select'
     # ./vendor/bundle/ruby/2.7.0/gems/hydra-access-controls-12.0.2/app/models/concerns/hydra/access_controls/permissions.rb:420:in `search_by_mode'
     # ./vendor/bundle/ruby/2.7.0/gems/hydra-access-controls-12.0.2/app/models/concerns/hydra/access_controls/permissions.rb:410:in `search_by_type_and_mode'
     # ./vendor/bundle/ruby/2.7.0/gems/hydra-access-controls-12.0.2/app/models/concerns/hydra/access_controls/permissions.rb:172:in `read_groups'
     ...
     # ./vendor/bundle/ruby/2.7.0/gems/active-fedora-13.3.0/lib/active_fedora/relation/finder_methods.rb:11:in `first'
     # ./app/controllers/hyrax/downloads_controller.rb:42:in `file_set_parent'
     # ./app/controllers/hyrax/downloads_controller.rb:52:in `authorize_download!'
     ...
     # ./spec/controllers/hyrax/downloads_controller_spec.rb:87:in `block (6 levels) in <top (required)>'

I'm struggling to figure out why the permissions object would have nils as the relationships. Do you have any guidance?

Is the relationship not available when getting the fileset and work via solr? This is with unchanged permissions to the work or user (although I did locally try explicitly changing the visibility, but it had no affect).

@dlpierce
Copy link
Contributor

dlpierce commented Dec 6, 2022

I believe this is a case of the rspec expectation interfering with behavior.

expect(ActiveFedora::Base).not_to receive(:find).with(file_set.id)

does not set any response, and by default responds with nil. Howeverm tis is actually not the correct way to setup a message expectation.

it 'retrieves the thumbnail without contacting Fedora' do
  allow(ActiveFedora::Base).to receive(:find).and_call_original
  get :show, params: { id: file_set, file: 'thumbnail' }
  expect(ActiveFedora::Base).not_to have_received(:find).with(file_set.id)
end

This will do what we want and pass, However, if you take out .with(file_set.id) then you'll discover that find is actually already being called 5 times with other id strings, only one of which was added by this PR. So I guess worrying about creating additional Fedora requests was mostly futile here.

Going back to using the valkyrie compatible query would help with future compatibility, if it's not too much trouble?

@bbpennel
Copy link
Contributor Author

bbpennel commented Dec 7, 2022

@dlpierce
Interesting, with the corrected expectation and the with remove, I see the test failing with one call to ActiveFedora::find on main, in my branch using the ActiveFedora::Base.where appraoch, and in my branch using the Hyrax.query_service approach. So I guess its no worse?

Anyway, I've switched back to using the valkyrie query approach I believe, let me know if its not what you intended.

@bbpennel bbpennel force-pushed the workflow-restricted-downloads branch from b99966d to 5768d35 Compare December 7, 2022 00:09
Copy link
Contributor

@dlpierce dlpierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks good!

@dlpierce dlpierce merged commit 00643e0 into samvera:main Dec 7, 2022
@bbpennel bbpennel deleted the workflow-restricted-downloads branch December 7, 2022 18:28
dlpierce pushed a commit that referenced this pull request Dec 7, 2022
…d file (#5921)

* Return 401s if an unauthorized user attempts to download a workflow restricted file

* Add method for finding parent

* Removed expectation that thumbnail be retrieved without contacting fedora, since this seems to be required in order to verify whether it is in a workflow

* Use ActiveFedora::Base.where to find the FileSet to save trip to fedora

* Move test into context where it should have permission, and remove no fedora calls test

* Switch back to using valkyrie friendly lookup of fileset and parent

* Make the lines super long to make rubocop happy
@dlpierce dlpierce added the notes-bugfix Release Notes: Fixed a bug label Mar 30, 2023
@conorom
Copy link
Contributor

conorom commented Aug 16, 2023

At fulcrum.org we discovered last week that this PR's changes are indeed hitting Fedora for every single thumbnail load, causing a massive performance hit on Works with many FileSets attached. When viewing 100 items per page, that's 100 hits to Fedora, the notoriously, terribly slow part of the stack that we all go out of our way to avoid touching with regularity! 😬
So I made a bug ticket. I think this is bad enough for performance of apps using ActiveFedora (still everyone, right?) that the bug label is appropriate.

aside: others would presumably also take this hit on their catalog pages when showing thumbnails for Works. We just happen to use RIIIF for those thumbnails, thankfully, instead of the stock hydra-derivatives ones.

Looks like you guys were suspicious this would happen in the thread above but hoped it was a spec issue and so removed the spec that was meant to prevent this from happening. Possibly this happened in confusion over the course of the 7 commits or something. Not sure. Looks like a lot of thought was put into not hitting Fedora here.

This is very easily tested in dev BTW if you just stop your Fedora server and load up a thumbnail download path in a browser. Works fine before the change to authorize_download!, fails afterwards.

For anyone that can't deal with the performance hit until this is fixed, reverting Hyrax::DownloadsController.authorize_download! to the way it was before this PR is the way to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notes-bugfix Release Notes: Fixed a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Files for work under review can be downloaded by public via direct link
3 participants