Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use special "deleted" journalist for associations with deleted users #6225

Merged
merged 1 commit into from
Feb 3, 2022

Conversation

legoktm
Copy link
Member

@legoktm legoktm commented Jan 14, 2022

Status

Ready for review. Note that this depends on #6223 to work properly.

Description of Changes

Currently when a journalist is deleted, most referential tables are
updated with journalist_id=NULL, forcing all clients to accomodate
that case. Instead, we are now going to either re-associate those rows
with a special "deleted" journalist account, or delete them outright.
All journalist_id columns are now NOT NULL to enforce this.

Tables with rows migrated to "deleted":

  • replies
  • seen_files
  • seen_messages
  • seen_replies

Tables with rows that are deleted outright:

  • journalist_login_attempt
  • revoked_tokens

The "deleted" journalist account is a real account that exists in the
database, but cannot be logged into and has no passphase set. It is not
possible to delete it nor is it shown in the admin listing of
journalists. It is lazily created on demand using a special
DeletedJournalist subclass that bypasses username and passphrase
validation.

Because we now have a real user to reference, the api.single_reply
endpoint will return a real UUID instead of the "deleted" placeholder.

Journalist objects must now be deleted by calling the new delete()
function on them. Trying to directly db.session.delete(journalist)
will most likely fail with an Exception because of rows that weren't
migrated first.

The migration step looks for any existing rows in those tables with
journalist_id=NULL and either migrates them to "deleted" or deletes
them. Then all the columns are changed to be NOT NULL.

Fixes #6192.

Testing

  • Start the dev environment, which calls loaddata.py. Doing so creates a journalist ("clarkkent"), adds some replies, and then immediately deletes it
  • Using the same dev environment, visit the admin page and notice that no "deleted" journalist is shown.
  • Create a journalist, login as them, have them reply, then delete them (so journalist_id=NULL exists in the DB). Then apply this patch, run the upgrader and observe in the DB that a "deleted" user is created and all the replies, seen_* rows have been updated to point to it
  • Hit the API single_reply endpoint and see that you still get "deleted" for username but a real UUID

Deployment

This has a schema change and data migration for existing installs.

Checklist

  • Linting (make lint) and tests (make test) pass in the development container
  • I have written a test plan and validated it for this PR
  • These changes do not require documentation (correct me if I'm wrong please)

@legoktm legoktm requested a review from a team as a code owner January 14, 2022 08:04
@lgtm-com
Copy link

lgtm-com bot commented Jan 14, 2022

This pull request introduces 1 alert when merging 0a18cb0 into f768a1a - view on LGTM.com

new alerts:

  • 1 for `__init__` method calls overridden method

@lgtm-com
Copy link

lgtm-com bot commented Jan 14, 2022

This pull request introduces 1 alert when merging 5abb6f4 into f768a1a - view on LGTM.com

new alerts:

  • 1 for `__init__` method calls overridden method

@codecov-commenter
Copy link

codecov-commenter commented Jan 14, 2022

Codecov Report

Merging #6225 (0b648d5) into develop (4f6259f) will decrease coverage by 1.06%.
The diff coverage is 50.42%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #6225      +/-   ##
===========================================
- Coverage    85.12%   84.06%   -1.07%     
===========================================
  Files           59       60       +1     
  Lines         4088     4186      +98     
  Branches       487      503      +16     
===========================================
+ Hits          3480     3519      +39     
- Misses         491      548      +57     
- Partials       117      119       +2     
Impacted Files Coverage Δ
...ns/2e24fc7536e8_make_journalist_id_non_nullable.py 27.94% <27.94%> (ø)
securedrop/models.py 89.35% <76.74%> (-1.43%) ⬇️
securedrop/journalist_app/admin.py 89.54% <100.00%> (+0.14%) ⬆️
securedrop/journalist_app/forms.py 96.22% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f6259f...0b648d5. Read the comment docs.

@legoktm
Copy link
Member Author

legoktm commented Jan 14, 2022

Is there a way to suppress the lgtm warning? The thing it's complaining about is intentional. And codecov is claiming that the migration script isn't covered, but AFAICT the test I wrote for it is running and works. The other new thing it said wasn't covered I just added a test for.

@lgtm-com
Copy link

lgtm-com bot commented Jan 14, 2022

This pull request introduces 1 alert when merging 312106a into f768a1a - view on LGTM.com

new alerts:

  • 1 for `__init__` method calls overridden method

securedrop/journalist_app/admin.py Show resolved Hide resolved
@@ -784,13 +790,57 @@ def to_json(self, all_info: bool = True) -> Dict[str, Any]:

return json_user

def delete(self) -> None:
Copy link
Contributor

@nabla-c0d3 nabla-c0d3 Jan 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend explicitly passing the db_session as an argument:

  • It will make testing easier.
  • Importing the db object/module directly creates the implicit assumption that the caller has setup the special (flask) context where this import will work. This is usually fine in the running app, but can be tricky in utility scripts and in the test suite. Making the db_session an argument makes this requirement explicit/documented, and leaves it to the caller to get the db session.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the other methods in Journalist don't do this, so I'd prefer to keep it consistent in the class for now. What you explained seems reasonable to me, I just think it should be done separately.

securedrop/models.py Outdated Show resolved Hide resolved
securedrop/models.py Outdated Show resolved Hide resolved
@zenmonkeykstop zenmonkeykstop self-assigned this Jan 18, 2022
@lgtm-com
Copy link

lgtm-com bot commented Jan 18, 2022

This pull request introduces 1 alert when merging 7854b88 into f768a1a - view on LGTM.com

new alerts:

  • 1 for `__init__` method calls overridden method

@lgtm-com
Copy link

lgtm-com bot commented Jan 19, 2022

This pull request introduces 1 alert when merging 3685cb0 into b4c2b3a - view on LGTM.com

new alerts:

  • 1 for `__init__` method calls overridden method

@legoktm
Copy link
Member Author

legoktm commented Jan 21, 2022

In testing the client PR, I realized I overlooked that all the seen_* tables have a UNIQUE(fk_id, journalist_id), which means if two journalists have seen the same thing and then both are deleted, the second update would fail with a unique key violation since the "deleted" journalist has already seen that thing. The delete() method now checks to see if deleted has already seen that item in which case it deletes the duplicate row. The migration script tries to update as many rows as possible, ignoring any key conflicts, and then does a second pass to delete everything remaining.

@lgtm-com
Copy link

lgtm-com bot commented Jan 21, 2022

This pull request introduces 2 alerts when merging 1176a7d into b4c2b3a - view on LGTM.com

new alerts:

  • 1 for Unused import
  • 1 for `__init__` method calls overridden method

@lgtm-com
Copy link

lgtm-com bot commented Jan 21, 2022

This pull request introduces 1 alert when merging 68d1b25 into b4c2b3a - view on LGTM.com

new alerts:

  • 1 for `__init__` method calls overridden method

@lgtm-com
Copy link

lgtm-com bot commented Jan 21, 2022

This pull request introduces 1 alert when merging de8e49f into b4c2b3a - view on LGTM.com

new alerts:

  • 1 for `__init__` method calls overridden method

def __init__(self) -> None:
# password field is unused (we override set_password below), plus
# it's too short to be valid anyway.
super().__init__(username='deleted', password='unused')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A valid OTP secret is generated for the deleted user - attempting to log in with said credentials fails with the following raceback:

ValueError

ValueError: Should never happen: pw_salt is none for legacy Journalist 4
Traceback (most recent call last)

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 2309, in __call__

    return self.wsgi_app(environ, start_response)

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 2295, in wsgi_app

    response = self.handle_exception(e)

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 1741, in handle_exception

    reraise(exc_type, exc_value, tb)

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/_compat.py", line 35, in reraise

    raise value

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 2292, in wsgi_app

    response = self.full_dispatch_request()

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 1815, in full_dispatch_request

    rv = self.handle_user_exception(e)

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 1718, in handle_user_exception

    reraise(exc_type, exc_value, tb)

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/_compat.py", line 35, in reraise

    raise value

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 1813, in full_dispatch_request

    rv = self.dispatch_request()

    File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/flask/app.py", line 1799, in dispatch_request

    return self.view_functions[rule.endpoint](**req.view_args)

    File "/home/user/securedrop/securedrop/journalist_app/main.py", line 30, in login

    user = validate_user(request.form['username'],

    File "/home/user/securedrop/securedrop/journalist_app/utils.py", line 97, in validate_user

    return Journalist.login(username, password, token)

    File "/home/user/securedrop/securedrop/models.py", line 730, in login

    if not user.valid_password(password):

    File "/home/user/securedrop/securedrop/models.py", line 571, in valid_password

    raise ValueError(

    ValueError: Should never happen: pw_salt is none for legacy Journalist 4

Login should fail, but not this explosively! There was existing code to bail out of logins for invalid usernames, of which deleted is currently the only one, but it's not getting hit because it assumes that the invalid username and UUID are the same.

I'm kindof coming around to the idea of a "login_enabled" field or something in the journalist table, which would be checked first when validating user logins. This will probably have to be added anyway in the future to support account locking, but it could simplify this PR now, and allow for the removal of that INVALID_USERNAMES array in favour of a db query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, setting/forgetting a valid diceware password for the deleted user, and dropping the overriden set_password method, would both rule out the exception above and clear the new LGTM alert. I recognise that it's an intentional, not accidental, change, but the side effects of overriding the method seem undefined, which is a worry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Login should fail, but not this explosively! There was existing code to bail out of logins for invalid usernames, of which deleted is currently the only one, but it's not getting hit because it assumes that the invalid username and UUID are the same.

Uh, I missed that UUID check somehow. Fixing that and writing a test for this as well.

It is semi-intentional on my part that it does explode, the idea was that it should be impossible to log into "deleted" on multiple levels, first being the INVALID_USERNAMES check (or login_enabled in the future), and if you somehow bypassed it (as happened in this case), you'd still get stuck because there's no salt/password/TOTP. If we are fine with just having a random diceware+TOTP values to provide that second layer, I can try to switch to that. It is likely that will add additional complexity to the database migration script since we need to go through all that logic without being able to reuse the Journalist class, but it seems doable. We would still need the override for check_username_acceptable, but the consequences of that seem much more predictable (compared to your concern with the override of set_password).

I think we still want to keep INVALID_USERNAMES even with a potential login_enabled because we want to prevent people from manually creating "deleted" account. So I can work on adding in login_enabled but I don't think it'll help that much here compare to the increased scope of another schema change.

Copy link
Contributor

@zenmonkeykstop zenmonkeykstop Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just leery of something like that throwing an undhandled exception, but the point about migration woes is well-taken. Maybe we don't even need the login_enabled field just yet: if you had a Journalist.is_unlocked() that just returned true you could override that in DeletedJournalist to return false without falling foul of constructor overrides. If we ever did add account locking (and said field) it would still be useful, the Journalist version of the method would probably just query the db.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a PassphraseGenerator in the migration step seems to work with minimal copy and paste, so I pushed changes that now set a real passphrase/totp secret for "deleted", and no longer override "set_password". I also fixed the pre-existing tests for the deleted journalist login to work with the new style of having a real UUID.

if you had a Journalist.can_log_in() that just returned true you could override that in DeletedJournalist to return false without falling foul of constructor overrides

This wouldn't work because on the login form when we query by username, we get an instance of Journalist back, not DeletedJournalist. Now DeletedJournalist is only used to bypass the username validation in check_username_acceptable.

Now that we no longer need it to override password handling, we could just get rid of the class and do something like:

deleted = Journalist(username="placeholder", password=...)
deleted.username = "deleted"

to bypass the constructor validation. And DeletedJournalist.get() would become Journalist.get_deleted(). What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM - will hold off on further review for now, lemme know when it's updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgtm-com
Copy link

lgtm-com bot commented Jan 31, 2022

This pull request introduces 1 alert when merging eb9184b into 5daebb6 - view on LGTM.com

new alerts:

  • 1 for Except block handles 'BaseException'

@lgtm-com
Copy link

lgtm-com bot commented Jan 31, 2022

This pull request introduces 1 alert when merging 0b648d5 into 5daebb6 - view on LGTM.com

new alerts:

  • 1 for Except block handles 'BaseException'

@legoktm
Copy link
Member Author

legoktm commented Feb 1, 2022

(Rebased to fix the merge conflict after the Flask 2.0 PR landed)

@lgtm-com
Copy link

lgtm-com bot commented Feb 1, 2022

This pull request introduces 1 alert when merging 4a15c14 into a20c843 - view on LGTM.com

new alerts:

  • 1 for Except block handles 'BaseException'

Currently when a journalist is deleted, most referential tables are
updated with `journalist_id=NULL`, forcing all clients to accomodate
that case. Instead, we are now going to either re-associate those rows
with a special "deleted" journalist account, or delete them outright.
All journalist_id columns are now NOT NULL to enforce this.

Tables with rows migrated to "deleted":
* replies
* seen_files
* seen_messages
* seen_replies

Tables with rows that are deleted outright:
* journalist_login_attempt
* revoked_tokens

The "deleted" journalist account is a real account that exists in the
database, but cannot be logged into. It is not possible to delete it nor
is it shown in the admin listing of journalists. It is lazily created on
demand using `Journalist.get_deleted()`. For consistency reasons, a
randomly generated diceware passphrase and TOTP secret are set in the
database for this account, but never revealed to anyone. We use a
placeholder username to bypass Journalist's normal validation check that
the username is not "deleted".

Because we now have a real user to reference, the api.single_reply
endpoint will return a real UUID instead of the "deleted" placeholder.

Journalist objects must now be deleted by calling the new delete()
function on them. Trying to directly `db.session.delete(journalist)`
will most likely fail with an Exception because of rows that weren't
migrated first.

The migration step looks for any existing rows in those tables with
journalist_id=NULL and either migrates them to "deleted" or deletes
them. Then all the columns are changed to be NOT NULL.

Fixes #6192.
@lgtm-com
Copy link

lgtm-com bot commented Feb 1, 2022

This pull request introduces 1 alert when merging 6685e28 into a20c843 - view on LGTM.com

new alerts:

  • 1 for Except block handles 'BaseException'

Copy link
Contributor

@zenmonkeykstop zenmonkeykstop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran through test plan, LGTM!
Also:

  • spun up a prod VM install in 2.1.0
  • added a journalist account, viewed submissions with it and deleted the account
  • confirmed nulls in seen_* tables
  • upgraded to packages built off this branch
  • confirmed the creation of a deleted user with a valid UUID, and that seen_* table entries with nulls now reference this user's id
  • confirmed the API reply behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use a special deleted journalist user for associations with deleted users
5 participants