Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add action to (re-)generate and export conversation transcript and downloaded files #1627

Merged
merged 6 commits into from
Feb 24, 2023

Conversation

gonzalo-bulnes
Copy link
Contributor

@gonzalo-bulnes gonzalo-bulnes commented Feb 14, 2023

Description

The shortest path to providing an option to export a entire conversation to journalists.
Entire conversation = freshly (re-)generated conversation transcript + all the downloaded files [1]

[1] On one hand, there is an option to download all files, which is easy to trigger. On the other hand, if a file was intentionally not downloaded, conditioning the export to an automatic download seems counter-productive.

Context

This is a direct continuation of #1624, the same context applies.

Implementation

  • Renames the existing export.Dialog to export.FileDialog because that's what it is, and there is a parallel with the export.PrintFileDialog. breaking the symmetry would become confusing I think.
  • Uses the conversation/export.Dialog name to export the entire conversation.
  • Takes the existing export dialog and export transcript dialog as a reference
  • Follows the existing patterns for testing the new dialog

Test plan

  1. Start the app
  2. Select a conversation
  3. For good measure download no file, or a file, or two (or all of them)
  4. Trigger "Export Conversation" in the conversation menu (a.k.a source menu, kebab menu...)
  • Confirm that the dialog says it will export 4 files if three files were downloaded (the transcript is the fourth file)
  • Confirm the transcript is exported as expected
  • Confirm the files are exported as expected
  • Confirm that error messages are shown in the same ways as in the existing export dialog

(This patch allows to test this PR in a lightweight way. That being said, a proper export to a USB device is desirable.)

@gonzalo-bulnes gonzalo-bulnes marked this pull request as ready for review February 14, 2023 06:07
@gonzalo-bulnes gonzalo-bulnes requested a review from a team as a code owner February 14, 2023 06:07
@gonzalo-bulnes
Copy link
Contributor Author

@rocodes @cfm I realized that in the conversation namespace, we've got:

  • export.PrintFileDialog that is very similar to export.FileDialog
  • this new dialog that exports the entire conversation, and which would fit the name export.Dialog

Now, for "historical" reasons (not old history, but still) the dialog that exports one file was called export.Dialog. I think the lack of symmetry would get confusing, so I re-established the de-facto symmetry by renaming it.

That small renaming of course touched a gazillion of files.

I kept the two commits that rename files separate, and we can pull them to a separate PR if you prefer. I don't have any preference myself, but that's what explains that more files were touched than in #1624. Otherwise, the only significant differences with that PR are:

  • what the dialog displays (e.g. "4 files" instead of "conversation.txt")
  • we open and export multiple files instead of only one
  • the name of the action in the menu

Copy link
Contributor Author

@gonzalo-bulnes gonzalo-bulnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't request changes from myself, yet:

  • disambiguation is required (e.g. two files are called memo.txt
  • we discussed it previously, when thinking about what files names to print in the transcript (that ideally would match the file structure when the conversation is exported)

So, my current thinking is:

  • Ensure the files are stored in the archive in the same way they are stored in the .securedrop-client/data directory
  • Adjust the transcript template accordingly
  • Discuss if we want to also use that structure when:
    • exporting a single file
    • exporting the conversation transcript only

☝️ I think we should, so that I can export the transcript, change my mind, and export the entire conversation without that resulting in mutliple copies of the same file on my USB drive.

cC: @rocodes @cfm

@rocodes rocodes self-assigned this Feb 14, 2023
@gonzalo-bulnes gonzalo-bulnes force-pushed the export-all-downloaded-files-and-transcript branch 2 times, most recently from de182c8 to 85a2f64 Compare February 15, 2023 03:14
@gonzalo-bulnes gonzalo-bulnes force-pushed the export-all-downloaded-files-and-transcript branch from 85a2f64 to a87f7ae Compare February 15, 2023 03:17
@gonzalo-bulnes
Copy link
Contributor Author

@rocodes @cfm Can you take a look at this branch as it is now please?

If we're happy with the behavior, we can talk about the implementation (I expect comments!) and adjust the transcript accordingly. (But I'd rather we agree on the behavior we want before moving forward with the transcript and eventual backports of the structure —see previous comment.)

@cfm
Copy link
Member

cfm commented Feb 15, 2023

Chiming in as you requested in #1627 (review), @gonzalo-bulnes: I like the disambiguation you've implemented in a87f7ae. I think it could benefit from some inline documentation of why it's doing what it's doing. But that need would actually be lessened by generalizing this behavior as you suggest to all exports, which would also make export behave more consistently: different export operations (transcript, one file, multiple files) will always yield consistent (if consistently clunky :-) archive structures.

@rocodes, I'll defer to you otherwise, but feel free to tag me back in!

@rocodes
Copy link
Contributor

rocodes commented Feb 16, 2023

I'm just finishing testing out your changes (and therefore my review), but I'll reply first on the comment you made above.
I'm a little confused about the part with the checkboxes - I know you're talking about what to do if there are multiple submissions with the same filename but I'm not sure if you're proposing one option, or alternatives.

One point to note about this comment:

[...] so that I can export the transcript, change my mind, and export the entire conversation without that resulting in mutliple copies of the same file on my USB drive

Currently, every time the user runs an export, it exports to a new folder on the user's USB drive (the folder name is a sd-export-YYYMMDD-HHMMSS), and inside that is the export data. So we end up with a new directory every time we run an export, and therefore no deduplication effort. This choice was initially made I believe to prevent potentially surprising the user by overwriting existing data. It may be something we want to revisit now that we support batch file operations.

@gonzalo-bulnes
Copy link
Contributor Author

Currently, every time the user runs an export, it exports to a new folder on the user's USB drive (the folder name is a sd-export-YYYMMDD-HHMMSS), and inside that is the export data. So we end up with a new directory every time we run an export, and therefore no deduplication effort. This choice was initially made I believe to prevent potentially surprising the user by overwriting existing data. It may be something we want to revisit now that we support batch file operations.

🤦‍♀️ That is true @rocodes! Thank you so much. I keep forgetting about that wrapping... meaning I probably don't spend enough time with actual export devices. 😬

That I think answers the part of the checkboxes you are confused about. I was laying out questions:

Discuss if we want to also use that structure when:

  • exporting a single file ①
  • exporting the conversation transcript only ②

① Do we want to modify the structure that's exported when exporting a single file so that I can export one file, then an entire conversation and avoid confusion?
② Do we want to modify the structure that's exported when exporting the conversation transcript only so that I can export it, then an entire conversation and avoid confusion?

I think both questions' answers are the same once we take your observation into account: there will be no overwrites occurring during multiple exports, because each export is contained in its own directory.

We don't need to modify how the existing export functions work, and I believe the answer to the questions is that we shouldn't modify the existing export functions because that would be more trouble that help at this time. 👈 Decision (?) What do you think @rocodes @cfm?

@gonzalo-bulnes
Copy link
Contributor Author

gonzalo-bulnes commented Feb 16, 2023

If we agree on not modifying the existing export features, then the next step is to:

  • Adjust the file names in the transcript at least when the entire conversation is being exported (that is: when the files are exported in unambiguous directories). Goal: that you know where to find a file from look at the transcript.
    • Open question: would it hurt if we modified the template of the transcript in all cases (for simplicity)?
  • Clean up the code in a87f7ae until we're happy with it. (Full disclosure: my standards are intentionally low at this time, given the scope and context of this PR, please push back!)

Copy link
Contributor

@rocodes rocodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gonzalo-bulnes, thank you for preparing this :)

Are you able to successfully export to USB using this branch? Testing (STR: staging setup, check out your branch in sd-app), I'm not able to successfully export a conversation transcript + files. The client shows an UNEXPECTED_RETURN_STATUS error, and in sd-devices, the sd-export-$timestamp directory is created on the target USB, but the directory is empty, and in the logs I can see ERROR_USB_WRITE. This problem does not occur when Export Conversation doesn't contain any downloaded submissions (i.e. when n files == 1).

The sd-devices code naively copies everything in export_data via cp -r, so it shouldn't barf with more than one file to copy as long as they are all in the export_data directory. I think there's something funny about how the export call is happening, because it looks like the drive is closed and locked, and you can see there is an export error (a write error) during the disk_check routine, which is fishy.

logs
2023-02-15 16:27:42,657 - securedrop_export.archive:38(validate) INFO: Parsing archive metadata
2023-02-15 16:27:42,658 - securedrop_export.archive:43(validate) INFO: Target: disk, encryption_method luks
2023-02-15 16:27:42,658 - securedrop_export.archive:55(validate) DEBUG: Validate export action
2023-02-15 16:27:42,658 - securedrop_export.main:63(entrypoint) INFO: Archive extraction and metadata validation successful
2023-02-15 16:27:42,658 - securedrop_export.main:69(entrypoint) INFO: Start disk service
2023-02-15 16:27:42,658 - securedrop_export.disk.legacy_service:77(export) INFO: Export archive is disk
2023-02-15 16:27:42,659 - securedrop_export.disk.cli:34(get_connected_devices) INFO: Checking connected volumes
2023-02-15 16:27:42,662 - securedrop_export.disk.cli:61(_get_removable_devices) INFO: Checking removable devices
2023-02-15 16:27:42,669 - securedrop_export.disk.cli:81(_get_removable_devices) INFO: 1 connected
2023-02-15 16:27:42,670 - securedrop_export.disk.cli:121(_check_partitions) DEBUG: Checking device partitions on /dev/sda
2023-02-15 16:27:42,673 - securedrop_export.disk.cli:98(get_partitioned_device) DEBUG: Counted 0 partitions
2023-02-15 16:27:42,673 - securedrop_export.disk.legacy_service:86(export) INFO: Check if LUKS
2023-02-15 16:27:42,673 - securedrop_export.disk.cli:139(is_luks_volume) DEBUG: Checking if target device is luks encrypted
2023-02-15 16:27:42,688 - securedrop_export.disk.cli:159(_get_luks_name_from_headers) DEBUG: Get LUKS name from headers
2023-02-15 16:27:42,714 - securedrop_export.disk.cli:196(get_luks_volume) DEBUG: Mapped name is luks-7d7f1303-b41f-49ad-af2a-e5a5dce9a450
2023-02-15 16:27:42,715 - securedrop_export.disk.cli:264(_get_mountpoint) DEBUG: Checking mountpoint
2023-02-15 16:27:42,720 - securedrop_export.disk.cli:291(mount_volume) INFO: The device is already mounted
2023-02-15 16:27:42,721 - securedrop_export.disk.cli:293(mount_volume) WARNING: Mountpoint was inaccurate, updating
2023-02-15 16:27:42,721 - securedrop_export.disk.legacy_service:89(export) INFO: Check if writable
2023-02-15 16:27:42,722 - securedrop_export.disk.legacy_service:97(export) INFO: Export submission to /media/user/transfer
2023-02-15 16:27:42,727 - securedrop_export.disk.cli:348(write_data_to_device) DEBUG: Copying file to sd-export-20230215-162742
2023-02-15 16:27:42,729 - securedrop_export.disk.cli:369(cleanup_drive_and_tmpdir) DEBUG: Syncing filesystems
2023-02-15 16:27:42,775 - securedrop_export.disk.cli:386(_unmount_volume) DEBUG: Unmounting drive from /media/user/transfer
2023-02-15 16:27:42,825 - securedrop_export.disk.cli:404(_close_luks_volume) DEBUG: Locking luks volume <securedrop_export.disk.volume.Volume object at 0x7a8b0e192f70>
2023-02-15 16:27:42,888 - securedrop_export.disk.cli:418(_remove_temp_directory) DEBUG: Deleting temporary directory /tmp/tmpn7c77_go
2023-02-15 16:27:42,894 - securedrop_export.disk.legacy_service:115(export) ERROR: Error encountered during disk format check: ERROR_EXPORT
2023-02-15 16:27:42,894 - securedrop_export.disk.legacy_service:131(_legacy_status) INFO: Convert to legacy: ERROR_EXPORT
2023-02-15 16:27:42,895 - securedrop_export.main:73(entrypoint) ERROR: Encountered exception ERROR_USB_WRITE, exiting
2023-02-15 16:27:42,895 - securedrop_export.main:74(entrypoint) ERROR: 
2023-02-15 16:27:42,895 - securedrop_export.main:154(_exit_gracefully) INFO: Exit gracefully with status: ERROR_USB_WRITE
2023-02-15 16:27:42,895 - securedrop_export.main:178(_write_status) INFO: Write status ERROR_USB_WRITE

The other topic I wanted to put to you and/or the group is about the labels on the QAction menu. I find the menu a bit confusing now in that I find it hard to distinguish between "Export Conversation" and "Export Conversation Transcript." (I'm attaching a screenshot so that others less familiar can take a quick peek). I think we might be able to make those actions a bit more distinct with a slight adjustment to the label names, but I would like to know what others think. I will also investigate the export error a little more in case I see any obvious reasons why it isn't working.

export_qaction_menu

securedrop_client/logic.py Show resolved Hide resolved
@@ -184,8 +184,9 @@ def _create_archive(
with tarfile.open(archive_path, "w:gz") as archive:
cls._add_virtual_file_to_archive(archive, cls.METADATA_FN, metadata)

needs_disambiguation = len(filepaths) > 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think disambiguate/needs_disambiguation is a little confusing without comments. I suggest maybe a naming convention like is_multifile_export instead of needs_disambiguation, and then a comment in the _add_file_to_archive method about what that parameter does, or just adding comments for clarity if you prefer to keep the names untouched

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: having read your comment above, the is_multifile_export name may be a slightly inaccurate choice, but I'm still in favour of naming for what it is/does (is_name_collision), vs what we think of it (needs_disambiguation), and/or adding a comment explaining how we resolve this once we decide on the correct resolution

Copy link
Contributor Author

@gonzalo-bulnes gonzalo-bulnes Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of an imperative: prevent_name_collisions: bool ? @rocodes

(We don't know name collisions will happen if multiple files are being exported, we only know they're likely.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what I'm gonna say...whatever you decide is cool as long as you document it :) Stylistically, based on what you said I'd go with an is_possible_name_collision, but I really don't feel strongly as long as someone new reading the code understands quickly what's happening, here and in add-file-to-archive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the variable to be close to the condition (is_one_of_multiple_files) and added a comment explaining why that's relevant.

I renamed the disambiguate option to prevent_name_collisions because the method is called add_file_to_archive and that seemed like a reasonable name for an option.

securedrop_client/gui/actions.py Show resolved Hide resolved
@gonzalo-bulnes
Copy link
Contributor Author

gonzalo-bulnes commented Feb 16, 2023

Are you able to successfully export to USB using this branch?

Short answer: I don't know! 🙈
Longer answer: ran into Workstation trouble, so I added a shortcut patch and started by verifying that the archive was created as expected. And that did match my expectations.

This is a huge problem, and I really didn't expect it, so I was very cool with deferring the end-to-end USB drive testing for later in the review process. I'll look into it as well, please keep me posted on your findings 👁️. I'll first try to rule out any issue in sd-devices by creating an archive with multiple files and trying to export it manually. 📦

@rocodes
Copy link
Contributor

rocodes commented Feb 22, 2023

@gonzalo-bulnes I am winding down for the day, but I do think the culprit is the directory structure. sd-devices expects all data to be in the export_data parent directory, with the exception of metadata.json.

Added some debug lines to the client and rebuilt, and I see the following happening in the add_file_to_archive method (this is exporting a transcript plus one downloaded file, memo.txt). Looks like something funky is up with the renaming/disambiguating of directory structure there.

2023-02-21 18:07:37,060 - securedrop_client.export:357(send_file_to_usb_device) DEBUG: beginning export from thread 122304684939008
2023-02-21 18:07:37,061 - securedrop_client.export:224(_add_file_to_archive) DEBUG: Filename is memo.txt, arcname is export_data/memo.txt
2023-02-21 18:07:37,061 - securedrop_client.export:230(_add_file_to_archive) DEBUG: Disambiguate was true. New arcname is littoral_exhalation/3-littoral_exhalation-doc/memo.txt
2023-02-21 18:07:37,061 - securedrop_client.export:234(_add_file_to_archive) DEBUG: Add filepath /home/user/.securedrop_client/data/littoral_exhalation/3-littoral_exhalation-doc/memo.txt, arcname littoral_exhalation/3-littoral_exhalation-doc/memo.txt
2023-02-21 18:07:37,062 - securedrop_client.export:224(_add_file_to_archive) DEBUG: Filename is conversation.txt, arcname is export_data/conversation.txt
2023-02-21 18:07:37,062 - securedrop_client.export:230(_add_file_to_archive) DEBUG: Disambiguate was true. New arcname is data/littoral_exhalation/conversation.txt
2023-02-21 18:07:37,062 - securedrop_client.export:233(_add_file_to_archive) DEBUG: conversation.txt was true. New arcname is littoral_exhalation/conversation.txt
2023-02-21 18:07:37,062 - securedrop_client.export:234(_add_file_to_archive) DEBUG: Add filepath /home/user/.securedrop_client/data/littoral_exhalation/conversation.txt, arcname littoral_exhalation/conversation.txt
2023-02-21 18:07:39,172 - securedrop_client.export:160(_export_archive) DEBUG: Export subprocess returned unexpected value: "cp: cannot stat '/tmp/tmpj0g2epj9/export_data/': No such file or directory\nERROR_USB_WRITE" is not a valid ExportStatus

The archive MUST contain a directory called `/export_data`.
That is part of the sd-devices API and was my mistake.

Thanks @rocodes for narrowing the issue to te archive structure!
@gonzalo-bulnes gonzalo-bulnes force-pushed the export-all-downloaded-files-and-transcript branch from 5f9338d to 24f37a7 Compare February 22, 2023 08:36
@gonzalo-bulnes
Copy link
Contributor Author

gonzalo-bulnes commented Feb 22, 2023

Recap of troubleshooting:

  • some offline logging provided by @rocodes pointed to the /export_data directory being missing in the archive
  • the logging above confirmed that suspicion

I pushed 24f37a7 that fixes the issue.

@gonzalo-bulnes
Copy link
Contributor Author

gonzalo-bulnes commented Feb 22, 2023

The other topic I wanted to put to you and/or the group is about the labels on the QAction menu. I find the menu a bit confusing now in that I find it hard to distinguish between "Export Conversation" and "Export Conversation Transcript."

@rocodes I'd personally drop "Export Conversation Transcript". If that menu was less of a mixed bag, I probably wouldn't, but as it is it seems to me that reducing the number of actions would be beneficial. Exporting the conversation transcript was a stepping stone to exporting the entire conversation and I'd remove it until we hear that it is a desirable feature in itself. (We're working on assumptions, so I would personally not let that play in favor of worse UI.)


Edit: FWIW, I'd be comfortable leaving that for a separate PR so that we can touch base with @tina-ux!

@rocodes
Copy link
Contributor

rocodes commented Feb 22, 2023

Thanks for the quick fix @gonzalo-bulnes - with those changes I'm able to successfully export files + conversation.

I agree with you on not wanting to make changes without @tina-ux and further discussion. My immediate take is that we've been (loosely) calling this feature "Export All" internally, and to me that phrase takes the least amount of thought to understand, but I know it's a bit complicated (since it is really "export all-that-has-already-been-downloaded"). I'll open a separate issue and hopefully we can discuss there :)

@gonzalo-bulnes
Copy link
Contributor Author

Yeah, to be honest, even the "conversation" concept is new. I can't wait until we can properly review the overall copy writing guidelines and concepts we want to expose to journalists.

@philmcmahon
Copy link
Contributor

philmcmahon commented Feb 23, 2023

Very excited about this feature! Thanks for all the hard work on it we're looking forwards to trying it out. From our side, a 'download and export all' option would be helpful I think as that's what we'll want to do most of the time, but perhaps for a future bit of work

Update: Meant to say I've tried this branch out on my qubes workstation (using the patch to export to filesystem) and very happy with it.

@rocodes
Copy link
Contributor

rocodes commented Feb 23, 2023

@philmcmahon - thanks for trying it out, that's great - and noted on the feedback front. :) If you have any other requests around the 'actions' (download, export, etc) I have opened this discussion issue to collect some feedback there, and we'll try to offer more opportunities like that once we're done this next bunch of releases.

Copy link
Contributor

@rocodes rocodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @gonzalo-bulnes! Did some extra testing, changes look great. Appreciate the extra documentation and inline comments, they're clear and helpful.

I'm approving if you wanna do the honours, but I'll hold off on merging in case you prefer to squash commits first.

@gonzalo-bulnes gonzalo-bulnes merged commit ee6da72 into main Feb 24, 2023
@gonzalo-bulnes gonzalo-bulnes deleted the export-all-downloaded-files-and-transcript branch February 24, 2023 05:58
@rocodes rocodes mentioned this pull request Mar 14, 2023
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants