-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add action to (re-)generate and export conversation transcript and downloaded files #1627
Add action to (re-)generate and export conversation transcript and downloaded files #1627
Conversation
@rocodes @cfm I realized that in the
Now, for "historical" reasons (not old history, but still) the dialog that exports one file was called That small renaming of course touched a gazillion of files. I kept the two commits that rename files separate, and we can pull them to a separate PR if you prefer. I don't have any preference myself, but that's what explains that more files were touched than in #1624. Otherwise, the only significant differences with that PR are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't request changes from myself, yet:
- disambiguation is required (e.g. two files are called
memo.txt
- we discussed it previously, when thinking about what files names to print in the transcript (that ideally would match the file structure when the conversation is exported)
So, my current thinking is:
- Ensure the files are stored in the archive in the same way they are stored in the
.securedrop-client/data
directory - Adjust the transcript template accordingly
- Discuss if we want to also use that structure when:
- exporting a single file
- exporting the conversation transcript only
☝️ I think we should, so that I can export the transcript, change my mind, and export the entire conversation without that resulting in mutliple copies of the same file on my USB drive.
de182c8
to
85a2f64
Compare
85a2f64
to
a87f7ae
Compare
@rocodes @cfm Can you take a look at this branch as it is now please? If we're happy with the behavior, we can talk about the implementation (I expect comments!) and adjust the transcript accordingly. (But I'd rather we agree on the behavior we want before moving forward with the transcript and eventual backports of the structure —see previous comment.) |
Chiming in as you requested in #1627 (review), @gonzalo-bulnes: I like the disambiguation you've implemented in a87f7ae. I think it could benefit from some inline documentation of why it's doing what it's doing. But that need would actually be lessened by generalizing this behavior as you suggest to all exports, which would also make export behave more consistently: different export operations (transcript, one file, multiple files) will always yield consistent (if consistently clunky :-) archive structures. @rocodes, I'll defer to you otherwise, but feel free to tag me back in! |
I'm just finishing testing out your changes (and therefore my review), but I'll reply first on the comment you made above. One point to note about this comment:
Currently, every time the user runs an export, it exports to a new folder on the user's USB drive (the folder name is a |
🤦♀️ That is true @rocodes! Thank you so much. I keep forgetting about that wrapping... meaning I probably don't spend enough time with actual export devices. 😬 That I think answers the part of the checkboxes you are confused about. I was laying out questions:
① Do we want to modify the structure that's exported when exporting a single file so that I can export one file, then an entire conversation and avoid confusion? I think both questions' answers are the same once we take your observation into account: there will be no overwrites occurring during multiple exports, because each export is contained in its own directory. We don't need to modify how the existing export functions work, and I believe the answer to the questions is that we shouldn't modify the existing export functions because that would be more trouble that help at this time. 👈 Decision (?) What do you think @rocodes @cfm? |
If we agree on not modifying the existing export features, then the next step is to:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gonzalo-bulnes, thank you for preparing this :)
Are you able to successfully export to USB using this branch? Testing (STR: staging setup, check out your branch in sd-app), I'm not able to successfully export a conversation transcript + files. The client shows an UNEXPECTED_RETURN_STATUS
error, and in sd-devices, the sd-export-$timestamp directory is created on the target USB, but the directory is empty, and in the logs I can see ERROR_USB_WRITE. This problem does not occur when Export Conversation doesn't contain any downloaded submissions (i.e. when n files == 1).
The sd-devices code naively copies everything in export_data
via cp -r
, so it shouldn't barf with more than one file to copy as long as they are all in the export_data directory. I think there's something funny about how the export call is happening, because it looks like the drive is closed and locked, and you can see there is an export error (a write error) during the disk_check routine, which is fishy.
logs
2023-02-15 16:27:42,657 - securedrop_export.archive:38(validate) INFO: Parsing archive metadata
2023-02-15 16:27:42,658 - securedrop_export.archive:43(validate) INFO: Target: disk, encryption_method luks
2023-02-15 16:27:42,658 - securedrop_export.archive:55(validate) DEBUG: Validate export action
2023-02-15 16:27:42,658 - securedrop_export.main:63(entrypoint) INFO: Archive extraction and metadata validation successful
2023-02-15 16:27:42,658 - securedrop_export.main:69(entrypoint) INFO: Start disk service
2023-02-15 16:27:42,658 - securedrop_export.disk.legacy_service:77(export) INFO: Export archive is disk
2023-02-15 16:27:42,659 - securedrop_export.disk.cli:34(get_connected_devices) INFO: Checking connected volumes
2023-02-15 16:27:42,662 - securedrop_export.disk.cli:61(_get_removable_devices) INFO: Checking removable devices
2023-02-15 16:27:42,669 - securedrop_export.disk.cli:81(_get_removable_devices) INFO: 1 connected
2023-02-15 16:27:42,670 - securedrop_export.disk.cli:121(_check_partitions) DEBUG: Checking device partitions on /dev/sda
2023-02-15 16:27:42,673 - securedrop_export.disk.cli:98(get_partitioned_device) DEBUG: Counted 0 partitions
2023-02-15 16:27:42,673 - securedrop_export.disk.legacy_service:86(export) INFO: Check if LUKS
2023-02-15 16:27:42,673 - securedrop_export.disk.cli:139(is_luks_volume) DEBUG: Checking if target device is luks encrypted
2023-02-15 16:27:42,688 - securedrop_export.disk.cli:159(_get_luks_name_from_headers) DEBUG: Get LUKS name from headers
2023-02-15 16:27:42,714 - securedrop_export.disk.cli:196(get_luks_volume) DEBUG: Mapped name is luks-7d7f1303-b41f-49ad-af2a-e5a5dce9a450
2023-02-15 16:27:42,715 - securedrop_export.disk.cli:264(_get_mountpoint) DEBUG: Checking mountpoint
2023-02-15 16:27:42,720 - securedrop_export.disk.cli:291(mount_volume) INFO: The device is already mounted
2023-02-15 16:27:42,721 - securedrop_export.disk.cli:293(mount_volume) WARNING: Mountpoint was inaccurate, updating
2023-02-15 16:27:42,721 - securedrop_export.disk.legacy_service:89(export) INFO: Check if writable
2023-02-15 16:27:42,722 - securedrop_export.disk.legacy_service:97(export) INFO: Export submission to /media/user/transfer
2023-02-15 16:27:42,727 - securedrop_export.disk.cli:348(write_data_to_device) DEBUG: Copying file to sd-export-20230215-162742
2023-02-15 16:27:42,729 - securedrop_export.disk.cli:369(cleanup_drive_and_tmpdir) DEBUG: Syncing filesystems
2023-02-15 16:27:42,775 - securedrop_export.disk.cli:386(_unmount_volume) DEBUG: Unmounting drive from /media/user/transfer
2023-02-15 16:27:42,825 - securedrop_export.disk.cli:404(_close_luks_volume) DEBUG: Locking luks volume <securedrop_export.disk.volume.Volume object at 0x7a8b0e192f70>
2023-02-15 16:27:42,888 - securedrop_export.disk.cli:418(_remove_temp_directory) DEBUG: Deleting temporary directory /tmp/tmpn7c77_go
2023-02-15 16:27:42,894 - securedrop_export.disk.legacy_service:115(export) ERROR: Error encountered during disk format check: ERROR_EXPORT
2023-02-15 16:27:42,894 - securedrop_export.disk.legacy_service:131(_legacy_status) INFO: Convert to legacy: ERROR_EXPORT
2023-02-15 16:27:42,895 - securedrop_export.main:73(entrypoint) ERROR: Encountered exception ERROR_USB_WRITE, exiting
2023-02-15 16:27:42,895 - securedrop_export.main:74(entrypoint) ERROR:
2023-02-15 16:27:42,895 - securedrop_export.main:154(_exit_gracefully) INFO: Exit gracefully with status: ERROR_USB_WRITE
2023-02-15 16:27:42,895 - securedrop_export.main:178(_write_status) INFO: Write status ERROR_USB_WRITE
The other topic I wanted to put to you and/or the group is about the labels on the QAction menu. I find the menu a bit confusing now in that I find it hard to distinguish between "Export Conversation" and "Export Conversation Transcript." (I'm attaching a screenshot so that others less familiar can take a quick peek). I think we might be able to make those actions a bit more distinct with a slight adjustment to the label names, but I would like to know what others think. I will also investigate the export error a little more in case I see any obvious reasons why it isn't working.
securedrop_client/export.py
Outdated
@@ -184,8 +184,9 @@ def _create_archive( | |||
with tarfile.open(archive_path, "w:gz") as archive: | |||
cls._add_virtual_file_to_archive(archive, cls.METADATA_FN, metadata) | |||
|
|||
needs_disambiguation = len(filepaths) > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think disambiguate
/needs_disambiguation
is a little confusing without comments. I suggest maybe a naming convention like is_multifile_export
instead of needs_disambiguation
, and then a comment in the _add_file_to_archive method about what that parameter does, or just adding comments for clarity if you prefer to keep the names untouched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edit: having read your comment above, the is_multifile_export
name may be a slightly inaccurate choice, but I'm still in favour of naming for what it is/does (is_name_collision), vs what we think of it (needs_disambiguation), and/or adding a comment explaining how we resolve this once we decide on the correct resolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of an imperative: prevent_name_collisions: bool
? @rocodes
(We don't know name collisions will happen if multiple files are being exported, we only know they're likely.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know what I'm gonna say...whatever you decide is cool as long as you document it :) Stylistically, based on what you said I'd go with an is_possible_name_collision
, but I really don't feel strongly as long as someone new reading the code understands quickly what's happening, here and in add-file-to-archive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the variable to be close to the condition (is_one_of_multiple_files
) and added a comment explaining why that's relevant.
I renamed the disambiguate
option to prevent_name_collisions
because the method is called add_file_to_archive
and that seemed like a reasonable name for an option.
Short answer: I don't know! 🙈 This is a huge problem, and I really didn't expect it, so I was very cool with deferring the end-to-end USB drive testing for later in the review process. I'll look into it as well, please keep me posted on your findings 👁️. I'll first try to rule out any issue in |
@gonzalo-bulnes I am winding down for the day, but I do think the culprit is the directory structure. Added some debug lines to the client and rebuilt, and I see the following happening in the
|
The archive MUST contain a directory called `/export_data`. That is part of the sd-devices API and was my mistake. Thanks @rocodes for narrowing the issue to te archive structure!
5f9338d
to
24f37a7
Compare
@rocodes I'd personally drop "Export Conversation Transcript". If that menu was less of a mixed bag, I probably wouldn't, but as it is it seems to me that reducing the number of actions would be beneficial. Exporting the conversation transcript was a stepping stone to exporting the entire conversation and I'd remove it until we hear that it is a desirable feature in itself. (We're working on assumptions, so I would personally not let that play in favor of worse UI.) Edit: FWIW, I'd be comfortable leaving that for a separate PR so that we can touch base with @tina-ux! |
Thanks for the quick fix @gonzalo-bulnes - with those changes I'm able to successfully export files + conversation. I agree with you on not wanting to make changes without @tina-ux and further discussion. My immediate take is that we've been (loosely) calling this feature "Export All" internally, and to me that phrase takes the least amount of thought to understand, but I know it's a bit complicated (since it is really "export all-that-has-already-been-downloaded"). I'll open a separate issue and hopefully we can discuss there :) |
Yeah, to be honest, even the "conversation" concept is new. I can't wait until we can properly review the overall copy writing guidelines and concepts we want to expose to journalists. |
Very excited about this feature! Thanks for all the hard work on it we're looking forwards to trying it out. From our side, a 'download and export all' option would be helpful I think as that's what we'll want to do most of the time, but perhaps for a future bit of work Update: Meant to say I've tried this branch out on my qubes workstation (using the patch to export to filesystem) and very happy with it. |
@philmcmahon - thanks for trying it out, that's great - and noted on the feedback front. :) If you have any other requests around the 'actions' (download, export, etc) I have opened this discussion issue to collect some feedback there, and we'll try to offer more opportunities like that once we're done this next bunch of releases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gonzalo-bulnes! Did some extra testing, changes look great. Appreciate the extra documentation and inline comments, they're clear and helpful.
I'm approving if you wanna do the honours, but I'll hold off on merging in case you prefer to squash commits first.
Description
The shortest path to providing an option to export a entire conversation to journalists.
Entire conversation = freshly (re-)generated conversation transcript + all the downloaded files [1]
[1] On one hand, there is an option to download all files, which is easy to trigger. On the other hand, if a file was intentionally not downloaded, conditioning the export to an automatic download seems counter-productive.
Context
This is a direct continuation of #1624, the same context applies.
Implementation
export.Dialog
toexport.FileDialog
because that's what it is, and there is a parallel with theexport.PrintFileDialog
. breaking the symmetry would become confusing I think.conversation/export.Dialog
name to export the entire conversation.Test plan
4 files
if three files were downloaded (the transcript is the fourth file)(This patch allows to test this PR in a lightweight way. That being said, a proper export to a USB device is desirable.)