Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zip to archives in File Manager #2163

Merged
merged 23 commits into from
Oct 9, 2023

Conversation

pstaabp
Copy link
Member

@pstaabp pstaabp commented Aug 8, 2023

This adds the ability to use zip/unzip for archiving/unarchiving in the File Manager.

Those on both Mac and PCs have zip/unzip ability more than a tar.gz file, so will increase the access to do handle archives.

This currently has the ability to unarchive zip files in the File Manager, but not yet archive them. I've been wondering about the UI to do this. Some of my thoughts are

  1. a course configuration to toggle between zip and tar.gz files. Then "Make Archive" will use the desired method.
  2. Have a bootstrap dropdown button https://getbootstrap.com/docs/5.2/components/dropdowns/ that toggles between the two methods.
  3. Replace "Make Archive" with two buttons, "Make ZIP archive" and "Make tar.gz archive. (not a fan of this)

@pstaabp pstaabp changed the base branch from main to develop August 8, 2023 20:46
@pstaabp pstaabp force-pushed the add-zip-to-archives branch from 8a5943c to 44ceea6 Compare August 8, 2023 20:47
Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay for now other than the repeated definition of $externalPrograms{unzip}.

Eventually we should eliminate $externalPrograms{tar} and $externalPrograms{unzip} and use Perl modules for the things they are used for (we already use the zip module and there is a good module for tar as well I believe). That would be more portable, and not have all of the possible issues with versions of these programs installed on systems that are not compatible. This should be done with many of the other $externalPrograms as well (like mv, cp, rm, and mkdir which all can be handled with either built in Perl or a module).

conf/site.conf.dist Outdated Show resolved Hide resolved
@drgrice1
Copy link
Member

drgrice1 commented Aug 9, 2023

By the way, I realize this is a draft, but the note on replacing $externalPrograms with Perl is one that has been on my mind for a while.

@pstaabp
Copy link
Member Author

pstaabp commented Aug 9, 2023

By the way, I realize this is a draft, but the note on replacing $externalPrograms with Perl is one that has been on my mind for a while.

Should we roll that into this PR or (probably) make a separate one? If a separate one, we should replace $externalProgams first, then I can add this one later.

@drgrice1
Copy link
Member

drgrice1 commented Aug 9, 2023

If you want to do that. This is okay for now though if you want to put it off.

@pstaabp pstaabp force-pushed the add-zip-to-archives branch from 44ceea6 to efd7a8e Compare August 10, 2023 12:49
@pstaabp
Copy link
Member Author

pstaabp commented Aug 10, 2023

Updated this and now has the ability to make either zip or tgz archives. Also, removed the use of $externalPrograms in favor of packages. It looks like Archive::Tar and IO::Compress::Zip are already installed on ubuntu, so didn't update the Docker files.

A few issues:

  1. I created a select to choose the archive type, but it stretches the width of the column. Changing the column width improved this, but that changes the overall layout. Thoughts on improving this would be appreciated.
  2. I can't figure out why when making an archive, it always reports 0 files added. On line 403 of FileManager.pm, including a warn $n; reports the right number, but the makeText isn't reading this right.

@drgrice1
Copy link
Member

The message that is reported when an archive is created is not on line 403 (nor 404 now). That message would be on line 378, and that maketext call has invalid maketext syntax for the quant. It is incorrect in develop and main (and has been since 2016). The syntax does not allow for spaces after commas. So 'Archive "[_1]" created successfully ([quant, _2, file])' should be 'Archive "[_1]" created successfully ([quant,_2,file])'.

@drgrice1
Copy link
Member

1. I created a select to choose the archive type, but it stretches the width of the column. Changing the column width improved this, but that changes the overall layout.  Thoughts on improving this would be appreciated.

Yeah, that isn't going to work. That looks particularly bad. I am not even sure that css styling can fix it. It is probably going to have to go somewhere else. Or most likely this is going to need to be reworked entirely to have the "Make Archive" button open a new page on which the user selects the type of archive. In my opinion this is needed anyway, rather than just blindly creating an archive file. The user should also have the option to select the archive file name, and a report should tell the user what files are going to be added to the archive.

@drgrice1
Copy link
Member

Also, you added the usage of the Archive::Extract perl module. That is not installed by default. It needs to be added to check_modules.pl and to the docker build. The Ubuntu package is libarchive-extract-perl.

@pstaabp
Copy link
Member Author

pstaabp commented Aug 10, 2023

Also, you added the usage of the Archive::Extract perl module. That is not installed by default. It needs to be added to check_modules.pl and to the docker build. The Ubuntu package is libarchive-extract-perl.

Oops. I was looking at Archive::Tar for a while, but Archive::Extract handled both zip and tar. Will update the check_modules and docker files.

@pstaabp
Copy link
Member Author

pstaabp commented Aug 10, 2023

Or most likely this is going to need to be reworked entirely to have the "Make Archive" button open a new page on which the user selects the type of archive. In my opinion this is needed anyway, rather than just blindly creating an archive file. The user should also have the option to select the archive file name, and a report should tell the user what files are going to be added to the archive.

We might be able to do this in a bootstrap dialog--otherwise I can create a new page.

@drgrice1
Copy link
Member

The FileManager.pm module has a mechanism for separate pages. It uses it for confirmations, file deletion, file view and edit, and the default view is the "refresh". So it is pretty easy to do this without creating a new module.

@pstaabp pstaabp force-pushed the add-zip-to-archives branch from efd7a8e to 7a79b63 Compare August 10, 2023 21:49
@pstaabp
Copy link
Member Author

pstaabp commented Aug 10, 2023

This adds Archive::Extract to check_modules.pl and libarchive-extract-perl to docker files.

In addition, the UI now has a secondary page for creating an archive.

@pstaabp pstaabp marked this pull request as ready for review August 10, 2023 21:49
@pstaabp pstaabp force-pushed the add-zip-to-archives branch from 7a79b63 to 925037a Compare August 10, 2023 21:51
Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some serious problems with this pull request. The chdir issue is the biggest. The fact that archives can't actually be made at all with the new page is pretty big too though.

htdocs/js/FileManager/filemanager.js Outdated Show resolved Hide resolved
htdocs/js/FileManager/filemanager.js Outdated Show resolved Hide resolved
htdocs/js/FileManager/filemanager.js Outdated Show resolved Hide resolved
htdocs/js/FileManager/filemanager.js Outdated Show resolved Hide resolved
lib/WeBWorK/ContentGenerator/Instructor/FileManager.pm Outdated Show resolved Hide resolved
@drgrice1
Copy link
Member

There are some other problems with this as well. This does not deal with symlinks or directories correctly. Your code follows symlinks treating them as directories instead. The glob that you use for directories also does not recurse into subdirectories.

@pstaabp
Copy link
Member Author

pstaabp commented Aug 16, 2023

Working on this now. I think I can use Mojo::File to handle a number of the file related issues. There's a method list_tree to get all files recursively in a directory.

@drgrice1
Copy link
Member

I have a pull request to your branch coming soon that fixes all of the issues I listed. So you might want to wait for that instead of working on it yourself.

drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Aug 16, 2023
Fix the action.  There needs to be a hidden "confirm" input.

Improve layouts.

Also other clean up suggested in my review of openwebwork#2163.

Don't use `chdir`.

The IO::Compression::Zip module is too limited.  It can not add
directories to the zip file.  Empty directories in particular should be.
So use Archive::Zip instead.  This module is already in use by webwork
as well.
@drgrice1
Copy link
Member

I added a pull request to this branch with what I have for now. There is one issue left unresolved though. That is symbolic links in zip files. That should work, but there is a permissions error when attempting to add them. I don't know why yet.

drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Aug 16, 2023
Fix the action.  There needs to be a hidden "confirm" input.

Improve layouts.

Also other clean up suggested in my review of openwebwork#2163.

Don't use `chdir`.

The IO::Compression::Zip module is too limited.  It can not add
directories to the zip file.  Empty directories in particular should be.
So use Archive::Zip instead.  This module is already in use by webwork
as well.
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Aug 16, 2023
Fix the action.  There needs to be a hidden "confirm" input.

Improve layouts.

Also other clean up suggested in my review of openwebwork#2163.

Don't use `chdir`.

The IO::Compression::Zip module is too limited.  It can not add
directories to the zip file.  Empty directories in particular should be.
So use Archive::Zip instead.  This module is already in use by webwork
as well.
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Aug 16, 2023
Fix the action.  There needs to be a hidden "confirm" input.

Improve layouts.

Also other clean up suggested in my review of openwebwork#2163.

Don't use `chdir`.

The IO::Compression::Zip module is too limited.  It can not add
directories to the zip file.  Empty directories in particular should be.
So use Archive::Zip instead.  This module is already in use by webwork
as well.
@pstaabp
Copy link
Member Author

pstaabp commented Aug 16, 2023

There are some other problems with this as well. This does not deal with symlinks or directories correctly. Your code follows symlinks treating them as directories instead. The glob that you use for directories also does not recurse into subdirectories.

Should we include symlinked files in the archives? Right now I have a working version using list_tree from Mojo::File, but it doesn't appear that it follows symlinks.

@somiaj
Copy link
Contributor

somiaj commented Aug 16, 2023

Should we include symlinked files in the archives?

Yes, not only is the Library and Contrib links used by default for the OPL, others may be using symlinks as well, and the archive won't be complete without them.

@somiaj
Copy link
Contributor

somiaj commented Sep 2, 2023

What version of zip are you testing? Or maybe I can't make a .zip to test correctly.

$ unzip -l test.zip
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2023-09-01 19:40   test5
        0  2023-09-01 19:40   test6
        0  2023-09-01 19:39   ../test2
        0  2023-09-01 19:40   ../dir2/test3
        0  2023-09-01 19:40   ../dir2/test4
        0  2023-09-01 19:39   ../../test1
---------                     -------
        0                     6 files

But when I try to extract it, it won't extract things into ../ for me, and just gives me a warning.

$ unzip test.zip
Archive:  test.zip
 extracting: test5                   
 extracting: test6                   
warning:  skipped "../" path component(s) in ../test2
 extracting: test2                   
warning:  skipped "../" path component(s) in ../dir2/test3
 extracting: dir2/test3              
warning:  skipped "../" path component(s) in ../dir2/test4
 extracting: dir2/test4              
warning:  skipped "../" path component(s) in ../../test1
 extracting: test1                   

so the version of unzip I'm using in bookworm (6.0-28) seems to already remove ../ from any paths, but then extracts it in the cwd.

I think @drgrice1 suggestion of parsing the list and only extracting the valid files is best in the long run.

@drgrice1
Copy link
Member

drgrice1 commented Sep 2, 2023

We were testing this with the Archive::Extract module and this pull request. We were not using unzip directly.

@somiaj
Copy link
Contributor

somiaj commented Sep 2, 2023

Ahh, I incorrectly assumed Archive::Extract was using unzip in the backend, and looking at the depends now I see that isn't the case. Thanks.

@Alex-Jordan
Copy link
Contributor

Parsing ahead of time would be nice because you could identify which files you might be overwriting and report that to the user.

The `Archive::Extract` module is not safe with zip files and can extract
files outside of the current working directory (assuming the server has
write permission to do so).

The `Archive::Zip` and `Archive::Tar` modules will not extract outside
of the current working directory (by default).

Additionally, using the `Archive::Zip` and `Archive::Tar` modules gives
more power for archive extraction. If the "Overwrite existing files
silenetly" checkbox is not checked, then this now checks each file in
the archive to see if extracting it will overwrite an existing file.  If
so it refuses to do so, and a message is displayed.
@drgrice1
Copy link
Member

drgrice1 commented Sep 3, 2023

@pstaabp: I added yet another pull request to this branch that fixes the issue with extracting archives with .. in the path attempting to extract outside the course directory. It switches to using Archive::Zip for zip file extraction, and Archive::Tar for tar file extraction. So Archive::Extract is not used. It furthermore does not blindly overwrite existing course files with files from the archive. You now need to check Overwrite existing files silently to extract those files from an archive. At this point this change does not prevent overwriting course.conf and simple.conf, but it would be easy to do that with this approach. I put that off for another pull request for now, since there are other things that need to be done with that.

@pstaabp
Copy link
Member Author

pstaabp commented Sep 4, 2023

Merged @drgrice1 PR and also updated the error handling. Instead of listing every existing file, it lists the number and the first filename or two.

I had an example with 30+ existing files and the previous version listed every one.

@drgrice1
Copy link
Member

drgrice1 commented Sep 4, 2023

Merged @drgrice1 PR and also updated the error handling. Instead of listing every existing file, it lists the number and the first filename or two.

I had an example with 30+ existing files and the previous version listed every one.

I don't really think you should have done that. This is an edge case that really should never happen. Archive files should never be created that do this sort of thing. If you are creating an archive that you are going to share with someone else in particular, this is considered extremely bad practice. If you have an archive that does this the message should show it all. Furthermore, you did this for zip archives, but you didn't do it for tar archives. Very inconsistent behavior. If you extract an improperly made tar archive that has files that would be extracted outside of the working directory, you still will get each file listed. Note that for tar archives this is handled by the Archive::Tar module.

Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now needs more work. All files need to be shown. Not a summary potentially showing at most 2. Those can be combined into a single alert, but none of them can be omitted. Perhaps in the extreme of a very large number this can be done, but showing only 1 or 2 does not give the user enough information.

Comment on lines 453 to 454
my (@members, @existing_files);
my $num_extracted = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @members variable defined here is never used. You still have my @members in both of the if .. elsif below that shadows this variable.

Comment on lines 491 to 497
$c->addbadmessage($c->maketext(
'There are [_1] files that already exist including [_2] and [_3]. '
. 'Check "Overwrite existing files silently" to unpack these files',
scalar(@outside_files),
$outside_files[0],
$outside_files[1]
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very bad idea. The user needs to know ALL files that would be overwritten to make the determination to check the "Overwrite ..." checkbox. This is not thought out.

Comment on lines 491 to 497
$c->addbadmessage($c->maketext(
'There are [_1] files that already exist including [_2] and [_3]. '
. 'Check "Overwrite existing files silently" to unpack these files',
scalar(@outside_files),
$outside_files[0],
$outside_files[1]
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't even the right message. These are files that would be extracted outside the working directory, and the message says these files exist.

@drgrice1
Copy link
Member

drgrice1 commented Sep 4, 2023

@pstaabp: And another pull request to this one. This pull request makes it so that the first thirty files are listed for both files outside the working directory or already existing, and if there are more than that a note is made to that effect.

@drgrice1
Copy link
Member

drgrice1 commented Sep 4, 2023

Furthermore, each of those lists are in a single alert.

drgrice1 and others added 2 commits September 5, 2023 05:32
This makes it so that the first thirty files in both lists are shown.
If there are more than thirty, then the last item in the list will say
that there are more files not shown.

To make this work a `min` method was added to Utils.pm (there was a max
method, but no min method?), and the change in openwebwork#2194 to make good and
bad messages be in a `div` instead of a `p` was added here too.
Fix messages for existing files or files outside the course directory.
@pstaabp
Copy link
Member Author

pstaabp commented Sep 6, 2023

Looks good. Merged.

Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think this is good again, and ready for another round of testing by someone else.

@somiaj
Copy link
Contributor

somiaj commented Sep 14, 2023

Finally getting around to test this. I have only been partially following the discussion here, so I may have missed somethings already discussed. My initial observations (I didn't test it in a way to try to break things with malformed archives). I only think the first one needs to be addressed. The others are just my thoughts when using the interface.

When you select a single directory the file name is just directory name (Without .zip appended) while if you select multiple files/directories the filename becomes webwork_files.zip with the .zip extension. In both cases by extension is the default, and if no extension is added .zip is used. I would suggest making this consistent, and would slightly prefer that the name webwork_files is used as it is easier to add an extension than delete one (though adding .zip to the directory name would make this consistent).

When creating archive files in sub-directories, the archive is placed in the current working directory. This is reasonable, but it also starts to scatter archives around if people archive sub-directories. Would placing all archives in a single location like an archives directory of the templates directory be reasonable? Though this may just add other issues that are less desirable. The main ones I see is it will be easier to duplicate filenames and possible issues with extracting archives.

When unpacking archives, they are unpacked into the current directory. Would adding an option to unpack them to some other directory and a confirm page be useful? I also didn't realize (until I got an error) that Overwrite existing files silently also affects unpacking archives from then unpack archive button. I also didn't see mention of this in the help for the page. Maybe add something to the help about this?

You can make archives of archives, and if someone starts to put archives in various sub directories, they may accidentally create archives of archives. Would it be reasonable to exclude archives from archives? Or have an option to exclude them checked by default? The archive page could also just not select archives by default, but list them, and maybe give the user a message that archives were found and are not selected by default. Similar issue would happen with the Archive Course functionality as archives including archives.

When no files are selected the Make Archive button is greyed out. Since you have to go to another page to select what is finally archived, would having this default to archiving the current directory be reasonable? Or maybe open up the subpage with all files in the current directory listed, but not selected? Current functionality is fine by me, but I might have missed the reasoning of why it is greyed out, because to me it could just bring one to the next page even if nothing is selected.

@somiaj
Copy link
Contributor

somiaj commented Sep 14, 2023

One more thing, since the directory name doesn't append .zip by default, I noticed that if there is no extension, selecting either zip or tar won't add the correct extension, only change it if it was already .zip or .tgz. Similar if someone puts a bad extension, like .foo, select zip or tar it won't fix or add an appropriate extension (though maybe due to the issue of what to do with a bad extension like .foo, the javascript will only change valid extensions)?

I guess this makes it more reasonable to add .zip to the directory name by default than my suggestion above of removing .zip from webwork_files.zip.

@drgrice1
Copy link
Member

When you select a single directory the file name is just directory name (Without .zip appended) while if you select multiple files/directories the filename becomes webwork_files.zip with the .zip extension. In both cases by extension is the default, and if no extension is added .zip is used. I would suggest making this consistent, and would slightly prefer that the name webwork_files is used as it is easier to add an extension than delete one (though adding .zip to the directory name would make this consistent).

The .zip extension should be there in all cases by default. I will look into why it is not added for a directory. Although it is easier to add an extension than delete one, it is not as clear to the user what is happening.

When creating archive files in sub-directories, the archive is placed in the current working directory. This is reasonable, but it also starts to scatter archives around if people archive sub-directories. Would placing all archives in a single location like an archives directory of the templates directory be reasonable? Though this may just add other issues that are less desirable. The main ones I see is it will be easier to duplicate filenames and possible issues with extracting archives.

This is not feasible or even really realistic. The current working directory approach is the most reasonable way to do this. Placing the file in an archives directory doesn't prevent the instructor from moving it elsewhere. An archive file can be uploaded into any location as well. We don't need another directory that is needed in the course directory structure in any case.

When unpacking archives, they are unpacked into the current directory. Would adding an option to unpack them to some other directory and a confirm page be useful? I also didn't realize (until I got an error) that Overwrite existing files silently also affects unpacking archives from then unpack archive button. I also didn't see mention of this in the help for the page. Maybe add something to the help about this?

Unpacking to the current working directory is good enough for now. It is the way it currently works, and is all that is really needed at this point. You can add this as a feature request for future development if you like. A note on where archives are extracted could be added to the help. Using Overwrite existing files silently when unpacking archives is new with this pull request. Previously files were blindly overwritten. A comment on this could be added to the help, but the behavior is straightforward, and instructions are given when a file in an archive would be overwritten.

You can make archives of archives, and if someone starts to put archives in various sub directories, they may accidentally create archives of archives. Would it be reasonable to exclude archives from archives? Or have an option to exclude them checked by default? The archive page could also just not select archives by default, but list them, and maybe give the user a message that archives were found and are not selected by default. Similar issue would happen with the Archive Course functionality as archives including archives.

Archives can be excluded by un-selecting them on the make archive page. Note that the previous behavior was to blindly add all files in sub-directories.

When no files are selected the Make Archive button is greyed out. Since you have to go to another page to select what is finally archived, would having this default to archiving the current directory be reasonable? Or maybe open up the subpage with all files in the current directory listed, but not selected? Current functionality is fine by me, but I might have missed the reasoning of why it is greyed out, because to me it could just bring one to the next page even if nothing is selected.

The current design makes more sense and is consistent with how the other buttons work on the page. You can quite easily select all files in the current working directory if you want all files in the archive. The point of the second page is to allow you to see what you have selected (including all files in subdirectories) and allow you to remove files (particularly from subdirectories) you don't want in the archive. I don't see your confusion on the greying out to be frank.

@drgrice1
Copy link
Member

@pstaabp: I added another pull request that makes sure the .zip extension is always there by default, even if a file or directory is selected that does not have an extension to begin with.

@drgrice1
Copy link
Member

@somiaj: I want to clarify that I think some of the things you suggest are reasonable, but I think they are more aggressive than is needed for this pull request. This pull request started merely to add the feature of zip archive extraction and creation. I asked @pstaabp to add the new archive sub page to what he had initially, and since then several other things were requested. I think that this pull request has gone far enough, and needs to be merged. We can then add other features and such. I already have some other changes, and am tired of making pull requests to this one.

@Alex-Jordan
Copy link
Contributor

Would placing all archives in a single location like an archives directory of the templates directory be reasonable?

I think an issue with this would be that if I'm in one working directory and make a generic webwork_files.zip archive, later in the proposed central archives directory I don't really see where the archive originated. If some time passes I'm not going to know. And then this would be compounded if I were in some other working directory and made another generic webwork_files.zip archive that I suppose would be named webwork_files2.zip or something.

I will try to test this later today or tomorrow.

Add the .zip extension when a single file or directory without extension is selected.
@Alex-Jordan
Copy link
Contributor

I wanted to test this, but I'm just swamped with day job stuff now. If @pstaabp, @drgrice1, and @somiaj all approve it as it is, please don't wait on me.

@drgrice1 drgrice1 merged commit ad1c5ba into openwebwork:develop Oct 9, 2023
1 check passed
@pstaabp pstaabp deleted the add-zip-to-archives branch October 9, 2023 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants