Problems with new ISISDATA downloadIsisData Utility #5024
Comments
@KrisBecker I caught these over the weekend as well. I fixed most of them and can get a PR in a little bit. I think they got through with the haphazard way that PR got merged during a release. |
@KrisBecker Do you want to try redownloading using the instructions at the bottom of dev's readme? |
@Kelvinrr I downloaded the script and the rclone.conf files as described at the bottom of dev/README.md. There are a few issues that remain. (Note I am using Python 3.9.13.) There appear to be two sources for each dataset to download kernels from: one from the mission SPICE archive source (sometimes NAIF) and then the USGS ISIS data. When running the script for a particular mission, say
More concerning is that when the
Note two critical directories (spk, lsk) are completely removed! And comparing the sizes of the old download and new:
Some mission cases are extreme. Take, for example, OREX. Its SPICE archive is 397GB and the final install is 8.7GB, so the archive download is more than 45 times the size of the final install! Also, many of the references to mission SPICE archives fail to download at all for some reason. Here is the log for mariner10 data:
Here are the rclone.conf entries for URLs and then mariner10:
Note the directory on the NAIF public kernel server, Finally, in the verbose output it would be nice to log the URL/location of the download. |
@KrisBecker The data being deleted is definitely a bug. The public data should be much larger than the USGS data, which is why the result ends up being really small when one gets clobbered by the other. The intent would be to sync a union of the two. It should be an easy fix. You can add two v's in order to get debug output with the URLs:
I'll add a help string to include the verbose levels. As for some of the mariner stuff not downloading, looking at the output it looks like it did copy the files over? It's only 1.8MB of data, so if it's already there the second pass might not download anything. I think right now there is some redundancy between the USGS data source and the public source that needs to be cleaned out. So if the public source is downloaded first, you don't really download anything on the second pass of downloading from USGS. |
@Kelvinrr please do not do a union of the NAIF SPICE archive and ISIS data. This creates an absolutely unnecessary burden on users. If anything, an intersection of the two would be sufficient. But even downloading the SPICE archive at this point does not add anything that is not already in the ISIS data. The kernel maintenance scripts are designed to download from the SPICE archive any kernel version updates but only the required ones. And you do not want to just update any of the kernels without considering the impact of that action. For example, you would not want to simply install a PCK or IK kernel without evaluating the impact on geometry. As I pointed out, the OREX archive is huge - nearly 400GB. It is unreasonable to burden users with accommodating the disk space required for the entire archive when only 9GB is required to support the mission in the ISIS environment. |
@KrisBecker I think downloading everything, using the kerneldb files to control what ISIS actually uses when spiceiniting, and leaving the rest of the kernels available is fine, as test failures right now seem to come mostly from things missing, not from things being there that shouldn't be. I see your point about unused kernels, so I don't think it's unreasonable to create whitelists for kernels based on the kerneldb files so as not to bloat things more than they need to be, or to disable non-USGS sources for active missions.
Edit: After thinking about it more, my first question would be: what are the 400GB of OREX data doing in these stores if most of them are not useful? Is there a way OREX can partition these kernels so that it is easier to download only what is needed, without us maintaining some kind of whitelist instead? |
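The whitelist idea can be sketched as a plain set operation. This is only an illustration, not how downloadIsisData works: `plan_download`, the file names, and the sizes are all made up, and the `required` set stands in for whatever list of kernel names would be extracted from the kerneldb files.

```python
def plan_download(archive, required):
    """Split a mission archive listing into kernels to fetch and kernels not found.

    archive:  mapping of relative kernel path -> size (hypothetical listing)
    required: set of relative kernel paths ISIS needs (hypothetical whitelist)
    """
    # Intersection: only archive files named by the whitelist get downloaded.
    selected = {path: size for path, size in archive.items() if path in required}
    # Whitelisted kernels the archive lacks must come from another source.
    missing = sorted(required - set(archive))
    return selected, missing


# Made-up listing; sizes are arbitrary units, not real mission numbers.
archive = {
    "spk/full_reconstruction.bsp": 300,
    "spk/mission_subset.bsp": 90,
    "ck/daily_0001.bc": 9,
    "lsk/leapseconds.tls": 1,
}
required = {"spk/mission_subset.bsp", "lsk/leapseconds.tls", "iak/addendum.ti"}

selected, missing = plan_download(archive, required)
print("full archive size:", sum(archive.values()))
print("whitelisted size: ", sum(selected.values()))
print("not in archive:   ", missing)
```

The `missing` list is the part a second source (for example the USGS store) would still have to supply, which is where the two-source design discussed above comes in.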
I recommend you disable the SPICE archive download for all missions. Then add a parameter, e.g., For example, there are many missions, e.g., one being rosetta, that have a |
No - generally applicable to all public SPICE kernel archives produced by missions (not just instrument teams). |
To be really explicit here: USGS is no longer able to be the repository of record for these SPICE kernels for the community. We will continue to be the repository of record for those kernels which we produce (e.g., smithed THEMIS kernels). For all other kernels, we will be providing a mechanism to download kernels from a publicly available source. If that source is making 400GB of kernels available to the community with no mechanism to provide only those kernels which the community finds most useful (in whatever context, whether ISIS or otherwise) a multitude of options exist including, but not limited to:
As stated elsewhere, starting in early November, we are no longer serving a curated subset of kernels due to a number of policy and data release requirements. We are providing the download scripts as a means for users to access kernels from their repositories of record. We will continue to serve supplemental elements that are needed, e.g., IAKs as they are a component of ISIS and not a product generated by a team. |
Can you clarify what the contents of the USGS kernel sources will be after USGS discontinues hosting kernels? Comments in this thread and #5026 indicate that at the end of November 2022, the complete content of all ISISDATA areas will be scrubbed of all SPICE kernels. It appears that the only files that will exist on the AWS servers will be the configuration files and all kernels will come from archive or mission sources. I am asking this question because current testing indicates the presence of SPICE kernels in the USGS AWS sources. Will they still exist after USGS discontinues hosting kernels? |
@KrisBecker The short answer is that the USGS stores will only have things that are not in other hosted areas (naif, ESA, etc.). So it'll still have kernels that we publish (e.g. smithed kernels) and stuff that's difficult to get elsewhere that it's easier to just host (some random kernels and data here and there). |
I have just completed the download of all ISISDATA. Can you confirm the total size is 1.9TB? |
I have a preliminary version of an rclone filter for ISISDATA that greatly reduces the current download size. I will stress this is preliminary and should be used with caution, particularly in an active mission or research situation. That said, it would be good to get some testing if there is an interest in incorporating this as part of the USGS instruction set. The custom rclone filter file (via Gist) isisdata_rclone_filter_from.lis can be provided as the optional The rclone filtering documentation is helpful. Here are some examples taken from the file isisdata_rclone_filter_from.lis:
|
Thank you for your contribution! Unfortunately, this issue hasn't received much attention lately, so it is labeled as 'stale.' If no additional action is taken, this issue will be automatically closed in 180 days. |
The actual filtering (blacklisting/whitelisting) of the data area being downloaded is now handled by the --include/--exclude and --filter flags, and that aspect of this discussion seems to be resolved. The size of the data area looks to be correct as well from my testing. I would direct future conversation on filtering toward issues dealing more directly with that feature, e.g. #5264 (kernels being excluded), which seems to be more a misunderstanding of how the rclone filtering actually works than an actual issue with downloadIsisData. |
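One documented rclone behavior that is easy to trip over is that filter rules are evaluated in order, the first matching rule decides, and a path that matches no rule is included. Whether that is what happened in #5264 is not established here. The sketch below is a simplified Python model of that ordering, not rclone itself: it handles only `*`, `**`, `?` and a leading `/`, and the rule lists and paths are invented for illustration.

```python
import re


def glob_to_regex(pattern):
    """Translate a small subset of rclone-style globs into a compiled regex."""
    anchored = pattern.startswith("/")
    body = pattern.lstrip("/")
    parts = []
    i = 0
    while i < len(body):
        if body.startswith("**", i):
            parts.append(".*")       # '**' may cross directory separators
            i += 2
        elif body[i] == "*":
            parts.append("[^/]*")    # '*' stays within one path segment
            i += 1
        elif body[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(body[i]))
            i += 1
    # Unanchored patterns match at the end of the path, on a separator boundary.
    prefix = "^" if anchored else "(^|/)"
    return re.compile(prefix + "".join(parts) + "$")


def is_included(rules, path):
    """Apply '+ pattern' / '- pattern' rules in order; the first match wins."""
    for rule in rules:
        action, pattern = rule.split(" ", 1)
        if glob_to_regex(pattern).search(path):
            return action == "+"
    return True  # no rule matched: the path is included


# Hypothetical rule sets that differ only in order.
keep_lsk_first = ["+ kernels/lsk/**", "- kernels/**"]
exclude_first = ["- kernels/**", "+ kernels/lsk/**"]

for rules in (keep_lsk_first, exclude_first):
    print(rules, is_included(rules, "kernels/lsk/leapseconds.tls"))
```

With the exclude rule listed first, the later include rule is never reached for anything under `kernels/`, so swapping two lines in a filter file can silently drop a whole kernel directory.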
ISIS version(s) affected: 7.1.0_RC1
Description
I have encountered what appears to be a bug when using the new ISISDATA data download utility, downloadIsisData.
Downloading/Updating All ISISDATA
When running the recommended command to update all ISISDATA directories, the following error is produced:
Indications are that ALL is not a supported mission.
Documentation Issues
The utility documentation is inconsistent with ISIS documentation and provides an invalid example.
When running the example, it produces the following:
It looks as though the ./data/tgo should just be ./data.
And the example uses the command with python invocation and a .py extension, which is not a viable example given the installation of the script.
Script Installation
The installation of the utility is inconsistent with runtime scenarios. For example, the CMAKE installation of downloadIsisData is in the CMAKE_INSTALL_PREFIX directory, but the find_conf() Python function looks for it in $CONDA_PREFIX. Perhaps using $ISISROOT would be indicated here for consistency.
This may not be directly related, but I'll mention it anyway. The installation of the $ISISROOT/scripts directory does not preserve permissions as set in the ISIS source tree:
In the installation directory, which happens to be $CONDA_PREFIX:
Maybe this is intentional.
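To make the $ISISROOT suggestion concrete, here is a minimal sketch of a lookup that prefers $ISISROOT and falls back to $CONDA_PREFIX. It is not the actual find_conf() implementation: `locate_conf` is a hypothetical name, and the relative path `etc/isis/rclone.conf` is an assumption about where the file would be installed.

```python
import os
from pathlib import Path


def locate_conf(env, rel_path="etc/isis/rclone.conf"):
    """Return the first existing config file under ISISROOT, then CONDA_PREFIX.

    env:      mapping of environment variables (pass os.environ in real use)
    rel_path: assumed location of the config below the install prefix
    """
    for var in ("ISISROOT", "CONDA_PREFIX"):
        root = env.get(var)
        if not root:
            continue  # variable unset or empty: try the next prefix
        candidate = Path(root) / rel_path
        if candidate.is_file():
            return candidate
    return None  # caller decides how to report a missing config


if __name__ == "__main__":
    conf = locate_conf(os.environ)
    print(conf if conf else "rclone.conf not found under ISISROOT or CONDA_PREFIX")
```

Checking both prefixes keeps conda installs working while also covering a CMake install whose prefix is exported as $ISISROOT.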
How to reproduce
Possible Solution
Additional context