
isis.dist - rsync data - shutting down 11.30.22 #5026

Closed · jlaura opened this issue Aug 9, 2022 · 26 comments

Labels: inactive (Issue that has been inactive for at least 6 months)

Comments

@jlaura (Collaborator) commented Aug 9, 2022

The servers running at isisdist, which the ISIS data are currently downloaded from, are scheduled to be turned off on 11.30.22.

This issue is to discuss anything related to that server being turned off. The new download scripts for ISIS data are available in the 7.1.0_RC1 release.
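
For anyone switching over, the new script is invoked roughly like this (the mission name and destination path are just examples; see the 7.1.0_RC1 documentation for the exact options):

```sh
# Download a single mission's data area into $ISISDATA
downloadIsisData mro $ISISDATA

# Or pull everything at once
downloadIsisData all $ISISDATA
```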

@KrisBecker (Contributor)

Are there procedures/instructions for adding external contributions of a new mission's (instrument) SPICE data using this new process?

@cfassett

Does the new download procedure include a "skip kernels" option? Not seeing one...

@jessemapel (Contributor)

@cfassett I don't think so but that would be a good addition

@jlaura (Collaborator, Author) commented Aug 23, 2022

@KrisBecker I do not believe instructions are up yet for modifying the download script to point at new locations for kernels. Looking at rclone.conf, adding a new kernel source looks to be as simple as adding an entry to that file. That file makes use of aliases, so the naif entry is then referenced as naif: when pulling kernels over http. @Kelvinrr is that correct?

I want to note too that, with this transition, the ASC is pointing at the repository of record for kernels, e.g., a mission team's repo of their kernels or the NAIF hosted archive. We will continue to serve the kernels that we generate (e.g., smithed kernels made from controlled products), but all other kernels will be coming from their original sources.

Hopefully that answers your question?

@Kelvinrr (Collaborator)

@jlaura That's correct. You just need to append an entry with the URL to the config. I can write those docs once I get a few other things off my plate this week.
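
Roughly, adding a new source amounts to appending something like this to the conf (the remote names and URL below are only illustrative, not entries from the shipped file):

```sh
# Append an http source plus a short alias for it (illustrative names/URL)
cat >> rclone.conf <<'EOF'

[newmission_src]
type = http
url = https://example.org/newmission/spice/

[newmission]
type = alias
remote = newmission_src:kernels
EOF
```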

@cfassett we can add an ignore-kernels option if you want to open an issue describing exactly what you want. The workaround for now is to use rclone directly, similar to how one would use rsync (see the sketch below). That config file is really just an rclone conf describing the endpoints for the different sources, and the script is mostly a convenience for downloading everything at once.
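
A minimal sketch of that workaround, assuming the shipped rclone.conf defines a remote for the source you want (the remote name and paths here are placeholders):

```sh
# Mirror one kernels area into a local ISIS data directory,
# much like a nightly rsync cron job would have done.
rclone sync --config rclone.conf naif:MRO/kernels $ISISDATA/mro/kernels
```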

@jessemapel (Contributor)

rclone appears to have a similar option to the rsync --exclude option:

https://rclone.org/filtering/
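
For example, something like this should skip the CK kernels during a sync (the pattern and remote name are just illustrative):

```sh
# --exclude takes rclone filter patterns, roughly analogous to rsync --exclude
rclone sync --config rclone.conf naif:MRO/kernels $ISISDATA/mro/kernels \
  --exclude "ck/**"
```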

@KrisBecker (Contributor)

@jlaura after working with the download script and config in #5024, I have concerns.

There is no apparent way to specify select files or patterns of kernel file names to minimize the download size. Hopefully that can be added/worked out.

ASC ISIS data directories contain kernel DB config files, IAK kernels, scripts to create/update the kernel DB files, and calibration data. It may be difficult to provide those in mission SPICE archives and keep them valid, particularly for an active mission, unless the ASC works with missions to host and maintain these data as they do contributed code.

Some older/completed missions update their SPICE data periodically, which will very likely change ISIS app/unit test results. Using an external source for SPICE data introduces uncertainty, which may cause conflicts and introduce undesirable effects for users and developers. This is particularly true for PCKs and other configurations that change geometry.

SPICE kernel management/configuration is generally not trivial for many missions, and some require a good bit of specialization.

How will the ASC SPICE web server work under these conditions?

@jlaura (Collaborator, Author) commented Aug 25, 2022

@KrisBecker Thanks for raising these concerns.

Missions will in fact have to work with the ASC in order to keep the generation of ancillary files (IAKs, DB configs, etc.) in sync. The mission is the repository of record for their SPICE (until they deliver to NAIF). In order for the SPICE to be usable in ISIS, we will need coordination.

When SPICE data updates, tests need to update. This is neither new nor novel. This is a major reason to maximize the disconnect between live mission data and unit/app tests; that way, only module tests will be impacted. We are working to minimize the impact there as well (e.g., see the recent TGO CaSSIS module test update PRs). If you would like to be involved in helping update those tests, just let me know and we can help you start opening PRs (jlaura@usgs.gov).

I absolutely agree that SPICE management is not trivial. The ASC has been, and continues to be, available to provide support (not direct hosting) to mission teams. I invite any mission team (in planning, active, or decommissioned) to reach out and discuss how we might be able to support their SPICE needs.

The ASC SPICE server is a service that we maintain and will continue to maintain. It is not a service that is community maintained at this time.

@KrisBecker (Contributor) commented Aug 25, 2022

I have a suggestion that may make this easier.

Create in each mission directory a $ISISROOT/src/<mission>/config/kernels directory that contains the SPICE installation details. We put an rclone.conf (or whatever; I prefer a JSON config) in this directory that contains all the information necessary to install a full-up mission SPICE data area. This would include all those things I mentioned previously, such as the (mostly) static kernel DB files (including the IAK) and scripts to create and maintain the dynamic SPICE kernel DBs for the mission's $ISISDATA infrastructure. This binds the SPICE install to the code.

We can then expand this concept to include $ISISROOT/src/<mission>/tests for the contents of ./apps and ./objs.
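
A rough sketch of the layout I have in mind (the file names below are just placeholders):

```
$ISISROOT/src/<mission>/
  config/
    kernels/
      kernels.json (or rclone.conf)  <- kernel sources and install details
      *.db, IAK                      <- (mostly) static kernel DB files
      makedb scripts                 <- build/maintain the dynamic CK/SPK DBs
  tests/
    apps/
    objs/
```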

And finally, design and develop an ISIS ABI compatible interface (opaque pointers?) to Camera and the ingestion and calibration applications that can be installed and configured in any ISIS system version.

This would allow us to develop individual mission packages that can be managed as ISIS dependencies.

If we do it well, this can also be made to support the CSM and other interfaces such as a robust Python callable system.

This would be a really nice system for the ISIS LTS as well since the mission package can evolve independent of ISIS policy.

@jlaura (Collaborator, Author) commented Aug 25, 2022

@KrisBecker Sounds like a great idea to post as an RFC or an enhancement so that others can discuss it outside of this announcement about the rsync server shutting down.

@Kelvinrr (Collaborator)

@cfassett #5042 should add the ability to pass in arbitrary rclone flags to append to the standard flags. This includes filter flags and such.
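
Assuming it works the way #5042 describes, the extra flags simply get forwarded to rclone, so something like this should be possible (a hypothetical invocation, not documentation):

```sh
# Hypothetical: extra rclone flags appended after the usual arguments
downloadIsisData mro $ISISDATA --exclude "ck/**"
```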

@KrisBecker (Contributor)

How will the ISIS kernel DB files be maintained, mainly for CK and SPK kernel data sets, as some kernel sources may be updated?

@lwellerastro (Contributor)

@jlaura, @jessemapel, will Astro internal users be affected by the isisdist server being shut down, or is this purely an external thing? I'm in the dark in regard to these changes and thought it was best to ask. Thanks.

@jessemapel (Contributor)

This shouldn't have an impact on internal folks. We will still have the data on systems in the building. We're also working on mountable cloud buckets and systems that have an "internal" setup of the data area for us to use.

@kbowley-asu commented Sep 14, 2022

I'm working on updating our nightly jobs that sync data from the rsync server over to the new rclone-based system, and noticed that symlinks are now being transferred as complete files, which ends up using more space. A specific example of this is base/dems/: the version on the rsync server takes up about 7G, but the expanded version when using downloadIsisData takes up 20G.

@kbowley-asu

I've also noticed that some of the datasets will transfer everything every time you run downloadIsisData rather than just grabbing updates. This is a TON of wasted time and bandwidth. We had a nightly cronjob that rsynced the data areas and only pulled over new data, but the new method takes most of the day to retransfer data that has already been downloaded.

@Kelvinrr (Collaborator)

@kbowley-asu It shouldn't re-download; which mission is being re-downloaded? Maybe there is an issue where timestamps are being updated despite the data not changing.

@kbowley-asu

@Kelvinrr one example I've noticed is cassini. Now that I'm looking at it, it appears to be any mission that grabs data from two locations. I'm wondering if this is related to issue #5024

@Kelvinrr (Collaborator)

@kbowley-asu try adding the --update flag; we might need to specify the behavior explicitly to rclone. Even with redundancies between sources (I'm aware of them and need to remove some of the redundancy), if the files are identical or the timestamp is older it should not re-download.
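
i.e., roughly this if you're calling rclone yourself (the remote name is illustrative):

```sh
# --update skips files that are newer on the destination, so unchanged files
# whose source timestamps were bumped should not be re-fetched.
rclone sync --config rclone.conf --update naif:CASSINI/kernels $ISISDATA/cassini/kernels
```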

@kbowley-asu

@Kelvinrr that does seem to help when I tell it to sync cassini, but when doing an 'all' run, it looks like it's working on syncing over 30G of messenger data. I'll do a couple of runs with the --update flag and see if it gets any better.

@Kelvinrr (Collaborator)

@kbowley-asu alright, if it ends up helping we can make that flag on by default.

@kbowley-asu

I'm currently testing taking out the --progress option so it can be tossed into a daily cronjob without generating useless output, and removing the --track-renames flag, which does nothing but generate error messages since it's not valid with a local filesystem.

@Kelvinrr (Collaborator)

+1 on the --progress flag being removed by default. It's nice on a local terminal, but we have a similar issue with our crons. Since arbitrary rclone kwargs can be passed in now, it doesn't need to be on by default. I can include that in whatever PR comes out of figuring out these redundant data downloads.

@lwellerastro (Contributor)

> This shouldn't have an impact on internal folks. We will still have the data on systems in the building. We're also working on mountable cloud buckets and systems that have an "internal" setup of the data area for us to use.

This is having an impact on internal users, and therefore a ripple effect on external users as well. See #5056, #5053, #5054, and the now-closed #5049. It seems that either files and directories we intended to migrate to the cloud did not make it (and therefore did not populate the new internal data area tied to the cloud), or we don't have a complete inventory of what the ASC has specially supplied to users in the way of IAK files, PCKs (which are sometimes updated and associated with smithed kernels we generate), TSPKs, etc. (as @KrisBecker has explained here and in other posts) and are inadvertently allowing files to be overwritten - I simply don't know.

I am by no means an expert in SPICE and had not appreciated the importance of this migration or how it might affect my work, but now that the local data area is in sync with what is stored on the cloud, I'm feeling its impact. There is a disconnect between the current method of supplying kernels and what had been supplied previously that needs to be evaluated and improved. I'm unclear on why we can't continue to use what was in the old area. I'm also wondering how app tests are passing with some of the changes that have been made.

Concerned User.

@swalterfub

After the transition to downloadIsisData (ISIS 7.1.0) we ran into a problem with spiceinit on CTX data. The error message was:
`**ERROR** No existing files found with a numerical version matching [pck?????.tpc] in [/mnt/IsisData/mro/kernels/pck].`
The mro/kernels/pck directory contained only the db file, not the actual pck00008.tpc kernel (and no aareadme.txt either). We managed to synchronize the directory correctly using the rclone command described at the very bottom of the README, though.

@github-actions (bot)

Thank you for your contribution!

Unfortunately, this issue hasn't received much attention lately, so it is labeled as 'stale.'

If no additional action is taken, this issue will be automatically closed in 180 days.

The github-actions bot added the "inactive (Issue that has been inactive for at least 6 months)" label on Jun 14, 2023.