test metadig-engine on k8s against a hashstore #453

Open
jeanetteclark opened this issue Oct 2, 2024 · 2 comments

@jeanetteclark
Collaborator

jeanetteclark commented Oct 2, 2024

Testing locally has gone well, but it would be nice to test the engine against a hashstore on the dev cluster.

To that end, I've mounted the tdg subvolume on metadig-worker; the same subvolume is also mounted on dev.nceas, where a Metacat instance backed by a hashstore is running. See helm/metadig-worker/pv.yaml and helm/metadig-worker/pvc.yaml for details on the existing mounts.

To actually test this, though, the following steps are needed:

  • copy the contents of metacat/hashstore to /mnt/tdg-repos/dev via parallel rsync
  • symlink that hashstore to metacat
  • update the metacat.properties store.store_path field to be var/data/respos/dev/hashstore
  • deploy metadig-engine to the test cluster
  • submit datasets to dev.nceas (via metacatUI or any other client)
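One way to carry out the symlink step above is sketched below. This is a minimal sketch under stated assumptions, not the procedure used on dev.nceas: the function name `switch_to_mount` is hypothetical, and it assumes the copy on the mount is already complete and that Metacat (Tomcat) is stopped while the directory is swapped. The original directory is kept as a `.orig` backup instead of being deleted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: replace a local hashstore directory with a symlink
# that points at its copy on the mounted volume.
# Usage: switch_to_mount LOCAL_DIR MOUNT_DIR
switch_to_mount() {
  local local_dir=$1   # e.g. /var/metacat/hashstore
  local mount_dir=$2   # e.g. /mnt/tdg-repos/dev/metacat/hashstore

  # Refuse to switch if the copy on the mount does not exist yet
  if [ ! -d "$mount_dir" ]; then
    echo "copy not found: $mount_dir" >&2
    return 1
  fi
  # Refuse to run twice: LOCAL_DIR is already a symlink
  if [ -L "$local_dir" ]; then
    echo "already a symlink: $local_dir" >&2
    return 1
  fi
  # Refuse to overwrite an existing backup (mv would nest the directory inside it)
  if [ -e "${local_dir}.orig" ]; then
    echo "backup already exists: ${local_dir}.orig" >&2
    return 1
  fi

  # Keep the original directory as a backup instead of deleting it
  mv "$local_dir" "${local_dir}.orig"
  # Recreate LOCAL_DIR as a symlink pointing at MOUNT_DIR
  ln -s "$mount_dir" "$local_dir"
}
```

It would be run as a user that can write to `/var/metacat` (e.g. root) with the paths from this issue: `switch_to_mount /var/metacat/hashstore /mnt/tdg-repos/dev/metacat/hashstore`. Keeping the backup makes the switch easy to undo: remove the symlink and rename the `.orig` directory back.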
@doulikecookiedough doulikecookiedough self-assigned this Oct 8, 2024
@doulikecookiedough

doulikecookiedough commented Oct 11, 2024

Update:

The rsync + parallel process to copy the contents of /var/metacat/hashstore to /mnt/tdg-repos/dev/metacat/hashstore has been completed.

  • A re-sync takes approximately 25 minutes when rsync is given just the first-level folders/items in /var/metacat/hashstore.
    • In the meantime, I am re-running the process with an individual rsync command for each file to see whether it's any faster.

Next Steps:

  • Sync up with Jing to coordinate the Metacat switchover (and the symlinking of the new directory)
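Before the switchover, one way to confirm that the re-sync left nothing behind is a recursive comparison of the two trees. This is a minimal sketch assuming GNU diffutils; the helper name `copies_match` is hypothetical. `diff -rq` reads every file, so it is slow on a large hashstore; an rsync dry run with the same flags plus `--dry-run --itemize-changes` is a cheaper check that lists what would still be transferred, based on file metadata rather than contents.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if DEST holds the same files, with the
# same contents, as SRC. Prints a short report either way.
# Usage: copies_match SRC DEST
copies_match() {
  local src=$1 dest=$2 status=0 report
  # -r recurses into subdirectories; -q only names entries that differ.
  # diff exits 0 if identical, 1 if different, 2 on trouble; "|| status=$?"
  # captures that code without letting set -e abort the function.
  report=$(diff -rq "$src" "$dest") || status=$?
  if [ "$status" -eq 0 ]; then
    echo "in sync"
  elif [ "$status" -eq 1 ]; then
    # One line per file that differs or exists on only one side
    echo "$report"
    echo "$(printf '%s\n' "$report" | wc -l | tr -d ' ') difference(s)"
  else
    echo "diff failed (status $status)" >&2
  fi
  return "$status"
}
```

For example, `copies_match /var/metacat/hashstore /mnt/tdg-repos/dev/metacat/hashstore` (run with read access to both trees) prints `in sync` and returns 0 when nothing is left to copy.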

To Do List:

  • copy the contents of metacat/hashstore to /mnt/tdg-repos/dev via parallel rsync
  • symlink that hashstore to metacat
  • update the metacat.properties store.store_path field to be /mnt/tdg-repos/dev/metacat/hashstore
    • Note: This does not have to be done since we created a symlink
  • deploy metadig-engine to the test cluster
  • submit datasets to dev.nceas (via metacatUI or any other client)

For reference:

```
# How to produce a text file with just the first level of hashstore folders to rsync
mok@dev:~/testing$ sudo find /var/metacat/hashstore -mindepth 1 -maxdepth 1 > mc_hs_dir_list.txt
mok@dev:~/testing$ cat mc_hs_dir_list.txt
/var/metacat/hashstore/objects
/var/metacat/hashstore/metadata
/var/metacat/hashstore/refs
/var/metacat/hashstore/hashstore.yaml

# How to use rsync with a list of folders
mok@dev:~/testing$ cat mc_hs_dir_list.txt | parallel --eta sudo rsync -aHAX {} /mnt/tdg-repos/dev/metacat/hashstore/
```

```
# How to feed rsync a single command per file
# First, get the list of files under /var/metacat/hashstore (%P prints each path relative to it)
mok@dev:~/testing$ sudo find /var/metacat/hashstore -type f -printf '%P\n' > mc_obj_list.txt

# The /./ between `metacat` and `hashstore` tells rsync (with -R) to recreate only the part of the path after it,
# so files are copied under hashstore/ in the destination folder and the leading directories are omitted
mok@dev:~/testing$ parallel --eta sudo rsync -aHAXR /var/metacat/./hashstore/{} /mnt/tdg-repos/dev/metacat :::: mc_obj_list.txt
```
  • Note: When the number of jobs is not set, GNU parallel chooses its own limit; here it went to the machine's maximum number of cores (e.g. 44). When I added -j 30, it was limited to 30 concurrent rsync processes.

@doulikecookiedough

doulikecookiedough commented Oct 14, 2024

Metacat on dev.nceas.ucsb.edu has been switched over to write to the CephFS mount point: /var/metacat/hashstore is now a symlink to /mnt/tdg-repos/dev/metacat/hashstore.

  • Note: We initially ran into a read-only file system error caused by how Tomcat's access control rules were set up; the actual path being written to (the symlink target above) had to be added to its configuration settings.
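The note above does not say which Tomcat setting was changed. If Tomcat runs as the Debian/Ubuntu `tomcat9` systemd service (an assumption), its unit sets `ProtectSystem=strict`, which makes the file system read-only for the service except for paths listed in `ReadWritePaths=`. A write through the symlink lands on its target under `/mnt`, so the target itself must be listed. Below is a minimal sketch of a drop-in override; the function name `write_rw_override` and the file name `hashstore.conf` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a systemd drop-in that adds one writable path
# to a sandboxed service (assumes the unit uses ProtectSystem=strict).
# Usage: write_rw_override DROPIN_DIR WRITABLE_PATH
write_rw_override() {
  local dropin_dir=$1     # e.g. /etc/systemd/system/tomcat9.service.d
  local writable_path=$2  # e.g. /mnt/tdg-repos/dev/metacat/hashstore
  mkdir -p "$dropin_dir"
  # ReadWritePaths= in a drop-in is added to the unit's existing list
  printf '[Service]\nReadWritePaths=%s\n' "$writable_path" > "$dropin_dir/hashstore.conf"
}

# After writing the drop-in as root, reload systemd and restart Tomcat:
#   sudo systemctl daemon-reload
#   sudo systemctl restart tomcat9
```

`sudo systemctl edit tomcat9` achieves the same thing interactively: it opens an override file in the unit's drop-in directory and reloads systemd when the editor is closed.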

rsync was re-run, and syncing with the list of direct subfolders of /var/metacat/hashstore was the fastest approach. I also tested feeding rsync one command per file (e.g. via :::: list_of_files.txt), but this seemed to be very slow. The re-sync took approximately 5 minutes.
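One likely reason the per-file approach was slower: it launches a separate `sudo rsync` process, each with its own process startup and file-list exchange, for every file, while the per-folder approach launches only a handful, each of which then walks its whole subtree. The minimal sketch below counts how many rsync invocations each approach would launch for a given tree; the function name `count_rsync_jobs` is hypothetical, and its two counts mirror the two `find` commands used earlier.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print how many rsync invocations each approach
# would launch for ROOT.
# Usage: count_rsync_jobs ROOT
count_rsync_jobs() {
  local root=$1
  local per_folder per_file
  # Per-folder: one job per first-level entry (like mc_hs_dir_list.txt)
  per_folder=$(find "$root" -mindepth 1 -maxdepth 1 | wc -l)
  # Per-file: one job per regular file anywhere below ROOT (like mc_obj_list.txt)
  per_file=$(find "$root" -type f | wc -l)
  # $(( )) strips the padding that some wc implementations add
  echo "per-folder jobs: $((per_folder))"
  echo "per-file jobs: $((per_file))"
}
```

On a hashstore, the per-folder count stays at four (`objects`, `metadata`, `refs`, `hashstore.yaml`), while the per-file count grows with every stored object and metadata document.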

Projects
Status: In Progress

2 participants