Stuck Harvest - NOAA causing delays in the pipeline #917
Comments
NOAA caused delays in the harvesting pipeline on 4/8:
source_id: c084a438-6f6b-470d-93e0-16aeddb9f513 | created_time: 2023-04-06 14:31:27.072752 | current_time: 2023-04-08 06:38:49.505180+00:00 | gather_started: 2023-04-06 14:31:27.420864 | gather_finished: 2023-04-06 14:33:50.501971 | running_length: 1 day, 16:07:22.432428 | source_title: NOAA/NESDIS/ncei/accessions | organization: National Oceanic and Atmospheric Administration, Department of Commerce
source_id: 8f77b6d5-f630-4995-bdf3-0aee7158a7f3 | created_time: 2023-04-07 05:54:47.763399 | current_time: 2023-04-08 06:38:49.505180+00:00 | gather_started: 2023-04-07 07:14:21.686083 | gather_finished: 2023-04-07 07:14:23.195498 | running_length: 1 day, 0:44:01.741781 | source_title: Alaska Division of Geological and Geophysical Surveys | organization: State of Alaska
source_id: 8507fa43-f429-4095-b732-2177330ce485 | created_time: 2023-04-07 05:54:46.680944 | current_time: 2023-04-08 06:38:49.505180+00:00 | gather_started: 2023-04-07 07:14:06.720925 | gather_finished: 2023-04-07 07:14:19.763617 | running_length: 1 day, 0:44:02.824236 | source_title: SFO JSON | organization: City of San Francisco
source_id: f35df04a-a619-4f92-bf5c-b9915b083bb1 | created_time: 2023-04-07 05:54:47.977264 | current_time: 2023-04-08 06:38:49.505180+00:00 | gather_started: 2023-04-07 07:14:23.220864 | gather_finished: 2023-04-07 07:14:27.803802 | running_length: 1 day, 0:44:01.527916 | source_title: Alaska Department of Natural Resources, IRM | organization: State of Alaska
source_id: 7590e386-229e-453a-8e53-6f18e200e421 | created_time: 2023-04-07 05:54:46.104598 | current_time: 2023-04-08 06:38:49.505180+00:00 | gather_started: 2023-04-07 07:13:52.071500 | gather_finished: 2023-04-07 07:14:06.253432 | running_length: 1 day, 0:44:03.400582 | source_title: Chicago JSON | organization: City of Chicago
source_id: ee428166-33c7-4eef-aee8-66156e0e9e08 | created_time: 2023-04-07 05:54:18.179824 | current_time: 2023-04-08 06:38:49.505180+00:00 | gather_started: 2023-04-07 06:03:29.676371 | gather_finished: 2023-04-07 06:06:19.979303 | running_length: 1 day, 0:44:31.325356 | source_title: NGDC Paleo | organization: National Oceanic and Atmospheric Administration, Department of Commerce
As of yesterday, NOAA was the only one still "stuck":
source_id: c084a438-6f6b-470d-93e0-16aeddb9f513 | created_time: 2023-04-06 14:31:27.072752 | current_time: 2023-04-09 06:38:39.540218+00:00 | gather_started: 2023-04-06 14:31:27.420864 | gather_finished: 2023-04-06 14:33:50.501971 | running_length: 2 days, 16:07:12.467466 | source_title: NOAA/NESDIS/ncei/accessions | organization: National Oceanic and Atmospheric Administration, Department of Commerce
@FuhuXia I really don't like that the harvesting takes 72 hours to force completion, can we make it 48 hours?
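For anyone triaging these reports by hand, here is a minimal sketch of how one report line could be parsed and its `running_length` recomputed. The helper names (`parse_record`, `running_length`) are hypothetical and not part of `ckan geodatagov check-stuck-jobs`; the sketch also assumes `created_time` is in UTC, since the report prints it without an offset. From the numbers above, `running_length` appears to be `current_time - created_time`.

```python
from datetime import datetime, timedelta


def parse_record(line: str) -> dict:
    """Split a 'key: value | key: value' report line into a dict of strings."""
    fields = {}
    for part in line.split(" | "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def running_length(record: dict) -> timedelta:
    """Recompute running_length as current_time - created_time."""
    created = datetime.fromisoformat(record["created_time"])
    current = datetime.fromisoformat(record["current_time"])
    # Assumption: created_time is UTC, so drop the explicit offset before subtracting.
    return current.replace(tzinfo=None) - created


# First NOAA line from the 4/8 report above.
SAMPLE = (
    "source_id: c084a438-6f6b-470d-93e0-16aeddb9f513 | "
    "created_time: 2023-04-06 14:31:27.072752 | "
    "current_time: 2023-04-08 06:38:49.505180+00:00 | "
    "gather_started: 2023-04-06 14:31:27.420864 | "
    "gather_finished: 2023-04-06 14:33:50.501971 | "
    "running_length: 1 day, 16:07:22.432428 | "
    "source_title: NOAA/NESDIS/ncei/accessions | "
    "organization: National Oceanic and Atmospheric Administration, Department of Commerce"
)

record = parse_record(SAMPLE)
elapsed = running_length(record)
print(record["source_title"], elapsed)
```

Comparing `elapsed` against `timedelta(hours=48)` or `timedelta(hours=72)` gives a quick way to see which sources a given threshold would affect.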
Note for future self: the logic for the system timing out is here, and it relies on these functions here. Those last functions are key. The check first asks whether there has been any movement on any of the harvest objects within the timeout limit (in our current case, within the last 72 hours); if so, it does not force a timeout. If no objects have been processed, it falls back to the job info (when the gather finished). So our timeout should be longer than the following:
I think 48 hours is probably overly safe, but please correct me if I'm wrong or missing something, @FuhuXia. I would advise 24 hours, but I know @FuhuXia noted some problems with that implementation in practice.
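The two-step check described in the note above can be sketched roughly as follows. This is an illustration of the described behavior, not the actual harvester code: the function name `should_force_timeout` and its parameters are hypothetical, and the handling of a job with no `gather_finished` value is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def should_force_timeout(
    now: datetime,
    object_update_times: Sequence[datetime],
    gather_finished: Optional[datetime],
    timeout: timedelta = timedelta(hours=72),
) -> bool:
    """Return True if the job looks stuck and should be force-completed."""
    cutoff = now - timeout

    if object_update_times:
        # Step 1: any movement on a harvest object inside the window
        # means the job is still progressing, so do not time out.
        return not any(t >= cutoff for t in object_update_times)

    # Step 2: no processed objects, so fall back to the job info.
    if gather_finished is None:
        # Assumption: a job that has not finished gathering is left alone.
        return False
    return gather_finished < cutoff


# Times taken from the 4/9 NOAA report line, assuming no objects were processed.
now = datetime(2023, 4, 9, 6, 38, 39)
gather_finished = datetime(2023, 4, 6, 14, 33, 50)

forced_at_72h = should_force_timeout(now, [], gather_finished, timedelta(hours=72))
forced_at_48h = should_force_timeout(now, [], gather_finished, timedelta(hours=48))
print(forced_at_72h, forced_at_48h)
```

Under those assumptions, the NOAA job would not yet be forced at 72 hours on the 4/9 run but would be at 48 hours, which is the trade-off being discussed.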
Related to |
Workflow with Issue: 4 - Automated CKAN Jobs
Job Failed: ckan-auto-command
CKAN Command (in question): ckan geodatagov check-stuck-jobs
CKAN Command Schedule: 30 6 * * *
Cloud.gov Environment: prod
Last Commit: 25b56a2
Number of times run: 1
Last run by: btylerburton
Github Action Run: https://github.com/GSA/catalog.data.gov/actions/runs/4649204023