O+M 2022-07-14 #4382

Closed
10 tasks
hkdctol opened this issue Jul 7, 2023 · 5 comments
Assignees
Labels
O&M Operations and maintenance tasks for the Data.gov platform

Comments

@hkdctol
Contributor

hkdctol commented Jul 7, 2023

As part of the day-to-day operation of Data.gov, there are many Operations and Maintenance (O&M) responsibilities. Rather than having the entire team watch notifications and risk some slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. This is not meant to be a 24/7 responsibility; it covers East Coast business hours only. If you will be unavailable, please note when in Slack and ask someone to take on the role for that time.

Check the O&M Rotation Schedule for future planning.

Miscs

Acceptance criteria

You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Check Production State/Actions

Note: Catalog Auto Tasks
You will need to update the chart values manually. Click the Action link in each issue, take the values from the monitor task output, and check the runtime.
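Copying the numbers by hand is error-prone, so a small parser may help. The following is a minimal sketch, assuming the task output uses the same line wording as the DB-Solr Sync lines quoted in the daily reports in the comments of this issue. The function name `parse_sync_output` and the dictionary keys are hypothetical labels, not part of any Data.gov tooling.

```python
import re

# Patterns assume the line wording seen in the daily reports in this issue.
# The key names are hypothetical labels chosen for illustration.
PATTERNS = {
    "removed": re.compile(r"(\d+) packages need to be removed from Solr"),
    "updated": re.compile(r"(\d+) packages need to be updated/added to Solr"),
    "orphaned": re.compile(r"(\d+) packages without harvest_object"),
    "runtime_s": re.compile(r"Finished (\d+)s"),
}


def parse_sync_output(text):
    """Return the integer values found in a block of task output."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            values[name] = int(match.group(1))
    return values


# Sample values taken from the Monday 07/10 report.
sample = """\
1 packages need to be removed from Solr
1166 packages need to be updated/added to Solr
3603 packages without harvest_object need to be manually deleted
Finished 893s
"""

if __name__ == "__main__":
    print(parse_sync_output(sample))
```

Lines that do not match any pattern are skipped, so a missing key in the result signals that the output format differs from what is assumed here.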

Weekly Checklist

@hkdctol hkdctol moved this to 📟 Sprint Backlog [7] in data.gov team board Jul 7, 2023
@hkdctol hkdctol added the O&M Operations and maintenance tasks for the Data.gov platform label Jul 7, 2023
@Jin-Sun-tts Jin-Sun-tts moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Jul 10, 2023
@Jin-Sun-tts
Contributor

Jin-Sun-tts commented Jul 10, 2023

Monday 07/10

https://github.com/GSA/data.gov/

Check Catalog Auto Tasks

  • DB-Solr Sync:
    1 packages need to be removed from Solr
    1166 packages need to be updated/added to Solr
    3603 packages without harvest_object need to be manually deleted
    Finished 893s

  • Tracking Update: 280511 package indexes to be rebuilt starting from 2023-06-30 00:00:00; the task did not finish within 6 hours.

Check Harvesting Emails

  • Catalog:
  • No harvesting jobs reported as of 1pm EST

Other

Catalog prod was down for about 30 minutes on Sunday (07/09) morning:

Created: Jul 9, 2023 10:37am
Issue closed on Jul 9, 2023 11:06am
Duration: 29m

Checked the New Relic log: there were 50 '409' errors between 10:27am and 10:36am on July 9.

@Jin-Sun-tts
Contributor

Tuesday 07/11

https://github.com/GSA/data.gov/

Check Catalog Auto Tasks

  • DB-Solr Sync:
    0 packages need to be removed from Solr
    100 packages need to be updated/added to Solr
    123 packages without harvest_object need to be manually deleted
    Finished 505s

Check Harvesting Emails

Other

@Jin-Sun-tts
Contributor

Jin-Sun-tts commented Jul 12, 2023

Wednesday 07/12

https://github.com/GSA/data.gov/

Check Catalog Auto Tasks

  • DB-Solr Sync:
    0 packages need to be removed from Solr
    100 packages need to be updated/added to Solr
    123 packages without harvest_object need to be manually deleted
    Finished 495s

It looks like the packages above were not updated. Reran the process manually, and the packages are now updated in Solr.

Check Harvesting Emails

Other

Checked catalog-web: it is running without problems, but the log shows some INFO/ERROR entries like the ones below.

  • There were about 150 INFO entries in 30 minutes like this:
    INFO [ckan.config.middleware.flask_app] 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

  • There were about 200 search errors in 30 minutes like this:
    ERROR [ckan.views.dataset] Dataset search error: ('Wrong bounding box provided',)
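Counts like these can be tallied from an exported log rather than by eye. The following is a minimal sketch, assuming each line contains a level followed by a bracketed logger name, as in the two entries quoted above. Real catalog-web lines may carry a timestamp or other prefix, which this pattern tolerates by searching anywhere in the line. The function name `tally` is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "<LEVEL> [<logger>] <message>", possibly after a prefix.
LINE_RE = re.compile(
    r"\b(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+\[(?P<logger>[^\]]+)\]"
)


def tally(lines):
    """Count log lines per (level, logger) pair; unmatched lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Sample lines modelled on the entries quoted in this report.
sample = [
    "INFO [ckan.config.middleware.flask_app] 500 Internal Server Error: ...",
    "ERROR [ckan.views.dataset] Dataset search error: ('Wrong bounding box provided',)",
    "ERROR [ckan.views.dataset] Dataset search error: ('Wrong bounding box provided',)",
]

if __name__ == "__main__":
    for (level, logger), count in tally(sample).most_common():
        print(level, logger, count)
```

Grouping by logger as well as level keeps the 500 responses logged at INFO separate from the search errors, which matches how the two are reported above.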

@Jin-Sun-tts
Contributor

Thursday 07/13

https://github.com/GSA/data.gov/

Check Catalog Auto Tasks

  • DB-Solr Sync:
    0 packages need to be removed from Solr
    0 packages need to be updated/added to Solr
    123 packages without harvest_object need to be manually deleted
    Finished 486s

Check Harvesting Emails

  • Catalog:
  • 9 harvesting job errors reported at 6:13am EST, some errors like below:
    • Transformation to ISO failed
    • Identifier: 246; Title: Mozambique Compact - Road Rehabilitation & Construction; 1 Error(s) Found. ### ERROR #1: 'keyword' is a required property.
    • Identifier: 75; Title: Tanzania Threshold - Governance and Anticorruption; 1 Error(s) Found. ### ERROR #1: 'keyword' is a required property.
    • Parent identifier not found: "NCHS - Natality Measures for Females by Race and Hispanic Origin: United States"
    • Identifier: https://data.cdc.gov/api/views/n8mc-b4w4; Title: COVID-19 Case Surveillance Public Use Data with Geography; 1 Error(s) Found. ### ERROR #1: 'accrualPeriodicity':'Monthly' is not valid under any of the given schemas.
    • Parent identifier not found: "NCHS - Births and General Fertility Rates: United States"

Other

@Jin-Sun-tts
Contributor

Friday 07/14

https://github.com/GSA/data.gov/

Check Catalog Auto Tasks

  • DB-Solr Sync:
    0 packages need to be removed from Solr
    100 packages need to be updated/added to Solr
    123 packages without harvest_object need to be manually deleted
    Finished 489s

Check Harvesting Emails

  • Catalog:
    61 harvesting job errors reported at 10:11am EST; some examples are below:
  • many reports like:
    • No records to change
  • others:
    • Error loading json content: not enough values to unpack (expected 2, got 0).
    • JSONDecodeError loading json. Expecting value: line 2 column 1 (char 1)
    • Object 1adeda46-9a60-43ee-9498-2c9344825de2 already has this guid gov.noaa.ncei:SeaSurfaceTemperature
    • Transformation to ISO failed
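With dozens of errors in one email, grouping them by type makes the report quicker to write. The following is a minimal sketch, assuming the messages are available as plain strings. The category labels are hypothetical, and the substrings are taken from the error messages quoted in this week's reports.

```python
from collections import Counter

# Hypothetical category labels mapped to substrings seen in this week's reports.
# Order matters: the first matching substring wins.
CATEGORIES = [
    ("no_change", "No records to change"),
    ("json_load", "loading json"),
    ("duplicate_guid", "already has this guid"),
    ("iso_transform", "Transformation to ISO failed"),
    ("missing_parent", "Parent identifier not found"),
    ("schema_validation", "Error(s) Found"),
]


def categorize(message):
    """Return the first category whose substring occurs in the message."""
    lowered = message.lower()
    for label, needle in CATEGORIES:
        if needle.lower() in lowered:
            return label
    return "other"


# Messages copied from the Friday 07/14 report.
messages = [
    "No records to change",
    "Error loading json content: not enough values to unpack (expected 2, got 0).",
    "JSONDecodeError loading json. Expecting value: line 2 column 1 (char 1)",
    "Object 1adeda46-9a60-43ee-9498-2c9344825de2 already has this guid "
    "gov.noaa.ncei:SeaSurfaceTemperature",
    "Transformation to ISO failed",
]

if __name__ == "__main__":
    summary = Counter(categorize(m) for m in messages)
    for label, count in summary.most_common():
        print(label, count)
```

Anything that falls into `other` is worth reading in full, since it is a message type not seen earlier in the week.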

Other

Weekly Checklist

@Jin-Sun-tts Jin-Sun-tts moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Jul 17, 2023
@hkdctol hkdctol closed this as completed Jul 20, 2023
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Jul 20, 2023
2 participants