
change tracking update from nightly job to weekly #4345

Closed
FuhuXia opened this issue Jun 8, 2023 · 4 comments
Labels
bug Software defect or bug

Comments

@FuhuXia
Member

FuhuXia commented Jun 8, 2023

Stats show that catalog visits to /_tracking in June doubled compared to April, and the count of unique dataset page visits is four times higher. Together with recent questionable Solr performance, this makes the nightly tracking update job take too long to finish. Given that page-visit stats are not critical enough to be processed nightly, we can change this to a weekly job.

Sketch

  1. tracking-update is supposed to run nightly according to CKAN core. We need to change the default behavior in ckanext-geodatagov so it handles a week of data.
  2. Change the GH Actions cron schedule to weekly.
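Item 1 could be approached by deriving the processing window from the last day already summarized instead of a fixed "yesterday", so the same job stays correct whether the cron fires nightly or weekly. A minimal sketch, assuming a hypothetical `days_to_process` helper (not part of CKAN or ckanext-geodatagov) and an assumed two-day overlap to pick up late-arriving tracking rows:

```python
from datetime import date, timedelta
from typing import List


def days_to_process(last_processed: date, today: date, overlap_days: int = 2) -> List[date]:
    """Hypothetical helper: days to (re)summarize, up to but not including today.

    The window starts `overlap_days` before the last day already summarized
    (assumption: a small overlap catches late-arriving rows) and ends yesterday.
    Because it depends on the last processed day rather than a fixed
    "yesterday", the same logic covers a nightly or a weekly schedule.
    """
    if overlap_days < 0:
        raise ValueError("overlap_days must be non-negative")
    start = last_processed - timedelta(days=overlap_days)
    span = (today - start).days  # number of days from start up to yesterday
    return [start + timedelta(days=n) for n in range(span)]


# Weekly run: last summarized day 2023-06-11, job fires on 2023-06-18.
weekly = days_to_process(date(2023, 6, 11), date(2023, 6, 18))
print(weekly[0], weekly[-1], len(weekly))  # 2023-06-09 2023-06-17 9
```

A nightly run (last processed 2023-06-17, run on 2023-06-18) would return only three days with the same code. For item 2, a GitHub Actions schedule such as `cron: '0 5 * * 0'` fires once a week, at 05:00 UTC on Sundays; the day and hour here are arbitrary.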
@FuhuXia added the bug label Jun 8, 2023
@FuhuXia
Member Author

FuhuXia commented Jun 8, 2023

Stats were collected over three days in each month: April (10, 11, 12) and June (05, 06, 07).
New Relic shows 394K hits to /_tracking in April and 762K in June.
A DB query shows 65K unique datasets visited in April and 256K in June.

-- sample query
SELECT count(*) FROM tracking_summary
WHERE package_id != '~~not~found~~'
  AND tracking_date IN ('2023-06-05', '2023-06-06', '2023-06-07');
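If, as in CKAN core, `tracking_summary` holds roughly one row per tracked URL per day, the sample query's `count(*)` counts dataset-days over the three dates rather than distinct datasets, so a dataset visited on all three days is counted three times. The sketch below uses an in-memory SQLite table as a simplified, hypothetical stand-in (only the columns the query touches, with made-up rows) to contrast `count(*)` with `count(DISTINCT package_id)`:

```python
import sqlite3

# Hypothetical, simplified stand-in for tracking_summary: only the columns
# the sample query touches, with made-up rows (one per URL per day).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking_summary (url TEXT, package_id TEXT, tracking_date TEXT)"
)
conn.executemany(
    "INSERT INTO tracking_summary VALUES (?, ?, ?)",
    [
        ("/dataset/a", "pkg-a", "2023-06-05"),
        ("/dataset/a", "pkg-a", "2023-06-06"),  # same dataset, second day
        ("/dataset/b", "pkg-b", "2023-06-06"),
        ("/about", "~~not~found~~", "2023-06-07"),  # not a dataset page
    ],
)

FILTER = """
    WHERE package_id != '~~not~found~~'
      AND tracking_date IN ('2023-06-05', '2023-06-06', '2023-06-07')
"""

# count(*): one per matching row, i.e. roughly dataset-days.
row_count = conn.execute("SELECT count(*) FROM tracking_summary" + FILTER).fetchone()[0]
# count(DISTINCT ...): each dataset counted once across all three days.
distinct_count = conn.execute(
    "SELECT count(DISTINCT package_id) FROM tracking_summary" + FILTER
).fetchone()[0]
print(row_count, distinct_count)  # 3 2
```

Either figure works for an April-vs-June comparison as long as both months use the same query; the distinction only matters when reading the number as "unique datasets".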

@hkdctol moved this to 📟 Sprint Backlog [7] in data.gov team board Jun 8, 2023
@Jin-Sun-tts self-assigned this Jun 12, 2023
@Jin-Sun-tts moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Jun 12, 2023
@Jin-Sun-tts
Contributor

Jin-Sun-tts commented Jun 13, 2023

catalog cron job change: GSA/catalog.data.gov#962

@Jin-Sun-tts moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Jun 14, 2023
@hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Jun 22, 2023
@nickumia-reisys
Contributor

Random follow-on notes: We suspect this issue is caused by bot traffic inflating hits to the /_tracking route. I thought a ticket was made, but apparently not. If we limit bot traffic to the catalog instance, this action should be able to run nightly again.

@robert-bryson mentioned this issue Sep 5, 2023
@FuhuXia
Member Author

FuhuXia commented Sep 6, 2023

> Random follow-on notes: We suspect this issue is caused by bot traffic inflating hits to the /_tracking route. I thought a ticket was made, but apparently not. If we limit bot traffic to the catalog instance, this action should be able to run nightly again.

Not to limit bot traffic, but to exclude bot traffic from the tracking count. We need to use the three days of data above, April (10, 11, 12) and June (05, 06, 07), combined with CloudWatch log data (which contains user-agent info), to see the effect of excluding bot traffic from the tracking count.
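The comparison could start from a simple user-agent filter over the CloudWatch records. A minimal sketch, assuming each log line has already been parsed into a dict with hypothetical `path` and `user_agent` keys (the real CloudWatch log format is not shown in this issue), a hand-picked and non-exhaustive list of crawler markers, and the assumption that an empty user agent means automated traffic:

```python
import re
from typing import Iterable, Mapping, Tuple

# Assumption: hand-picked substrings that self-identifying crawlers and
# scripted clients commonly put in their user agents. Not exhaustive.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests", re.IGNORECASE
)


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent looks automated (or is missing)."""
    if not user_agent:
        return True  # assumption: no user agent means a script, not a browser
    return BOT_PATTERN.search(user_agent) is not None


def count_tracking_hits(
    records: Iterable[Mapping[str, str]], path: str = "/_tracking"
) -> Tuple[int, int]:
    """Split hits on the tracking route into (human, bot) counts."""
    human = bot = 0
    for record in records:
        if not record.get("path", "").startswith(path):
            continue  # not a tracking hit
        if is_bot(record.get("user_agent", "")):
            bot += 1
        else:
            human += 1
    return human, bot


sample = [
    {"path": "/_tracking", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"path": "/_tracking", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"path": "/_tracking", "user_agent": ""},
    {"path": "/dataset/example", "user_agent": "Mozilla/5.0"},
]
print(count_tracking_hits(sample))  # (1, 2)
```

Running the same counts over the April and June sample days would show how much of the growth remains once self-identified bots are excluded. Bots that spoof browser user agents would still be counted as human, so this gives a lower bound on bot traffic.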

@hkdctol mentioned this issue Sep 7, 2023
@hkdctol mentioned this issue Sep 15, 2023
This was referenced Sep 25, 2023
This was referenced Oct 6, 2023
@Jin-Sun-tts mentioned this issue Oct 20, 2023
@hkdctol mentioned this issue Oct 27, 2023
@btylerburton mentioned this issue Nov 6, 2023
This was referenced Nov 9, 2023
@hkdctol mentioned this issue Nov 22, 2023
@hkdctol mentioned this issue Dec 1, 2023
@rshewitt mentioned this issue Dec 11, 2023
@hkdctol mentioned this issue Dec 15, 2023
This was referenced Jun 28, 2024
This was referenced Jul 14, 2024
@hkdctol mentioned this issue Jul 24, 2024
@Jin-Sun-tts mentioned this issue Aug 5, 2024
@FuhuXia mentioned this issue Aug 12, 2024
@btylerburton mentioned this issue Aug 19, 2024
@FuhuXia mentioned this issue Aug 26, 2024
@Jin-Sun-tts mentioned this issue Sep 3, 2024
@github-project-automation bot moved this from 🗄 Closed to ✔ Done in data.gov team board Sep 3, 2024
@btylerburton moved this from ✔ Done to 🗄 Closed in data.gov team board Sep 3, 2024
@rshewitt mentioned this issue Sep 9, 2024
@hkdctol mentioned this issue Sep 13, 2024
@FuhuXia mentioned this issue Sep 23, 2024
@Jin-Sun-tts mentioned this issue Sep 30, 2024
@rshewitt mentioned this issue Oct 7, 2024
@btylerburton mentioned this issue Oct 15, 2024
@hkdctol mentioned this issue Oct 18, 2024
@hkdctol mentioned this issue Oct 25, 2024
@rshewitt mentioned this issue Nov 4, 2024
@hkdctol mentioned this issue Nov 11, 2024
@FuhuXia mentioned this issue Nov 18, 2024
@hkdctol mentioned this issue Nov 22, 2024
@BFrost313 mentioned this issue Dec 1, 2024
@rshewitt mentioned this issue Dec 2, 2024
@btylerburton mentioned this issue Dec 7, 2024
@FuhuXia mentioned this issue Dec 16, 2024
@Jin-Sun-tts mentioned this issue Dec 23, 2024
Projects
Archived in project

4 participants