Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PG foreign key error on deleting devices #232

Merged
merged 2 commits into from
May 18, 2023

Conversation

timcowlishaw
Copy link
Contributor

This fixes #214 - deleting devices failed since we moved the postprocessing data to a seperate table, as the foreign key constraint prevented the postprocessing table entry being orphaned.

The solution is to make this relationship depenedent: destroy - when a device is removed, we remove its postprocessing info too.

I've also added a test to guard against regressions here and make the behaviour we expect explicit.

@timcowlishaw timcowlishaw requested a review from oscgonfer April 19, 2023 10:00
Copy link
Contributor

@oscgonfer oscgonfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, do we know what will happen with current devices that are "deleted" but that have raised an error because of the postprocessing key?

@timcowlishaw
Copy link
Contributor Author

Thinking about this some more, I'm pretty certain that the logic in the sidekiq job isn't quite right - it deletes archived if they were created more than 24 hours ago, rather than if they were archived more than 24 hours ago. (so devices which are more than 24 hours old will be deleted more or less immediately when archived)

in this respect, the current bug is kinda obscuring this more fundamental one - devices are never being deleted due to the error, rather than being deleted too early.

Therefore, I think i need to add a column to the devices model to store an archived_at date and populate it when it is archived then use that date in the test in the sidekiq job. For devices that are already archived, i'll set the date to 'now', so that they get a 24 hour grace period after deploy - that way we can be sure nothing will be deleted before the end of the grace period.

@oscgonfer
Copy link
Contributor

@viktorsmari pinging you here to check briefly if you can kindly give us an input. Are we missing something?

@viktorsmari
Copy link
Collaborator

viktorsmari commented May 10, 2023

Hi!
Been a while since I looked at this code, but yes it's weird it's comparing the created_at < 24.hours.ago - I don't remember why that is 🤔
Maybe we only wanted to delete devices that were at least 1 day old? In case some new devices would be archived by accident for some reason? 🤔

Are you looking at the log/checkups.log to double check if it's trying to delete the correct devices?

if device.created_at < 24.hours.ago
CheckupNotifyJob.perform_now("deleting archived device #{device.id}")

checkups_log = Logger.new('log/checkups.log', 2, 10.megabytes)

Maybe that can help debugging 🤓

@oscgonfer
Copy link
Contributor

oscgonfer commented May 11, 2023

@viktorsmari Thanks so much for the input!

Checking the logs as suggested:

I, [2023-03-22T04:07:32.773257 #1]  INFO -- : Delete archived devices
I, [2023-03-22T04:07:32.804637 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T05:54:37.588205 #1]  INFO -- : Delete archived devices
I, [2023-03-22T05:54:37.617792 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T06:30:14.011402 #1]  INFO -- : Delete archived devices
I, [2023-03-22T06:30:14.066491 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T06:47:32.302801 #1]  INFO -- : Delete archived devices
I, [2023-03-22T06:47:32.335891 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T09:33:08.410407 #1]  INFO -- : Delete archived devices
I, [2023-03-22T09:33:08.414322 #1]  INFO -- : Delete archived users
I, [2023-03-22T09:33:08.599846 #1]  INFO -- : deleting archived device 16135
I, [2023-03-22T09:33:08.649107 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T09:33:10.452873 #1]  INFO -- : Delete archived devices
I, [2023-03-22T09:33:10.457849 #1]  INFO -- : Delete archived users
I, [2023-03-22T09:33:10.607537 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T09:33:28.183381 #1]  INFO -- : Delete archived devices
I, [2023-03-22T09:33:28.213820 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T09:33:35.139554 #1]  INFO -- : Delete archived devices
I, [2023-03-22T09:33:35.184532 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T09:34:22.532129 #1]  INFO -- : Delete archived devices
I, [2023-03-22T09:34:22.567927 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T09:34:42.571098 #1]  INFO -- : Delete archived devices
I, [2023-03-22T09:34:42.594598 #1]  INFO -- : deleting archived device 15581
I, [2023-03-22T09:35:39.856861 #1]  INFO -- : Delete archived devices
I, [2023-03-22T09:35:39.935590 #1]  INFO -- : deleting archived device 15581

Looking at the sentry logs (obviously many as each device is retried), we see the 15581 retries:

{"class":"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper","wrapped":"DeleteArchivedDevicesJob","queue":"default","args":[{"job_class":"DeleteArchivedDevicesJob","job_id":"68976306-95b7-40b3-8b72-2b93b27888db","provider_job_id":null,"queue_name":"default","priority":null,"arguments":[],"executions":0,"exception_executions":{},"locale":"en","timezone":"UTC","enqueued_at":"2023-05-10T14:01:50Z"}],"retry":true,"jid":"f5520304b83da37f000141c3","created_at":1683727310.4135225,"enqueued_at":1683789669.2500598,"error_message":"PG::ForeignKeyViolation: ERROR:  update or delete on table \"devices\" violates foreign key constraint \"fk_rails_f48c216ac2\" on table \"postprocessings\"\nDETAIL:  Key (id)=(15581) is still referenced from table \"postprocessings\".\n","error_class":"ActiveRecord::InvalidForeignKey","failed_at":1683727310.5678108,"retry_count":12,"retried_at":1683768814.1019285}

And finally looking at the devices in the Archive:

Device.unscoped.where(workflow_state: "archived")

For instance device 15581:

#<Device id: 15581, owner_id: 8401, name: "I-CHANGE SAN TEODORO", description: "Smart Citizen Kit 2.1 with Urban Sensor Board", mac_address: nil, latitude: 44.4185649, longitude: 8.9147217, created_at: "2022-08-08 15:01:31", updated_at: "2022-09-01 10:21:14", kit_id: 37, latest_data: nil, geohash: "spykdy5sbh", last_recorded_at: "2022-09-01 10:21:13", meta: {"exposure"=>"outdoor"}, location: {"city"=>"Genoa", "address"=>"Largo San Francesco da Paola, San Teodoro, Centro Ovest, Genoa, Liguria, 16134, Italy", "state_code"=>"Liguria", "state_name"=>"Liguria", "postal_code"=>"16134", "country_code"=>"it"}, data: {""=>"2022-09-01T10:21:13Z", "10"=>98.0, "14"=>29.0, "53"=>58.26, "55"=>26.96, "56"=>63.88, "58"=>100.21, "87"=>18.0, "88"=>31.0, "89"=>11.0, "112"=>0.0, "113"=>0.0, "133"=>0.244875, "134"=>0.244625, "135"=>0.398219, "136"=>0.401375, "138"=>0.282187, "139"=>0.27725, "140"=>0.329547, "141"=>12.287812, "153"=>3.969987446211956, "157"=>12.294759423597302, "10_raw"=>98.0, "14_raw"=>29.0, "53_raw"=>58.26, "55_raw"=>26.96, "56_raw"=>63.88, "58_raw"=>100.21, "87_raw"=>18.0, "88_raw"=>31.0, "89_raw"=>11.0, "112_raw"=>0.0, "113_raw"=>0.0, "133_raw"=>0.244875, "134_raw"=>0.244625, "135_raw"=>0.398219, "136_raw"=>0.401375, "138_raw"=>0.282187, "139_raw"=>0.27725, "140_raw"=>0.329547, "141_raw"=>12.287812, "153_raw"=>3.969987446211956, "157_raw"=>12.294759423597302}, old_data: {""=>"2022-09-01T10:21:13Z", "10"=>98.0, "14"=>29.0, "53"=>58.26, "55"=>26.96, "56"=>63.88, "58"=>100.21, "87"=>18.0, "88"=>31.0, "89"=>11.0, "112"=>0.0, "113"=>0.0, "133"=>0.244875, "134"=>0.244625, "135"=>0.398219, "136"=>0.401375, "138"=>0.282187, "139"=>0.27725, "140"=>0.329547, "141"=>12.287812, "153"=>3.969987446211956, "157"=>12.294759423597302, "10_raw"=>98.0, "14_raw"=>29.0, "53_raw"=>58.26, "55_raw"=>26.96, "56_raw"=>63.88, "58_raw"=>100.21, "87_raw"=>18.0, "88_raw"=>31.0, "89_raw"=>11.0, "112_raw"=>0.0, "113_raw"=>0.0, "133_raw"=>0.244875, "134_raw"=>0.244625, "135_raw"=>0.398219, "136_raw"=>0.401375, "138_raw"=>0.282187, "139_raw"=>0.27725, "140_raw"=>0.329547, "141_raw"=>12.287812, "153_raw"=>3.969987446211956, "157_raw"=>12.294759423597302}, owner_username: nil, uuid: "0eef8c37-929c-4d72-a82b-4c0623005316", migration_data: nil, workflow_state: "archived", csv_export_requested_at: "2023-02-24 11:20:18", old_mac_address: nil, state: "has_published", device_token: "npfnrh", hardware_info: {"id"=>"9EFA05F35055344E312E3120FF0C132A", "mac"=>"*****", "time"=>"2022-09-01T10:21:13Z", "esp_bd"=>"2022-04-05T16:06:52Z", "hw_ver"=>"2.1", "sam_bd"=>"2022-04-05T16:06:19Z", "esp_ver"=>"0.9.8-54e49bf", "sam_ver"=>"0.9.8-54e49bf"}, notify_stopped_publishing_timestamp: "2019-01-21 16:07:41", notify_low_battery_timestamp: "2019-01-21 16:07:41", notify_low_battery: false, notify_stopped_publishing: false, is_private: false, is_test: false>,

But nothing, nowhere, for the 16135 device that appears in the log, nor in the Device or the DeviceArchive...

Conclusion: I agree with @tim that the way to go would be to add a column named archived_at, but maybe we are still missing something.

@viktorsmari
Copy link
Collaborator

Is the problem only happening with one device, id 15581?
Did it successfully delete device 16135?

I, [2023-03-22T09:33:08.599846 #1] INFO -- : deleting archived device 16135 <-- I only see this 1x in the log, so it was deleted?
I, [2023-03-22T09:33:08.649107 #1] INFO -- : deleting archived device 15581

@oscgonfer
Copy link
Contributor

oscgonfer commented May 11, 2023

Is the problem only happening with one device, id 15581? Did it successfully delete device 16135?

I, [2023-03-22T09:33:08.599846 #1] INFO -- : deleting archived device 16135 <-- I only see this 1x in the log, so it was deleted? I, [2023-03-22T09:33:08.649107 #1] INFO -- : deleting archived device 15581

It's not on the Devices or in the DevicesArchive, so yes (?)

Is the problem only happening with one device, id 15581?

That's the one that appears most times in the log, but there are other occurrences.

This fixes #214 - deleting devices failed since we moved the postprocessing data to a seperate table, as the foreign key constraint prevented the postprocessing table entry being orphaned.

The solution is to make this relationship `depenedent: destroy` - when a device is removed, we remove its postprocessing info too.

I've also added a test to guard against regressions here and make the behaviour we expect explicit.
This was a bug that was hidden by the existing FK error - when deleting archived devices, we check the date they were created, not archived. This has the result that devices *created* more than 24 hours ago don't have the 24 hour grace period and are deleted immediately.

To fix this, we now store an archived_at date on the device, which is populated on archive, and check this in the deletion job.
@timcowlishaw timcowlishaw force-pushed the bugfix/214-delete-devices-raises-fk-error branch from a36429b to 69e7682 Compare May 12, 2023 08:29
@timcowlishaw
Copy link
Contributor Author

Hey @oscgonfer I've added an archived_at column and logic to populate it, as discussed!

@timcowlishaw timcowlishaw requested a review from oscgonfer May 12, 2023 08:30
@oscgonfer
Copy link
Contributor

Thanks @timcowlishaw
I'll test in staging and merge if all good.

@oscgonfer
Copy link
Contributor

oscgonfer commented May 18, 2023

Hi!

Tested in staging with success! Two devices, one with postprocessing and another with postprocessing=null.
It triggered after the DeleteDevices Sidekiq task was queued (once a day apparently).

@timcowlishaw I would suggest to document this in https://github.com/fablabbcn/smartcitizen-api/pull/232#top in a quick .md file and I will add it to the developer.smartcitizen.me docs.

Anyway, merging it!

@oscgonfer oscgonfer merged commit 055665a into master May 18, 2023
Copy link
Contributor

@oscgonfer oscgonfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging!

@timcowlishaw timcowlishaw deleted the bugfix/214-delete-devices-raises-fk-error branch July 4, 2024 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Delete devices job breaks after the postprocessings integration
3 participants