Images being repeatedly sideloaded #18
Comments
Let me know what you find out, and I can start doing some inspection on my end. |
Here's an example: from NPR Story ID 1087462333 on March 19, I have 3 identical versions of image 0b6a9874_custom-a4f9308ca707a02ec5dcd6c9d1a3a032525f9a8d.jpg. I also have 9 versions of this March 10 nursing-related image and 12 versions of a March 3 Wuhan-related image. I'd be interested to know if you guys see the same multiple downloads for these (or other) images. |
I haven't been able to verify any on my end. Do you have any recorded duplicates after March 30, 2022? I committed a change to the repo on March 30th, which changed line 391 in NPRAPIWordpress.php. Previously, that line was using the internal WordPress ID for the imported article instead of the ID of the imported image. |
Here are two images from June 13. This image has 9 versions: ap22145785567460_wide-dfbbced2e2a33aa5a0899ca7d6cd6f257e13ed37-9-scaled.jpg. It's from the June 13 story "Republican primaries show that Trump voters don't always follow his endorsements" (story ID 1103956855). |
I'm using 1.9.3.1, I updated to it on June 8 |
I've got a feeling the issue is that this is failing:
I'm going to error_log those two values for a couple of days and see what I see. |
Okay, previous theory busted. I'll run some tests with the 2 above you referenced. Also, comparing the filenames you referenced with the output from the API, (example: |
Don't worry about the '-scaled'; I was using '-8-scaled' as a grep criterion against ls. |
I'm starting to think the issue is earlier: you always sideload the image and create an attachment automatically, and only then compare the filename of the new attachment with those of the old attachments; if they match, you delete the new attachment. However, because the new image has already been downloaded and its filename set to an incremented '-version.jpg', the match always fails. I'll work on a patch. |
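To illustrate the ordering being proposed here, this is a minimal sketch of checking filenames before anything is downloaded. It is not the plugin's code: `example_image_already_attached()` and its parameters are hypothetical names, and it assumes the existing attachments are available as a plain array of URLs.

```php
<?php
// Hypothetical helper, not taken from NPRAPIWordpress.php: decide whether an
// image still needs to be downloaded by comparing filenames BEFORE any
// sideload happens, so WordPress never gets a chance to rename the new file.
function example_image_already_attached(string $image_url, array $attached_urls): bool {
    // Compare only the filename part of the URL path; query strings are ignored.
    $wanted = basename((string) parse_url($image_url, PHP_URL_PATH));
    foreach ($attached_urls as $attached_url) {
        $existing = basename((string) parse_url($attached_url, PHP_URL_PATH));
        if ($existing === $wanted) {
            return true; // already attached, so the download can be skipped
        }
    }
    return false;
}
```

Because the comparison uses the filename from the incoming URL rather than from a freshly created attachment, an incremented suffix on a new copy never enters the picture.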
Yeah, good point. I think that logic was created before our time, though I touched it last when I swapped around what IDs were being checked. Doing the filename check earlier sounds like a good idea. |
Agreed, it looks like it was 7 years ago. |
I think I have a solution cooked up if need be. Otherwise, I will defer to yours. |
you're probably further along, go for it. I'm still trying to debug where the filename matching isn't happening.
|
Okay, sent the commit to the repo. Basically made it loop through attached images right after determining If no attached images match, then the download/sideload/etc. actually happens. Looks like it will work, but need to run some tests. |
Nope, too aggressive. My testing site isn't importing images at all now. |
Fixed it. Forgot that 'filename' from |
I think a related issue is introduced by the '-scaled' thing -- a built-in WordPress feature since 5.3. Here's some debug output I got from a statement right after line 403 -- $file_array is the parsed $image_url, $attach_url_parts is from the post's existing attached images, and $imagep_url_parts is from the newly sideloaded image. $file_array['name'] = ap22140559338813-71e6ea258db1d20a0c21b6459be57551dc1ce167.jpg |
We need an additional arg to the two calls to wp_get_attachment_url() -- in both cases we need to specify the size, 'full' aka the original downloaded size |
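A different way to make the comparison tolerant of these suffixes, offered only as a sketch and not as what the plugin does, is to normalize both filenames before comparing them. The helper name `example_normalize_image_name()` is hypothetical, and the pattern assumes the suffixes look like the ones quoted in this thread: an optional numeric increment such as '-9', followed by an optional '-scaled'.

```php
<?php
// Hypothetical helper: strip a trailing "-N" increment and/or "-scaled"
// marker from an image filename so that copies compare equal to the original.
// Caveat: an original name that legitimately ends in "-<digits>" would also
// be shortened, so this is a heuristic rather than an exact inverse.
function example_normalize_image_name(string $filename): string {
    $parts = pathinfo($filename);
    $stem  = preg_replace('/(?:-\d+)?(?:-scaled)?$/', '', $parts['filename']);
    // Re-attach the extension only if the filename had one.
    return isset($parts['extension']) ? $stem . '.' . $parts['extension'] : $stem;
}
```

With this in place, two filenames would be treated as the same image when their normalized forms are identical.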
Unfortunately, that function only accepts the ID. However, they introduced a new function in 5.3: |
I'd actually thought we were using wp_get_attachment_image_src() but doh. I'll check it out. |
Well, I'm still seeing the issue. I added a little bit of debugging to see what's happening -- line 378 right before the decision to 'continue' I added Here's the output from the most recent run:
However, "bmf_3885-920766ab62abd8b5da6f9356ab8af60807f72918-3" ended up being added to the Media Library. I'll see if I can figure out what is going on and why it's still getting sideloaded. |
Actually, I already see the issue: the 'continue' only skips to the next iteration of the loop it sits in, so it doesn't actually stop the download from happening on line 385. |
You're right. The
|
It would be easier to just change the original continue to 'continue 2', I think. |
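For anyone who hasn't used it, here is a small self-contained sketch of what 'continue 2' does in nested loops. The function and variable names are hypothetical stand-ins for the import loop, not the plugin's actual code.

```php
<?php
// Hypothetical stand-in for the import loop: $incoming holds image filenames
// from the API, $attached holds filenames already attached to the post.
function example_images_to_download(array $incoming, array $attached): array {
    $to_download = [];
    foreach ($incoming as $image) {
        foreach ($attached as $existing) {
            if ($existing === $image) {
                // A plain `continue` would only advance this inner loop, and
                // execution would still reach the download step below.
                // `continue 2` jumps to the next iteration of the OUTER loop.
                continue 2;
            }
        }
        $to_download[] = $image; // stands in for the real download/sideload
    }
    return $to_download;
}
```

The numeric argument tells PHP how many enclosing loops to skip to the end of, so the download step is bypassed for any image that matched an existing attachment.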
I'm testing the 'continue 2' approach on my server now, with debugging, I'll let you know how it goes |
Oh wow, I didn't know you could do that. Makes sense though. |
Looking good! Just got 5 story updates, none of them got downloaded, but a new story got its image. |
Nice! I've got a couple of small bug fixes queued for a release, so this will round it out pretty well. |
I've noticed since late December that my site is sideloading several extra copies of the 'primary' image for stories. I've gone from sideloading about 3-5GB of images from NPR stories per month to around 15GB. @jwcounts
I need to do some further review, but I think it might have something to do with the changes around this block of code in NPRAPIWordpress.php starting at line 393:
I'm guessing what's happening is that the previous filename matching logic is breaking somehow, so every time there's a revision to a story the new image gets created.
This could well be some issue on the API end as well -- I was on vacation the week we started seeing this behavior, so a code update making its way through seems... unlikely.