@ikreymer Sorry, this is not a wombat issue at all; I don't know what I was thinking when I opened it. Could you move it to webrecorder/browsertrix-crawler and fix the title, which is wrong?
By default (when the `autofetch` behavior is enabled, if I'm not mistaken), the crawler automatically fetches the images listed in the `srcset` of `<img>` tags so that all resolutions are available in the WARC.

However, this does not seem to cover the case where a `srcset` condition matches: the browser then loads a `srcset` candidate instead of the `src`, so the image's `src` is never fetched, which breaks the image under some conditions.

Sample website: https://enciclopedia.banrepcultural.org/index.php?title=Delcy_Morelos_Sandoval
Sample WARC: crawl-enciclopedia-banrep-onepage-20240930.warc.gz (in this WARC, images display only at a DPR of 1.5 or above; at DPR 1, all images are broken)
HTML source code causing the issue:
Since I crawled with `--mobileDevice "Pixel 2"`, `images/b/b8/Avatar-mujer.jpg` was automatically fetched by the browser, but the autofetch behavior never seems to have fetched `images/thumb/b/b8/Avatar-mujer.jpg/300px-Avatar-mujer.jpg`.

Full crawl command:
Note: I'm not sure this website's HTML is 100% valid per the spec. In general, I see the `img` `src` repeated in `srcset` as well, but I didn't find anything in the spec about this (is it just a good practice, to avoid situations like this one, or a spec requirement?).
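A minimal TypeScript sketch of why the `src` still matters, assuming simplified `srcset` parsing (no commas inside URLs) and a simplified density-only selection rule (real browsers are free to choose differently). Under the WHATWG HTML source-set rules, a non-empty `src` is treated as an implicit 1x candidate when `srcset` has no 1x entry and no width descriptors, so a DPR 1 client can select it even though a high-DPR crawl never loaded it. The function names and image filenames below are hypothetical; this is not browsertrix-crawler's actual autofetch code.

```typescript
// A srcset candidate: a URL plus either a density ("1.5x") or width ("300w") descriptor.
interface Candidate {
  url: string;
  density?: number;
  width?: number;
}

// Simplified srcset parser (hypothetical helper): splits on commas, then whitespace.
// Unlike the full HTML spec algorithm, it does not handle URLs containing commas.
function parseSrcset(srcset: string): Candidate[] {
  const candidates: Candidate[] = [];
  for (const part of srcset.split(",")) {
    const [url, descriptor] = part.trim().split(/\s+/);
    if (!url) continue;
    if (!descriptor) {
      candidates.push({ url, density: 1 }); // no descriptor means 1x
    } else if (descriptor.endsWith("x")) {
      candidates.push({ url, density: parseFloat(descriptor) });
    } else if (descriptor.endsWith("w")) {
      candidates.push({ url, width: parseInt(descriptor, 10) });
    }
  }
  return candidates;
}

// Source set per the HTML spec rule: a non-empty src becomes an implicit 1x
// candidate if srcset has neither a 1x candidate nor any width descriptor.
function sourceSet(src: string, srcset: string): Candidate[] {
  const set = parseSrcset(srcset);
  const has1x = set.some((c) => c.density === 1);
  const hasWidth = set.some((c) => c.width !== undefined);
  if (src && !has1x && !hasWidth) set.push({ url: src, density: 1 });
  return set;
}

// Simplified selection (assumption): smallest density >= DPR, else the largest one.
function pickForDpr(set: Candidate[], dpr: number): string | undefined {
  const dense = set
    .filter((c) => c.density !== undefined)
    .sort((a, b) => a.density! - b.density!);
  const match = dense.find((c) => c.density! >= dpr) ?? dense[dense.length - 1];
  return match?.url;
}

// URLs a fetch-everything behavior would need to archive: every srcset URL plus src.
function urlsToFetch(src: string, srcset: string): string[] {
  const urls = new Set(parseSrcset(srcset).map((c) => c.url));
  if (src) urls.add(src);
  return [...urls];
}

// Hypothetical markup, not the actual page:
// <img src="small.jpg" srcset="medium.jpg 1.5x, large.jpg 2x">
const src = "small.jpg";
const srcset = "medium.jpg 1.5x, large.jpg 2x";
const set = sourceSet(src, srcset);
console.log(pickForDpr(set, 1)); // small.jpg (the implicit 1x candidate from src)
console.log(pickForDpr(set, 2.625)); // large.jpg
console.log(urlsToFetch(src, srcset)); // [ 'medium.jpg', 'large.jpg', 'small.jpg' ]
```

In this sketch, a crawl at a high DPR only ever loads `large.jpg`, while replay at DPR 1 asks for `small.jpg`; fetching `src` alongside every `srcset` URL is what keeps both cases working.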