monolith is not embedding SVG files correctly #289

Closed
scubanarc opened this issue Jul 9, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@scubanarc

Monolith is capable of embedding SVG files in the output HTML, but when I hoard a page with Hoarder that includes SVG images, the monolith output is broken.

Here's an example page that fails:

https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

If I grab this page with Hoarder, the archive is broken. Anywhere that there was an SVG there is a broken image icon.

However, if I grab it with monolith manually, the output HTML file is correct. Here's my monolith command line:

monolith -o svgtest.htm https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

My testing was done with monolith 2.8.1, downloaded directly from their GitHub releases here:

https://github.com/Y2Z/monolith/releases/download/v2.8.1/monolith-gnu-linux-x86_64
@scubanarc
Author

More info...

I entered the worker container and ran the monolith command above. From within the worker container, the SVG files are not pulled correctly. From outside the container, the SVG files are pulled and embedded correctly.

Inside the container, the resulting HTML file is 14.2 MB, while outside the container the resulting HTML file is 17.6 MB.

The monolith logs from inside and outside the container are identical.

@kamtschatka
Collaborator

Hm, I just ran the same test and it works fine for me.
The worker has monolith version 2.8.1, as you said:

/app/apps/workers # monolith --version
monolith 2.8.1

The file size is 17.58 MB after downloading it in the container.
Is it working for you in the meantime? If not, maybe there is a network or firewall issue on your side?

@scubanarc
Author

I just ran the command in the worker container again and it worked this time. I got the full 17.6 MB capture from inside the container, which is different from a few days ago.

So I deleted my capture in Hoarder and recreated it. The problem persists. That's got me scratching my head.

I was able to delete the asset.bin and replace it with my manually captured "svgtest.htm" file (renamed) just to make sure that it wasn't a render issue, and it renders just fine as if it was captured correctly.

It's possibly a network/firewall issue, but I'm a network person and we are having no other issues that I can detect.

Can you try hoarding that page through the web interface and see if you get the full 17.6 MB? If you do, then I'll know that I'm having a local issue.

@kamtschatka
Collaborator

kamtschatka commented Jul 13, 2024

OK, I tried it out and can confirm that the images are not shown correctly in Hoarder.

The code shows that this is the command line used:

monolith - -Ije -t 5 -b ${baseUrl} -o ${assetPath}

I modified your command to this:

monolith -Ije -t 5 -o svg.html https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

This also works fine (please double-check) and the file contains the proper images. (I did not provide the baseUrl, but I think that is fine.)

The thing is that the code actually passes the HTML from the previous crawling step into monolith instead of providing the URL, which seems to cause this issue.
Once you have confirmed that the above command works fine for you as well, I can dig a bit deeper into this and ask @MohamedBassem why it was implemented like this.

Edit: OK, I tried piping the HTML of the page into monolith directly, and the outcome is different: the SVGs are no longer captured. I guess we should simply make a new request to the page to get all the resources properly.

@MohamedBassem
Collaborator

> The thing is that the code actually passes the HTML from the previous crawling step into monolith instead of providing the URL, which seems to cause this issue.

Monolith doesn't execute JavaScript. So if you have a pure SPA, monolith will see an empty page. That's why you want Chrome to first run the JavaScript and load the page, then pass the final HTML to monolith.

@kamtschatka
Collaborator

OK, turns out this is caused by the base URL (the -b argument) we are passing to monolith.
Currently we pass https://musictheory.pugetsound.edu, which causes it not to work, but if we pass https://musictheory.pugetsound.edu/mt21c, it works.

Gotta figure out a way to pass the correct path to it.
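
To illustrate why the base matters, here is a minimal sketch using the standard `URL` constructor. The relative path `images/example.svg` is a hypothetical stand-in for an image reference on the page, and `resolveAgainst` is a made-up helper name. This shows generic URL resolution only; how monolith resolves references internally is not shown in this thread.

```typescript
// Hypothetical relative reference, standing in for an <img src="..."> on the page.
const relativeSrc = "images/example.svg";

// The site origin (the base reported as failing) and the full page URL.
const originOnly = "https://musictheory.pugetsound.edu";
const fullPageUrl =
  "https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html";

// Resolve a relative reference against a base using the standard URL rules.
function resolveAgainst(base: string, relative: string): string {
  return new URL(relative, base).href;
}

const fromOrigin = resolveAgainst(originOnly, relativeSrc);
const fromPage = resolveAgainst(fullPageUrl, relativeSrc);

console.log(fromOrigin); // https://musictheory.pugetsound.edu/images/example.svg
console.log(fromPage); // https://musictheory.pugetsound.edu/mt21c/images/example.svg
```

With the origin as the base, the `/mt21c/` directory is lost, so a relative image reference would point at a location that most likely does not exist.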

MohamedBassem added the bug label Jul 13, 2024
@scubanarc
Author

scubanarc commented Jul 13, 2024

monolith -Ije -t 5 -o svg.html https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

Yes, this works fine from inside the container.

Strangely this file is smaller, but still complete.

kamtschatka added a commit to kamtschatka/hoarder-app that referenced this issue Jul 14, 2024
passing in the URL of the page to have the proper URL for resolving relative paths
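
A rough sketch of what the commit message describes: using the page's full URL as the `-b` value rather than only the site origin. The flags are taken from the command line quoted earlier in this thread; the helper name `buildMonolithArgs` and the output path are assumptions for illustration, and this is not the code from the referenced commit.

```typescript
// Hypothetical helper, not Hoarder's actual code: builds the argument list for
// a monolith run that reads already-rendered HTML from stdin ("-").
function buildMonolithArgs(pageUrl: string, assetPath: string): string[] {
  // Throws a TypeError if pageUrl is not an absolute URL.
  const base = new URL(pageUrl);
  // Pass the full page URL as the base so relative paths resolve against it.
  return ["-", "-Ije", "-t", "5", "-b", base.href, "-o", assetPath];
}

// Example values; the output path is made up for illustration.
const exampleArgs = buildMonolithArgs(
  "https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html",
  "/tmp/asset.html",
);
console.log(["monolith", ...exampleArgs].join(" "));
```

Keeping the arguments as an array, rather than one interpolated string, avoids shell-quoting problems if the URL contains characters such as `&` or `?`.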