HTML Input for PDF Generation #1455

Merged
merged 5 commits into from
Sep 5, 2024
9 changes: 9 additions & 0 deletions pom.xml
@@ -92,12 +92,21 @@
<artifactId>utils-mail-dkim</artifactId>
<version>2.0.1</version>
</dependency>

<!-- Used to generate PDFs -->
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-pdf</artifactId>
<version>9.8.0</version>
</dependency>

<!-- Used to clean HTML before generating PDF -->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.17.2</version>
</dependency>

<!-- POI is used to generate excel exports -->
<dependency>
<groupId>org.apache.poi</groupId>
@@ -8,6 +8,9 @@

package sirius.web.templates.pdf;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;
import org.xhtmlrenderer.pdf.ITextRenderer;
import sirius.kernel.di.std.Part;
import sirius.kernel.di.std.Register;
@@ -53,14 +56,38 @@ public boolean generate(Generator generator, OutputStream out) throws Exception
ITextRenderer renderer = new ITextRenderer();
renderer.getSharedContext()
.setReplacedElementFactory(new ImageReplacedElementFactory(renderer.getOutputDevice()));
- renderer.setDocumentFromString(content);
+ renderer.setDocumentFromString(cleanHtml(content));
Member


Optional: Could this be made opt-in? I don't know whether it could break stuff in S2.

Member Author


Hmm, how? As an additional pseudo-parameter in the context? As a system-wide setting? Neither seems particularly elegant 😇

Let's see what S2 is going to say about this PR, okay? We can also flag it as breaking, but it should process all previously valid input as before, and only change the behaviour for faulty input.
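
To make the "faulty input" case concrete, here is a minimal sketch of why HTML that browsers accept can still fail in a strict XML parser. It uses the JDK's own SAX parser as a stand-in and does not reproduce how `ITextRenderer` configures its parser. The class name `StrictXmlCheck` and the method `isWellFormed` are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of this PR: checks whether markup is well-formed XML.
final class StrictXmlCheck {

    static boolean isWellFormed(String markup) {
        try {
            // DefaultHandler rethrows fatal errors, so malformed input ends up in the catch block.
            SAXParserFactory.newInstance()
                            .newSAXParser()
                            .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The markup is not well-formed XML.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML: the void element <br> is never closed.
        String html = "<html><body><p>Hello<br></p></body></html>";
        // The same content written as XHTML: <br /> is self-closing.
        String xhtml = "<html><body><p>Hello<br /></p></body></html>";

        System.out.println(isWellFormed(html));  // prints "false"
        System.out.println(isWellFormed(xhtml)); // prints "true"
    }
}
```

Input that is already well-formed XHTML passes such a parser as it is, which is consistent with the expectation that only faulty input is affected by the cleaning step.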

renderer.layout();
renderer.createPDF(out);
out.flush();

return true;
}

/**
* Cleans the given HTML content for use as input to the PDF generator.
* <p>
* This is done by first removing all {@code <script>} elements from the entire document. Then, all
* {@code <style>} elements outside the {@code <head>} element are deleted. Finally, the DOM tree
* is encoded as XHTML fit for the strict SAX parser employed by {@link ITextRenderer}.
*
* @param html the HTML content to clean
* @return the given content with problematic elements removed and encoded as valid XHTML
*/
private String cleanHtml(String html) {
Document document = Jsoup.parse(html);

// in theory, the following two lines should be possible with a single CSS selector; however, in practice, Jsoup
// does not select the style elements correctly when attempting that
document.select("script").remove();
document.select("body style").remove();

document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
document.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
document.outputSettings().charset("UTF-8");
return document.html();
}

@Override
public int getPriority() {
return DEFAULT_PRIORITY;