PdfEmitter, Enhancement: Enable options to set the PDF/Version & archiving formats PDF/A (A1A & A1B) #1486

Closed
speckyspooky opened this issue Nov 5, 2023 · 13 comments · Fixed by #1487
Labels
Enhancement Small change to improve the current supported functionality

@speckyspooky
Contributor

The target is to support different PDF versions and the PDF archive formats PDF/A (A1A & A1B).

Based on openPDF version 1.3.30, the following features will be available:
• setting of the PDF version: 1.3 to 1.7
• setting of the PDF/A version: A1A & A1B
• setting of the PDF/X version: PDF.X32002

The default behavior of PDF creation won't be changed.
The new options are activated only by configuring the corresponding user properties.

The following user properties will be available on the PdfEmitter:

PdfEmitter.Version, String, enumeration
• supported values: 1.3, 1.4, 1.5, 1.6, 1.7
• default: 1.5

PdfEmitter.Conformance, String, enumeration
• supported values: PDF.A1A, PDF.A1B, PDF.X32002
• default: PDFXNONE
• PDF.A1A & PDF.A1B override the configured PDF version with PDF version 1.4

PdfEmitter.IccProfileFile, String, file path
• path to the ICC color profile
• default: sRGB IEC61966-2.1

PdfEmitter.IccColorType, String, enumeration
• supported values: RGB, CMYK
• default: RGB

PdfEmitter.PDFA.AddDocumentTitle, Boolean
• false: omit the title from the document / true: add the title to the document (with openPDF 1.3.30 the PDF/A will be invalid)
• default: false
Workaround for an openPDF 1.3.30 issue when creating the PDF/A title: the tag "dc:title" is invalid because the language attribute is missing on the title content.

PdfEmitter.PDFA.FallbackFont, String, font file name
• an alternative installed font (given by its font file name) which can be used instead of a font that cannot be embedded
• default: null
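
To illustrate how the version and conformance properties above interact, here is a minimal Java sketch. The class `PdfVersionResolver` and its method are hypothetical and are not part of BIRT or openPDF; only the property names, the default 1.5, and the rule that PDF.A1A and PDF.A1B force version 1.4 are taken from the list above. Falling back to the default for an unsupported version value is an assumption.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not BIRT code): resolves the effective PDF version. */
final class PdfVersionResolver {
    static final String DEFAULT_VERSION = "1.5";
    static final Set<String> SUPPORTED_VERSIONS = Set.of("1.3", "1.4", "1.5", "1.6", "1.7");

    static String effectiveVersion(Map<String, String> userProperties) {
        String conformance = userProperties.getOrDefault("PdfEmitter.Conformance", "PDFXNONE");
        // PDF.A1A and PDF.A1B override the configured version with 1.4.
        if (conformance.equals("PDF.A1A") || conformance.equals("PDF.A1B")) {
            return "1.4";
        }
        String version = userProperties.getOrDefault("PdfEmitter.Version", DEFAULT_VERSION);
        // Assumption: an unsupported value falls back to the default.
        return SUPPORTED_VERSIONS.contains(version) ? version : DEFAULT_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(effectiveVersion(Map.of("PdfEmitter.Version", "1.7")));
        System.out.println(effectiveVersion(
                Map.of("PdfEmitter.Version", "1.7", "PdfEmitter.Conformance", "PDF.A1B")));
    }
}
```

With these inputs, a configured version of 1.7 is kept for a plain PDF but replaced by 1.4 as soon as a PDF/A-1 conformance level is requested.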

The PDF/A results are tested and validated with 3 different PDF/A validators:
• pdfforge.org
• avepdf.com
• veraPDF

For testing, the results of 3 basic test cases were used, incl. images (direct & background images).
The basic tests are:
• Version: PDF/A1A
• Version: PDF/A1A with fallback font of unembeddable fonts
• Version: PDF/A1B with fallback font of unembeddable fonts
(• Version: PDF with configured PDF-version is tested and validated with the AdobeReader details dialog.)

First results of the new options: PDF/A1A with fallback font is a valid PDF/A document:

The following test reports are attached:

• 4 test reports of:

  • Version: PDF with configured version 1.7
  • Version: PDF/A1A
  • Version: PDF/A1A with fallback font of unembeddable fonts
  • Version: PDF/A1B with fallback font of unembeddable fonts
    pdf_enhancement_test_reports.zip

The following documentation is attached:

• 4 pdf documents of

  • Version: PDF with configured version 1.7
  • Version: PDF/A1A
  • Version: PDF/A1A with fallback font of unembeddable fonts
  • Version: PDF/A1B with fallback font of unembeddable fonts
    pdf_enhancement_documents.zip

The following PDF/A validation documentation is attached:

• 3 screenshots, one per PDF/A validator, of the results for the PDF/A documents

@speckyspooky speckyspooky added this to the 4.14 milestone Nov 5, 2023
@speckyspooky speckyspooky added the Enhancement Small change to improve the current supported functionality label Nov 5, 2023
@speckyspooky speckyspooky self-assigned this Nov 5, 2023
speckyspooky added a commit to speckyspooky/birt that referenced this issue Nov 5, 2023
@hvbtup
Contributor

hvbtup commented Nov 6, 2023

This is definitely a step in the right direction.

I'm not sure about some details, though.

First of all, would it be possible to also create PDF/A-3 (or at least in a later step)?
That would simplify creating ZUGFeRD aka Factur-X invoices (ATM I am using JavaScript to create an invoice XML file on the fly while generating the PDF report, and then use a post-processing step to convert the PDF and XML files to a ZUGFeRD file). ZUGFeRD is based on PDF/A-3.

Next, I'm not sure about PDF/A-1a (EDITED: correct is 1a (a for accessible) instead of 1b (b for basic)).
Do you really create a tagged PDF or just a dummy tag structure?
Or are the validators just not properly checking this?
You know, PDF/UA also requires a properly tagged PDF.

If the PDF is not properly tagged, it would be better to not allow declaring the PDF to be PDF/A-1a conformant.

Regarding the title and language:
I think defining a language property (which would be like an inheritable CSS property) would be a good idea.
In the easiest case, one would define the language at the report level.

Regarding font handling:
I think when the fallback font actually has to be used, that is an issue with the report design, because the report will not look as intended. There should be an option to treat this as an error and add this to the error list (which allows the caller to detect this problem).
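
The suggestion above could be sketched roughly as follows. This is only an illustration in Java: `FallbackFontResolver`, its methods, and the font names are hypothetical and do not reflect how the PdfEmitter actually resolves fonts. The idea shown is that every substitution is recorded in a list the caller can inspect afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: substitutes a fallback font and records each substitution. */
final class FallbackFontResolver {
    private final Set<String> embeddableFonts;
    private final String fallbackFont; // may be null, like the property default
    private final List<String> errors = new ArrayList<>();

    FallbackFontResolver(Set<String> embeddableFonts, String fallbackFont) {
        this.embeddableFonts = embeddableFonts;
        this.fallbackFont = fallbackFont;
    }

    /** Returns the font to use, or null if nothing usable is available. */
    String resolve(String requestedFont) {
        if (embeddableFonts.contains(requestedFont)) {
            return requestedFont;
        }
        if (fallbackFont == null) {
            errors.add("Font '" + requestedFont + "' cannot be embedded and no fallback is configured");
            return null;
        }
        // The report will not look as designed, so report it to the caller.
        errors.add("Font '" + requestedFont + "' cannot be embedded; using '" + fallbackFont + "'");
        return fallbackFont;
    }

    /** Read-only copy of the recorded problems. */
    List<String> errors() {
        return List.copyOf(errors);
    }
}
```

A caller could then decide after rendering whether a non-empty error list should fail the run or only be logged.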

@speckyspooky
Contributor Author

I'm aware of the topic of PDF/A-3 and the ZUGFeRD invoice method of German government agencies.
So I checked all these details beforehand.

  • From my side this change is only a first step, and it opens the option for further PDF/A levels.

  • openPDF 1.3.30 supports PDF/A-1a & -1b; currently there is no PDF/A-3 support.

  • openPDF tagged PDF: this is the standard mechanism of openPDF (real structure, no dummy, no special implementation here)

  • The validator checks if the structure is a valid tagged structure. If the structure is not given then it isn't a valid PDF/A.

  • The language of "dc:title" couldn't be set by default with CSS properties at report level, because there are special values according to the PDF definition. But the main issue is that there is no option to write the XMP metadata (= XML structure) in the right way. I have done it, but the problem is that the structure includes the PDF creation date and the PDF modification date, and the validation rules require these values to be equal to the date stamps of the PDF document (which makes sense).
    But the datetime stamps cannot be kept in sync: if I write the XMP metadata myself, its stamps differ from those of the PDF document, because the document is created later, and currently there is no simple way to set the datetime stamps directly.
    All documented mechanisms are based on a physically stored file whose properties can be overwritten (post-processing).

  • Fallback font: Yes, responsibility for the fonts lies with the report developer. But during development and for the migration of existing reports it helps to implement and migrate the reports. This value isn't set by default, so you would get an invalid PDF with the standard stack trace in the PDF document (at text level, not like PDF-written text).
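
The date-stamp point above can be made concrete with a small Java sketch. It only shows the two textual forms involved, a PDF date string for the document information and an ISO 8601 string for the XMP metadata, both derived from one captured instant so they cannot drift apart. The class `PdfDateSync` and its methods are hypothetical; how such values could be handed to openPDF is not shown, and according to the comment above there is no simple way to do so in 1.3.30.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical sketch: formats one instant in the PDF and the XMP date notation. */
final class PdfDateSync {
    private static final DateTimeFormatter PDF_BASE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss", Locale.ROOT);
    private static final DateTimeFormatter XMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX", Locale.ROOT);

    /** PDF date string, e.g. D:20231105120000+01'00'. */
    static String toPdfDate(OffsetDateTime t) {
        String base = t.format(PDF_BASE);
        int totalMinutes = t.getOffset().getTotalSeconds() / 60;
        if (totalMinutes == 0) {
            return "D:" + base + "Z";
        }
        char sign = totalMinutes < 0 ? '-' : '+';
        int abs = Math.abs(totalMinutes);
        return String.format(Locale.ROOT, "D:%s%c%02d'%02d'", base, sign, abs / 60, abs % 60);
    }

    /** XMP (ISO 8601) date string, e.g. 2023-11-05T12:00:00+01:00. */
    static String toXmpDate(OffsetDateTime t) {
        return t.format(XMP);
    }

    public static void main(String[] args) {
        // Capture the instant once and reuse it for both representations.
        OffsetDateTime created = OffsetDateTime.now().withNano(0);
        System.out.println(toPdfDate(created));
        System.out.println(toXmpDate(created));
    }
}
```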

@hvbtup
Contributor

hvbtup commented Nov 6, 2023

The validator checks if the structure is a valid tagged structure.
openPDF tagged PDF: this is the standard mechanism of openPDF (real structure, no dummy, no special implementation here)

I cannot believe this.

AFAIK calling writer.setTagged() merely declares that the document is tagged, but it doesn't actually add the tag structure.
Otherwise we could have PDF/UA "for free" (see #1234).

You can use the PAC 2021 (PDF Accessibility Checker) to actually see the tag structure.
https://pdfua.foundation/de/pac-2021-der-kostenlose-pdf-accessibility-checker/

@speckyspooky
Contributor Author

I have not said that the structure will be a full representation of all elements which are included in the PDF document.
The PDF document will be marked as tagged, and the master structure will be created internally (no dummy).
Yes, there would be more to do to have everything tagged in the end.

But this is not my target.
The target is to get an option on BIRT that we can create PDF/A (1a & 1b) as a standard function.
My target currently is not to create PDF/A-3 and not to create PDF/UA.

These topics are steps for the next level.

@hvbtup
Contributor

hvbtup commented Nov 6, 2023

Even though the validators say that the generated PDF is valid PDF/A-1a, this is not the case.
The "a" stands for accessibility and this requires a complete tag structure (just as in PDF/UA).
https://www.pdfa.org/wp-content/until2016_uploads/2011/08/das_bessere_pdfa-1b.pdf

If we add an option to create (seemingly) PDF/A-1a, people would of course think that BIRT supports PDF/A-1a.
But this isn't the case yet.

This topic is very important for the future of BIRT and thus I think the output should not "lie" by stating it's PDF/A-1a conformant when it isn't really.

So, in a first step we should only support PDF/A-1b ("b" for basic profile).

Adding support for PDF/A-1a (and for PDF/UA, by the way) will be easy once we can create tagged PDF.

@speckyspooky
Contributor Author

@hvbtup
But this isn't the same. PDF/A1a is different from PDF/UA. All PDF/UA documents are valid PDF/A1a documents, but not all PDF/A1a documents are valid PDF/UA.

According to your explanations, PDF/UA conformance needs more details and the fully tagged structure.
I used your tool, and one finding was that the "dc:title" isn't listed. But this is only a requirement for PDF/UA conformance, not for PDF/A1a conformance.

The PDF/UA standard is a subset of PDF/A1a, and PDF/UA is more restrictive than PDF/A1a.
So it isn't correct to say that the PDF/A1a isn't valid.

I also checked a PDF/A1a with the wrong "dc:title" definition, i.e. with the title added but without the language attribute.
All 3 validators report an error because the title definition is wrong. Your PDF/UA checker reports that all is fine.
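
For reference, XMP defines "dc:title" as a language alternative, so each title entry is an rdf:li element carrying an xml:lang attribute (commonly "x-default"). A minimal Java sketch of the expected fragment follows; the class `XmpTitle` is hypothetical, it only builds the string and does not write it into a PDF, and the exact markup that openPDF 1.3.30 emits is not reproduced here.

```java
/** Hypothetical sketch: builds a dc:title fragment with the language attribute. */
final class XmpTitle {
    /** Escapes the characters that are significant in XML element content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns the title as an rdf:Alt with a single x-default entry. */
    static String dcTitle(String title) {
        return "<dc:title><rdf:Alt>"
                + "<rdf:li xml:lang=\"x-default\">" + escapeXml(title) + "</rdf:li>"
                + "</rdf:Alt></dc:title>";
    }

    public static void main(String[] args) {
        System.out.println(dcTitle("Sales & Orders"));
    }
}
```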

Currently my point of view is that the PDF/A1a version is a valid option.
Yes, it is not PDF/UA conformant, because for that there are some open points.

But you can test the created PDF/A1a documents from my demo zip file, and you will find
that 3 error types are reported for PDF/UA: "dc:title" is unset, "DisplayDocTitle" is unset, the PDF/UA identifier is not found, and an operation "cm" was used which is not allowed. This is correct for PDF/UA but not required for PDF/A1a.

@speckyspooky
Contributor Author

I have tested some further PDF/A-validators and here is a result which represents the difference between PDF/A-1a & PDF/UA-1. So I think our PDF/A-1a based on openPDF is a valid option.

PDF-A-1a-validation

Additional tested validators

@hvbtup
Contributor

hvbtup commented Nov 7, 2023

I don't own a copy of the EN ISO 19005 standard, so probably you are right and it only means that the reading order is OK.

I tested your document pdf_enhancement_version_PDF_A1A_FBfont.pdf with an evaluation version of a commercial tool from the Swiss PDF Tools AG if it conforms to PDF/A-1a.

This tool says:

- PDF: Graphics operator cm is not allowed in text object. (content of page on page 1)
Document conforms to PDF/A-1a: false

The validator does not complain about the missing tag structure, although this structure is empty (as can be seen with PAC 2021), so that's ok for me.

The same test result also for PDF/A-1b conformance.
Other validators I used earlier (when I used Apache PDFBox to convert the BIRT PDF to PDF/A-1b) did not say anything about the cm operator, so I guess this is a minor issue in reality.

Probably many documents claiming PDF/A1-a (or PDF/A1-b) conformance aren't actually conformant, so while we know that there are issues left (like the cm operator and the title), I'm fine with your PR.

@hvbtup
Contributor

hvbtup commented Nov 7, 2023

@wimjongman
If we mention this in the release notes for 4.14, we must make it clear that the generated PDFs are not really accessible.

@wimjongman
Contributor

This issue is the release note.

@speckyspooky
Contributor Author

@hvbtup & @wimjongman

I prefer that we add the new option to the release notes and additionally note that the PDF/A* documents are generated with "openPDF version 1.3.30", so it is fully transparent which library is used to generate the PDF/A.

I agree that we should add the hints: PDF/A1b = ok, PDF/A1a with limitations based on openPDF.

In the end, my past experience and yours, Henning, show that PDF/A is complex, and one of the main issues is having a good PDF/A generator and also a good validator.

I tested with "https://www.pdf-online.com/osa/validate.aspx" again, and it reports the "cm" operator and the issue with "dc:title".

So we now have 9 validators for PDF/A1a: 7 times ok, 1 time failed due to "cm", 1 time failed due to "cm" and "dc:title" (if it is used). And all these differences for a standard which was published in 2005.

@merks
Contributor

merks commented Nov 7, 2023

🙈

@hvbtup
Contributor

hvbtup commented Nov 8, 2023

One of the main problems with standards like EN ISO 19005 is that they are not open source.
So you have to pay to read them.
Many developers are not willing to or cannot afford to do this (particularly in the open source ecosystem), and so they don't know the exact content of the standards.
Apart from that, writing a validator that validates e.g. the tag structure is complicated.
I think that's why different software producing PDF or validating PDF shows different results.

For example, I was not able to find a statement that cm is not allowed or an explanation why cm is not allowed in PDF/A in the public internet.

speckyspooky added a commit that referenced this issue Nov 8, 2023
…hiving formats PDF/A (A1A & A1B) #1486  (#1487)

PdfEmitter, Enhancement: Enable options to set the PDF/Version & archiving formats PDF/A (A1A & A1B) (#1486)
speckyspooky added a commit to speckyspooky/birt that referenced this issue Nov 11, 2023
speckyspooky added a commit that referenced this issue Nov 12, 2023
…s, static constants access) (#1486) (#1495)

* Fixing different eclipse warnings (missing comments, unused parameters, static constants access) (#1486)
4 participants