PdfEmitter, Enhancement: Enable options to set the PDF/Version & archiving formats PDF/A (A1A & A1B) #1486

speckyspooky · 2023-11-05T23:04:36Z

The target is to support different PDF-versions and PDF archive formats PDF/A (A1A & A1B)

Based on openPDF version 1.3.30, the following features will be available:
• setting of the PDF version: 1.3 to 1.7
• setting of the PDF/A conformance level: A1A & A1B
• setting of the PDF/X conformance level: PDF.X32002

The default behaviour of the PDF creation won't change.
The new options are only activated when the corresponding user properties are configured.

The following user properties will be available on the PdfEmitter:

PdfEmitter.Version, String, enumeration
• supported values: 1.3, 1.4, 1.5, 1.6, 1.7
• default: 1.5

PdfEmitter.Conformance, String, enumeration
• supported values: PDF.A1A, PDF.A1B, PDF.X32002
• default: PDFXNONE
• PDF.A1A & PDF.A1B override the configured PDF version with PDF version 1.4
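
The interplay of the two properties above can be sketched as follows. This is only a minimal illustration of the rule "PDF/A-1 forces version 1.4", not the actual PdfEmitter code: the class `PdfEmitterOptionSketch`, the method `effectiveVersion` and the choice to reject unknown values with an exception are assumptions made for this example. Only the property names, the supported values and the default 1.5 are taken from the description above.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of BIRT: resolves the PDF version that would be written. */
final class PdfEmitterOptionSketch {

    static final Set<String> VERSIONS = Set.of("1.3", "1.4", "1.5", "1.6", "1.7");
    static final Set<String> CONFORMANCES = Set.of("PDF.A1A", "PDF.A1B", "PDF.X32002");

    static String effectiveVersion(Map<String, String> userProperties) {
        // Default version 1.5 as listed for PdfEmitter.Version.
        String version = userProperties.getOrDefault("PdfEmitter.Version", "1.5");
        String conformance = userProperties.get("PdfEmitter.Conformance");

        // Assumption: unknown values are rejected; the real emitter may ignore them instead.
        if (!VERSIONS.contains(version)) {
            throw new IllegalArgumentException("Unsupported PDF version: " + version);
        }
        if (conformance != null && !CONFORMANCES.contains(conformance)) {
            throw new IllegalArgumentException("Unsupported conformance: " + conformance);
        }

        // PDF.A1A and PDF.A1B override the configured version with 1.4.
        if ("PDF.A1A".equals(conformance) || "PDF.A1B".equals(conformance)) {
            return "1.4";
        }
        return version;
    }
}
```

For example, a report configured with PdfEmitter.Version = 1.7 and PdfEmitter.Conformance = PDF.A1B would end up as a PDF 1.4 document in this sketch, while the same report without a conformance setting keeps 1.7.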

PdfEmitter.IccProfileFile, String, file path
• path to the ICC color profile
• default: sRGB IEC61966-2.1

PdfEmitter.IccColorType, String, enumeration
• supported values: RGB, CMYK
• default: RGB

PdfEmitter.PDFA.AddDocumentTitle, Boolean
• false: do not add the title to the document / true: add the title to the document (with openPDF 1.3.30 the PDF/A will then be invalid)
• default: false
• Workaround for an openPDF 1.3.30 issue when creating the PDF/A title: the tag "dc:title" is invalid because the language attribute is missing on the title content

PdfEmitter.PDFA.FallbackFont, String, font file name
• name of an alternative installed font which can be used instead of a font that cannot be embedded
• default: null

The PDF/A results are tested and validated with 3 different PDF/A validators:
• pdfforge.org
• avepdf.com
• veraPDF

For the testing, the results of 3 different basic test cases were used, including images (direct & background images).
The basic tests are:
• Version: PDF/A1A
• Version: PDF/A1A with fallback font for unembeddable fonts
• Version: PDF/A1B with fallback font for unembeddable fonts
• (Version: PDF with configured PDF version, tested and validated with the Adobe Reader details dialog.)

First results of the new options, PDF/A1A with fallback font to be a valid PDF/A document:

PDF-document
pdf_enhancement_version_PDF_A1A_FBfont.pdf
Validation screen of "veraPDF"

The following test reports are attached:

• 4 test reports of:

Version: PDF with configured version 1.7
Version: PDF/A1A
Version: PDF/A1A with fallback font of unembeddable fonts
Version: PDF/A1B with fallback font of unembeddable fonts
pdf_enhancement_test_reports.zip

The following documents are attached:

• 4 pdf documents of

Version: PDF with configured version 1.7
Version: PDF/A1A
Version: PDF/A1A with fallback font of unembeddable fonts
Version: PDF/A1B with fallback font of unembeddable fonts
pdf_enhancement_documents.zip

The following PDF/A validation documentation is attached:

• 3 screenshots per PDF/A validator showing the results for the PDF/A documents

Version: PDF/A1A
Version: PDF/A1A with fallback font of unembeddable fonts
Version: PDF/A1B with fallback font of unembeddable fonts
pdf_enhancement_validation_documentation.zip


hvbtup · 2023-11-06T08:52:26Z

This is definitely a step in the right direction.

I'm not sure about some details, though.

First of all, would it be possible to also create PDF/A-3 (or at least in a later step)?
That would simplify creating ZUGFeRD aka Factur-X invoices (ATM I am using JavaScript to create an invoice XML file on the fly while generating the PDF report and then use a post-processing step to convert the PDF and XML files to a ZUGFeRD file). ZUGFeRD is based on PDF/A-3.

Next, I'm not sure about PDF/A-1a (EDITED: correct is 1a (a for accessible) instead of 1b (b for basic)).
Do you really create a tagged PDF or just a dummy tag structure?
Or are the validators just not properly checking this?
You know, PDF/UA also requires a properly tagged PDF.

If the PDF is not properly tagged, it would be better to not allow declaring the PDF to be PDF/A-1a conformant.

Regarding the title and language:
I think defining a language property (which would be like an inheritable CSS property) would be a good idea.
In the easiest case, one would define the language at the report level.

Regarding font handling:
I think when the fallback font actually has to be used, that is an issue with the report design, because the report will not look as intended. There should be an option to treat this as an error and add this to the error list (which allows the caller to detect this problem).
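
A minimal sketch of what such an option could look like, assuming a hypothetical resolver class (`FallbackFontSketch`) and a hypothetical flag `treatFallbackAsError`; neither exists in BIRT, and the font names in the usage note are placeholders. It only illustrates the idea that using the fallback font is recorded in an error list the caller can inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch, not BIRT code: picks a font and records fallback usage as an error. */
final class FallbackFontSketch {

    private final Set<String> embeddableFonts;
    private final String fallbackFont;           // may be null (the documented default)
    private final boolean treatFallbackAsError;  // assumed option, not an existing property
    private final List<String> errors = new ArrayList<>();

    FallbackFontSketch(Set<String> embeddableFonts, String fallbackFont, boolean treatFallbackAsError) {
        this.embeddableFonts = embeddableFonts;
        this.fallbackFont = fallbackFont;
        this.treatFallbackAsError = treatFallbackAsError;
    }

    /** Returns the font to use, or empty if the requested font is unusable and no fallback exists. */
    Optional<String> resolve(String requestedFont) {
        if (embeddableFonts.contains(requestedFont)) {
            return Optional.of(requestedFont);
        }
        if (fallbackFont == null) {
            errors.add("Font '" + requestedFont + "' cannot be embedded and no fallback font is configured");
            return Optional.empty();
        }
        if (treatFallbackAsError) {
            errors.add("Font '" + requestedFont + "' cannot be embedded; fallback '" + fallbackFont + "' was used");
        }
        return Optional.of(fallbackFont);
    }

    /** Errors collected so far, for the caller to check after rendering. */
    List<String> errors() {
        return List.copyOf(errors);
    }
}
```

With `treatFallbackAsError` set to true, a report that requests a non-embeddable font still renders with the fallback, but the caller finds one entry in `errors()` and can decide to fail the run.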

speckyspooky · 2023-11-06T09:22:15Z

I'm aware of the topic of PDF/A-3 and the ZUGFeRD invoice method of German government agencies.
So I checked all these details beforehand.

From my side, this change is only a first step that opens this option for further PDF/A levels.
openPDF 1.3.30 support: PDF/A-1a & -1b; currently there is no PDF/A-3 support.
openPDF tagged PDF: this is the standard mechanism of openPDF (real structure, no dummy, no special implementation here).
The validator checks whether the structure is a valid tagged structure. If the structure is missing, it isn't a valid PDF/A.
The language of "dc:title" couldn't be set by default with CSS properties at report level, because the PDF definition requires special values. But the main issue is that there is no option to write the XMP metadata (= XML structure) in the right way. I have done it, but the problem is that the structure includes the PDF creation date and also the PDF modification date, and the validation rules require these values to be equal to the date stamps of the PDF document (which makes sense).
But the datetime stamps cannot be kept in sync: if I write the XMP metadata myself, I get different stamps from those of the PDF document, because the document is created later, and currently there is no simple way to set the datetime stamps directly.
All documented mechanisms are based on a physically stored file whose properties can be overwritten (post-processing).
Fallback font: Yes, the fonts are the responsibility of the report developer. But during development, and also for the migration of existing reports, this option helps to implement and migrate the reports. The value isn't set by default, so you would get an invalid PDF with the standard stack trace in the PDF document (at text level, not as regular PDF-written text).
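
To illustrate the timestamp point above: the document information entries and the XMP packet use different date notations, so both would have to be derived from one and the same instant to stay equal. The following is only a sketch of that formatting step under the assumption that the instant can be chosen by the caller; it does not solve the problem described above of handing the value to openPDF. The class and method names are made up for this example, and UTC with a "Z" suffix is used for simplicity.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch: formats one instant for the Info dictionary and for XMP. */
final class PdfDateSyncSketch {

    // PDF date string notation, e.g. D:20231106092215Z (UTC).
    private static final DateTimeFormatter INFO_FORMAT =
            DateTimeFormatter.ofPattern("'D:'yyyyMMddHHmmss'Z'").withZone(ZoneOffset.UTC);

    // ISO 8601 notation as used for xmp:CreateDate / xmp:ModifyDate.
    private static final DateTimeFormatter XMP_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    static String infoDate(Instant instant) {
        return INFO_FORMAT.format(instant);
    }

    static String xmpDate(Instant instant) {
        return XMP_FORMAT.format(instant);
    }
}
```

Both strings describe the same second as long as they are produced from the same `Instant`; sub-second precision is dropped by both patterns.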

hvbtup · 2023-11-06T14:25:56Z

The validator checks if the structure is a valid tagged structure.
openPDF tagged PDF: this is the standard mechanism of openPDF (real structure, no dummy, no special implementation here)

I cannot believe this.

AFAIK calling writer.setTagged() merely declares that the document is tagged, but it doesn't actually add the tag structure.
Otherwise we could have PDF/UA "for free" (see #1234).

You can use the PAC 2021 (PDF Accessibility Checker) to actually see the tag structure.
https://pdfua.foundation/de/pac-2021-der-kostenlose-pdf-accessibility-checker/

speckyspooky · 2023-11-06T14:39:51Z

I have not said that the structure will be a full representation of all elements which are included in the PDF document.
The PDF document will be marked as tagged and the master structure will be created internally (no dummy).
Yes, there would be more to do to have everything tagged in the end.

But this is not my target.
The target is to have an option in BIRT to create PDF/A (1a & 1b) as a standard function.
Currently my target is neither to create PDF/A-3 nor to create PDF/UA.

These topics are steps for the next level.

hvbtup · 2023-11-06T15:14:57Z

Even though the validators say that the generated PDF is valid PDF/A-1a, this is not the case.
The "a" stands for accessibility and this requires a complete tag structure (just as in PDF/UA).
https://www.pdfa.org/wp-content/until2016_uploads/2011/08/das_bessere_pdfa-1b.pdf

If we add an option to create (seemingly) PDF/A-1a, people would of course think that BIRT supports PDF/A-1a.
But this isn't the case yet.

This topic is very important for the future of BIRT and thus I think the output should not "lie" by stating it's PDF/A-1a conformant when it isn't really.

So, in a first step we should only support PDF/A-1b ("b" for basic profile).

Adding support for PDF/A-1a (and for PDF/UA, by the way) will be easy once we can create tagged PDF.

speckyspooky · 2023-11-06T17:00:33Z

@hvbtup
But this isn't the same. PDF/A1a is different from PDF/UA: all PDF/UA documents are valid PDF/A1a documents, but not all PDF/A1a documents are valid PDF/UA.

According to your explanations, PDF/UA conformance needs more details and the fully tagged structure.
I used your tool and one finding was that the "dc:title" isn't listed. But this is only a requirement for PDF/UA conformance, not for PDF/A1a conformance.

The PDF/UA standard is a subset of PDF/A1a, and PDF/UA is more restrictive than PDF/A1a.
So it isn't correct to say that the PDF/A1a isn't valid.

So I also checked a PDF/A1a with the wrong "dc:title" definition, i.e. with the title added but without the language attribute.
All 3 validators report an error because the title definition is wrong. Your PDF/UA checker says all is fine.

Currently, my point of view is that the PDF/A1a version is a valid option.
Yes, it is not PDF/UA conformant, because there are some open points for that.

But you can test the created PDF/A1a documents from my demo zip file and you will find
that 3 error types are reported for PDF/UA, namely "dc:title" is unset, "DisplayDocTitle" is unset and the PDF/UA identifier is not found, and that an operation "cm" was used which is not allowed. This is correct for PDF/UA but not required for PDF/A1a.

speckyspooky · 2023-11-06T19:29:15Z

I have tested some further PDF/A-validators and here is a result which represents the difference between PDF/A-1a & PDF/UA-1. So I think our PDF/A-1a based on openPDF is a valid option.

Additional tested validators

hvbtup · 2023-11-07T11:39:07Z

I don't own a copy of the EN ISO 19005 standard, so probably you are right and it only means that the reading order is OK.

I tested your document pdf_enhancement_version_PDF_A1A_FBfont.pdf with an evaluation version of a commercial tool from the Swiss PDF Tools AG if it conforms to PDF/A-1a.

This tool says:

- PDF: Graphics operator cm is not allowed in text object. (content of page on page 1)
Document conforms to PDF/A-1a: false
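
To make the reported finding concrete: the message refers to a cm operator that appears between BT (begin text) and ET (end text) in the page content stream. As far as I can tell, the PDF reference groups cm with q and Q as special graphics state operators, which are not among the operators permitted inside a text object. The following naive sketch counts such occurrences in an already decoded content stream. It is an assumption-laden illustration, not how any validator works: the class name is made up, and splitting on whitespace will misread string literals that happen to contain "BT", "ET" or "cm" as separate words.

```java
/** Hypothetical sketch: counts cm operators that occur inside BT ... ET text objects. */
final class CmInTextObjectSketch {

    static int countCmInsideTextObjects(String decodedContentStream) {
        boolean insideTextObject = false;
        int count = 0;
        // Naive tokenization: operators and operands are assumed to be whitespace-separated.
        for (String token : decodedContentStream.trim().split("\\s+")) {
            switch (token) {
                case "BT":
                    insideTextObject = true;
                    break;
                case "ET":
                    insideTextObject = false;
                    break;
                case "cm":
                    if (insideTextObject) {
                        count++;
                    }
                    break;
                default:
                    break;
            }
        }
        return count;
    }
}
```

A stream such as `q 1 0 0 1 10 10 cm BT /F1 12 Tf (Hi) Tj ET Q` yields 0 here, while `BT 1 0 0 1 10 10 cm (Hi) Tj ET` yields 1.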

The validator does not complain about the missing tag structure, although this structure is empty (as can be seen with PAC 2021), so that's ok for me.

The same test result also for PDF/A-1b conformance.
Other validators I used earlier (when I used Apache PDFBox to convert the BIRT PDF to PDF/A-1b) did not say anything about the cm operator, so I guess this is a minor issue in reality.

Probably many documents claiming PDF/A-1a (or PDF/A-1b) conformance aren't actually conformant, so while we know that there are issues left (like the cm operator and the title), I'm fine with your PR.

hvbtup · 2023-11-07T12:05:27Z

@wimjongman
If we mention this in the release notes for 4.14, we must make it clear that the generated PDFs are not really accessible.

wimjongman · 2023-11-07T12:18:13Z

This issue is the release note.

speckyspooky · 2023-11-07T20:11:39Z

@hvbtup & @wimjongman

I prefer that we add the new option to the release notes and additionally note that the PDF/A* will be generated with "openPDF version 1.3.30". So it is fully transparent which library is used to generate the PDF/A.

I agree that we should add the hints PDF/A1b = ok, PDF/A1a with limitations based on openPDF.

In the end, from my past experience and yours, Henning, PDF/A is complex, and one of the main issues is to have a good PDF/A generator and also a good validator.

I tested with "https://www.pdf-online.com/osa/validate.aspx" again, and it reports the "cm" operator and the issue with the "dc:title".

So we now have 9 validators for PDF/A1a: 7 times OK, 1 time failed due to "cm", and 1 time failed due to "cm" and "dc:title" (if it is used). And all these differences for a standard which was published in 2005.

merks · 2023-11-07T20:17:53Z

🙈

hvbtup · 2023-11-08T07:59:12Z

One of the main problems with standards like EN ISO 19005 is that they are not open source.
So you have to pay to read them.
Many developers are not willing to or cannot afford to do this (particularly in the open-source ecosystem), and so they don't know the exact content of the standards.
Apart from that, writing a validator that validates e.g. the tag structure is complicated.
I think that's why different software producing PDF or validating PDF shows different results.

For example, I was not able to find a statement that cm is not allowed or an explanation why cm is not allowed in PDF/A in the public internet.


speckyspooky added this to the 4.14 milestone Nov 5, 2023

speckyspooky added the Enhancement Small change to improve the current supported functionality label Nov 5, 2023

speckyspooky self-assigned this Nov 5, 2023

speckyspooky added a commit to speckyspooky/birt that referenced this issue Nov 5, 2023

PdfEmitter, Enhancement: Enable options to set the PDF/Version &

a83e374

archiving formats PDF/A (A1A & A1B) (eclipse-birt#1486)

speckyspooky mentioned this issue Nov 5, 2023

PdfEmitter, Enhancement: Enable options to set the PDF/Version & archiving formats PDF/A (A1A & A1B) #1486 #1487

Merged

wimjongman linked a pull request Nov 6, 2023 that will close this issue

PdfEmitter, Enhancement: Enable options to set the PDF/Version & archiving formats PDF/A (A1A & A1B) #1486 #1487

Merged

speckyspooky closed this as completed in #1487 Nov 8, 2023

speckyspooky added a commit to speckyspooky/birt that referenced this issue Nov 11, 2023

Fixing different eclipse warnings (missing comments, unused parameters,

9bae6bf

static constants access) (eclipse-birt#1486)

hvbtup mentioned this issue Jul 22, 2024

PDF/A-3 support, validation fails #1817

Closed
