ENH: Flatten PDF forms #232

Open
OpenNingia opened this issue Oct 14, 2015 · 38 comments
Labels
is-feature A feature request needs-pdf The issue needs a PDF file to show the problem workflow-forms From a users perspective, forms is the affected feature/workflow

Comments

@OpenNingia

pdftk provides the feature to embed the form fields' text in the pdf itself.
This is very useful if you want to use an editable pdf as a template to be filled by code.

from the pdftk manual:

[ flatten ]
Use this option to merge an input PDF’s interactive form fields (and their data) with the PDF’s pages. Only one input PDF can be given. Sometimes used with the fill_form operation.

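Proposed usage example (flatten_fields is the requested option; it does not exist yet):

    from PyPDF2 import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()

    with open(source, 'rb') as source_fp:
        reader = PdfFileReader(source_fp)

        # Copy every page and fill in its form fields as it is appended
        writer.appendPagesFromReader(
            reader, lambda x: writer.updatePageFormFieldValues(x, fields))

        with open(dest, 'wb') as output_fp:
            # flatten_fields is the proposed new argument
            writer.write(output_fp, flatten_fields=True)
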
@whitemice

+1 A way to flatten a form would be excellent. I would like to avoid having another dependency for my code, which uses PyPDF2. But shipping filled in forms around the interwebz creates problems with a variety of vendors and their [I assume not based on PyPDF2] software.

@mertz3hack

It would be great if PyPDF2 had the ability to fill in forms and flatten them!

@oscardssmith
Contributor

I also would really appreciate this

@mstamy2
Collaborator

mstamy2 commented Aug 5, 2016

(In progress) We can accomplish this by setting Bit Position 1 of the field flags.

Ref: Table 8.70 of the PDF 1.7 spec
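
As a rough illustration of what setting "Bit Position 1" means: the PDF spec numbers flag bits from 1, with bit 1 being the lowest-order bit, so bit position n corresponds to the integer mask 1 << (n - 1). The sketch below is plain Python with no PDF library involved. The helper names (flag_mask, set_flag, has_flag) are made up for illustration; the ReadOnly, Required and NoExport positions are those of the common field flags table referenced above.

```python
# Bit positions of the field flags common to all field types (/Ff entry).
# The spec numbers bits from 1, with bit 1 being the lowest-order bit.
READ_ONLY = 1
REQUIRED = 2
NO_EXPORT = 3


def flag_mask(bit_position):
    """Return the integer mask for a 1-based bit position."""
    if bit_position < 1:
        raise ValueError("bit positions start at 1")
    return 1 << (bit_position - 1)


def set_flag(flags, bit_position):
    """Return flags with the given bit set, leaving other bits unchanged."""
    return flags | flag_mask(bit_position)


def has_flag(flags, bit_position):
    """Return True if the given bit is set in flags."""
    return bool(flags & flag_mask(bit_position))


# A field that is already Required (bit 2) becomes ReadOnly as well.
ff = set_flag(flag_mask(REQUIRED), READ_ONLY)
print(ff, has_flag(ff, READ_ONLY), has_flag(ff, NO_EXPORT))
```

Using OR rather than plain assignment keeps whatever flags the field already carries, which matters for type-specific bits such as multiline or password.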

@OpenNingia
Author

OpenNingia commented Aug 5, 2016

Setting a field read-only might be a way, however pdftk works differently; afaik it replaces each /Field instance with a simple text object. 😕

@mstamy2
Collaborator

mstamy2 commented Aug 5, 2016

You're right, that's the better option. Should be able to implement that soon

@nberrios

nberrios commented Nov 4, 2016

I agree. This would be totally awesome!

@jamoham

jamoham commented Nov 7, 2016

Is there any update on this?
I am looking to use an editable pdf as a template which will be filled by code.

@kherrett

I'm with @jamoham on this... for the same exact use case.

@zhiwehu

zhiwehu commented Apr 24, 2017

+1

@Rob1080
Contributor

Rob1080 commented May 27, 2017

Any update on this?

@BeGrimm

BeGrimm commented Apr 17, 2018

Can you flatten a file with PyPDF2 yet? I've not found anything on this being implemented.

@DrLou

DrLou commented Jan 18, 2019

I do see some code to _flatten in the PdfFileReader, but not in the writer. Will someone be taking a swing at this?

@Joshua-IRT

I have exactly the same scenario as mentioned by @jamoham, @kherrett and @zhiwehu above. Has there been any progress on either being able to flatten a PDF, or set the fields as read-only?

@Joshua-IRT

Joshua-IRT commented Aug 2, 2019

Rough bit of code if anyone needs to set fields to read-only prior to an update to the module (assumes you imported the whole module as PyPDF2). Works in a similar fashion to the existing updatePageFormFieldValues() method.

import PyPDF2


class PDFModifier(PyPDF2.PdfFileWriter):
    '''Extends the PyPDF2.PdfFileWriter class and adds functionality missing
    from the PyPDF2 module.'''

    def updatePageFormFieldFlags(self, page, fields, or_existing=True):
        '''
        Update the form field flags for a given page from a fields dictionary.
        Copy field flag values from fields to page.

        :param page: Page reference from PDF writer where the annotations
            and field data will be updated.
        :param fields: a Python dictionary of field names (/T) and flag
            values (/Ff); the flag value should be an unsigned 32-bit integer
            (i.e. a number between 0 and 4294967295)
        :param or_existing: if there are existing flags, OR them with the
            new values (default True)
        '''

        # Pages without annotations have no form fields to update
        if '/Annots' not in page:
            return

        # Iterate through the page's annotations and update the field flags
        for annot_ref in page['/Annots']:
            writer_annot = annot_ref.getObject()
            for field, flags in fields.items():
                if writer_annot.get('/T') == field:
                    if or_existing:
                        # Keep any flags already set on the field; the
                        # caller's fields dictionary is left unmodified
                        flags |= writer_annot.get('/Ff', 0)

                    writer_annot.update({
                        PyPDF2.generic.NameObject("/Ff"): PyPDF2.generic.NumberObject(flags)
                    })

@chickendiver

+1 for flattening, such as in pdftk!

@techNoSavvy-debug

+1 for a method for flattening pdfs

@paulzuradzki

paulzuradzki commented Mar 7, 2022

@mstamy2 , @OpenNingia

One thing I noticed with the approach of flattening/making forms read-only by setting the field flag bit to 1: when I try to merge resulting PDFs, only the values from the first document make it to the merged file. I don't think this is expected behavior.

  • pdftk does not seem to have this issue with its approach to flattening.
  • I believe it happens when merging filled PDFs via PyPDF2 because the fields share the same field name. I'm not really sure of the best way around this besides vaguely trying to emulate pdftk's approach.

@paulzuradzki

paulzuradzki commented Mar 7, 2022

Cross-posting this useful recipe by @Redjumpman: #506

Remember to update the form field name if you want to merge multiple documents made from the same template form. Else, the merged PDF result will have identical pages due to each document sharing the same field names.
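
One way to picture the renaming step: give each filled copy of the template its own suffix so that no two documents share a field name. The sketch below only computes the name mapping in plain Python. The function unique_field_names, the "_docN" suffix scheme and the sample field names are hypothetical; applying the new names to the /T entries of the actual fields is left to whatever PDF library you use.

```python
def unique_field_names(field_names, doc_index, sep="_doc"):
    """Map each original field name to a name unique to one document.

    doc_index identifies the copy of the template (0, 1, 2, ...).
    """
    return {name: f"{name}{sep}{doc_index}" for name in field_names}


# Hypothetical field names shared by every document filled from one template.
template_fields = ["name", "clan", "rank"]

# One mapping per filled document.
renamed = [unique_field_names(template_fields, i) for i in range(2)]

# After renaming, the two documents have no field name in common,
# so a merge no longer treats them as the same field.
names_doc0 = set(renamed[0].values())
names_doc1 = set(renamed[1].values())
print(names_doc0.isdisjoint(names_doc1))
```

A numeric suffix is only one option; any scheme works as long as the resulting names are unique across the documents being merged.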

@MartinThoma MartinThoma added is-feature A feature request workflow-forms From a users perspective, forms is the affected feature/workflow and removed PDF Forms labels Apr 16, 2022
@MartinThoma MartinThoma changed the title Provide a way to flatten pdf forms ENH: Flatten PDF forms Jul 9, 2022
@pubpub-zz
Collaborator

PdfWriter.append() should provide you with the capability to add pages with data fields.

Can you confirm that this issue can get closed?

@pubpub-zz
Collaborator

Without feedback, I am closing this issue as fixed. Feel free to provide updates if you want to reopen it.

@rolisz
Contributor

rolisz commented Mar 14, 2023

I don't think the original issue is closed: how do you make fields non-editable easily? The use case is taking a PDF with editable forms, filling out the forms and outputting a PDF with non-editable fields.

@pubpub-zz
Collaborator

pubpub-zz commented Mar 14, 2023

The read-only flag is defined in the PDF 1.7 reference (page 676):
[image]

Therefore you have to set the flags. Below is an example that sets all the fields to read-only:

import pypdf
from pypdf.generic import NameObject, NumberObject

r = pypdf.PdfReader("input_form.pdf")
for v in (r.get_fields() or {}).values():  # get_fields() may return None
    o = v.indirect_reference.get_object()  # this will provide access to the actual PDF dictionary
    o[NameObject("/Ff")] = NumberObject(o.get("/Ff", 0) | 1)  # set bit 1 (ReadOnly)
w = pypdf.PdfWriter()
w.clone_document_from_reader(r)
w.write("output_form.pdf")

@OpenNingia
Author

OpenNingia commented Mar 14, 2023

What you are suggesting is not "flattening" though. The output PDF will still contain data fields (widgets).
Flattening, as pdftk does it, replaces the data field with text.

@pubpub-zz
Collaborator

pubpub-zz commented Mar 15, 2023

@OpenNingia Can you provide a non-flat PDF file and its flattened version for review?

@MartinThoma MartinThoma added the needs-pdf The issue needs a PDF file to show the problem label Mar 15, 2023
@MartinThoma MartinThoma reopened this Mar 15, 2023
@OpenNingia
Author

Multiple PDFs merged and flattened:
Ichiro Yasuhigo.pdf

One of the editable sources:
sheet_all.pdf

@pubpub-zz
Collaborator

The flattening process is quite tough to compute: you have to create an XObject with the right characteristics, then modify the page content to place it.
Personally, I see very limited advantage compared to the time needed to implement it, and for me the read-only alternative could be sufficient. I will have no time to propose a PR. Any candidates?

@pubpub-zz pubpub-zz added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Jun 25, 2023
@pubpub-zz
Collaborator

Since we now have #1864, flattening should be quite simple.

@MartinThoma MartinThoma removed the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Aug 14, 2023
@rohit11544

rohit11544 commented Dec 1, 2023

Can someone please provide a simple code snippet here for flattening a pdf?

@matsavage

matsavage commented May 15, 2024

I have subclassed the PdfWriter class to be able to flatten forms here, so it can be done.

Would you accept a PR for this, and do you have any idea which interface would be best for the implementation?

I think this would be the easiest option. There could be something more advanced, where you pass a list of fields to be flattened with all fields as the default, but I wouldn't want to go too far on this.
https://gist.github.com/matsavage/a50d9c541957f276088c341cc84a9e7f

@pubpub-zz
Collaborator

@matsavage
Your code seems to have some good ideas. Your function should be integrated into PdfWriter. To make this easier, you should fork pypdf and build a branch with your mods: this will ease merging.

What you should try is to convert the global ["/AP"]["/N"] into an XForm (that way you will not have to worry about merging the resources, drawing and so on into the page). Then, in the main page content, add a cm operation to translate to the proper rectangle and call the new XForm with the Do operator. This should work for all types of widgets.

@matsavage

I only did things this way to see if the flattening could be done and to save the effort of setting up the development environment on my machine; this is more a template than a PR.

Thanks for the advice, I’ll try and have a look at this some time

@pubpub-zz
Collaborator

Looking forward 😊

@nicholas-alonzo

Looking forward to this feature!

@matsavage

Honestly I haven’t been able to look at this since May, feel free to have your own attempt at implementing it if it’s something you need.

@pubpub-zz
Collaborator

At your marks.... get set ... go! 😉😄😄😄

@matsavage

> At your marks.... get set ... go! 😉😄😄😄

I think it’s the one everyone wants, but no one wants to do

@nicholas-alonzo

> Honestly I haven’t been able to look at this since May, feel free to have your own attempt at implementing it if it’s something you need.

Darn, I wouldn't even know where to start 🥴
