Fail to identify fonts (name and size) #629

Closed
mbideau-atreal opened this issue Aug 7, 2023 · 1 comment · Fixed by #630

Comments

@mbideau-atreal
Contributor

  • PHP Version: 7.4.5 (built: Apr 19 2020 08:43:53)
  • PDFParser Version: c974994 2023-08-03

Description:

Font names and sizes are not identified: every item returned by getDataTm() has a font ID of -1 and a font size of 1.

PDF input

See attached PDF sample
20230803-160138-lettretype-arrete.pdf

Expected output & actual output

Expected output:

Fonts:
 - [3_0] Helvetica (Type1)
 - [4_0] Courier-Bold (Type1)
 - [5_0] Times-Roman (Type1)
 - [6_0] Times-Bold (Type1)
 - [7_0] AAAAAF+ArialMT (Type0)
 - [9_0] Helvetica-Oblique (Type1)
 - [14_0] AAAAAF+ArialMT (CIDFontType2)
Items:
- font: Courier-Bold (Type1) size: 40.000000  'IL - DOCUMENT DE TRAVAIL - DOCUMENT'
- font: Courier-Bold (Type1) size: 40.000000  'DE TRAVAIL - DOCUMENT DE TRAVAIL - DOCUMENT DE'
- font: Times-Bold (Type1) size: 10.000000  'Déclaration préalable'
- font: Times-Bold (Type1) size: 10.000000  'ARRETE'
- font: Times-Bold (Type1) size: 10.000000  '______________________________________________________________________'
- font: Times-Bold (Type1) size: 10.000000  'Dossier numéro         DP 020001 23 00001P0  déposé le 21/06/2023'
- font: Times-Bold (Type1) size: 10.000000  'par                              Monsieur a'
- font: Times-Bold (Type1) size: 10.000000  'Correspondant :        Monsieur a'
- font: Times-Bold (Type1) size: 10.000000  '                                     '
- font: Times-Bold (Type1) size: 10.000000  '                                      '
- font: Times-Bold (Type1) size: 10.000000  '                                       France'
- font: Times-Bold (Type1) size: 10.000000  'sur le terrain                    '
- font: Times-Bold (Type1) size: 10.000000  'arrondissement          '
- font: Times-Bold (Type1) size: 10.000000  '______________________________________________________________________'
- font: Times-Bold (Type1) size: 10.000000  'Dossier suivi par   -  -  - '
- font: Times-Roman (Type1) size: 10.000000  'Le Maire,'
- font: Times-Roman (Type1) size: 10.000000  'Vu la demande de '
- font: Times-Bold (Type1) size: 10.000000  'Déclaration préalable'
- font: Times-Roman (Type1) size: 10.000000  ' susvisée,'
- font: Times-Roman (Type1) size: 10.000000  'Vu le code de l'urbanisme et notamment ses articles L421-1 et suivants, R421-1 et suivants'
- font: Times-Roman (Type1) size: 10.000000  'Vu les pièces du dossier'
- font: Times-Roman (Type1) size: 10.000000  '                                             A R R E T E'
- font: Times-Roman (Type1) size: 10.000000  '                                                                                                              &ville , le  03/08/2023'
- font: Times-Roman (Type1) size: 10.000000  '                                                                                                              le Maire &delaville'
- font: Times-Roman (Type1) size: 10.000000  '                                                                                                              '
- font: Times-Bold (Type1) size: 10.000000  '&nom'
- font: Times-Roman (Type1) size: 10.000000  '                                                                                                              '
- font: Helvetica (Type1) size: 10.000000  'Entête signature'
- font: AAAAAF+ArialMT (Type0) size: 15.000000  '                                                                  {{signature_placeholder}}'
- font: AAAAAF+ArialMT (Type0) size: 15.000000  ' '
- font: AAAAAF+ArialMT (Type0) size: 15.000000  ' '
- font: AAAAAF+ArialMT (Type0) size: 15.000000  ' '
- font: AAAAAF+ArialMT (Type0) size: 15.000000  ' '
- font: AAAAAF+ArialMT (Type0) size: 8.000000  ' '
- font: Helvetica (Type1) size: 10.000000  '                                                                                                   Pied de signature'
- font: Helvetica (Type1) size: 1.000000  'Powered by TCPDF (www.tcpdf.org)'
- font: Helvetica-Oblique (Type1) size: 8.000000  'Page 1/1'

Actual output:

Fonts:
 - [3_0] Helvetica (Type1)
 - [4_0] Courier-Bold (Type1)
 - [5_0] Times-Roman (Type1)
 - [6_0] Times-Bold (Type1)
 - [7_0] AAAAAF+ArialMT (Type0)
 - [9_0] Helvetica-Oblique (Type1)
 - [14_0] AAAAAF+ArialMT (CIDFontType2)
Items:
- font: -1 size: 1  'IL - DOCUMENT DE TRAVAIL - DOCUMENT'
- font: -1 size: 1  'DE TRAVAIL - DOCUMENT DE TRAVAIL - DOCUMENT DE'
- font: -1 size: 1  'Déclaration préalable'
- font: -1 size: 1  'ARRETE'
- font: -1 size: 1  '______________________________________________________________________'
- font: -1 size: 1  'Dossier numéro         DP 020001 23 00001P0  déposé le 21/06/2023'
- font: -1 size: 1  'par                              Monsieur a'
- font: -1 size: 1  'Correspondant :        Monsieur a'
- font: -1 size: 1  '                                     '
- font: -1 size: 1  '                                      '
- font: -1 size: 1  '                                       France'
- font: -1 size: 1  'sur le terrain                    '
- font: -1 size: 1  'arrondissement          '
- font: -1 size: 1  '______________________________________________________________________'
- font: -1 size: 1  'Dossier suivi par   -  -  - '
- font: -1 size: 1  'Le Maire,'
- font: -1 size: 1  'Vu la demande de '
- font: -1 size: 1  'Déclaration préalable'
- font: -1 size: 1  ' susvisée,'
- font: -1 size: 1  'Vu le code de l'urbanisme et notamment ses articles L421-1 et suivants, R421-1 et suivants'
- font: -1 size: 1  'Vu les pièces du dossier'
- font: -1 size: 1  '                                             A R R E T E'
- font: -1 size: 1  '                                                                                                              &ville , le  03/08/2023'
- font: -1 size: 1  '                                                                                                              le Maire &delaville'
- font: -1 size: 1  '                                                                                                              '
- font: -1 size: 1  '&nom'
- font: -1 size: 1  '                                                                                                              '
- font: -1 size: 1  'Entête signature'
- font: -1 size: 1  '                                                                  {{signature_placeholder}}'
- font: -1 size: 1  ' '
- font: -1 size: 1  ' '
- font: -1 size: 1  ' '
- font: -1 size: 1  ' '
- font: -1 size: 1  ' '
- font: -1 size: 1  '                                                                                                   Pied de signature'
- font: -1 size: 1  'Powered by TCPDF (www.tcpdf.org)'
- font: -1 size: 1  'Page 1/1'

Code

<?php
  
require_once __DIR__.'/pdfparser/alt_autoload.php-dist';

$config = new \Smalot\PdfParser\Config();
$config->setDataTmFontInfoHasToBeIncluded(true);
$parser = new \Smalot\PdfParser\Parser(array(), $config);

$pdf = $parser->parseFile('/tmp/doc.pdf');

$pages = $pdf->getPages();
$lastpage = end($pages);
$data = $lastpage->getDataTm();

$pdf_fonts = $pdf->getFonts();
echo "Fonts:".PHP_EOL;
foreach($pdf_fonts as $index => $pdf_font) {
    echo " - [$index] ".$pdf_font->getName()." (".$pdf_font->getType().")".PHP_EOL;
}

echo "Items:".PHP_EOL;
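foreach($data as $item) {
    if(is_array($item)) {
        // $item[1] is the text, $item[2] the font ID, $item[3] the font size
        if (isset($pdf_fonts[$item[2]])) {
            // the font ID matches a key of the document-level font list
            $font = $pdf_fonts[$item[2]];
            echo "- font: ".$font->getName()." (".$font->getType().")"." size: ".$item[3]."  '".$item[1]."'".PHP_EOL;
        }
        elseif(!empty($font = $lastpage->getFont($item[2]))) {
            // otherwise resolve the font ID through the page
            echo "- font: ".$font->getName()." (".$font->getType().")"." size: ".$item[3]."  '".$item[1]."'".PHP_EOL;
        }
        else {
            // unresolved: print the raw font ID
            echo "- font: ".$item[2]." size: ".$item[3]."  '".$item[1]."'".PHP_EOL;
        }
    }
}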
@k00ni
Collaborator

k00ni commented Aug 7, 2023

Thanks for reporting.

@shtayerc, based on Git's blame output, these lines were added by you in #516. Can you please check the error? #630 seems to fix it.

@k00ni k00ni linked a pull request Aug 11, 2023 that will close this issue
k00ni added a commit that referenced this issue Aug 11, 2023
* fix: Fail to identify fonts (name and size) #629

* docs(fix): in sample code, get font from the Page rather than the PDF list, as font's IDs are different

* added two tests to prove changes in #630 are working

---------

Co-authored-by: Konrad Abicht <hi@inspirito.de>
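
The commit note about taking the font from the Page rather than from the PDF-level list can be illustrated with a small sketch. It is not the code merged in #630. It assumes the item layout used in the report (`[1]` text, `[2]` font ID, `[3]` font size), and `describeItem()` and its `$resolveFont` callback are hypothetical names introduced here for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Format one getDataTm() item in the same shape as the report's output.
 *
 * Assumed item layout (taken from the report): [1] text, [2] font ID, [3] font size.
 * $resolveFont is a hypothetical callback that returns an object exposing
 * getName() and getType(), or null when the font ID cannot be resolved.
 */
function describeItem(array $item, callable $resolveFont): string
{
    $text   = $item[1] ?? '';
    $fontId = $item[2] ?? -1;
    $size   = $item[3] ?? 1;

    $font = $resolveFont($fontId);

    // Fall back to the raw font ID when the resolver finds nothing.
    $label = $font !== null
        ? $font->getName().' ('.$font->getType().')'
        : (string) $fontId;

    return "- font: $label size: $size  '$text'";
}
```

With the objects from the report, the callback could be `function ($id) use ($lastpage) { return $lastpage->getFont($id); }`, assuming `getFont()` returns null for an unknown ID. Keeping the lookup behind a callback means every item is resolved the same way, and no loop variable left over from the font listing can leak into the item output.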