
"Object reference not set to an instance of an object." #874

Closed
rklec opened this issue Aug 6, 2024 · 6 comments
Labels
bug document-reading Related to reading documents

Comments

@rklec

rklec commented Aug 6, 2024

Steps to reproduce (STR)

Call PdfDocument.Open(pdfBytes) with a particular PDF file. As it contains sensitive data, I unfortunately cannot attach it here, and I was unable to create a minimal example. Some hints:

  • it was quite a complex document, with images, a header, etc. (not much text, though)
  • it contains some comments shown on hover (but not with the icon Acrobat usually shows)
  • it was last edited with Firefox, which added some notes and drawings
  • it was last edited in 2023-08/2023-09

The annotations looked much like the ones in the image below. I tried to reproduce the problem with this example, but it does not trigger the exception:
[image: grafik]

Thus, I am only attaching this image, because the problem is not reproducible with the PDF I created.

What happens

System.NullReferenceException
  HResult=0x80004003
  Message = Object reference not set to an instance of an object.
  Source = UglyToad.PdfPig
  StackTrace:
   at UglyToad.PdfPig.PdfExtensions.TryGet[T](DictionaryToken dictionary, NameToken name, IPdfTokenScanner tokenScanner, T& token)

Apparently, this is the line of failure:

if (!dictionary.TryGet(name, out var t) || !(t is T typedToken))

What should happen

At the very least, a PdfDocumentFormatException should be thrown if you consider the file invalid.

However, IMHO the file is valid: it can be opened with both Adobe Acrobat Reader and Firefox. So actually parsing it would be good.

Also, when opening it with Adobe Acrobat Reader and re-saving it, it can be parsed!

System

PdfPig 0.1.8
reproducible on Windows 10

Internal reference: 2118

@BobLd
Collaborator

BobLd commented Aug 6, 2024

Hi @rklec, it's going to be complicated to help you without the document...

Can you try with the latest version of PdfPig (pre-release 1.9.0, available via NuGet packages)?

@jmjohnson05

I'm running into this issue as well with the attached document (ErcotFacts.pdf). If I set SkipMissingFonts to true, the exception above is thrown. When that option is not specified, I get the following exception instead:

   at UglyToad.PdfPig.Util.DictionaryTokenExtensions.GetNameOrDefault(DictionaryToken dictionaryToken, NameToken name)
   at UglyToad.PdfPig.PdfFonts.Parser.Handlers.Type0FontHandler.ParseDescendant(DictionaryToken dictionary)
   at UglyToad.PdfPig.PdfFonts.Parser.Handlers.Type0FontHandler.Generate(DictionaryToken dictionary)
   at UglyToad.PdfPig.PdfFonts.FontFactory.Get(DictionaryToken dictionary)
   at UglyToad.PdfPig.Content.ResourceStore.LoadFontDictionary(DictionaryToken fontDictionary)
   at UglyToad.PdfPig.Content.ResourceStore.LoadResourceDictionary(DictionaryToken resourceDictionary)
   at UglyToad.PdfPig.Content.BasePageFactory`1.Create(Int32 number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations)
   at UglyToad.PdfPig.Content.Pages.GetPage[TPage](IPageFactory`1 pageFactory, Int32 pageNumber, NamedDestinations namedDestinations, ParsingOptions parsingOptions)
   at UglyToad.PdfPig.Content.Pages.GetPage(Int32 pageNumber, NamedDestinations namedDestinations, ParsingOptions parsingOptions)
   at UglyToad.PdfPig.PdfDocument.GetPage(Int32 pageNumber)
   at UglyToad.PdfPig.PdfDocument.<GetPages>d__34.MoveNext()
   at System.Collections.Generic.LargeArrayBuilder`1.AddRange(IEnumerable`1 items)
   at System.Collections.Generic.EnumerableHelpers.ToArray[T](IEnumerable`1 source)
   at System.Linq.SystemCore_EnumerableDebugView`1.get_Items()

Any help with a fix for this would be greatly appreciated!

@rklec
Author

rklec commented Aug 22, 2024

Surprisingly, the linked ErcotFacts.pdf does not throw for me. (It was encoded and decoded in an email along the way, though.)

@jmjohnson05

Hi @rklec, I should have clarified: the exception I'm seeing occurs when calling the GetPages() method.

For example:

using PdfDocument? document = PdfDocument.Open( stream );

if ( document is null )
{
    _logger.LogWarning( "Failed to open PDF document" );

    return result;
}

foreach ( var pg in document.GetPages() ) 
{
    _logger.LogInformation( "Processing page {PageNumber}", pg.Number );
}

BobLd added a commit to BobLd/PdfPig that referenced this issue Aug 22, 2024
@BobLd
Collaborator

BobLd commented Aug 22, 2024

Thanks for sharing the document. I've created a PR that fixes the issue when SkipMissingFonts = true.

BobLd added a commit to BobLd/PdfPig that referenced this issue Aug 22, 2024
@jmjohnson05

Much appreciated @BobLd

@EliotJones EliotJones added bug document-reading Related to reading documents labels Sep 29, 2024
@BobLd BobLd closed this as completed in 5c168f9 Sep 29, 2024