PdfDocument::bookmarks::iter skips the root bookmark #120

Closed
xVanTuring opened this issue Nov 7, 2023 · 5 comments
@xVanTuring
Contributor

The documentation says that iteration starts from the top-level root bookmark. I assume that means the root (first) bookmark is itself included in the iteration.

Code

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings = Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("./"))
        .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document: PdfDocument<'_> = pdfium.load_pdf_from_file(
        "F:/archive/pdf/NET-Microservices-Architecture-for-Containerized-NET-Applications.pdf",
        None,
    )?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

Output

root: Introduction to Containers and Docker
Iter: ## skipped the root bookmark
0: Choosing Between .NET and .NET Framework for Docker Containers
1: Architecting container and microservice-based applications
2: Development process for Docker-based applications
3: Designing and Developing Multi-Container and Microservice-Based .NET Applications
4: Tackle Business Complexity in a Microservice with DDD and CQRS Patterns
5: Implement resilient applications
6: Make secure .NET Microservices and Web Applications
7: .NET Microservices Architecture key takeaways
@xVanTuring
Contributor Author

Also, iter_all_descendants does not seem to work as described (it should iterate over all nodes and their children).

Code

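// `root` is assumed to be the bookmark returned by `bookmarks.root().unwrap()` in the first snippet.
println!("root: {}", root.title().unwrap());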
for (idx, bookmark) in root.iter_all_descendants().enumerate() {
    println!("    {idx}: {}", bookmark.title().unwrap());
}

Output

root: Introduction to Containers and Docker
    0: What is Docker?
    1: Docker terminology
    2: Docker containers, images, and registries

However, item 0, "What is Docker?", has sub-bookmarks of its own, and they are not listed.

[Image: bookmark]

@ajrcarey ajrcarey self-assigned this Nov 7, 2023
@ajrcarey
Owner

ajrcarey commented Nov 7, 2023

Hi @xVanTuring, thank you for reporting the issue. Let's focus on this issue first, since you have a work-around for your other issue. Are you able to provide a non-copyrighted sample document that demonstrates the problem?

@xVanTuring
Contributor Author

Bookmark.pdf
Here is a simple PDF I made that contains only some bookmarks.

ajrcarey pushed a commit that referenced this issue Nov 9, 2023
@ajrcarey
Owner

ajrcarey commented Nov 9, 2023

I agree, the traversal methodology used by the PdfBookmarksIterator is rather peculiar and it gives unexpected results. I have rewritten the iterator to use a standard depth-first graph traversal technique. Using a slightly adjusted version of your sample code:

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings =
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("../pdfium/"))
            .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document = pdfium.load_pdf_from_file("Bookmark.pdf", None)?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter root direct children:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_direct_children().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root all descendants:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_all_descendants().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter entire tree from root:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

and applying it to your sample document, I now get the following output:

root: Chapter 1
Iter root direct children:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.3
Iter root all descendants:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
Iter entire tree from root:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
8: Chapter 2
9: 2.1
10: 2.2
11: 2.2.1
12: 2.2.2
13: 2.2.2.1
14: 2.2.2.2
15: 2.3
16: 2.3.1
17: 2.3.2

which looks more like the expected result.
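
For readers unfamiliar with the technique, here is a minimal sketch of a depth-first (pre-order) traversal over a bookmark-like tree. It is not the code in PdfBookmarksIterator and does not use pdfium-render at all: the `Node` type, `DepthFirst` iterator, `sample_outline` and `titles_depth_first` are hypothetical names invented for this illustration, and the sample titles only imitate the ones in the output above.

```rust
/// Hypothetical, simplified stand-in for a PDF bookmark: a title plus child bookmarks.
#[derive(Debug, Clone)]
struct Node {
    title: String,
    children: Vec<Node>,
}

impl Node {
    fn new(title: &str, children: Vec<Node>) -> Self {
        Node { title: title.to_string(), children }
    }
}

/// Iterative depth-first (pre-order) traversal driven by an explicit stack.
struct DepthFirst<'a> {
    stack: Vec<&'a Node>,
}

impl<'a> DepthFirst<'a> {
    /// Starts from the list of top-level nodes, so the first one is yielded as well.
    fn new(roots: &'a [Node]) -> Self {
        // Push in reverse order so that the first top-level node is popped first.
        DepthFirst { stack: roots.iter().rev().collect() }
    }
}

impl<'a> Iterator for DepthFirst<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<Self::Item> {
        let node = self.stack.pop()?;
        // Push children in reverse order so that the leftmost child is visited next,
        // before any of the node's following siblings.
        self.stack.extend(node.children.iter().rev());
        Some(node)
    }
}

/// Small illustrative outline; the titles are made up for this sketch.
fn sample_outline() -> Vec<Node> {
    vec![
        Node::new(
            "Chapter 1",
            vec![
                Node::new("1.1", vec![]),
                Node::new("1.2", vec![Node::new("1.2.1", vec![])]),
            ],
        ),
        Node::new("Chapter 2", vec![Node::new("2.1", vec![])]),
    ]
}

/// Collects the titles in the order the traversal visits them.
fn titles_depth_first(roots: &[Node]) -> Vec<String> {
    DepthFirst::new(roots).map(|n| n.title.clone()).collect()
}
```

Calling `titles_depth_first(&sample_outline())` visits each node before its children and each subtree before the next sibling, which is the same ordering as the "Iter entire tree from root" listing.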

@ajrcarey
Owner

ajrcarey commented Nov 10, 2023

Extended sample code to check siblings as well:

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings =
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("../pdfium/"))
            .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document = pdfium.load_pdf_from_file("Bookmark.pdf", None)?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter root siblings:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_siblings().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root direct children:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_direct_children().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root all descendants:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_all_descendants().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter entire tree from root:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

Made a small change to PdfBookmarksIterator to ensure a skip sibling is never yielded as part of iteration. This avoids a bookmark being included in its own list of siblings. The sample code output is now:

root: Chapter 1
Iter root siblings:
0: Chapter 2
Iter root direct children:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.3
Iter root all descendants:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
Iter entire tree from root:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
8: Chapter 2
9: 2.1
10: 2.2
11: 2.2.1
12: 2.2.2
13: 2.2.2.1
14: 2.2.2.2
15: 2.3
16: 2.3.1
17: 2.3.2

Updated README. Ready to release as part of 0.8.16.
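
To illustrate the idea of never yielding a bookmark as one of its own siblings, here is a rough sketch. It is not the actual change made to PdfBookmarksIterator and does not call pdfium-render: `Bookmark`, `siblings_of`, `top_level` and `sibling_titles` are hypothetical names, and the three-chapter list is invented for the example.

```rust
/// Hypothetical stand-in for a bookmark; only the title matters here.
#[derive(Debug)]
struct Bookmark {
    title: String,
}

/// Yields every bookmark on `level` except `current` itself.
/// Identity is compared by address rather than by title, because two
/// different bookmarks may legitimately share the same title.
fn siblings_of<'a>(
    level: &'a [Bookmark],
    current: &'a Bookmark,
) -> impl Iterator<Item = &'a Bookmark> {
    level
        .iter()
        .filter(move |candidate| !std::ptr::eq(*candidate, current))
}

/// Illustrative top-level bookmarks; the titles are made up for this sketch.
fn top_level() -> Vec<Bookmark> {
    ["Chapter 1", "Chapter 2", "Chapter 3"]
        .iter()
        .map(|t| Bookmark { title: t.to_string() })
        .collect()
}

/// Collects the titles of the siblings of the bookmark at `index`.
fn sibling_titles(level: &[Bookmark], index: usize) -> Vec<String> {
    siblings_of(level, &level[index])
        .map(|b| b.title.clone())
        .collect()
}
```

With this sketch, `sibling_titles(&top_level(), 0)` lists "Chapter 2" and "Chapter 3" but not "Chapter 1", mirroring the "Iter root siblings" output above, where the root is absent from its own sibling list.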
