This issue was moved to a discussion.
Need coordinate conversion help #228
Comments
I just solved my own issue, while researching Apache PDFBox: https://stackoverflow.com/a/54045861. I think you sort of alluded to it in other support tickets, but I had to convert from x,y in the bottom left hand corner of the document to x,y in the top left. I'm going to leave it open only as I think the docs would benefit from a brief section explaining how to convert "PDF Canvas Units" to typical x/y coordinate space. Feel free to close if you disagree!
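The conversion described here can be sketched in plain Python. This is a minimal sketch, not part of pypdfium2: the function name `pdf_box_to_top_left` is hypothetical, the box is assumed to be a `(left, bottom, right, top)` tuple in PDF canvas units, and the page is assumed to be unrotated with its origin at the bottom left corner.

```python
def pdf_box_to_top_left(box, page_height):
    """Flip a (left, bottom, right, top) box to a top-left origin.

    Hypothetical helper for illustration. Assumes an unrotated page whose
    coordinate origin is the bottom left corner.
    """
    left, bottom, right, top = box
    # In PDF space y grows upward, so top > bottom is expected.
    # Measured from the top edge, the upper edge of the box sits at
    # page_height - top and the lower edge at page_height - bottom.
    return (left, page_height - top, right, page_height - bottom)


# Example: a box on a page that is 792 units tall.
print(pdf_box_to_top_left((72, 700, 200, 712), 792))  # (72, 80, 200, 92)
```

Note that x values are unchanged; only the y values are mirrored against the page height.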
Hi, nice to hear you essentially figured it out already. Yes, in PDF, the coordinate system's origin is typically the bottom left corner (unlike top left for bitmaps), though in theory the PDF spec allows the coordinate system to be laid out between any opposite corners (I think, anyway). As you say, comments #214 (comment) and #214 (comment) kind of discuss that already. As this seems to be a common problem, I suppose you're right that the docs would deserve a section on coordinate conversion. Maybe even some support model around |
get_charbox, bottom & top values seem to be inaccurate
Thanks! Re-reading those comments it's clear in retrospect, I just didn't grasp it the first time. I implemented it using plain Python, without considering rotation. For completeness, it seems like you are suggesting that this will work most of the time, but not all. Is rotation the only additional case to consider? Or is the easiest solution just to use the raw APIs for each coordinate pair in the bounding box, since they will handle it reliably?
Yes, that's what I meant.
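To illustrate what handling rotation per coordinate pair could look like, here is a pure-Python sketch. It is not pdfium's implementation: the function names are hypothetical, rotation is assumed to be a clockwise multiple of 90 degrees, `page_width` and `page_height` are assumed to be the unrotated page dimensions, and the page origin is assumed to be the bottom left corner (no crop box offset).

```python
def pdf_point_to_display(x, y, page_width, page_height, rotation=0):
    """Map a PDF point (origin bottom left, y up) to display space
    (origin top left, y down) for a page shown rotated clockwise.

    Hypothetical helper; page_width/page_height are the UNROTATED sizes.
    """
    rotation %= 360
    if rotation == 0:
        return (x, page_height - y)
    if rotation == 90:
        # The PDF bottom left corner ends up at the display top left.
        return (y, x)
    if rotation == 180:
        return (page_width - x, y)
    if rotation == 270:
        return (page_height - y, page_width - x)
    raise ValueError("rotation must be a multiple of 90 degrees")


def pdf_box_to_display(box, page_width, page_height, rotation=0):
    """Convert a (left, bottom, right, top) PDF box to a display-space
    (left, top, right, bottom) box by mapping two opposite corners."""
    left, bottom, right, top = box
    x0, y0 = pdf_point_to_display(left, bottom, page_width, page_height, rotation)
    x1, y1 = pdf_point_to_display(right, top, page_width, page_height, rotation)
    # Re-sort, because rotation can swap which corner is the minimum.
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
```

Mapping two opposite corners and re-sorting is enough for right-angle rotations, since an axis-aligned box stays axis-aligned. Pages with a shifted or cropped origin would need an extra offset that this sketch leaves out.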
Got it! I'm very unfamiliar with ctypes, but based on the method signature it seems to suggest that the method I would be using
Am I understanding this incorrectly? If so, the logic for the method in Is it possible currently to easily call the raw methods on a page object like |
Ooh, yes. If you're not actually targeting a bitmap to draw on, that sounds like a problem.
Sadly the |
OK thank you! This has been incredibly helpful -- really appreciate the pointers. Yes I think I'm doing something a bit different than others here (but it does work!) GetDisplayMatrix is actually really simple as well so we've solved my issue for now -- https://pdfium.googlesource.com/pdfium.git/+/798e18f5e5cfb672c7f3186f6358b84c5ff7785b/core/fpdfapi/page/cpdf_page.cpp
That's good to hear, thanks! However, I'm still left wondering what I should do with pypdfium2 now.
I am rendering a "highlight" layer in a web interface to highlight specific text in a displayed pdf. The rendering engine uses percentage values to determine where to place items so I need to use the right coordinate space. I'm fairly new to all of this so honestly not sure if my suggestion is too narrow....but as far as what would be helpful to my use-case, if you had a python API implementation of |
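For the percentage-based highlight layer described above, the conversion could look roughly like this. It is a sketch under assumptions, not a pypdfium2 API: `box_to_percent` is a hypothetical name, the box is assumed to be `(left, bottom, right, top)` in PDF canvas units, the page is assumed to be unrotated with a bottom left origin, and the output keys are chosen to resemble CSS positioning.

```python
def box_to_percent(box, page_width, page_height):
    """Express a (left, bottom, right, top) PDF box as percentages of the
    page, measured from the top left corner.

    Hypothetical helper; assumes an unrotated page with a bottom left origin.
    """
    left, bottom, right, top = box
    return {
        "left": 100.0 * left / page_width,
        # Distance from the top edge of the page to the top edge of the box.
        "top": 100.0 * (page_height - top) / page_height,
        "width": 100.0 * (right - left) / page_width,
        "height": 100.0 * (top - bottom) / page_height,
    }


print(box_to_percent((50, 100, 150, 150), 200, 400))
# {'left': 25.0, 'top': 62.5, 'width': 50.0, 'height': 12.5}
```

Because the result is relative to the page size, it stays valid at any zoom level or render scale, which avoids the integer rounding that comes with device pixel coordinates.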
I see, thank you for elaborating. Maybe, as an alternative to a python re-implementation, we could ask pdfium to add a float equivalent of |
That would work perfectly!
Commit a379ecc (in the devel branch) adds a helper around The quest for float coordinate normalization still stands. |
Thanks for the update!
Our docs often mention coordinate order, such as |
I think I'll convert this to a discussion, because I don't think it's a good idea to implement coordinate conversion from scratch in pypdfium2 (nor would I have the time to do so). Especially given there is However, to any users affected, feel free to file a feature request at pdfium for float coordinate normalization (or perhaps even contribute a patch yourself). |
Firstly, thanks so much for creating this great library @mara004
I read #204 & especially #214 but neither seems to answer my question. I'm looking to get bounding boxes for text as percentages of the canvas found via search. I execute the search using the `textpage.search` method to get the starting index. Then I loop through and use `get_charbox` with the loose option to build my bounding boxes as seen in the snippet below:
This almost works, but I'm noticing two broken behaviors that seem potentially related:
The `top` value seems to be greater than the `bottom` value, which I think doesn't make sense in a coordinate system?
As a way to compare and check my logic here, I opened the pdf in Mac Preview and drew a rectangle in approximately the same area of the PDF that I was looking to extract. Here, the left/right values were again accurate, but top & bottom were off by ~40-60 canvas units.
Do you have any recommendations here? Am I using the APIs incorrectly? Apologies if this was already answered elsewhere or is included in the documentation.
Thanks so much for taking a look. If you need me to provide a full working example with an attached pdf I can do that as well, just wanted to see if it was something obvious first.