extract_tables does not extract properly when there are highlighted cells #111

Open
2 tasks
dsobo opened this issue Oct 31, 2019 · 0 comments

dsobo commented Oct 31, 2019

Please specify whether your issue is about:

When I run extract_tables() on a PDF that contains a table with highlighted cells, only the highlighted cells end up in the resulting table. I am not sure how to put together a reproducible example (I tried following the how-to), so I have attached the PDF that does not work with extract_tables().

When I run extract_text() on the same PDF, all cell values end up in the resulting string.
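
A minimal sketch of how the two results could be compared side by side, assuming the attached pdf_test2.pdf has been saved to the working directory. The last two calls are an untested idea, not a known fix: they only check whether forcing a particular extraction method changes which cells are returned.

```r
library("tabulizer")

# The attached file, assumed to be in the working directory
pdf_file <- "pdf_test2.pdf"

# Table extraction: for this PDF only the highlighted cells come back
tables <- extract_tables(pdf_file)
str(tables)

# Text extraction: every cell value appears in the resulting string
txt <- extract_text(pdf_file)
cat(txt)

# Untested idea: force each extraction method in turn to see whether
# either one picks up the non-highlighted cells
tables_lattice <- extract_tables(pdf_file, method = "lattice")
tables_stream  <- extract_tables(pdf_file, method = "stream")
```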

  • [x] a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

If you are reporting (1) a bug or (2) a question about code, please supply:

  • ensure that you can install and successfully load rJava
  • a fully reproducible example using a publicly available dataset (or provide your data)
  • if an error is occurring, include the output of traceback() run immediately after the error occurs
  • the output of sessionInfo()

Put your code here:

## rJava loads successfully
# install.packages("rJava")
library("rJava")

## load package
library("tabulizer")
# attached PDF: [pdf_test2.pdf](https://github.com/ropensci/tabulizer/files/3795083/pdf_test2.pdf)

## code goes here
tabulizer::extract_tables("pdf_test2.pdf")

## session info for your system
sessionInfo()
# output attached as a screenshot: ![sessionInfo](https://user-images.githubusercontent.com/37000302/67970975-f21d6b00-fbd9-11e9-9647-a1aca39f0e6b.png)