Hi,
I am trying to recover the text of the reference section from a PDF article, but a text block sometimes breaks up into multiple lines. A somewhat related issue might be #4629.
Incorrect text splitting:
Correct text splitting:
I would like to recover the text without the extra line delimiters. Below is the code I use to get the text from the PDF:
// PDFJS.version = '1.0.85';
// PDFJS.build = '094d0e2';
context.getPDFText = function pico_getPDFText() {
  // pdfObj: global reference to the PDF document object
  // One promise per page, so the caller can wait for all pages to finish
  var pagePromises = [];
  // Get each page and extract the text content of the page
  for (var p = 1; p <= context.pdfObj.numPages; p++) {
    // Asynchronous processing
    pagePromises.push(context.pdfObj.getPage(p).then(function(page) {
      return page.getTextContent();
    }).then(function(textContent) {
      for (var i = 0; i < textContent.items.length; i++) {
        var line = textContent.items[i].str.trim().toLowerCase();
        if (line === 'references' || line === 'bibliography') {
          // Print the heading of the references / bibliography section
          console.log(line);
        }
      }
      return textContent;
    }));
  }
  // Resolves with the text content of every page, in page order
  return Promise.all(pagePromises);
};
Thank you.
Sid
Closing as incomplete, please provide a link to the PDF file (or attach it to the issue, since GitHub now supports that) in order to re-open the issue.
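For what it's worth, one possible way to get rid of the extra line breaks is to regroup the items returned by getTextContent() by their vertical position before joining them. The sketch below is only a rough illustration, not something PDF.js provides: `joinItemsIntoLines` and its `tolerance` parameter are hypothetical names, and it assumes each item carries a `transform` array whose last entry is the baseline y coordinate. It also assumes the unwanted splits happen within a single visual line, which may not hold for every PDF.

```javascript
// Hypothetical helper (not part of PDF.js): merges consecutive text items
// that sit on roughly the same baseline into a single line of text.
// Each item is assumed to look like { str: string, transform: [a, b, c, d, x, y] }.
function joinItemsIntoLines(items, tolerance) {
  if (tolerance === undefined) {
    tolerance = 2; // assumed tolerance in PDF units; tune per document
  }
  var lines = [];
  var current = null;
  for (var i = 0; i < items.length; i++) {
    var item = items[i];
    var y = item.transform[5]; // assumed: baseline y coordinate of the item
    if (current !== null && Math.abs(y - current.y) <= tolerance) {
      // Same baseline as the previous item: extend the current line
      current.parts.push(item.str);
    } else {
      // Baseline changed: start a new line
      current = { y: y, parts: [item.str] };
      lines.push(current);
    }
  }
  // Join the fragments of each line and collapse repeated whitespace
  return lines.map(function (line) {
    return line.parts.join(' ').replace(/\s+/g, ' ').trim();
  });
}
```

It could be called as `joinItemsIntoLines(textContent.items)` inside the getTextContent() callback. Comparing only consecutive items keeps the original reading order, at the cost of not merging fragments that the PDF emits out of order.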