Exit Code 15 - An exception occurred while executing the pipeline #1017

Closed
eduardodataeasy opened this issue Sep 22, 2022 · 6 comments

@eduardodataeasy

When I try to OCR a specific file it shows the following error log:

ocrmypdf --force-ocr --optimize 0 --fast-web-view 0 --output-type pdf -l por -v 1 --deskew --remove-background --clean "D:\applications\dotNet\EasyMidia\TESTE_OCR\IN\PROCESSADO_NUANCE_317311740_1_1.PDF" "D:\applications\dotNet\EasyMidia\TESTE_OCR\OUT\REPROCESSADO_NUANCE_317311740_1_2_teste.PDF"
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
ocrmypdf 12.0.1
Running: ['C:\Tesseract-OCR\tesseract.EXE', '--list-langs']
stdout/stderr = List of available languages (166):
afr
amh
ara
asm
aze
aze_cyrl
bel
ben
bod
bos
bre
bul
cat
ceb
ces
chi_sim
chi_sim_vert
chi_tra
chi_tra_vert
chr
cos
cym
dan
dan_frak
deu
deu_frak
div
dzo
ell
eng
enm
epo
equ
est
eus
fao
fas
fil
fin
fra
frk
frm
fry
gla
gle
glg
grc
guj
hat
heb
hin
hrv
hun
hye
iku
ind
isl
ita
ita_old
jav
jpn
jpn_vert
kan
kat
kat_old
kaz
khm
kir
kmr
kor
kor_vert
lao
lat
lav
lit
ltz
mal
mar
mkd
mlt
mon
mri
msa
mya
nep
nld
nor
oci
ori
osd
pan
pol
por
pus
que
ron
rus
san
script/Arabic
script/Armenian
script/Bengali
script/Canadian_Aboriginal
script/Cherokee
script/Cyrillic
script/Devanagari
script/Ethiopic
script/Fraktur
script/Georgian
script/Greek
script/Gujarati
script/Gurmukhi
script/HanS
script/HanS_vert
script/HanT
script/HanT_vert
script/Hangul
script/Hangul_vert
script/Hebrew
script/Japanese
script/Japanese_vert
script/Kannada
script/Khmer
script/Lao
script/Latin
script/Malayalam
script/Myanmar
script/Oriya
script/Sinhala
script/Syriac
script/Tamil
script/Telugu
script/Thaana
script/Thai
script/Tibetan
script/Vietnamese
sin
slk
slk_frak
slv
snd
spa
spa_old
sqi
srp
srp_latn
sun
swa
swe
syr
tam
tat
tel
tgk
tgl
tha
tir
ton
tur
uig
ukr
urd
uzb
uzb_cyrl
vie
yid
yor

Running: ['C:\unpaper\unpaper.EXE', '--version']
Found unpaper 6.2
Running: ['C:\Tesseract-OCR\tesseract.EXE', '--version']
Found tesseract 5.0.0-alpha.20210506
Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '--version']
Found gs 9.54.0
Scanning contents: 0%| | 0/6 [00:00<?, ?page/s][WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
Scanning contents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 6.55page/s]
Using Tesseract OpenMP thread limit 1
Start processing 6 pages concurrently
OCR: 0%| | 0.0/6.0 [00:00<?, ?page/s][WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
2 Rasterize with pnggray, rotation 0
3 Rasterize with pngmono, rotation 0
1 Rasterize with png16m, rotation 0
4 Rasterize with pngmono, rotation 0
2 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pnggray', '-dFirstPage=2', '-dLastPage=2', '-r99.943004x99.943004', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.ddmqbmck\origin.pdf']
3 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pngmono', '-dFirstPage=3', '-dLastPage=3', '-r300.003562x300.003562', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.ddmqbmck\origin.pdf']
1 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1', '-r300.000000x300.000000', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.ddmqbmck\origin.pdf']
4 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pngmono', '-dFirstPage=4', '-dLastPage=4', '-r300.003562x300.003562', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.ddmqbmck\origin.pdf']
5 Rasterize with pngmono, rotation 0
6 Rasterize with pnggray, rotation 0
5 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pngmono', '-dFirstPage=5', '-dLastPage=5', '-r300.003562x300.003562', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.ddmqbmck\origin.pdf']
6 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pnggray', '-dFirstPage=6', '-dLastPage=6', '-r99.943004x99.943004', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.ddmqbmck\origin.pdf']
3 STREAM b'IHDR' 16 13
3 STREAM b'iCCP' 41 2296
3 iCCP profile name b'default_gray.icc'
3 Compression method 0
3 STREAM b'pHYs' 2349 9
3 STREAM b'tEXt' 2370 31
3 STREAM b'IDAT' 2413 8192
3 Rotating output by 0
2 STREAM b'IHDR' 16 13
2 STREAM b'iCCP' 41 2296
2 iCCP profile name b'default_gray.icc'
2 Compression method 0
2 STREAM b'pHYs' 2349 9
2 STREAM b'tEXt' 2370 31
2 STREAM b'IDAT' 2413 8192
2 Rotating output by 0
4 STREAM b'IHDR' 16 13
4 STREAM b'iCCP' 41 2296
4 iCCP profile name b'default_gray.icc'
4 Compression method 0
4 STREAM b'pHYs' 2349 9
4 STREAM b'tEXt' 2370 31
4 STREAM b'IDAT' 2413 8192
4 Rotating output by 0
6 STREAM b'IHDR' 16 13
6 STREAM b'iCCP' 41 2296
6 iCCP profile name b'default_gray.icc'
6 Compression method 0
6 STREAM b'pHYs' 2349 9
6 STREAM b'tEXt' 2370 31
6 STREAM b'IDAT' 2413 8192
6 Rotating output by 0
5 STREAM b'IHDR' 16 13
5 STREAM b'iCCP' 41 2296
5 iCCP profile name b'default_gray.icc'
5 Compression method 0
5 STREAM b'pHYs' 2349 9
5 STREAM b'tEXt' 2370 31
5 STREAM b'IDAT' 2413 8192
5 Rotating output by 0
3 background removal skipped on mono page
3 background removal skipped on mono page
4 background removal skipped on mono page
5 background removal skipped on mono page
3 STREAM b'IHDR' 16 13
3 STREAM b'pHYs' 41 9
3 STREAM b'IDAT' 62 8192
4 background removal skipped on mono page
3 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '300.003562', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\tmpubwxtqst\input.pnm', 'C:\Users\eduar\AppData\Local\Temp\tmpubwxtqst\output.pbm']
5 background removal skipped on mono page
4 STREAM b'IHDR' 16 13
4 STREAM b'pHYs' 41 9
4 STREAM b'IDAT' 62 8192
6 STREAM b'IHDR' 16 13
6 STREAM b'pHYs' 41 9
6 STREAM b'IDAT' 62 8192
5 STREAM b'IHDR' 16 13
5 STREAM b'pHYs' 41 9
5 STREAM b'IDAT' 62 8192
4 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '300.003562', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\tmpl5ohj40q\input.pnm', 'C:\Users\eduar\AppData\Local\Temp\tmpl5ohj40q\output.pbm']
6 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '99.943004', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\input.pnm', 'C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\output.pgm']
2 STREAM b'IHDR' 16 13
2 STREAM b'pHYs' 41 9
2 STREAM b'IDAT' 62 8192
5 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '300.003562', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\tmpbh3g3se7\input.pnm', 'C:\Users\eduar\AppData\Local\Temp\tmpbh3g3se7\output.pbm']
2 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '99.943004', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\tmp4i1yrr38\input.pnm', 'C:\Users\eduar\AppData\Local\Temp\tmp4i1yrr38\output.pgm']
6 stdout/stderr = unpaper 6.2
License GPLv2: GNU GPL version 2.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


Processing sheet #1: C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\input.pnm -> C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\output.pgm
input-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\input.pnm
output-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\output.pgm
sheet size: 826x1169
...
noise-filter ... deleted 107 clusters.
blur-filter... deleted 0 pixels.
writing output.

OCR: 0%| | 0.0/6.0 [00:02<?, ?page/s]
An exception occurred while executing the pipeline
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\python39\lib\shutil.py", line 616, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\output.pgm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\python39\lib\tempfile.py", line 801, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\output.pgm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\python39\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\python39\lib\site-packages\ocrmypdf\_sync.py", line 189, in exec_page_sync
    ocr_image, preprocess_out = make_intermediate_images(
  File "c:\python39\lib\site-packages\ocrmypdf\_sync.py", line 158, in make_intermediate_images
    ocr_image = preprocess(
  File "c:\python39\lib\site-packages\ocrmypdf\_sync.py", line 105, in preprocess
    image = preprocess_clean(image, page_context)
  File "c:\python39\lib\site-packages\ocrmypdf\_pipeline.py", line 486, in preprocess_clean
    unpaper.clean(
  File "c:\python39\lib\site-packages\ocrmypdf\_exec\unpaper.py", line 134, in clean
    run(input_file, output_file, dpi=dpi, mode_args=unpaper_args)
  File "c:\python39\lib\site-packages\ocrmypdf\_exec\unpaper.py", line 100, in run
    raise SubprocessOutputError(
  File "c:\python39\lib\tempfile.py", line 826, in __exit__
    self.cleanup()
  File "c:\python39\lib\tempfile.py", line 830, in cleanup
    self._rmtree(self.name)
  File "c:\python39\lib\tempfile.py", line 812, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "c:\python39\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\python39\lib\shutil.py", line 618, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "c:\python39\lib\tempfile.py", line 804, in onerror
    cls._rmtree(path)
  File "c:\python39\lib\tempfile.py", line 812, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "c:\python39\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\python39\lib\shutil.py", line 599, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "c:\python39\lib\shutil.py", line 596, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\output.pgm'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\python39\lib\site-packages\ocrmypdf\_sync.py", line 374, in run_pipeline
    exec_concurrent(context, executor)
  File "c:\python39\lib\site-packages\ocrmypdf\_sync.py", line 271, in exec_concurrent
    executor(
  File "c:\python39\lib\site-packages\ocrmypdf\_concurrent.py", line 82, in __call__
    self._execute(
  File "c:\python39\lib\site-packages\ocrmypdf\builtin_plugins\concurrency.py", line 132, in _execute
    for result in results:
  File "c:\python39\lib\multiprocessing\pool.py", line 870, in next
    raise value
NotADirectoryError: [Errno 20] The directory name is invalid: 'C:\Users\eduar\AppData\Local\Temp\tmpevvec3s4\output.pgm'

Test file:
317311740_1_1.PDF

@jbarlow83
Collaborator

jbarlow83 commented Sep 22, 2022 via email

@eduardodataeasy
Author

So in theory I just need to update to Python 3.10. I'll try it and let you know the result. I didn't quite understand the part about updating ocrmypdf, but I will try that too.

@jbarlow83
Collaborator

Yes, just update to Python 3.10. The version of ocrmypdf shouldn't matter much for this issue.
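
For readers wondering why the Python version could matter here: the first traceback fails inside `tempfile.TemporaryDirectory` cleanup, when Windows refuses to delete `output.pgm` because it is still open. Python 3.10 added an `ignore_cleanup_errors` parameter to `TemporaryDirectory`. Whether ocrmypdf relies on it is an assumption on my part, not something stated in this thread. The sketch below only illustrates the parameter; `process_in_temp_dir` is a hypothetical helper, not ocrmypdf code.

```python
import sys
import tempfile
from pathlib import Path


def process_in_temp_dir() -> str:
    """Write a scratch file in a temporary directory and return its name.

    Hypothetical helper for illustration only; not part of ocrmypdf.
    """
    # ignore_cleanup_errors exists only on Python 3.10 and later.
    # On older versions we fall back to the default behaviour, where a
    # failed delete during cleanup raises an exception.
    kwargs = {}
    if sys.version_info >= (3, 10):
        kwargs["ignore_cleanup_errors"] = True

    with tempfile.TemporaryDirectory(**kwargs) as tmp:
        out = Path(tmp) / "output.pgm"
        # A tiny 1x1 greyscale PGM, standing in for the unpaper output.
        out.write_bytes(b"P5\n1 1\n255\n\x00")
        return out.name


if __name__ == "__main__":
    print(process_in_temp_dir())
```

With the flag set, a file that cannot be deleted during cleanup is left behind in the temp folder rather than raising during cleanup.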

@eduardodataeasy
Author

ocrmypdf --force-ocr --optimize 0 --fast-web-view 0 --output-type pdf -l por -v 1 --deskew --remove-background --clean "D:\applications\dotNet\EasyMidia\TESTE_OCR\IN\PROCESSADO_NUANCE_317311740_0_1.pdf" "D:\applications\dotNet\EasyMidia\TESTE_OCR\OUT\REPROCESSADO_NUANCE_317311740_1_3_teste.PDF"
ocrmypdf 14.0.0
Running: ['C:\unpaper\unpaper.EXE', '--version']
Found unpaper 6.2
Running: ['C:\Tesseract-OCR\tesseract.EXE', '--version']
Found tesseract 5.0.0-alpha.20210506
Running: ['C:\Tesseract-OCR\tesseract.EXE', '--version']
Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '--version']
Found gs 9.54.0
Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '--version']
Running: ['C:\Tesseract-OCR\tesseract.EXE', '--list-langs']
stdout/stderr = List of available languages (166):
afr
amh
ara
asm
aze
aze_cyrl
bel
ben
bod
bos
bre
bul
cat
ceb
ces
chi_sim
chi_sim_vert
chi_tra
chi_tra_vert
chr
cos
cym
dan
dan_frak
deu
deu_frak
div
dzo
ell
eng
enm
epo
equ
est
eus
fao
fas
fil
fin
fra
frk
frm
fry
gla
gle
glg
grc
guj
hat
heb
hin
hrv
hun
hye
iku
ind
isl
ita
ita_old
jav
jpn
jpn_vert
kan
kat
kat_old
kaz
khm
kir
kmr
kor
kor_vert
lao
lat
lav
lit
ltz
mal
mar
mkd
mlt
mon
mri
msa
mya
nep
nld
nor
oci
ori
osd
pan
pol
por
pus
que
ron
rus
san
script/Arabic
script/Armenian
script/Bengali
script/Canadian_Aboriginal
script/Cherokee
script/Cyrillic
script/Devanagari
script/Ethiopic
script/Fraktur
script/Georgian
script/Greek
script/Gujarati
script/Gurmukhi
script/HanS
script/HanS_vert
script/HanT
script/HanT_vert
script/Hangul
script/Hangul_vert
script/Hebrew
script/Japanese
script/Japanese_vert
script/Kannada
script/Khmer
script/Lao
script/Latin
script/Malayalam
script/Myanmar
script/Oriya
script/Sinhala
script/Syriac
script/Tamil
script/Telugu
script/Thaana
script/Thai
script/Tibetan
script/Vietnamese
sin
slk
slk_frak
slv
snd
spa
spa_old
sqi
srp
srp_latn
sun
swa
swe
syr
tam
tat
tel
tgk
tgl
tha
tir
ton
tur
uig
ukr
urd
uzb
uzb_cyrl
vie
yid
yor

Opened a file
Scanning contents: 100%|██████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 333.30page/s]
Using Tesseract OpenMP thread limit 1
Start processing 6 pages concurrently
Opened a file
1 Rasterize with png16m, rotation 0
2 Rasterize with pnggray, rotation 0
3 Rasterize with pngmono, rotation 0
4 Rasterize with pngmono, rotation 0
5 Rasterize with pngmono, rotation 0
6 Rasterize with pnggray, rotation 0
1 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1', '-r300.000000x300.000000', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\origin.pdf']
2 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pnggray', '-dFirstPage=2', '-dLastPage=2', '-r99.943004x99.943004', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\origin.pdf']
3 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pngmono', '-dFirstPage=3', '-dLastPage=3', '-r300.003562x300.003562', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\origin.pdf']
5 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pngmono', '-dFirstPage=5', '-dLastPage=5', '-r300.003562x300.003562', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\origin.pdf']
6 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pnggray', '-dFirstPage=6', '-dLastPage=6', '-r99.943004x99.943004', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\origin.pdf']
4 Running: ['C:\gs9.54.0\bin\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pngmono', '-dFirstPage=4', '-dLastPage=4', '-r300.003562x300.003562', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\origin.pdf']
3 Rotating output by 0
2 Rotating output by 0
6 Rotating output by 0
5 Rotating output by 0
4 Rotating output by 0
3 background removal skipped on mono page
3 Running: ['C:\Tesseract-OCR\tesseract.EXE', '-l', 'por', '--psm', '2', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000003_rasterize.png', 'stdout']
4 background removal skipped on mono page
5 background removal skipped on mono page
4 Running: ['C:\Tesseract-OCR\tesseract.EXE', '-l', 'por', '--psm', '2', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000004_rasterize.png', 'stdout']
5 Running: ['C:\Tesseract-OCR\tesseract.EXE', '-l', 'por', '--psm', '2', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000005_rasterize.png', 'stdout']
3 background removal skipped on mono page
3 Running: ['C:\Tesseract-OCR\tesseract.EXE', '-l', 'por', '--psm', '2', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000003_rasterize.png', 'stdout']
4 background removal skipped on mono page
4 Running: ['C:\Tesseract-OCR\tesseract.EXE', '-l', 'por', '--psm', '2', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000004_rasterize.png', 'stdout']
5 background removal skipped on mono page
5 Running: ['C:\Tesseract-OCR\tesseract.EXE', '-l', 'por', '--psm', '2', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000005_rasterize.png', 'stdout']
1 Rotating output by 0
3 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '300.003562', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000003_pp_deskew.png', 'C:\Users\eduar\AppData\Local\Temp\tmpdhsw8f1x\output.pnm']
4 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '300.003562', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000004_pp_deskew.png', 'C:\Users\eduar\AppData\Local\Temp\tmp4owguc1k\output.pnm']
5 Running: ['C:\unpaper\unpaper.EXE', '-v', '--dpi', '300.003562', '--layout', 'none', '--mask-scan-size', '100', '--no-border-align', '--no-mask-center', '--no-grayfilter', '--no-blackfilter', '--no-deskew', 'C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000005_pp_deskew.png', 'C:\Users\eduar\AppData\Local\Temp\tmpba2c5vul\output.pnm']
3 stdout/stderr = unpaper 6.2
License GPLv2: GNU GPL version 2.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


Processing sheet #1: C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000003_pp_deskew.png -> C:\Users\eduar\AppData\Local\Temp\tmpdhsw8f1x\output.pnm
input-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000003_pp_deskew.png
output-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\tmpdhsw8f1x\output.pnm
sheet size: 2480x3509
...
noise-filter ... deleted 0 clusters.
blur-filter... deleted 0 pixels.
writing output.

3 resolution (299.9994, 299.9994)
3 convert
3 PIL format = PNG
3 imgformat = PNG
3 input dpi = 300 x 300
3 rotation = 0°
3 input colorspace = 1
3 width x height = 2480px x 3509px
3 read_images() embeds a PNG
3 convert done
3 Running: ['C:\\Tesseract-OCR\\tesseract.EXE', '-l', 'por', '-c', 'textonly_pdf=1', 'C:\\Users\\eduar\\AppData\\Local\\Temp\\ocrmypdf.io.216t7n56\\000003_ocr.png', 'C:\\Users\\eduar\\AppData\\Local\\Temp\\ocrmypdf.io.216t7n56\\000003_ocr_tess', 'pdf', 'txt']
4 stdout/stderr = unpaper 6.2

License GPLv2: GNU GPL version 2.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


Processing sheet #1: C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000004_pp_deskew.png -> C:\Users\eduar\AppData\Local\Temp\tmp4owguc1k\output.pnm
input-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000004_pp_deskew.png
output-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\tmp4owguc1k\output.pnm
sheet size: 2480x3509
...
noise-filter ... deleted 0 clusters.
blur-filter... deleted 0 pixels.
writing output.

4 resolution (299.9994, 299.9994)
4 convert
4 PIL format = PNG
4 imgformat = PNG
4 input dpi = 300 x 300
4 rotation = 0°
4 input colorspace = 1
4 width x height = 2480px x 3509px
4 read_images() embeds a PNG
4 convert done
4 Running: ['C:\\Tesseract-OCR\\tesseract.EXE', '-l', 'por', '-c', 'textonly_pdf=1', 'C:\\Users\\eduar\\AppData\\Local\\Temp\\ocrmypdf.io.216t7n56\\000004_ocr.png', 'C:\\Users\\eduar\\AppData\\Local\\Temp\\ocrmypdf.io.216t7n56\\000004_ocr_tess', 'pdf', 'txt']
5 stdout/stderr = unpaper 6.2

License GPLv2: GNU GPL version 2.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


Processing sheet #1: C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000005_pp_deskew.png -> C:\Users\eduar\AppData\Local\Temp\tmpba2c5vul\output.pnm
input-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\ocrmypdf.io.216t7n56\000005_pp_deskew.png
output-file for sheet 1: C:\Users\eduar\AppData\Local\Temp\tmpba2c5vul\output.pnm
sheet size: 2480x3509
...
noise-filter ... deleted 0 clusters.
blur-filter... deleted 0 pixels.
writing output.

5 resolution (299.9994, 299.9994)
5 convert
5 PIL format = PNG
5 imgformat = PNG
5 input dpi = 300 x 300
5 rotation = 0°
5 input colorspace = 1
5 width x height = 2480px x 3509px
5 read_images() embeds a PNG
5 convert done
5 Running: ['C:\\Tesseract-OCR\\tesseract.EXE', '-l', 'por', '-c', 'textonly_pdf=1', 'C:\\Users\\eduar\\AppData\\Local\\Temp\\ocrmypdf.io.216t7n56\\000005_ocr.png', 'C:\\Users\\eduar\\AppData\\Local\\Temp\\ocrmypdf.io.216t7n56\\000005_ocr_tess', 'pdf', 'txt']

OCR: 0%| | 0.0/6.0 [00:09<?, ?page/s]
An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\ocrmypdf\_sync.py", line 393, in run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "C:\Python310\lib\site-packages\ocrmypdf\_sync.py", line 280, in exec_concurrent
    executor(
  File "C:\Python310\lib\site-packages\ocrmypdf\_concurrent.py", line 87, in __call__
    self._execute(
  File "C:\Python310\lib\site-packages\ocrmypdf\builtin_plugins\concurrency.py", line 141, in _execute
    result = future.result()
  File "C:\Python310\lib\concurrent\futures\_base.py", line 438, in result
    return self.__get_result()
  File "C:\Python310\lib\concurrent\futures\_base.py", line 390, in __get_result
    raise self._exception
  File "C:\Python310\lib\concurrent\futures\thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Python310\lib\site-packages\ocrmypdf\_sync.py", line 196, in exec_page_sync
    ocr_image, preprocess_out = make_intermediate_images(
  File "C:\Python310\lib\site-packages\ocrmypdf\_sync.py", line 139, in make_intermediate_images
    preprocess_out = preprocess(
  File "C:\Python310\lib\site-packages\ocrmypdf\_sync.py", line 108, in preprocess
    image = preprocess_remove_background(image, page_context)
  File "C:\Python310\lib\site-packages\ocrmypdf\_pipeline.py", line 477, in preprocess_remove_background
    raise NotImplementedError("--remove-background is temporarily not implemented")
NotImplementedError: --remove-background is temporarily not implemented

The problem seems to be specific to this file, since other files are OCRed normally. Is it a bug in the file?

PROCESSADO_317311740_0_1.pdf

@eduardodataeasy
Author

I updated Python to 3.10 and ocrmypdf from 12 to 14. After that, the --remove-background option stopped working. I removed it and the problem disappeared.

@jbarlow83
Collaborator

That makes sense. Unfortunately, I still have not had a chance to replace --remove-background.
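
The workaround reported in this thread is to drop `--remove-background` from the command line when running ocrmypdf 14. A minimal sketch of that, using the same flags as the command in the original report: `build_ocrmypdf_args` is a hypothetical helper, and `in.pdf` / `out.pdf` are placeholder file names, not paths from the issue.

```python
from typing import List


def build_ocrmypdf_args(
    input_pdf: str, output_pdf: str, remove_background: bool = False
) -> List[str]:
    """Build the ocrmypdf argument list used in this issue.

    Hypothetical helper. --remove-background is off by default because
    ocrmypdf 14.0.0 raised NotImplementedError for it in the log above.
    """
    args = [
        "ocrmypdf",
        "--force-ocr",
        "--optimize", "0",
        "--fast-web-view", "0",
        "--output-type", "pdf",
        "-l", "por",
        "-v", "1",
        "--deskew",
        "--clean",
    ]
    if remove_background:
        # Only add this on a version where the option is implemented.
        args.append("--remove-background")
    args += [input_pdf, output_pdf]
    return args


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run to execute it.
    print(" ".join(build_ocrmypdf_args("in.pdf", "out.pdf")))
```

Keeping the option behind a parameter makes it easy to turn back on once a release implements it again.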
