Sumi is a simple OCR application with support for corrections.
- Go
- GTK+ 3.16 or later
- Tesseract 3.04.00 or later
- Trained data for your language
- One of the following:
If none of the screenshot utilities above are available on your system, you can
use the SUMI_SCREENCAPTURE environment variable to provide your own. The
utility is expected to select a part of the screen and write to the file path
specified in the last argument, e.g. for scrot the valid SUMI_SCREENCAPTURE
value would be scrot -s.
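As a rough sketch of such a custom utility, the wrapper below forwards the output path to ImageMagick's import, which lets the user drag a rectangle with the mouse when no window is specified. The script name capture.sh and the helper function names are hypothetical, and the sketch assumes that sumi appends the file path as the last argument, as described above.

```shell
#!/bin/sh
# capture.sh (hypothetical name): custom screenshot wrapper for SUMI_SCREENCAPTURE.

# Print the last argument, assumed to be the output file path.
last_arg() {
    for arg in "$@"; do :; done
    printf '%s\n' "$arg"
}

# Capture a user-selected region into the file given as $1.
# Assumption: ImageMagick's `import` is installed; the image format is
# picked from the file extension.
capture_to() {
    import "$1"
}

# Only capture when called with arguments (i.e. when invoked by sumi).
if [ "$#" -gt 0 ]; then
    capture_to "$(last_arg "$@")"
fi
```

Assuming the script is executable, it could then be tried with something like SUMI_SCREENCAPTURE=/path/to/capture.sh sumi.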
- Download and install all dependencies from the list above
- Run go get github.com/tsudoko/sumi
To use a language other than Japanese (or more than one language at once), pass
the ISO 639-3 code of the desired language in a -l flag, e.g. sumi -l eng.
Note that sumi was designed specifically for Japanese, so it may give worse
results when used with other languages.
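Since each language needs its trained data installed, it can help to check what Tesseract sees before passing a code to -l. The following is a minimal sketch that relies on Tesseract's --list-langs option; the function names has_lang and check_lang are hypothetical, and 2>&1 is used in case the list is printed to stderr.

```shell
#!/bin/sh
# Succeed if the language code $1 appears as a whole line in the
# language list read from stdin.
has_lang() {
    grep -qxF -- "$1"
}

# Succeed if Tesseract reports trained data for the language code $1.
check_lang() {
    tesseract --list-langs 2>&1 | has_lang "$1"
}

# Example (not run here):
#   check_lang eng && sumi -l eng
```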
Sumi prints scanned text to stdout, so it can be piped to other programs
automatically. Some examples follow.
X11, requires xclip:
./sumi | while read -r a; do echo "$a" | xclip -i -sel clip; done
Windows, requires a sh-compatible shell and iconv. Replace $cp with your
locale's codepage; for Japanese it's cp932:
./sumi.exe | while read -r a; do echo "$a" | iconv -t $cp | clip; done
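The two clipboard loops above share the same shape, so they could be factored into a small helper. This is only a sketch: each_line is a hypothetical name, and it uses printf and IFS= so that lines starting with a dash, containing backslashes, or carrying leading whitespace are passed through unchanged.

```shell
#!/bin/sh
# Run the given command once per input line, feeding that line on the
# command's stdin.
each_line() {
    while IFS= read -r line; do
        printf '%s\n' "$line" | "$@"
    done
}

# Examples (not run here):
#   ./sumi | each_line xclip -i -sel clip
#   ./sumi.exe | each_line sh -c 'iconv -t cp932 | clip'
```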
With ep:
./sumi | xargs -n1 ep
With myougiden:
./sumi | xargs -n1 myougiden