
Technical Specification

Skylar Scott edited this page Sep 2, 2020 · 10 revisions

Abstract

Streamline language immersion and word capture for non-selectable language content.

There are many proposed methods for acquiring a second language, and the debate over which methods are best is ongoing in the language learning community. However, all methods share the same end goal: the ability to participate more fully in the target language. As developers and language learners, we value the ability to understand and digest native language content. LangOCR is a software package designed to help language learners take in native language material and quickly turn new words into high-quality Anki flash cards.

Introduction

Pertinent Links

This document expects readers to be comfortable with the terminology and methodologies of the following projects and software.

MIA Anki Add-ons

Morphman

Overview

LangOCR is developed in line with the Mass Immersion Approach (MIA) methodology for studying a second language through immersion-based learning. Immersion occurs through either reading or listening in the target language. LangOCR aims to give users a time-efficient method for quick word lookup and sentence mining of visual content whose text is not selectable.

Glossary

  • Active Immersion - Immersion-based listening or reading where 100% of the learner’s attention is on the immersion content.
  • Target Language - The language that the user is trying to acquire.
  • Target Word - The word that a user is interested in getting information on.
  • i+1 Sentences - A sentence where all words are known by the user except a single target word. i+1 sentences can be audio or text based.
  • Sentence Mining - The act of actively immersing in content and creating Anki flash cards for i+1 sentences.

Goals

  • Target mediums are ebooks, comics, games, and hardsub video - LangOCR will emphasize common reading immersion sources where the text is not selectable. These are the primary use cases for the software.
  • Software is focused on maintaining reading momentum - Design decisions should be made such that they emphasize the ability to continuously read with minimal break in immersion content.
  • Quick Dictionary Look-up - Users should be able to quickly get a dictionary entry on a target word.
  • Access to Pitch Information w/ Sound Clips - Users should be able to quickly get pitch information on a target word. That pitch information should be accompanied by sound clips for the word in question.
  • Word entries should be displayed near the target word - All information should be displayed near the target word so that the user does not need to move their eyes far from the sentence.
  • Ease of sentence mining - It should be simple and quick to make high quality i+1 sentence cards to the user’s specification in Anki.
  • Monitor Word Frequency - LangOCR should keep a database of previously scanned words and of words known by the user. See Morphman.
  • Monitor user medium - Record in a global database the medium from which each target word was sourced.
  • Example Sentences - On the user’s request, a target word should be accompanied by example sentences that use it. Those sentences should be ordered by the user’s known words and by each sentence’s retention rate among other users of a comparable level.
  • Monolingual Transition - LangOCR will also provide itself as a tool for easing learners into the monolingual transition, a notably important and difficult step of the MIA process.
  • Ease of obtaining user corrections for re-training - The interface shall allow users to quickly make corrections when the OCR engine is incorrect, either by selecting from a list of look-alike OCR choices or by drawing the intended character.
  • Coexist with existing tools where possible - We don’t want to re-invent work that is already in development without reasonable justification. If there are existing tools that satisfy a requirement, first consider how LangOCR can integrate with that software.
  • Support third-party software - Focus on interfaces that allow other programs to utilize the power of LangOCR.

Future Goals

  • Enable the same features with selectable content - Be able to bypass OCR with the same functionality support.
  • Provide users with data refined by user data - Use user data such as medium and Anki retention rate to give refined results.
  • Give users suggestions for reading material based on user data - Given retention rate and skill level relative to other users, suggest optimally leveled content.
  • Produce a selectable-text version of a medium - Be able to take a PDF or a collection of image files as input, give users the opportunity to correct the file(s) with LangOCR-style corrections, then export the result in a selectable-text format.

Requirements

Requirements Glossary

Main Widget

Refers to the main widget, which houses information and where settings can be changed. It is made up of a collection of widgets and options.

Widget

Refers to a widget that is designed to complete one and only one purpose. Widgets are housed within the Main Widget.

Image Packet

A packet sent from a client to the server. The packet contains the image data with its MIME image type, plus metadata such as an image format descriptor, binary size, request type, and/or a session key for the user.

Image Decoding

The action of taking an image of a language script and obtaining the script as Unicode text. This will be done via OCR.

Region Type

Given an image with potentially multiple bodies of text, the image can be broken down into regions that contain text. The region type describes the kind of text the region holds.

‘Block’ type

The text within the region spans multiple lines of contiguous text. e.g. a paragraph might be considered a block.

‘Line’ type

The text within the region is a single line of contiguous text. e.g. a word made of two symbols such as ‘学校’ might be considered a line.

‘Symbol’ type

The text within the region is a single symbol. e.g. a single Latin letter, or the symbol ‘学’, would be considered a symbol type.

Key Entry

A word for which a dictionary entry is requested.

Dictionary Entry

A dictionary entry is the dictionary data that defines a specific word, including its part of speech, definition, pronunciation, and tone information.

1 - General Requirements

  • LangOCR shall be written as a separate GUI and Daemon application - LangOCR will be separated into two main components: langocr (GUI application) and langocrd (Daemon application). langocr-ctl is a third program, used to debug langocrd and to give users a program to test with.

langocr shall be able to detect a running instance of langocrd, and to boot one up if no instance is running.
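The detect-or-boot behaviour can be sketched as a small retry loop. This is a minimal sketch, not LangOCR's implementation: `ensureDaemon`, `DaemonStatus`, the callbacks, and the poll count are hypothetical, and how langocr actually probes for langocrd (for example, by attempting an RPC ping) and spawns it is left to the caller-supplied callbacks.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of trying to reach langocrd.
enum class DaemonStatus { AlreadyRunning, Launched, Unavailable };

// Sketch: check whether langocrd answers; if not, start it and poll until
// it answers or the poll budget runs out. `probe` and `launch` are
// hypothetical callbacks (e.g. an RPC ping and a process spawn);
// `wait` decides how long to pause between polls.
DaemonStatus ensureDaemon(const std::function<bool()>& probe,
                          const std::function<bool()>& launch,
                          const std::function<void()>& wait,
                          int maxPolls = 10) {
    assert(maxPolls > 0);
    if (probe()) {
        return DaemonStatus::AlreadyRunning;  // an existing instance answered
    }
    if (!launch()) {
        return DaemonStatus::Unavailable;     // could not start langocrd
    }
    for (int i = 0; i < maxPolls; ++i) {
        if (probe()) {
            return DaemonStatus::Launched;    // the new instance is answering
        }
        wait();
    }
    return DaemonStatus::Unavailable;         // started, but never answered
}
```

Injecting the probe, launch, and wait steps keeps the decision logic platform-neutral, which matters because process spawning differs between Windows, MacOS, and Linux.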

2 - LangOCR GUI (langocr)

Widgets can be displayed in the Main Widget or floating outside it.

Each widget should perform one action only.

Widgets should be quick to display at a location relative to the cursor, and quick to remove from view.

Capture data can be obtained by moving the mouse over text. A capture window will be drawn automatically, based on a guess of where the text bubble begins and ends.
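Placing a floating widget next to the cursor without letting it run off-screen can be sketched as follows. The 12-pixel offset, the flip-to-the-other-side rule, and the names `placeWidget`, `Point`, and `Size` are assumptions chosen for illustration; the requirements only say widgets appear relative to the cursor and near the target word.

```cpp
#include <algorithm>
#include <cassert>

struct Point { int x; int y; };
struct Size  { int w; int h; };

// Sketch: put the widget just below and to the right of the cursor.
// If it would overflow the screen, flip it to the other side of the
// cursor, then clamp so it always stays fully visible.
Point placeWidget(Point cursor, Size widget, Size screen, int offset = 12) {
    assert(widget.w <= screen.w && widget.h <= screen.h);
    int x = cursor.x + offset;
    int y = cursor.y + offset;
    if (x + widget.w > screen.w) x = cursor.x - offset - widget.w;  // flip left
    if (y + widget.h > screen.h) y = cursor.y - offset - widget.h;  // flip up
    x = std::clamp(x, 0, screen.w - widget.w);
    y = std::clamp(y, 0, screen.h - widget.h);
    return {x, y};
}
```

Flipping before clamping keeps the widget from covering the word under the cursor whenever there is room on the opposite side.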

3 - LangOCR Daemon (langocrd)

General langocrd

langocrd shall be cross-platform for Windows, MacOS, and Linux.

langocrd should be implemented as an RPC server.

langocrd should use JSON as its main communication format.

Where JSON is not satisfactory, such as for raw binary data, langocrd should use msgpack-rpc, with meta-packet data that describes the data format.

The server shall be RESTful. This is intended to keep the protocol simple and to allow the server to scale easily, since load requirements cannot be addressed at this stage. (Open question: is REST strictly tied to HTTP?)

langocrd communication should be divided into three main types: user authentication, OCR requests, and dictionary requests.

langocrd shall accept authentication requests from clients. If authentication succeeds, langocrd responds with a session key that can be used until the session times out. After a session timeout, the client will receive an authentication error when using that key.

All OCR and dictionary requests to langocrd shall include the session key as part of the request.

Successive requests with the session key should restart the session timeout for that key.
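The session rules above (issue a key after successful authentication, reject expired keys, restart the timeout on every successful use) can be sketched with a small in-memory table. `SessionManager`, the counter-based key format, and passing the current time explicitly are hypothetical choices made for clarity and testability; a real server would generate keys from a cryptographically secure random source.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>

// Sketch of session-key handling for langocrd. The current time is passed
// in explicitly so the logic is deterministic and easy to test.
class SessionManager {
public:
    using Clock = std::chrono::steady_clock;

    explicit SessionManager(std::chrono::seconds timeout) : timeout_(timeout) {}

    // Called after credentials have been verified elsewhere.
    // NOTE: a counter is used only for illustration; real keys must be random.
    std::string open(Clock::time_point now) {
        std::string key = "session-" + std::to_string(++counter_);
        expiry_[key] = now + timeout_;
        return key;
    }

    // Returns true and restarts the timeout if the key is still valid;
    // returns false (an authentication error for the client) otherwise.
    bool touch(const std::string& key, Clock::time_point now) {
        auto it = expiry_.find(key);
        if (it == expiry_.end()) return false;
        if (now >= it->second) {       // timed out: forget the session
            expiry_.erase(it);
            return false;
        }
        it->second = now + timeout_;   // successive requests extend the session
        return true;
    }

private:
    std::chrono::seconds timeout_;
    unsigned long counter_ = 0;
    std::unordered_map<std::string, Clock::time_point> expiry_;
};
```

With a 60-second timeout (an arbitrary value), a key used at 50 s is valid until 110 s; each request pushes the deadline forward again.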

OCR specific requirements

langocrd shall support the BMP, PNG, and JPEG image formats.

When langocrd receives an image request, it shall respond with JSON containing an image id field. The image id is valid until it times out, after which the image becomes invalid. The client learns of this through a failure response when it next tries to use that image id.

All API requests concerning an image are valid only with a valid image id. An error is returned to the client in the event of an invalid id.
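The image-id lifecycle can be sketched as a registry that hands out ids and expires them lazily on lookup. `ImageRegistry`, the numeric id format, and the fixed lifetime are hypothetical; the specification does not say how long an image id stays valid or whether use extends its lifetime, so this sketch does not extend it.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch of an image store keyed by image id. An expired entry is removed
// the first time a client tries to use it, and the lookup fails, which the
// server would report to the client as an invalid-id error.
class ImageRegistry {
public:
    using Clock = std::chrono::steady_clock;

    explicit ImageRegistry(std::chrono::seconds lifetime) : lifetime_(lifetime) {}

    std::uint64_t add(std::vector<std::uint8_t> bytes, Clock::time_point now) {
        std::uint64_t id = nextId_++;
        images_[id] = Entry{std::move(bytes), now + lifetime_};
        return id;
    }

    // Returns the image data, or nullptr for an unknown or expired id.
    const std::vector<std::uint8_t>* find(std::uint64_t id, Clock::time_point now) {
        auto it = images_.find(id);
        if (it == images_.end()) return nullptr;
        if (now >= it->second.expiresAt) {
            images_.erase(it);
            return nullptr;
        }
        return &it->second.bytes;
    }

private:
    struct Entry {
        std::vector<std::uint8_t> bytes;
        Clock::time_point expiresAt;
    };
    std::chrono::seconds lifetime_;
    std::uint64_t nextId_ = 1;
    std::unordered_map<std::uint64_t, Entry> images_;
};
```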

An image request must be accompanied by a MIME type identifier and the language of the image’s script. If the language is not supported by langocrd, an error must be returned to the client.
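One way langocrd could vet an image request before decoding is to confirm that the declared MIME type is one of the three supported formats and that the payload's leading bytes match that format's file signature. The signatures below are the standard BMP ("BM"), PNG, and JPEG ones; `checkImage`, `ImageCheck`, and the idea of cross-checking the payload are assumptions, not part of the specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class ImageCheck { Ok, UnsupportedType, SignatureMismatch };

// Returns true if `data` starts with the bytes in `sig`.
static bool startsWith(const std::vector<std::uint8_t>& data,
                       const std::vector<std::uint8_t>& sig) {
    return data.size() >= sig.size() &&
           std::equal(sig.begin(), sig.end(), data.begin());
}

// Sketch: accept only BMP, PNG, and JPEG, and make sure the payload
// actually looks like the declared type.
ImageCheck checkImage(const std::string& mimeType,
                      const std::vector<std::uint8_t>& data) {
    static const std::vector<std::uint8_t> bmp  = {'B', 'M'};
    static const std::vector<std::uint8_t> png  = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    static const std::vector<std::uint8_t> jpeg = {0xFF, 0xD8, 0xFF};

    const std::vector<std::uint8_t>* sig = nullptr;
    if (mimeType == "image/bmp")       sig = &bmp;
    else if (mimeType == "image/png")  sig = &png;
    else if (mimeType == "image/jpeg") sig = &jpeg;
    else return ImageCheck::UnsupportedType;

    return startsWith(data, *sig) ? ImageCheck::Ok : ImageCheck::SignatureMismatch;
}
```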

The client may specify the language of the script as ‘unknown’ in an image request. langocrd will then guess the language depicted and respond based on that guessed language.

The client may send a request asking langocrd for a suggested language, to which langocrd responds with its guess.
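The specification leaves the guessing method open. Purely as an illustration, a naive heuristic swapped in here classifies text from a first OCR pass by Unicode script: any kana suggests Japanese, Hangul suggests Korean, and Han characters alone suggest Chinese. `guessLanguage`, the language codes, and the script-counting approach are all assumptions; the minimal UTF-8 decoder does not validate its input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode the UTF-8 code point starting at s[i] and advance i past it.
// Minimal and non-validating: stray continuation bytes yield U+FFFD.
static char32_t nextCodePoint(const std::string& s, std::size_t& i) {
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;                                   // ASCII
    int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : lead >= 0xC0 ? 1 : -1;
    if (extra < 0) return 0xFFFD;
    char32_t cp = lead & (0x3F >> extra);                           // payload bits of lead byte
    for (int k = 0; k < extra && i < s.size(); ++k) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Sketch: guess a language code from the scripts present in the text.
std::string guessLanguage(const std::string& utf8) {
    std::size_t kana = 0, hangul = 0, han = 0;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = nextCodePoint(utf8, i);
        if (cp >= 0x3040 && cp <= 0x30FF) ++kana;          // Hiragana and Katakana
        else if (cp >= 0xAC00 && cp <= 0xD7A3) ++hangul;   // Hangul syllables
        else if (cp >= 0x4E00 && cp <= 0x9FFF) ++han;      // CJK Unified Ideographs
    }
    if (kana > 0) return "ja";
    if (hangul > 0) return "ko";
    if (han > 0) return "zh";
    return "unknown";
}
```

The heuristic's weakness is visible immediately: all-kanji Japanese text such as ‘学校’ is classified as Chinese, which is one reason a real guess would likely need to compare OCR confidence across languages instead.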

All image decodings must be returned as a list of possible outputs, each accompanied by a probability value indicating how likely that decoding is to be correct.
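A decoding response could be assembled by normalizing the OCR engine's raw candidate scores into probabilities and sorting them from most to least likely, so the top guess is shown and the rest can double as the look-alike correction list. The `Candidate` type and the assumption that the engine yields non-negative raw scores are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Candidate {
    std::string text;    // decoded Unicode text (UTF-8)
    double probability;  // on input: raw non-negative score; on output: normalized
};

// Sketch: scale scores so they sum to 1, then order best-first.
// stable_sort keeps the engine's original order for equal scores.
std::vector<Candidate> rankCandidates(std::vector<Candidate> candidates) {
    double total = 0.0;
    for (const auto& c : candidates) {
        assert(c.probability >= 0.0);
        total += c.probability;
    }
    if (total > 0.0) {
        for (auto& c : candidates) c.probability /= total;
    }
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const Candidate& a, const Candidate& b) {
                         return a.probability > b.probability;
                     });
    return candidates;
}
```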

langocrd must support a command to return the regions where an image is expected to contain text along with a region type and region-id.

Pending decision.

A region is really just another image. Should regions just be treated as another image with an image id?

Region types will include ‘block’, ‘line’, and ‘symbol’.
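A region listing could be represented as below. This sketch keeps regions separate from image ids, which is only one side of the pending decision above. The bounding-box fields, the integer region id, the string spellings used in JSON, and the `contains` helper are assumptions; the specification fixes only the three region types and the presence of a region type and region id.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class RegionType { Block, Line, Symbol };

// One text region found in an image (coordinates in image pixels).
struct Region {
    int id;
    RegionType type;
    int x, y, width, height;
};

// Spelling used for the region type in JSON responses (assumed).
std::string toString(RegionType t) {
    switch (t) {
        case RegionType::Block:  return "block";
        case RegionType::Line:   return "line";
        case RegionType::Symbol: return "symbol";
    }
    return "unknown";  // unreachable for valid enum values
}

// Parse a region type received from a client; empty if unrecognized.
std::optional<RegionType> parseRegionType(const std::string& s) {
    if (s == "block")  return RegionType::Block;
    if (s == "line")   return RegionType::Line;
    if (s == "symbol") return RegionType::Symbol;
    return std::nullopt;
}

// True if `inner` lies entirely inside `outer`, e.g. a symbol inside a line.
bool contains(const Region& outer, const Region& inner) {
    return inner.x >= outer.x && inner.y >= outer.y &&
           inner.x + inner.width <= outer.x + outer.width &&
           inner.y + inner.height <= outer.y + outer.height;
}
```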

Dictionary specific requirements

All dictionary requests shall include a definition language field and a known language field. The definition language field specifies the language of the key entry. The known language field specifies the language in which the dictionary entry is written.

All dictionary requests and responses shall use JSON as their main communication scheme. The exception to this rule is binary data, which is communicated over FTP.

Pending thought

Is FTP an important specifier, or is it unnecessary?

langocrd shall support the following commands as client requests. For these commands, if the word can be broken up into multiple possible entries, the command returns a list with one result for each possible entry.

  • Given a word, respond with a dictionary entry.
  • Given a word and language specifier, respond with tone information. All tone information is accompanied by a tone id.
  • Given a tone id, respond with a sound clip of the word said with that tone. Each tone id corresponds to exactly one combination of a word, one of that word’s possible tones, and its sound clip.

    Examples:

    • 橋(はし↓) (hashi↓) and 箸(は↓し) (ha↓shi) have individual tone ids.
    • 怪しい (あ↑やしい) (a↑yashii) and 怪しい (あやし↓い) (ayashi↓i) have individual tone ids
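The one-to-one relationship between a tone id and a (word, pitch pattern) pair can be sketched as an interning table: each distinct pair receives a new id, and the same pair always maps back to the same id. `ToneRegistry` and the use of the arrow-annotated reading as the key are assumptions borrowed from the examples above; the sound-clip lookup is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Sketch: assign stable ids to (word, pitch-annotated reading) pairs.
class ToneRegistry {
public:
    std::uint32_t idFor(const std::string& word, const std::string& pitchReading) {
        auto key = std::make_pair(word, pitchReading);
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;  // pair already has an id
        std::uint32_t id = static_cast<std::uint32_t>(ids_.size()) + 1;
        ids_.emplace(std::move(key), id);
        return id;
    }

private:
    std::map<std::pair<std::string, std::string>, std::uint32_t> ids_;
};
```

Keying on the pair rather than the word alone is what lets 怪しい carry two distinct ids, one per pitch pattern.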

langocrd shall support the following commands as client requests. For these commands, the response should not be split into a list of possible entries.

  • Given a word, a desired number n of sentences, and an ordering method, return a list of up to n example sentences that use the given word, sorted according to the ordering method.
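One possible ordering method, in line with the Goals section's wish to rank sentences by the user's known words, is to sort by how many words in each sentence the user does not yet know, so i+1 sentences come first. This sketch assumes sentences are already split into words and that the known-word set is available (e.g. from Morphman-style tracking); `exampleSentences`, `Sentence`, and the stable tie-breaking are hypothetical, and ordering by retention rate is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Sentence = std::vector<std::string>;  // a sentence split into words

// Count the words in `s` that are not in the user's known set.
std::size_t unknownCount(const Sentence& s, const std::set<std::string>& known) {
    return static_cast<std::size_t>(std::count_if(s.begin(), s.end(),
        [&](const std::string& w) { return known.count(w) == 0; }));
}

// Sketch: return up to n sentences containing `word`, fewest unknown words
// first; ties keep their original corpus order (stable sort).
std::vector<Sentence> exampleSentences(const std::vector<Sentence>& corpus,
                                       const std::string& word,
                                       std::size_t n,
                                       const std::set<std::string>& known) {
    std::vector<Sentence> matches;
    for (const auto& s : corpus) {
        if (std::find(s.begin(), s.end(), word) != s.end()) matches.push_back(s);
    }
    std::stable_sort(matches.begin(), matches.end(),
                     [&](const Sentence& a, const Sentence& b) {
                         return unknownCount(a, known) < unknownCount(b, known);
                     });
    if (matches.size() > n) matches.resize(n);
    return matches;
}
```

Because the target word itself is normally unknown, an i+1 sentence scores exactly 1 under this ordering.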

LangOCR CLI (langocr-ctl)

langocr-ctl shall be used for directly interacting with and debugging the langocrd.

langocr-ctl shall be cross-platform for Windows, MacOS, and Linux.

langocr-ctl shall be able to issue every command and request that langocrd supports.

Flags shall be supported to print raw request information to stdout.

Flags shall be supported to print human readable information to stdout.
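The two output flags could be parsed as below. The flag names `--raw` and `--human`, the command name `ocr` in the usage lines, and the choice to default to human-readable output are hypothetical; the specification does not name the flags.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct OutputOptions {
    bool raw = false;    // print raw request information to stdout
    bool human = false;  // print human-readable information to stdout
};

// Sketch: pull the (hypothetical) output flags out of the argument list and
// leave everything else in `rest` for the command parser. If neither flag
// is given, default to human-readable output.
OutputOptions parseOutputFlags(const std::vector<std::string>& args,
                               std::vector<std::string>& rest) {
    OutputOptions opts;
    for (const auto& a : args) {
        if (a == "--raw") opts.raw = true;
        else if (a == "--human") opts.human = true;
        else rest.push_back(a);
    }
    if (!opts.raw && !opts.human) opts.human = true;
    return opts;
}
```

In `main`, the argument vector can be built with `std::vector<std::string>(argv + 1, argv + argc)`. Both flags may be given together to print raw and readable output side by side.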

Solutions

MockUI

For an example of the Mock UI, please see doc/ of the main repository. You can either view the PDF, or import the file into draw.io for modifications or a better viewing experience.

API Specification

The API specification will be written in C++ using DoxyPress, and will be used for compilation. For more information, check lib/ in the main repository.