Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds document text detection tutorial. #868

Merged
merged 3 commits into from
Mar 22, 2017
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions vision/cloud-client/document_text/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
output-text.jpg
110 changes: 110 additions & 0 deletions vision/cloud-client/document_text/README.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
.. This file is automatically generated. Do not edit this file directly.

Google Cloud Vision API Python Samples
===============================================================================

This directory contains samples for Google Cloud Vision API. `Google Cloud Vision API`_ allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content




.. _Google Cloud Vision API: https://cloud.google.com/vision/docs

Setup
-------------------------------------------------------------------------------


Authentication
++++++++++++++

Authentication is typically done through `Application Default Credentials`_,
which means you do not have to change the code to authenticate as long as
your environment has credentials. You have a few options for setting up
authentication:

#. When running locally, use the `Google Cloud SDK`_

.. code-block:: bash

gcloud beta auth application-default login


#. When running on App Engine or Compute Engine, credentials are already
set-up. However, you may need to configure your Compute Engine instance
with `additional scopes`_.

#. You can create a `Service Account key file`_. This file can be used to
authenticate to Google Cloud Platform services from any environment. To use
the file, set the ``GOOGLE_APPLICATION_CREDENTIALS`` environment variable to
the path to the key file, for example:

.. code-block:: bash

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json

.. _Application Default Credentials: https://cloud.google.com/docs/authentication#getting_credentials_for_server-centric_flow
.. _additional scopes: https://cloud.google.com/compute/docs/authentication#using
.. _Service Account key file: https://developers.google.com/identity/protocols/OAuth2ServiceAccount#creatinganaccount

Install Dependencies
++++++++++++++++++++

#. Install `pip`_ and `virtualenv`_ if you do not already have them.

#. Create a virtualenv. Samples are compatible with Python 2.7 and 3.4+.

.. code-block:: bash

$ virtualenv env
$ source env/bin/activate

#. Install the dependencies needed to run the samples.

.. code-block:: bash

$ pip install -r requirements.txt

.. _pip: https://pip.pypa.io/
.. _virtualenv: https://virtualenv.pypa.io/

Samples
-------------------------------------------------------------------------------

Document Text tutorial
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



To run this sample:

.. code-block:: bash

$ python doctext.py

usage: doctext.py [-h] image_file

positional arguments:
image_file The image for text detection.

optional arguments:
-h, --help show this help message and exit




The client library
-------------------------------------------------------------------------------

This sample uses the `Google Cloud Client Library for Python`_.
You can read the documentation for more details on API usage and use GitHub
to `browse the source`_ and `report issues`_.

.. Google Cloud Client Library for Python:
https://googlecloudplatform.github.io/google-cloud-python/
.. browse the source:
https://github.com/GoogleCloudPlatform/google-cloud-python
.. report issues:
https://github.com/GoogleCloudPlatform/google-cloud-python/issues


.. _Google Cloud SDK: https://cloud.google.com/sdk/
22 changes: 22 additions & 0 deletions vision/cloud-client/document_text/README.rst.in
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# This file is used to generate README.rst

product:
name: Google Cloud Vision API
short_name: Cloud Vision API
url: https://cloud.google.com/vision/docs
description: >
`Google Cloud Vision API`_ allows developers to easily integrate vision
detection features within applications, including image labeling, face and
landmark detection, optical character recognition (OCR), and tagging of
explicit content.

setup:
- auth
- install_deps

samples:
- name: Document Text tutorial
file: doctext.py
show_help: True

cloud_client_library: true
125 changes: 125 additions & 0 deletions vision/cloud-client/document_text/doctext.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
#!/usr/bin/env python

# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Outlines document text given an image.

Example:
python doctext.py resources/text_menu.jpg
"""
# [START full_tutorial]
# [START imports]
import argparse
from enum import Enum
import io

from google.cloud import vision
from PIL import Image, ImageDraw
# [END imports]


class FeatureType(Enum):
PAGE = 1
BLOCK = 2
PARA = 3
WORD = 4
SYMBOL = 5


def draw_boxes(image, blocks, color):
"""Draw a border around the image using the hints in the vector list."""
# [START draw_blocks]
draw = ImageDraw.Draw(image)

for block in blocks:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice! For the boundaries here the objects might just work too because they're x,y tuples.

draw.polygon([block.vertices[0].x, block.vertices[0].y,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ew hanging indents.

draw.polygon([
    block...,
    block...,
], None, color)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

block.vertices[1].x, block.vertices[1].y,
block.vertices[2].x, block.vertices[2].y,
block.vertices[3].x, block.vertices[3].y], None, color)
return image
# [END draw_blocks]


def get_document_bounds(image_file, feature):
# [START detect_bounds]
"""Returns document bounds given an image."""
vision_client = vision.Client()

bounds = []

with io.open(image_file, 'rb') as image_file:
content = image_file.read()

image = vision_client.image(content=content)
document = image.detect_full_text()

# Append specified feature bounds by enumerating all document features
for page in document.pages:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for these blank newlines in nested fors.

for block in page.blocks:

for paragraph in block.paragraphs:

for word in paragraph.words:

for symbol in word.symbols:

if (feature == FeatureType.SYMBOL):
bounds.append(symbol.bounding_box)

if (feature == FeatureType.WORD):
bounds.append(word.bounding_box)

if (feature == FeatureType.PARA):
bounds.append(paragraph.bounding_box)

if (feature == FeatureType.BLOCK):
bounds.append(block.bounding_box)

if (feature == FeatureType.PAGE):
bounds.append(block.bounding_box)

return bounds
# [END detect_bounds]


def render_doc_text(filein, fileout):
# [START render_doc_text]
image = Image.open(filein)
bounds = get_document_bounds(filein, FeatureType.PAGE)
draw_boxes(image, bounds, 'blue')
bounds = get_document_bounds(filein, FeatureType.PARA)
draw_boxes(image, bounds, 'red')
bounds = get_document_bounds(filein, FeatureType.WORD)
draw_boxes(image, bounds, 'yellow')

if fileout is not 0:
image.save(fileout)
else:
image.show()
# [END render_doc_text]


if __name__ == '__main__':
# [START run_crop]
parser = argparse.ArgumentParser()
parser.add_argument('detect_file', help='The image for text detection.')
parser.add_argument('-out_file', help='Optional output file', default=0)
args = parser.parse_args()

parser = argparse.ArgumentParser()
render_doc_text(args.detect_file, args.out_file)
# [END run_crop]
# [END full_tutorial]
24 changes: 24 additions & 0 deletions vision/cloud-client/document_text/doctext_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import doctext


def test_text(cloud_config, capsys):
"""Checks the output image for drawing the crop hint is created."""
doctext.render_doc_text('resources/text_menu.jpg', 'output-text.jpg')
out, _ = capsys.readouterr()
assert os.path.isfile('output-text.jpg')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add output-text.jpg to gitignore.

2 changes: 2 additions & 0 deletions vision/cloud-client/document_text/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
google-cloud-vision==0.23.2
pillow==4.0.0
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.