[dlp] testing: fix Pub/Sub notifications (#3925)
* re-generated README.rst with some more setup info
* use parent with the global location attached
* re-enabled some tests with Pub/Sub notification
* stop waiting between test retries
Takashi Matsuo authored Jun 3, 2020
1 parent 4b968e8 commit 3102486
Showing 8 changed files with 98 additions and 102 deletions.
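
The heart of the fix is visible in the inspect_content.py hunks below: once the job's parent resource carries a location segment (projects/&lt;p&gt;/locations/global), the old habit of string-formatting the Pub/Sub topic off the parent would yield an invalid name like projects/&lt;p&gt;/locations/global/topics/&lt;t&gt;, so the topic path is now built independently. A minimal sketch of the new wiring, assuming the pre-2.0 google-cloud-dlp and google-cloud-pubsub clients these samples use (project and topic ids are placeholders):

    import google.cloud.dlp
    import google.cloud.pubsub

    project = "my-project"  # placeholder
    topic_id = "my-topic"   # placeholder

    dlp = google.cloud.dlp.DlpServiceClient()

    # The parent now carries the global location, per the commit message.
    parent = dlp.location_path(project, "global")
    # -> "projects/my-project/locations/global"

    # The topic is built on its own rather than formatted off the parent.
    topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
    # -> "projects/my-project/topics/my-topic"

    actions = [{"pub_sub": {"topic": topic}}]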
34 changes: 18 additions & 16 deletions dlp/README.rst
@@ -14,6 +14,15 @@ This directory contains samples for Google Data Loss Prevention. `Google Data Lo

.. _Google Data Loss Prevention: https://cloud.google.com/dlp/docs/

+To run the sample, you need to enable the API at: https://console.cloud.google.com/apis/library/dlp.googleapis.com
+
+
+To run the sample, you need to have the following roles:
+* `DLP Administrator`
+* `DLP API Service Agent`
+
+
+
Setup
-------------------------------------------------------------------------------

@@ -58,15 +67,6 @@ Install Dependencies
.. _pip: https://pip.pypa.io/
.. _virtualenv: https://virtualenv.pypa.io/

-#. For running *_test.py files, install test dependencies
-   .. code-block:: bash
-      $ pip install -r requirements-test.txt
-      $ pytest inspect_content_test.py
-   ** *_test.py files are demo wrappers and make API calls. You may get rate limited for making high number of requests. **

Samples
-------------------------------------------------------------------------------

@@ -83,7 +83,7 @@ To run this sample:

.. code-block:: bash
-$ python quickstart.py <project-id>
+$ python quickstart.py
Inspect Content
@@ -101,15 +101,16 @@ To run this sample:
$ python inspect_content.py
-usage: inspect_content.py [-h] {string,file,gcs,datastore,bigquery} ...
+usage: inspect_content.py [-h] {string,table,file,gcs,datastore,bigquery} ...
Sample app that uses the Data Loss Prevention API to inspect a string, a local
file or a file on Google Cloud Storage.
positional arguments:
-{string,file,gcs,datastore,bigquery}
+{string,table,file,gcs,datastore,bigquery}
Select how to submit content to the API.
string Inspect a string.
+table Inspect a table.
file Inspect a local file.
gcs Inspect files on Google Cloud Storage.
datastore Inspect files on Google Datastore.
@@ -135,13 +136,14 @@ To run this sample:
$ python redact.py
-usage: redact.py [-h] [--project PROJECT] [--info_types INFO_TYPES]
+usage: redact.py [-h] [--project PROJECT]
+                 [--info_types INFO_TYPES [INFO_TYPES ...]]
[--min_likelihood {LIKELIHOOD_UNSPECIFIED,VERY_UNLIKELY,UNLIKELY,POSSIBLE,LIKELY,VERY_LIKELY}]
[--mime_type MIME_TYPE]
filename output_filename
-Sample app that uses the Data Loss Prevent API to redact the contents of a
-string or an image file.
+Sample app that uses the Data Loss Prevent API to redact the contents of an
+image file.
positional arguments:
filename The path to the file to inspect.
@@ -151,7 +153,7 @@ To run this sample:
-h, --help show this help message and exit
--project PROJECT The Google Cloud project id to use as a parent
resource.
-  --info_types INFO_TYPES
+  --info_types INFO_TYPES [INFO_TYPES ...]
Strings representing info types to look for. A full
list of info categories and types is available from
the API. Examples include "FIRST_NAME", "LAST_NAME",
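For reference, the API the regenerated README points to can also be enabled from the command line; this is the standard gcloud command, not something this commit adds:

    $ gcloud services enable dlp.googleapis.com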
8 changes: 7 additions & 1 deletion dlp/README.rst.in
@@ -4,7 +4,7 @@ product:
name: Google Data Loss Prevention
short_name: Data Loss Prevention
url: https://cloud.google.com/dlp/docs/
-description: >
+description: >
`Google Data Loss Prevention`_ provides programmatic access to a powerful
detection engine for personally identifiable information and other
privacy-sensitive data in unstructured data streams.
@@ -13,6 +13,12 @@ setup:
- auth
- install_deps

+required_api_url: https://console.cloud.google.com/apis/library/dlp.googleapis.com
+
+required_roles:
+- DLP Administrator
+- DLP API Service Agent
+
samples:
- name: Quickstart
file: quickstart.py
20 changes: 0 additions & 20 deletions dlp/conftest.py

This file was deleted.
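
The deleted conftest.py's contents are not shown on this page. Going by the commit bullet "stop waiting between test retries", it evidently slept between flaky reruns; a hypothetical reconstruction of that pattern (the flaky plugin accepts a rerun_filter callable), not the actual file:

    # Hypothetical reconstruction -- the real dlp/conftest.py is not shown here.
    import time


    def delay_rerun(*args):
        # flaky calls a rerun filter before re-running a failed test; returning
        # True allows the rerun, and the sleep is the per-retry wait removed here.
        time.sleep(5)
        return True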

21 changes: 12 additions & 9 deletions dlp/inspect_content.py
@@ -459,11 +459,12 @@ def inspect_gcs_file(
url = "gs://{}/{}".format(bucket, filename)
storage_config = {"cloud_storage_options": {"file_set": {"url": url}}}

-# Convert the project id into a full resource id.
-parent = dlp.project_path(project)
+# Convert the project id into full resource ids.
+topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
+parent = dlp.location_path(project, 'global')

# Tell the API where to send a notification when the job is complete.
-actions = [{"pub_sub": {"topic": "{}/topics/{}".format(parent, topic_id)}}]
+actions = [{"pub_sub": {"topic": topic}}]

# Construct the inspect_job, which defines the entire inspect content task.
inspect_job = {
@@ -623,11 +624,12 @@ def inspect_datastore(
}
}

-# Convert the project id into a full resource id.
-parent = dlp.project_path(project)
+# Convert the project id into full resource ids.
+topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
+parent = dlp.location_path(project, 'global')

# Tell the API where to send a notification when the job is complete.
-actions = [{"pub_sub": {"topic": "{}/topics/{}".format(parent, topic_id)}}]
+actions = [{"pub_sub": {"topic": topic}}]

# Construct the inspect_job, which defines the entire inspect content task.
inspect_job = {
@@ -790,11 +792,12 @@ def inspect_bigquery(
}
}

-# Convert the project id into a full resource id.
-parent = dlp.project_path(project)
+# Convert the project id into full resource ids.
+topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
+parent = dlp.location_path(project, 'global')

# Tell the API where to send a notification when the job is complete.
-actions = [{"pub_sub": {"topic": "{}/topics/{}".format(parent, topic_id)}}]
+actions = [{"pub_sub": {"topic": topic}}]

# Construct the inspect_job, which defines the entire inspect content task.
inspect_job = {
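For context on where those resource ids land: each of the three samples feeds the actions list into its inspect_job request. An outline, assuming the same pre-2.0 dlp client as the diff (inspect and storage configs abbreviated):

    # Sketch of the request shape used by inspect_gcs_file and friends;
    # storage_config is built per data source as in the hunks above.
    inspect_job = {
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "storage_config": storage_config,
        "actions": [{"pub_sub": {"topic": topic}}],
    }
    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)
    print("Inspection operation started: {}".format(operation.name))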
37 changes: 23 additions & 14 deletions dlp/inspect_content_test.py
@@ -40,6 +40,8 @@
BIGQUERY_DATASET_ID = "dlp_test_dataset" + UNIQUE_STRING
BIGQUERY_TABLE_ID = "dlp_test_table" + UNIQUE_STRING

+TIMEOUT = 300  # 5 minutes


@pytest.fixture(scope="module")
def bucket():
@@ -298,6 +300,7 @@ def cancel_operation(out):
client.cancel_dlp_job(operation_id)


+@pytest.mark.flaky(max_runs=2, min_passes=1)
def test_inspect_gcs_file(bucket, topic_id, subscription_id, capsys):
try:
inspect_content.inspect_gcs_file(
@@ -307,15 +310,16 @@ def test_inspect_gcs_file(bucket, topic_id, subscription_id, capsys):
topic_id,
subscription_id,
["EMAIL_ADDRESS", "PHONE_NUMBER"],
-timeout=1
+timeout=TIMEOUT
)

out, _ = capsys.readouterr()
assert "Inspection operation started" in out
assert "Info type: EMAIL_ADDRESS" in out
finally:
cancel_operation(out)


+@pytest.mark.flaky(max_runs=2, min_passes=1)
def test_inspect_gcs_file_with_custom_info_types(
bucket, topic_id, subscription_id, capsys):
try:
@@ -331,15 +335,16 @@ def test_inspect_gcs_file_with_custom_info_types(
[],
custom_dictionaries=dictionaries,
custom_regexes=regexes,
-timeout=1)
+timeout=TIMEOUT)

out, _ = capsys.readouterr()

assert "Inspection operation started" in out
assert "Info type: EMAIL_ADDRESS" in out
finally:
cancel_operation(out)


+@pytest.mark.flaky(max_runs=2, min_passes=1)
def test_inspect_gcs_file_no_results(
bucket, topic_id, subscription_id, capsys):
try:
@@ -350,15 +355,16 @@ def test_inspect_gcs_file_no_results(
topic_id,
subscription_id,
["EMAIL_ADDRESS", "PHONE_NUMBER"],
-timeout=1)
+timeout=TIMEOUT)

out, _ = capsys.readouterr()

assert "Inspection operation started" in out
assert "No findings" in out
finally:
cancel_operation(out)


+@pytest.mark.flaky(max_runs=2, min_passes=1)
def test_inspect_gcs_image_file(bucket, topic_id, subscription_id, capsys):
try:
inspect_content.inspect_gcs_file(
@@ -368,14 +374,15 @@ def test_inspect_gcs_image_file(bucket, topic_id, subscription_id, capsys):
topic_id,
subscription_id,
["EMAIL_ADDRESS", "PHONE_NUMBER"],
-timeout=1)
+timeout=TIMEOUT)

out, _ = capsys.readouterr()
assert "Inspection operation started" in out
assert "Info type: EMAIL_ADDRESS" in out
finally:
cancel_operation(out)


+@pytest.mark.flaky(max_runs=2, min_passes=1)
def test_inspect_gcs_multiple_files(bucket, topic_id, subscription_id, capsys):
try:
inspect_content.inspect_gcs_file(
@@ -385,15 +392,16 @@ def test_inspect_gcs_multiple_files(bucket, topic_id, subscription_id, capsys):
topic_id,
subscription_id,
["EMAIL_ADDRESS", "PHONE_NUMBER"],
-timeout=1)
+timeout=TIMEOUT)

out, _ = capsys.readouterr()

assert "Inspection operation started" in out
assert "Info type: EMAIL_ADDRESS" in out
finally:
cancel_operation(out)


+@pytest.mark.flaky(max_runs=2, min_passes=1)
def test_inspect_datastore(
datastore_project, topic_id, subscription_id, capsys):
try:
@@ -404,14 +412,15 @@ def test_inspect_datastore(
topic_id,
subscription_id,
["FIRST_NAME", "EMAIL_ADDRESS", "PHONE_NUMBER"],
-timeout=1)
+timeout=TIMEOUT)

out, _ = capsys.readouterr()
assert "Inspection operation started" in out
assert "Info type: EMAIL_ADDRESS" in out
finally:
cancel_operation(out)


+@pytest.mark.flaky(max_runs=2, min_passes=1)
def test_inspect_datastore_no_results(
datastore_project, topic_id, subscription_id, capsys):
try:
@@ -422,10 +431,10 @@ def test_inspect_datastore_no_results(
topic_id,
subscription_id,
["PHONE_NUMBER"],
-timeout=1)
+timeout=TIMEOUT)

out, _ = capsys.readouterr()
assert "Inspection operation started" in out
assert "No findings" in out
finally:
cancel_operation(out)

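In miniature, the new test posture: retry immediately via the flaky marker, and let the generous per-attempt TIMEOUT do the waiting. A self-contained stand-in (hypothetical test, assuming the flaky pytest plugin these markers come from):

    import pytest

    TIMEOUT = 300  # seconds; mirrors the constant added above


    @pytest.mark.flaky(max_runs=2, min_passes=1)
    def test_notification_asserts():
        # Stand-in for inspect_content.inspect_gcs_file(..., timeout=TIMEOUT):
        # a failing attempt is re-run at once, with no sleep in between.
        out = "Inspection operation started\nInfo type: EMAIL_ADDRESS"
        assert "Inspection operation started" in out
        assert "Info type: EMAIL_ADDRESS" in out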