Commit

updates (#37)
* updates

* Notebook tweaks, pandas version update

* pandas version bounding

Co-authored-by: John Myers <john@gretel.ai>
johntmyers and John Myers authored Aug 20, 2020
1 parent 6988af2 commit 95bbfec
Showing 4 changed files with 43 additions and 18 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -4,6 +4,10 @@

[![Documentation Status](https://readthedocs.org/projects/gretel-client/badge/?version=latest)](https://gretel-client.readthedocs.io/en/stable/?badge=stable)

+Check out our [documentation](https://gretel-client.readthedocs.io/en/stable/?badge=stable) for getting started guides and module references.
+
+For more advanced usage, please refer to our [blueprints](blueprints).
+
The Gretel Python Client provides bindings to the Gretel REST API and a transformation sub-package that provides interfaces to manipulate data based on a variety of use cases.

The REST API bindings and transformer interfaces can be used separately or together to solve a variety of data analysis, anonymization, and other ETL use cases.
53 changes: 37 additions & 16 deletions notebooks/launch_transformers.ipynb
@@ -8,11 +8,13 @@
"\n",
"Welcome to the Gretel Transformers walkthrough! In this tutorial we will take you through the process of creating a data pipeline to apply a variety of transformations to your data.\n",
"\n",
-"This tutorial assumes you have already uploaded data to Gretel.\n",
+"This tutorial assumes you have already uploaded data to a [Gretel Project](https://console.gretel.cloud).\n",
"\n",
"The transformers in this example work on entity labels only. We have chosen a subset of labels we see often in data.\n",
"\n",
-"If you would like to build field-level transforms, please look through our blueprints directory (in the top level of the repository) for examples."
+"If you would like to build field-level transforms or see more advanced use cases please look through our [blueprints directory](https://github.com/gretelai/gretel-python-client/tree/master/blueprints) for more examples.\n",
+"\n",
+"For a more exhaustive list of possible transformations, please reference our [documentation](https://gretel-client.readthedocs.io/en/latest/transformers/api_ref.html#module-reference-transformers)."
]
},
{
@@ -25,6 +27,8 @@
},
"outputs": [],
"source": [
+"# NOTE: Run this cell and copy your Gretel URI into the text box below\n",
+"\n",
"import getpass\n",
"import os\n",
"\n",
@@ -37,7 +41,8 @@
"source": [
"## Create a Gretel Project Instance\n",
"\n",
-"In the code below, we will utilize the gretel-client to create an instance of a project that will be used to syntesize data from. "
+"In the code below, we will utilize the gretel-client to create an instance of a Project that we can use to iterate\n",
+"labeled records from."
]
},
{
@@ -49,7 +54,7 @@
"outputs": [],
"source": [
"%%capture\n",
-"!pip install \"gretel-client==0.7.0.rc7\" --upgrade"
+"!pip install gretel-client --upgrade"
]
},
{
@@ -73,14 +78,14 @@
"metadata": {},
"outputs": [],
"source": [
-"# We can see how many records we've ingested and how many fields we've discovered, just to show the\n",
-"# project is active.\n",
-"print(f'Total Records Received: {project.record_count}')\n",
-"print(f'Total Fields Discovered: {project.field_count}')\n",
-"\n",
-"print(\"\")\n",
-"print('Previewing project dataframe')\n",
-"project.head(5)"
+"# Example JSON record and Gretel Metadata from the Project stream\n",
+"\n",
+"# Components of a record:\n",
+"# - id: A unique ID that represents a position in the stream the record resides\n",
+"# - data: A flattened version of the raw record that was received\n",
+"# - metadata: A dictionary of metadata, keyed by field name\n",
+"\n",
+"project.sample()[0]"
]
},
{
@@ -117,6 +122,9 @@
"email_mask = StringMask(start_pos=3)\n",
"email_transformer = [RedactWithCharConfig(labels=[\"email_address\"], minimum_score=Score.MED, mask=[email_mask])]\n",
"\n",
+"ip_mask = StringMask(start_pos=-6)\n",
+"ip_transformer = [RedactWithCharConfig(labels=[\"ip_address\"], minimum_score=Score.MED, mask=[ip_mask])]\n",
+"\n",
"# let's mask the last 2 digits of zip codes\n",
"zip_mask = StringMask(start_pos=-2)\n",
"zip_transformer = [RedactWithCharConfig(labels=[\"us_zip_code\"], minimum_score=Score.MED, mask=[zip_mask])]\n",
@@ -129,6 +137,9 @@
"# let's replace phone numbers with totally fake, but consistent ones\n",
"phone_transformer = [FakeConstantConfig(labels=[\"phone_number\"], minimum_score=Score.MED, seed=1234, fake_method=\"phone_number\")]\n",
"\n",
+"# let's replace person names with totally fake, but consistent ones\n",
+"person_transformer = [FakeConstantConfig(labels=[\"person_name\"], minimum_score=Score.MED, seed=1234, fake_method=\"person_name\")]\n",
+"\n",
"# aggressively mask all locations\n",
"location_transformer = [RedactWithLabelConfig(labels=[\"location\"], minimum_score=Score.MED)]\n",
"\n",
@@ -140,7 +151,7 @@
"# since we are only working on automatic transforms based on labels\n",
"# they can all go into one datapath\n",
"\n",
-"all_transformers = email_transformer + zip_transformer + token_transformer + phone_transformer + location_transformer + lat_lon_transformer\n",
+"all_transformers = email_transformer + ip_transformer + zip_transformer + token_transformer + phone_transformer + person_transformer + location_transformer + lat_lon_transformer\n",
"data_path = [\n",
" DataPath(input=\"*\", xforms=all_transformers)\n",
"]\n",
@@ -194,15 +205,25 @@
"# Print out Git-style diffs between source and transformed records\n",
"for original, transformed in zip(records, transformed_records):\n",
" show_record_diff(original[\"data\"], transformed[\"data\"])\n",
-" input()"
+" input(\"Press enter / return to go to the next record\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
-"source": []
+"source": [
+"# If you have data constantly ingesting to the Gretel API, you can consume the labeled\n",
+"# data and automatically apply your transforms like so:\n",
+"#\n",
+"# NOTE: If you do not have data ingesting currently, this operation will block until records are received\n",
+"#\n",
+"for record in project.iter_records():\n",
+" # from here you may route your transformed records to anywhere!\n",
+" transformed = pipeline.transform_record(record)\n",
+" print(transformed[\"record\"])"
+]
}
],
"metadata": {
@@ -226,4 +247,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
-}
+}
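
Review note on the `StringMask(start_pos=...)` settings in the notebook diff above: the inline comment says `start_pos=-2` masks the last 2 digits of a zip code, which reads like Python slice indexing. The sketch below illustrates that reading only. `mask_from` and the `"X"` mask character are hypothetical names chosen for illustration and are not part of gretel-client, and the real `StringMask` may treat separators or end positions differently.

```python
def mask_from(value: str, start_pos: int, mask_char: str = "X") -> str:
    """Hypothetical helper: keep value[:start_pos] and mask the rest.

    This is an assumption about the semantics, modeled on Python slicing,
    and not the gretel-client implementation.
    """
    kept = value[:start_pos]
    # Replace every remaining character with the mask character.
    return kept + mask_char * (len(value) - len(kept))


# A negative start_pos counts from the end, as in the zip code example.
print(mask_from("94105", -2))  # 941XX

# A positive start_pos keeps the leading characters, as in the email example.
print(mask_from("user@example.com", 3))
```

Under this reading, `start_pos=3` keeps the first three characters of an email address and `start_pos=-6` masks the last six characters of an IP address.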
2 changes: 1 addition & 1 deletion setup.py
@@ -26,7 +26,7 @@
'dataclasses;python_version<"3.7"'
],
extras_require={
-"pandas": ["pandas==1.0.3"],
+"pandas": ["pandas>1.0.0,<1.1.0"],
"fpe": ["numpy", "pycryptodome==3.9.8", "dateparser==0.7.6"]
},
)
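
Review note on the new pandas bound: `>1.0.0,<1.1.0` accepts the 1.0.x patch releases after 1.0.0 (including the previously pinned 1.0.3) and rejects both 1.0.0 itself and 1.1.0 or later. A minimal sketch of that range check follows. `parse_version` and `satisfies_pandas_bound` are hypothetical helpers that handle only plain numeric `X.Y.Z` strings, not full PEP 440 versions such as pre-releases.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain numeric version such as '1.0.5' into (1, 0, 5)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pandas_bound(version: str) -> bool:
    """Return True when version falls strictly between 1.0.0 and 1.1.0."""
    return (1, 0, 0) < parse_version(version) < (1, 1, 0)


for candidate in ["1.0.0", "1.0.3", "1.0.5", "1.1.0"]:
    print(candidate, satisfies_pandas_bound(candidate))
```

If 1.0.0 was meant to be allowed, the specifier would need `>=1.0.0` instead of `>1.0.0`.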
2 changes: 1 addition & 1 deletion transformers.md
@@ -3,7 +3,7 @@
Welcome to the Gretel Transformers documentation! Here we will introduce you to the concepts in the Transformers sub-package and
provide some basic tutorials for getting started.

-For more advanced usage, please refer to our tutorials / guides on [our blog](https://www.medium.com/gretel-ai).
+For more advanced usage, please refer to our [blueprints](https://github.com/gretelai/gretel-python-client/tree/master/blueprints).

## Installation
