Releases · gretelai/gretel-synthetics

04 Aug 17:25

v0.11.0.rc2

deb22ec

RC2 Pre-release

Support parallel synthetic text generation using multiprocessing (#39)

* Support parallel synthetic text generation using multiprocessing

* add cloudpickle to test reqs

* review comments

* set CUDA_VISIBLE_DEVICES to -1 in workers

* decode symbols one by one

* remove un-used var, bump version for RC

Co-authored-by: Malte Isberner <malte@gretel.ai>
Co-authored-by: John Myers <john@gretel.ai>

04 Aug 14:58

johntmyers

v0.11.0.rc1

deb22ec

0.11.0 RC1 Pre-release

RC1. Adds a "read only" mode for Batches and defaults to CPU-based generation with maximum parallelization.

19 Jun 23:47

zredlined

v0.10.3

57e6222

Bugfix

🐛 Corrects calculation of batch_size when using the DataFrameBatch interface

19 Jun 21:41

johntmyers

v0.10.2

4ab732c

Bugfix

🐛 When generating new lines via Batch mode, the passed max_invalid param is now used instead of the module default

18 Jun 20:14

johntmyers

v0.10.1

f4f6279

Bugfix for 0.10.x

🐞 Fixes conversion of synthetic Batches back to a DataFrame when a custom field delimiter is used

16 Jun 00:14

johntmyers

v0.10.0

5be7e8c

DataFrame support and more!

Major changes to Gretel Synthetics including native support for DataFrames and batched column training!

⚙️ Introduce a batch module that allows a DataFrame to be ingested and split into batches of smaller DataFrames where each batch has a subset of the columns of the source DataFrame. This allows training of datasets with several columns while still allowing the preservation of correlations and statistical data. See our Medium Blog for details and our example dataframe_batch Notebook located in the examples directory.
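As a rough illustration of the column-batching idea, the sketch below splits a table into batches that each hold a subset of the columns. It uses plain Python lists of dicts rather than DataFrames so it runs without extra dependencies. The function names are hypothetical and this is not the batch module's implementation.

```python
def split_columns(columns, batch_size):
    # Chunk the column names into consecutive groups of at most batch_size.
    return [columns[i:i + batch_size] for i in range(0, len(columns), batch_size)]


def split_rows_into_batches(rows, batch_size):
    # rows is a list of dicts that all share the same keys (the columns).
    columns = list(rows[0].keys()) if rows else []
    batches = []
    for cols in split_columns(columns, batch_size):
        # Each batch keeps every row, restricted to its subset of columns.
        batches.append([{c: row[c] for c in cols} for row in rows])
    return batches


rows = [
    {"age": 39, "job": "clerk", "city": "Austin"},
    {"age": 52, "job": "nurse", "city": "Boise"},
]
batches = split_rows_into_batches(rows, batch_size=2)
```

Here `batches[0]` holds the `age` and `job` columns for every row and `batches[1]` holds `city`. With pandas, the equivalent column selection for one batch would be `df[cols]`.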

📖 Massive updates to docstrings for the config module, with details for each config parameter.

🤖 Update to generation functionality. If a validator is provided, the gen_lines config option counts only the valid lines that are generated. To stop runaway generation, a max_invalid parameter specifies the maximum number of invalid lines that can be generated. If this number of invalid lines is exceeded, a RuntimeError will be raised and generation will be halted.
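The counting behavior described above can be sketched as a small generator. This is an illustration only, not the library's code: `sample_line` and the function name are hypothetical, and the sketch assumes a validator that returns a falsy value for invalid lines.

```python
def generate_valid_lines(sample_line, validator, gen_lines, max_invalid=1000):
    """Yield gen_lines valid lines; raise RuntimeError on too many invalid ones."""
    valid_count = 0
    invalid_count = 0
    while valid_count < gen_lines:
        line = sample_line()  # hypothetical callable that produces one line
        if validator(line):
            # Only valid lines count toward gen_lines.
            valid_count += 1
            yield line
        else:
            invalid_count += 1
            if invalid_count > max_invalid:
                # Stop runaway generation once the invalid budget is exceeded.
                raise RuntimeError(
                    f"Exceeded max_invalid={max_invalid} invalid lines"
                )
```

Because only valid lines count toward `gen_lines`, a model that mostly emits invalid lines would loop for a long time without the `max_invalid` cap.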

03 Jun 21:52

johntmyers

v0.9.3

508276d

Sentence Piece Updates

⬆️ Upgraded to the latest SentencePiece and added a max_line_len param to the Config options. This allows you to override the default SentencePiece line limit and set a custom one. During our testing, we found that we had to set the limit a few thousand characters higher than the actual longest line. For example, for a line that was 49500 chars long, we had to set the limit to about 53000.
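One way to pick a value for max_line_len is to measure the longest training line and add some headroom. The helper below is a hypothetical sketch, not part of the library. Its default headroom of 3500 characters is an assumption taken from the single example above (49500 to about 53000), so treat it as a starting point rather than a rule.

```python
def suggest_max_line_len(lines, headroom=3500):
    # Find the longest line and pad it by a fixed number of characters.
    longest = max((len(line) for line in lines), default=0)
    return longest + headroom


training_lines = ["a" * 49500, "b" * 120]
limit = suggest_max_line_len(training_lines)
```

The resulting `limit` could then be passed as the max_line_len config option.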

26 May 15:09

johntmyers

v0.9.2

5edeff8

PyPI Bug Fix

🐛 Fixes a setup.py failure that occurred when installing the package with pip.

📓 Updates to UCI Notebook

22 May 00:34

johntmyers

v0.9.1

484e411

Python 3.6 Support

This update stops using the annotations module for type checks. Python 3.6 is now supported through the [3.6] extras option. The package works on Colab by default, since Colab already installs a backport of dataclasses, so installing with the extras is not necessary there.

19 May 18:22

johntmyers

v0.9.0

e164132

Config and generation updates

NOTE: This release introduces some new constructs that are NOT backwards compatible with older versions.

⚙️ Configuration Changes:

* By default, we no longer assume any structure in your training text. Lines are generated without any presumed delimiter between fields. To use a delimiter, you must specify the field_delimiter param when constructing your configuration. Our example notebooks have been updated to reflect this.
* Overwrite protection: if there is already a model and tokenizer in your checkpoint directory, you will receive a RuntimeError when attempting to train a new model that would overwrite the old data. If you wish to keep overwriting (for example, during rapid model generation / testing), set the overwrite param to True in your configuration. The example notebook has been updated to show this param.
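The overwrite protection described above follows a common guard pattern, sketched below. This is not the library's code. The function name is hypothetical, and the sketch treats any file in the checkpoint directory as existing model data, whereas the library checks for a model and tokenizer specifically.

```python
from pathlib import Path


def ensure_can_train(checkpoint_dir, overwrite=False):
    path = Path(checkpoint_dir)
    # Assumption: any existing content counts as a previous model.
    has_artifacts = path.is_dir() and any(path.iterdir())
    if has_artifacts and not overwrite:
        raise RuntimeError(
            f"{path} already contains data; set overwrite=True to replace it"
        )
    # Create the directory if it does not exist yet.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Raising before any training work starts means a forgotten setting costs nothing, while an explicit `overwrite=True` keeps rapid experimentation convenient.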

👩‍🍳 Cooking up new data

* Previously, we would yield a dict when generating a new record. We now yield a gen_text object instead. This object has the same data, but you access the various components as attrs of the object. For example, if you have a line variable that was emitted from the generator, you can access the raw text with line.text.
* If you provided a delimiter during configuration, the gen_text objects are aware of this, and you can get your generated fields by using the values_to_list() method of the object. See our docs for more detail on this object: https://gretel-synthetics.readthedocs.io/en/stable/api/generate.html
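To illustrate the attribute-style access described above, here is a small stand-in class. It is not the library's gen_text class. The name `GenTextSketch` and its fields are assumptions; only the `text` attribute and the `values_to_list()` method come from the notes above, and returning None when no delimiter is set is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenTextSketch:
    text: str                        # raw generated line
    delimiter: Optional[str] = None  # field delimiter, if one was configured

    def values_to_list(self) -> Optional[List[str]]:
        # Without a delimiter there are no fields to split into.
        if self.delimiter is None:
            return None
        return self.text.split(self.delimiter)


line = GenTextSketch(text="39,clerk,Austin", delimiter=",")
raw = line.text                  # attribute access instead of line["text"]
fields = line.values_to_list()   # fields split on the configured delimiter
```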

👨‍💻 Code cleanup and test updates
