This document lists the changes between release versions.
These are user-facing changes. To see the changes in the code between versions you can compare git tags.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Added
for new features.Changed
for changes in existing functionality.Deprecated
for soon-to-be removed features.Removed
for now removed features.Fixed
for any bug fixes.Security
in case of vulnerabilities.
- Fixed an issue where postgres seed table was using invalid datatype
DATETIME
. changed toTIMESTAMP
- Fixed a issue when updates to MSSQL data would result in multiple messages coming back from the server, e.g. when triggers update multiple tables and NOCOUNT is OFF. Another scenario is before or after scripts that call stored procs wth PRINT statements in them or that return multiple resultsets before completing. Without the fix, this issue can result in tables only partially anonymized.
- Fix error messages when running MSSQL anonymization with --verbose enabled
- pynonymizer CLI will now catch errors and log them instead of dumping the stacktrace.
- Fixed a bug where MSSQL provider would override the server connection with (local).
- Fixed a bug where failing dump & restore operations would not be caught by pynonymizer. Pynonymizer will now error when mysql/mysqldump or psql/pgdump returns an error code.
- Fixed an issue with the release workflow for the docker deployment.
- Fixed a bug where verbose logging was not lowering the log level correctly
- Logging output will now be colored according to level
- Added debug log: sql statements to mssql, mysql, postgres providers
- Added debug log: connection string log to mssql
- Fixed a bug where command line switches were not named correctly.
--mysql-cmd-opts
and--mssql-backup-compression
were duplicated, making some options difficult to access. - Error codes from the CLI were not working correctly. Pynonymizer will now report failure on validations and other errors.
- Fixed a bug where some environment variables were changed. Both variations will now be supported to ease migration.
PYNONYMIZER_START_AT
,PYNONYMIZER_START_AT_STEP
,PYNONYMIZER_STOP_AT
,PYNONYMIZER_STOP_AT_STEP
,PYNONYMIZER_STRATEGY
,PYNONYMIZER_STRATEGYFILE
- pynonmizer now has docker images with the database client and dependencies built in. These are in beta, please report any issues to the github. See https://hub.docker.com/r/rwnxt/pynonymizer
- mssql now sets Mars_Connection=Yes on connection as this allows support for threading without connection busy.
- Fixed a bug where
--skip-steps
would be ignored if passed via the CLI. [#152]
--workers
option to control number of threads to use when anonymizing. Table anonymization will be done in threads and should result in a major speed boost. By default, a single worker is used.--mssql-ansi-warnings-off
option (default: on) to disable ansi warnings when making updates
- Fixed a bug where using the raw
pynonymizer
command would cause an unexpected error.
- Better error messages for unsupported fake types - the error should now explain the problem and link to the docs in the right section. [#133]
- Error message for when a fake_type is used with the wrong config kwargs (these would have previously been caught under "unsupported fake types")
- event hooks for progress events. you can now use your own progress bar if you're invoking the process via the python interface.
- support for connection strings in mssql.
- removed credentials validation (db_user, db_password) from the initial cli validation, as these are now handled by your original database. This should grant the most flexibility for different types of credentials.
- Positional INPUT. Use the -i/--input option instead
- Positional STRATEGYFILE. Use the -s/--strategy option instead
- Positional OUTPUT. Use the -o/--output option instead
- Deprecated environment keys:
DB_TYPE, DB_HOST, DB_NAME, DB_USER, DB_PASS
. --fake-locale
-l
$PYNONYMIZER_FAKE_LOCALE
cli option. Use thelocale:
key in your strategyfile insteaddotenv
is no longer a dependency. you are expected to load your environment variables yourself!- Removed local server check for mssql, as manual overrides using connection string will be possible. pynonymizer isn't going to stop you!
- Postgres FAKE_UPDATE strategy now uses an id-based randomization to pick from the seed table. This speeds up updates on large tables, as the seed table no longer needs to be ordered for every row in the target table. Also this makes performances less dependent on the length of the seed table.
- In MySQL,
db_host
,db_user
,db_pass
anddb_port
are now optional and will be omitted from CLI options if not passed. You can now use themy.cnf
file to pass config to the mysql CLI. [#108]. Read more about the rationale behind these changes in doc/database-credentials.md. - In Postgres,
db_pass
will be optional and will be omitted from CLI options if not passed. You can now use the.pgpass
file to securely pass the password to the pg CLI. Read more about the rationale behind these changes in doc/database-credentials.md.
- Added offical test support for python 3.10
- Added a new option,
--ignore-anonymization-errors
that will allow the anonymization step to error without propagating errors upstream. This is useful if you always want the resulting dumpfile, even when there are db or schema faults.
- Changed how seed rows are assigned their default value and made sure this value made it's way into the documentation (150 rows by default).
- Column Strategy "empty" has been deprecated because its effect was inconsistent on different providers and column types.
using it will generate a warning on config parse. It has been removed from the documentation so as to cut down on confusion.
The recommended way to update a column to empty is to use a
literal
set to the appropriate "empty" data for the column.
- Removed offical test support for python 3.6
- Removed process step
GET_SOURCE
(no-op) as it was causing confusion. This is not considered a breaking change as it was never implemented.
- Changed anonymization process to attempt to anonymize all tables before throwing errors. [#96]
- Fixed a bug in mysql/postgres that didn't wait for the restore dump process to complete before starting the anonymize procedure
- Added a workflow that automatically runs
black
on incoming PRs, to set a canonical standard for formatting in the project.
- Fixed a bug in mysql/postgres where stdin is not closed after reading
- Fixed a bug in the config parser where literal column strategies that contained
type
threw an error
- Fixed a bug in the config parser that would always override the locale with either the cli argument or the fallback
- Added Custom Providers to strategyfiles. You can now include custom Faker providers in your
strategyfiles, using the
providers
top-level key. - Added Locale key
locale
(top-level) to strategyfiles. This is designed to replace the-l
/--fake-locale
CLI option.
- Deprecated
-l
/--fake-locale
option. You should use thelocale
key in your strategyfile. This will be removed in a future release.
- Fixed a bug in postgres where table and column names were not consistently escaped.
- Fixed some documentation errors.
- Fixed a bug where postgres tables could not be anonymized if they contained a json or jsonb column.
- Added a new compact syntax for column strategy
literal
. Surrounding your value in parentheses()
will select the literal strategy. - Added new cli option
--only-step
to select a step to run. Useful for running anonymization routines without having to use start-at and stop-at simultaneously. e.g.--only-step ANONYMIZE_DB
.
- Improvements to mssql driver auto-selection process, to attempt to select a more recent driver by default.
- Fixed a bug where on a system with no ODBC drivers, an incorrect error would be thrown.
- Fixed a fatal error when deprecated environment vars were used (See 1.2.0 notes for a full list).
- Added Integration tests for MSSQL
- Added Driver auto-selection for MSSQL. This should find the correct driver on different systems. Where multiple drivers are detected a warning will be issued, and the first driver will be selected. To override this behaviour, you can specify the driver manually using cli opt
--mssql-driver / $PYNONYMIZER_MSSQL_DRIVER
- Added support for remote servers in MSSQL, provided RESTORE_DB/DUMP_DB is not being used.
- Added support for MSSQL servers using custom ports.
- Changed Integration tests to show cmd output better in CI.
- Changed MSSQL connection behaviour: provider will now allow truncation using
ANSI_WARNINGS off;
when updating tables. - Changed verbose-mode behaviour to include more information via debug logs during CLI runs.
- Fixed an issue in MSSQL where anonymization would be deterministically applied to all rows the same instead of distributed.
- Fixed a typo in mssql, mysql and postgres when running strategy before/after scripts.
- Added New additional-opts style parameters which allow you to control the behaviour of the underlying dump and restore tools more effectively.
*-cmd-opts
will be appended to the batched restore subprocess, while*-dump-opts
will be appended to the dump (pgdump, mysqldump) subprocess These are advanced features and can seriously modify or even break the behaviour of pynonymizer. Use with caution!--mysql-cmd-opts/$PYNONYMIZER_MYSQL_CMD_OPTS
--postgres-cmd-opts/$PYNONYMIZER_POSTGRES_CMD_OPTS
--postgres-dump-opts/$PYNONYMIZER_POSTGRES_DUMP_OPTS
- Added cli warnings for deprecated positional arguments and environmental variables.
- Added Python 3.9 to official support description(pypi) and unit test targets.
- Changed order of
additional_args
for Mysql dump runner, so additional options are added after the existing args.
- Fixed a bug where
$PYNONYMIZER_VERBOSE
was not being detected correctly.
- Added support for LZMA
*.xz
compression on file output. To use this feature, Specify an output path ending with.xz
.
- Minor improvements to internal test suite.
- Added main function
run
exposure topynonymizer
package. Check out pynonymizer/pynonymize.py for more usage informationfrom pynonymizer import run
- Added more documentation to reflect usage & project features.
- Exclude
tests/
directory from pip package.
- Added
fake_update
Strategy feature:sql_type
: Choose an sql datatype to cast to when anonymizing values. The new value will be cast to the chosen datatype.
- Fixed PostgreSQL column escaping. Keywords like
SELECT
,FROM
orWHERE
can now be used in strategy files.
- Added new table strategy
DELETE
, which should delete with checks (e.g. foreign keys) on most providers.
-
Added ability to pipe output to/from pynonymizer from stdout, using
-
in place of the input/output arguments. This functionality is available for mysql and postgres providers.This means you can now use pynonymizer as part of a pipeline with other tools, e.g.
mysqldump [...] | pynonymizer -i - -o - | aws s3 cp - s3://bucket/aws-test.tar.gz
- Changed default logging output to stderr. This is to facilitate stdin/out being used for data.
- Removed production logging feature in favour of stderr/out logging. Logging to files will no longer by considered pynonymizer's concern.
- Changed package metadata to improve PyPI presence.
- Fixed an incorrectly labelled version string:
1.11.10
. Re-releasing under this version.
- Changed mysql provider to include an arbitrary delay after anonymize_db, to prevent
interference with transactional dump
mysqldump
calls.
- Removed the database connection pre-test as its use suggests conditions about the database that are not present in all circumstances with all providers. Pynonymizer will no longer test a connection as part of a dry-run.
- Fixed a confusing note in the README.md that was introduced accidentally
- Added
--mysql-dump-opts
/$PYNONYMIZER_MYSQL_DUMP_OPTS
to allow custom command overrides to the mysqldump process.
- Fixed a bug where complex arguments to faker could cause an invalid seed table column name to be generated.
- Fixed a bug where using no fake_update columns would cause an error
- Changed Documentation for
fake_update
strategies to remove an erroneous example type.
- Added
--dry-run
/$PYNONYMIZER_DRY_RUN
option to run all the non-destructive fail-fast options without fear of actual process execution.
- Fixed a bug in mysql provider where using the default unspecified port would cause an error.
- Fixed a bug in postgres provider where seed_rows was being ignored.
- Fixed a bug in the Postgres provider that would cause all rows to be updated with the same 'random' value.
- Fixed a bug in mysql/postgres providers that would cause before/after scripts to fail to run.
- Added option
--db-port
/-P
/$PYNONYMIZER_DB_PORT
to specify your database provider's connection port. - Added option
--verbose
to increase the verbosity of the logging feature. Currently, this is used to log more info from a database error, but more areas will be included in future.
- Fixed a mysql provider issue: in MariaDB, multiple string columns in a strategy could cause a row length error when constructing the seed table.
- Fixed a bug where static code relying on pyodbc would cause a ModuleNotFoundError
- Added errors for mssql ODBC installation issues, missing extras.
- Added support for linux paths in MSSQL backup file moves/restore.
- Added support for postgres (subprocess-based psql/pg_dump): use
--db-type/-t postgres
- Changed mssql support to an package extra. If mssql support is required, install extra
pynonymizer[mssql]
.
- Fixed typo in mssql dependency error (local server required).
- Added Support for different strategyfile formats:
.json
/.yaml
/.yml
- Added option
--seed-rows
: specify the seed row size for fake data generated. - Added MSSQL provider: use
--db-type/-t mssql
- Added MSSQL option
---mssql-backup-compression
and added convention for provider-specific arguments (prefixed bydbtype-
) - Added optional table strategy key:
schema
. For supported databases (mssql), you can now specify the schema of the table strategy. - Added strategy parsing mode for multi-table and multi-column updates of the same table/column names
- Changed mandatory arguments for main process to account for different process step permutations:
- input is optional if
RESTORE_DB
is skipped - strategyfile is optional if
ANONYMIZE_DB
is skipped - output is optional if
DUMP_DB
is skipped - db_name is mandatory if a step prevents it from being determined automatically e.g. strategyfile is missing
- input is optional if
- Fixed an issue where import syntax was preventing certain modules from being loaded in python 3.6
- Added all faker providers to the
fake_update
type. - Added
fake_args
kwargs key to thefake_update
type. You can now make use of parameterized providers in faker! - Added "verbose-style" strategy format to complement original shorthand autodetection.
- Added
where
option for where-clause support on all update_column strategies. Columns with a matching where option are grouped together for execution. - Added Stock strategies for sylius and wordpress 4.4 in the main repository.
- Added column strategy:
literal
for setting literal values e.g.RAND()
,'A String'
- Changed parsing for mapping column strategy keys to classes: Parser no longer ignored unused keys
- Added
scripts
strategyfile section. you can now specifybefore
andafter
scripts to be run decorating the anonymization process. (see doc/strategyfiles.md) - Added Process steps, e.g.
CREATE_DB
,ANONYMIZE_DB
for improved logging and skip behaviour (see doc/process-control.md) - Added process control options:
--start-at STEP
: choose a step to start at (inclusive)--stop-at STEP
: choose a step to stop at (inclusive)--skip-steps STEP [..STEP]
: specify one or more steps to skip in the process
- Changed the way mysql provider handles execution so CalledProcessErrors no longer expose all command parameters by default.
- Fixed some minor spelling errors in the help text.
- Fixed an issue where dumping an empty database(unlikely, but still possible) could cause an unhandled exception.
- Added new environment variables, optionals for all arguments. See the help
pynonymizer -h
for more information
- Changed Internal structure to assist with better testing.
- Deprecated positional arguments. These will no longer appear in helptext and are not required. You can now use all-environmental, all-optional or a mix of both.
Going forward, The preferred solution is
--optional-vars
or environment variables. - Deprecated old environment variables in favour of new prefix
PYNONIMIZER_
. These will continue to work but will be removed in a future release.DB_TYPE -> PYNONYMIZER_DB_TYPE
DB_HOST -> PYNONYMIZER_DB_HOST
DB_NAME -> PYNONYMIZER_DB_NAME
DB_USER -> PYNONYMIZER_DB_USER
DB_PASS -> PYNONYMIZER_DB_PASSWORD
FAKE_LOCALE -> PYNONYMIZER_FAKE_LOCALE
- Added
-v
--version
flag argument. [#4] - Added additional metadata to
setup.py
for better PyPI info. [#3]
- Package Release - Hooray!