
Feature Request: Add support for clearing tables between tests #55

Open
goldsam opened this issue Aug 31, 2019 · 9 comments

Comments


goldsam commented Aug 31, 2019

Testing would be easier if database state could be reset at the beginning of each test.

vladholubiev (Member) commented

Hey Sam!

I totally agree, this will simplify my tests as well. I'll put this on my agenda.


goldsam commented Sep 5, 2019

I ended up solving this by creating a helper script which explicitly deletes every item from one or all tables. I invoke these methods in a beforeEach block to reset database state before each test. Although this works, it has some limitations.

What I quickly discovered is that Jest runs test files in parallel, which becomes problematic for code using a shared resource such as DynamoDB because of race conditions. I solved this (at least temporarily) by restructuring my tests so that all code using a given table is invoked from one root test file and is thus executed sequentially.

A better solution would be to create distinct "environments" for each test. I can think of a few approaches:

  1. Within the same DynamoDB instance, create uniquely named tables for each test using some kind of random prefix or postfix. That prefix/postfix could then be injected into the test so it knows which table names to use. This seems ugly to me.
  2. Instead, allow the test code to simply set up and tear down the database state manually. It would be nice to simply invoke a method and pass it a tables configuration array similar to what is supported in jest-dynamodb-config.js. Admittedly, this solution requires no changes to your library.
  3. Spin up a new DynamoDB instance for each test and inject the port number of that instance into the test environment. This has a lot of cost overhead but provides a strong level of data isolation.
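A minimal sketch of approach (1), using a deterministic per-worker suffix instead of a random one: Jest exposes the real `JEST_WORKER_ID` environment variable inside workers, so each parallel worker can derive its own table names and never collide with another. The function name `workerTableName` is made up for illustration.

```javascript
// Derive a per-worker table name so parallel Jest workers never share a table.
// JEST_WORKER_ID is "1", "2", ... inside Jest workers; fall back to "1" outside Jest.
function workerTableName(baseName) {
  const workerId = process.env.JEST_WORKER_ID || '1';
  return `${baseName}-${workerId}`;
}
```

Your test setup would then create `workerTableName('my-table')` in `beforeAll` and point the code under test at it, giving each worker its own isolated namespace without changing how tests are written.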

Although this is arguably an unrelated or, at best, secondary problem, it seemed worthwhile to at least start a discussion.

Below is the dynamodb-utils.ts helper script I mentioned above:

import * as AWS from 'aws-sdk';
import { AttributeMap, Key, KeySchemaElement, WriteRequest } from 'aws-sdk/clients/dynamodb';

function itemToKey(item: AttributeMap, keySchema: KeySchemaElement[]): Key {
    const itemKey: Key = {};
    keySchema.forEach(key => {
        itemKey[key.AttributeName] = item[key.AttributeName];
    });
    return itemKey;
}

export async function clearTable(dynamoDB: AWS.DynamoDB, tableName: string): Promise<void> {
    // Get the table's key schema so we know which attributes form the primary key.
    const { Table = {} } = await dynamoDB
        .describeTable({ TableName: tableName })
        .promise();

    const keySchema = Table.KeySchema || [];

    // Get the keys of the items to delete.
    // Note: this assumes the scan fits in one page; paginate with
    // LastEvaluatedKey if a table can exceed 1 MB of scanned data.
    const scanResult = await dynamoDB.scan({
        AttributesToGet: keySchema.map(key => key.AttributeName),
        TableName: tableName,
        ConsistentRead: true,
    }).promise();
    const items = scanResult.Items || [];

    // BatchWriteItem accepts at most 25 requests per call, so delete in chunks.
    for (let i = 0; i < items.length; i += 25) {
        const deleteRequests: WriteRequest[] = items.slice(i, i + 25).map(item => ({
            DeleteRequest: { Key: itemToKey(item, keySchema) },
        }));

        await dynamoDB
            .batchWriteItem({ RequestItems: { [tableName]: deleteRequests } })
            .promise();
    }
}

export async function clearAllTables(dynamoDb: AWS.DynamoDB): Promise<void> {
    const { TableNames = [] } = await dynamoDb.listTables().promise();
    for (const tableName of TableNames) {
        await clearTable(dynamoDb, tableName);
    }

    // Give DynamoDB local a moment to settle; the deletes are only eventually consistent.
    await new Promise(resolve => setTimeout(resolve, 500));
}

blakedietz commented

Is there a way to utilize support for transactions to make this even faster?


goldsam commented Sep 6, 2019

@blakedietz Do you mean batch operations? Yes, batch operations would definitely have been faster. Another possible approach might be to:

  1. Read the table schema definition.
  2. Delete the table in one operation.
  3. Recreate the table using the schema definition from step 1.
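A hedged sketch of that delete-and-recreate idea. The pure helper below turns a DescribeTable result into valid CreateTable input by dropping the read-only fields DescribeTable adds; the surrounding AWS calls are shown in comments only. The field names are real DynamoDB API fields, but treat the exact list as an assumption to verify against your SDK version.

```javascript
// Convert DescribeTable output into input accepted by CreateTable.
function describeOutputToCreateInput(table) {
  const { TableName, AttributeDefinitions, KeySchema, ProvisionedThroughput } = table;
  return {
    TableName,
    AttributeDefinitions,
    KeySchema,
    // CreateTable only accepts the capacity units here; DescribeTable adds
    // extra counters such as NumberOfDecreasesToday, which must be stripped.
    ProvisionedThroughput: {
      ReadCapacityUnits: ProvisionedThroughput.ReadCapacityUnits,
      WriteCapacityUnits: ProvisionedThroughput.WriteCapacityUnits,
    },
  };
}

// Usage sketch (not run here):
//   const { Table } = await dynamoDB.describeTable({ TableName }).promise();
//   await dynamoDB.deleteTable({ TableName }).promise();
//   await dynamoDB.createTable(describeOutputToCreateInput(Table)).promise();
```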


freshollie commented Oct 12, 2019

@goldsam FYI - I rewrote this library with this use case in mind:

https://github.com/freshollie/jest-dynalite

I used dynalite as a mock for DynamoDB instead of dynamodb-local for several reasons:

  • dynalite is much lighter, which lets us spin up a single instance for each runner.
  • dynalite allows tables to be created and destroyed quickly.
  • dynalite does not need Java to run.

jest-dynalite provides isolation between tests and between test suites. Give it a go.

vladholubiev (Member) commented

Great job, @freshollie. I've mentioned jest-dynalite in README: https://github.com/shelfio/jest-dynamodb#alternatives

adrians5j commented

Having the same problem myself. Good thing I found this issue and now I know it's not just me. 🙂

msoffredi commented

I'm using @goldsam's utils (thank you very much for sharing!), adapted to TypeScript and dynamoose (they were already about 95% compatible), and I ran into an interesting issue I want to share here. Keep reading if you are doing the same and running multiple tests fails while running them individually works just fine.

By clearing all your tables at once on every run, you may end up with a race condition, because Jest tries to optimize by running multiple test files simultaneously. Tests can then affect each other by clearing entire tables in the middle of another test's run.

If this is your case, an easy fix is to run tests sequentially. I prefer this to the other options.

Other options:

  • Ensure tests aren't affected by other tests or by pre-existing data (sometimes hard).
  • Make each test clean up its own data. This approach prevents you from reusing mock data across tests.
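For the sequential route mentioned above, a minimal configuration (assuming a standard Jest setup) is a one-line config fragment; the equivalent CLI flag is `--runInBand` (or `-i`):

```javascript
// jest.config.js -- run all test files in a single worker, sequentially,
// so tests sharing the local DynamoDB instance can't race each other.
module.exports = {
  maxWorkers: 1,
};
```

This trades speed for determinism, which is often an acceptable deal for a small suite hitting a shared database.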


ohmtech-rdi commented Aug 29, 2022

Just to add to this thread, since we had a lot of internal discussion about it (and thanks, everyone, for your hard work on this!):

dynalite doesn't support transactions or DynamoDB Streams, so if you need either, that's an immediate show-stopper.

Unfortunately, you can't really rely on the workaround mentioned earlier in this issue: DeleteRequest operations are writes, and because read operations are only eventually consistent (whatever you request; see below), you might see "zombie" items after deleting a table's items, which is probably not what you expect from a unit-test isolation perspective.

The issue is quite rare, but running on the order of 10,000 tests surfaces it from time to time, producing flaky tests. There is no way around it, because one limitation of DynamoDB local is that it does not honor strongly consistent reads:

Read operations are eventually consistent. However, due to the speed of DynamoDB running on your computer, most reads appear to be strongly consistent.
Source (emphasis on the word "most" added).

Put differently, starting a test by assuming the database is empty requires strongly consistent reads if you "clean" a table before running the test, and DynamoDB local's limitations mean you can't assume that.

We implemented solution 1 from @goldsam's post above (creating a new table for each test). Even though he called it ugly, we believe it is, at least conceptually, a classic test-isolation strategy (avoiding collisions by partitioning the namespace), and the best approach short of spinning up a new database instance for each test (as the excellent jest-dynalite does).

This offers some important features:

  • Each test is truly isolated from the others,
  • Because of that, tests can run in parallel, which quickly becomes important if you want to run thousands of tests locally in under a minute, or in under 10 minutes on CI, across multiple Jest workers.

The following assumes you follow the single-table design with DynamoDB.

Here is our setup:

// jest-dynamodb-config.js

module.exports = {
   tables: [], // A new table is created before each test, so don't declare anything here
   port: 8000,
   options: [
      '-sharedDb',

      // `-inMemory` uses an in-memory sqlite store, which is orders of
      // magnitude faster than the file-backed default. It makes creating
      // a new table near-instant and speeds up all database operations.
      // This should probably be a `jest-dynamodb` default.
      '-inMemory',
   ]
};

In your jest tests:

beforeEach(async () => {
   // Reset all modules to isolate every test...
   jest.resetModules();
   jest.mock('./../store/client');

   const { prepareNewTable } = require('./helper');
   const tableName = await prepareNewTable();

   // ...so that the store picks up the new table name for each test
   process.env.TableName = tableName;
});

The helper creates a new table based on the CloudFormation template, but gives the table a random name each time to provide isolation:

// helper.js

'use strict';

const crypto = require('crypto');
const fs = require('fs');

const { dynamoDBClient } = require('./../store/client');
const { CreateTableCommand } = require("@aws-sdk/client-dynamodb");
const yaml = require('js-yaml');
const { CLOUDFORMATION_SCHEMA } = require('cloudformation-js-yaml-schema');


// ---------------------------------------------------------------------------

const getCloudFormationDynamoDbTableSchema = () => {
   const templateYaml = '../template.yaml';
   const templateYamlContent = fs.readFileSync(templateYaml, 'utf8');
   const cf = yaml.load(templateYamlContent, { schema: CLOUDFORMATION_SCHEMA });

   const tables = Object.values(cf.Resources)
      .filter(r => r.Type === 'AWS::DynamoDB::Table')
      .map(r => {
         const table = r.Properties;
         delete table.TableName;                // will be renamed
         delete table.TimeToLiveSpecification;  // errors on DynamoDB local
         return table;
      });

   return tables[0];  // we have only one table per service
};

const TABLE_SCHEMA = getCloudFormationDynamoDbTableSchema();


// ---------------------------------------------------------------------------

const prepareNewTable = async () => {
   const tableName = crypto.randomBytes(16).toString('hex');

   await dynamoDBClient.send(
      new CreateTableCommand({
         ...TABLE_SCHEMA,
         TableName: tableName,
      })
   );

   return tableName;
};


// ---------------------------------------------------------------------------

module.exports = {
   prepareNewTable,
};

The store implementation:

'use strict';

const {
   GetCommand,
   QueryCommand,
   TransactWriteCommand,
   UpdateCommand,
} = require('@aws-sdk/lib-dynamodb');

const { dynamoDBClient } = require('./client');

// This is re-evaluated for every test, because modules are
// reset before each one (see jest.resetModules above).
const { TableName } = process.env;
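The `jest.resetModules()` call matters because the table name is captured once, at module load time; without a re-require, the store would keep pointing at the previous test's table. A minimal stand-alone illustration of the difference (the values here are hypothetical):

```javascript
process.env.TableName = 'table-before';

// Module-level destructuring copies the value once, at "load" time:
const { TableName: capturedAtLoad } = process.env;

// A lazy read sees later changes to the environment:
const readLazily = () => process.env.TableName;

process.env.TableName = 'table-after';
// capturedAtLoad is still 'table-before'; readLazily() returns 'table-after'.
```

This is why the setup above updates `process.env.TableName` and then relies on module reset to make the store re-read it.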

Running around 1,000 tests for a service (with all the rest of the code) takes around 10 seconds on a 10-CPU machine and around 2 minutes on GitHub Actions, with DynamoDB taking most of the time in each test. That's a bit slow, but it means you can probably run 5,000 tests within the ideal 10-minute CI budget, which should provide an excellent level of unit testing in most cases.

Finally, this approach avoids leaking any test infrastructure into your production code.
