Performance improvement of acceptance tests and a performant approach for managing test data

Improve the performance of acceptance tests and manage test data in a performant way, so that our build runs in < 10 minutes on CI (MIFOS-4420).

Goal of the proposal
Reduce the build time on developer machines.
Reduce the build time on the CI server without changing its configuration (this does not imply that a configuration change is not, or would not be, required).

I have spiked an approach for considerably speeding up acceptance test runs and for managing test data. I am confident that we can now tackle these problems and bring the build time down to < 10 minutes on CI. The spikes and the approach are explained below, along with the context, for everyone's benefit.

Current State

Acceptance tests

Most of us are probably aware of this, but for completeness, this is what most (if not all) acceptance tests do:

1. Delete the data in all the tables.
2. Recreate their own data using a dbUnit dataset. (There are a lot of these datasets. Some of them are reused, but even a reused dataset goes through steps 1-4 for every test.)
3. Reinitialize the application by reloading the cache.
4. Perform the test.

Test data

We have two kinds of test data: integration test data and acceptance test data. Integration test data is in SQL form, in two files, custom_data.sql and testdbinsertionscript.sql. Acceptance test data, as mentioned earlier, is maintained in a set of dbUnit XML files.
To change the database, we add our changes to latest-schema.sql. Whenever we do this and want specific test data for the change, we need to do the following (if default values are enough, none of this is needed):

Integration test data: go through the two SQL files and edit them.
Acceptance test data: go through the dbUnit files and modify the XML. We also have another approach that automatically exports the data and modifies all the dbUnit files.

Consequences

The acceptance test suite takes a long time to run.
We cannot run tests in parallel.
We cannot run against a cluster of servers (this is important because it would tell us whether there are scale-out issues).
Changing acceptance test data is quite painful; the dbUnit output file also needs to be changed.
Test data setup takes a long time.

Proposed approach to solve acceptance tests run performance

Clearly we cannot do steps 1-4 for every test. Ideally they should be done only once, which would make for the most performant test suite. At the same time, how do we ensure that one test doesn't step on another by changing the data?
These two concerns form our basic requirements. Our current approach ensures correctness but is rather unusable because of its performance. So we should probably relax foolproof correctness in exchange for performance, and this proposal is based on that trade-off.

Get the union of all data sets
This step has now been spiked. The idea was to find a single dataset that is the union of all the datasets, where an item in the set is a database table row and the set contains rows from different tables. To define a union we need to define equality, so two rows are treated as equal when:
primary key matches
unique key columns are equal
if no primary key, then all column values are equal
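The equality rules above can be sketched in Java. This is a minimal illustration, not the spike's code: TableMeta, Row, identityKeys and sameRowAs are hypothetical names, and values are kept as strings, as they appear in dbUnit XML. Because "equal by primary key or by unique key" is not transitive, the sketch exposes a sameRowAs predicate over explicit keys instead of overriding equals, which hash-based sets rely on being a proper equivalence relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical table metadata: name, primary key columns (empty if the table
// has none) and unique key column groups. In practice this would come from the schema.
final class TableMeta {
    final String name;
    final List<String> primaryKey;
    final List<List<String>> uniqueKeys;

    TableMeta(String name, List<String> primaryKey, List<List<String>> uniqueKeys) {
        this.name = name;
        this.primaryKey = primaryKey;
        this.uniqueKeys = uniqueKeys;
    }
}

// One row read from a dbUnit dataset; values are kept as text, as in the XML.
final class Row {
    final TableMeta table;
    final Map<String, String> values;

    Row(TableMeta table, Map<String, String> values) {
        this.table = table;
        this.values = new LinkedHashMap<>(values);
    }

    // Identity keys for this row: the primary key (or, for a table without one,
    // all column values) plus every unique key.
    List<List<String>> identityKeys() {
        List<List<String>> keys = new ArrayList<>();
        if (table.primaryKey.isEmpty()) {
            // Sort column names so that column order in the XML does not matter.
            keys.add(keyOf("ALL", new ArrayList<>(new TreeSet<>(values.keySet()))));
        } else {
            keys.add(keyOf("PK", table.primaryKey));
        }
        // Note: SQL allows repeated NULLs in a unique key, so nullable
        // unique columns would need extra handling.
        for (List<String> uniqueKey : table.uniqueKeys) {
            keys.add(keyOf("UK", uniqueKey));
        }
        return keys;
    }

    // Builds a key such as [office, PK, office_id=1].
    private List<String> keyOf(String kind, List<String> columns) {
        List<String> key = new ArrayList<>(Arrays.asList(table.name, kind));
        for (String column : columns) {
            key.add(column + "=" + values.get(column));
        }
        return key;
    }

    // True if the two rows share at least one identity key.
    boolean sameRowAs(Row other) {
        List<List<String>> otherKeys = other.identityKeys();
        for (List<String> key : identityKeys()) {
            if (otherKeys.contains(key)) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, two office rows with different primary keys but the same unique office number are reported as the same row, which is the case the union step needs to catch.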
I used Guava to compute the union (the key programs are attached as files; it is throwaway code, so the quality is not great, but it has tests that help in understanding it). I ignored some very large datasets that are targeted at a handful of reporting tests; we would need to handle these separately. I also recorded the rejected data using an intersection, which will hopefully be useful for troubleshooting.
You can think of the attached programs as a tool that we would use to create a single dataset during the transition period. It goes through all the dbUnit datasets, converts every item in each of them to a Row object, uses Guava (a Google library for Java that, among other things, supports set operations) to compute the union of these rows, and writes them to a file. Again, this is a one-time tool, not meant to be used after we move to the new approach.
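The union step itself can be sketched with plain java.util collections instead of Guava. This is a simplified illustration under stated assumptions: DatasetRow and DatasetUnion are hypothetical, each row carries a precomputed identity string derived from the equality rules, and conflicts are detected while inserting rather than by computing an intersection as the spike did. The first row seen for an identity wins, exact duplicates are dropped, and rows that share an identity but carry different values are set aside as rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified row: table name, an identity string derived from the equality
// rules (e.g. "office_id=1"), and the column values.
final class DatasetRow {
    final String table;
    final String identity;
    final Map<String, String> values;

    DatasetRow(String table, String identity, Map<String, String> values) {
        this.table = table;
        this.identity = identity;
        this.values = values;
    }

    // Convenience factory: of("office", "office_id=1", "name", "HQ", ...).
    static DatasetRow of(String table, String identity, String... columnValuePairs) {
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i + 1 < columnValuePairs.length; i += 2) {
            values.put(columnValuePairs[i], columnValuePairs[i + 1]);
        }
        return new DatasetRow(table, identity, values);
    }
}

// Accumulates the union of many datasets and the rows rejected along the way.
final class DatasetUnion {
    final List<DatasetRow> union = new ArrayList<>();
    final List<DatasetRow> rejected = new ArrayList<>();
    private final Map<String, DatasetRow> firstSeen = new HashMap<>();

    void addDataset(List<DatasetRow> dataset) {
        for (DatasetRow row : dataset) {
            String id = row.table + "|" + row.identity;
            DatasetRow existing = firstSeen.get(id);
            if (existing == null) {
                firstSeen.put(id, row);    // first occurrence: keep it
                union.add(row);
            } else if (!existing.values.equals(row.values)) {
                rejected.add(row);         // same identity, different data: review by hand
            }
            // otherwise an exact duplicate, silently dropped
        }
    }
}
```

For example, feeding in one dataset with office 1 named "HQ", a second repeating it and adding office 2, and a third with office 1 named "Head Office" leaves three rows in the union and one rejected row. Writing the union out as SQL INSERT statements is not shown.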

Setup data only once
The SQL version of this union (produced by the spike, but too large to attach) is run against the database once, before any test starts. In other words, instead of running custom_data.sql, testdbinsertionscript.sql and latest-data.sql, we run the single union file, functional-test-data.sql.
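Running the data load once per suite could look like the sketch below. SuiteData and loadOnce are hypothetical names, and the Runnable stands in for whatever executes functional-test-data.sql; it would be called from a suite-level hook (for example TestNG's @BeforeSuite, if that is the runner in use). The guard is a static flag, so it only holds within one JVM: if the build forks several JVMs against the same database, a marker row in the database would be needed instead.

```java
// Hypothetical helper: runs the suite-wide data load at most once per JVM,
// no matter how many test classes call it.
final class SuiteData {
    private static boolean loaded = false;

    // synchronized so that concurrent callers wait for the first load to
    // finish instead of starting tests against a half-loaded database.
    static synchronized void loadOnce(Runnable loader) {
        if (!loaded) {
            loader.run();   // if this throws, 'loaded' stays false and a later call retries
            loaded = true;
        }
    }
}
```

Every test class can call SuiteData.loadOnce(...) defensively; only the first call does any work.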

Individual tests isolate their own data if required
Obviously this means one test can affect another when shared data is changed. We accept this and wait for a test to fail because of it. Only when that happens (or when we are writing a new test) should the test define its data in a way that is not known to other tests. In other words, tests are responsible for isolating their own data.
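When a test does need to isolate its data, one simple option is to give the records it creates names that no other test uses. TestDataNames.unique is a hypothetical helper, sketched under the assumption that tests find their own records by name. The run id matters because the shared database is loaded once and reused, so names from an earlier run could otherwise collide.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for tests that must own data no other test knows about.
// Names combine a per-run id and a per-JVM counter, so they are very unlikely
// to clash with the shared data set or with each other.
final class TestDataNames {
    private static final long RUN_ID = System.currentTimeMillis();
    private static final AtomicInteger COUNTER = new AtomicInteger();

    static String unique(String prefix) {
        return prefix + "_" + RUN_ID + "_" + COUNTER.incrementAndGet();
    }
}
```

Column length limits in the schema may call for a shorter suffix.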

Tests run without setting up their own data or reinitializing the application

Some indicative results from running the acceptance tests on my machine after this spike: 100 tests, 35 passed, 65 failed, in 407 seconds. Obviously we need to fix all the failing tests; as with fixing the integration tests, this will take some effort. I have not looked at all the failures, but those I saw followed the same pattern within a test class. This is encouraging, because it may mean that troubleshooting one problem resolves multiple failures.

Possible questions on above

Why not use a transaction to roll back the data after every test, as in the integration tests? Acceptance tests differ from integration tests: they don't run in the same process as the application under test, they run for much longer, and each one performs multiple transactions. Given this, if we used transactions:

It would not be a black box test.
We would have to make modifications in application under test for running tests in this mode.
We would have issues when running tests in parallel, as long running multiple transactions would create database locks, deadlocks and waits.

What if two tests need incompatible data or configuration? There are two scenarios here: running tests in sequence or in parallel. When running tests in sequence, we can create a REST interface in the application so that tests can change these settings. This would not work when running tests in parallel. If we do not have too many combinations of such data, we can group the tests (e.g. elsim, glim) and run the groups one after the other.
Also, the spike program can be made aware of the elsim and glim configurations by annotating the datasets and creating two separate union outputs. (Further analysis might turn up more options here, though.)
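Producing one union per configuration mainly needs the datasets to be grouped by their configuration tag before the union step runs. A minimal sketch, assuming each dataset file has already been tagged (how the tag is recorded, whether by naming convention, an XML comment or a side file, is left open); TaggedDataset, ConfigurationGroups and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A dbUnit dataset file tagged with the configuration it needs.
final class TaggedDataset {
    final String fileName;
    final String configuration;

    TaggedDataset(String fileName, String configuration) {
        this.fileName = fileName;
        this.configuration = configuration;
    }
}

final class ConfigurationGroups {
    // Groups dataset files by configuration (sorted by tag). Each group would be
    // unioned into its own SQL output, and the matching tests run as one group.
    static Map<String, List<String>> byConfiguration(List<TaggedDataset> datasets) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (TaggedDataset dataset : datasets) {
            groups.computeIfAbsent(dataset.configuration, k -> new ArrayList<>())
                  .add(dataset.fileName);
        }
        return groups;
    }
}
```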

Why do we need this improvement, given that we are going to follow the test pyramid strategy? Even with the test pyramid we would still have some Selenium functional tests, because the kind of coverage they provide is not provided by service-level tests. As Mifos matures, the number of such tests will only increase.

If we have a staged CI server, do we care that all acceptance tests take a long time to execute (where developers only run the smoke group)? Let's take a scenario. We have 195 tests, and we choose 5% of them (about 10) and move them to the smoke group. Let's call the first stage on CI Dev and the next one Acceptance. The Dev stage build time on CI would come down to about 13 minutes, or 9 minutes if we apply just the database improvement part of this proposal. The Acceptance stage would take 25 and 20 minutes respectively.
On developer machines, a complete build takes from 50 minutes to 3 hours, depending on the machine configuration. With just the database optimization it would take 45 minutes to 3 hours.
With the staged build, the build time on developer machines would come down significantly, perhaps to 15-30 minutes. The real issue that would remain (and that writing more acceptance tests would aggravate) is the time it takes to run the acceptance tests.
We should not underestimate the new pains we would see when we break up the build. After the split, once a commit is done and the Dev stage is green, all developers would carry on committing. This is good, but we still need to keep the Acceptance stage green. A suite that runs extremely slowly would lead to reluctance: people would avoid refactoring acceptance tests, be unwilling to write new ones, and so on. In fact, we should also use an in-memory database, WebDriver, etc. to keep the suite as fast as possible, although these give marginal returns, especially on less powerful machines.

Should tests be run in random order when they share the same test data, to detect whether earlier tests are correcting or hiding problems for later tests? This would certainly bring out such problems in the test suite. It could be an optional piece of work picked up while doing this improvement. It is probably not a must-have, though, since the problems being hidden would be in the tests and not in the application under test.
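If random ordering is picked up, it helps to make the order reproducible: shuffle with an explicit seed and print it, so that a failure that depends on ordering can be replayed. A minimal sketch; RandomTestOrder and the test.order.seed property are hypothetical, and hooking the order into the runner (with TestNG, for instance, through an IMethodInterceptor) is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: shuffle test classes with an explicit seed so that an
// order-dependent failure can be reproduced by rerunning with the same seed.
final class RandomTestOrder {
    static List<String> shuffled(List<String> testClasses, long seed) {
        List<String> order = new ArrayList<>(testClasses);
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    // Reads the seed from a (hypothetical) system property such as
    // -Dtest.order.seed=12345, or picks a fresh one; either way it is printed.
    static long seed(String propertyName) {
        String value = System.getProperty(propertyName);
        long seed = (value != null) ? Long.parseLong(value) : System.nanoTime();
        System.out.println("Test order seed: " + seed);
        return seed;
    }
}
```

Usage would be along the lines of RandomTestOrder.shuffled(classes, RandomTestOrder.seed("test.order.seed")); the same seed always yields the same order.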

Proposed approach for faster creation of test data

Setting up test data for integration and acceptance tests (even after we implement the above) would still take quite a lot of time: 10 minutes on my machine, which is more than the value it provides is worth. We can reduce this to < 20 seconds with the following:

Keep the source of the data in dbUnit, but as just one copy, used when we want to modify the data.
Maintain database dumps: essentially, we treat the test data as production data and apply the database upgrade scripts to it. The only difference is that we would apply them once per build, at the beginning of the test run. We can use mysqlhotcopy to export the data/index files from MySQL, zip them up and commit them to source control (note that mysqlhotcopy only works for MyISAM and ARCHIVE tables). We can also copy these files directly from the MySQL datadir, e.g. the /var/lib/mysql/mifos folder.
Restore the database from the checked-in dump: unzip the dump into the MySQL datadir (e.g. /var/lib/mysql/mifos).
Apply the database upgrade scripts to the restored dump.
Run the tests (integration or acceptance).
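Applying the database upgrade scripts to the restored dump amounts to selecting the scripts newer than the dump's schema version and running them in order. A minimal sketch, assuming (hypothetically) that scripts are named upgrade_to_N.sql and that the dump's schema version is known, for example recorded alongside it; PendingUpgrades is a hypothetical name and executing the scripts is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical naming convention: upgrade_to_<N>.sql, where N is the schema
// version the script upgrades to.
final class PendingUpgrades {
    private static final Pattern NAME = Pattern.compile("upgrade_to_(\\d+)\\.sql");

    // Returns the scripts newer than the dump's version, in ascending numeric order.
    static List<String> pending(Collection<String> scriptNames, int dumpVersion) {
        TreeMap<Integer, String> byVersion = new TreeMap<>();
        for (String name : scriptNames) {
            Matcher m = NAME.matcher(name);
            if (m.matches()) {                      // ignore files that are not upgrade scripts
                byVersion.put(Integer.parseInt(m.group(1)), name);
            }
        }
        // tailMap(v, false): strictly greater than the dump's version
        return new ArrayList<>(byVersion.tailMap(dumpVersion, false).values());
    }
}
```

Sorting by the parsed number rather than by file name matters: as strings, upgrade_to_10.sql would sort before upgrade_to_3.sql.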