How to Accomodate Variable Test Data Volume

Two truths:

  1. Developers hate waiting for local builds to complete.
  2. Developers need to smoke test local builds using realistic data.

Consider the following simple ERD:

Sample ERD

Very simple. A person has [0..n] orders, an order has [0..n] items, each item represents one product.

Let’s say that part of your developer’s responsibilities are to make sure that their nifty new GridView-inspired widget doesn’t break with a large volume of data in the database. Therefore, you want to have some mechanism to insert that data on local dev databases.

However, some large percentage of time, developers are testing functionality that isn’t very dependent on the volume of test data (simple example: product detail page).

Developers don’t want to rebuild with full data all the time, because it slows things all to hell and isn’t really necessary anyway. How do you accomplish that?

Well, I haven’t been doing a great job at this so far. My system, developed over the past 4 years or so, works great, and DB builds are effective and repeatable and auditable and all that, but I don’t (yet) have a great answer for the “limited data set” problem (more precisely, the “switch sets as necessary” problem). My preferred mechanism to date has been to create a hierarchical directory of SQL scripts, which have to be manually tied together and tested to make sure that the “most dependent” data inserts (in this case, the order items) have all the FK dependencies in the database first. The hierarchy is not a problem; it actually works pretty well. What IS a problem is that I don’t see a great way to take just a portion of that data (from top to bottom) in order to speed up the “everyday” dev builds.

Consider the following two diagrams:

Sample Order Tree Sample Order Tree Data Slice

Both diagrams represent a hierarchical arrangment of Person, Order, and Order Item data, from top to bottom. Consider the topmost node as a “ghost” node. So we have three people who have placed six orders, and there are nine order items.

What I’d like to do is take a “slice” of that larger data set — the orange shaded section in the second diagram — and use it for “everyday” dev builds. So I would have one person, two orders, and three order items in my database after the dev build completes.

Off the top of my head, I see a few ways to do this:

  1. From a full data set, write queries to recursively build INSERT statements based on a limited set of topmost (Person) data. The pseudocode would be something like this:
    SELECT TOP 1 * FROM person ORDER BY name;
    SELECT * FROM order WHERE person_id IN (SELECT TOP 1 person_id FROM person ORDER BY name);
    SELECT * FROM order_item WHERE order_id in (SELECT order_id FROM order….) etc.
  2. Manually build the data by hand
  3. Create C# modules, perhaps called from NUnit or TestDriven.NET, that build a consistent set of limited data in your database?
  4. Use XML or Excel or some other tabular format to provide a quick-entry way to populate the database.

To be honest, none of those look that great. Remember what I’m trying to achieve (and avoid):

  1. Short build times
  2. Two data sets, both realistic: “small and quick” and “full and complete”
  3. Zero-maintenance
  4. One-click deployment of test data

At this point, I think I’ll go do some Googling and see what others have done to get past this hurdle.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: