Test Data Standards

Example Work

Page last updated: Fri, Sep, 1 2023 @ 17:34:46 UTC | Estimated minute read time ( words)

In many companies, test data standards are overlooked or defined using useless generalities ("Must be secure", great but specifically what does that mean). As a result, QA Engineers often, with the best intentions, create inaccurate, unstable, unscalable, and in the long term unusable test data because they do not adhere to proper standards.

Even worse, the test data generated may violate security policies or cause the company to fail an audit. For example, if you were trying to use a credit card number to test a payment system you would not want QA Engineers to use their own credit card numbers.

A dangerous, but far too common, practice by companies is the tendency to copy production data into a test system so that it contains "accurate" data.

This should never be done.

This action simply makes your test system a target for hackers to steal valuable data. Remember the hacking doesn't have to be just from the outside. Employees can easily walk out with valuable information and cause great harm to a company. A good example of this test system pitfall is the 2016 Uber data loss that resulted in large fines (though some of those fines were due to coverup operations). Here is a blog article from the FTC about that data loss.

The action of moving data from Production to a lower environment must contain data-masking. However, I believe that engineering data from the bottom up, if time and complexity allows, is superior for several reasons:

  1. When engineering data, the team stops to think about how and you can possibly gain a better understanding of your data and the releationships between data elements.
  2. A process failure doesn't risk the system putting unmasked data into a testing environment.

The sections below outline a few basic test data types and the standards I recommend applying. The goal is to create data that is safe and secure but also accurate.

Enjoy! If you have any feedback, contact information is in the footer.

- Krypton -

If paragraphs of text are needed Lorem Ipsum is a great placeholder. There are lots of generators online (see external links below) or in Microsoft Word the following command can be used to generate this style of text:

=lorem(Number of Paragraphs, Number of Lines)

North America

Country calling code = 1 | Format = NXX NXX-NXXX

It’s commonly known that the area code 555 is used for fictional usage, but the reserved number range is actually limited to 555-0100 through 555-0199. This phone number range is what should be used because some numbers outside of that range are used for legit purposes.

Fun fact: Universal Pictures purchased a phone numbers to use in their production work they were so dedicated to using fake phone numbers.

Address generation can be as simple as a random number generator attached to a listing of the most popular street names in America. For an added benefit attach a random alorithm that includes alphanumeric apartment numbers. This should create data that meets the majority of needed scenarios.

Color names occasionally merge with common last names, but the use is relatively safe. A good resource for common color names is the Wikipedia page for Crayola crayon colors. In addition, it's a fast way of confirming that data has been run through a conversion tool.

A social security number (SSN) is 9 digits in the following format: ###-##-####. Originally, the Social Security Administration (SSA) issued SSNs with the first three digits indicating the region in which the SSN was distributed but that practice has been discontinued because of population distribution. The SSA changed the way SSNs are issued in June of 2011, opting for a “randomization” system to extend the longevity of the nine-digit SSN system. Technically you can generate a nine-digit SSN and it could be associated with an actual account but this is usually acceptable when paired with fictional names, addresses, and phone numbers.
However, if requirements dictate it must be a fake number, currently numbers beginning with the following three digits are not being assigned:

  • 000
  • 666
  • 900-999

This information is found on the Social Security Administration website.

Another alternative fake number to use is 078-05-1120. This will not be issued as it is the most abused number in history.

Identification cards are a tricky subject for test data. It is not recommended to upload real images of IDs belonging to QA Engineers for privacy reasons. It is also a document that can be faked so many issuing regions are hesitant to post examples.

Some states will post examples of what their IDs look like so that people can become familiar with valid format. These websites, like this DMV page from Virginia, are a great source of test data.

These IDs can be used to test document analyzers like Azure AI Document Intelligence

External Links

  • mockaroo.com - test data generator with many options for customization
  • lipsum.com - Lorem Ipsum website that includes more information and a generator

Sources