|  
 
    In the test design task, two significant artifacts were identified and described: Test Scripts and Test Cases. Without
    Test Data, these two artifacts cannot be implemented and executed. They are merely descriptions of conditions,
    scenarios, and paths, without the concrete values needed to identify them precisely. Test Data, while not an artifact
    in its own right, significantly impacts the success (or failure) of testing. Testing cannot be implemented and executed
    without Test Data, as Test Data is required for the following:
 
    - 
        input to create a condition
    
 
    - 
        output to evaluate a requirement
    
 
    - 
        support (a precondition to the test)
    
 
 
    Therefore, identifying the values is an important effort, which is done when Test Cases are identified (see Artifact: Test Case and Guideline: Test Case).
 
    There are four attributes of Test Data that should be addressed when identifying the actual Test Data:
 
    - 
        depth - the volume or amount of data in the Test Data
    
 
    - 
        breadth - the degree of variation in the Test Data
    
 
    - 
        scope - the relevancy of the Test Data to the test objective
    
 
    - 
        architecture - the physical structure of the Test Data
    
 
 
    Each of these characteristics is discussed in greater detail in the sections below.
 
    Depth is the volume or amount of data used in testing. Depth is an important consideration in that too little data may
    not reflect real-life conditions, while too much data is hard to manage and maintain. Ideally, testing should begin
    with a small set of data that supports the critical Test Cases (usually the positive Test Cases). As confidence is
    gained during testing, the Test Data should be increased until the depth of data is representative of the deployed
    environment (or what is appropriate and feasible).
 
    Breadth refers to the degree to which the Test Data values vary. One could increase the depth of Test Data by just
    creating more records. While this is often a good solution, it does not address the true variations in data that we
    would expect to see in actual data. Without these variations in our Test Data, we may fail to identify defects (after
    all, not every withdrawal from an ATM is for $50.00). Therefore, Test Data values should reflect the data values found
    in the deployed environment, such as withdrawing $10.00 or $120.00. Additionally, Test Data should reflect real-world
    information such as:
 
    - 
        Names including titles, numerical values, punctuation, and suffixes: 
        
            - 
                Dr. James Bandlin, Ms. Susan Smith, and Rev. Joseph P. Mayers
            
 
            - 
                James Johnson III, Steven Wilshire 3rd, and Charles James Ellsworth, Esq.
            
 
            - 
                Ellen Jones-Smythe, Brian P. Tellstor
            
 
         
     
    - 
        Addresses with multiple address lines such as: 
        
            - 
                6500 Broadway Street
 
                 Suite 175
             
            - 
                1550 Broadway
 
                 Floor 17 
                 Mailstop 75A
             
         
     
    - 
        City (and Country) Codes and Telephone Numbers that are real and correspond to one another: 
        
            - 
                Lexington, MA, USA +1 781 676 2400
            
 
            - 
                Kista, Sweden +46 8 56 62 82 00
            
 
            - 
                Paris, France +33 1 30 12 09 50
            
 
         
     
 
    To obtain sufficient breadth, Test Data values can be either a physical representation or a statistical
    representation of the real data. Both methods are valuable and recommended.
 
    To create Test Data based upon a physical representation of the deployed data, identify the allowable values (or
    ranges) for each data element in the deployed database and ensure that, for each data element, at least one record in
    the Test Data contains each allowable value.
 
    For example:
 
    
        
            
    | | Account Number (range) | PIN number (integer) | Account Balance (decimal) | Account Type (string) |
    |---|---|---|---|---|
    | | (S) 0812 0000 0000 to 0812 9999 9999; (C) 0829 0000 0000 to 0829 9999 9999; (X) 0799 0000 0000 to 0799 9999 9999 | 0000 - 9999 | -999,999.99 to 999,999.99 | S, C, X |
    | record 1 | 0812 0837 0293 | 8493 | -3,123.84 | S |
    | record 2 | 0812 6493 8355 | 3558 | 8,438.53 | S |
    | record 3 | 0829 7483 0462 | 0352 | 673.00 | C |
    | record 4 | 0799 4896 1893 | 4896 | 493,498.49 | X |
        
     
 
    The above matrix contains the minimum number of records that would physically represent the acceptable data values. For
    the Account Number, there is one record for each of the three ranges, all the PIN numbers are within the range
    specified, there are several different Account Balances - including one that is negative, and there are records for
    each of the different Account Types. The matrix above is the minimum data, and best practice would be to have data
    values at the limits of each range as well as inside the range (see Guideline: Test Case).
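
    The coverage rule described above can be checked mechanically. The following is a minimal sketch, assuming the four
    records from the matrix are held as Python tuples. The names `RECORDS`, `ACCOUNT_PREFIX`, `is_valid`,
    `uncovered_account_types`, and `boundary_balances` are hypothetical and are not part of any tool named in this guideline.

```python
from decimal import Decimal

# Allowable values taken from the matrix above.
ACCOUNT_PREFIX = {"S": "0812", "C": "0829", "X": "0799"}
BALANCE_MIN = Decimal("-999999.99")
BALANCE_MAX = Decimal("999999.99")

# (account number, PIN, balance, account type) for records 1 to 4.
RECORDS = [
    ("0812 0837 0293", "8493", Decimal("-3123.84"), "S"),
    ("0812 6493 8355", "3558", Decimal("8438.53"), "S"),
    ("0829 7483 0462", "0352", Decimal("673.00"), "C"),
    ("0799 4896 1893", "4896", Decimal("493498.49"), "X"),
]


def is_valid(record):
    """Return True if every element of the record is within its allowable range."""
    number, pin, balance, account_type = record
    return (
        account_type in ACCOUNT_PREFIX
        and number.startswith(ACCOUNT_PREFIX[account_type])
        and len(pin) == 4 and pin.isdigit()
        and BALANCE_MIN <= balance <= BALANCE_MAX
    )


def uncovered_account_types(records):
    """Return the account types that no record exercises."""
    return set(ACCOUNT_PREFIX) - {record[3] for record in records}


def boundary_balances():
    """Balances at the limits of the range, plus one value inside it."""
    return [BALANCE_MIN, Decimal("0.00"), BALANCE_MAX]


print(all(is_valid(r) for r in RECORDS))   # every record is within range
print(uncovered_account_types(RECORDS))    # an empty set means full coverage
```

    Adding one record per value returned by `boundary_balances()` would be one way to extend the minimum data toward
    the limits of the balance range.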
 
    The advantage of physical representation is that the Test Data is limited in size, manageable, and focused on the
    acceptable values. The disadvantage, however, is that actual, real-world data is not completely random. Real data
    tends to have statistical profiles that may affect performance, and these effects would not be observed when using
    physical representation.
 
    Statistical Test Data representation is Test Data that reflects a statistical sampling of the production data,
    preserving the same percentages. For example, using the same data elements as above, if we analyzed the production
    database and discovered the following:
 
    - 
        Total number of records: 294,031
    
 
    - 
        Total number of account type S: 141,135 (48 % of total)
    
 
    - 
        Total number of account type C: 144,075 (49 %)
    
 
    - 
        Total number of account type X: 8,821 (3 %)
    
 
    - 
        Account numbers and PIN numbers are evenly distributed
    
 
 
    our Test Data, based upon statistical sampling, would include 294 records (as compared to the four we noted above):
 
    
        
            
    | Test Data (at 0.1 percent of production) | Number of Records | Percent |
    |---|---|---|
    | Total Number of records | 294 | 100 |
    | Account numbers (S) 0812 0000 0000 to 0812 9999 9999 | 141 | 48 |
    | Account numbers (C) 0829 0000 0000 to 0829 9999 9999 | 144 | 49 |
    | Account numbers (X) 0799 0000 0000 to 0799 9999 9999 | 9 | 3 |
        
     
 
    The above matrix only addresses the account types. In developing the best Test Data based upon statistical
    representation, you would include all of the significant data elements. In the above example, that would include
    reflecting the actual account balances.
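
    The record counts in the matrix follow directly from the production totals. The sketch below reproduces the
    arithmetic, assuming a sampling fraction of 0.1 percent and independent rounding per account type. The names
    `PRODUCTION_COUNTS`, `SAMPLE_FRACTION`, and `sample_plan` are hypothetical.

```python
# Production record counts per account type, taken from the analysis above.
PRODUCTION_COUNTS = {"S": 141_135, "C": 144_075, "X": 8_821}
SAMPLE_FRACTION = 0.001  # 0.1 percent of production


def sample_plan(counts, fraction):
    """Return {account type: (records to sample, percent of production)}."""
    total = sum(counts.values())
    return {
        account_type: (round(count * fraction), round(100 * count / total))
        for account_type, count in counts.items()
    }


plan = sample_plan(PRODUCTION_COUNTS, SAMPLE_FRACTION)
for account_type, (records, percent) in plan.items():
    print(account_type, records, percent)
print("total", sum(records for records, _ in plan.values()))
```

    Because each account type is rounded on its own, the per-type counts are not guaranteed to add up to the rounded
    total for every data set. Here they do (141 + 144 + 9 = 294).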
 
    A disadvantage of the statistical representation is that it may not reflect the full range of acceptable values.
 
    Typically, both methods of identifying Test Data are used to ensure that the Test Data addresses all values and
    performance / population issues.
 
    Test Data breadth is relevant to the Test Data used as input as well as the Test Data used to support testing (in
    pre-existing data).
 
    Scope is the relevancy of the Test Data to the test objective, and is related to depth and breadth. Having a lot of
    data does not mean it is the right data. As with the breadth of Test Data, we must ensure that the Test Data is relevant
    to the test objective, that is, that there is Test Data to support our specific test objective.
 
    For example, in the matrix below, the first four Test Data records address the acceptable values for each data element.
    However, there are no records to evaluate negative balances for account types C and X. Therefore, although this Test
    Data correctly includes a negative balance (valid breadth), the first four records alone would be insufficient in scope
    to support testing with negative account balances for each account type. Expanding the data with additional records
    that hold negative balances for the remaining account types (New Record 1 and New Record 2 in the matrix) addresses
    this oversight.
 
    
        
            
    | | Account Number (range) | PIN number (integer) | Account Balance (decimal) | Account Type (string) |
    |---|---|---|---|---|
    | | (S) 0812 0000 0000 to 0812 9999 9999; (C) 0829 0000 0000 to 0829 9999 9999; (X) 0799 0000 0000 to 0799 9999 9999 | 0000 - 9999 | -999,999.99 to 999,999.99 | S, C, X |
    | record 1 | 0812 0837 0293 | 8493 | -3,123.84 | S |
    | record 2 | 0812 6493 8355 | 3558 | 8,438.53 | S |
    | record 3 | 0829 7483 0462 | 0352 | 673.00 | C |
    | record 4 | 0799 4896 1893 | 4896 | 493,498.49 | X |
    | New Record 1 | 0829 3491 4927 | 0352 | -995,498.34 | C |
    | New Record 2 | 0799 6578 9436 | 4896 | -64,913.87 | X |
        
     
 
    Test Data scope is relevant to the Test Data used as input as well as the Test Data used to support testing (in
    pre-existing data).
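
    A scope gap of this kind can be detected before testing starts. The sketch below is one possible check, assuming
    the test objective is "negative balances for every account type". The names `ORIGINAL`, `ADDED`, and
    `types_missing_negative_balance` are hypothetical.

```python
from decimal import Decimal

ACCOUNT_TYPES = {"S", "C", "X"}

# (balance, account type) for records 1 to 4 in the matrix above.
ORIGINAL = [
    (Decimal("-3123.84"), "S"),
    (Decimal("8438.53"), "S"),
    (Decimal("673.00"), "C"),
    (Decimal("493498.49"), "X"),
]
# New Record 1 and New Record 2.
ADDED = [
    (Decimal("-995498.34"), "C"),
    (Decimal("-64913.87"), "X"),
]


def types_missing_negative_balance(records):
    """Return account types that have no record with a negative balance."""
    covered = {account_type for balance, account_type in records if balance < 0}
    return ACCOUNT_TYPES - covered


print(sorted(types_missing_negative_balance(ORIGINAL)))          # ['C', 'X']
print(sorted(types_missing_negative_balance(ORIGINAL + ADDED)))  # []
```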
 
    The physical structure of Test Data is relevant only to any pre-existing data used by the target-of-test to support
    testing, such as an application's database or rules table.
 
    Testing is not executed once and finished. Testing is repeated within and between iterations. In order to consistently,
    confidently, and efficiently execute testing, the Test Data should be returned to its initial state prior to the
    execution of test. This is especially true when the testing is to be automated.
 
    Therefore, to ensure the integrity, confidence, and efficiency of testing, it is critical that Test Data be free of
    all external influences, and that its state be known at the start, during, and end of the test execution. There are two
    issues that must be addressed in order to achieve this test objective:
 
    - 
        isolation of the Test Data from external influences
    
 
    - 
        the initial state of the Test Data at the beginning of test execution
    
 
 
    Each of these issues will affect how you manage your test database, design your test model, and interact with other
    roles.
 
    Test Data may become unstable for the following reasons:
 
    - 
        external, non-test related influences modify the data
    
 
    - 
        other testers are not aware of what data is used by others
    
 
 
    To maintain the confidence and integrity of testing, the Test Data should be highly controlled and isolated from these
    influences. Strategies to ensure the Test Data is isolated include:
 
    - 
        separate test environments - testers have their own test environment, physically separate from others. The testers
        share nothing, that is, they have their own target-of-test and data. This may be accomplished, for example, by
        giving each tester his or her own PC.
 
     
    - 
        separate Test Database instances - testers have their own instance of data, isolated from all other influences. The
        physical environment, perhaps even the target-of-test, is shared, but with each tester having his or her own
        instance of data, there is little risk of contaminating the Test Data.
 
     
    - 
        Test Data / database partitioning - all testers share the database and are knowledgeable about the data others are
        using (and avoid using other testers' data). For example, one tester may use records 0 - 99, and another tester may
        use records 100 - 199, or one tester uses customers with last names Aa - Kz, while another tester uses customers
        named La - Zz.
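
    When testers partition a shared database, the partition plan itself can be checked for overlaps. The following is a
    minimal sketch, assuming partitions are expressed as ranges of record IDs. The tester names and the functions
    `overlapping_partitions` and `owner_of` are hypothetical.

```python
# Hypothetical partition plan: each tester owns a half-open range of record IDs.
PARTITIONS = {
    "tester_a": range(0, 100),    # records 0 - 99
    "tester_b": range(100, 200),  # records 100 - 199
}


def overlapping_partitions(partitions):
    """Return pairs of testers whose record ranges share at least one ID."""
    owners = sorted(partitions)
    clashes = []
    for i, first in enumerate(owners):
        for second in owners[i + 1:]:
            a, b = partitions[first], partitions[second]
            # Two step-1 ranges overlap when each starts before the other stops.
            if a.start < b.stop and b.start < a.stop:
                clashes.append((first, second))
    return clashes


def owner_of(record_id, partitions):
    """Return the tester who owns record_id, or None if it is unassigned."""
    for tester, ids in partitions.items():
        if record_id in ids:
            return tester
    return None


print(overlapping_partitions(PARTITIONS))  # no clashes expected for this plan
print(owner_of(150, PARTITIONS))
```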
    
 
 
    The other Test Data architecture issue that must be addressed is that of the initial state of the Test Data at the
    beginning of test execution. This is especially true when test automation is being used. Just as the target-of-test
    must begin the execution of test in a known, desired state, so too must the Test Data. This contributes to
    repeatability and to the confidence that each test execution is the same as the previous one.
 
    Four strategies are commonly used to address this issue:
 
    - 
        data refresh
    
 
    - 
        data re-initialize
    
 
    - 
        data reset
    
 
    - 
        data roll forward
    
 
 
    Each is described in greater detail below.
 
    The method used will depend upon several factors, including the physical characteristics of the database, the technical
    competence of the testers, the availability of external (non-test) roles, and the target-of-test.
 
    The most desirable method of returning Test Data to its initial state is Data Refresh. This method involves creating a
    copy of the database in its initial state and storing it. Upon the completion of test execution (or prior to the
    execution of test), the archived copy of the test database is copied into the test environment for use. This ensures
    that the initial state of the Test Data is the same at the start of each test execution.
 
    An advantage of this method is that data can be archived in several different initial states. For example, Test Data
    may be archived at end-of-day state, end-of-week state, end-of-month state, etc. This provides the tester a method of
    quickly refreshing the data to a given state to support a test, such as testing of the end-of-month use case(s).
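
    Data refresh can be sketched with a file-based database. The example below assumes a SQLite database file and uses
    a plain file copy as the archive and restore mechanism. The file names, the `account` table, and the functions
    `refresh` and `account_count` are hypothetical, and a server-based database would use its own backup and restore
    facilities instead.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
test_db = workdir / "test.db"           # database used by the tests
archive = workdir / "end_of_month.db"   # archived initial state

# Build the initial state once and archive a copy of it.
conn = sqlite3.connect(test_db)
conn.execute("CREATE TABLE account (number TEXT, balance TEXT)")
conn.execute("INSERT INTO account VALUES ('0812 0837 0293', '-3123.84')")
conn.commit()
conn.close()
shutil.copyfile(test_db, archive)


def refresh(archived_copy, target):
    """Overwrite the test database with the archived initial state."""
    shutil.copyfile(archived_copy, target)


def account_count(db_path):
    """Return the number of rows in the account table."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
    finally:
        conn.close()


# A test run modifies the data...
conn = sqlite3.connect(test_db)
conn.execute("DELETE FROM account")
conn.commit()
conn.close()
print(account_count(test_db))   # data as left behind by the test

# ...and the refresh restores the archived state before the next run.
refresh(archive, test_db)
print(account_count(test_db))   # data back in its initial state
```

    Keeping one archive file per initial state (end-of-day, end-of-week, end-of-month) lets the same `refresh` call
    select the state a given test needs.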
 
    If data cannot be refreshed, the next best method is to restore the data to its initial state through some programmatic
    means. Data re-initialize relies on special use cases and tools to return the Test Data to its initial values.
 
    Care must be taken to ensure all data, relationships, and key values are returned to their appropriate initial value to
    ensure that no errors are introduced into the data.
 
    One advantage of this method is that it can support the testing of invalid values in the database. Under normal
    conditions, invalid data values would be trapped and not allowed entry into the data (for example, by a validation rule
    in the client). However, another actor may affect the data (for example, an electronic update from another system).
    Testing needs to verify that invalid data is identified and handled appropriately, independent of how it occurs.
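
    A programmatic re-initialize can be as small as one transaction that clears a table and reloads its initial values.
    The sketch below assumes a SQLite database with a hypothetical `account` table. The rows in `INITIAL_ROWS`,
    including the deliberately invalid account type `Q`, are illustrative values and are not taken from the matrices
    above.

```python
import sqlite3

# Hypothetical initial values; the last row holds an account type that a
# client validation rule would normally reject.
INITIAL_ROWS = [
    ("0812 0837 0293", "S"),
    ("0829 7483 0462", "C"),
    ("0799 4896 1893", "Q"),  # deliberately invalid account type
]


def reinitialize(conn):
    """Return the account table to its initial values in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM account")
        conn.executemany(
            "INSERT INTO account (number, account_type) VALUES (?, ?)",
            INITIAL_ROWS,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number TEXT PRIMARY KEY, account_type TEXT)")
reinitialize(conn)

# A test changes the data, then the table is re-initialized.
conn.execute("UPDATE account SET account_type = 'X'")
conn.commit()
reinitialize(conn)
rows = conn.execute(
    "SELECT number, account_type FROM account ORDER BY number"
).fetchall()
print(rows)
```

    Loading the rows directly, rather than through the client, is what allows the invalid value to reach the database
    for testing.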
 
    A simple method of returning data to its initial state is to "reverse the changes" made to the data during the test.
    This method relies upon using the target-of-test to enter reversing entries, that is, adding records / values that were
    deleted, un-modifying modified records / values, and deleting data that was added.
 
    There are risks associated with this method, however, including:
 
    - 
        all the actions must be reversed, not just some
    
 
    - 
        relies upon use cases in the target-of-test (which must be tested to verify proper functionality before they can be
        used for data reset).
    
 
    - 
        database keys, indices, and pointers may not or cannot be reset
    
 
 
    If this is the only method available in your test environment, avoid using database keys, indices and pointers as the
    primary targets for verification. That is, for example, use the Patient Name field to determine if the patient was
    added to the database instead of using a system generated Patient ID number.
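
    The reason for verifying against a business field can be seen in a small experiment. The sketch below assumes a
    SQLite table whose key is declared with `AUTOINCREMENT`; the `patient` table and the functions `add_patient` and
    `patient_exists` are hypothetical. After a reversing entry deletes the patient, adding the same patient again
    produces a different generated ID, while a check on the name still succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite hand out IDs that are never reused.
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)


def add_patient(name):
    """Add a patient and return the system-generated ID."""
    cursor = conn.execute("INSERT INTO patient (name) VALUES (?)", (name,))
    return cursor.lastrowid


def patient_exists(name):
    """Verify by a business field rather than by the generated key."""
    row = conn.execute(
        "SELECT 1 FROM patient WHERE name = ?", (name,)
    ).fetchone()
    return row is not None


first_id = add_patient("Ellen Jones-Smythe")    # first test run
conn.execute(                                   # reversing entry
    "DELETE FROM patient WHERE name = ?", ("Ellen Jones-Smythe",)
)
second_id = add_patient("Ellen Jones-Smythe")   # second test run

print(first_id, second_id)                  # the generated IDs differ between runs
print(patient_exists("Ellen Jones-Smythe"))
```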
 
    Data roll forward is the least desirable method of addressing the initial state of the Test Data. In fact, it doesn't
    really address the issue. Instead, the state of the data at the completion of test execution becomes the new initial
    state of the Test Data. Typically, this requires modifying the Test Data used for input and / or the Test Cases and
    Test Data used for the evaluation of the results.
 
    There are some instances when this is necessary, for example at month-end. If no archive of the data from just prior
    to month's end exists, then the Test Data and Test Scripts from each day and week must be executed to "roll forward"
    the data to the state needed for the test of the month-end processing.
 
    Risks associated with this method include:
 
    - 
        database keys, indices, and pointers cannot be reset (and cannot be used for verification)
    
 
    - 
        data is constantly changing
    
 
    - 
        requires additional effort to certify verification of results
    
 
  
  |