
Test Data Checked into the Package

Ideally we do not want to keep large amounts of data under version control. For test data, I think 10 kilobytes or so is OK, but when it gets larger one should use the external location discussed below. Files that you do check in could go right in the test directory alongside the test, or you can create new directories for test data. Another standard directory in the SConsTools system is data, which is fine, but it is intended for package data as opposed to testing data. One could also create a subdirectory of the test directory, such as

  • test/data  or
  • test/fixtures

to hold small amounts of test data.
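As a rough sketch of the size guideline above, one could list checked-in test files that exceed about 10 kilobytes. The helper name oversized_files and the exact limit are assumptions made for illustration; nothing like this is part of SConsTools.

```python
import os
import tempfile

# Assumed threshold, taken from the "10 kilobytes or so" guideline.
SIZE_LIMIT_BYTES = 10 * 1024

def oversized_files(directory, limit=SIZE_LIMIT_BYTES):
    """Return sorted (relative path, size) pairs for files larger than limit."""
    big = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                big.append((os.path.relpath(path, directory), size))
    return sorted(big)

# Demonstration on a throwaway directory standing in for MyPkg/test.
test_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(test_dir, 'fixtures'))
with open(os.path.join(test_dir, 'fixtures', 'small.txt'), 'wb') as f:
    f.write(b'my')
with open(os.path.join(test_dir, 'fixtures', 'big.dat'), 'wb') as f:
    f.write(b'\0' * 20000)

too_big = oversized_files(test_dir)
```

Files reported here would be candidates for the external test data location rather than version control.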

However, for data files checked in with the package, the general problem is where to put them and how to find them at run time. Probably the best practice is to use the data subdirectory. This is intended for application data shipped with the package, so you may want to make a subdirectory under it for testing files. The advantage of the data directory is that it is wired into the release. So if we add the directory

  • MyPkg/data/testdata

and then create the file

  • MyPkg/data/testdata/mytestfile.txt

there that starts with the string "my", then we can write a unit test that uses a psana utility to find the file and test that it starts with "my". The psana utility is the Python class AppDataPath in the AppUtils package (there is a C++ version there as well). The unit test would look like

Code Block
languagepython
    def testMyFile(self):
        import os
        from AppUtils.AppDataPath import AppDataPath
        # Path relative to the release data directory (see the assert message below)
        testFileDataRelPath = os.path.join('MyPkg', 'testdata', 'mytestfile.txt')
        testFilePath = AppDataPath(testFileDataRelPath).path()
        # An empty path means the file was not found
        assert len(testFilePath)>0 , "test file (relative to release data dir): %s not found." % testFileDataRelPath
        fileText = open(testFilePath, 'r').read()
        self.assertTrue(fileText.startswith("my"),
                        msg="Test file=%s doesn't start with my" % testFilePath)

Test data kept under the test directory can instead be opened with a path relative to the release directory, for example MyPkg/test/fixtures/myfile.txt, because during the nightly build the current working directory is the release directory. This won't work for developers who use the new sit_setup feature that allows them to change directories away from their working release, but I think we should just follow the convention that we run scons test from the working release directory. This is different from how packages work with files in the 'data' subdirectory, where they really should check whether the package is part of the working/test release and then look in the base release.

It may be worth understanding the mechanism by which AppDataPath works. At run time, SConsTools will create two directories:

  • unitTestTutorial/data               # release data dir
  • unitTestTutorial/data/MyPkg    # soft link to unitTestTutorial/MyPkg/data

Moreover, when sit_setup is run, it sets the environment variable SIT_DATA. SIT_DATA is a colon-separated list of paths, the first being the absolute path to unitTestTutorial/data, the second being the absolute path to the data directory of the base release. AppDataPath goes through these paths in order, returning the first match. One thing to note: scons test only runs tests for packages that are part of the working release. It does not run tests for all the packages that are part of the base release. Given this, there is no reason to search the base release, but there should be no harm in it either. Harm could conceivably befall a developer who is modifying a test that is checked into an existing package in the base release. Were the developer to change the name of the test file in the working/test release, but not modify the unit test code to use the new name, then AppDataPath would find the old test file in the base release data directory.
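The search order described above can be illustrated with a small stand-alone sketch. This is not the AppDataPath implementation: the function find_in_sit_data is a hypothetical stand-in that only mimics the behaviour described here (walk the colon-separated SIT_DATA entries in order and return the first match, or an empty string when nothing is found).

```python
import os
import tempfile

def find_in_sit_data(rel_path, sit_data=None):
    """Return the first existing match for rel_path under the SIT_DATA entries, or ''."""
    if sit_data is None:
        sit_data = os.environ.get('SIT_DATA', '')
    for data_dir in sit_data.split(':'):
        if not data_dir:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(data_dir, rel_path)
        if os.path.exists(candidate):
            return candidate
    return ''

# Two throwaway directories stand in for the working release data
# directory and the base release data directory.
working = tempfile.mkdtemp()
base = tempfile.mkdtemp()
rel = os.path.join('MyPkg', 'testdata', 'mytestfile.txt')
for root, text in ((working, 'my working copy'), (base, 'my base copy')):
    os.makedirs(os.path.join(root, 'MyPkg', 'testdata'))
    with open(os.path.join(root, rel), 'w') as f:
        f.write(text)

# The working release entry comes first, so its copy wins.
found = find_in_sit_data(rel, working + ':' + base)
```

If the file is removed or renamed in the working directory, the same call falls through to the copy under the base directory, which is the stale-file hazard described above.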

External Test Data Location

A directory for test data has been set up here:

/reg/g/psdm/data_test

that was created expressly for the purpose of storing test data for the analysis releases. Presently it holds xtc files and some calibration constant files. We do not want to copy entire xtc files from the experiments into this location as they are too big. We need to select the parts of the xtc file necessary for testing. The current organization of the data_test directory is

  • data_test/Translator: samples from approximately 80 different xtc files that cover a broad range of psana types and Translator issues. A unit test will typically work with one of these xtc files at a time.
  • data_test/multifile: samples from 8 different experiments, suitable for unit tests that use the psana datasource string specification to work with a set of xtc files from an experiment.
  • data_test/types: soft links to files in data_test/Translator, to easily identify a file with a given type.
  • data_test/calib: calibration test data, with the same structure as the calib directory of an experiment.

Keeping the test data files small makes the preparation of xtc test data tedious. One must identify the parts of the xtc file that are needed for the test. An xtc file is composed of datagrams. Event data are in L1Accept datagrams, but a properly formatted xtc file includes transition datagrams that precede and follow the L1Accept datagrams. At the lowest, most tedious level, making a small xtc file for testing involves identifying the beginning and ending offsets of the datagrams you need to make your file. I'll give an example below of how I do this with tools I've written. Other people are welcome to add examples of using other tools that they find easier to work with.
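Once the beginning and ending offsets are known, the copy step itself is simple. The sketch below assumes the offsets have already been identified by some other means; it does not parse xtc datagrams, and copy_byte_ranges is a hypothetical helper name rather than part of psana_test. The demonstration uses a 20-byte dummy file in place of a real xtc file.

```python
import os
import tempfile

def copy_byte_ranges(src_path, dst_path, ranges, chunk_size=1 << 20):
    """Copy each [start, end) byte range of src_path to dst_path, in order.

    Returns the total number of bytes written.
    """
    written = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for start, end in ranges:
            if end < start:
                raise ValueError("bad range: %r" % ((start, end),))
            src.seek(start)
            remaining = end - start
            while remaining > 0:
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    raise IOError("range %r extends past end of file" % ((start, end),))
                dst.write(chunk)
                remaining -= len(chunk)
                written += len(chunk)
    return written

# Dummy source file; the three ranges play the role of the leading
# transitions, a couple of event datagrams, and the trailing transitions.
work_dir = tempfile.mkdtemp()
src_file = os.path.join(work_dir, 'full.dat')
dst_file = os.path.join(work_dir, 'small.dat')
with open(src_file, 'wb') as f:
    f.write(b'0123456789ABCDEFGHIJ')

n_written = copy_byte_ranges(src_file, dst_file, [(0, 4), (10, 14), (18, 20)])
with open(dst_file, 'rb') as f:
    small_bytes = f.read()
```

Reading in bounded chunks keeps memory use flat even when a selected range covers many large datagrams.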

Once you have prepared some test data, you can either add it to the Translator subdirectory or the multifile subdirectory, or create a new subdirectory, maybe with your package name (like I did when I made the Translator subdirectory). If you want to add it to Translator or multifile, please contact me (davidsch), as these files have specific naming conventions and there are unit tests in the psana_test package that access them. Creating a new subdirectory requires less coordination; however, if you think the test data is going to be useful to others, we should work together on it. One of the benefits of using the psana_test package is that I have a mechanism for checking the md5 checksums of the test data into svn. This allows the unit tests to verify that the test data has not changed.
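To make the checksum idea concrete, here is a minimal sketch of verifying files against recorded md5 values. It is not the psana_test mechanism; the function names md5_of_file and changed_files, and the use of a plain dictionary for the recorded checksums, are assumptions for illustration.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

def changed_files(expected):
    """Given {path: md5 hex digest}, return the sorted paths that are missing or differ."""
    bad = []
    for path, checksum in expected.items():
        if not os.path.exists(path) or md5_of_file(path) != checksum:
            bad.append(path)
    return sorted(bad)

# Demonstration with a tiny stand-in file instead of a real xtc file.
tmp_dir = tempfile.mkdtemp()
sample = os.path.join(tmp_dir, 'sample.xtc')
with open(sample, 'wb') as f:
    f.write(b'abc')

# In practice the recorded checksums would live in a file under version control.
recorded = {sample: md5_of_file(sample)}
unchanged = changed_files(recorded)
```

A unit test could then assert that changed_files returns an empty list before it reads any of the test data.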

Making a small xtc test file

This section covers different methods to make small xtc test files. Presently the largest xtc test file in data_test is about 1 GB, which is bigger than it needs to be. I think we should be able to keep test files down to 20-100 MB; smaller files mean faster unit tests as well.

Using psana_test

The psana_test package includes a library of Python code with a function to copy out a few datagrams from each xtc file for a run. An example of its use is

Code Block
languagepython
import psana_test.psanaTestLib as ptl
ptl.copyToMultiTestDir('cxie9214',63,1,2,'/reg/g/psdm/data_test/multifile/test_012_cxie9214')

For experiment cxie9214 and the run passed as the second argument, 1 calib cycle is used, and the first 2 events from this calib cycle (for each stream) are copied into xtc files with the same names in the given directory. Moreover, an 'index' subdirectory will be made and index files will be written there.

Using xtclinedump

Before writing that function, I would do things by hand. For the testing that I have done, I typically want to run psana on a few datagrams in an xtc file to test how it parses a new type or handles some damaged data. Suppose we don't have unit tests to see how psana handles Epix100aConfig version 1 and EpixElement version 2, and we want some test data for them. We have identified an experiment xtc file with these types, so we know the data is somewhere in this file:

/reg/d/psdm/xcs/xcsi0314/xtc/e524-r0213-s03-c00.xtc

...