...

Test Data Checked into the Package

Ideally, we do not want to keep large amounts of data under version control. For test data, I think 10 kilobytes or so is OK, but when it gets larger one should use the external location discussed below. Files that you do check in could go right in the test directory alongside the test, or you can create new directories for test data. Another standard directory in the SConsTools system is data, which is fine, but it is intended for package data as opposed to testing data. One could also create a subdirectory of the test directory to hold small amounts of test data, such as

  • test/data  or
  • test/fixtures

However, for data files checked in with the package, the general problem is where to put them and how to find them at run time. Probably the best practice is to use the data subdirectory. This is intended for application data shipped with the package, so you may want to make a subdirectory under it for testing files. The advantage of the data directory is that it is wired into the release. So if we add the directory

  • MyPkg/data/testdata

and then create the file

  • MyPkg/data/testdata/mytestfile.txt

there that starts with the string "my", then we can write a unit test that uses a psana utility to find the file and test that it starts with "my". The psana utility is the Python class AppDataPath in the AppUtils package (there is a C++ version there as well). The unit test might look like

Code Block
languagepython
    def testMyFile(self):
        import os
        from AppUtils.AppDataPath import AppDataPath
        # path of the test file relative to the release data directory
        testFileDataRelPath = os.path.join('MyPkg', 'testdata', 'mytestfile.txt')
        # AppDataPath resolves the relative path; an empty path means not found
        testFilePath = AppDataPath(testFileDataRelPath).path()
        assert len(testFilePath) > 0, "test file (relative to release data dir): %s not found." % testFileDataRelPath
        with open(testFilePath, 'r') as f:
            fileText = f.read()
        self.assertTrue(fileText.startswith("my"),
                        msg="Test file=%s doesn't start with my" % testFilePath)

It may be worth understanding the mechanism by which AppDataPath works. At run time, SConsTools will create two directories:

  • unitTestTutorial/data               # release data dir
  • unitTestTutorial/data/MyPkg    # soft link to unitTestTutorial/MyPkg/data

Moreover, when sit_setup is run, it sets the environment variable SIT_DATA. SIT_DATA is a colon-separated list of paths, the first being the absolute path to unitTestTutorial/data, the second being the absolute path to the data directory of the base release. AppDataPath goes through these paths in order, returning the first match. One thing to note: scons test only runs tests for packages that are part of the working release. It does not run tests for all the packages that are part of the base release. Given this, there is no reason to search the base release, but there should be no harm in it either. Harm could conceivably befall a developer who is modifying a test that is checked into an existing package in the base release. Were the developer to change the name of the test file in the working/test release, but not modify the unit test code to use the new name, then AppDataPath would find the old test file in the base release data directory.
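The search order described above can be sketched in a few lines of Python. This is only an illustration of "first match in a colon-separated list", not the actual AppDataPath implementation. The function name find_in_sit_data is hypothetical, and the sketch assumes nothing about SIT_DATA beyond what is stated above.

```python
import os


def find_in_sit_data(rel_path, sit_data=None):
    """Return the first existing match for rel_path under the SIT_DATA dirs.

    Hypothetical helper for illustration; not the AppDataPath implementation.
    Returns an empty string when no directory contains rel_path.
    """
    if sit_data is None:
        sit_data = os.environ.get('SIT_DATA', '')
    for data_dir in sit_data.split(':'):
        if not data_dir:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(data_dir, rel_path)
        if os.path.exists(candidate):
            return candidate
    return ''
```

Because the working release data directory comes first in the list, a file present in both releases is always taken from the working release. The stale-file scenario above arises only when the file is missing from the working release.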

External Test Data Location

A directory for test data has been set up here:

/reg/g/psdm/data_test

that was created expressly for the purpose of storing test data for the analysis releases. Presently it holds xtc files and some calibration constant files. We do not want to copy entire xtc files from the experiments into this location as they are too big. We need to select the parts of the xtc file necessary for testing. The current organization of the data_test directory is

  • data_test/Translator: samples from approximately 80 different xtc files that cover a broad range of psana types and Translator issues. A unit test will typically work with one of these xtc files at a time.
  • data_test/multifile: samples from 8 different experiments, suitable for unit tests that use the psana datasource string specification to work with a set of xtc files from an experiment.
  • data_test/types: soft links to files in data_test/Translator, to easily identify a file with a given type.
  • data_test/calib: calibration test data, with the same structure as the calib directory of an experiment.

Keeping the test data files small makes the preparation of xtc test data tedious. One must identify the parts of the xtc file that are needed for the test. An xtc file is composed of datagrams. Event data are in L1Accept datagrams, but a properly formatted xtc file includes transition datagrams that precede and follow the L1Accept datagrams. At the lowest, most tedious level, making a small xtc file for testing involves identifying the beginning and ending offsets of the datagrams you need to make your file. I'll give an example below of how I do this with tools I've written. Other people are welcome to add examples of using other tools that they find easier to work with.
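Once the beginning and ending offsets are known, extracting the datagrams amounts to copying a byte range out of the original file. The following is a minimal sketch of that step in plain Python. The function name copy_byte_range is hypothetical, and this is a generic technique rather than the psana_test tooling.

```python
def copy_byte_range(src_path, dst_path, start, end, chunk_size=1 << 20):
    """Copy bytes [start, end) of src_path into dst_path.

    Hypothetical helper for illustration. Returns the number of bytes
    written, which is less than end - start if the source file is shorter.
    """
    if start < 0 or end < start:
        raise ValueError("invalid byte range: [%d, %d)" % (start, end))
    remaining = end - start
    written = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        src.seek(start)
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break  # source file ended before the requested range did
            dst.write(chunk)
            written += len(chunk)
            remaining -= len(chunk)
    return written
```

Reading in chunks keeps memory use bounded, which matters because the source xtc files can be large.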

Once you have prepared some test data, you can add it to the Translator subdirectory or the multifile subdirectory, or create a new subdirectory, maybe with your package name (like I did when I made the Translator subdirectory). If you want to add it to Translator or multifile, please contact me (davidsch), as these files have specific naming conventions and there are unit tests in the psana_test package that access them. Creating a new subdirectory requires less coordination; however, if you think the test data is going to be useful to others, we should work together on it. One of the benefits of using the psana_test package is that I have a mechanism for checking the md5 checksums of the test data into svn. This allows the unit tests to verify that the test data has not changed.
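The checksum idea can be illustrated with the standard hashlib module. This is a sketch of the general technique, not the mechanism in psana_test. The function name md5_of_file is hypothetical, and the expected checksum would come from wherever the checksums are recorded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of the file at path, read in chunks.

    Hypothetical helper for illustration; not part of psana_test.
    """
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

A unit test could then compare md5_of_file(testFilePath) against the recorded value with self.assertEqual, so that a changed test file fails loudly instead of producing confusing downstream failures.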

Making a small xtc test file

This section covers different methods to make small xtc test files. Presently the largest xtc test file in data_test is about 1 GB, which is bigger than it needs to be. I think we should be able to keep test files down to 20-100 MB. Smaller files mean faster unit tests as well.

Using psana_test

The psana_test package includes a library of Python code with a function that copies out a few datagrams from each xtc file for a run. An example of its use is

Code Block
languagepython
import psana_test.psanaTestLib as ptl
ptl.copyToMultiTestDir('cxie9214',63,1,2,'/reg/g/psdm/data_test/multifile/test_012_cxie9214')

For experiment cxie9214, run 63, 1 calib cycle, the first 2 events from this calib cycle (for each stream) are copied into xtc files with the same name in the given directory. Moreover, an 'index' subdirectory will be made and index files will be written there.

Using xtclinedump

Before writing that function, I did things by hand. For the testing that I have done, I typically want to run psana on a few datagrams in an xtc file to test how it parses a new type or handles some damaged data. Suppose we don't have unit tests to see how psana handles Epix100aConfig version 1 and EpixElement version 2, and we have identified an experiment xtc file with these types, namely

/reg/d/psdm/xcs/xcsi0314/xtc/e524-r0213-s03-c00.xtc

One of the tools in psana_test is xtclinedump; some documentation is in the psana - Module Catalog. It is a line-oriented header dump of xtc files. One can use grep to filter the output so that one only sees the datagram headers (which include offsets in the file) and any xtc headers for a type that has epix as part of its name. Here is a command line that lets me see that there is epix in datagram 5, and see the offset where datagram 6 begins. Everything before that offset will be the first part of the xtc file I want to save in my small test file. I am also going to want to get some transitions at the end of the file to form a correct xtc file. However, this is not necessary, as psana can handle xtc files that end abruptly.

Code Block
~/rel2/unitTestTutorial $ xtclinedump xtc /reg/d/psdm/xcs/xcsi0314/xtc/e524-r0213-s03-c00.xtc | grep -i "dg=\|epix" | head -10
dg=    1 offset=0x00000000 tp=Event sv=      Configure ex=1 ev=0 sec=54754FDF nano=1E31D0E8 tcks=0000000 fid=1FFFF ctrl=84 vec=0000 env=0000161C
xtc d=2  offset=0x00022F4C extent=00108834 dmg=00000 src=01003069,19002300 level=1 srcnm=XcsEndstation.0:Epix100a.0 typeid=84 ver=1 value=10054 compr=0 compr_ver=1 type_name=Epix100aConfig plen=1083424 payload=0x0B...
dg=    2 offset=0x0012B780 tp=Event sv=       BeginRun ex=0 ev=0 sec=54755EBD nano=35A7112E tcks=0000000 fid=1FFFF ctrl=06 vec=0000 env=000000D5
dg=    3 offset=0x0012B820 tp=Event sv=BeginCalibCycle ex=0 ev=0 sec=54755EBE nano=00D5EE07 tcks=0000000 fid=1FFFF ctrl=08 vec=0000 env=00000000
dg=    4 offset=0x0012C188 tp=Event sv=         Enable ex=0 ev=0 sec=54755EBE nano=0124366D tcks=0000000 fid=1FFFF ctrl=0A vec=0000 env=80000000
dg=    5 offset=0x0012C228 tp=Event sv=       L1Accept ex=1 ev=1 sec=54755EBE nano=05EE73D6 tcks=005094A fid=144F9 ctrl=8C vec=146F env=00000003
xtc d=2  offset=0x0012C264 extent=0010A454 dmg=00000 src=01003069,19002300 level=1 srcnm=XcsEndstation.0:Epix100a.0 typeid= 1 ver=1 value=10001 compr=0 compr_ver=1 type_name=Xtc
xtc d=3  offset=0x0012C278 extent=0010A440 dmg=00000 src=01003069,19002300 level=1 srcnm=XcsEndstation.0:Epix100a.0 typeid=75 ver=2 value=2004B compr=0 compr_ver=2 type_name=EpixElement plen=1090604 payload=0x00...
dg=    6 offset=0x00266284 tp=Event sv=       L1Accept ex=1 ev=1 sec=54755EBE nano=06EE644D tcks=0050974 fid=144FF ctrl=8C vec=1475 env=00000003
xtc d=2  offset=0x002662C0 extent=0010A454 dmg=00000 src=01003069,19002300 level=1 srcnm=XcsEndstation.0:Epix100a.0 typeid= 1 ver=1 value=10001 compr=0 compr_ver=1 type_name=Xtc
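The datagram offsets can also be pulled out of this output programmatically. The sketch below assumes only the `dg=... offset=0x... sv=...` line layout shown in the output above. The function name datagram_offsets is hypothetical and is not part of psana_test.

```python
import re

# Matches datagram header lines such as:
# dg=    5 offset=0x0012C228 tp=Event sv=       L1Accept ex=1 ...
# Lines starting with "xtc" (xtc headers) are deliberately not matched.
DG_LINE = re.compile(r'^dg=\s*(\d+)\s+offset=0x([0-9A-Fa-f]+)\s.*?\bsv=\s*(\w+)')


def datagram_offsets(lines):
    """Return a list of (datagram number, byte offset, service) tuples.

    Hypothetical helper for illustration; it relies only on the line
    layout visible in the xtclinedump output above.
    """
    result = []
    for line in lines:
        match = DG_LINE.match(line)
        if match:
            number, offset, service = match.groups()
            result.append((int(number), int(offset, 16), service))
    return result
```

Applied to the output above, the entry for datagram 6 carries the offset 0x00266284, which is the byte position where the first part of the small test file ends.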

...

See psana - Module Catalog for more detail on the filtering modules. Information on PSXtcOutput can be found at psana - Reference Manual. Note that xtc files created in this fashion "end abruptly"; that is, the last datagram will be an L1Accept as opposed to the transition sequence EndCalibCycle, EndRun.

Make a Unit Test to Process Output

...