
Intro

Notes about conda that might be useful for other groups moving to conda packaging. Assumes knowledge of conda.

This page: https://confluence.slac.stanford.edu/display/PSDMInternal/Conda+Details

Good reading:

https://www.continuum.io/blog/developer-blog/whats-old-and-new-conda-build

http://technicaldiscovery.blogspot.com/2013/12/why-i-promote-conda.html

Prod & Dev Installations

Updating conda is scary.
conda has been changing rapidly, adding features to support central installations that users don't have write access to - many features we need.

However, it is a python program with a number of dependencies.
Several times a conda update has rendered conda inoperable and I've deleted the entire installation to start again from scratch.

However we're getting better at repairing conda (https://github.com/conda/conda/issues/3928)

Defense: two installations, prod and dev - completely separate conda installations.
The user-facing one is prod.
Use dev to update conda and test new packages, then build in prod.

Any tests for conda should go into unit tests (in https://github.com/slaclab/anarel-test).
This is essentially integration testing for complete environments - separate from the per-package testing that one does in each recipe's test section.

Still, if conda breaks the dev installation, it is a pain.

Todo: automate creating a new conda installation, cloning our current root environment, updating conda, and testing (from a user account), as sketched below.
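A rough sketch of those steps (the installer filename and paths are illustrative, not our actual values):

# install a fresh miniconda in a scratch location
bash Miniconda2-latest-Linux-x86_64.sh -b -p /scratch/conda-test
# clone the current root environment from the existing installation (clone by path)
/scratch/conda-test/bin/conda create -p /scratch/conda-test/envs/root-clone --clone /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7
# update conda itself, then run the anarel-test suite from a user account
/scratch/conda-test/bin/conda update -y conda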

RHEL5 RHEL6 RHEL7 Installations

All together, I have six conda installations: rhel5/rhel6/rhel7, each in both dev and prod.

This is cumbersome, but no more so than what we did with RPMs before conda.

PIP vs Conda Packaging

Part of anaconda's success is the fact that pip works.

However, another part is that conda packaging is better: as discussed in the conda blog linked above, it handles dependency tracking better and makes building a big software stack with numpy, etc. more robust.

For the production, multi-user conda environments that we maintain, I would prefer we don't pip install anything into them. Issues:

  • Might end up with two copies of something, for example numpy: one pulled in by a pip package's dependencies, another from our conda channel (illustrated below)
  • things like cloning an environment might not work - conda can't track pip packages as well as its own
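As an illustration of the first issue, conda list marks pip-installed packages in the build column, so a duplicated numpy looks something like this (output illustrative):

conda list numpy
numpy                     1.11.3              py27_0
numpy                     1.13.1               <pip>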

Build Recipes

Roughly what you get from the defaults channel:

https://github.com/conda/conda-recipes

conda-forge:
https://github.com/conda-forge/feedstocks/tree/master/feedstocks

for example boost:
https://github.com/conda-forge/boost-feedstock/tree/master/recipe

My recipes https://github.com/slaclab/anarel-manage/tree/master/recipes

Tips

For Python, your build.sh will probably run a package's setup.py, or use pip to install a wheel file.

Don't do easy_install: .pth files that trigger site.py PYTHONPATH manipulation don't work well - in fact conda-build may now check to make sure you don't do that.

When you use pip, you don't want it to trigger installing package dependencies like numpy; you may want to do something like the following (a sketch - the key flag is --no-deps):
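pip install --no-deps .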

With setup.py, you probably want something like the following (a common pattern in conda recipes; check that your setup.py/setuptools supports these flags):
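python setup.py install --single-version-externally-managed --record=record.txt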

However, I do see conda packages that just use pip without arguments like that. We used to use --OLD_AND_UNMANAGEABLE for h5py, otherwise it got put in an egg file in a way that didn't work with the RPM system.

Boost Issues/Cross Platform Issues

boost gets complicated: you compile it under gcc v4 and it might create symbol names different from gcc v7.

If you deliver a package that contains libraries with v4 names and someone compiles against it with v7, they might get undefined symbols - the v7 developer may need special compiler switches to use your boost header files.

Organizing

right now I am organizing recipes 

https://github.com/slaclab/anarel-manage/tree/master/recipes

into

  • system - packages only to be built at LCLS: they depend on system things like LSF installed at SLAC (plus packages that depend on these)
  • external - generally available packages that we nonetheless build ourselves, say our own version of guppy
  • psana - things related to our own code

Would it be better to keep them all in one place?

Templatizing

Until recently, I had many recipe directories for different versions and build variants, e.g.,

hdf5-1.8.15-prod
hdf5-1.8.17-dbg

however this gets cumbersome.

see whats-old-and-new-conda-build (linked above) for more on templatizing recipes.

{% set version = "1.10.5" %}
...
source:
  fn: openmpi-{{ version }}

COULD DO: develop a release system to create all build variants and packages from one recipe file, using conda-build's jinja2 features to expand the template. One possible shape is sketched below.
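A sketch using conda-build's support for environ in meta.yaml jinja2 templates (the HDF5_VERSION and ANA_VARIANT environment variable names are hypothetical - the idea is that the release tool sets them before calling conda-build):

{% set version = environ.get('HDF5_VERSION', '1.8.17') %}
{% set variant = environ.get('ANA_VARIANT', 'prod') %}

package:
  name: hdf5
  version: {{ version }}

build:
  string: {{ variant }}_101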

Conda Recipes

where to put them

They really should be part of the software.

Great example: https://github.com/paulscherrerinstitute/cbf/tree/master/conda-recipe

(I had to figure out an issue with installing cbf - how they built against numpy had changed.)

But many examples are external wrappers.

Psana Recipe/Home Grown Install

Many recipes piggyback on top of an existing install mechanism, e.g.

make install with PREFIX set to the conda environment prefix

python setup.py install with the conda location as the prefix, etc.

but for packaging our own software, like psana, I did the following:

  • the recipe's build.sh calls a new target I added to the SConsTools build system (home grown, like a Makefile; implements build logic for scons)
  • that code copies built files to conda environment locations, e.g.
    • arch/x86-gccxx/bin/* --> $CONDA_PREFIX/bin/*

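A minimal sketch of what such a build.sh might look like (the conda-install target name is hypothetical, standing in for the new SConsTools target described above):

#!/bin/bash
# conda-build runs this with $PREFIX set to the target conda environment
scons conda-install PREFIX=$PREFIX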

Packages

Moving them Around

conda-build defaults to putting output in a subdir of the central install.

Makes it easy to put it in anaconda cloud.

I put it in a local file channel, and later in anaconda channels (like https://anaconda.org/lcls-rhel7).

This is part of why I wrote ana-rel-admin. I do

ana-rel-admin --cmd pkg-build --recipe path/to/recipe

and it does a number of the plumbing steps:

  • maintain package log files
  • copy the package to the file channel
  • update the index of the file channel

Channels

/reg/g/psdm/sw/channels

with separate dirs for system, external, and psana packages.
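The manual version of the plumbing that ana-rel-admin automates looks roughly like this (directory and package names illustrative):

mkdir -p /reg/g/psdm/sw/channels/external/linux-64
cp mypkg-1.0-101.tar.bz2 /reg/g/psdm/sw/channels/external/linux-64/
conda index /reg/g/psdm/sw/channels/external/linux-64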

What we need to Build

openmpi to get LSF support

hdf5 to get parallel support

h5py to get parallel support

mpi4py and tables – depend on above

Package Precedence

channel order ensures that hdf5 comes from our channel.

Before channel order existed, the package version string and build number were used to pick packages; I still set the build number to 101 in some recipes so that our builds sort above the default ones.

Package Naming

I have been doing things like

hdf5-1.8.17-openmpi_101.tar.bz2
hdf5-1.8.17-openmpi_dbg_101.tar.bz2

to name build variants. However then when you do

conda install hdf5

conda doesn't know which to pick. A user has to be specific, something like

conda install hdf5=1.8.17=openmpi_dbg_101

You can use features: add the dbg feature to the recipe for the latter; then the dbg variant only gets installed in an environment that accepts the dbg feature (see the sketch below).
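A sketch of the feature syntax in meta.yaml (the dbg feature name is ours; as I understand it, a separate metapackage with track_features is what makes an environment accept the feature):

# in the dbg variant's meta.yaml
build:
  features:
    - dbg

# in a small metapackage users install to opt in
build:
  track_features:
    - dbg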

You can also put it in the package name, e.g.

hdf5_dbg==1.8.17

and use the recipe to declare a conflict so that it is not installed alongside hdf5.

tensorflow is now doing this for their gpu build, i.e., you can pip install

tensorflow

tensorflow_gpu

names like this could be a better solution for multi-host compiling.

Build Matrix

You'll find packages like matplotlib built many times, e.g., against many python versions and many numpy versions.

conda-build has options to specify the python/numpy versions at the command line, for example:
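conda build --python=2.7 --numpy=1.11 path/to/recipe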

With psana, I'm building against very specific versions, but for running I'm trying to be more flexible - though I haven't tested with different versions.

psana meta.yaml

Conda Build Debugging

building openmpi has taken 40 minutes - very frustrating when it fails.

Recent failure: my test section had a typo. Instead of this (in the openmpi meta.yaml)

test:
  commands:
    command -v ompi_info

I had

test:
  commands:
    command -v ompi_infox

It was really not clear from my log what happened (maybe my ana-rel-admin wrapper gets in the way):

TEST START: openmpi-2.0.1-lsf_verbs_1
Deleting work directory, /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484019610790/work/openmpi-2.0.1
The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    openmpi-2.0.1              |      lsf_verbs_1         3.0 MB  local
The following NEW packages will be INSTALLED:
    openmpi: 2.0.1-lsf_verbs_1 local
TESTS FAILED: openmpi-2.0.1-lsf_verbs_1

Run conda-build -h; there is a --test flag, but it doesn't work at first.

conda build's work is in /path/to/install/conda-bld

if you poke around, you'll see a subdir named broken

you can

  • copy broken/openmpi-2.0.1_lsf_verbs_.tar.bz2 into linux-64
  • cd linux-64
  • conda index # update index
  • conda-build --test path/to/recipe

When it runs, you'll see things like

(manage) (psreldev) psel701: /reg/g/psdm/sw/conda/manage/recipes/system $ conda build -t openmpi-2
TEST START: openmpi-2.0.1-lsf_verbs_1
Deleting work directory, /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/work
updating index in: /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/linux-64
updating index in: /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/noarch

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openmpi-2.0.1              |      lsf_verbs_1         3.0 MB  file:///reg/g/psdm/sw/conda/channels/system-rhel7

The following NEW packages will be INSTALLED:

    openmpi: 2.0.1-lsf_verbs_1 file:///reg/g/psdm/sw/conda/channels/system-rhel7

+ source /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/bin/activate /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/_t_env
+ /bin/bash -x -e /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/test_tmp/run_test.sh
+ ls
conda_test_runner.sh  helloworld.c  helloworld.cxx  run_test.py  run_test.sh
+ pwd
/reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/test_tmp
+ command -v ompi_infox
TESTS FAILED: openmpi-2.0.1-lsf_verbs_1

This is better. Some things we see:

  • conda build creates these temporary testing (and build) environments in uniquely named directories
  • you can activate an environment by giving a path, not just a name
  • conda-build creates a file run_test.sh from your meta.yaml that you can rerun
  • it is hard to find where these things are

When you get into debugging the build step, you'll find the build happening under long directory names, with a placeholder string repeated as many times as possible - this is to leave as much room as possible in the binaries for the later rewriting of rpaths. For example:
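A build prefix ends up looking something like this (illustrative - the exact padding varies with the conda-build version):

/path/to/conda-bld/openmpi-2_1484022219868/_b_env_placehold_placehold_placehold_placehold_placehold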

debugging conda build is cumbersome - but easier than debugging rpm builds.

Package Dependencies

An example (sketched below) of:
  • specifying the package version to build against
  • a more lenient package spec to run against
  • using a patch
  • specifying relocation

conda always does RPATH manipulation; relocation here means looking for the hard-coded PREFIX paths in binaries, text files, etc.
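A hedged sketch of what such a meta.yaml might contain (package name, versions, url, and patch file are all illustrative):

package:
  name: mypkg
  version: 1.0.0

source:
  fn: mypkg-1.0.0.tar.gz
  url: https://example.com/mypkg-1.0.0.tar.gz
  patches:
    - fix-build.patch          # using a patch

build:
  number: 101
  detect_binary_files_with_prefix: true   # relocation: find hard-coded PREFIX in files

requirements:
  build:
    - numpy 1.11*              # exact pin to build against
  run:
    - numpy >=1.11             # more lenient spec to run against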

Release Management

 
conda has the tools for release management,

  • python/C++ packaging
  • flexible package dependency tracking
  • environments, user and centrally installed multi-user

But I would not call it a release system. Things I've implemented:

  • local file channels for our packages
  • keeping logs of builds
  • tool to rebuild all local packages
  • configuration for what our releases look like, e.g.
    • ana-1.0.x ...
    • ana-1.0.x-py3 ..
      • what packages are in these releases
  • parameterize build/releases based on rhel5/6 etc
    • more flexible creation of conda environments
  • scheme to keep track of 'current' release
    • keep old environments built to easily allow people to go back

Managing Package Upgrades - Don't break users' code

Central Install

Following the way things have been done in the past, we maintain central multi-user installations of the analysis software.

At any time there will be conda environments with names:

ana-1.0.1
ana-1.0.2

etc. When we upgrade a package, we create a new environment first.

User Management

Alternatively: we just provide channels with packages.

Users make their own conda environments, or just update the root environment.

Ideally, users just do

conda update psana

to keep up with our software (if they want/need to)

or they create an environment from a file we provide, something like:

conda create --name ana --file environment-file

This has a lot of advantages, but is maybe not quite seamless enough to put on users.

One issue - psana depends on environment variables like SIT_ROOT.

ana-rel-admin

The repo: https://github.com/slaclab/anarel-manage

It has all my release management code; running the tool with -h lists all the commands. The anarel-manage repo also has configuration like:

  • all the packages we make use of
  • what packages comprise our releases

Then implements tools to

  • build a package with logging, and move it around (described above)
  • build all the packages (from config file)
  • index all the channels
  • update all six .condarc's from a template
  • build releases
  • automate the building/testing of all ana releases, e.g.:
    • variants (py27/py3/gpu)
    • hosts (rhel5/rhel6/rhel7)
    • prod/dev installs
    • integration testing from a user account
    • This is the kind of thing that is probably better done in buildbot or travis, but I rolled my own.

Building an Environment

This should be as simple as cloning the old one and updating some packages, or maintaining an environment file that we build from. However:

  • I ran into issues with environment files and local channels
  • Rules for channel precedence changed during development
  • I had a lot of trouble getting the precise package versions/builds that I wanted
  • Issues with numpy - mkl packages were clobbering certain files
  • several packages had permission bugs: the admin account could read files but users could not

Environment Defense

To deal with those issues, I build the environment in stages; the config file is here (a yaml file):

https://github.com/slaclab/anarel-manage/blob/master/config/anarel.yaml

Building in stages lets me

  • checkpoint environments to make sure numpy is still working after a new package
  • get finer control over what channels packages come from

also possibly of interest: the .condarc

https://github.com/slaclab/anarel-manage/blob/master/config/condarc_template

the key thing is getting our channels into the search path; the idea is sketched below.
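A minimal sketch (only system-rhel7 appears in the logs above; the other channel names are illustrative - see the template for the real thing):

channels:
  - file:///reg/g/psdm/sw/conda/channels/system-rhel7
  - file:///reg/g/psdm/sw/conda/channels/external-rhel7
  - file:///reg/g/psdm/sw/conda/channels/psana-rhel7
  - defaults
  - conda-forge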

Staying up to Date with External Packages

The packages in the rpm release system are old.
Updating them is cumbersome.
There is no integration testing of them.
Backing out a new version that broke something is expensive.

In the anarel.yaml, you'll notice I specify 'latest' for many packages, while others are pinned.
Whenever we build a new set of environments, we are not quite sure what versions we'll pick up, or whether they will all work together.

For instance, the anarel.yaml pydateutil line: it wasn't until my testing step that I found that 2.6.0 of pydateutil broke a unit test in pandas.

It was relatively easy to clean up - see testing below.

Channel Precedence

I'd like to use conda-forge for everything, but numpy from conda-forge doesn't work on rhel5: github conda-forge numpy issue

However numpy works from defaults, so my condarcs list defaults first, then conda-forge.


Testing

here is the repo

https://github.com/slaclab/anarel-test

  • test from a user account, not the admin account
  • verify I can run command-line tools, like psana -h, for many packages
  • import many packages
  • check that I can load many .so files
  • use nose to run tests for many packages
    • scipy
    • numpy
    • pandas
  • Run psana tests
  • Run separate tests for
    • h5py
    • conda
    • hdf5
    • mpi4py
    • openmpi
  • conda tests are very important

Features

  • uses paramiko to run processes on rhel5, rhel6, and rhel7 machines
    • inputs credentials for a tester account
  • builds environments in dev first, tests
  • then builds environments in prod, tests again
    • includes integration tests mentioned above
  • reports on package updates
  • the tester writes files to a world-writable central directory
  • emails when done
