
Intro

Notes about conda that might be useful for other groups moving to conda packaging. Assumes knowledge of conda.

This page: https://confluence.slac.stanford.edu/display/PSDMInternal/Conda+Details

Good reading:

https://www.continuum.io/blog/developer-blog/whats-old-and-new-conda-build

http://technicaldiscovery.blogspot.com/2013/12/why-i-promote-conda.html

Prod & Dev Installations

Updating conda is scary:

  • conda has been rapidly changing 
  • recently: features for multi-user installs

However, conda itself is a Python program with a number of dependencies.

  • Several times a conda update has rendered conda inoperable
    • we have had to delete entire installations
  • getting better at repairing these (requests==12 and conda)

Defense:

  • two installations: prod and dev
    • completely separate conda installations
  • User-facing is prod.
  • Use dev to update conda and test new packages.
  • Then build in prod.

Test conda

Still, if conda breaks the dev installation, it is a pain.

Todo: automate creating a new conda installation (a sketch follows):
  • clone our current root environment
  • update conda and test (from a user account)
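A minimal sketch of what that automation could look like (the installer URL, scratch paths, and environment names here are assumptions for illustration, not our actual setup):

# stand up a throwaway install to test a conda update (hypothetical paths)
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -b -p /tmp/conda-test
# clone the current root environment into the new install
/tmp/conda-test/bin/conda create -y -p /tmp/conda-test/envs/root-clone \
    --clone /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7
# update conda in the new install, then smoke-test it from a user account
/tmp/conda-test/bin/conda update -y conda
/tmp/conda-test/bin/conda info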

RHEL5 RHEL6 RHEL7 Installations

All together, I have six conda installations: rhel5/rhel6/rhel7, in both dev and prod.

This is cumbersome, but no more so than what we did with RPMs before conda.

PIP vs Conda Packaging

Part of anaconda's success is the fact that pip works.

However, another part is that conda packaging is better, as discussed in the conda blog linked above: it handles dependency tracking better, which makes building a big software stack with numpy, etc., more robust.

For the production, multi-user conda environments that we maintain, we prefer no pip-installed packages at all. Issues:

  • Might end up with two copies of something (see the quick check below),
    • for example numpy: one pulled in by a pip package's dependencies,
      another from a conda package
  • things like cloning an environment might not work,
    since conda can't track pip packages as well as its own
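A quick way to check for the duplicate situation (conda list flags pip-installed packages in its output):

# does numpy appear twice, once from conda and once from pip?
conda list | grep numpy
pip list | grep -i numpy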

Build Recipes

Roughly the recipes behind what you get from the defaults channel:

https://github.com/conda/conda-recipes

conda-forge:
https://github.com/conda-forge/feedstocks/tree/master/feedstocks

for example boost:
https://github.com/conda-forge/boost-feedstock/tree/master/recipe

My recipes: https://github.com/slaclab/anarel-manage/tree/master/recipes

Tips

For Python packages built from a setup.py, you want the recipe to install with the build environment's own interpreter, as sketched below.
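A minimal build.sh sketch for such a package ($PYTHON is the build environment's interpreter, set by conda-build; this is the standard conda-build convention):

#!/bin/bash
# install into the conda build environment using conda-build's $PYTHON
$PYTHON setup.py install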

Boost Issues/Cross Platform Issues

boost gets complicated: compile it under gcc v4 and it may create mangled symbol names different from those gcc v7 produces.

If you deliver a package whose libraries carry v4 names and someone compiles against it with v7, they may get undefined symbols - the v7 developer may need special compiler switches to compile against your boost header files.

Organizing

Right now I am organizing the recipes in

https://github.com/slaclab/anarel-manage/tree/master/recipes

into

  • system - packages that can only be built at LCLS because they depend on system things like the LSF installed at SLAC, plus packages that depend on these
  • external - generally available packages, say we need to build our own version of guppy
  • psana - things related to our code

Would it be better to keep them all in one place?

Templatizing

Until recently, I had many recipe directories for different versions and build variants, e.g.,

hdf5-1.8.15-prod
hdf5-1.8.17-dbg

However this gets cumbersome.

See whats-old-and-new-conda-build (linked above) for more on templatizing recipes.

{% set version = "1.10.5" %}
...
source:
  fn: openmpi-{{ version }}

COULD DO: develop a release system that creates all build variants and packages from one recipe file, using conda-build's jinja2 features to expand the template.

Conda Recipes

Where should they live?

They really should be part of the software itself.

Great example: https://github.com/paulscherrerinstitute/cbf/tree/master/conda-recipe

Having the recipe there let me figure out an issue with installing cbf - how they built against numpy had changed.

But many recipes are for external packages that we merely wrap.

Psana Recipe/Home Grown Install

Many examples do things like

make PREFIX=path/to/conda/prefix
python setup.py install --prefix=$PREFIX

However psana installs into its own directory structure for the RPM release system, so it is not so simple to use the existing install target.

I did the following:

  • the recipe's build.sh calls a new target I added to the SConsTools build system (home grown, like a Makefile; it implements the build logic for scons)
  • that code copies built files to the conda environment locations (sketched below), e.g.,
    • arch/x86-gccxx/bin/* --> $CONDA_PREFIX/bin/*
  • details: conda_install.py
    • note - you can create new subdirectories in the conda environment, e.g., data/ or web/ subdirs
    • you don't have to put everything in bin/lib/include
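A hedged sketch of the copy step (the real logic lives in conda_install.py; the arch directory name is an example, and $PREFIX is the variable conda-build sets to the environment being built into):

# copy the SConsTools build products into the conda environment
cp -r arch/x86_64-rhel7-gcc48-opt/bin/* "$PREFIX/bin/"
cp -r arch/x86_64-rhel7-gcc48-opt/lib/* "$PREFIX/lib/"
# new subdirectories are allowed too, e.g. a data/ subdir
mkdir -p "$PREFIX/data"
cp -r data/* "$PREFIX/data/"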

Packages

Moving them Around

conda-build defaults to putting its output in a subdir of the central install.

That makes it easy to upload packages to Anaconda Cloud.

I put them in a local file channel instead, and later into anaconda.org channels (like https://anaconda.org/lcls-rhel7).

This is part of why I wrote ana-rel-admin. I do

ana-rel-admin --cmd pkg-build --recipe path/to/recipe

and it does a number of the plumbing steps:

  • maintains package log files
  • copies the package to the file channel
  • updates the index of the file channel

Channels

/reg/g/psdm/sw/conda/channels

with separate dirs under it for system, external, and psana.
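A sketch of the resulting layout, and how an index gets refreshed (the per-OS directory names are inferred from the system-rhel7 channel that shows up in the build logs below; conda index is the standard tool):

# per-OS channel directories (names are examples)
/reg/g/psdm/sw/conda/channels/system-rhel7/linux-64/
/reg/g/psdm/sw/conda/channels/external-rhel7/linux-64/
/reg/g/psdm/sw/conda/channels/psana-rhel7/linux-64/

# after copying a package in, regenerate the channel metadata
conda index /reg/g/psdm/sw/conda/channels/system-rhel7/linux-64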

What we need to Build

  • openmpi - to get LSF support
  • hdf5 - to get parallel support
  • h5py - to get parallel support
  • mpi4py and tables - depend on the above

Package Precedence

Channel order ensures that hdf5 comes from our channel.

Channel precedence is a newer feature; it used to be that the highest version number won, so people used high build numbers to force conda to use their package (I still set the build number to 101 in some recipes).
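A sketch of the channel-ordering idea in a .condarc (the file channel URLs are examples; the real entries live in the condarc_template linked later on this page):

channels:
  - file:///reg/g/psdm/sw/conda/channels/system-rhel7
  - file:///reg/g/psdm/sw/conda/channels/external-rhel7
  - file:///reg/g/psdm/sw/conda/channels/psana-rhel7
  - defaults
  - conda-forge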

Package Naming

I have been doing things like

hdf5-1.8.17-openmpi_101.tar.bz2
hdf5-1.8.17-openmpi_dbg_101.tar.bz2

to name build variants. However, then when you do

conda install hdf5

conda doesn't know which one to pick. A user has to be specific, something like

conda install hdf5=1.8.17=openmpi_dbg_101

You can also use conda-build features; a common example is gpu builds, as sketched below.
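A hedged meta.yaml sketch of the features mechanism (the package and feature names are hypothetical):

# in the variant build's meta.yaml: tag this build with a feature
build:
  features:
    - gpu

# in a separate metapackage's meta.yaml: installing it turns the feature on
build:
  track_features:
    - gpu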

Another option is to put the variant in the package name, i.e.,

hdf5_dbg==1.8.17

and have the recipe declare a conflict so it can't be installed alongside hdf5.

tensorflow now does this for its gpu build, i.e., you can pip install either tensorflow or tensorflow-gpu.

Names like this could be a better solution for multi-host compiling.

Build Matrix

You'll find packages like matplotlib built many times, i.e., against many python versions and many numpy versions.

conda-build has options to specify python/numpy at the command line.
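For example (standard conda-build flags):

# build the same recipe against particular python/numpy combinations
conda build --python 2.7 --numpy 1.11 path/to/recipe
conda build --python 3.5 --numpy 1.11 path/to/recipe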

With psana, I'm building against very specific versions, but for running I'm trying to make the requirements more flexible - though I haven't tested with different versions. See the psana meta.yaml, and the sketch below.
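A sketch of the tight-build/loose-run pattern in a meta.yaml (the version numbers are illustrative, not the actual psana pins):

requirements:
  build:
    - python 2.7.12        # exact version to build against
    - numpy 1.11.*
  run:
    - python               # looser spec to run against
    - numpy >=1.11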

Conda Build Debugging

Building openmpi has taken 40 minutes; it is very frustrating when it fails.

A recent failure: my test section had a typo. Instead of this (in the openmpi meta.yaml):

test:
  commands:
    - command -v ompi_info

I had

test:
  commands:
    - command -v ompi_infox

It was really not clear from the log what happened (maybe my ana-rel-admin wrapper gets in the way):

TEST START: openmpi-2.0.1-lsf_verbs_1
Deleting work directory, /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484019610790/work/openmpi-2.0.1
The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    openmpi-2.0.1              |      lsf_verbs_1         3.0 MB  local
The following NEW packages will be INSTALLED:
    openmpi: 2.0.1-lsf_verbs_1 local
TESTS FAILED: openmpi-2.0.1-lsf_verbs_1

Run conda-build -h: there is a --test flag, but it doesn't work at first.

conda-build's work area is /path/to/install/conda-bld.

If you poke around in there, you'll see a subdir called broken.

You can:

  • copy broken/openmpi-2.0.1_lsf_verbs_.tar.bz2 into linux-64
  • cd linux-64
  • conda index # update index
  • conda-build --test path/to/recipe

When it runs, you'll see things like

(manage) (psreldev) psel701: /reg/g/psdm/sw/conda/manage/recipes/system $ conda build -t openmpi-2
TEST START: openmpi-2.0.1-lsf_verbs_1
Deleting work directory, /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/work
updating index in: /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/linux-64
updating index in: /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/noarch

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openmpi-2.0.1              |      lsf_verbs_1         3.0 MB  file:///reg/g/psdm/sw/conda/channels/system-rhel7

The following NEW packages will be INSTALLED:

    openmpi: 2.0.1-lsf_verbs_1 file:///reg/g/psdm/sw/conda/channels/system-rhel7

+ source /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/bin/activate /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/_t_env
+ /bin/bash -x -e /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/test_tmp/run_test.sh
+ ls
conda_test_runner.sh  helloworld.c  helloworld.cxx  run_test.py  run_test.sh
+ pwd
/reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/conda-bld/openmpi-2_1484022219868/test_tmp
+ command -v ompi_infox
TESTS FAILED: openmpi-2.0.1-lsf_verbs_1

This is better. Some things we see:

  • conda-build creates these temporary test (and build) environments in uniquely named directories
  • you can activate an environment by giving a path, not just a name
  • conda-build creates a file run_test.sh from your meta.yaml that you can rerun
  • it is hard to find where these things are

When you get into debugging the build step, you'll find the build happening under very long directory names, with a placeholder repeated as many times as possible - this leaves as much room as possible in the binaries for the subsequent rewriting of the RPATH.

Debugging conda build is cumbersome - but easier than debugging RPM builds.

Package Dependencies

An example recipe shows how to:

  • specify the package to build against
  • give a more lenient package description to run against
  • use a patch
  • specify relocation settings

conda always does RPATH manipulation; the relocation settings control how conda looks for the hard-coded PREFIX paths in binaries, text files, etc. A sketch follows.
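A hedged meta.yaml sketch covering those four points (the package name, URL, and patch file are hypothetical; binary_has_prefix_files is one of conda-build's relocation keys):

package:
  name: mypkg
  version: 1.0.0

source:
  url: https://example.com/mypkg-1.0.0.tar.gz
  patches:
    - fix-build.patch            # hypothetical patch applied before building

build:
  number: 101
  # relocation: these binaries embed the build prefix and need rewriting
  binary_has_prefix_files:
    - bin/mypkg

requirements:
  build:
    - numpy 1.11.*               # exact pin to build against
  run:
    - numpy >=1.11               # more lenient spec to run against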

Release Management

conda has the tools for release management:

  • python/C++ packaging
  • flexible package dependency tracking
  • environments, user and centrally installed multi-user

But I would not call it a release system. Things I've implemented:

  • local file channels for our packages
  • keeping logs of builds
  • a tool to rebuild all local packages
  • configuration for what our releases look like, e.g.,
    • ana-1.0.x ...
    • ana-1.0.x-py3 ...
      • what packages are in these releases
  • parameterized builds/releases based on rhel5/6, etc.
    • more flexible creation of conda environments
  • a scheme to keep track of the 'current' release
    • old environments are kept around so people can easily go back

Managing Package Upgrades - Don't Break Users' Code

Central Install

Following the way things have been done in the past, we maintain central multi-user installations of the analysis software.

At any time there will be conda environments with names

ana-1.0.1
ana-1.0.2

etc. When we upgrade a package, we create a new environment first.

User Management

Alternatively, we just provide channels with packages.

Users make their own conda environments, or just update the root environment.

Ideally, users just do

conda update psana

to keep up with our software (if they want/need to)

or they create an environment from a file we provide, something like:

conda create --name ana --file environment-file
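A sketch of what such an environment file could contain (the conda list --export format; these package lines are examples):

# environment-file, e.g. produced with: conda list --export
numpy=1.11.3=py27_0
hdf5=1.8.17=openmpi_101
psana=1.0.1=py27_1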

This has a lot of advantages, but is maybe not quite seamless enough to put on users.

One issue - psana depends on environment variables like SIT_ROOT.

ana-rel-admin

The repo: https://github.com/slaclab/anarel-manage

has all my release management code; running ana-rel-admin gives all the commands. The anarel-manage repo has configuration like:

  • all the packages we make use of
  • what packages comprise our releases

It then implements tools to:

  • build a package with logging, and move it around (described above)
  • build all the packages (from a config file)
  • index all the channels
  • update all six .condarc's from a template
  • build releases
  • automate the building/testing of all ana releases, i.e.:
    • variants (py27/py3/gpu)
    • hosts (rhel5/rhel6/rhel7)
    • prod/dev installs
    • integration testing from a user account
    • this is the kind of thing that is probably better done in buildbot or travis, but I rolled my own

Building an Environment

This should be as simple as cloning the old environment and updating some packages, or maintaining an environment file that we build from. However:

  • I ran into issues with environment files and local channels
  • the rules for channel precedence changed during development
  • I had a lot of trouble getting the precise package versions/builds that I wanted
  • there were issues with the numpy and mkl packages clobbering certain files
  • several packages had permission bugs: the admin account could read the files but users could not

Environment Defense

To deal with those issues, I build the environment in stages; the config file (a yaml file) is here:

https://github.com/slaclab/anarel-manage/blob/master/config/anarel.yaml

Building in stages lets me

  • checkpoint environments, to make sure numpy is still working after each new package goes in
  • get finer control over which channels packages come from

Problem:

  • later stages can undo previous ones
    • could solve by pinning (sketched below)
    • or by doing everything in one stage
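A sketch of the pinning mechanism (conda reads a pinned file from the environment's conda-meta directory; the specs are examples):

# $CONDA_PREFIX/conda-meta/pinned
numpy 1.11.*
hdf5 1.8.17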


Also of interest, maybe: the .condarc template

https://github.com/slaclab/anarel-manage/blob/master/config/condarc_template

The key thing is getting our channels into the search path.

Staying up to Date with External Packages

The packages in the RPM release system are old. Updating them is cumbersome, there is no integration testing of them, and backing out a new version that broke something is expensive.

In anarel.yaml, you'll notice I specify 'latest' for many packages, while others are pinned. Whenever we build a new set of environments, we are not quite sure which versions we'll pick up, or whether they will all work together.

For instance, see the pydateutil line in anarel.yaml: it wasn't until my testing step that I found that 2.6.0 of pydateutil broke a unit test in pandas.

That was relatively easy to clean up - see testing below.

Channel Precedence

I'd like to use conda-forge for everything, but numpy from conda-forge doesn't work on rhel5 (see the conda-forge numpy issue on github).

However numpy from defaults works, so my condarc's list defaults first, then conda-forge.


Testing

Here is the repo:

https://github.com/slaclab/anarel-test

  • test from a user account, not the admin account
  • verify I can run things like
    • psana -h
      and the equivalent for many packages
  • import many packages
  • check that I can load many .so files
  • use nose to run the test suites of many packages (see the sketch after this list)
    • scipy
    • numpy
    • pandas
  • run the psana tests
  • run separate tests for
    • h5py
    • conda
    • hdf5
    • mpi4py
    • openmpi
  • the conda tests are very important
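A minimal sketch of a few of these checks as shell commands (run from the tester account; the package list is an example - the real tests live in the anarel-test repo):

# verify imports and run package test suites (numpy/scipy use nose)
python -c "import numpy; numpy.test()"
python -c "import scipy; scipy.test()"
python -c "import pandas"
# verify a psana entry point runs
psana -h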

Features

  • uses paramiko to run processes on rhel5/rhel6/rhel7 machines
    • prompts for the credentials of a tester account
  • builds environments in dev first, and tests them
  • then builds environments in prod, and tests again
    • this includes the integration tests mentioned above
  • reports on package updates
  • the tester writes files to a world-writable central directory
  • emails when done

HTML Report

https://pswww.slac.stanford.edu/user/psreldev/builds/auto-1.1.0/


Development Environments

One Instance of Package

  • make a new environment
  • python:
    • conda develop path/to/pkg/with/setup.py
      • creates pkg.pth in the conda env's site-packages
      • can uninstall with the same tool
      • similar to pip install --editable
  • For C++, not sure:
    • make with an install into conda?
    • or soft links from conda to your build?

Two Instances of Package

  • Python
    • use PYTHONPATH to pick up the development package
      • this is what we do for psana python packages
      • more awkward for external packages like scikit-beam (you need to install it somewhere)
    • important:
      • keep the conda env clean, with conda packages only
        • no pip/easy_install installs that did site.py sys.path manipulation
  • C++
    • use LD_LIBRARY_PATH and PATH
    • issue: RUNPATH vs. RPATH
      • RPATH doesn't look at LD_LIBRARY_PATH; RUNPATH does
      • everything in conda that uses your C++ needs to be built with RUNPATH (see the sketch after this list)
        • -Wl,--enable-new-dtags
    • issue: manipulating PATH takes some care
      • activating and deactivating conda environments manipulates PATH
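A sketch of setting and checking RUNPATH vs. RPATH (readelf comes with binutils; the file names are examples):

# link with RUNPATH instead of RPATH so LD_LIBRARY_PATH is honored
g++ -shared -o libexample.so example.o -Wl,--enable-new-dtags -Wl,-rpath,'$ORIGIN/../lib'
# inspect which tag a library actually carries
readelf -d libexample.so | grep -E 'RPATH|RUNPATH'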
