
Installation

# on pslogin
ana-1.3.37
scs
cd ...
virtualenv venv-pymongo
source venv-pymongo/bin/activate
python -m pip install pymongo
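
A minimal check that the installation works (prints the installed pymongo version):

import pymongo
print(pymongo.version)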


Alternative installation:
-------------------------
# https://docs.mongodb.com/manual/tutorial/install-mongodb-on-linux/
cd lib

curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.6.2.tgz

tar -zxvf mongodb-linux-x86_64-3.6.2.tgz

mkdir -p mongodb
cp -R -n mongodb-linux-x86_64-3.6.2/ mongodb

export PATH=/reg/neh/home/dubrovin/LCLS/venv-pymongo/lib/mongodb/mongodb-linux-x86_64-3.6.2/bin/:$PATH
echo $PATH
The same PATH setting can be done with
source set_path_to_mongodb

1. Create the data directory
mkdir -p ./data/db

2. Set r/w permissions for the data directory
chmod 775 data
chmod 775 data/db 

Run server

# on pslogin
ssh psanaphi105
cd LCLS/venv-pymongo/
source bin/activate
source set_path_to_mongodb
# assumes that ./data/db is already created
mongod --dbpath ./data/db --bind_ip_all &

!!! DO NOT CLOSE THE WINDOW; mongod stays attached to this shell session (alternatively, start it with --fork --logpath <logfile> to run it as a daemon).

Shell

The mongo shell is an interactive command-line interface to the server.

mongo --host psanaphi105 --port 27017

To exit the shell, type quit() or use the <Ctrl-C> shortcut.

> db
test

> show dbs
admin           0.000GB
calib-cxif5315  0.006GB
config          0.000GB
local           0.000GB

> use calib-cxif5315
switched to db calib-cxif5315

> show collections
cspad-0-cxids1-0
cspad-1

> db["cspad-0-cxids1-0"].find()
> db["cspad-0-cxids1-0"].find().pretty()
> help

Connection to DB in Python

from pymongo import MongoClient
#client = MongoClient('localhost', 27017)
client = MongoClient('psanaphi105', 27017) #, username=uname, password=pwd)
db = client['calib-cxi12345']
col = db['camera-0-cxids1-0']

Connection time is 50-150 ms, depending on the host and the time.
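
The quoted time can be measured directly; a minimal sketch (MongoClient connects lazily, so a ping command is issued to force a real round trip):

from time import time
from pymongo import MongoClient

t0_sec = time()
client = MongoClient('psanaphi105', 27017)
client.admin.command('ping') # forces the connection to be established
print('Connection time: %.3f sec' % (time()-t0_sec))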

Tentative model of the calibration store

Experiment-centric calibration database

# Database for experiment
dbexp = client["calib-cxif5315"]

# Collections:
col1 = dbexp["cspad-0-cxids2-0"]
col2 = dbexp["cspad2x2-0-cxids2-0"]
col3 = dbexp["andor-0-cxids2-0"]

# Document content for dbexp
doc = {
   "_id": ObjectId("53402597d852426020000002"),
   "experiment": "cxif5315",
   "run": 123,
   "detector": "cspad-0-cxids2-0",
   "ctype": "pedestals",
   "time_sec": 1516321053,
   "time_nsec": 123456789,
   "time_stamp": "2018-01-18T16:17:33.123456789-0800",
   "version": "v00-11-22",
   "facility": "LCLS2",
   "uid": "login-name",
   "host": "psanaphi102",
   "comments": ["very good constants", "throw them in trash immediately!"],
   "data_size": 32*185*388,
   "data_shape": (32,185,388),
   "data_type": "int16",
   "data": np.array(...)
}

All metadata is accessible through a single-level (flat) document, so any field can be queried directly.
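
A sketch of such a query, using the collections above and the field names from the model document (e.g. the most recent pedestals deployed for runs up to 123):

doc = col1.find_one({"ctype": "pedestals", "run": {"$lte": 123}},
                    sort=[("time_sec", -1)])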

Detector-centric calibration database

# References or DBRefs for detectors

dbdet = client['calib-cspad'] 
col1 = dbdet['cspad-0-cxids1-0']
col2 = dbdet['cspad-0-cxids2-0']
col3 = dbdet['cspad-0-cxidsd-0']
col4 = dbdet['cspad-0-xcsendstation-0']
col5 = dbdet['cspad-0-xppgon-0']
col6 = dbdet['cspad-0-sxrbeamline-1']
col7 = dbdet['cspad-0-mectargetchamber-0']

# Document content for dbdet
doc = {
   "_id":ObjectId("..."),
   "ref_id": ObjectId("534009e4d852427820000002"),
   etc...
}

Essentially, a document in the detector collection holds a reference to the data stored in the experiment collections.
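
Dereferencing is then a two-step lookup; a sketch, assuming the detector-centric document also stores the experiment name next to "ref_id" (the "experiment" field is an assumption):

# 1. find the reference document in the detector-centric DB
refdoc = col1.find_one({"run": 123})
# 2. fetch the data document from the experiment-centric DB
dbexp = client["calib-%s" % refdoc["experiment"]] # "experiment" field assumed
data_doc = dbexp["cspad-0-cxids1-0"].find_one({"_id": refdoc["ref_id"]})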

Preparation of data

Conversion of numpy array to unicode
import pickle
from time import time
from bson.binary import Binary

# gu is an LCLS utility module; random_standard generates a random test array
nda = gu.random_standard(shape=(32,185,388), mu=20, sigma=5, dtype=gu.np.float)

t0_sec = time()

arr = nda.flatten()
arr = ' '.join(['%.2f' % v for v in arr]) # convert the array to a text string
sarr = Binary(pickle.dumps(arr, protocol=2), subtype=128)

doc = {
   "experiment": "cxi12345",
   "run": 124,
   ...
   "data": sarr,
}

dt_sec = time() - t0_sec
  • Preparation of cspad data in text/unicode format for insertion takes ~1 sec.
  • Only limited-precision data can be saved this way due to the 16 MB limit on document size (see the binary alternative sketched below).
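
A possible alternative (a sketch, not the scheme used above) is to store the raw array bytes, which keeps full precision and skips the ~1 sec text conversion, as long as the document stays under 16 MB:

from bson.binary import Binary

# store raw bytes plus the metadata needed for reconstruction
doc = {
   "data": Binary(nda.tobytes()),
   "data_shape": nda.shape,
   "data_type": str(nda.dtype),
}
# reconstruction:
# arr = np.frombuffer(doc["data"], dtype=doc["data_type"]).reshape(doc["data_shape"])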

Inserting data

Insert document in collection
doc_id = col.insert_one(doc).inserted_id

Insertion time is 110-180 ms.
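
A quick check that the document landed in the DB (a sketch):

# retrieve the just-inserted document by its _id
doc_db = col.find_one({"_id": doc_id})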

Find data

t0_sec = time()
docs = col.find({"run": 125})
dt_sec = time() - t0_sec

Time to create the cursor is 50-60 us; find() is lazy, so the query itself executes when the cursor is first iterated.

Unpack data

Unpack data from unicode to numpy array
doc = docs[0] # indexing the cursor runs the query and fetches the document
xcarr = pickle.loads(doc["data"]) # unpickle the unicode string, 30-40 ms
arr = gu.np.fromstring(xcarr, dtype=float, count=-1, sep=' ') # parse text to array, 300 ms

The total time to unpack is ~350 ms.
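
If "data_shape" was stored in the document, as in the model above, the original array shape can be restored (a one-line sketch):

nda = arr.reshape(doc["data_shape"]) # back to shape (32,185,388)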

Summary

  • MongoDB's structure has limitations on the number of levels and on document size.
    • server may have many DBs
    • DB is a container for collections
    • collection is a group of documents
    • document is a JSON/BSON object of key:value pairs (a dictionary). Each value may itself be a dictionary, etc., but there are no further container levels in the DB structure.
      • document size has a hardwired limit of 16 MB (in 2010 it was increased from 4 to 16 MB, and the developers do not want to change it). CSPAD: 2 Mpix * 8 byte (double) = 16 MB, and we may expect larger detectors like Jungfrau, Epix, Andor, etc.
      • larger data is suggested to be saved using GridFS, which splits the data into chunks and saves the chunks in separate collections of the same DB (see the sketch after this list).
      • a JSON (text) object in MongoDB is represented in unicode (UTF-8); data has to be converted to unicode and back on saving and retrieving.
  • a schema-less DB looks interesting to a certain extent, but in order to find something in the DB there still has to be a schema...
  • other than that, it is a pretty good storage for social networks!
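
For large arrays, a minimal GridFS sketch (assumed usage; GridFS stores the payload in the fs.files and fs.chunks collections of the given DB):

import gridfs

fs = gridfs.GridFS(client["calib-cxif5315"])
# put() splits the payload into chunks; extra keyword arguments become metadata
file_id = fs.put(nda.tobytes(), experiment="cxif5315", run=123, ctype="pedestals")
data = fs.get(file_id).read() # reassembles the chunks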
