...

Essentially, a document in the detector collection holds a reference to the data in the experiment collections.
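A minimal sketch of this reference pattern, as plain dictionaries (the field names and the id convention here are assumptions for illustration, not the actual schema):

```python
# Hypothetical documents illustrating the reference pattern: a document
# in the detector collection points at a data document stored in the
# per-experiment collection via an id plus experiment/run fields.

exp_doc = {                          # lives in the experiment collection
    "_id": "cxi12345-r0124-cspad",   # assumed id convention
    "experiment": "cxi12345",
    "run": 124,
    "data": b"...",                  # serialized detector data
}

det_doc = {                          # lives in the detector collection
    "detector": "cspad",
    "experiment": "cxi12345",
    "run": 124,
    "data_id": exp_doc["_id"],       # reference into the experiment collection
}

# resolving the reference is a lookup by data_id in the experiment collection
assert det_doc["data_id"] == exp_doc["_id"]
```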

Preparation of data

Code Block: Conversion of numpy array to unicode
from time import time
import pickle
from bson.binary import Binary

# gu is the utility module used throughout these tests; it exposes
# numpy as gu.np and a gaussian-noise generator gu.random_standard.
nda = gu.random_standard(shape=(32,185,388), mu=20, sigma=5, dtype=gu.np.float64)

t0_sec = time()

# convert the array to one text string with 2-decimal precision,
# then pickle it and wrap it in BSON Binary for insertion
arr = nda.flatten()
arr = ' '.join(['%.2f' % v for v in arr])
sarr = Binary(pickle.dumps(arr, protocol=2), subtype=128)

doc = {
   "experiment": "cxi12345",
   "run": 124,
   ...
   "data": sarr,
}

dt_sec = time() - t0_sec
  • Preparation of CSPAD data in text/unicode format for insertion takes ~1 sec.
  • Only limited-precision data can be saved due to the 16 MB limit on document size.
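The conversion above can be reproduced with plain numpy to see the size budget; `gu.random_standard` is replaced here by `numpy.random.normal`, and the ~14 MB figure is illustrative:

```python
import pickle
from time import time

import numpy as np

# stand-in for gu.random_standard: gaussian noise, mu=20, sigma=5
nda = np.random.normal(20, 5, size=(32, 185, 388)).astype(np.float64)

t0_sec = time()
txt = ' '.join('%.2f' % v for v in nda.ravel())  # ~2.3e6 values, 2 decimals
payload = pickle.dumps(txt, protocol=2)
dt_sec = time() - t0_sec

# at ~6 bytes of text per value the payload is ~14 MB: it fits under the
# 16 MB document limit only because precision was truncated to 2 decimals
print('conversion: %.2f sec, payload: %.1f MB' % (dt_sec, len(payload) / 1e6))
```

Storing the full float64 precision as text (`repr`-length numbers) would roughly triple the payload and exceed the document limit, which is why the precision cut is needed in the first place.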

Inserting data

Code Block
doc_id = col.insert_one(doc).inserted_id

Insertion time is 110-180 ms.

Find data

Code Block
t0_sec = time()
docs = col.find({"run": 125})
dt_sec = time() - t0_sec

Find time is 50-60 us; note that find() returns a lazy cursor, so this measures cursor creation only — documents are fetched when the cursor is iterated.

Unpack data

Code Block
doc = docs[0]
xcarr = pickle.loads(doc["data"]) # 30-40ms
arr = gu.np.fromstring(xcarr, dtype=float, count=-1, sep=' ') # 300ms; fromstring with sep is deprecated in recent numpy

Time to unpack is ~350 ms.
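The unpack path can be checked end to end with plain numpy; since `np.fromstring(..., sep=' ')` is deprecated, the equivalent `split`/`np.array` parse is used here, and the round-trip error bound follows from the `'%.2f'` formatting:

```python
import pickle

import numpy as np

orig = np.array([19.994, 20.005, -1.2345, 38.0])

# pack exactly as in the preparation step: format to 2 decimals, join, pickle
sarr = pickle.dumps(' '.join('%.2f' % v for v in orig), protocol=2)

# unpack: unpickle the text, then parse it back into a float array
xcarr = pickle.loads(sarr)
arr = np.array(xcarr.split(), dtype=float)

# '%.2f' rounds to 2 decimals, so the round-trip error is bounded by ~0.005
assert np.allclose(arr, orig, atol=0.0051)
```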

Summary

  • MongoDB structure has limitations on the number of levels and on document size.
    • a server may host many DBs
    • a DB is a container for collections
    • a collection is a group of documents
    • a document is a JSON/BSON object of key:value pairs (a dictionary). Each value may itself be a dictionary, etc., but further structural levels are not supported by the DB.
      • document size has a hardwired limit of 16 MB (increased from 4 to 16 MB in 2010, and the developers do not want to change it). CSPAD: 2 Mpix * 8 bytes (double) = 16 MB, and we may expect larger detectors like Jungfrau, Epix, Andor, etc.
      • Larger data are suggested to be saved using GridFS: split the data into chunks and save the chunks in dedicated collections in the same DB.
      • JSON (text) objects in MongoDB are represented in Unicode (UTF-8). Data should be converted to Unicode and back on saving and retrieving.
  • a schema-less DB looks interesting to a certain extent, but in order to find something in the DB there still has to be a schema...
  • other than that, it is awesome and pretty good storage for social networks!
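The chunking suggested in the summary can be sketched in plain numpy; the 15 MB chunk size and the manual split/join helpers are assumptions for illustration (pymongo's gridfs module performs this splitting automatically):

```python
import numpy as np

MAX_DOC_BYTES = 15 * 1024 * 1024          # stay safely below the 16 MB limit

def split_array(nda, max_bytes=MAX_DOC_BYTES):
    """Split a flat byte view of the array into document-sized chunks."""
    raw = nda.tobytes()
    return [raw[i:i + max_bytes] for i in range(0, len(raw), max_bytes)]

def join_array(chunks, dtype, shape):
    """Reassemble the chunks back into the original array."""
    return np.frombuffer(b''.join(chunks), dtype=dtype).reshape(shape)

# a CSPAD-sized array: 32*185*388 float64 = ~18.4 MB, over the document limit
nda = np.random.standard_normal((32, 185, 388))
chunks = split_array(nda)

# every chunk fits in a document and the round trip is bit-exact
assert all(len(c) <= MAX_DOC_BYTES for c in chunks)
assert np.array_equal(join_array(chunks, nda.dtype, nda.shape), nda)
```

Since the chunks are raw bytes rather than formatted text, this path also avoids the precision loss of the `'%.2f'` conversion used above.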

...