TinyML in the context of HEP project - KA25

The TinyML community researches theories, methods and approaches to reduce model size and complexity towards their im

Project | Confluence page | Description
TinyML - KA25 | SNL Demo | Fully connected model using floating point values
TinyML - KA25 | SNL Demo - CNN | Floating point (MNIST)
TinyML - KA25 | SNL Demo - CNN | Quantized weight and bias
TinyML - KA25 | SNL Demo - CNN | Quantized weight and bias and multipliers
TinyML - KA25 | SNL Demo - CNN | Binary model and dataset
TinyML - KA25 | SNL Demo - CNN | Floating point trained on eMNIST
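
As a rough illustration of what the quantized demos involve, the sketch below converts floating-point weights and biases to signed fixed-point codes and back. The helper names (quantize, dequantize) and the 8-bit word with 6 fractional bits are assumptions chosen for illustration; they are not taken from the SNL demos.

```python
def quantize(values, total_bits=8, frac_bits=6):
    """Round floats to signed fixed-point integer codes, saturating at the limits."""
    scale = 1 << frac_bits                  # 2**frac_bits steps per unit
    lo = -(1 << (total_bits - 1))           # most negative representable code
    hi = (1 << (total_bits - 1)) - 1        # most positive representable code
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(codes, frac_bits=6):
    """Convert fixed-point integer codes back to floats."""
    return [c / (1 << frac_bits) for c in codes]


weights = [0.5, -0.25, 1.2, -3.0]
codes = quantize(weights)        # [32, -16, 77, -128]; -3.0 saturates
restored = dequantize(codes)     # [0.5, -0.25, 1.203125, -2.0]
```

The difference between weights and restored is the quantization error; comparing model accuracy before and after this step shows how many bits a given model can afford to lose.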

In ASIC SNL - KA25

Generating ASICs from HLS can follow a few possible flows. For example, one can take the HDL code generated by Vitis HLS and import it into the digital flow for ASIC place and route (P&R). Alternatively, one can make the HLS code compatible with Stratus or Catapult and use those tools to generate the HDL code.

If using the Vitis tool flow, we have a reference design, already used for the eFPGA project, that can be adopted:

28nm - https://github.com/slaclab/fabulous-28nm-asic/tree/main/targets

130nm - https://github.com/slaclab/fabulous-28nm/tree/main/asic/targets/digital_top

Application git for the fab28:

https://github.com/slaclab/fabulous-28nm-dev?tab=readme-ov-file#asicfwsw-co-simulation

For the ASIC flow, we can reuse the digital top design, replace the core with an SNL model, and keep all the interfaces the same (see picture below). If we then package the device, one strategy is to make an FMC card that can be reused to test all the digital ASICs without the need for custom wire-bond boards.


Packaging 

From previous projects, we estimate that a 1 mm² ASIC in 28 nm costs ~12-15k, plus ~3k for QFN packaging. This may not be part of the current KA25 project but could be added to a request for a second year of this project.



Project proposal for LCLS-II projects

Increasing data rates, image sizes and the multiple experimental techniques impose several requirements on real-time data processing.

First, developing a fully optimized model (pruning, weights and biases) currently leads to a long turnaround time whenever a new synthesis is required.

Second, most models developed for offline processing on GPUs and supercomputers are too big to be implemented in FPGAs.

Goals

  • Develop a library of FPGA-friendly models (LeNet style, FC style) with quality similar to that of large models
  • Use Dynamic Function Exchange to reprogram the predefined models
  • Enable reuse of a given model through SNL dynamic weight and bias models, which could be done through re-training
  • Evaluate the need to perform data correction in real time; if this improves or enables real-time processing, implement image correction algorithms such as dark subtraction, gain equalization and possibly real-time common-mode noise correction
  • Train small models from large models
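
The image corrections named in the goals can be sketched in a few lines. This is a minimal software illustration, not the FPGA implementation: the function names are hypothetical, images are plain nested lists, and the common-mode estimate is assumed to be the median of each row, which is one common choice among several.

```python
from statistics import median


def dark_subtract(image, dark):
    """Subtract a per-pixel dark (pedestal) frame from the raw image."""
    return [[p - d for p, d in zip(row, drow)] for row, drow in zip(image, dark)]


def gain_equalize(image, gain):
    """Multiply each pixel by its per-pixel gain correction factor."""
    return [[p * g for p, g in zip(row, grow)] for row, grow in zip(image, gain)]


def common_mode_correct(image):
    """Remove a per-row offset, estimated here as the row median (an assumption)."""
    corrected = []
    for row in image:
        offset = median(row)
        corrected.append([p - offset for p in row])
    return corrected


raw = [[110, 112, 150, 111], [105, 104, 106, 103]]
dark = [[100, 100, 100, 100], [100, 100, 100, 100]]
gain = [[1.0, 1.0, 0.5, 1.0], [2.0, 1.0, 1.0, 1.0]]

step1 = dark_subtract(raw, dark)       # [[10, 12, 50, 11], [5, 4, 6, 3]]
step2 = gain_equalize(step1, gain)     # [[10.0, 12.0, 25.0, 11.0], [10.0, 4.0, 6.0, 3.0]]
step3 = common_mode_correct(step2)     # row medians 11.5 and 5.0 are removed
```

Each step is a per-pixel or per-row operation with no dependence on other frames, which is what makes these corrections candidates for streaming execution ahead of the model.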

A successful development would enable LCLS to provide a set of models that can be used in real time to perform a first pass of data analysis and give fast feedback to users during beam time. The viability of this approach would be demonstrated with the ePixHR detector (or similar) generating images at 5,000 fps.

List of possible new features to include in SNL so that more models can be implemented and added to the list of ML models supported in the FPGA context:

  • Conv2D layer: support arbitrary strides (e.g. 1, 2, 3, 4, 5)
  • Conv2D layer: support padding = same, with arbitrary strides as above
  • Conv1D support
  • Conv2D with grouping
  • MaxPool2D: padding = valid support
  • MaxPool2D: pool_size
  • ZeroPadding2D layer
  • Sigmoid activation function
  • Hyperbolic tangent (tanh) activation function
  • Leaky ReLU activation function
  • Parametric ReLU (PReLU) activation function
  • Exponential Linear Unit (ELU) activation function
  • UpSampling2D
  • Conv2DTranspose
  • VAEs: loss functions and related operations (log_sigma, square mean, exp of log_sigma)
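
To make the stride, padding and activation items above concrete, the sketch below gives the output-size arithmetic for one spatial dimension and reference definitions of the requested activations. The function names are hypothetical, and the "same" rule follows the Keras convention (output = ceil(input / stride)), which is assumed here to be the behaviour SNL would mirror.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output length along one dimension for a convolution or pooling window."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


def same_padding_total(size, kernel, stride=1):
    """Total zero padding needed along one dimension for padding='same'."""
    out = math.ceil(size / stride)
    return max((out - 1) * stride + kernel - size, 0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaky_relu(x, alpha=0.01):
    # PReLU has the same form, with alpha learned during training.
    return x if x > 0 else alpha * x


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


# tanh is available directly as math.tanh.
sizes = [conv_output_size(28, 3, s, "valid") for s in (1, 2, 3)]  # [26, 13, 9]
```

For a 28-pixel input and a 3-pixel kernel, padding = same gives 28, 14 and 10 outputs for strides 1, 2 and 3. MaxPool2D follows the same arithmetic with pool_size in place of the kernel size.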

SNL demos - BES

Increasing data rates, image sizes and the multiple experimental techniques impose several requirements on real-time data processing.

Project | Confluence page | Description
Daniel/Patrick work | Hit, Maybe, Miss | Reproduces the paper on HMM, keeping the model structure and reducing the image size and filters so that the model fits in an FPGA. Final Presentation for Internship
 | Segmented Hit, Maybe, Miss | Reproduce the initial example with classification segmented at the carrier-board level, then sum the votes to make the final decision
 | Quantized Hit, Maybe, Miss | Reproduce the original work (with or without adaptation) but reduce the model using quantization. Implement SNL with ap_fixed
 | Binary Hit, Maybe, Miss | Reproduce the original work with binary weights and biases. (Not sure SNL supports this)
 | HEP equivalent to Hit, Maybe, Miss | Take advantage of SNL dynamic biases and weights and target the same model to a different set of images. Dog, Maybe, Cat? Need to check with physicists what that could be
 | ePixHR10k equivalent to Hit, Maybe, Miss | Create a dataset using 3 types of images (laser pointer, cross and dark). Using the same model, classify the images
 | Pre-processing - Dark subtraction | In HEP, the Cryo ASIC can be used as an example for channel equalization
 | Pre-processing - Gain equalization | In HEP, the Cryo ASIC can have its gain equalized (there is a gap in the midscale that should also be corrected)
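
The vote-summing step of the segmented Hit, Maybe, Miss demo could look like the sketch below. The function name, the label strings and the tie-break rule (earliest label in LABELS wins) are assumptions made for illustration; the source only states that per-segment votes are summed to reach the final decision.

```python
from collections import Counter

LABELS = ("hit", "maybe", "miss")


def sum_votes(segment_predictions):
    """Count one vote per segment and return (winning label, per-label counts)."""
    counts = Counter(segment_predictions)
    # max() returns the first maximal item, so ties resolve in LABELS order.
    winner = max(LABELS, key=lambda label: counts[label])
    return winner, {label: counts[label] for label in LABELS}


# One prediction per carrier-board segment (hypothetical values).
decision, tally = sum_votes(["hit", "miss", "hit", "maybe"])
# decision == "hit", tally == {"hit": 2, "maybe": 1, "miss": 1}
```

Because each segment is classified independently, the per-segment models can run in parallel and only the small vote tally needs to be combined at the end.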




Relevant literature

Date | Title | Link | Comments
2023 | Implementation of a framework for deploying AI inference engines in FPGAs | https://arxiv.org/abs/2305.19455 | Principles of SNL. Must read for all interns.
2022 | Performance of a convolutional autoencoder designed to remove electronic noise from p-type point contact germanium detector signals | https://arxiv.org/pdf/2204.06655.pdf | This approach, if successful in HW, can be used to denoise nEXO data.
2022 | Open-Source FPGA-ML Codesign for the MLPerf™ Tiny Benchmark | https://arxiv.org/abs/2206.11791 |
2020 | Benchmarking TinyML Systems: Challenges and Direction | https://arxiv.org/abs/2003.04821 | "Many traditional ML use cases can be considered futuristic TinyML tasks. As ultra-low-power inference hardware continues to improve, the threshold of viability expands. Tasks like large label space image classification or object counting are well suited for low-power always-on applications but are currently too compute and memory hungry for today’s TinyML hardware."
2019 | Image Classification on IoT Edge Devices: Profiling and Modeling | https://arxiv.org/pdf/1902.11119.pdf | "In this paper, we show the feasibility and study the performance of image classification using IoT devices. Specifically, we explore the relationships between various factors of image classification algorithms that may affect energy consumption, such as dataset size, image resolution, algorithm type, algorithm phase, and device hardware."
2018 | Accelerating CNN inference on FPGAs: A Survey | https://arxiv.org/pdf/1806.01683.pdf | Talks about tricks that can be used to minimize computation and maximize performance in CNN applications on FPGAs. Also talks about quantization, so may be interesting to look at in the future.
2015 | Learning both Weights and Connections for Efficient Neural Networks | https://proceedings.neurips.cc/paper/2015/file/ae0eb3eed39d2bcef4622b2499a05fe6-Paper.pdf | Conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, they describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections.
2021 | Anomaly Detection Based on Tiny Machine Learning: A Review | https://oiji.utm.my/index.php/oiji/article/view/148/109 | By using TensorFlow Lite Micro, the TinyML can be trained to undergo anomaly detection. However, the machine learning algorithm had to be exported from TensorFlow, then TensorFlow Lite, and finally TensorFlow Lite Micro in order to upload the machine learning algorithm into TinyML. This paper highlights the state of the art of the current works on TinyML. Some suggestions on the research direction are also introduced for potential future endeavors.
2019 | Squeezing the last MHz for CNN acceleration on FPGAs | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8871619 | Overclocking KCU1500 will decrease runtime while maintaining near-constant accuracy. Tested on 4 different CNN models: LeNet, AlexNet, VGG-16, VGG-19. Provides specific clock numbers and performance as well as accuracy for all 4 models.
2021 | An Adaptive Row-based Weight Reuse Scheme for FPGA Implementation of Convolutional Neural Networks | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9501490 | Proposes an adaptive row reuse scheme by applying each level of row-reuse for each layer depending on its characteristics. The proposed design is implemented with a Xilinx KCU1500 board with 1.7 times less buffer size than previous works when running VGG-16.
2018 | A convolutional neural network-based screening tool for X-ray serial crystallography | https://web.eecs.umich.edu/~stellayu/publication/doc/2018xrayJSR.pdf | Opportunity to explore SNL. That could be Daniel's or Patrick's research project. This paper proposes the use of a convolutional network to classify datasets in the X-ray domain. It would be very interesting to reproduce this effort with SNL and demonstrate how it performs with a hardware implementation. The dataset is available and was produced with the now legacy CSPAD.
2023 | Artificial neural network on-chip and in-pixel implementation towards pulse amplitude measurement | https://iopscience.iop.org/article/10.1088/1748-0221/18/02/C02048 | This paper presents a tiny model for ADC data correction on streaming data. Can we correct data for CRYO? Could this also be applied to correct data for pixelated detectors (future gamma TPC)?
