TinyML-SNL project developments

TinyML in the context of HEP project - KA25

Project concept

The TinyML community researches theories, methods and approaches to reduce model size and complexity towards their im

Project	Confluence page	Description
TinyML - KA25	SNL Demo	Fully connected model using floating point values
TinyML - KA25	SNL Demo - CNN - Floating point (MNIST)
TinyML - KA25	SNL Demo - CNN - Quantized weight and bias
TinyML - KA25	SNL Demo - CNN - Quantized weight and bias and multipliers
TinyML - KA25	SNL Demo - CNN - Binary model and dataset
TinyML - KA25	SNL Demo - CNN - Floating point trained on eMNIST
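Several of the demos above use quantized weights and biases. As a rough illustration only, the sketch below maps floating point values onto a signed fixed-point grid. The function names, the 8-bit word with 2 integer bits, and the choice of round-to-nearest with saturation are all assumptions made for this example; they are not taken from the SNL demos, and the Vitis HLS ap_fixed type defaults to truncation and wrap-around instead.

```python
import math


def quantize_fixed(value, total_bits=8, int_bits=2):
    """Quantize a float onto a signed fixed-point grid (hypothetical helper).

    total_bits: word width in bits, sign bit included.
    int_bits:   integer bits, sign bit included; the rest are fractional.
    """
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                     # integer codes per unit: 2**frac_bits
    code_min = -(1 << (total_bits - 1))        # most negative integer code
    code_max = (1 << (total_bits - 1)) - 1     # most positive integer code
    code = math.floor(value * scale + 0.5)     # round half up (assumption)
    code = max(code_min, min(code_max, code))  # saturate on overflow (assumption)
    return code / scale


def quantize_layer(weights, biases, total_bits=8, int_bits=2):
    """Quantize the weights and biases of one layer with the same format."""
    q_weights = [quantize_fixed(w, total_bits, int_bits) for w in weights]
    q_biases = [quantize_fixed(b, total_bits, int_bits) for b in biases]
    return q_weights, q_biases


if __name__ == "__main__":
    # Made-up values: 5.0 does not fit in 2 integer bits and saturates.
    q_w, q_b = quantize_layer([0.3, -1.26, 5.0], [0.01])
    print(q_w, q_b)
```

Comparing the quantized values against the originals gives a quick feel for the rounding error of a given word width before a model is synthesized.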

In ASIC SNL - KA25

Support material from eFPGA design

Generating ASICs from HLS can follow a few possible flows. For example, one can take the HDL code generated by Vitis HLS and import it into the digital flow for ASIC P&R. Alternatively, one can make the HLS code compatible with Stratus or Catapult and use these tools to generate the HDL code.

If using the Vitis tool flow, we have a reference design that can be adopted; it has been used for the eFPGA project:

28nm - https://github.com/slaclab/fabulous-28nm-asic/tree/main/targets

130nm - https://github.com/slaclab/fabulous-28nm/tree/main/asic/targets/digital_top

Application git for the fab28:

https://github.com/slaclab/fabulous-28nm-dev?tab=readme-ov-file#asicfwsw-co-simulation

For the ASIC flow, we can reuse the digital top design, replace the core with an SNL model, and keep all the interfaces the same (see picture below). If we then package the device, one strategy is to make an FMC card that can be reused to test all the digital ASICs without the need for custom wire-bond boards.

Packaging

From previous projects, we estimate that a 1 mm square ASIC in 28nm costs ~12-15k, plus ~3k for QFN packaging. This may not be part of the current KA25 project but could be added to a request for a second year of this project.

Project proposal for LCLS-II projects

Project ideas

Increasing data rates, image sizes and the multiple experimental techniques impose several requirements on real-time data processing.

Currently, developing a fully optimized model (pruning, weights and bias) leads to a long turnaround time whenever a new synthesis is required.

In addition, most models developed for offline processing target GPUs and supercomputers and are too big to be implemented in FPGAs.

Goals

Develop a library of FPGA-friendly models (LeNet style, FC style) with quality similar to that of large models
Use Dynamic Function Exchange to reprogram the predefined models
Enable reuse of a given model through SNL dynamic weights and biases, which could be updated through re-training
Evaluate the need to perform data correction in real time; if this improves or enables real-time processing, implement image correction algorithms such as dark subtraction, gain equalization and possibly real-time common-mode noise correction
Train small models from large models

A successful development would enable LCLS to provide a set of models that can be used in real time to perform a first pass of data analysis and give fast feedback to users during beam time. Demonstrate the viability of this approach with the ePixHR detector (or similar) generating images at 5,000 fps.
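The image corrections named in the goals above (dark subtraction, gain equalization and common-mode noise correction) can be sketched in a few lines. This is a minimal illustration under stated assumptions: frames are plain lists of rows, the function names are hypothetical, and the common mode is estimated as the median of each row. The source does not say how ePixHR pixels are grouped for common-mode correction, so the per-row grouping is only a placeholder.

```python
from statistics import median


def correct_frame(raw, dark, gain):
    """Dark-subtract and gain-equalize one frame.

    raw, dark and gain are lists of rows with identical shapes:
    corrected = (raw - dark) * gain, pixel by pixel.
    """
    return [
        [(r - d) * g for r, d, g in zip(raw_row, dark_row, gain_row)]
        for raw_row, dark_row, gain_row in zip(raw, dark, gain)
    ]


def subtract_common_mode(frame):
    """Remove a per-row offset, estimated as the row median (assumption)."""
    corrected = []
    for row in frame:
        offset = median(row)  # robust against a few bright pixels
        corrected.append([px - offset for px in row])
    return corrected


if __name__ == "__main__":
    # Made-up 2x4 frame: one bright pixel in row 0, one high-gain pixel in row 1.
    raw = [[110, 112, 111, 160], [105, 104, 106, 105]]
    dark = [[100, 100, 100, 100], [100, 100, 100, 100]]
    gain = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 1.0, 1.0]]
    frame = correct_frame(raw, dark, gain)
    print(frame)
    print(subtract_common_mode(frame))
```

Each step is a per-pixel or per-row operation with no dependence on earlier frames, which is the property that would make such corrections candidates for streaming implementation.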

List of possible new features to be included in SNL so that more models can be implemented and added to the list of supported ML models in the FPGA context:

Conv2D layer: support arbitrary strides (1, 2, 3, 4, 5 and so on)
Conv2D layer: support padding=same, with the same arbitrary strides as above
Conv1D support
Conv2D with grouping
MaxPool2D: support padding=valid
MaxPool2D: pool_size
ZeroPadding2D layer
Sigmoid activation function
Hyperbolic tangent (Tanh) activation function
Leaky ReLU activation function
Parametric ReLU (PReLU) activation function
Exponential Linear Unit (ELU) activation function
UpSampling2D
Conv2DTranspose
VAEs: loss functions and related operations (log_sigma, Square Mean, logsigma exp)
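For the stride and padding items in the list above, the output size along one spatial dimension follows the usual Keras/TensorFlow convention. The sketch below is illustrative only; the helper names are hypothetical and nothing here reflects how SNL sizes its buffers.

```python
import math


def output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one dimension for a Conv2D or MaxPool2D layer."""
    if padding == "valid":
        if input_size < kernel_size:
            raise ValueError("kernel larger than input with padding=valid")
        # Only positions where the kernel fits entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Input is zero-padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


def same_padding(input_size, kernel_size, stride=1):
    """Zeros added (before, after) along one dimension for padding=same."""
    out = math.ceil(input_size / stride)
    total = max((out - 1) * stride + kernel_size - input_size, 0)
    before = total // 2          # any odd extra zero goes after the data
    return before, total - before


if __name__ == "__main__":
    for stride in (1, 2, 3, 4, 5):
        print(stride,
              output_size(28, 3, stride, "valid"),
              output_size(28, 3, stride, "same"),
              same_padding(28, 3, stride))
```

Note that with padding=same and a stride larger than 1 the padding can be asymmetric, which matters when the padding is implemented as a separate ZeroPadding2D stage.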

SNL demos - BES

SNL - demos

Increasing data rates, image sizes and the multiple experimental techniques impose several requirements on real-time data processing.

Project	Confluence page	Description
Daniel/Patrick work	Hit, Maybe, Miss	Reproduces the paper on Hit, Maybe, Miss (HMM), keeping the model structure and reducing the image size and filters so that the model fits in an FPGA. Final Presentation for Internship
	Segmented Hit, Maybe, Miss	Reproduce the initial example with classification segmented at the carrier board level, then sum the votes to take the final decision
	Quantized Hit, Maybe, Miss	Reproduce the original work (with or without adaptation) but reduce the model using quantization. Implement SNL with ap_fixed
	Binary Hit, Maybe, Miss	Reproduce the original work with binary weights and biases. (Not sure SNL supports this)
	HEP equivalent to Hit, Maybe, Miss	Take advantage of SNL dynamic biases and weights and target the same model to a different set of images. Dog, Maybe, Cat? Need to check with physicists what that could be
	ePixHR10k equivalent to Hit, Maybe, Miss	Create a dataset using 3 types of images (laser pointer, cross and dark). Using the same model, classify the images
	Pre-processing - Dark subtraction	In HEP, the Cryo ASIC can be used as an example for channel equalization
	Pre-processing - Gain equalization	In HEP, the Cryo ASIC can have its gain equalized (there is a gap in the midscale that should also be corrected)
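The "Segmented Hit, Maybe, Miss" entry in the table above combines per-segment classifications into one decision by summing votes. A minimal sketch of that step follows, assuming each segment reports a single class label. The function name and the tie-breaking order (hit, then maybe, then miss) are assumptions for illustration; the source does not specify them.

```python
from collections import Counter

# Class labels in tie-breaking priority order (assumption).
CLASSES = ("hit", "maybe", "miss")


def final_decision(segment_labels):
    """Sum the votes from all segments and return the winning class."""
    votes = Counter(segment_labels)
    # max() returns the first maximal item, so ties resolve in CLASSES order.
    return max(CLASSES, key=lambda label: votes[label])


if __name__ == "__main__":
    # Hypothetical labels from four segments of one image.
    print(final_decision(["hit", "miss", "hit", "maybe"]))
```

An alternative would be to sum per-class scores from each segment instead of hard labels, which keeps more information at the cost of wider data paths between segments.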

Relevant literature

List of papers

Date	Title	link	Comments
2023	Implementation of a framework for deploying AI inference engines in FPGAs	https://arxiv.org/abs/2305.19455	Principles of SNL. Must read for all interns.
2022	Performance of a convolutional autoencoder designed to remove electronic noise from p-type point contact germanium detector signals	https://arxiv.org/pdf/2204.06655.pdf	This approach, if successful in HW can be used to denoise nEXO data.
2022	OPEN-SOURCE FPGA-ML CODESIGN FOR THE MLPERF™ TINY BENCHMARK	https://arxiv.org/abs/2206.11791
2020	Benchmarking TinyML Systems: Challenges and Direction	https://arxiv.org/abs/2003.04821	"Many traditional ML use cases can be considered futuristic TinyML tasks. As ultra-low-power inference hardware continues to improve, the threshold of viability expands. Tasks like large label space image classification or object counting are well suited for low-power always-on applications but are currently too compute and memory hungry for today’s TinyML hardware."
2019	Image Classification on IoT Edge Devices: Profiling and Modeling	https://arxiv.org/pdf/1902.11119.pdf	"In this paper, we show the feasibility and study the performance of image classification using IoT devices. Specifically, we explore the relationships between various factors of image classification algorithms that may affect energy consumption, such as dataset size, image resolution, algorithm type, algorithm phase, and device hardware."
2018	Accelerating CNN inference on FPGAs: A Survey	https://arxiv.org/pdf/1806.01683.pdf	Talks about tricks that can be used to minimize computation and maximize performance in CNN applications on FPGAs. Also talks about quantization, so may be interesting to look at in the future.
2015	Learning both Weights and Connections for Efficient Neural Networks	https://proceedings.neurips.cc/paper/2015/file/ae0eb3eed39d2bcef4622b2499a05fe6-Paper.pdf	Conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address this limitation, the authors describe a method that reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections.
2021	Anomaly Detection Based on Tiny Machine Learning: A Review	https://oiji.utm.my/index.php/oiji/article/view/148/109	By using TensorFlow Lite Micro, the TinyML can be trained to undergo anomaly detection. However, the machine learning algorithm had to be exported from TensorFlow, then TensorFlow Lite, and finally TensorFlow Lite Micro in order to upload the machine learning algorithm into TinyML. This paper highlights the state of the art of the current works on TinyML. Some suggestions on the research direction are also introduced for potential future endeavors.
2019	Squeezing the last MHz for CNN acceleration on FPGAs	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8871619	Overclocking KCU1500 will decrease runtime while maintaining near-constant accuracy. Tested on 4 different CNN models: LeNet, AlexNet, VGG-16, VGG-19. Provides specific clock numbers and performance as well as accuracy for all 4 models.
2021	An Adaptive Row-based Weight Reuse Scheme for FPGA Implementation of Convolutional Neural Networks	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9501490	Proposes an adaptive row reuse scheme by applying each level of row-reuse for each layer depending on its characteristics. The proposed design is implemented with a Xilinx KCU1500 board with 1.7 times less buffer size than previous works when running VGG-16.
2018	A convolutional neural network-based screening tool for X-ray serial crystallography	https://web.eecs.umich.edu/~stellayu/publication/doc/2018xrayJSR.pdf	Opportunity to explore SNL. This could be a research project for Daniel or Patrick. This paper proposes the use of a convolutional network to classify datasets in the X-ray domain. It would be very interesting to reproduce this effort with SNL and demonstrate how it performs with a hardware implementation. The dataset is available and was produced with the now legacy CSPAD.
2023	Artificial neural network on-chip and in-pixel implementation towards pulse amplitude measurement	https://iopscience.iop.org/article/10.1088/1748-0221/18/02/C02048	This paper presents a tiny model for ADC data correction on streaming data. Can we correct data for CRYO? Also, can this be applied to correct data for pixelated detectors (future gamma TPC)?
