stuff I have done over the years
Ph.D., May 2012
Title: "Design optimizations of spiking hardware neurons"
Computational Neuroengineering Lab (CNEL)
Electrical and Computer Engineering, University of Florida
M.S., May 2008
Electrical and Computer Engineering, University of Florida
B.Tech, May 2005
Information and Communication Tech., DA-IICT
(July 2012 - )
Machine Learning Scientist, HP Labs, HP CTO Office
Machine Learning, Computational Neuroscience and Computer Architecture (Jun 2019 - present)
I work at the intersection of Machine Learning, hardware design and computational neuroscience.
Staff Engineer, Qualcomm AI Research
RNN, LSTM based Network design (Jan 2018 - May 2019)
Designed an ultra-low-complexity network for optimizing the power consumption of the mobile GPU and DDR. This involved not just designing the network but also designing custom loss functions. I also explored some reinforcement learning algorithms for this problem.
Staff Engineer, Qualcomm Research
Algorithm/Deep Learning design (Jan 2016 - Oct 2017)
Multi-modal deep learning network design for visual question answering. This included designing a low-complexity deep neural network architecture for scene recognition along with a recurrent neural network for sentiment analysis. Tools used: Keras/TensorFlow and Python.
Algorithm/Neural Network design and Embedded (Jan 2016 - Oct 2017)
Performance optimizations, covering both memory bandwidth and computation, for executing recurrent neural networks built from LSTM and convolutional layers. The network was designed in Keras for a sample application and then optimized at the micro-kernel level, including GPU optimizations.
Machine Learning Focused Processor Modeling (Nov 2015 - Jan 2016)
Created a cycle-accurate model for a new processor architecture in C++ and Synopsys’ LISA language. The processor model was aimed at accelerating deep learning networks. Optimizations were made on a layer-type basis.
Software Design (Nov 2015 - Jan 2016)
Set up and managed regressions for backend tools using Python and C shell scripts. Abstracted and automated various hardware tools at Qualcomm such as synthesis, power estimation, gate-level simulation, static timing analysis and formal verification. The automation was done so that the entire tool chain could be run as a single push-button script. Parsers were developed to determine whether or not a tool ran successfully and to remove the subsequent jobs from the queue accordingly.
Senior Engineer, Qualcomm Research
Embedded System for Signal Processing (Oct 2014 - Nov 2015)
Worked on mapping C/C++ compute microkernels (for voice activation and hot word recognition using an MFCC and GMM based approach) onto a new hardware accelerator for high performance. This involved parallelizing the C++ kernels for optimum use of memory using the underlying ISA and profiling them through simulation. The profiling data was used to characterize the performance of the architecture in order to improve the micro-architecture as well as enhance the audio codec and machine learning algorithms.
Signal Processing/Machine Learning (Mar 2014 - Oct 2014)
Developed a novel feature extraction algorithm for face recognition in computer vision applications. The algorithm was developed for a new kind of sparse sampling imager and was tied to a support vector machine based classifier for facial recognition. Since the underlying math for sparse sampling did not exist, we had to develop that math and filed a few patents related to it. I also had the opportunity to give a demo and presentation to the Qualcomm Executive team (CEO, CTO, CFO and the Executive VPs). Languages used were Python and MATLAB.
Machine Learning (Mar 2014 - Oct 2014)
Led the effort to evaluate the feasibility of RNNs on hardware. Extensively studied various existing recurrent neural networks (RNNs) for handwriting recognition. The study was aimed at understanding various state-of-the-art RNNs such as fully connected RNNs, Hopfield networks, Elman and Jordan networks, and echo-state networks, with the aim of implementing these networks in hardware. As part of the feasibility study I examined various RNN topologies and tweaked some of the sub-blocks, such as LSTM (long short-term memory), for a favorable hardware implementation while maintaining the ROC curve. The feasibility findings were presented to the leadership at Qualcomm R&D. The language used was Python.
Computational Neuroscience (Oct 2013 - Mar 2014)
Sparse feature learning using multi-layer spiking neural networks. The patterns were sparse music patterns (RIFF codes) embedded in severe noise. These networks were very similar to the song-bird recognition networks. Developing these networks also required building a very elaborate stimulus generator in Python. The networks were written in an in-house network description language.
Statistical Signal Processing and Circuit Design (Oct 2013 - Mar 2014)
Led a team exploring the implementation of inference-based probabilistic algorithms using stochastic hardware. Developed the project proposal and presented it to the Qualcomm R&D leadership. Worked on implementing an annealed Gibbs sampler on a Markov Random Field for stereo vision using stochastic hardware. Developed Python libraries and simulated transistor-based circuits running on ultra-low supply voltages such that they could act as a stochastic hardware substrate.
ASIC Verification (Jul 2012 - Oct 2013)
My work included developing ASIC validation and verification (V&V) methodologies, including test plans, verification reviews, and test and script development, for neuromorphic (spiking neural network) architectures implemented on FPGAs. Blocks verified:
a. Dual port and single port memory system architecture with multiple RAM banks and multiple read clients and fair arbitration.
b. Neuron models: dynamical systems that can best be described as large mathematical models or DSP IIR filter models.
c. On-chip flit routers with asynchronous flow control.
Each block required complete ownership of the verification environment, including developing the test plan, tests and regression setups. Starting from scratch for each block, I developed verification agents, coverage classes, tests, scoreboards, sequencers and checkers. The methodology used was UVM with SystemVerilog and Python.
Ph.D Research Work
Research Assistant, Computational Neuro-Engineering Laboratory (CNEL), University of Florida, Gainesville (May 2007 - May 2012)
Research work focused on developing circuits and networks inspired by computational neuroscience.
As part of my research work I designed and used silicon neurons as time-based data encoders, or asynchronous analog-to-digital converters. One of the applications was neural signal recording, where the action potential is converted into asynchronous spike trains, resulting in significant bandwidth reduction. Over the course of my PhD I designed and taped out about 20 chips with an emphasis on ultra-low power circuits. I also modeled these irregular samplers and contributed to the theory of benchmarking them.
I also contributed to the development efforts at my lab to build a configurable spiking network architecture. These architectures were used for phoneme recognition using liquid state machines and reservoir computing. One such contribution was developing a real-time system for recording audio data from the sound card and applying gammatone cochlear filters (DSP algorithms) and the Meddis hair cell model in Java and C++.
I was very fortunate to get the opportunity to attend the "Telluride Neuromorphic Engineering Workshop" in 2008. The workshop is invitation-only, and only 20-30 participants are selected from around the world. As part of my projects there I worked on the Java-based API jAER to drive a car using the spike inputs from the silicon retina. For this work I was named the "Most promising Neuromorphic researcher of the year".
Some of the circuits I designed during this time are listed below. The work included design, simulation, fabrication, test measurements and debugging of the chips and PCB design using Cadence and Matlab. The circuit design focus is on the following analog, mixed-signal and digital blocks:
Low Power Comparators - Adaptively biased asynchronous track and latch comparators for minimum energy consumption; the circuit consumes near-zero static power. The chip was designed in AMI 0.6 um and had a measured energy of 38 pJ for a pulse rate of 100 kHz.
16 R2R Current Steering Digital-to-Analog Converters (DAC, 8 bits each) - The R2R ladder is built using MOSFETs for current division, where each R is made up of 2 transistors. Combinations of these DACs are used for setting and tuning current and voltage biases for various amplifiers and V-I blocks in the neural implant. The DAC was designed in AMI 0.6 um and was tested for input currents varying from 1 nA to 10 uA, with output current ranges of 0.1 pA - 0.9 nA and 0.6 pA - 9.5 uA respectively.
Widlar Current Source with startup circuit - The current source is used as the current input to the DAC mentioned above. The startup circuit has soft power control for gracefully starting and stopping the current source. The output current of the source can be changed with an external resistance. It was designed in AMI 0.6 um and was tested for current outputs from 1 nA to 100 uA using digital trimming.
On-chip Sine Wave Oscillator - Currently under design for generating a low amplitude (10 mVpp) sine wave in the wireless neural implant to serve as a calibration signal for the V-I and the comparators.
Current Conveyor II - Currently under design. The CCII would be used for recording current from the electrodes in the neural implants. Used together with the silicon integrate and fire neurons, it would replace the conventional approach of using voltage amplifiers as front ends in the neural implants. The range is 1 pA - 200 nA.
Introduction to Microprocessors, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, Gujarat, India. Jan 2005 – May 2005
Introduction to Information and Communication Technology, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, Gujarat, India. Aug 2004 – Dec 2004
Circuits II (EEL 3112), Electrical and Computer Engineering Dept., University of Florida, Aug 2006 to Dec 2006
Circuits II (EEL 3112), Electrical and Computer Engineering Dept., University of Florida, Jan 2007 to May 2007
Analytical Methods (EEL 3105), Electrical and Computer Engineering Department, University of Florida May 2007 to Aug 2007
Programming in ECE (EEL 3834), Electrical and Computer Engineering Department, University of Florida Nov 2011
Manu Rastogi. Design Optimizations of Spiking Hardware Neurons. Ph.D. Thesis, University of Florida, 2012. [link]
Manu Rastogi. An ECG Analysis System. B.Tech Thesis, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India, 2005. (Download)
Manu Rastogi, Vaibhav Garg, John G. Harris. Low power integrate and fire circuit for data conversion. In: 2009 IEEE International Symposium on Circuits and Systems. IEEE; 2009:2669-2672. [link]
Sheng Feng Yen, Jie Xu, Manu Rastogi, et al. An integrated recording system using an asynchronous pulse representation. In: 2009 4th International IEEE/EMBS Conference on Neural Engineering. IEEE; 2009:399-402. [link]
Sheng Feng Yen, Jie Xu, Manu Rastogi, et al. A biphasic integrate-and-fire system. In: 2009 IEEE International Symposium on Circuits and Systems. IEEE; 2009:657-660. [link]
John G. Harris, Manu Rastogi, Alexander Singh-Alvarado, et al. Real time signal reconstruction from spikes on a digital signal processor. In: 2008 IEEE International Symposium on Circuits and Systems. IEEE; 2008:1060-1063. [link]
Manu Rastogi, Dipankar Nagchoudhuri, Chetan Parikh. Quadratic Phase Coupling in ECG Signals. In: 2005 Asian Conference on Sensors and the International Conference on New Techniques in Pharmaceutical and Biomedical Research. IEEE; 2005:74-77. [link]
Manu Rastogi, Alexander Singh Alvarado, John G. Harris, JC Principe. Integrate and Fire Circuit as an ADC replacement. In: 2011 IEEE International Symposium on Circuits and Systems. [link] Best Paper Award given by the Neural Systems and Applications Technical Committee (NSATC).
Alexander Singh Alvarado, Manu Rastogi, John G. Harris, JC Principe. The Integrate-and-Fire Sampler: A Special Type of Asynchronous Σ-Δ Modulator. In: 2011 IEEE International Symposium on Circuits and Systems. [link]
Quadratic Phase Coupling in ECG signals for detection of Arrhythmia, Asian Conference on Sensors and the International Conference on New Techniques in Pharmaceutical and Biomedical Research, organized by IEEE and Mara University of Technology, Malaysia, 2005. [pdf]
Real time signal reconstruction from spikes on a digital signal processor, IEEE International Symposium on Circuits and Systems, Seattle, 2008. [pdf]
I&F Circuit for speech encoding. Presented to the panel on NSF-Partnership for Innovation Program headed by University of Florida, Dean College of Engineering, 2008.
Low Power Integrate and Fire Circuit for Data Conversion, IEEE International Symposium on Circuits and Systems, Taiwan, 2009.
Low Power Circuits and Models for Asynchronous Pulse Representation of Continuous Signals. Circuit Modeling Group, Intel Design and Technology Solutions Organization, Portland, Oregon, 2011.
The Integrate-and-Fire Sampler: a Special Type of Asynchronous Sigma-Delta Modulator, IEEE International Symposium on Circuits and Systems, Brazil, 2011.
The Integrate-and-Fire Circuit as an ADC replacement, IEEE International Symposium on Circuits and Systems, Brazil, 2011. [pdf]
Design Optimizations for Spiking Hardware Neurons, ASIC Design, Corporate R&D, Qualcomm Research, San Diego, 2011 [pdf]
Invited Tutorial on Power Reduction Techniques for Mobile Chips at IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference.
V. Rangan, W. H. Constable, X. Wang, and M. Rastogi, “Event-based down sampling,” US9883122B2, 2018.
X. Wang, M. Rastogi, V. Rangan, and W. H. Constable, “Event-based spatial transformation,” US9846677B2, 2017.
J. Kan, M. Rastogi, K. Lee, and S. H. Kang, “Comparator including a magnetic tunnel junction (MTJ) device and a transistor,” US9813049B2, 2017.