Resume

stuff I have done over the years

Education

Ph.D, May 2012

Title: "Design optimizations of spiking hardware neurons"

Computational Neuroengineering Lab (CNEL)

Electrical and Computer Engineering, University of Florida

M.S., May 2008

Electrical and Computer Engineering, University of Florida

B.Tech, May 2005

Information and Communication Tech., DA-IICT

Work Experience

Machine Learning Manager, Apple

Machine Learning Manager (Sept 2020 - Oct 2024)

At Apple, I had the privilege of supporting a team of machine learning researchers and scientists within the Vision Pro organization, where we pushed the boundaries of what intelligent spatial computing can achieve. Our focus was on building AI systems that could perceive, understand, and respond to the world in deeply human ways — bridging the gap between the physical and digital.

We developed cutting-edge multi-modal algorithms and transformer models capable of learning from vision, motion, and context. We worked on ultra-low-power AI optimized for on-device performance, hardware-aware neural architecture search (NAS), and contextual understanding frameworks that allowed systems to interpret actions, environments, and intent in real time.

Our explorations also extended into LLM optimizations, temporal convolutional architectures, and both sparse and dense computer vision, all aimed at creating seamless, intelligent experiences that adapt to users and their surroundings.

At its core, my role was about shaping algorithmic architecture that made intelligence feel effortless — powering the kind of immersive, intuitive interactions that define the future of mixed reality.

Principal Engineer, HP Labs, HP CTO Office

Machine Learning at the Edge (May 2019 - June 2020)

I was part of a small team of machine learning engineers exploring the feasibility of detecting emotions from video and audio streams. To achieve higher accuracy, we explored joint representation learning from video and audio data. The models were trained on the RAVDESS dataset.

Leading HP-NTU collaboration (May 2019 - June 2020)

I contributed to defining the machine learning accelerator (MLA) research agenda and setting the long-term MLA vision at HP. This involved advising business units across HP, including ventures, on the long-term MLA roadmap, strategy, and product definition. I partnered with product teams to understand their needs and facilitate tech transfer, and provided regular updates on the research direction to the HP CTO and his staff. I also led, guided, and managed a research lab at the HP-NTU research center at Nanyang Technological University (NTU), Singapore, and set its MLA research agenda. Research areas were quantization, compression, model optimization, and network architecture search. (Team Size: 8)

Self-Supervised Deep Learning (May 2019 - June 2020)

Led the self-supervised research at HP Labs. Research involved developing speaker verification models for low-data enrollment regimes. We used contrastive predictive coding (CPC) and mutual information techniques for training the network. The work also involved understanding the effect of noise on the latent space representation. We achieved a 30-40% accuracy improvement over supervised techniques in low-utterance enrollment and began preliminary exploration of this model for speaker diarization.

Staff Engineer/Manager, Qualcomm AI Research

RNN, LSTM-based Network design (Jan 2018 - May 2019)

Designed an ultra-low-complexity network for optimizing power consumption of the mobile GPU and DDR. This involved not just designing the network but also designing custom loss functions. I also explored reinforcement learning algorithms for this problem.

Staff Engineer/Manager, Qualcomm Research

Algorithm/Deep Learning design (Jan 2016 - Oct 2017)

Multi-modal deep learning network design for visual question answering. This included designing a low-complexity deep neural network architecture for scene recognition, along with a recurrent neural network for sentiment analysis. Tools used: Keras/TensorFlow and Python.

Algorithm/Neural Network design and Embedded (Jan 2016 - Oct 2017)

Performance optimizations, including memory bandwidth and computation, for executing a recurrent neural network using LSTM and convolutional layers. The network was designed in Keras for a sample application and then optimized at the micro-kernel level, including GPU optimizations.

Machine Learning Focused Processor Modeling (Nov 2015 - Jan 2016)

Created a cycle-accurate model for a new processor architecture in C++ and Synopsys’ language LISA. The processor model was aimed at accelerating deep learning networks. Optimizations were made on a layer-type basis.

Software Design (Nov 2015 - Jan 2016)

Backend tools regression setup and management using Python and C shell scripts. Abstracted and automated various hardware tools at Qualcomm, such as synthesis, power estimation, gate-level simulation, static timing analysis, and formal verification. The automation was done so that the entire tool chain could be run as a single push-button script. Parsers were developed to determine whether each tool ran successfully and, if it did not, to remove the subsequent jobs from the queue.
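
The fail-fast behavior described above can be sketched roughly as follows. This is a hypothetical illustration rather than the original scripts: the stage callables, the `ERROR` log convention, and the function names are all assumptions.

```python
from collections import deque


def log_indicates_success(log_text):
    """Assumed convention: any log line starting with 'ERROR' means failure."""
    return not any(
        line.strip().startswith("ERROR") for line in log_text.splitlines()
    )


def run_tool_chain(stages):
    """Run (name, run_fn) stages in order; each run_fn returns its log text.

    Stops at the first stage whose log fails the parser check and drops
    every job still waiting in the queue.
    """
    queue = deque(stages)
    result = {"completed": [], "failed": None, "cancelled": []}
    while queue:
        name, run_fn = queue.popleft()
        if log_indicates_success(run_fn()):
            result["completed"].append(name)
        else:
            result["failed"] = name
            result["cancelled"] = [pending for pending, _ in queue]
            queue.clear()
    return result


# Stand-in stages: real stages would invoke the actual tools.
example_stages = [
    ("synthesis", lambda: "INFO: synthesis finished"),
    ("power_estimation", lambda: "ERROR: missing library"),
    ("gate_level_simulation", lambda: "INFO: simulation finished"),
]
example_result = run_tool_chain(example_stages)
```

In this sketch, the failing power estimation stage causes the gate-level simulation job to be dropped without running.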

Senior Engineer, Qualcomm Research

Embedded System for Signal Processing (Oct 2014 - Nov 2015)

Worked on mapping C/C++ compute microkernels (for voice activation and hot word recognition using an MFCC- and GMM-based approach) onto a new hardware accelerator for high performance. This involved parallelizing C++ kernels for optimum use of memory using the underlying ISA and profiling them through simulation. The profiling data was used to characterize the performance of the architecture, both to improve the microarchitecture and to enhance the audio codec and machine learning algorithms.

Signal Processing/Machine Learning (Mar 2014 - Oct 2014)

Novel face recognition feature extraction algorithm development for computer vision applications. The algorithm was developed for a new kind of sparse sampling imager and was tied to a support vector machine based classifier for facial recognition. Since the underlying math for sparse sampling did not exist, we had to develop it, and we filed a few patents related to this. I also had the opportunity to give a demo and presentation to the Qualcomm executive team (CEO, CTO, CFO, and the Executive VPs). Languages used were Python and MATLAB.

Machine Learning (Mar 2014 - Oct 2014)

Led the effort to evaluate the feasibility of RNNs on hardware. Extensively studied various existing recurrent neural networks (RNNs) for handwriting recognition. The study was aimed at understanding various state-of-the-art RNNs, such as fully connected RNNs, Hopfield networks, Elman and Jordan networks, and echo-state networks, with the aim of implementing these networks in hardware. As part of the feasibility study I examined various RNN topologies and tweaked some of the sub-blocks, such as LSTM (long short-term memory), for a favorable hardware implementation while maintaining the ROC curve. The feasibility findings were presented to the leadership at Qualcomm R&D. The language used was Python.

Computational Neuroscience (Oct 2013 - Mar 2014)

Sparse feature learning using multi-layer spiking neural networks. The patterns were sparse music patterns (RIFF codes) embedded in severe noise. These networks were very similar to songbird recognition networks. Developing them also required building a very elaborate stimulus generator in Python. The networks were developed in an in-house network description language.

Statistical Signal Processing and Circuit Design (Oct 2013 - Mar 2014)

Led a team exploring the implementation of inference-based probabilistic algorithms using stochastic hardware. Developed the project proposal and presented it to the Qualcomm R&D leadership. Worked on implementing an annealed Gibbs sampler on a Markov random field for stereo vision using stochastic hardware. Developed Python libraries and simulated transistor-based circuits running at ultra-low supply voltages so that they could act as a stochastic hardware substrate.

ASIC Verification (Jul 2012 - Oct 2013)

My work included coming up with ASIC validation and verification (V&V) methodologies including test plans, verification reviews, and test and script development for neuromorphic (spiking neural networks) architectures implemented on FPGAs. Blocks verified:

a. Dual port and single port memory system architecture with multiple RAM banks and multiple read clients and fair arbitration.

b. Neuron models, dynamical systems that can be best described as large mathematical or DSP filter IIR models.

c. On chip flit routers with asynchronous flow control.

Each block required complete ownership of the verification environment, including developing the test plan, tests, and regression setups. Starting from scratch for each of the blocks, I developed verification agents, coverage classes, tests, scoreboards, sequencers, and checkers. The methodology used was UVM with SystemVerilog and Python.

Ph.D Research Work

Research Assistant, Computational Neuro-Engineering Laboratory (CNEL), University of Florida, Gainesville (May 2007 - May 2012)

Research work focused on developing circuits and networks inspired by computational neuroscience.

As part of my research work I designed and used silicon neurons as time-based data encoders, or asynchronous analog-to-digital converters. One of the applications was neural signal recording, where the action potential is converted into asynchronous spike trains, resulting in significant bandwidth reduction. Over the course of my PhD I designed and taped out about 20 chips with an emphasis on ultra-low-power circuits. I also modeled these irregular samplers and contributed to the theory of benchmarking them.
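
The time-based encoding idea can be illustrated with a minimal software sketch. It is not a model of the fabricated circuits: the function name, the reset-by-subtraction behavior, and the threshold and sample-period values are assumptions chosen for illustration. The input is integrated until it reaches a threshold, a spike time is recorded, and the integrator is reset, so the signal is represented by spike timing rather than by uniformly spaced samples.

```python
def integrate_and_fire_encode(samples, dt, threshold):
    """Return spike times for a non-negative signal sampled every dt seconds.

    Illustrative only: the running integral of the input is compared with
    `threshold`; each crossing emits a spike and subtracts the threshold.
    """
    spike_times = []
    integral = 0.0
    for i, x in enumerate(samples):
        integral += x * dt
        while integral >= threshold:
            spike_times.append((i + 1) * dt)
            integral -= threshold
    return spike_times


# Powers of two keep the floating-point arithmetic exact in this example.
dt = 1.0 / 1024
constant_input = [1.0] * 1024          # one second of a constant signal
spikes = integrate_and_fire_encode(constant_input, dt, threshold=0.125)
# 1024 input samples are represented by 8 evenly spaced spike times.
```

A larger input amplitude fills the integrator faster and yields more closely spaced spikes, while a zero input yields none.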

I also contributed to the development efforts at my lab to build a configurable spiking network architecture. These architectures were used for phoneme recognition using liquid state machines and reservoir computing. One such contribution was developing a real-time system for recording audio data from the sound card and applying gammatone cochlear filters (DSP algorithms) and the Meddis hair cell model in Java and C++.

I was very fortunate to get the opportunity to attend the "Telluride Neuromorphic Engineering Workshop" in 2008. The workshop is invite-only, and only 20-30 participants are selected from around the world. As part of my projects there I worked on the Java-based API jAER to drive a car using the spike inputs from the silicon retina. For this work I was named the "Most promising Neuromorphic researcher of the year".

Circuits Designed

Some of the circuits I designed during this time are listed below. The work included design, simulation, fabrication, test measurements and debugging of the chips and PCB design using Cadence and Matlab. The circuit design focus is on the following analog, mixed-signal and digital blocks:

Low Power Comparators - Adaptively biased asynchronous track-and-latch comparators for minimum energy consumption; the circuit consumes near-zero static power. The chip was designed in AMI 0.6um and had a measured energy of 38 pJ at a pulse rate of 100 kHz.

16 R2R Current Steering Digital-to-Analog Converters (DACs, 8 bits each) - The R2R ladder is built using MOSFETs for current division, where each R is made up of 2 transistors. Combinations of these DACs are used for setting and tuning current and voltage biases for various amplifiers and V-I blocks in the neural implant. The DAC was designed in AMI 0.6um and was tested for input currents varying from 1nA to 10uA and for output current ranges of 0.1pA - 0.9nA and 0.6pA - 9.5uA respectively.

Widlar Current Source with startup circuit - The current source is used as the current input to the DAC mentioned above. The startup circuit has soft power control for gracefully starting and stopping the current source. The output current of the source can be changed with an external resistance. It was designed in AMI 0.6um and was tested for current outputs from 1nA to 100uA using digital trimming.

On-chip Sine Wave Oscillator - Currently under design for generating a low-amplitude (10 mVpp) sine wave in the wireless neural implant, to serve as a calibration signal for the V-I blocks and the comparators.

Current Conveyor II - Currently under design. The CCII would be used for recording current from the electrodes in the neural implants; used with the silicon integrate-and-fire neurons, it would replace the conventional approach of using voltage amplifiers as front ends in neural implants. The range is 1pA-200nA.

Teaching

Introduction to Microprocessors, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, Gujarat, India. Jan 2005 - May 2005

Introduction to Information and Communication Technology, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, Gujarat, India. Aug 2004 - Dec 2004

Circuits II (EEL 3112), Electrical and Computer Engineering Dept., University of Florida, Aug 2006 to Dec 2006

Circuits II (EEL 3112), Electrical and Computer Engineering Dept., University of Florida, Jan 2007 to May 2007

Analytical Methods (EEL 3105), Electrical and Computer Engineering Department, University of Florida May 2007 to Aug 2007

Programming in ECE (EEL 3834), Electrical and Computer Engineering Department, University of Florida Nov 2011

Publications

Theses/Reports

Manu Rastogi. Design Optimizations of Spiking Hardware Neurons. Ph.D. Thesis, University of Florida, 2012. [link]

Manu Rastogi. An ECG Analysis System. B.Tech Thesis, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India, 2005. (Download)

Peer-Reviewed Conferences

Manu Rastogi, Vaibhav Garg, John G. Harris. Low power integrate and fire circuit for data conversion. In: 2009 IEEE International Symposium on Circuits and Systems. IEEE; 2009:2669-2672. [link]

Sheng Feng Yen, Jie Xu, Manu Rastogi, et al. An integrated recording system using an asynchronous pulse representation. In: 2009 4th International IEEE/EMBS Conference on Neural Engineering. IEEE; 2009:399-402. [link]

Sheng Feng Yen, Jie Xu, Manu Rastogi, et al. A biphasic integrate-and-fire system. In: 2009 IEEE International Symposium on Circuits and Systems. IEEE; 2009:657-660. [link]

John G. Harris, Manu Rastogi, Alexander Singh-Alvarado, et al. Real time signal reconstruction from spikes on a digital signal processor. In: 2008 IEEE International Symposium on Circuits and Systems. IEEE; 2008:1060-1063. [link]

Manu Rastogi, Dipankar Nagchoudhuri, Chetan Parikh. Quadratic Phase Coupling in ECG Signals. In: 2005 Asian Conference on Sensors and the International Conference on New Techniques in Pharmaceutical and Biomedical Research. IEEE; 2005:74-77. [link]

Manu Rastogi, Alexander Singh Alvarado, John G. Harris, JC Principe. Integrate and Fire Circuit as an ADC replacement. In: 2011 IEEE International Symposium on Circuits and Systems. [link] Best Paper Award given by the Neural Systems and Applications Technical Committee (NSATC)

Alexander Singh Alvarado, Manu Rastogi, John G. Harris, JC Principe. The Integrate-and-Fire Sampler: A Special Type of Asynchronous Σ-Δ Modulator. In: 2011 IEEE International Symposium on Circuits and Systems. [link]

Presentations and Talks

Quadratic Phase Coupling in ECG signals for detection of Arrhythmia at Sensors and the International Conference on new Techniques in Pharmaceutical and Biomedical Research, Organized by IEEE and Mara University of Technology, Malaysia 2005. [pdf]

Real time signal reconstruction from spikes on a digital signal processor, IEEE International Symposium on Circuits and Systems, Seattle, 2008. [pdf]

I&F Circuit for speech encoding. Presented to the panel on NSF-Partnership for Innovation Program headed by University of Florida, Dean College of Engineering, 2008.

Low Power Integrate and Fire Circuit for Data Conversion, IEEE International Symposium on Circuits and Systems, Taiwan, 2009.

Low Power circuits and models for asynchronous pulse representation of Continuous signals. Circuit Modeling Group, Intel Design and Technology Solutions Organization, Portland, Oregon, 2011.

The Integrate-and-Fire Sampler: a Special Type of Asynchronous Sigma-Delta Modulator, IEEE International Symposium on Circuits and Systems, Brazil, 2011.

The Integrate-and-Fire Circuit as an ADC replacement, IEEE International Symposium on Circuits and Systems, Brazil, 2011. [pdf]

Design Optimizations for Spiking Hardware Neurons, ASIC Design, Corporate R&D, Qualcomm Research, San Diego, 2011 [pdf]

Invited Tutorial on Power Reduction Techniques for Mobile Chips at IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference.

Tutorial on Micro-kernel based hardware acceleration for deep learning. TinyML Webinars, 2020. [pdf] [talk]

Patents

V. Rangan, W. H. Constable, X. Wang, and M. Rastogi, “Event-based down sampling,” US9883122 B2, 2018. [Link]

X. Wang, M. Rastogi, V. Rangan, and W. H. Constable, “Event-based spatial transformation,” US9846677B2, 2017. [Link]

J. Kan, M. Rastogi, K. Lee, and S. H. Kang, “Comparator including a magnetic tunnel junction (MTJ) device and a transistor,” US9813049B2, 2017. [Link]

Wang, X., Constable, W.H., Rangan, V. and Rastogi, M., Qualcomm Inc, 2018. Interfacing an event based system with a frame based processing system. U.S. Patent 10,147,024. [Link]

Xin Wang, Young Cheul Yoon, Manu Rastogi, Event-driven temporal convolution for asynchronous pulse-modulated sampled signals, US 11,551,076 B2 [Link]

M Anthony Lewis, Iyer Amalendu, Manu Rastogi, System and method for disambiguation of Internet-of-Things devices, US 2020/0106632 A1, [Link]

Cammarota, Rosario; Rastogi, Manu, "Multiply-accumulate (mac) operations for convolutional neural networks" [Link]

Rosario Cammarota, Michael Goldfarb, Manu Rastogi, Sarang Ozarde, "Optimizing performance of recurrent neural networks" [Link]

Professional Experience

Technical Program Committee member Sensys-ML 2020

Guest Associate Editor, Journal of Frontiers in Machine Learning at the Edge, 2019-2020

Associate Editor, Journal of Frontiers in Neuromorphic Systems, 2019-2020

Associate Editor, IEEE Circuits and Systems Open Journal

Industrial Board Chair, IEEE ISCAS 2016-2017.

Review Committee Member, IEEE ISCAS 2017.

IEEE ISCAS reviewer since 2006.

NIPS reviewer.

IEEE TNNLS reviewer

IEEE BioCAS reviewer

IEEE TBIOCAS reviewer

Head, Organizing Committee, Cultural Festival in DA-IICT, 2002

Convener, Mess Committee DA-IICT, Aug 2001-Dec 2001.

IEEE member, Student Chapter DA-IICT, Gandhinagar. Involved in organizing various IEEE workshops at DA-IICT.

Represented India at the World Festival of Theatre '96, Copenhagen, Denmark

Member, Student Faculty Joint Committee to formulate Student Constitution for DA-IICT

Member, Student Committee, National Workshop on Challenges in VLSI, held at DA-IICT on 13th-14th May 2005
