Infusing power into 4 wire smart pixel

11/6/2022

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM

A retention-aware neural acceleration (RANA) framework has been designed, which strengthens DNN accelerators with refresh-optimized eDRAM to save total system energy. RANA includes three techniques from the training, scheduling and architecture levels respectively.

Training Level: A retention-aware training method is proposed to improve eDRAM's tolerable retention time with no accuracy loss. Bit-level retention errors are injected during training, so the network's tolerance to retention failures is improved. A higher tolerable failure rate leads to a longer tolerable retention time, so more refresh can be removed.

Scheduling Level: A system energy consumption model is built in consideration of computing energy, on-chip buffer access energy, refresh energy and off-chip memory access energy. RANA schedules networks in a hybrid computation pattern based on this model: each layer is assigned the computation pattern that costs the lowest energy.

In Evolver, we exploit the inherent sparsity of feature/error maps in DNN training's feedforward and backpropagation passes, and design a bidirectional speculation unit (BSU) to capture runtime sparsity and discard zero-output computation, thus reducing training cost. The feedforward speculation also benefits the execution mode. Since the runtime sparsity causes time-varying workload parallelism that harms performance and efficiency, we design a reconfigurable computing engine (RCE) with an online configuration compiler (OCC) for Evolver, in order to dynamically reconfigure dataflow parallelism to match workload parallelism. In addition, an outlier-skipping scheme is proposed to save unnecessary training for invalid policies under the profiled latency and energy constraints.
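The zero-skipping idea behind the speculation above can be illustrated with a minimal software sketch. This is only an analogue of the technique, not the BSU design itself (which speculates on sparsity at the hardware dataflow level); the function name and granularity are my own assumptions.

```python
import numpy as np

def sparse_forward(W, x):
    """Compute y = W @ x while skipping multiply-accumulates for zero inputs.

    A minimal software analogue of zero-skipping; a BSU-style unit would
    do this speculation in hardware rather than via index gathering.
    """
    nz = np.flatnonzero(x)                          # indices of nonzero activations
    y = W[:, nz] @ x[nz]                            # accumulate only over those columns
    skipped_macs = (x.size - nz.size) * W.shape[0]  # MACs avoided by skipping zeros
    return y, skipped_macs

# ReLU-style input: many activations are exactly zero, as in feature maps.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = np.maximum(rng.standard_normal(8), 0.0)
y, skipped = sparse_forward(W, x)
```

The result matches the dense product exactly; only the work for zero activations is elided, which mirrors why the same speculation helps both the training and execution modes.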
Evolver: A Deep Learning Processor with On-Device Quantization-Voltage-Frequency Tuning

I have designed a deep learning processor (Evolver) with on-device quantization-voltage-frequency (QVF) tuning. Compared with conventional QVF tuning that determines policies offline, Evolver makes optimal customizations for local user scenarios. Evolver contains a reinforcement learning unit (RLU) that searches QVF policies based on direct on-device feedback.

I'm working on energy-efficient architecture design for deep learning. Hope my new papers will come out in the near future.

One of my research interests is architecture design for deep learning. This is an exciting field where fresh ideas come out every day, so I'm collecting works on related topics:

2021: ISSCC, ISCA, MICRO, HPCA, ASPLOS, ICCAD, VLSI, HotChips.
2020: ISSCC, ISCA, MICRO, HPCA, ASPLOS, DAC, FPGA, ICCAD, VLSI, HotChips.
2019: ISSCC, ISCA, MICRO, HPCA, ASPLOS, DAC, FPGA, ICCAD, ASPDAC, VLSI, HotChips, ASSCC.
2018: ISSCC, ISCA, MICRO, HPCA, ASPLOS, DAC, FPGA, ICCAD, DATE, ASPDAC, VLSI, HotChips.
2017: ISSCC, ISCA, MICRO, HPCA, ASPLOS, DAC, FPGA, ICCAD, DATE, VLSI, FCCM, HotChips.
2016: ISSCC, ISCA, MICRO, HPCA, DAC, FPGA, ICCAD, DATE, ASPDAC, VLSI, FPL.

For more information about me and my research, you can go to my homepage. I received my degree from the Institute of Microelectronics, Tsinghua University, and I am now working with Yuan Xie as a postdoctoral researcher at the Electrical and Computer Engineering Department, UCSB.
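The on-device QVF tuning described in the Evolver post above can be sketched as a constrained policy search. The sketch below substitutes exhaustive search for the RLU's reinforcement learning, and the latency/energy/accuracy models are invented stand-ins for on-device measurement; only the outlier-skipping structure (discard policies that violate the profiled budgets before scoring them) follows the text.

```python
import itertools

# Hypothetical closed-form models standing in for on-device profiling;
# none of these formulas or constants come from the Evolver paper.
def latency_ms(bits, volt, freq):
    return 100.0 * bits / 8.0 / freq               # fewer bits, higher clock -> faster

def energy_mj(bits, volt, freq):
    return latency_ms(bits, volt, freq) * volt ** 2 * freq   # ~ C * V^2 * f * time

def accuracy_proxy(bits):
    return 1.0 - 0.5 ** bits                       # more bits -> better accuracy

def search_qvf(lat_budget, en_budget):
    """Exhaustive stand-in for RLU policy search with outlier skipping:
    policies violating the profiled budgets are discarded before scoring."""
    best, best_acc = None, -1.0
    for bits, volt, freq in itertools.product([4, 6, 8], [0.6, 0.8, 1.0], [0.5, 1.0, 1.5]):
        if latency_ms(bits, volt, freq) > lat_budget or energy_mj(bits, volt, freq) > en_budget:
            continue                               # outlier skipping: invalid policy
        acc = accuracy_proxy(bits)
        if acc > best_acc:
            best, best_acc = (bits, volt, freq), acc
    return best

best = search_qvf(150.0, 100.0)
```

Skipping invalid policies matters because, on device, every scored policy costs real training time and energy; discarding outliers up front is what keeps the tuning loop affordable.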
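The training-level technique in the RANA section above (injecting bit-level retention errors so the network learns to tolerate them) can be sketched as follows. The 8-bit width, independent-bit error model, and function name are illustrative assumptions, not RANA's actual fault model.

```python
import numpy as np

def inject_retention_errors(weights_q, fail_rate, rng):
    """Flip each bit of 8-bit quantized weights independently with fail_rate.

    A minimal sketch of bit-level retention-error injection during training;
    the real eDRAM failure distribution depends on retention time.
    """
    w = weights_q.astype(np.uint8)
    flips = rng.random((w.size, 8)) < fail_rate            # one coin per bit
    masks = (flips * (1 << np.arange(8))).sum(axis=1).astype(np.uint8)
    return w ^ masks.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
w_clean = inject_retention_errors(w, 0.0, rng)   # zero failure rate: unchanged
w_noisy = inject_retention_errors(w, 0.05, rng)  # 5% of bits flipped on average
```

In a retention-aware training loop, a step like this would corrupt the stored weights in the forward pass, so gradients push the network toward parameters that survive longer refresh intervals.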
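The scheduling level of RANA described above picks, per layer, the computation pattern with the lowest modeled energy across the four components (compute, buffer access, refresh, off-chip access). A minimal sketch of that selection step, with made-up pattern names and energy numbers:

```python
# The four terms mirror RANA's energy model components; the pattern names
# and values below are illustrative, not from the paper.
def total_energy(e_compute, e_buffer, e_refresh, e_offchip):
    return e_compute + e_buffer + e_refresh + e_offchip

def schedule(layers):
    """Assign each layer the computation pattern with the lowest modeled energy."""
    return {name: min(patterns, key=lambda p: total_energy(*patterns[p]))
            for name, patterns in layers.items()}

layers = {
    # pattern -> (compute, buffer, refresh, off-chip) energy, arbitrary units
    "conv1": {"output-stationary": (5.0, 2.0, 1.0, 4.0),
              "weight-stationary": (5.0, 3.0, 0.5, 2.0)},
    "conv2": {"output-stationary": (8.0, 1.0, 2.0, 1.0),
              "weight-stationary": (8.0, 2.0, 1.5, 3.0)},
}
plan = schedule(layers)
```

Because the minimum is taken per layer, the resulting schedule is naturally hybrid: different layers can land on different patterns, which is exactly the behavior the post describes.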