Convolutional Neural Networks (CNNs) are compute-intensive models that can strain the capabilities of the embedded devices executing them. Hardware accelerators support edge-computing environments by providing specialized resources that increase computation efficiency. Nevertheless, they usually reduce flexibility, either offering a limited set of operations or supporting only integer operands of specific bitwidths. An HW-SW co-design strategy is therefore key in this context to synergistically combine CNN optimizations with the underlying HW modules. Pruning and quantization are algorithmic-level transformations that effectively reduce memory requirements and computational complexity, but they can also degrade CNN accuracy. Consequently, they must be employed carefully, directly controlling accuracy degradation during the optimization phase and taking the HW characteristics into account to leverage them effectively and improve efficiency.
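To make the pruning and quantization transformations concrete, the following is a minimal NumPy sketch, not the method from the publications below: magnitude-based pruning followed by symmetric per-tensor integer quantization at a configurable bitwidth. The function names (prune_by_magnitude, quantize_symmetric), the tensor shape, and the sparsity/bitwidth values are illustrative assumptions.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights (illustrative threshold choice)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_symmetric(weights, bitwidth=8):
    """Map float weights to signed integers of the given bitwidth."""
    qmax = 2 ** (bitwidth - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(weights)) / qmax  # per-tensor scale factor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32)
    return q, scale

# Hypothetical 3x3 convolution kernel bank: 16 filters, 8 input channels
w = np.random.randn(16, 8, 3, 3).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.5)
q, scale = quantize_symmetric(w_pruned, bitwidth=8)

# Dequantize and measure how far the compressed weights drift from the
# originals, a proxy for the accuracy degradation the optimization loop
# must keep under control
error = np.abs(w - q.astype(np.float32) * scale).mean()
print(f"mean absolute weight error: {error:.4f}")
```

In an HW-aware flow, the bitwidth would be fixed by the accelerator's supported operand widths rather than chosen freely, and the sparsity level would be tuned against a measured accuracy budget.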
Related Publications
Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
Rios, Marco; Ponzina, Flavio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
2023 | IEEE Transactions on Emerging Topics in Computing

Overflow-free compute memories for edge AI acceleration
Ponzina, Flavio; Rios, Marco Antonio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
2023 | ACM Transactions on Embedded Computing Systems (TECS)