Held in conjunction with CVPR 2023 (Vancouver), June 19 (Full Day), 2023
Main Theme: Embedded Neural Architecture
Organized by: Tse-Wei Chen, Branislav Kisacanin, Marius Leordeanu, Ahmed Nabil Belbachir
| Vancouver Time 6/19 (GMT-7) | Session | Speaker | Topic |
| --- | --- | --- | --- |
| 8:30 | EVW Committee | Welcome notes | |
| 8:50 | Invited Talk #1 | Lars Ebbesson | The role of AI-powered machine vision for sustainable aquaculture and food security: opportunities and challenges |
| 9:30 | Invited Talk #2 | Jose Alvarez | Optimizing large deep models for real-time inference |
| 10:10 | Break | | |
| 10:30 | Paper ID #2 | Yong-Sheng Chen | ES³Net: Accurate and Efficient Edge-based Self-Supervised Stereo Matching Network |
| 10:50 | Paper ID #4 | Jef Plochaet | Hardware-Aware Pruning for FPGA Deep Learning Accelerators |
| 11:10 | Invited Talk #3 | Branislav Kisacanin | What new embedded challenges can we expect from AI? |
| 11:50 | Long Break | | |
| 13:10 | Invited Talk #4 | Andreas Wendel | Effective Fault Monitoring in Autonomous Systems |
| 13:40 | Paper ID #5 | Ethan J Goan | Uncertainty in Real-Time Semantic Segmentation on Embedded Systems |
| 14:00 | Paper ID #6 | Manan Suri | Fully-Binarized Distance Computation based On-device Few-Shot Learning for XR applications |
| 14:20 | Invited Talk #5 | René Vidal | Efficient Vision Transformer for Human Pose Estimation via Patch Selection |
| 15:00 | Break | | |
| 15:20 | Invited Talk #6 | Huanrui Yang | Exploring Bit-Level Patterns for Efficient NN Quantization and Deployment |
| 16:00 | Invited Talk #7 (Online) | Mai Xu | Perception-inspired video coding |
| 16:40 | EVW Committee | Closing Remarks | |
Description
Embedded vision is an active field of research, bringing together efficient learning models and fast computer vision and pattern recognition algorithms to tackle the many areas of robotics and intelligent systems that are enjoying impressive growth today. This strong impact comes with many challenges, stemming from the difficulty of understanding complex visual scenes under the tight computational constraints required by real-time solutions on embedded devices. The Embedded Vision Workshop will provide a venue for discussing these challenges by bringing together researchers and practitioners from the fields outlined above. The topic is directly aligned with the interests of the CVPR community.
Invited Speakers

Lars Ebbesson
Title of Talk: The role of AI-powered machine vision for sustainable aquaculture and food security: opportunities and challenges
Abstract: As the world population grows, more sustainable production of healthy food is essential. Food from the ocean, and aquaculture in particular, is targeted to meet this demand. Unfortunately, growth in aquaculture production over the last decades has led to sustainability challenges that threaten continued growth in the sector, including environmental impacts, disease, climate risks and production costs. Real-time awareness of the production (fish biology, environment, and operations) is essential to mitigate these constraints and improve sustainability. In Norway, and globally, the industry has responded to these challenges through the development of, for example, new feed sources, increased circularity and new large closed production systems. Due to the size, investment costs and often remote locations of these systems, there is an increasing demand to monitor production with a wide range of sensors (e.g. cameras, water quality sensors, hydroacoustics) covering the fish, rearing environments, and threats, in order to maximize production efficiency and minimize risks. While monitoring and AI solutions are emerging, fully leveraging advances in machine vision and AI, such as generative AI and transformers, to automate aquaculture processes and operations remains an open challenge. This is essential to overcome real-world challenges, from advanced production systems in Norway to simple systems in Africa. In this talk, I will present examples of application areas for machine vision, such as: a) fish behaviour characterization for efficient feeding and for aggression, stress and disease detection, b) detection of health threats through external imaging for wounds and parasites, c) management of fish developmental states, size distribution, biomass and maturation, d) feed spill, e) infrastructure monitoring and early warning, and f) water characteristics (chemical, physical, biochemical) and microbiome.
The detection and analysis of these parameters are complex, often challenging and interdependent, but will ultimately lead to important breakthroughs in reliable semi-automated/automated systems with reliable management alerts.
Biography: Dr. Lars Ebbesson is a chief scientist at NORCE whose research over the last 30 years has covered many basic and applied aspects of fish biology, integrating neuroscience, endocrinology, physiology and behavior to address how the environment impacts salmon development, smoltification, stress, welfare, robustness, behavior and appetite. In 2006, he formed the Fish Neuroscience Network in Bergen. In 2009, he established the Integrative Fish Biology Group, and in 2016 the Centre for Sustainable Aquaculture Innovations (CSAI), bringing basic biological knowledge into applications in industry. In 2015, together with Nofima, the SFI-CtrlAQUA was established, where he led the Dept of Fish Production and Welfare until the end of 2018, when his focus turned towards the digitalization of fish biology to facilitate more sustainable aquaculture operational solutions. In 2018, the H2020 innovation action project iFishIENCi was funded, where he is the Science and Technology Manager. The project is developing innovations that will provide reliable real-time environmental and biological data online, working together with SMEs, technology providers and fish farmers on real-time systems such as Fish-Talk-To-Me, the intelligent Biology Online Steering System (iBOSS), and SMARTRAS. A strategic focus and strength at NORCE is its ability to support the aquaculture industry in its digital transformation by providing competences from all departments: Environment, Technology, Climate, Health, Society and Energy. Further, Digital Fish, the integration of biology and environment with technology to provide real-time understanding of fish physiology and behavior through sensors, cameras, and tracking systems, will provide game-changing information on the biology of fish, whether related to the behavior and migration of wild fish or to aquaculture systems.

Mai Xu
Title of Talk: Perception-inspired video coding
Abstract: Recently, along with the explosion of multimedia content, visual communications have become increasingly prominent in communication networks, affecting the daily life of billions of citizens and millions of businesses around the world. The amount of data over networks is expected to grow almost 40-fold in the next five years. Given the limited spectrum, video applications have hit a bandwidth bottleneck. Pioneering research on delivering only the content humans actually perceive is relieving this bottleneck through perceptual compression and coding, in which artificial intelligence (AI) techniques, such as computer vision and machine learning, have been actively studied.
In this talk, we focus mainly on perception-inspired video compression, which learns from human intelligence to remove the perceptual redundancy of video data. Specifically, the talk first presents our work on data-driven saliency detection, which can be used to expose perceptual redundancy in video. Based on saliency detection, we then discuss our approaches to perception-inspired video compression, which dramatically remove redundancy so that both bit-rate and complexity can be significantly reduced without any degradation in quality of experience (QoE). Finally, we briefly introduce our latest work on panoramic video (also called 360-degree video) compression, which improves rate-distortion performance by predicting viewports of panoramic video.
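The saliency-driven idea above can be sketched in a few lines. The following is a hypothetical illustration (the function and its parameters are invented for this sketch, not the speaker's actual codec): blocks judged more salient receive a smaller quantization step, i.e. more bits.

```python
# Hypothetical sketch of saliency-driven bit allocation (not the talk's
# actual method): blocks with higher saliency get a finer quantization
# step, so perceptually important regions keep more bits.

def allocate_qsteps(saliency, base_qstep=16.0, min_qstep=4.0):
    """Map per-block saliency scores in [0, 1] to quantization steps.

    Higher saliency -> smaller step -> more bits spent on that block.
    """
    return [max(min_qstep, base_qstep * (1.0 - 0.75 * s)) for s in saliency]

# Toy example: three low-saliency background blocks and one salient block.
saliency = [0.1, 0.2, 0.9, 0.05]
qsteps = allocate_qsteps(saliency)
```

In a real encoder the step sizes would feed into a rate-distortion loop; here they simply show how a saliency map can steer bit allocation.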
Biography: Mai Xu is a full professor in the School of Electronic Information Engineering, Beihang University. He is a Changjiang Distinguished Professor and Deputy Director of the Youth Scholar Committee of the Chinese Society of Image and Graphics. His research interests include video compression and image processing. In the past five years, he has published more than 100 papers in prestigious journals such as IJCV, IEEE TPAMI, TIP, JSAC, and TMM, and in leading conferences such as IEEE CVPR, ICCV, ECCV, ACM MM, AAAI, and DCC. Many of these papers were selected as ESI highly cited papers/highlight papers. He served as the leading guest editor of the IEEE Journal of Selected Topics in Signal Processing and serves as an Associate Editor of IEEE TIP and TMM. He received the outstanding AE award of IEEE TMM twice (2021 and 2022). As PI, he is supported by many projects, e.g., the Excellent Young Scholar Funding of the National Natural Science Foundation of China and the Distinguished Young Scholar Funding of the National Natural Science Foundation of Beijing.

René Vidal
Title of Talk: Efficient Vision Transformer for Human Pose Estimation via Patch Selection
Abstract: While Convolutional Neural Networks (CNNs) have been widely successful in 2D human pose estimation, Vision Transformers (ViTs) have emerged as a promising alternative to CNNs, boosting state-of-the-art performance. However, the quadratic computational complexity of ViTs has limited their applicability for processing high-resolution images and long videos. In this work, we propose three methods for reducing ViT's computational complexity based on selecting and processing a small number of the most informative patches while disregarding others. The first two methods leverage a lightweight pose estimation network to guide the patch selection process, while the third method utilizes a set of joint tokens to ensure that the selected patches contain the most important information about body joints. Experiments across six benchmarks show that our proposed methods achieve a significant reduction in computational complexity, ranging from 30% to 44%, with only a minimal drop in accuracy of between 0% and 3.5%.
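As a rough illustration of the patch-selection idea, the sketch below (all names and the scoring scheme are assumptions for illustration, not the talk's actual method) ranks patches with scores from a cheap auxiliary network and forwards only the top fraction to the expensive ViT:

```python
# Hedged sketch of top-k patch selection for an efficient ViT. The
# `scores` are assumed to come from a lightweight pose network; here
# they are just given numbers.

def select_patches(patches, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of patches.

    Returns the kept patches and their original indices, preserving
    spatial order so positional embeddings in the ViT stay meaningful.
    """
    k = max(1, int(len(patches) * keep_ratio))
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore original spatial order
    return [patches[i] for i in kept], kept

# Toy example with six patches; only the three most informative survive.
patches = ["p0", "p1", "p2", "p3", "p4", "p5"]
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
kept_patches, kept_idx = select_patches(patches, scores, keep_ratio=0.5)
```

Since transformer cost grows quadratically in the number of tokens, halving the patches roughly quarters the attention cost, which is where the reported complexity savings come from.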
Biography: René Vidal is the Penn Integrates Knowledge and Rachleff University Professor of Electrical and Systems Engineering & Radiology and the Director of the Center for Innovation in Data Engineering and Science (IDEAS) at the University of Pennsylvania. He also directs the NSF-Simons Collaboration on the Mathematical Foundations of Deep Learning and the NSF TRIPODS Institute on the Foundations of Graph and Deep Learning. He is also an Amazon Scholar, Affiliated Chief Scientist at NORCE, and Associate Editor in Chief of TPAMI. His current research focuses on the foundations of deep learning and trustworthy AI and its applications in computer vision and biomedical data science. He is an ACM Fellow, AIMBE Fellow, IEEE Fellow, IAPR Fellow and Sloan Fellow, and has received numerous awards for his work, including the IEEE Edward J. McCluskey Technical Achievement Award, D’Alembert Faculty Award, J.K. Aggarwal Prize, ONR Young Investigator Award, NSF CAREER Award as well as best paper awards in machine learning, computer vision, controls, and medical robotics.

Andreas Wendel
Title of Talk: Effective Fault Monitoring in Autonomous Systems
Abstract: Fault monitoring is an essential component of real-time, safety-critical systems such as autonomous vehicles. In complex AI pipelines that include sensors, perception software, and embedded vision hardware, many parts can fail undetected and unmitigated without the right monitoring. This talk gives an overview of Kodiak’s thorough fault monitoring and redundancy approach, system safety techniques, and areas where additional research is useful.
Biography: Andreas Wendel is the Chief Technology Officer at Kodiak. As one of Kodiak’s founding engineers, Andy has built the company’s engineering team from the ground up. Prior to joining Kodiak, Andy was the Perception Tech Lead at Waymo, where he was part of the small team that launched the first driverless car on public roads. Prior to joining Waymo, Andy earned a PhD from Graz University of Technology in Austria, where he headed the Aerial Vision Group and lectured at the Institute for Computer Graphics and Vision. Andy has received multiple recognitions for his outstanding work, including being named Austria’s Innovator of the Year.

Jose Alvarez
Title of Talk: Optimizing large deep models for real-time inference
Abstract: Hardware resources are very limited when deploying deep neural networks for real-time applications such as autonomous driving. In these cases, the overall goal is to maximize the accuracy of neural network models while meeting the memory and latency constraints required for inference. A common approach is to search for optimal architectures, train them, and then apply light optimization to fit the model to the desired hardware. In this talk, we present a "learn large, inference light" paradigm, where we start with very large models to maximize accuracy and then compress them very aggressively to meet the hardware limitations. We will walk through the pipeline, highlighting the main modules, including lossless model compression, pruning, and distillation, among others. As we will see, this paradigm also applies to modern transformer architectures.
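One of the compression steps mentioned, pruning, can be illustrated with a minimal magnitude-pruning sketch (a generic textbook version, not the pipeline presented in the talk): the smallest-magnitude weights are zeroed until a target sparsity is met.

```python
# Generic magnitude pruning sketch: zero out the `sparsity` fraction of
# weights with the smallest absolute value. Real pipelines prune
# structured groups and fine-tune afterwards to recover accuracy.

def magnitude_prune(weights, sparsity=0.5):
    """Return a copy of `weights` with the smallest-|w| entries zeroed."""
    n_prune = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

# Toy weight vector; half of the entries are removed.
w = [0.05, -1.2, 0.7, -0.01, 0.3, -0.6]
pruned = magnitude_prune(w, sparsity=0.5)
```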
Biography: Jose Alvarez leads a perception for autonomous driving research team. The team focuses on scaling up resource-constraint deep learning for autonomous driving, spanning scene understanding, 3D computer vision, self-supervised learning, data-efficient algorithms, and the efficiency of end-to-end perception models.
Before NVIDIA, Jose Alvarez held research positions at TRI, NICTA / CSIRO (Australia), and a postdoctoral research position at NYU under Prof. Yann LeCun.

Huanrui Yang
Title of Talk: Exploring bit-level patterns for efficient NN quantization and deployment
Abstract: DNN models are stored and computed in "bits" on hardware devices, yet there has been little research into the bit-level patterns of DNN models on hardware. In this talk, we start our exploration with bit-level patterns in the weights of fixed-point quantized NN models. We link structural bit-level sparsity to the quantization precision of DNN models, and propose bit-level training and regularization methods to dynamically train a DNN model into an efficient mixed-precision quantization format. We further explore the potential benefit of designing specific bit-level patterns in quantized DNN models for specialized hardware devices, using the emerging ReRAM-based accelerator as a case study.
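The bit-level view described above can be made concrete with a small sketch. Assuming unsigned 8-bit fixed-point weights (an assumption for illustration; real models use signed formats and per-layer scales), each weight is split into bit-planes and sparsity is measured per plane; all-zero high-order planes suggest the tensor fits in a lower precision:

```python
# Illustrative bit-plane decomposition of fixed-point weights.
# Assumption: unsigned 8-bit integers; the talk's actual methods train
# and regularize these patterns rather than just measuring them.

def bit_planes(weights, n_bits=8):
    """planes[b][i] is bit b (LSB first) of weights[i]."""
    return [[(w >> b) & 1 for w in weights] for b in range(n_bits)]

def bit_sparsity(weights, n_bits=8):
    """Fraction of zero bits in each plane, LSB first."""
    planes = bit_planes(weights, n_bits)
    return [1.0 - sum(p) / len(p) for p in planes]

# Toy 8-bit weights: every value fits in 4 bits, so the four
# high-order bit-planes are entirely zero.
w = [3, 12, 7, 0]
sparsity = bit_sparsity(w)
```

Structured sparsity in the high-order planes is exactly the signal that lets a tensor be stored and computed at reduced precision on hardware.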
Biography: Huanrui Yang is a Postdoctoral Researcher in the EECS department of UC Berkeley and Berkeley AI Research, supervised by Prof. Kurt Keutzer. His research interest lies in improving the efficiency and robustness of deep neural network models, with a particular focus on state-of-the-art model architectures (e.g. Transformers, Diffusion, etc.) and emerging hardware devices (e.g. ReRAM). Before joining Berkeley, Huanrui obtained his Ph.D. in the Department of Electrical and Computer Engineering at Duke University, under the supervision of Prof. Hai Li and Prof. Yiran Chen.

Branislav Kisačanin
Title of Talk: What new embedded challenges can we expect from AI?
Abstract: In this talk we will review some of the latest scientific discoveries using AI and infer where to expect great new applications of AI in the very near future.
Biography: Dr. Branislav Kisačanin (SM IEEE) is a Senior Architect at Nvidia and one of the founders of the Institute for AI R&D of Serbia. He received his BSEE degree from the University of Novi Sad and a PhD in EECS from the University of Illinois. Branislav works at Nvidia on low-power computer perception for autonomous vehicles. He has published five scientific books, served as a guest editor of six special issues of top computer vision journals, and received nine US and EU patents. In his spare time he prepares high school students for math and physics olympiads through the AwesomeMath Academy and has published four books for math and physics competitors. On three occasions Branislav served as a judge at ISEF.
Important Dates
Paper submission: March 15, 2023 (extended from March 9, 2023)
Demo abstract submission: March 15, 2023 (extended from March 9, 2023)
Notification to the authors: April 1, 2023
Camera-ready paper: April 8, 2023
Please refer to the Submission page for details.
CMT Submission website:
https://cmt3.research.microsoft.com/EVW2023
Topics
- Lightweight and efficient computer vision algorithms for embedded systems
- Hardware dedicated to embedded vision systems (GPUs, FPGAs, DSPs, etc.)
- Software platforms for embedded vision systems
- Neuromorphic computing
- Applications of embedded vision systems in general domains: UAVs (industrial, mobile and consumer), advanced assistance systems and autonomous navigation frameworks, augmented and virtual reality, and robotics
- New trends and challenges in embedded visual processing
- Analysis of vision problems specific to embedded systems
- Analysis of embedded systems issues specific to computer vision
- Biologically-inspired vision and embedded systems
- Hardware and software enhancements that impact vision applications
- Performance metrics for evaluating embedded systems
- Hybrid embedded systems combining vision and other sensor modalities
- Embedded vision systems applied to new domains
Committee

Program Chair:
Branislav Kisacanin, NVIDIA (US) and Institute for AI R&D (Serbia)

Publication Chair:
Tse-Wei Chen, Canon Inc. (Japan)

General Chairs:
Marius Leordeanu, University Politehnica Bucharest (Romania)
Ahmed Nabil Belbachir, NORCE Norwegian Research Centre (Norway)
Sponsors

BEST PAPER AWARD sponsored by Nvidia
Steering Committee:
Marilyn Claire Wolf, University of Nebraska-Lincoln
Martin Humenberger, NAVER LABS Europe
Roland Brockers, Jet Propulsion Laboratory
Swarup Medasani, MathWorks
Stefano Mattoccia, University of Bologna
Jagadeesh Sankaran, Nvidia
Goksel Dedeoglu, Perceptonic
Margrit Gelautz, Vienna University of Technology
Branislav Kisacanin, Nvidia
Sek Chai, Latent AI
Zoran Nikolic, Nvidia
Ravi Satzoda, Nauto
Stephan Weiss, University of Klagenfurt
Program Committee:
Alina Marcu, University Politehnica of Bucharest
Antonio Haro, eBay
Arun Visweswaraiah, Nvidia
Branislav Kisacanin, Nvidia
Burak Ozer, Verificon Corporation
Dongchao Wen, Inspur Electronic Information Industry Co., Ltd.
Faycal Bensaali, Qatar University
Florin Condrea, Institute of Mathematics of the Romanian Academy
Linda Wills, Georgia Institute of Technology
Martin Kampel, Vienna University of Technology, Computer Vision Lab
Matteo Poggi, University of Bologna
Mihai Cristian Pîrvu, University Politehnica of Bucharest
