The 7th IEEE Workshop on Embedded Computer Vision
June 20, 2011
Crowne Plaza – Colorado Ballroom C
NVIDIA is sponsoring a Tutorial at ECVW 2011, entitled Introduction to Mobile Computer Vision Development with NVIDIA Tegra (Joe Stam).
Registration for Lunch boxes is closed.
Final Program: (pdf)
(Ahmed Nabil Belbachir, general chair)
S1: Keynote Session 1 (08h15 — 09h00)
Keynote: What’s Next in Embedded Vision: Today and Future Technologies,
Mike Piacentino (SRI International Sarnoff, USA)
chair: Ahmed Nabil Belbachir (AIT Austrian Institute of Technology)
This talk explores the development of real-time embedded solutions adaptable to a wide range of application markets, which presents many challenges, from providing sufficient and flexible processing resources to meeting the demands of low power and small size. The talk discusses key future applications such as UAVs, manned vehicles, robotics, and wearable and handheld platforms. I will present Acadia II, a highly integrated, power-efficient processor for embedded vision. Central to its capabilities is its multi-resolution (pyramid) representation for key functions such as stabilization, fusion, and contrast normalization. I will show a novel programming model for the hardware accelerators on Acadia II that creates a flexible processing pipeline. I will close with our efforts toward a next-generation vision processor and its impact on future applications.
S2: Embedded Stereo Vision (09h00 — 10h20)
chair: Csaba Beleznai (AIT Austrian Institute of Technology)
09h00 Near real-time Fast Bilateral Stereo on the GPU, Stefano Mattoccia, Marco Viti, Florian Ries (University of Bologna, Italy)
09h20 Stereo and IMU Assisted Visual Odometry on an OMAP3530 for Small Robots, Steven Goldberg, Larry Matthies (Indelible Systems, USA)
09h40 An optimized Silicon Retina Stereo Matching Algorithm using Time-Space Correlation, Christoph Sulzbachner, Christian Zinner, Juergen Kogler (AIT Austrian Institute of Technology)
10h00 Event-Driven Stereo Vision for Fall Detection, Ahmed Nabil Belbachir, Stephan Schraml, Aneta Nowakowska (AIT Austrian Institute of Technology)
10h20 Morning Break
S3: Mobile Computer Vision (10h30 — 11h10)
chair: Sek Chai (SRI International Sarnoff)
10h40 Low-Power and Efficient Ambient Assistive Care System for Elders, Kofi Appiah, Andrew Hunter, Christopher Waltham (University of Lincoln, UK)
11h00 Energy-efficient Foreground Object Detection on Embedded Smart Cameras by Hardware-level Operations, Mauricio Casares, Paolo Santinelli, Senem Velipasalar, Andrea Patri, Rita Cucchiara (University of Nebraska, USA)
11h20 Rapid Reconstruction of Small Objects on Mobile Phones, Andreas Hartl, Lukas Gruber, Clemens Arth, Stefan Hauswiesner, Dieter Schmalstieg (Graz Technical University, Austria)
11h40 Fast Block Based Local Motion Estimation for Video Stabilization, Giovanni Puglisi, Sebastiano Battiato (Università di Catania, Italy)
12h00 Ego-Motion Compensated Face Detection on a Mobile Device, Bjorn Scheuermann, Arne Ehlers, Hamon Riazy, Florian Baumann, Bodo Rosenhahn (Leibniz University, Germany)
12h20 Lunch Box Distribution
S4: Tutorial Session (12h30 — 14h00)
12h30 Tutorial: Introduction to Mobile Computer Vision Development with NVIDIA Tegra, Joe Stam (NVIDIA, USA)
Chair: Ahmed Nabil Belbachir (AIT Austrian Institute of Technology)
This tutorial provides an overview of the Tegra processor, the Android OS and development environment, and the NVIDIA hardware development kit and software libraries relevant to computer vision, along with demos. The tutorial starts fresh out of the box for those with no prior mobile development experience. By the end of the session, attendees should be ready to start investigating their own mobile vision applications.
S5: Poster Session: Applications (14h00 – 15h00)
chair: Sek Chai (SRI International Sarnoff)
Real-Time License Plate Localisation on FPGA, Xiaojun Zhai, Faycal Bensaali, Soodamani Ramalingam (University of Hertfordshire, UK)
A Real-Time Embedded Solution for Skew Correction in Banknote Analysis, Adnan Rashid, Andrea Prati, Rita Cucchiara (University of Modena, Italy)
Energy-Optimized Mapping of Application to Smartphone Platform – A Case Study of Mobile Face Recognition, Yi-Chu Wang, Kwang-Ting Cheng (University of California, USA)
Robust Airlight Estimation for Haze Removal from a Single Image, Matteo Pedone, Janne Heikkilä (University of Oulu, Finland)
Embedded neuromorphic vision for humanoid robots, Chiara Bartolozzi, Francesco Rea, Michael Hofstaetter, Daniel B. Fasnacht, Charles Clercq, Giorgio Metta, Giacomo Indiveri (Italian Institute of Technology)
Photorealistic 3D Face Modeling on a Smartphone, Won Beom Lee, Man Hee Lee, In Kyu Park (Inha University, Republic of Korea)
Embedded Vision Alliance, Jeff Bier, Jeremy Giddings, Shehrzad Qureshi (BDTI, USA)
S6: Poster Session: Technologies (14h00 – 15h00)
Chair: Brian Lovell (University of Queensland, Australia)
An Optimized Vision Library Approach for Embedded Systems, Goksel Dedeoglu, Branislav Kisacanin, Darnell Moore, Vinay Sharma, Andrew Miller (Texas Instruments Inc.)
A Motion based Real-time Foveation Control Loop for Rapid and Relevant 3D Laser Scanning, Gøril M. Breivik, Jens T. Thielemann, Asbjørn Berge, Øystein Skotheim, Trine Kirkhus (SINTEF, Norway)
Fast Boosting Trees for Classification, Pose Detection, and Boundary Detection on a GPU, Neil Birkbeck, Michal Sofka, S. Kevin Zhou (University of Alberta, Canada)
Acceleration of an Improved Retinex Algorithm, Yuan-Kai Wang, Wen-Bin Huang (Taiwan)
FPGA Implementation of Naive Bayes Classifier for Visual Object Recognition, Hongying Meng, Kofi Appiah, Andrew Hunter, Patrick Dickinson (University of Lincoln, UK)
Efficient reconfigurable entropy coder for embedded multi-standards video adaptation, Nicolas Marques, Hassan Rabah, Eric Dabellani, Serge Weber (Nancy University)
Implementation and evaluation of FAST corner detection on the massively parallel embedded processor MX-G, Yushi Moko, Yoshihiro Watanabe, Takashi Komuro, Masatoshi Ishikawa, Masami Nakajima, Kazutami Arimoto (University of Tokyo, Japan)
15h00 Afternoon Break
S7: Invited Talks: Hardware Adaptation (15h20 — 17h05)
Chair: Rita Cucchiara (University of Modena e Reggio Emilia, Italy)
15h20 NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision, Yann LeCun (talk), Clément Farabet (demo) (New York University, USA) [Co-authors: Berin Martini, Benoit Corda, Polina Akselrod, Eugenio Culurciello]
15h55 Adapting algorithms for hardware implementation, Donald Bailey (Massey University, New Zealand)
16h30 Accelerating Neuromorphic Vision on FPGAs, Vijaykrishnan Narayanan (Pennsylvania State University, USA) [Co-authors: Sungho Park, Srinidhi Kestur, Kevin Irick]
S8: Invited Talks: Surveillance (17h05 — 18h15)
Chair: Ahmed Nabil Belbachir (AIT Austrian Institute of Technology)
17h05 Embedded Face and Biometric Technologies for National and Border Security, Brian Lovell (University of Queensland, Australia) [Co-authors: Abbas Bigdeli, Sandra Mau]
17h40 Pedestrian Detection using GPU-accelerated Multiple Cue Computation, Csaba Beleznai (AIT Austrian Institute of Technology) [Co-authors: David Schreiber, Michael Rauter]
17h45 Paper Award & Closing Remarks
Original Call for Papers
Submissions are now closed.
Ahmed Nabil Belbachir, AIT Austrian Institute of Technology
Abbes Amira, University of Ulster
Andrew Hunter, University of Lincoln, UK
Nikolaos Bellas, University of Thessaly, Greece
Sek Chai, SRI International Sarnoff
Branislav Kisačanin, Texas Instruments
Boaz J. Super, Motorola Solutions
Kristian Ambrosch, AIT Austrian Institute of Technology
Senyo Apewokin, Texas Instruments
Kofi Appiah, University of Lincoln, UK
Sebastiano Battiato, Università di Catania
Faycal Bensaali, University of Hertfordshire, UK
Shuvra Bhattacharyya, University of Maryland
Rita Cucchiara, University of Modena e Reggio Emilia, Italy
Goksel Dedeoglu, Texas Instruments
Khanh Duc, Nvidia
Antonio Gentile, University of Palermo, Italy
Antonio Haro, Nokia Research Center
Rongrong Ji, Harbin Institute of Technology
Kevin Koeser, ETH Zurich
Ajay Kumar, IIT Delhi, India
Abelardo Lopez-Lagunas, ITESM-Toluca, Mexico
Jiebo Luo, Kodak
Roberto Manduchi, University of California, Santa Cruz
Steve Mann, U. Toronto, Canada
Hongying Meng, University of Lincoln
Vittorio Murino, University of Verona
Rajesh Narasimha, Texas Instruments
Burak Ozer, Verificon Corporation
Johnny Park, Purdue University
Hassan Rabah, Nancy University
Bernhard Rinner, Klagenfurt University, Austria
Vinay Sharma, Texas Instruments
Azhar Sufi, Sarnoff Corporation / SRI International
Peter Venetianer, Object Video
Salvatore Vitabile, University of Palermo, Italy
Linda Wills, Georgia Institute of Technology
Marilyn Wolf, Georgia Institute of Technology
Feng Xiao, Fairchild Imaging
Ruigang Yang, University of Kentucky, USA
Ming Yang, NEC Labs
What’s Next in Embedded Vision: Today and Future Technologies
Michael Piacentino (SRI International Sarnoff)
Summary of the Talk:
Developing real-time embedded solutions adaptable to a wide range of application markets presents many challenges that range from providing sufficient and flexible processing resources to meeting the demands of low power and small size. In this talk, I will discuss key future market applications such as UAVs, manned vehicles, robotics, and wearable and handheld platforms. I will present Acadia II, a highly integrated, power-efficient processor for embedded vision. Central to its capabilities is its multi-resolution (pyramid) representation for key functions such as stabilization, fusion, and contrast normalization. I will also show a novel programming model for the hardware accelerators on Acadia II that creates a flexible processing pipeline. I will close with our efforts toward a next-generation vision processor and its impact on future applications.
Michael Piacentino, Technical Director of the Vision System organization at SRI International, received his B.S. in EE from Manhattan College and his M.S.E.E. in 1995 from California Polytechnic University, Pomona, CA. His specialized professional competence includes hardware system and chip design for video processing algorithms, with emphasis on state-of-the-art, low-power ASICs and electronics systems, video processing design, and algorithm implementation. For the past 7 years Mike has been the principal investigator on numerous vehicle- and soldier-based day/night situational awareness programs. Each of these programs required unique vision processing architectures to meet the client objectives. Prior to joining Sarnoff, he was a lead designer at General Dynamics, Hughes Missile Systems and Raytheon Corporation from 1986 to 1995. He has authored or co-authored more than a dozen peer-reviewed technical papers and holds five US patents relating to vision system hardware.
Title: Convolutional Networks for Low-Power, Real-Time Vision.
Presenter: Professor Yann LeCun and Clément Farabet
Summary of the Talk:
Micro-robots, UAVs, imaging sensor networks, wireless phones, and other embedded vision systems all require low-cost and high-speed implementations of vision systems capable of recognizing and categorizing objects in a scene. In this talk, we present the latest advances in convolutional networks, their applications, the theory behind them, and how they can help solve some current vision problems. We also present a scalable hardware architecture to implement general-purpose vision systems based on convolutional networks. The system is a dataflow processor highly optimized for local image-processing and vision tasks, yet completely runtime reprogrammable. It was designed with the goal of providing a high-throughput vision engine while consuming little power (10W for an FPGA implementation, 5W and down to 0.5W for different ASIC implementations). We present performance comparisons between software versions of the vision system executing on CPU and GPU machines, and show that our FPGA implementation can outperform these standard computing platforms. A number of applications will be shown through live demos, including a category-level object recognition system that can be trained online, and a complete street scene parser that segments and categorizes videos into several classes (cars, buildings, road, sky, pedestrians, doors, …).
Yann LeCun is Silver Professor of Computer Science and Neural Science at the Courant Institute of Mathematical Sciences and at the Center for Neural Science of New York University. He received an Electrical Engineer Diploma from Ecole Supérieure d’Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU in 2003, after a brief period as Fellow of the NEC Research Institute in Princeton. His current interests include machine learning, computer vision, pattern recognition, mobile robotics, and computational neuroscience. He has published over 150 technical papers on these topics as well as on neural networks, handwriting recognition, image processing and compression, and VLSI design. His handwriting recognition technology is used by several banks around the world to read checks. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to distribute and access scanned documents on the Web, and his image recognition technique, called Convolutional Network, has been deployed by companies such as Google, Microsoft, NEC, France Telecom and several startup companies for document recognition, human-computer interaction, image indexing, and video analytics. He has been on the editorial board of IJCV, IEEE PAMI, IEEE Trans on Neural Networks, was program chair of CVPR’06, and is chair of the annual Learning Workshop. He is on the science advisory board of Institute for Pure and Applied Mathematics, and is the co-founder of MuseAmi, a music technology company.
Clément Farabet received a Master’s Degree in Electrical Engineering with honors from Institut National des Sciences Appliquées (INSA) de Lyon, France in 2008. His Master’s thesis work was developed at the Courant Institute of Mathematical Sciences of New York University with Professor Yann LeCun. He then joined Professor Yann LeCun’s laboratory in 2008 as a research scientist. In 2009, he started collaborating with Yale University’s e-Lab, led by Professor Eugenio Culurciello. In 2010, he started the PhD program at Université Paris-Est, with Professors Michel Couprie and Laurent Najman, in parallel with his research work at Yale and NYU. His research interests include intelligent hardware, embedded super-computers, computer vision, machine learning, embedded robotics, sensor fusion, and more broadly artificial intelligence. His current work aims at developing a massively parallel yet low-power processor for general-purpose vision. Algorithmically, most of this work is based on Prof Yann LeCun’s Convolutional Networks, while the hardware has its roots in dataflow computers and architectures as they first appeared in the 1960s.
Title: Selected Computer Vision Applications using Embedded and Parallel Processing
Presenter: Dr. Csaba Beleznai
Summary of the Talk:
In this talk I will present selected application examples relying on embedded and parallel processing. The presented examples target challenging vision problems such as video analytics, industrial quality inspection in dynamic environments and detection and recognition of deformable objects in presence of clutter. The talk will provide insights into the employed algorithmic concepts, present relevant embedded/parallel implementation details and demonstrate numerous results of the achieved real-time vision systems.
Csaba Beleznai received his M.S. degree from the Technical University of Ilmenau (Germany) in electrical engineering in 1994. He received his Ph.D. degree in physics from the Claude Bernard University, Lyon (France) in 1999. C. Beleznai joined the research center “Advanced Computer Vision” in 2000 and coordinated research activities in the area of “Surveillance and tracking”. Currently he is Senior Scientist at the Austrian Institute of Technology (AIT), where he scientifically coordinates the applied research center Embedded Computer Vision (ECV), which is a task-oriented research project aiming at the joint development of computer vision algorithms and embedded hardware concepts for applications in demanding industrial context. His research interests include visual surveillance, structural models and statistical methods in computer vision.
Title: Introduction to Mobile Computer Vision Development with NVIDIA Tegra
Presenter: Joe Stam, NVIDIA
Summary of the Talk:
The obvious explosion both in the sales volume and computational power of mobile devices ushers in an exciting era for computer vision and imaging applications. Approximately 1 billion camera phones will be sold this year, with some containing multiple imagers. NVIDIA Tegra Application Processors contain powerful multi-core ARM CPUs and GPUs which provide sufficient horsepower to enable many applications previously only conceivable on desktop machines. Developing for mobile devices can seem intimidating at first. A new unfamiliar tool-chain, new operating system, and new hardware architectures present a learning curve to those only familiar with desktop development. This tutorial seeks to ease this transition, and open up the exciting prospects of mobile computing to all computer vision developers. We’ll provide an overview of the Tegra processor, the Android OS, the Android development environment and the NVIDIA hardware development kit and software libraries relevant to computer vision along with some exciting demos of Tegra in action. The tutorial will start fresh out-of-the-box to be accessible to those with no prior mobile development experience. By the end of the session, the attendee should be ready to start investigating mobile vision applications of their own.
Joseph Stam joined NVIDIA in 2007 and is currently spearheading efforts for mobile computer vision applications using NVIDIA’s Tegra processors. He’s also worked on imaging applications in the automotive and professional film and broadcast markets. Prior to joining NVIDIA, Joe worked in the automotive industry for 12 years on research and development of imaging hardware and computer vision algorithms for vehicle based vision products. Joe received a B.S. degree in Engineering Physics & Computer Science from Hope College in Holland, Michigan and an M.S. degree in Electrical Engineering from Michigan State University. He is an inventor on over 80 U.S. patents and several foreign patents, many of which relate to computer vision software and imaging hardware technologies.
Title: Adapting algorithms for hardware implementation
Presenter: Donald Bailey
Summary of the Talk:
Embedded vision often requires balancing the computation and power requirements. Hardware implementation of the vision algorithm using an FPGA enables parallelism to be exploited, allowing clock speeds to be significantly reduced. However, simply porting software algorithms usually gives disappointing performance. Software algorithms are usually optimised for serial implementation. An efficient FPGA implementation requires transforming the algorithm to make better use of parallelism. Several transformations are illustrated using connected components analysis.
Donald Bailey is currently an Associate Professor in the School of Engineering and Advanced Technology at Massey University, in Palmerston North, New Zealand. He received the B.E. (Hons) degree in Electrical Engineering in 1982, and the PhD degree in Electrical and Electronic Engineering from the University of Canterbury, New Zealand in 1985. From 1985 to 1987, he applied image analysis to the wool and paper industries within New Zealand. From 1987 to 1989 he was a Visiting Research Engineer at University of California at Santa Barbara. He joined Massey University in Palmerston North, New Zealand as Director of the Image Analysis Unit at the end of 1989. He is leader of the Image and Signal Processing Research Group. His primary research interests include applications of image analysis, machine vision, and robot vision. One area of particular interest is the application of FPGAs to implementing image processing algorithms. He has published over 200 papers, and is the author of the book “Design for Embedded Vision Using FPGAs”.
Title: Embedded Face and Biometric Technologies for National and Border Security
Presenter: Brian C. Lovell
Summary of the Talk:
The CCTV surveillance industry is undergoing a sea change due to the adoption of IP technologies. This is allowing the integration of a plethora of new cameras and other sensors into huge integrated networks. Adoption of IP technologies is presenting opportunities for scalable visual analytics that add enormous value to entire camera networks. One such technology is scalable robust face search to identify persons of interest in large crowds. Not only are such systems required to work robustly in a wide variety of conditions, they must also be extremely fast and scalable to hundreds, if not thousands, of high-definition camera nodes. Developing and testing such technology is challenging and requires a combination of fast algorithms, distributed databases, mobile platform integration, parallel processing using distributed middleware such as ROS, and GPU acceleration using tools such as CUDA and OpenCL. Moreover, order-of-magnitude speedups are possible when the core recognition algorithms are implemented with FPGAs. In this talk we cover emerging system trends such as super-megapixel cameras, post-incident digital PTZ, integration and fusion of video and non-video sensors, and multimodal remote biometrics including face and iris on the move. Current projects being trialled with airports, ports, rail, and local councils will be outlined in brief. Finally, the recognition results from a formal face recognition trial in early 2011 within one of Asia’s largest international airports will be presented.
Brian C. Lovell was born in Brisbane, Australia in 1960. He received the BE in electrical engineering (Honours I) in 1982, the BSc in computer science in 1983, and the PhD in signal processing in 1991, all from the University of Queensland (UQ). Professor Lovell is Project Leader of the Advanced Surveillance Group in NICTA and Research Leader of the Security and Surveillance Group in the School of ITEE, UQ. He served as President of the International Association for Pattern Recognition 2008-2010, and is a Senior Member of the IEEE, Fellow of the IEAust, and voting member for Australia on the Governing Board of the International Association for Pattern Recognition since 1998. Professor Lovell was Program Co-Chair of ICPR2008 in Tampa, Florida, and is General Co-Chair of ACPR2011 in Beijing, and ICIP2013 in Melbourne. The Advanced Surveillance Group works with port, rail, and airport organizations as well as several national and international agencies to identify and develop technology-based solutions to address real operational and security concerns.
Title: Accelerating Neuromorphic Vision on FPGAs
Presenter: Vijay Narayanan
Summary of the Talk:
Authors: Sungho Park, Srinidhi Kestur, Kevin Irick, N. Vijaykrishnan
Reconfigurable hardware such as FPGAs is being increasingly employed for application acceleration due to its high degree of parallelism, flexibility and power efficiency, factors which are key in the rapidly evolving field of embedded real-time vision. While recent advances in technology have increased the capacity of FPGAs, the lack of standard models for developing custom accelerators creates issues with scalability and compatibility. In this paper, we describe a model for designing Streaming Hardware Accelerators with Run-time Configurability. This model provides a generic interface for each hardware module, a modular and hierarchical structure for parallelism at multiple levels and a run-time reconfiguration framework for increased flexibility.
We present case studies to accelerate sample neuromorphic vision algorithms which are inspired by models of the mammalian visual cortex. These algorithms are extremely compute-intensive and this complexity has hindered adoption into real-time applications. In this work, we describe a modular bottom-up approach which includes building a hardware library for vision and then integrating highly parallel and reconfigurable pipelines using the proposed streaming model. Our implementations can be tuned to trade off performance for power and resource utilization, and also allow quick modifications to the algorithm to support design-space exploration. Experimental results show speedups of several factors over comparable CPU implementations and higher performance-per-watt over relevant GPU implementations.
Vijaykrishnan Narayanan is a Professor of Computer Science and Engineering at The Pennsylvania State University. His interests are in the areas of power-aware and reliable systems, embedded systems, reconfigurable architectures and computer architecture. He has published more than 300 papers in these areas and supervised more than 60 graduate students. He is the Deputy Editor-in-Chief of the IEEE Transactions on CAD and served as the founding co-editor-in-chief of the ACM Journal of Emerging Technologies in Computing Systems. He is the recipient of several honors including the 2000 ACM SIGDA Outstanding New Faculty Award, the 2002 IEEE CAS TVLSI Best Paper Award, the 2006 PSES Outstanding Research Award, the 2010 Distinguished Alumnus Award from SVCE, and elevation to IEEE Fellow in 2011. He currently heads a sponsored research project on designing biologically inspired vision systems on reconfigurable hardware.