Synced Visits the World's Top Labs: the ETH Zurich CV Lab

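What are the world's top CV teams working on? This time, Synced Knowledge Station visits the Computer Vision Lab at ETH Zurich (ETHZ CVL), whose members will give four technical livestreams over four consecutive days. Click "Read the original" to follow along!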

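The Computer Vision Lab at ETH Zurich comprises the research groups of Luc Van Gool, a renowned computer vision scholar, Ender Konukoglu, professor of medical imaging, and Fisher Yu, professor of computer vision and systems. It is one of the top CV/ML research institutions in Europe and the world. ETHZ CVL covers the full signal pipeline, from acquisition through analysis to processing, and aims to develop general concepts and methods. Its research areas include visual scene understanding, medical image analysis, robotics, embodied intelligence, and efficient neural networks and computation.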

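CVL works at the international research frontier, lets real-world applications drive its research, and regards close collaboration with industry as a major asset. Its results are published at top venues in computer vision (CVPR, ECCV, ICCV), machine learning (NeurIPS, ICML, ICLR), artificial intelligence (AAAI), robotics (ICRA), and medical imaging (MICCAI). Among them, the SURF operator proposed by Prof. Luc Van Gool is a classic algorithm in computer vision, with more than 20,000 citations on Google Scholar. The PASCAL VOC Challenge, organized by CVL, has had a profound impact on academia and industry.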

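CVL alumni are active across industry and academia. In industry, they work at well-known companies such as Google, Facebook, and Apple, and some have founded successful startups. In academia, they teach at institutions in China and abroad, including the Max Planck Institutes, the University of Bonn, the National University of Singapore, the University of Sydney, and Nanjing University.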

CVL consists of four research groups: Image Communication and Understanding (ICU), Biomedical Image Computing (BMIC), Visual Intelligence and Systems (VIS), and Computer-assisted Applications in Medicine (CAiM). In detail:

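The Image Communication and Understanding (ICU) group is led by Prof. Luc Van Gool, with three lecturers, Dengxin Dai, Radu Timofte, and Martin Danelljan, who work on autonomous driving, image processing, and object tracking, respectively. ICU members also have broad interests in efficient network design and computation, large-scale 3D scene understanding, image and video parsing, and 3D reconstruction. Homepage: https://icu.ee.ethz.ch/
The Biomedical Image Computing (BMIC) group is led by Prof. Ender Konukoglu. It addresses frontier challenges in biomedicine by developing technical solutions that are both theoretically sound and practical. Homepage: https://bmic.ee.ethz.ch/
The Visual Intelligence and Systems (VIS) group is led by Prof. Fisher Yu. Drawing on image processing, machine learning, statistics, deep learning, and control theory, it studies perceptual robotic systems that can perform complex tasks in real-world environments. Homepage: https://cv.ethz.ch/
The Computer-assisted Applications in Medicine (CAiM) group is led by Prof. Orçun Göksel. CAiM focuses on medical data analysis and information extraction, building on several intersecting disciplines, including engineering, computer science, and medicine. Its members have interdisciplinary backgrounds and aim to develop cutting-edge medical imaging and image analysis techniques and apply them in clinical practice. Homepage: https://caim.ee.ethz.ch/group.html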

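From June 7 to 10, Synced has invited four researchers from ETHZ CVL to share the team's latest progress. The schedule is as follows:

Livestream link: https://jmq.h5.xeknow.com/s/30jrSR (click "Read the original" to go there directly)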

June 7, 20:00-21:00

Topic: New Opportunities in Monocular 2D and 3D Object Tracking

Abstract: Object tracking is foundational for video analysis and a key component for perception in autonomous systems, such as self-driving cars and drones. Due to its importance, tracking has been studied extensively in the literature. However, the availability of large-scale driving video data brings new research opportunities in the field.

In this talk, I will discuss our recent findings in multiple object tracking (MOT), after briefly reviewing current work and trends on the topic. Then, I will introduce our new tracking method based on Quasi-Dense Similarity Learning. Our method is conceptually more straightforward yet more effective than previous work. It improves accuracy by almost ten percent on the large-scale BDD100K and Waymo MOT datasets.

I will also talk about how to use the 2D tracking method for monocular 3D object tracking. Our quasi-dense 3D tracking pipeline achieves impressive improvements on the nuScenes 3D tracking benchmark, with five times the tracking accuracy of popular tracking methods. Our work points to some interesting directions in MOT research and technology in the era of ubiquitous videos.

Speaker: Fisher Yu is an Assistant Professor at ETH Zürich in Switzerland. He obtained his Ph.D. degree from Princeton University and became a postdoctoral researcher at UC Berkeley. He is now leading the Visual Intelligence and Systems (VIS) group at ETH Zürich. His goal is to build perceptual systems capable of performing complex tasks in complex environments. His research is at the junction of machine learning, computer vision and robotics. He currently works on closing the loop between vision and action. His works on image representation learning and large-scale datasets, especially dilated convolutions and the BDD100K dataset, have become essential parts of computer vision research. More info is on his website: https://www.yf.io

June 8, 20:00-21:00

Topic: Scaling perception algorithms to new domains and tasks

Abstract: In this talk, I will mainly present our recent methods for semi-supervised and domain-adaptive semantic image segmentation by using self-supervised depth estimation. In particular, we propose three key contributions:

(1) We transfer knowledge from features learned during self-supervised depth estimation to semantic segmentation;

(2) We propose a strong data augmentation method, DepthMix, which blends images and labels while respecting the geometry of the scene;

(3) We utilize the depth feature diversity as well as the level of difficulty of learning depth to select the most useful samples to collect human labels in the semi-supervised setting and to generate pseudo-labels in the domain adaptation setting.

Our methods have achieved state-of-the-art results for semi-supervised and domain-adaptive semantic image segmentation. The code is also available. During the talk, I will also present our ACDC dataset. ACDC is a new large-scale driving dataset for training and testing semantic image segmentation algorithms on adverse visual conditions, such as fog, nighttime, rain, and snow. The dataset and associated benchmarks are publicly available.
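To make contribution (2) concrete, here is a minimal sketch of depth-aware mixing. It is not the authors' released code: the function name `depth_mix`, the tolerance `eps`, and the rule "keep whichever pixel is closer to the camera" are assumptions made for illustration, and the depth maps are taken as given inputs rather than produced by a self-supervised depth network.

```python
# Hypothetical sketch of depth-aware mixing; names and the mixing rule are
# assumptions for illustration, not the published implementation.

def depth_mix(img_a, lab_a, dep_a, img_b, lab_b, dep_b, eps=0.0):
    """Blend two samples pixel by pixel, keeping the pixel closer to the camera.

    All inputs are H x W nested lists of equal shape. `eps` is an assumed
    tolerance that lets sample A win near-ties in depth.
    Returns (mixed_image, mixed_label, mask); mask[y][x] is 1 where the
    pixel was taken from sample A and 0 where it was taken from sample B.
    """
    mixed_img, mixed_lab, mask = [], [], []
    for rows in zip(img_a, lab_a, dep_a, img_b, lab_b, dep_b):
        img_row, lab_row, mask_row = [], [], []
        # zip(*rows) walks the six rows column by column.
        for pa, la, da, pb, lb, db in zip(*rows):
            take_a = da < db + eps  # A occludes B at this pixel
            img_row.append(pa if take_a else pb)
            lab_row.append(la if take_a else lb)  # labels follow the same mask
            mask_row.append(1 if take_a else 0)
        mixed_img.append(img_row)
        mixed_lab.append(lab_row)
        mask.append(mask_row)
    return mixed_img, mixed_lab, mask


# Toy 2 x 2 example: sample A is near the camera in the left column only.
image, label, mask = depth_mix(
    [[1, 1], [1, 1]], [[7, 7], [7, 7]], [[2.0, 9.0], [2.0, 9.0]],
    [[0, 0], [0, 0]], [[3, 3], [3, 3]], [[5.0, 5.0], [5.0, 5.0]],
)
```

Because the image and the label are mixed with the same mask, every pixel keeps the label of the sample it came from, and nearer objects occlude farther ones instead of being pasted at arbitrary positions.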

Speaker: Dengxin Dai is a Senior Scientist and Lecturer working with the Computer Vision Lab at ETH Zurich. He leads the TRACE-Zurich research group, which works on Autonomous Driving in cooperation with Toyota. In 2016, he obtained his PhD in Computer Vision at ETH Zurich. He is the organizer of the workshop series (CVPR'19-21) "Vision for All Seasons: Bad Weather and Nighttime", the ICCV'19 workshop "Autonomous Driving", and the ICCV'21 workshop "DeepMTL: Multi-Task Learning in Computer Vision". He was a Guest Editor for the IJCV special issue "Vision for All Seasons", an Area Chair for WACV 2020, and an Area Chair for CVPR 2021. His research interests lie in Autonomous Driving, Robust Perception Algorithms, Lifelong Learning, Multi-task Learning, and Multimodal Learning.

June 9, 19:00-20:00

Topic: Deep Visual Perception in a Structured World

Abstract: The world we live in is highly structured. Things that are semantically related are typically presented in a similar way: both trucks and buses have wheels and cabins, for example. Things also undergo continuous variations over time; in a video clip, content across frames is highly correlated. Humans also interact with the environment constantly and communicate with each other frequently. Overall, there exist rich structures between humans and the environment, over both space and time.

Therefore, it is important to understand this visual world from a structured view. In this talk, I will advocate the value of structured information in intelligent visual perception. As examples, I will present a line of my recent work on semantic segmentation, human semantic parsing and fake video detection.

Speaker: Dr. Wenguan Wang is currently a postdoctoral scholar at ETH Zurich, working with Prof. Luc Van Gool. From 2018 to 2019, he was a Senior Scientist at the Inception Institute of Artificial Intelligence, UAE. From 2016 to 2018, he was a Research Assistant (Visiting Ph.D.) at the University of California, Los Angeles, under the supervision of Prof. Song-Chun Zhu. His current research interests are in the areas of Image/Video Segmentation, Human-Centric Visual Understanding, Gaze Behavior Analysis, Embodied AI, and 3D Object Detection. He has published over 50 journal and conference papers in venues such as TPAMI, CVPR, ICCV, ECCV, AAAI, and SIGGRAPH Asia.

June 10, 19:00-20:00

Topic: Tiny AI for computer vision

Abstract: In recent years, deep neural networks have boosted the performance of computer vision algorithms on tasks such as visual recognition, object detection, and semantic segmentation. Yet this often comes with an increase in model complexity, in terms of both the number of parameters and the computational cost. This increases the energy consumption, latency, and transmission cost of pre-trained models and poses a huge challenge for deploying deep neural networks on edge devices. Network compression and model acceleration thus become a feasible solution to this problem.

In this talk, we will report on recent developments and thinking on model acceleration in the Computer Vision Lab. We introduce our work on model acceleration and network architecture optimization from the perspective of learning filter basis, group sparsity, differentiable network pruning, and the advantage of heterogeneous network architectures.

Speaker: Yawei Li is currently a Ph.D. student at the Computer Vision Lab, supervised by Prof. Luc Van Gool. His research direction is efficient computation in computer vision. He is interested in efficient neural network design and image restoration. For efficient computation, he has been exploring neural network compression and model acceleration, graph neural networks, and vision transformers. He has published papers in top-tier computer vision conferences including CVPR, ECCV, and ICCV.

Join the livestream discussion group

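Scan the QR code to join: everyone is welcome to join the group and discuss the topics of these talks.

If the group has reached its member limit, add one of the other assistants, syncedai2, syncedai3, syncedai4, or syncedai5, with the note "cv lab" to join.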

