8:30 – 9:00

Chairs’ Welcome

9:00 – 9:50

Prof. William Robson Schwartz

Scene Monitoring using Active Cameras


Computer vision and machine learning techniques for video surveillance and biometrics have been investigated for several years, with the aim of finding accurate and efficient solutions that allow smart surveillance systems to operate in real environments. Yet when it comes to analyzing a scene to recognize and understand suspicious human activities, video surveillance and biometrics still face several challenges. A particular challenge is person identification, due to poor acquisition conditions and the large distance between the cameras and the people in the scene. In this talk, I will discuss approaches designed to overcome the difficulties of person identification in surveillance scenarios covered by a camera network, especially when PTZ cameras are available to gather higher-quality information from the monitored scene.


William Robson Schwartz is an Associate Professor in the Department of Computer Science at the Federal University of Minas Gerais, Brazil. He has held a CNPq Productivity Fellowship since 2013 and has been a Minas Gerais State Researcher since 2015. He received his BSc and MSc degrees in Computer Science from the Federal University of Paraná, Curitiba, Brazil, in 2003 and 2005, respectively, and his PhD degree in Computer Science from the University of Maryland, College Park, USA, in 2010, with a CAPES/Fulbright scholarship. He then spent one year as a postdoctoral researcher at the Institute of Computing of the University of Campinas. His research interests include computer vision and machine learning applied to video surveillance, computer forensics, and biometrics. He is also the head of the Smart Sense Laboratory[1], which focuses mainly on large-scale surveillance based on visual and sensor data. In addition, he advises several MSc and PhD students and has served as principal investigator on projects sponsored by public agencies such as CAPES, CNPq, and FAPEMIG, and by companies such as Petrobras, Samsung, and Hewlett-Packard.

[1] http://www.sense.dcc.ufmg.br

10:00 – 10:50

Special Session 3: Computer Vision for Automatic Human Health Monitoring

Apathy Classification by Exploiting Task Relatedness

S L Happy (INRIA Sophia Antipolis – Méditerranée research center)*; Antitza Dantcheva (INRIA); Abhijit Das (ISI); François Brémond (INRIA Sophia Antipolis, France); Radia Zeghari (CobTek); Philippe Robert (CobTek)

Using a Skeleton Gait Energy Image for Pathological Gait Classification

João Firmino (Instituto Superior Técnico – Universidade de Lisboa); Paulo L Correia (Instituto de Telecomunicações / Instituto Superior Técnico – Universidade de Lisboa)*

Video2Report: A Video Database for Automatic Reporting of Medical Consultancy Sessions

Laura Schiphorst (Utrecht University); Metehan Doyran (Utrecht University); Sabine Molenaar (Utrecht University); Albert Ali Salah (Utrecht University); Sjaak Brinkkemper (Utrecht University)

A General Remote Photoplethysmography Estimator with Spatiotemporal Convolutional Network

Siqi Liu (Department of Computer Science, Hong Kong Baptist University)*; PongChi Yuen (Department of Computer Science, Hong Kong Baptist University)

11:00 – 11:50

Oral Session 6

DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-time Recognition Framework

Okan Köpüklü (Technical University of Munich)*; Thomas Ledwon (Ludwig-Maximilians-Universitaet Muenchen); Yao Rong (University of Tübingen); Neslihan Kose Cihangir (Intel Deutschland GmbH); Gerhard Rigoll (Institute for Human-Machine Communication, TU Munich, Germany)

Generative Model-Based Loss to the Rescue: A Method to Overcome Annotation Errors for Depth-Based Hand Pose Estimation

Jiayi Wang (Max Planck Institut Informatik)*; Franziska Mueller (MPI Informatics); Florian Bernard (MPI); Christian Theobalt (MPI Informatik)

Automatic Detection of Self-Adaptors for Psychological Distress

Weizhe Lin (University of Cambridge)*; Indigo Orton (University of Cambridge); Mingyu Liu (University of Oxford); Marwa Mahmoud (University of Cambridge)

Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation

István Sárándi (RWTH Aachen University)*; Timm Linder (Robert Bosch GmbH); Kai O Arras (Robert Bosch GmbH); Bastian Leibe (RWTH Aachen University)

12:00 – 12:50

Poster Session 3 / Late Breaking

Slim-CNN: A Light-Weight CNN for Face Attribute Prediction

Ankit Sharma (University of Central Florida)*; Hassan Foroosh (University of Central Florida)

Deformation Flow Based Two-stream Network for Lip Reading

Jingyun Xiao (University of Chinese Academy of Sciences)*; Shuang Yang (ICT, CAS); Yuan-Hang Zhang (University of Chinese Academy of Sciences); Shiguang Shan (Chinese Academy of Sciences); Xilin Chen (Institute of Computing Technology, Chinese Academy of Sciences)

Temporal Triplet Mining for Personality Recognition

Dario Dotti (Maastricht University)*; Esam A. H. Ghaleb (Maastricht University); Stylianos Asteriadis (Maastricht University)

Towards Automatic Monitoring of Disease Progression in Sheep: A Hierarchical Model for Sheep Facial Expressions Analysis from Video

Francisca Pessanha (University of Porto)*; Krista McLennan (University of Chester); Marwa Mahmoud (Cambridge University)

Hand Tracking from Monocular RGB with Dense Semantic Labels

Peter J Thompson (University of Manchester)*; Aphrodite Galata (The University of Manchester)

3D Landmark Localization in Point Clouds for the Human Ear

Eimear M O’Sullivan (Imperial College London)*; Stefanos Zafeiriou (Imperial College London)

A First Investigation Into the Use of Deep Learning for Standardized and Continuous Assessment of Neonatal Post-Operative Pain

Md Sirajus Salekin (University of South Florida)*; Ghada Zamzmi (USF); Dmitry Goldgof (USF); Rangachar Kasturi (USF); Thao Ho (USF Health); Yu Sun (University of South Florida)

Mutual Information Maximization for Effective Lip Reading

Xing Zhao (Zhejiang University of Technology; Institute of Computing Technology, Chinese Academy of Sciences)*; Shuang Yang (ICT, CAS); Shiguang Shan (Chinese Academy of Sciences); Xilin Chen (Institute of Computing Technology, Chinese Academy of Sciences)

Leveraging Shared and Divergent Facial Expression Behavior Between Genders in Deception Detection

Gazi Naven (University of Rochester)*; Taylan K Sen (University of Rochester); Luke Gerstner (University of Rochester); Kurtis Glenn Haut (University of Rochester); Melissa Wen (University of Rochester); Ehsan Hoque (University of Rochester)

Simple and Effective Approaches for Uncertainty Prediction in Facial Action Unit Intensity Regression

Torsten Wörtwein (Carnegie Mellon University)*; Louis-Philippe Morency (Carnegie Mellon University)

Deep Weakly-Supervised Domain Adaptation for Pain Localization in Videos

Gnana Praveen Rajasekar (Ecole Technologie Superieure)*; Eric Granger (ETS Montreal); Patrick Cardinal (Canada)

Emotion or Expressivity? An Automated Analysis of Nonverbal Perception in a Social Dilemma

Su Lei (USC ICT)*; Jonathan Gratch (Institute for Creative Technologies); Kalin Stefanov (University of Southern California)

13:00 – 13:50

Prof. Matthew Turk

Is FG Enabling a Surveillance Dystopia?


Face and gesture recognition technologies hold great promise to improve people’s lives, yet they also raise serious issues with respect to privacy, bias, and potential misuse by individuals, companies, and governments. Civil liberties and advocacy groups have increasingly raised warnings and supported legislation banning such technologies. Many in law enforcement push back, arguing that the technology saves lives and helps make us safer. Legislative bodies are trying to decide if and how to address these issues, sometimes with limited information. As technologists, what is our role in this public debate? Are we helping to create a surveillance dystopia? What should we do about it? Let’s discuss.


Matthew Turk is president of the Toyota Technological Institute at Chicago (TTIC) and a professor emeritus at UC Santa Barbara. He received a B.S. from Virginia Tech, an M.S. from Carnegie Mellon University, and a Ph.D. from the Massachusetts Institute of Technology. His research focuses on computer vision and multimodal interaction, including early work in autonomous vehicles and in face recognition. He has received several best paper awards and has been general or program chair of several top conferences in computer vision and multimodal interaction. He co-founded an augmented reality startup company in 2014 that was acquired by PTC Vuforia in 2016. He is an IEEE Fellow, an IAPR Fellow, and the recipient of the 2011-2012 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies.

14:00 – 15:20

Special Session 4:  Face and Body Movement Analysis – Applications in Healthcare

Kinematic Tracking of Rehabilitation Patients With Markerless Pose Estimation Fused with Wearable Inertial Sensors

Ronald J Cotton (Shirley Ryan AbilityLab)*

Estimation of Orofacial Kinematics in Parkinson’s Disease: Comparison of 2D and 3D Markerless Systems for Motion Tracking

Diego Guarín (Toronto Rehabilitation Institute)*; Aidan Dempster (Univ. of Toronto); Andrea Bandini (KITE – Toronto Rehab – University Health Network); Yana Yunusova (University of Toronto); Babak Taati (University Health Network)

Multimodal Deep Learning Framework for Mental Disorder Recognition

Ziheng Zhang (University of Cambridge)*; Weizhe Lin (University of Cambridge); Mingyu Liu (University of Oxford); Marwa Mahmoud (University of Cambridge)

Modelling the Statistics of Cyclic Activities by Trajectory Analysis in Riemannian Manifolds

Pietro Pala (University of Florence)*; Stefano Berretti (University of Florence, Italy); Luca Cultrera (University of Florence); Ettore Celozzi (University of Florence); Luca Ciabini (University of Florence); Mohamed Daoudi (IMT Lille Douai); Alberto Del Bimbo (University of Florence)