
Machine learning and computer vision

Machine learning is the science of constructing algorithms that learn from data; in computer vision, these algorithms are applied to image and video data.

The proliferation of data and the availability of high-performance computing make this a fertile and highly applicable area of research. It draws on ideas from computer science, statistics and applied mathematics, together with biologically inspired paradigms such as neural computation.

Machine learning research at Exeter spans a wide range of data, applications and methodologies: from kernel methods to deep neural architectures and reinforcement learning, applied to both continuous data and discrete, graph-based data.

The group collaborates with industrial partners and with the Impact Lab, and contributes strongly to the University's membership of the Alan Turing Institute.


Research Projects

Visual attention and pre-attentive perception

How much of driving is pre-attentive? Quite a lot, it appears. For an experienced driver, the act of driving requires little attention, allowing for extended periods of driving while at the same time holding a conversation, looking for directions or daydreaming. It is precisely because so little attention is necessary that driver inattention is such a common cause of accidents.

Research in collaboration with the University of Surrey has been investigating exactly how much of a driver's actions can be explained by pre-attentive perception alone [1]. This research combined a computational model of pre-attentive perception with machine learning to mimic the driver's behaviour. Experiments showed that up to 80% of a driver's steering and braking actions can be fully explained by such a pre-attentive model of perception. Moreover, this model was able to predict the driver's steering accurately up to a full second before they started turning the wheel.
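As a rough, hypothetical sketch of this style of model (not the published system of [1]: the saliency extractor, pooling grid, dataset and regressor below are all placeholder assumptions), one can pool a per-frame saliency map into a coarse feature vector and train a regressor to map those features to the driver's steering angle:

    # Hypothetical sketch: predicting steering from pre-attentive saliency features.
    # The random "saliency maps" and the regressor are placeholders, not the model of [1].
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def saliency_features(saliency_map, grid=(4, 4)):
        """Pool an (H, W) saliency map into a coarse grid of mean activations."""
        h, w = saliency_map.shape
        gh, gw = grid
        cells = saliency_map[:h - h % gh, :w - w % gw].reshape(gh, h // gh, gw, w // gw)
        return cells.mean(axis=(1, 3)).ravel()  # one feature per grid cell

    rng = np.random.default_rng(0)
    X = np.stack([saliency_features(rng.random((48, 64))) for _ in range(500)])
    y = rng.uniform(-0.5, 0.5, size=500)  # steering angle (radians), toy targets

    model = RandomForestRegressor(n_estimators=100).fit(X[:400], y[:400])
    print("predicted steering:", model.predict(X[400:405]))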

Ongoing research is investigating the full role of attention in dynamic tasks such as driving, and how a successful attentional strategy can be learnt from experience. In particular, recent experiments have allowed us to explain what is learnt by Deep Saliency models of visual attention [2].

Funding

This research is funded by the EPSRC under project DEVA: Autonomous Driving with Emergent Visual Attention.

References

  1. Pugeault N, Bowden R. (2015) How much of driving is pre-attentive? IEEE Transactions on Vehicular Technology, volume 64, pages 1-15, article no. 12, DOI:10.1109/TVT.2015.2487826.
  2. He S, Pugeault N. (2017) Deep saliency: What is learnt by a deep network about saliency? 2nd Workshop on Visualization for Deep Learning at the 34th International Conference on Machine Learning, Sydney, Australia, 10th Aug 2017.
  3. He S, Kangin D, Mi Y, Pugeault N. (2018) Aggregated Sparse Attention for Steering Angle Prediction. International Conference on Pattern Recognition, Beijing, China, 20th-24th Aug 2018.

Autonomous control

Consider a team of robots charged with exploring an unknown environment completely autonomously. Their task will be to discover the environment’s layout, to localise themselves with respect to visible landmarks and each other, and to plan trajectories ahead.

The state of the art in robotics research provides robust methods for modelling a robot's environment from vision and motion, so-called SLAM (Simultaneous Localisation and Mapping), as well as for localising the robot within this map using visual landmarks. Once the environment's layout and the robot's position are known, planning and navigation are also well-studied problems.
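As a deliberately simplified, hypothetical illustration of the two halves of SLAM (all positions, noise levels and the blending weight below are made up; this is not a real SLAM system): a pose estimate is predicted by integrating noisy odometry, then corrected whenever a known landmark is observed.

    # Toy sketch of SLAM's predict/correct loop for a 2D point robot.
    import numpy as np

    rng = np.random.default_rng(1)
    landmark = np.array([5.0, 3.0])    # landmark at a known map position
    true_pose = np.array([0.0, 0.0])   # ground-truth robot position (x, y)
    est_pose = np.array([0.0, 0.0])    # dead-reckoned estimate

    for _ in range(10):
        motion = np.array([1.0, 0.0])
        true_pose = true_pose + motion
        # Prediction: integrate noisy odometry (drift accumulates over time).
        est_pose = est_pose + motion + rng.normal(0, 0.1, 2)
        # Correction: a noisy relative observation of the landmark anchors the pose.
        obs = (landmark - true_pose) + rng.normal(0, 0.05, 2)
        est_pose = 0.7 * est_pose + 0.3 * (landmark - obs)

    print("residual error after corrections:", np.linalg.norm(est_pose - true_pose))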

Two questions, however, are less well understood:

  • how should a robot plan its actions and motions in order to discover its environment as efficiently and accurately as possible? [1]
  • if the exploration can be done by multiple robots simultaneously, how should these robots decide when and how to cooperate to improve exploration? [2]

In response to those questions, we have developed novel algorithms for planning optimal collaborative exploration of unknown environments by teams of autonomous robots.
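A generic sketch of the intuition behind such exploration planning (not the algorithms published in [1,2]; grid sizes and sensor model are arbitrary assumptions): on an occupancy grid, candidate viewpoints are scored by how many unknown cells a sensor placed there would reveal, and the robot moves to the highest-scoring view.

    # Hedged sketch of information-gain exploration on an occupancy grid.
    # Grid values: -1 unknown, 0 free, 1 occupied.
    import numpy as np

    grid = np.full((40, 40), -1)   # everything starts unknown
    grid[15:25, 15:25] = 0         # a small region already explored as free

    def info_gain(grid, view, radius=6):
        """Count unknown cells inside the sensor footprint at `view`."""
        ys, xs = np.ogrid[:grid.shape[0], :grid.shape[1]]
        mask = (ys - view[0]) ** 2 + (xs - view[1]) ** 2 <= radius ** 2
        return int(np.sum(grid[mask] == -1))

    candidates = [(r, c) for r in range(0, 40, 5) for c in range(0, 40, 5)]
    best = max(candidates, key=lambda v: info_gain(grid, v))
    print("next best view:", best, "cells revealed:", info_gain(grid, best))

With several robots, the same kind of scoring can be shared so that each robot picks a view that is informative given where its teammates are already headed; how to do this well is the kind of question studied in [1,2].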

This project is in collaboration with the Centre for Vision Speech and Signal Processing (CVSSP) at the University of Surrey.

Video: Humans can localise without LiDAR, can robots?

References

  1. Mendez Maldonado O, Hadfield S, Pugeault N, Bowden R. (2016) Next-best stereo: extending next best view optimisation for collaborative sensors. British Machine Vision Conference, 19th-22nd September 2016, York, UK.
  2. Mendez Maldonado O, Hadfield S, Pugeault N, Bowden R. (2017) Taking the Scenic Route to 3D: Optimising Reconstruction from Moving Cameras. IEEE International Conference on Computer Vision (ICCV 2017), 22nd-29th October 2017, Venice, Italy.
  3. Mendez O, Hadfield S, Pugeault N, Bowden R. (2018) SeDAR - Semantic Detection and Ranging: Humans can localise without LiDAR, can robots? Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2018).

3D face modelling with unprecedented quality

We have recently created the most accurate digital 3D model of human faces to date [1,2]: the largest-scale 3D morphable model of facial shapes ever constructed, built from a dataset of more than 10,000 distinct facial identities spanning a wide range of gender, age and ethnicity combinations.

For this Large Scale Facial Model (LSFM), we proposed a fully automated system that establishes dense correspondences among 3D facial scans, yielding state-of-the-art results.
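To make the notion of a morphable model concrete, the following is a minimal sketch (using random stand-in data, not the LSFM scans) of how PCA over meshes in dense correspondence yields a generative shape model:

    # Minimal PCA morphable-model sketch with synthetic stand-in data.
    # Dense correspondence is assumed: vertex i is the same anatomical
    # point on every mesh, so each scan flattens to one row vector.
    import numpy as np

    rng = np.random.default_rng(0)
    n_faces, n_vertices = 200, 1000
    scans = rng.normal(size=(n_faces, n_vertices * 3))  # stand-in for real scans

    mean_face = scans.mean(axis=0)
    U, S, Vt = np.linalg.svd(scans - mean_face, full_matrices=False)
    components, stddev = Vt, S / np.sqrt(n_faces - 1)

    # Any face is approximated as the mean plus a weighted sum of components.
    coeffs = rng.normal(size=10) * stddev[:10]
    new_face = (mean_face + coeffs @ components[:10]).reshape(n_vertices, 3)
    print("synthesised a mesh with", new_face.shape[0], "vertices")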

We are currently investigating novel methodologies that will allow us to extend our 3D face modelling so that it can also represent the 3D shapes of emotive faces with unprecedented accuracy. Towards that goal, we have recently collected a new dataset of dynamic 3D facial scans of around 5,000 individuals. This large-scale dataset includes facial scans of the participants in several different facial expressions. This data collection was done during a special exhibition and technology demonstration at the Science Museum, London.

This work is in collaboration with scientists from Imperial College London and craniofacial surgeons from Great Ormond Street Hospital and Royal Free Hospital.

Science magazine article

Science magazine featured this research project and its applications.  

References

  1. J. Booth, A. Roussos, S. Zafeiriou, A. Ponniah, and D. Dunaway. A 3D Morphable Model learnt from 10,000 faces. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, Nevada, USA, June 2016.
  2. J. Booth, A. Roussos, A. Ponniah, D. Dunaway, and S. Zafeiriou. Large Scale 3D Morphable Models, International Journal of Computer Vision (IJCV), 2017.


Dense 3D face reconstruction from images and videos under unconstrained conditions

As part of our ongoing research, we are exploiting our highly-accurate 3D face modelling for problems of Computer Vision and Computer Graphics. Reconstructing the detailed 3D shape and dynamics of the human face has numerous applications, such as facial expression recognition, human-computer interaction, augmented reality, performance capture, computer games and visual effects, to name a few.

Despite important advances in the related scientific fields, existing methods have significant limitations: they work reliably only under restrictive acquisition conditions. In our work, we seek to develop novel formulations and pioneering methodologies for dense 3D reconstruction and modelling of human faces that can deal with challenging real-life image data.

We are paying particular attention to computational efficiency and to achieving unprecedented accuracy and robustness to challenging scenarios, such as low-resolution image data, severe occlusions, strong illumination changes and large intra-subject variability in facial morphology.
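As a hedged sketch of what such model fitting involves (a generic landmark-based formulation with synthetic data, not the method of [1]): with a fixed orthographic camera, the shape coefficients that best explain detected 2D landmarks solve a regularised linear least-squares problem.

    # Sketch: fit morphable-model shape coefficients to 2D landmarks.
    # All shapes, bases and detections below are synthetic placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    L, K = 68, 10                               # landmarks, shape components
    mean_shape = rng.normal(size=(L, 3))        # stand-in 3D model landmarks
    basis = rng.normal(size=(K, L, 3)) * 0.1    # stand-in shape basis
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])             # orthographic projection (2x3)

    # "Detected" 2D landmarks, synthesised from known coefficients plus noise.
    true_c = rng.normal(size=K)
    pts2d = (mean_shape + np.tensordot(true_c, basis, axes=1)) @ P.T
    pts2d = pts2d + rng.normal(0, 0.01, size=pts2d.shape)

    # Stack the linear system A c ~ b and solve with Tikhonov regularisation,
    # which keeps the recovered face close to the statistical shape prior.
    A = np.stack([(basis[k] @ P.T).ravel() for k in range(K)], axis=1)  # (2L, K)
    b = (pts2d - mean_shape @ P.T).ravel()
    lam = 1e-2
    c = np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
    print("coefficient recovery error:", np.linalg.norm(c - true_c))

In a full system the camera, expression and illumination would be estimated jointly, and dense photometric terms would refine the landmark-only fit; the weight lam trades data fidelity against the shape prior.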

This work is in collaboration with scientists from Imperial College London.

References

  1. J. Booth, A. Roussos, E. Ververas, E. Antonakos, S. Ploumpis, Y. Panagakis, and S. Zafeiriou. 3D Reconstruction of "In-the-Wild" Faces in Images and Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), accepted for publication.