[Thesis]. Manchester, UK: The University of Manchester; 2019.
After decades of research, automatic facial expression recognition (AFER) has been
shown to work well when restricted to subjects with a limited range of ages, expressions,
and intensities of expression. Recognition of the expressions of subjects across a
large range of ages (including older people), expressions (compound emotions), and
intensities (ranging from a neutral expression to the apex of the target expression)
is harder and, to date, has not been studied in any particular depth.
This thesis focuses on studying the influence of these problems on the accuracy of
AFER. The main aim is to investigate how facial expression recognition can be modelled
so that it is robust to these problems, making the solution more general and
effective. Since the face image is
a collection of texture and shape parameters, the study starts by using texture measurement
methods to understand the influence of those problems on face texture features and
hence on texture-based AFER. Our first contribution shows that by using binary robust
independent elementary features (BRIEF) (Calonder et al., 2012), we can develop a
new face descriptor model that is able to describe face images and can generalize
to new data sets. The BRIEF descriptor generates discriminative features when applied
globally to an image with an explicit shape. However, when it is applied to an image
with no explicit shape, such as a face image, it fails to generate discriminative
features. We thus propose to apply BRIEF locally, so that each pixel in the image is
evaluated against its neighbourhood to capture the local shape around it. Empirical
and comprehensive evaluation using three facial expression datasets demonstrates that
this model gives satisfactory performance compared to other local face descriptor
techniques evaluated on the same datasets. The study also shows that
the patterns of the problems under consideration have a significant effect on the
face texture features and on the accuracy of texture-based AFER.
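The idea of applying BRIEF locally can be illustrated with a minimal sketch. It is not the descriptor developed in the thesis: the number of binary tests, the patch radius, the random sampling pattern, the border handling and the single global histogram are all assumptions made for illustration, and the function names are hypothetical. The smoothing step that BRIEF normally applies before the intensity tests is omitted for brevity.

```python
import random

def local_brief_codes(image, n_tests=8, patch_radius=4, seed=0):
    """Compute one n_tests-bit code per pixel of a 2D grayscale image (list of rows)."""
    rng = random.Random(seed)
    # Assumed sampling pattern: random offset pairs inside a square patch.
    pairs = [
        tuple(rng.randint(-patch_radius, patch_radius) for _ in range(4))
        for _ in range(n_tests)
    ]
    h, w = len(image), len(image[0])

    def pixel(y, x):
        # Clamp coordinates so patches at the border reuse edge pixels.
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    codes = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            code = 0
            for bit, (dy1, dx1, dy2, dx2) in enumerate(pairs):
                # BRIEF-style binary test: compare two intensities near (y, x).
                if pixel(y + dy1, x + dx1) < pixel(y + dy2, x + dx2):
                    code |= 1 << bit
            codes[y][x] = code
    return codes

def code_histogram(codes, n_tests=8):
    """Pool the per-pixel codes into a normalized histogram (the feature vector)."""
    hist = [0.0] * (2 ** n_tests)
    total = 0
    for row in codes:
        for code in row:
            hist[code] += 1.0
            total += 1
    return [value / total for value in hist]
```

In this sketch every pixel gets its own binary code, so the resulting histogram summarizes local structure across the whole face rather than relying on a single global patch. A practical descriptor would more likely pool histograms over a grid of face regions and concatenate them.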
The study is then extended by using shape measurement methods to investigate the influence
of those problems on the face shape features and hence on shape-based AFER. Our second
contribution shows that by using random forest regression voting in a constrained
local model (RFRV-CLM) framework (Cootes et al., 2012; Lindner et al., 2015), we
can develop a fully automated facial expression localization (FEL) system that is
able to detect the facial key points in a multiple-stage (coarse-to-fine) scheme and
can generalize accurately to new data sets with a wide range of variations of facial
appearances. Empirical and comprehensive evaluation using five different facial expression
datasets demonstrates that this model gives excellent agreement with ground truth
data and outperforms the results of alternative methods evaluated on the same datasets.
The study also shows that the patterns of the problems under study have a significant
effect on the performance of FEL, and that the RFRV-CLM-based FEL still performs well
in spite of that effect. It also demonstrates that appearance-based AFER (combining
shape with texture) gives better results than texture-based AFER.
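The regression-voting step at the core of RFRV-CLM can be sketched as follows. This is an illustration under stated assumptions, not the system built in the thesis: `predict_offset` is a hypothetical stand-in for a trained random forest regressor, each sample casts a single unit vote, and neither the shape-model constraint nor the coarse-to-fine stages are shown.

```python
def accumulate_votes(sample_points, predict_offset, height, width):
    """Build a vote map: each sampled point votes for where the key point lies."""
    votes = [[0] * width for _ in range(height)]
    for y, x in sample_points:
        # predict_offset is a placeholder for a trained regressor that maps the
        # patch at (y, x) to a displacement towards the key point.
        dy, dx = predict_offset(y, x)
        vy, vx = y + dy, x + dx
        if 0 <= vy < height and 0 <= vx < width:
            votes[vy][vx] += 1
    return votes

def vote_peak(votes):
    """Return the (y, x) position that received the most votes."""
    best_count, best_pos = -1, (0, 0)
    for y, row in enumerate(votes):
        for x, count in enumerate(row):
            if count > best_count:
                best_count, best_pos = count, (y, x)
    return best_pos
```

In a full constrained local model, one such vote map would be produced for each facial key point, and a statistical shape model would then be fitted so that the chosen positions score well on the maps while forming a plausible face shape.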
Our final contribution builds on the second: the development of an age-based
AFER system that explicitly estimates age group and expression in a single framework.
In this system, age information is used as prior knowledge for expression recognition.
We use apparent age, since some people look younger or older than their real age. By
combining an age-group classifier with a set of age-specific expression classifiers
through a weighted combination rule, we can significantly reduce the influence of age
features on the expression classification accuracy.
Tested on three age-expression datasets, our novel system gives encouraging results
in comparison with state-of-the-art systems that ignore age and with alternative
models recently applied to the problem.
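One plausible reading of the weighted combination rule is a mixture in which each age-specific expression classifier is weighted by the probability of its age group. The sketch below assumes that form. The function name, the dictionary layout, and the age-group and expression labels are hypothetical, and the probabilities in the usage example are made up for illustration.

```python
def combine_age_weighted(age_probs, expr_probs_by_age):
    """Return P(expression) = sum over age groups of P(age) * P(expression | age).

    age_probs: {age_group: probability} from the age-group classifier.
    expr_probs_by_age: {age_group: {expression: probability}} from the
    age-specific expression classifiers.
    """
    combined = {}
    for age_group, weight in age_probs.items():
        for expression, prob in expr_probs_by_age[age_group].items():
            combined[expression] = combined.get(expression, 0.0) + weight * prob
    return combined

# Illustrative values only: the age classifier favours the "old" group, so the
# expression classifier trained on older faces dominates the final decision.
age_probs = {"young": 0.25, "old": 0.75}
expr_probs_by_age = {
    "young": {"happy": 0.6, "sad": 0.4},
    "old": {"happy": 0.2, "sad": 0.8},
}
combined = combine_age_weighted(age_probs, expr_probs_by_age)
predicted = max(combined, key=combined.get)
```

Using the soft age probabilities as weights, rather than committing to the single most likely age group, means that an uncertain age estimate does not force the decision onto one expression classifier.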
In summary, the results of the BRIEF-based face descriptor, RFRV-CLM-based FEL, and
age-based AFER are encouraging and could be basic building blocks for many face applications
in computer vision, such as face detection and face recognition.