We present a convolutional neural network (CNN) approach to age and gender classification, replicating the design of Levi and Hassner [1]. We train a compact CNN on the challenging Adience benchmark of unconstrained face photographs [2]. The network consists of three convolutional layers (with ReLU activations, pooling, and normalization) followed by two fully connected layers with dropout, ending in a softmax output for age (8 classes) or gender (2 classes). Using five-fold cross-validation on the Adience dataset, our model achieves approximately 86.8% accuracy for gender and 50.7% exact (84.7% one-off) accuracy for age. These results match those reported in [1], demonstrating the effectiveness of CNNs for facial attribute estimation in unconstrained images.
Introduction
Age and gender estimation from unconstrained face images is challenging due to variations in pose, lighting, expression, and occlusion. Early approaches relied on hand-crafted features (e.g., LBP, SIFT, Gabor) with classifiers like SVMs, but these methods struggle with real-world variability.
Deep Learning Approach:
Convolutional Neural Networks (CNNs) have improved performance by learning feature hierarchies directly from images. This work replicates the CNN architecture proposed by Levi and Hassner [1] and trains it on the Adience dataset [2], which contains 26,580 face images labeled by age group (8 classes) and gender (2 classes), with significant real-world variability.
CNN Architecture:
Input: 227×227 RGB face image
Conv layers: 3 convolutional layers with ReLU, max-pooling, and local response normalization
Fully connected layers: 2 layers with 512 neurons each, with dropout
Output: Softmax for age (8 classes) or gender (2 classes)
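To see how the 227×227 input shrinks through the three convolution and pooling stages, the following minimal sketch applies the standard output-size formula. The kernel sizes, strides, and padding in `STAGES`, as well as the pooling window, are illustrative assumptions (the text above does not specify them), and floor rounding is assumed throughout; a framework that rounds pooling outputs up would give slightly larger feature maps.

```python
def out_size(size, kernel, stride, pad=0):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


# Hypothetical hyperparameters per stage: (conv kernel, conv stride, conv pad).
# These values are assumptions for illustration, not taken from the text.
STAGES = [(7, 4, 0), (5, 1, 2), (3, 1, 1)]
POOL_KERNEL, POOL_STRIDE = 3, 2  # assumed max-pooling window and stride


def trace_sizes(input_size=227):
    """Return the feature-map side length after each conv + pool stage."""
    sizes = []
    size = input_size
    for kernel, stride, pad in STAGES:
        size = out_size(size, kernel, stride, pad)        # convolution
        size = out_size(size, POOL_KERNEL, POOL_STRIDE)   # max-pooling
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(trace_sizes())
```

Under these assumed settings, the side length after the last stage determines how many inputs the first fully connected layer receives (side × side × number of filters in the third convolutional layer).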
Training Methodology:
Trained from scratch using SGD with momentum
Batch size 50, learning rate 10⁻³ reduced to 10⁻⁴
Data augmentation: random cropping, mirroring
Loss: Cross-entropy
Testing: Center-crop and oversampling (5 crops + flips)
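The cross-entropy loss on a softmax output, as listed above, can be sketched in a few lines of plain Python. This is a minimal single-example illustration, not the training code used here; the function names `softmax` and `cross_entropy` are chosen for this sketch, and the logits stand in for the outputs of the last fully connected layer.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index `label`."""
    return -math.log(softmax(logits)[label])


if __name__ == "__main__":
    # Eight equal scores mimic an untrained age classifier (8 classes).
    print(cross_entropy([0.0] * 8, label=3))
```

With eight equal scores every class receives probability 1/8, so the loss equals ln 8; training should drive the loss below this chance-level value.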
Results:
Gender: 86.8% accuracy (oversampling) vs. 77.8% prior best
Age: 50.7% exact, 84.7% 1-off accuracy vs. 45.1% exact, 79.5% 1-off prior best
Most age errors occur between neighboring groups, reflecting inherent ambiguity
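The two age metrics reported above differ only in how far a prediction may be from the true group. A minimal sketch, assuming age groups are encoded as integer indices 0 to 7 in increasing age order (the function name `age_accuracy` is introduced here for illustration):

```python
def age_accuracy(true_groups, predicted_groups):
    """Return (exact, one_off) accuracy for integer-coded age groups.

    exact:   prediction equals the true group.
    one_off: prediction is the true group or an adjacent one.
    """
    if len(true_groups) != len(predicted_groups) or not true_groups:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_groups)
    pairs = list(zip(true_groups, predicted_groups))
    exact = sum(t == p for t, p in pairs) / n
    one_off = sum(abs(t - p) <= 1 for t, p in pairs) / n
    return exact, one_off


if __name__ == "__main__":
    # Toy labels, not results from the experiments.
    print(age_accuracy([0, 1, 2, 3], [0, 2, 4, 3]))
```

Because every exact match also counts as a one-off match, the one-off accuracy is always at least as high as the exact accuracy.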
Pipeline:
Face detection and 2D alignment
Resize to 256×256, crop to 227×227
Feed into CNN for age/gender probability
Use oversampling to improve robustness
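The oversampling step in the pipeline above can be sketched as follows: take the four corner crops and the center crop, add a mirrored copy of each, and average the ten resulting probability vectors. This is a minimal sketch in plain Python that represents an image as a list of pixel rows; the helper names are introduced here for illustration, and the classifier itself is left out.

```python
def crop(image, top, left, size):
    """Cut a size x size window out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def mirror(image):
    """Flip an image horizontally."""
    return [row[::-1] for row in image]


def oversample(image, size):
    """Return 10 views: four corner crops, the center crop, and their mirrors."""
    max_top = len(image) - size
    max_left = len(image[0]) - size
    offsets = [
        (0, 0), (0, max_left), (max_top, 0), (max_top, max_left),  # corners
        (max_top // 2, max_left // 2),                             # center
    ]
    views = []
    for top, left in offsets:
        view = crop(image, top, left, size)
        views.append(view)
        views.append(mirror(view))
    return views


def average_predictions(prob_vectors):
    """Average per-view class probabilities into one prediction."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]
```

For a 256×256 aligned face, `oversample(image, 227)` yields the ten 227×227 views; given some classifier function (call it `predict`, hypothetical here) that maps a view to class probabilities, the final output would be `average_predictions([predict(v) for v in views])`.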
References
[1] G. Levi and T. Hassner, “Age and Gender Classification using Convolutional Neural Networks,” in Proc. IEEE CVPR Workshops, 2015.
[2] E. Eidinger, R. Enbar, and T. Hassner, “Age and Gender Estimation of Unfiltered Faces,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 12, pp. 2170–2179, 2014.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Adv. Neural Inf. Process. Syst., 2012.
[4] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.