NoW dataset
To run the challenge, we introduce the NoW dataset. The dataset contains 2054 2D images of 100 subjects, captured with an iPhone X, and a separate 3D head scan for each subject. This head scan serves as ground truth for the evaluation. The subjects are selected to cover variations in age, BMI, and sex (55 female, 45 male).
Validation and test set
We have updated the challenge: the NoW dataset is now divided into a validation set and a test set. The validation set consists of 20 subjects and the test set of 80 subjects. We provide the ground-truth scans for the validation set, along with their corresponding landmarks. To run the evaluation on the validation set, use the NoW evaluation repository.
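To illustrate what the evaluation measures, the sketch below approximates a scan-to-mesh error by taking, for every ground-truth scan vertex, the distance to the nearest vertex of the predicted mesh. This is a minimal sketch, assuming both meshes are already rigidly aligned and expressed in the same units; the function name is illustrative, and the nearest-vertex lookup is only an upper bound on the point-to-surface distance computed by the evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree

def scan_to_mesh_error(scan_vertices, predicted_vertices):
    """Approximate per-vertex error between a ground-truth scan and a
    predicted mesh (both (N, 3) arrays, rigidly aligned beforehand)."""
    # Distance from each scan vertex to its nearest predicted vertex;
    # an upper bound on the true point-to-surface distance.
    tree = cKDTree(predicted_vertices)
    distances, _ = tree.query(scan_vertices)
    return {"median": float(np.median(distances)),
            "mean": float(np.mean(distances)),
            "std": float(np.std(distances))}
```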
Images
We categorize the captured data into four challenges: neutral (620 images), expression (675 images), occlusion (528 images), and selfie (231 images). Neutral, expression, and occlusion contain neutral, expressive, and partially occluded face images of all subjects in multiple views, ranging from frontal to profile view. Expression covers different acted facial expressions such as happiness, sadness, surprise, disgust, and fear. Occlusion contains images with varying occlusions, e.g. from glasses, sunglasses, facial hair, hats, or hoods. For the selfie category, subjects are asked to take selfies with the iPhone, without constraints on the performed facial expression. The images are captured indoors and outdoors to provide variation in natural and artificial light. We provide the crop information for the face region on the Downloads page.
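For reference, the four category counts add up exactly to the 2054 images in the dataset; the snippet below is a small illustrative sanity check.

```python
# Illustrative sanity check: category counts sum to the dataset total.
CATEGORY_COUNTS = {"neutral": 620, "expression": 675,
                   "occlusion": 528, "selfie": 231}
assert sum(CATEGORY_COUNTS.values()) == 2054
```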
Scans
For each subject we capture a raw head scan in neutral expression with an active stereo system (3dMD LLC, Atlanta). The multi-camera system consists of six gray-scale stereo camera pairs, six color cameras, five speckle pattern projectors, and six white LED panels. The reconstructed 3D geometry contains about 120K vertices for each subject. Each subject wears a hair cap during scanning to avoid occlusions and scanner noise in the face or neck region due to hair.
The challenge, for all categories, is to reconstruct a neutral 3D face from a single monocular image. Note that facial expressions are present in several images; methods must therefore disentangle identity and expression so that the quality of the predicted identity can be evaluated.
Data processing
Most existing 3D face reconstruction methods require a localization of the face. To mitigate the influence of this pre-processing step, we provide a bounding box covering the face for each image. To obtain these bounding boxes, we first run a face detector on all images and then predict keypoints for each detected face; for failure cases, we manually select 2D landmarks. We then expand the bounding box of the landmarks by 5% at the bottom, 10% to the left and right, and 30% at the top, to obtain a box covering the entire face, including the forehead (see the sketch below).
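The expansion step can be written down directly. The following is a minimal sketch assuming landmarks are given as (x, y) pixel coordinates with the origin at the top-left; the function name is illustrative, not taken from the dataset code.

```python
import numpy as np

def expand_landmark_bbox(landmarks):
    """landmarks: (K, 2) array of 2D keypoints for one detected face."""
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    width, height = x_max - x_min, y_max - y_min
    # Expand 10% to each side, 30% at the top, 5% at the bottom
    # (image y grows downward, so "top" means smaller y).
    left = x_min - 0.10 * width
    right = x_max + 0.10 * width
    top = y_min - 0.30 * height
    bottom = y_max + 0.05 * height
    return left, top, right, bottom
```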
For the 3D scans, we follow a processing protocol similar to this paper. For each scan, the face center is selected, and the scan is cropped by removing everything outside a specified radius. The selected radius is subject-specific, computed as 0.7 × (outer eye dist + nose dist) (see Figure 1).
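The cropping step amounts to a radius filter around the selected face center. The sketch below assumes the scan vertices, the face center, and the two reference distances are all expressed in the same units; the names are illustrative.

```python
import numpy as np

def crop_scan(vertices, face_center, outer_eye_dist, nose_dist):
    """Keep only scan vertices within the subject-specific radius."""
    radius = 0.7 * (outer_eye_dist + nose_dist)
    keep = np.linalg.norm(vertices - face_center, axis=1) <= radius
    return vertices[keep]
```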