Understand Train Set, Gallery Set and Probe Set in Face Recognition – Deep Learning Tutorial

By | September 24, 2021

When you are building a face recognition model using deep learning, you have to build a train set, a gallery set and a probe set to evaluate the performance of your model. In this tutorial, we will introduce these three sets.

Train set

The training set is used to train your model. It is usually split into three parts.

For example, if Data is the whole training set, it will be split into a train, a valid and a test set.

Figure: splitting data into train, validation and test sets

To learn how to split a whole training set into three parts, you can view this tutorial:

Split IMDB Movie Review Dataset (aclImdb) into Train, Test and Validation Set: A Step Guide for NLP Beginners
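As a rough illustration of such a split, here is a minimal sketch in plain Python. The function name split_dataset, the 80/10/10 ratios, the seed and the file names are all assumptions made for this example; they are not taken from the tutorial linked above.

```python
import random


def split_dataset(samples, valid_ratio=0.1, test_ratio=0.1, seed=42):
    """Shuffle samples and split them into train, valid and test lists.

    The ratios and the seed are illustrative defaults, not recommended values.
    """
    if not 0 <= valid_ratio + test_ratio < 1:
        raise ValueError("valid_ratio + test_ratio must be in [0, 1)")

    shuffled = list(samples)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)

    n_valid = int(len(shuffled) * valid_ratio)
    n_test = int(len(shuffled) * test_ratio)

    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test


# Hypothetical file names standing in for the whole training data.
data = [f"image_{i:04d}.jpg" for i in range(1000)]
train, valid, test = split_dataset(data)
print(len(train), len(valid), len(test))
```

For face data you may prefer to split by person instead of by image, so that the same person does not appear in more than one part.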

We should use Train, Valid and Test as follows:

Train: Use this set to train our model.

Valid: Use this set to select hyperparameters, such as learning rate, batch size, etc.

Test: Use this set to compute the final metrics. We usually report, as the final result of the model, the Test result at the step where the Valid result is best.

For example, for a classification problem, we compute the accuracy on Valid and Test every 25 steps while training our model. Suppose the best Valid accuracy is 98%, reached at step 4000, and the Test accuracy at that step is 96%. Then the accuracy of our model is 96%, even if the Test accuracy reaches 98% at step 5000.
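This selection rule can be sketched in a few lines of Python. The history list is an assumption made for illustration: only the 98% Valid and 96% Test values at step 4000 and the 98% Test value at step 5000 come from the example; the other numbers are made up.

```python
def select_final_result(history):
    """Return the (step, valid_acc, test_acc) record with the best Valid accuracy."""
    return max(history, key=lambda record: record[1])


# (step, valid accuracy, test accuracy) recorded during training.
# Values not given in the example are hypothetical.
history = [
    (3975, 0.975, 0.955),
    (4000, 0.980, 0.960),
    (5000, 0.970, 0.980),
]

step, valid_acc, test_acc = select_final_result(history)
print(f"best step={step}, valid={valid_acc:.2f}, reported test={test_acc:.2f}")
```

The Test accuracy is never used to pick the step. It is only read off at the step chosen by Valid, which is why 96% is reported here and not 98%.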

Gallery set

Suppose you have a blacklist which contains 500 persons. You can use one, two or more face pictures of each person to build this blacklist.

For example, if you use two face pictures of each person, your blacklist will contain 1,000 items.

Figure: a blacklist example

This blacklist is a gallery set. You will use a model to judge whether a person is in this blacklist or not. Clearly, we cannot use the data in the gallery set to train the model.

Probe set

The probe set cannot be used to train the model either. It usually contains two parts:

Part 1: People who are in the gallery set.

For example, there are 250 persons who are in both the probe set and the gallery set, but their face images in the two sets are different. Our model should judge from a face picture that a person in the probe set is also in the gallery set.

Part 2: People who are not in the gallery set.

For the blacklist mentioned above, our model should judge that a person who is not in the blacklist is really not in the blacklist.

  • If a person is in the blacklist but our model cannot find him in it, our model is wrong. This error is a false rejection, and the fraction of such errors is the False Rejection Rate (FRR).
  • If a person is not in the blacklist but our model finds a similar person from his face picture and treats him as an unreliable person, our model is also wrong. This error is a false acceptance, and the fraction of such errors is the False Acceptance Rate (FAR).
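The two error rates can be computed with a small sketch like the one below. Everything in it is an assumption made for illustration: the 2-D vectors stand in for face embeddings produced by some model, the names are hypothetical, the matching uses cosine similarity, and the 0.9 threshold is arbitrary. It only checks whether a probe is accepted as being in the blacklist, not whether the matched identity is the correct one.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match_score(embedding, gallery):
    """Highest similarity between a probe embedding and any gallery embedding."""
    return max(
        cosine(embedding, g)
        for embeddings in gallery.values()
        for g in embeddings
    )


def far_frr(gallery, probes, threshold):
    """Compute (FAR, FRR) for probes given as (true_identity, embedding) pairs."""
    false_accepts = false_rejects = 0
    n_in_gallery = n_not_in_gallery = 0
    for identity, embedding in probes:
        accepted = best_match_score(embedding, gallery) >= threshold
        if identity in gallery:
            n_in_gallery += 1
            if not accepted:  # in the blacklist but not found
                false_rejects += 1
        else:
            n_not_in_gallery += 1
            if accepted:  # not in the blacklist but matched anyway
                false_accepts += 1
    far = false_accepts / n_not_in_gallery if n_not_in_gallery else 0.0
    frr = false_rejects / n_in_gallery if n_in_gallery else 0.0
    return far, frr


# Toy gallery (the blacklist): identity -> list of embeddings.
gallery = {
    "alice": [(1.0, 0.0), (0.9, 0.1)],
    "bob": [(0.0, 1.0)],
}

# Toy probe set: two people in the gallery, two people not in it.
probes = [
    ("alice", (0.95, 0.05)),
    ("bob", (0.6, 0.8)),
    ("carol", (0.1, 0.99)),
    ("dave", (-1.0, 0.0)),
]

far, frr = far_frr(gallery, probes, threshold=0.9)
print(f"FAR={far:.2f}, FRR={frr:.2f}")
```

Raising the threshold usually lowers FAR and raises FRR, so the threshold should be chosen on data that is not part of the final evaluation.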

The paper "The CAS-PEAL large-scale Chinese face database and baseline evaluations" also defines training, gallery and probe sets. They are:

Training set

A training set is a collection of images which are used to generate a generic representation and to tune parameters for an algorithm. In the protocols, the training set contains 1,200 images (300 subjects randomly selected from the 1,040 subjects in the CAS-PEAL-R1 database and each subject contains four images randomly selected from the frontal subset of the CAS-PEAL-R1 database). (More details about the images can be added here…)

Gallery set

A gallery set is a collection of images of known individuals against which testing images are matched. In the protocols, the gallery set contains 1,040 images of 1,040 subjects (each subject has one image under normal condition).

Probe sets

A probe set is a collection of probe images of unknown individuals that need to be recognized. In the protocols, nine probe sets are composed from the CAS-PEAL-R1 database. Among them, six probe sets correspond to the six subsets in the frontal subset: expression, lighting, accessory, background, distance and aging, as described in Table 2. The other three probe sets correspond to the images of subjects in the pose subset: looking upwards, looking right into the camera C4 (the middle one), and looking downwards. All the images that appear in the training set are excluded from these probe sets.