Example images from the DiveFace face recognition dataset. Faces are blurred to protect privacy. Visualization by Adam Harvey / Exposing.ai licensed under CC-BY-NC with original images licensed and attributed under Creative Commons CC-BY (attribution required, no commercial use).

DiveFace

DiveFace is a dataset of photos used for face recognition. The dataset was published in 2019 and contains 139,677 total images. Exposing.ai located 115,729 original photos from Flickr used to build DiveFace.

According to the authors, "DiveFace contains annotations equally distributed among six classes related to gender and ethnicity (male, female and three ethnic groups)." Their dataset broadly categorizes everyone into one of the following three labels: East Asian, Sub-Saharan and South Indian, and Caucasian.

The authors explain that they are "aware about the limitations of grouping all human ethnic origins into only 3 categories. According to different studies, there are more than 5K ethnic groups in the world. We made the division in these three big groups to maximize differences among classes. As we will show in the experimental section, automatic classification algorithms based on these three categories show performances up to 98% accuracy." 1

The gender and ethnicity labels were generated using a combination of automatic facial feature analysis and manual labeling oversight. In total, the DiveFace dataset collected and used biometric information from 24,000 individuals for the purpose of developing face recognition technology. The dataset contains an average of 5.5 images per person.

Images in the DiveFace dataset are derived from the MegaFace dataset, which is in turn derived from the Yahoo! Flickr Creative Commons 100 Million (YFCC100M) dataset, which consists entirely of Flickr images uploaded between 2004 and 2014. Although YFCC100M, and subsequently MegaFace and DiveFace, are all composed of Creative Commons images, there are important distinctions in the licensing.

The charts below are based on Exposing.ai's analyses of the metadata from the DiveFace dataset. They show that the dataset contains images under a variety of licenses. The majority of images are licensed under BY-NC-ND, which stipulates that users of the images must provide attribution (BY), may use them only for non-commercial purposes (NC), and may not create derivative works (ND).
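The license codes described above are combinations of short elements, so a code can be read by splitting it on hyphens. The following is a minimal sketch of that idea. The names `CC_ELEMENTS` and `describe_license` are illustrative and are not part of the DiveFace metadata or any Exposing.ai tooling. The SA (share alike) element is not discussed in the text above and is included only because it is one of the standard Creative Commons elements.

```python
# Plain-language meaning of each Creative Commons license element.
# BY, NC, and ND are described in the text; SA is added as an assumption
# so that other common codes (for example BY-SA) can also be decoded.
CC_ELEMENTS = {
    "BY": "attribution required",
    "NC": "non-commercial use only",
    "ND": "no derivative works",
    "SA": "share alike",
}


def describe_license(code):
    """Split a code such as 'BY-NC-ND' or 'CC BY-NC' into readable terms.

    Raises KeyError if the code contains an element not listed above.
    """
    # Normalize spaces to hyphens, then drop an optional leading "CC".
    parts = [p for p in code.upper().replace(" ", "-").split("-") if p != "CC"]
    return [CC_ELEMENTS[p] for p in parts]


for term in describe_license("BY-NC-ND"):
    print(term)
```

Raising an error on an unknown element is a deliberate choice in this sketch: silently skipping an unrecognized term could make a license look more permissive than it is.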

The DiveFace metadata can be downloaded from the authors' GitHub page at https://github.com/BiDAlab/DiveFace. Their research paper is available at https://arxiv.org/ftp/arxiv/papers/1902/1902.00334.pdf.

DiveFace Attributes
Dataset Name	DiveFace
Dataset Name Full	DiveFace
Total Images	139,677
Identities	24,000
Initial Purpose	Ethnically diverse face recognition
Year Published	2019
Dataset Website	https://github.com/BiDAlab/DiveFace

Photos from Flickr.com in DiveFace
Total Flickr Photos	115,729
Total Flickr Users	6,102
Active on Flickr.com*	106,110
Inactive/removed on Flickr.com*	9,619
API Data Accessed	October 2019
Included in YFCC100M	115,729
Photos w/ Geo Data	46,717
Searchable on Exposing.ai	115,729
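
The counts in the two tables above can be cross-checked and turned into proportions with a few lines of arithmetic. This is a small sketch using only the figures listed on this page. The variable names are illustrative, and the derived ratios are computed here rather than reported by Exposing.ai or the DiveFace authors.

```python
# Figures copied from the attribute tables on this page.
total_images = 139_677   # Total Images in DiveFace
flickr_photos = 115_729  # Total Flickr Photos located by Exposing.ai
flickr_users = 6_102     # Total Flickr Users
active = 106_110         # Active on Flickr.com
inactive = 9_619         # Inactive/removed on Flickr.com
with_geo = 46_717        # Photos w/ Geo Data

# Consistency check: active and inactive photos should sum to the total.
assert active + inactive == flickr_photos

# Derived ratios (not stated in the source; computed for illustration).
share_located = flickr_photos / total_images
share_active = active / flickr_photos
share_geo = with_geo / flickr_photos
photos_per_user = flickr_photos / flickr_users

print(f"Dataset images traced to Flickr: {share_located:.1%}")
print(f"Located photos still active:     {share_active:.1%}")
print(f"Located photos with geo data:    {share_geo:.1%}")
print(f"Located photos per Flickr user:  {photos_per_user:.1f}")
```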

DiveFace Copyright Distribution

DiveFace Creative Commons license distribution

DiveFace Creative Commons License Distribution

DiveFace Creative Commons license distribution

DiveFace Image Upload Year Distribution

DiveFace image upload year distribution

Top 10 DiveFace Image #Tags

Top 10 image #tags used in DiveFace

Top 10 Geocoded Cities DiveFace

Top 10 cities for geocoded photos in DiveFace

Citing This Work

If you reference or use any data from the Exposing.ai project, cite our original research as follows:

@online{Exposing.ai,
  author = {Harvey, Adam},
  title = {Exposing.ai},
  year = 2021,
  url = {https://exposing.ai},
  urldate = {2021-01-01}
}

If you reference or use any data from DiveFace, cite the authors' work:

@article{Morales2020SensitiveNetsLA,
    author = "Morales, A. and Fierrez, Julian and Vera-Rodriguez, Rub{\'e}n and Tolosana, R.",
    title = "SensitiveNets: Learning Agnostic Representations with Application to Face Images.",
    journal = "IEEE transactions on pattern analysis and machine intelligence",
    year = "2020",
    volume = "PP"
}

References

1 A. Morales, et al. "SensitiveNets: Learning Agnostic Representations with Application to Face Images." IEEE Transactions on Pattern Analysis and Machine Intelligence PP (2020).