Exposing.ai
DiveFace
Example images from the DiveFace face recognition dataset. Faces are blurred to protect privacy. Visualization by Adam Harvey / Exposing.ai, licensed under CC-BY-NC (attribution required, no commercial use), with original images licensed and attributed under Creative Commons CC-BY.


DiveFace is a dataset of photos used for face recognition. The dataset was published in 2019 and contains 139,677 total images. Exposing.ai located 115,729 original photos from Flickr used to build DiveFace.

According to the authors, "DiveFace contains annotations equally distributed among six classes related to gender and ethnicity (male, female and three ethnic groups)." It is unclear why a dataset built to train "unbiased and discrimination-aware face recognition algorithms" uses only three ethnic categories. The dataset broadly sorts everyone into one of the following three labels: East Asian, Sub-Saharan and South Indian, and Caucasian.

The gender and ethnicity labels were generated using a combination of automatic facial feature analysis and manual labeling oversight. In total, the DiveFace dataset collected and used biometric information from 24,000 individuals for the purpose of developing face recognition technology. The dataset contains an average of 5.5 images per person.

Images in the DiveFace dataset are derived from MegaFace, which is in turn derived from the Yahoo! Flickr Creative Commons 100 Million (YFCC100M) dataset, which consists entirely of Flickr images uploaded between 2004 and 2014. Although YFCC100M, and consequently MegaFace and DiveFace, are all composed of Creative Commons images, there are important distinctions among their licenses.

The charts below are based on Exposing.ai's analyses of the metadata from the DiveFace dataset. They show that images in the dataset are published under many different licenses. The majority of images are licensed under BY-NC-ND, which stipulates that anyone using the images must provide attribution (BY), use them only for non-commercial purposes (NC), and create no derivative works (ND).
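To make the license terms concrete, the following is a minimal Python sketch that decodes a Creative Commons short code such as "BY-NC-ND" into the conditions it imposes and tallies codes across a list of images. It assumes license metadata is available as short codes like these; the actual format of the DiveFace and Flickr metadata may differ (for example, numeric license IDs or license URLs). All function names are hypothetical and are not part of any Exposing.ai or DiveFace tooling.

```python
from collections import Counter

# Meaning of each Creative Commons license element.
CC_TERMS = {
    "BY": "attribution required",
    "NC": "non-commercial use only",
    "ND": "no derivative works",
    "SA": "share-alike",
}


def parse_license(code: str) -> list[str]:
    """Split a short code such as 'BY-NC-ND' or 'CC-BY-NC-ND' into its elements."""
    parts = code.strip().upper().split("-")
    if parts and parts[0] == "CC":
        parts = parts[1:]  # drop the optional "CC" prefix
    unknown = [p for p in parts if p not in CC_TERMS]
    if not parts or unknown:
        raise ValueError(f"unrecognized license code: {code!r}")
    return parts


def describe_license(code: str) -> list[str]:
    """Return the human-readable conditions a license code imposes."""
    return [CC_TERMS[p] for p in parse_license(code)]


def permits_commercial_use(code: str) -> bool:
    return "NC" not in parse_license(code)


def permits_derivatives(code: str) -> bool:
    return "ND" not in parse_license(code)


def license_distribution(codes: list[str]) -> list[tuple[str, int]]:
    """Count images per normalized license code, most common first."""
    return Counter("-".join(parse_license(c)) for c in codes).most_common()


if __name__ == "__main__":
    # Illustrative codes only; these are NOT counts from DiveFace.
    sample = ["BY-NC-ND", "CC-BY-NC-ND", "BY-NC-SA", "BY", "BY-SA"]
    print(describe_license("BY-NC-ND"))
    print(license_distribution(sample))
```

Normalizing each code (uppercasing and stripping an optional "CC-" prefix) before counting keeps spellings such as "cc-by-nc-nd" and "BY-NC-ND" in the same bucket, which matters when tallying a distribution like the one charted below.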

The DiveFace metadata can be downloaded from the authors' GitHub page at https://github.com/BiDAlab/DiveFace. Their research paper is available at https://arxiv.org/ftp/arxiv/papers/1902/1902.00334.pdf.

Chart: DiveFace Copyright Distribution

Chart: DiveFace Creative Commons License Distribution

Chart: DiveFace Image Upload Year Distribution

Chart: Top 10 DiveFace Image #Tags

Chart: Top 10 Geocoded Cities in DiveFace

Citing This Work

If you reference or use any data from the Exposing.ai project, cite our original research as follows:

@online{Exposing.ai,
  author = {Harvey, Adam and LaPlace, Jules},
  title = {Exposing.ai},
  year = 2021,
  url = {https://exposing.ai},
  urldate = {2021-01-01}
}

If you reference or use any data from DiveFace, cite the authors' work:

@article{Morales2020SensitiveNetsLA,
    author = "Morales, A. and Fierrez, Julian and Vera-Rodriguez, Rub{\'e}n and Tolosana, R.",
    title = "SensitiveNets: Learning Agnostic Representations with Application to Face Images",
    journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
    year = "2020",
    volume = "PP"
}

References

  • 1 A. Morales et al. "SensitiveNets: Learning Agnostic Representations with Application to Face Images." IEEE Transactions on Pattern Analysis and Machine Intelligence PP (2020).