Exposing.ai
IJB-C
Example face images from the IJB-C face recognition dataset. Faces are blurred to protect privacy. Visualization by Adam Harvey / Exposing.ai licensed under CC-BY-NC with original images licensed and attributed under Creative Commons CC-BY (attribution required, no commercial use).

IARPA Janus Benchmark C

IARPA Janus Benchmark C (IJB-C) is a dataset of photos used for face recognition benchmarking. The dataset was published in 2017 and contains 21,294 total images. Exposing.ai located 5,757 original photos from Flickr used to build IJB-C.

The IJB-C dataset includes both images and names. The name list includes 3,531 individuals, many of whom are activists, artists, journalists, foreign politicians, and public speakers. Other datasets, such as VGG Face, used the Internet Movie Database as a starting point for gathering the names of actors and celebrities. The IJB-C authors instead looked to YouTube: in their words, "YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified" as ideal candidates for the dataset. [1]

This approach cast a wide net, gathering many individuals who are neither actors nor pop stars but who appear often in lectures, speeches, or conferences that were later posted to YouTube. Google has since made clear that YouTube data is not, and never was, permitted for face recognition purposes. Its updated Terms of Service from November 2020 added the term "faces" to state explicitly that face data may not be used to identify someone. Nevertheless, thousands of faces from YouTube videos were included in the IJB-C face recognition benchmarking dataset, along with the full name of each person. According to the dataset authors, all the "images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube." [1]

Yet, according to YouTube's updated terms of service:

You are not allowed to:
…
4. collect or harvest any information that might identify a person 
(for example, usernames or faces), unless permitted by that person 
or allowed under section (3) above;

A brief list of the names included in the dataset (currently being reviewed and updated):

Name | Profession
Ai Weiwei | Artist, activist
Ta-Nehisi Coates | Author, journalist
Molly Crabapple | Artist, activist
Raul Krauthausen | Disability rights activist
John Maeda | Designer, technologist
Evgeny Morozov | Writer, technology critic
Jeremy Scahill | Journalist, activist
Slavoj Žižek | Philosopher

Information Supply Chain

To help understand how IJB-C has been used around the world by commercial, military, and academic organizations, existing publicly available research citing IARPA Janus Benchmark C was collected, verified, and geocoded to show how AI training data has proliferated globally.

Citation data is collected using SemanticScholar.org; dataset usage is then verified and geolocated. Citations provide an estimated overview of how and where the images were used, based on the institutional affiliations of the citing authors. On the map, thicker lines represent more citations, and a single city may contain several institutions located very close together.
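The aggregation step described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline Exposing.ai actually uses: the sample records, the function names `citations_per_institution` and `line_weight`, and the linear thickness scale are all hypothetical, and the data is made up for the example.

```python
from collections import Counter

# Hypothetical, hand-made records standing in for verified citations.
# Each record is (paper title, institution, country); all values are illustrative.
verified_citations = [
    ("Paper A", "Example University", "US"),
    ("Paper B", "Example University", "US"),
    ("Paper C", "Sample Institute", "CN"),
]


def citations_per_institution(records):
    """Count verified citing papers for each (institution, country) pair."""
    return Counter(
        (institution, country) for _title, institution, country in records
    )


def line_weight(count, base=1.0, step=0.5):
    """Map a citation count to a line thickness (the linear scale is an assumption)."""
    return base + step * (count - 1)


counts = citations_per_institution(verified_citations)
for (institution, country), n in counts.most_common():
    print(f"{institution} ({country}): {n} citation(s), line weight {line_weight(n)}")
```

Grouping by institution rather than by paper mirrors the text's statement that usage is estimated from institutional affiliations; any real mapping from counts to line thickness may differ.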

IJB-C Creative Commons License Distribution

[Chart: IJB-C Creative Commons license distribution]

IJB-C Image Upload Year Distribution

[Chart: IJB-C image uploads by year]

Top 10 IJB-C Image #Tags

[Chart: Top 10 image #tags used in IJB-C]

Top 10 Geocoded Cities IJB-C

[Chart: Top 10 cities for geocoded photos in IJB-C]

Citing This Work

If you reference or use any data from the Exposing.ai project, cite our original research as follows:

@online{Exposing.ai,
  author = {Harvey, Adam and LaPlace, Jules},
  title = {Exposing.ai},
  year = 2021,
  url = {https://exposing.ai},
  urldate = {2021-01-01}
}

If you reference or use any data from IJB-C, cite the authors' work:

@article{Whitelam2017IARPAJB,
    author = "Whitelam, Cameron and Taborsky, Emma and Blanton, Austin and Maze, Brianna and Adams, Jocelyn C. and Miller, Tim and Kalka, Nathan D. and Jain, Anil K. and Duncan, James A. and Allen, Kristen E and Cheney, Jordan and Grother, Patrick",
    title = "IARPA Janus Benchmark-B Face Dataset",
    journal = "2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",
    year = "2017",
    pages = "592-600"
}

References

  • [1] Cameron Whitelam, et al. "IARPA Janus Benchmark-B Face Dataset". 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). (2017): 592-600.