Example face images from the IJB-C face recognition dataset. Faces are blurred to protect privacy. Visualization by Adam Harvey / Exposing.ai licensed under CC-BY-NC with original images licensed and attributed under Creative Commons CC-BY (attribution required, no commercial use).

IARPA Janus Benchmark C

IARPA Janus Benchmark C (IJB-C) is a dataset of video still-frames and photos used for face recognition benchmarking. The dataset was published in 2017 and contains 21,294 total images. Exposing.ai located 5,757 original photos from Flickr used to build IJB-C and made this information searchable through this site's database search engine.

The IJB-C dataset includes both images and names. The name list includes 3,531 individuals, many of whom are activists, artists, journalists, foreign politicians, and public speakers. Other datasets, such as VGG Face, used the Internet Movie Database as a starting point for gathering names of actors and celebrities. The IJB-C dataset authors instead turned to YouTube, writing that "YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified" as ideal candidates for the IJB-C dataset. 1

This approach cast a wide net, gathering many individuals who frequently give lectures to online audiences or participate in conferences that were later posted to YouTube. Using videos from YouTube in this way is a clear violation of YouTube's policy, which Google clarified in a November 2020 memo and re-clarified in a May 2021 memo, further emphasizing in bold text that using data from YouTube for face recognition is a violation of its Terms of Service. However, thousands of faces from over 11,000 YouTube videos are included in the IJB-C face recognition benchmarking dataset, along with full names for each person. In total, the dataset includes face data from 11,799 YouTube videos and 21,294 photos from Wikimedia or Flickr. According to the dataset authors, all the "images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube." 1

One video included Jillian York, a digital rights activist who has opposed such surveillance technologies. In 2015, York delivered a lecture at the Chaos Communication Congress (32C3) titled "Sin in the Time of Technology". Several years later Noblis, a US government contractor, pulled the video from YouTube and copied 41 frames of Jillian York's face into the IJB-C dataset. York was never asked permission or even notified that her biometrics were being used in an IARPA research project with the goal of improving face recognition for intelligence analysts.

The reasons that York's biometrics, along with those of over 3,000 others, were chosen for the IJB-C dataset are not clear. The only criterion provided by the dataset authors is that source material must include well-labeled, person-centric data. It's likely that the permissive and often misunderstood Creative Commons license was a factor in using the video. Promotional material posted by IARPA and the Office of the Director of National Intelligence shows the context for how technology developed in the Janus program may be applied, describing the intended outcome as "Radically Expanding the Scenarios in Which Automated Face Recognition Can Establish Identity", above an example scenario showing Osama bin Laden.

The IJB-C dataset is part of the Janus face recognition program, which was first announced in 2013 with the goal of "Radically Expanding the Scenarios in Which Automated Face Recognition Can Establish Identity".

The original dataset is over 200GB and includes a CSV file "ijbc_subject_names.csv" with SUBJECT_ID and SUBJECT_NAME columns listing all 3,531 identities used in the dataset. Several of the names are listed below, along with a summary of the YouTube videos used for building the dataset.
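As a rough illustration of how someone might check whether a name appears in that file, the following Python sketch searches the SUBJECT_NAME column. The file name and the two column names come from the description above. Everything else is an assumption: the helper name `find_subjects` is hypothetical, the subject IDs in the sample are made up, and the real file may format names differently (for example with underscores).

```python
import csv
import io


def find_subjects(csv_file, query):
    """Return (SUBJECT_ID, SUBJECT_NAME) pairs whose name contains `query`.

    `csv_file` is any open text file or file-like object with a header row
    containing SUBJECT_ID and SUBJECT_NAME. Matching is case-insensitive.
    """
    needle = query.casefold()
    reader = csv.DictReader(csv_file)
    return [
        (row["SUBJECT_ID"], row["SUBJECT_NAME"])
        for row in reader
        if needle in row["SUBJECT_NAME"].casefold()
    ]


# Illustrative stand-in for ijbc_subject_names.csv. The IDs are invented
# for this sketch and do not correspond to the real dataset.
sample = io.StringIO(
    "SUBJECT_ID,SUBJECT_NAME\n"
    "1,Ai Weiwei\n"
    "2,Molly Crabapple\n"
    "3,John Maeda\n"
)
print(find_subjects(sample, "crabapple"))
```

With a local copy of the dataset, the same function could be called on the real file by opening it with `open("ijbc_subject_names.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.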

Supplementary Data

Sample of Names in the IJB-C Face Recognition Dataset
Name Profession
Ai Weiwei Artist, activist
Evgeny Morozov Writer, technology critic
Jeremy Scahill Journalist, activist
Jill Magid Artist
John Maeda Designer, technologist
Molly Crabapple Artist, activist
Neri Oxman Artist
Paola Antonelli Curator
Raul Krauthausen Activist
Slavoj Žižek Philosopher
Ta-Nehisi Coates Author, journalist
Tracey Emin Artist
Top 300 YouTube Accounts Used in the IJB-C Face Recognition Dataset
YouTube Channel Videos
M5sParlamento 175
thelegendofNeshka 141
World Economic Forum 116
SpicyBollywood 77
TheBollywoodDaily 72
AudiovisualTelam 60
Bollywood Kool 44
Show Jana Krause 36
Manuchehr lenziran 29
Bollywood dna 27
ITU 26
Center for Strategic & International Studies 26
Jullanakatrina Vidal 24
BollywoodUnCut 20
Devang Bhatt 20
CCKirchner 19
IIEA1 18
Hussaini Media Production Azadari Network 17
liberalii1 17
Bollywood Mishits 16
DailyNewsAsiaEurope 15
Repeat Bollywood 15
Valeria Amatue21 Lukyanova 14
Plat GTV Gosip™ 14
EsquerraUnidaPV 14
HotNews Romania 14
Rashtrapati Bhavan New Delhi 13
Hollywood Daily 13
Rosa-Luxemburg-Stiftung 13
Pilipino Music - OPM/MIX W/Lyrics 13
Lampu Islam 13
SpotboyE 12
Kathryn Bernardo Channel 12
Publinews Guatemala 12
Senna Channel 11
BollywoodVintage 11
Maryam Rajavi 11
Netmediatama 11
Lekoi Florentino 11
RadioAMLO 11
TheDailyMaverick 10
telugufullscreen 10
jacy Farnendas 10
GS Music Official - Muzica ta e la noi! 10
TV Gosip Indonesia 10
DirtyGameMafiaTv 10
CapitalAccount 10
Mudasser Ilyas 10
Priyo TV 9
Luigino Bracci Roa 9
Lamula. pe 9
iDream Telugu Movies 9
librairie mollat 9
CultureBuzzIsrael 9
Shri Rajiv Dixit ji 9
Partido Nacionalista Peruano 9
Marcelo Freixo 9
Om Anand 9
红高粱 Red Sorghum 2014 9
omestre999 9
ImaginingtheInternet 9
Friedrich-Naumann-Stiftung für die Freiheit 9
Ministère des Affaires étrangères et du Développement international 8
Jinmu Choy 8
Proud to be Gujarati Vasant Teraiya 8
олег ибриян 8
Hussaini Media Production 8
BollywoodSamachar 8
DaTechGuyBlog 8
Hattani 8
Samantha's Secret Classic Collection 8
Pashto song Awo Ghazal 8
EC Publishing Media 8
Coalició Compromís 8
TV Botafogo 8
NataliaPiar 8
SuperVGHD 8
NZNats 7
Ministério das Relações Exteriores — Brasil 7
jornaldajustica 7
PakTurkey 7
Palazzo Chigi 7
TWiT Netcast Network 7
NASA STI Program 7
irokotv NOLLYWOOD 7
News Videos 7
eSerwis 7
ALF Redhot Junction 7
Indian Entertainment And Top News Channel 7
Radio Algérienne 7
cst1791 7
Sernord Tommy 7
YekEsfahaniDarParis 2 7
Niv Calderon 7
PujyaBapuJi 7
hyunwooRE 7
Lumbungbudaya rakyat 7
Haciendayaapp 7
frank diesel 6
RT America 6
BCNChile 6
Presidencia Perú 6
EBC na Rede 6
ScarceMedia 6
tollyfreaks 6
The Israel Project 6
Top Telugu TV 6
Bollywood Pulse 6
Luigino Bracci Roa 6
Lakha420 6
Canale25 6
Jerry Schmidt 6
Nano GoleSorkh 6
David Silva 6
Campus Party 6
Video Events Production 6
Hayatın Sesi - Video 6
vlogbrothers 6
Marc Chabot YT 6
bulbul ahmed joy 6
The Arsenio Hall Show 6
Department National Defense Philippines 6
Larva 2015 6
dreamersradioID 6
Yekesfahani Darparis 6
powtvdotnet 6
Podemos Andalucía 6
Tribuna News 6
Willian Silva Oficial 6
Flora Martirosian 6
Liberated Galaxy 6
Alex Jones Best Off 6
YekEsfahaniDarParis 4 6
The City Club of Cleveland 5
Kosan Rempong 5
HouseResourceOrg 5
newsdiechina 5
SpurredOn 5
Festival of Arts 5
rikedenimes 5
Mars Films 5
ahmet tufan 5
Greens EFA 5
Janam TV 5
Samuel Metias 5
NewsClickin 5
Campur Songo 9 5
GlobalPunjabTV 5
Dil Raju 5
BreakingWorldsNews 5
Bloggueros - Congreso de los Diputados 5
HeavenlyPeach 천도복숭아 5
Shirka Kerala 5
Confidencial 5
United States Institute of Peace 5
Deputada Manuela 5
thedavosquestion 5
The Emerald 5
Gotawa Chenel 5
The Berkman Klein Center for Internet & Society 5
SOAS University of London 5
amsh5555 5
Levántate Bolivia 5
EstateAgencyEvents 5
Idris Brasil 5
Pavel Guzev 5
Vasant HariOm 5
iCanStudioLive 5
Yakın Tarih 5
tiredfornothing 5
The4cylinder 5
Helvy Tiana Rosa 5
Utrikesdepartementet 5
radioamlotv 5
GMC Entertainment 5
rCent - Music 5
Mohammed Assaf 5
ONU Brasil 4
Union pour la Tunisie (Uni*T) 4
VT1035 kiroiva 4
tv Carta 4
wushipindao2 4
The MacGuffin 4
Yoann Durand 4
RoyalblogNL 4
peace & love 4
Michelle Bachelet 4
TheLBJLibrary 4
MeleTOP 4
WketDZ 4
NoRosesForMe 4
Francis Chan Sermons 4
FootageWorld 4
BollywoodHitMovies 4
Báo Vẹm 4
UhuruKenyattaTV 4
Boxing VHS 4
jumpgta 4
Ann Mary 4
sevak operator 4
Rayyisse 4
Confederation of British Industry 4
Bárbara Mori 4
Antonio Jose 4
International Rice Research Institute 4
Super Sekali 4
Fasol Prod 4
gcshotime2 4
Atheist Digest 4
tyrlop 4
INKtalks 4
Kapione 4
albatros 4
Yuu Mii Nguyễn 4
Atracaodivulga 4
thanhson11115 4
TheWikiLeaksChannel 4
Com Cay 4
هداية لليوتيوب الإسلامي 4
Star Academy Arabia 4
Music Playlist 4
TD Jakes 4
Igor Kichuk 4
Pearl Talent 4
Creative Makers 4
DemocracyNowEs 4
A Graceful Watchman 4
Peter SJ Dramas 2 4
M5SRoma 4
Retro-Inspiracja 4
Canal 44 4
MektebiSultani 4
Gheorghe Zamfir 4
PalaciodoPlanalto 4
Kitty Geny 4
wojo4hitz 4
barackobamadotcomII 4
UNHCR-ACNUR Américas 4
RationalFaith609 4
MasecaTV 4
Neelie Kroes 4
bentleysw 4
Praise the Lord 4
Shilly Marshall 4
Music Channels Hits 4
Сергей Мавроди 4
YekEsfahaniDarParis 1 4
David Jackmanson in Moe Gippsland Australia 3
Scoop 3
Guy Ntoto 3
imivids 3
U.S. Embassy Tel Aviv 3
MsMilkytheclown1 3
UrbanAge 3
Whaleoil 3
mpbebossa 3
ЧГТРК Грозный 3
Rossijskaya Gazeta 3
nydivide 3
blackhawk 3
Principio Esperanza 3
SiliconANGLE 3
NADiRinforma 3
Turkued 3
WeLoveMusicfr 3
La Chica de Morado 3
Nadir Kamal 3
Don Johnson 3
SurvivingScientology 3
Audi Dublin International Film Festival 3
Periodismo Ciudadano 3
Rachel Madddow Show 3
Company Star 3
Sidrah Zaheer 3
New America 3
Bollywood In Bikini 3
Redsilverj 3
C chen 3
1964sjw 3
The British Library 3
GujLitFest 3
Gawaahi 3
Home Office 3
yasmeen rubi 3

Information Supply Chain

To help understand how IJB-C has been used around the world by commercial, military, and academic organizations, existing publicly available research citing IARPA Janus Benchmark C was collected, verified, and geocoded to show how AI training data has proliferated around the world.

Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an estimated overview of how and where images were used based on institutional affiliations.

Citing This Work

If you reference or use any data from the Exposing.ai project, cite our original research as follows:

  author = {Harvey, Adam and LaPlace, Jules},
  title = {Exposing.ai},
  year = 2021,
  url = {https://exposing.ai},
  urldate = {2021-01-01}

If you reference or use any data from IJB-C, cite the authors' work:

    author = "Whitelam, Cameron and Taborsky, Emma and Blanton, Austin and Maze, Brianna and Adams, Jocelyn C. and Miller, Tim and Kalka, Nathan D. and Jain, Anil K. and Duncan, James A. and Allen, Kristen E and Cheney, Jordan and Grother, Patrick",
    title = "IARPA Janus Benchmark-B Face Dataset",
    journal = "2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",
    year = "2017",
    pages = "592-600"


  • 1 Cameron Whitelam, et al. "IARPA Janus Benchmark-B Face Dataset". 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). (2017): 592-600.