IARPA Janus Benchmark C (IJB-C) is a dataset of video still-frames and photos used for face recognition benchmarking. The dataset was published in 2017 and contains 21,294 total images. Exposing.ai located 5,757 original photos from Flickr used to build IJB-C and made this information searchable through this site's database search engine.
The IJB-C dataset includes both images and names. The name list includes 3,531 individuals, many of whom are activists, artists, journalists, foreign politicians, and public speakers. Unlike datasets such as VGG Face, which used the Internet Movie Database as a starting point for gathering the names of actors and celebrities, the IJB-C dataset authors turned to YouTube. In their words, "YouTube users who upload well-labeled, person-centric videos, such as the World Economic Forum and the International University Sports Federation were also identified" as ideal candidates for the IJB-C dataset. 1
This approach cast a wide net, gathering many individuals who frequently give lectures to online audiences or speak at conferences whose recordings were later posted to YouTube. Using YouTube videos this way is a clear violation of YouTube's policy. Google clarified this in a November 2020 memo and clarified it again in a May 2021 memo, stating in bold text that using data from YouTube for face recognition is a violation of its Terms of Service. Nevertheless, the IJB-C face recognition benchmarking dataset includes thousands of faces from over 11,000 YouTube videos, along with the full name of each person. In total, the dataset includes face data from 11,799 YouTube videos and 21,294 photos from Wikimedia or Flickr. According to the dataset authors, all the "images were scraped from Google and Wikimedia Commons, and Creative Commons videos were scraped from YouTube." 1
One video included Jillian York, a digital rights activist who has opposed such surveillance technologies. In 2015, York delivered a lecture at the Chaos Communication Congress (32C3) titled "Sin in the Time of Technology". Later, Noblis, a US government contractor, downloaded the video from YouTube and copied 41 frames of Jillian York's face into the IJB-C dataset. York was never asked for permission or even notified that her biometrics were being used in an IARPA research project aimed at improving face recognition for intelligence analysts. Why York's biometrics, along with those of over 3,000 others, were chosen for the IJB-C dataset is not clear. The only criterion the dataset authors provide is that source material must include well-labeled, person-centric data. It is likely that the permissive and often misunderstood Creative Commons license was a factor in using the video. Promotional material posted by IARPA and the Office of the Director of National Intelligence shows the context in which technology developed in the Janus program may be applied, describing the intended outcome as "Radically Expanding the Scenarios in Which Automated Face Recognition Can Establish Identity" above an example scenario showing Osama bin Laden.
The original dataset is over 200 GB and includes a CSV file, "ijbc_subject_names.csv", with SUBJECT_ID and SUBJECT_NAME columns listing all 3,531 identities used in the dataset. Several of the names are listed below, followed by a summary of the YouTube channels whose videos were used to build the dataset and the number of videos taken from each.
Name | Profession |
---|---|
Ai Weiwei | Artist, activist |
Evgeny Morozov | Writer, technology critic |
Jeremy Scahill | Journalist, activist |
Jill Magid | Artist |
John Maeda | Designer, technologist |
Molly Crabapple | Artist, activist |
Neri Oxman | Artist |
Paola Antonelli | Curator |
Raul Krauthausen | Activist |
Slavoj Žižek | Philosopher |
Ta-Nehisi Coates | Author, journalist |
Tracey Emin | Artist |
YouTube Channel | Videos |
---|---|
M5sParlamento | 175 |
thelegendofNeshka | 141 |
World Economic Forum | 116 |
SpicyBollywood | 77 |
UNCENSORED BOLLYWOOD | 74 |
TheBollywoodDaily | 72 |
AudiovisualTelam | 60 |
Bollywood Kool | 44 |
Show Jana Krause | 36 |
TVNBR | 33 |
Manuchehr lenziran | 29 |
Bollywood dna | 27 |
ITU | 26 |
Center for Strategic & International Studies | 26 |
Jullanakatrina Vidal | 24 |
BollywoodUnCut | 20 |
Devang Bhatt | 20 |
CCKirchner | 19 |
IIEA1 | 18 |
Hussaini Media Production Azadari Network | 17 |
liberalii1 | 17 |
Bollywood Mishits | 16 |
DailyNewsAsiaEurope | 15 |
Repeat Bollywood | 15 |
Valeria Amatue21 Lukyanova | 14 |
Plat GTV Gosip™ | 14 |
EsquerraUnidaPV | 14 |
HotNews Romania | 14 |
Rashtrapati Bhavan New Delhi | 13 |
Hollywood Daily | 13 |
Rosa-Luxemburg-Stiftung | 13 |
Pilipino Music - OPM/MIX W/Lyrics | 13 |
Lampu Islam | 13 |
SpotboyE | 12 |
Kathryn Bernardo Channel | 12 |
Publinews Guatemala | 12 |
Senna Channel | 11 |
BollywoodVintage | 11 |
Maryam Rajavi | 11 |
Netmediatama | 11 |
Lekoi Florentino | 11 |
RadioAMLO | 11 |
TheDailyMaverick | 10 |
telugufullscreen | 10 |
jacy Farnendas | 10 |
GS Music Official - Muzica ta e la noi! | 10 |
TV Gosip Indonesia | 10 |
DirtyGameMafiaTv | 10 |
CapitalAccount | 10 |
Mudasser Ilyas | 10 |
Priyo TV | 9 |
Luigino Bracci Roa | 9 |
ILFOGLIETTONE.IT | 9 |
Lamula. pe | 9 |
iDream Telugu Movies | 9 |
librairie mollat | 9 |
CultureBuzzIsrael | 9 |
Shri Rajiv Dixit ji | 9 |
Partido Nacionalista Peruano | 9 |
Marcelo Freixo | 9 |
Om Anand | 9 |
红高粱 Red Sorghum 2014 | 9 |
omestre999 | 9 |
ImaginingtheInternet | 9 |
Friedrich-Naumann-Stiftung für die Freiheit | 9 |
RTI CHAINE | 8 |
Ministère des Affaires étrangères et du Développement international | 8 |
Jinmu Choy | 8 |
Proud to be Gujarati Vasant Teraiya | 8 |
INTERPOL | 8 |
олег ибриян | 8 |
Hussaini Media Production | 8 |
BollywoodSamachar | 8 |
DaTechGuyBlog | 8 |
Hattani | 8 |
Samantha's Secret Classic Collection | 8 |
Pashto song Awo Ghazal | 8 |
EC Publishing Media | 8 |
Coalició Compromís | 8 |
TV Botafogo | 8 |
NataliaPiar | 8 |
SuperVGHD | 8 |
NZNats | 7 |
Ministério das Relações Exteriores — Brasil | 7 |
jornaldajustica | 7 |
PakTurkey | 7 |
Palazzo Chigi | 7 |
TWiT Netcast Network | 7 |
NASA STI Program | 7 |
irokotv \| NOLLYWOOD | 7 |
News Videos | 7 |
eSerwis | 7 |
ALF Redhot Junction | 7 |
Indian Entertainment And Top News Channel | 7 |
AGERPRES | 7 |
Radio Algérienne | 7 |
cst1791 | 7 |
Sernord Tommy | 7 |
YekEsfahaniDarParis 2 | 7 |
Niv Calderon | 7 |
PujyaBapuJi | 7 |
hyunwooRE | 7 |
Lumbungbudaya rakyat | 7 |
Haciendayaapp | 7 |
frank diesel | 6 |
RT America | 6 |
BCNChile | 6 |
Presidencia Perú | 6 |
EBC na Rede | 6 |
ScarceMedia | 6 |
tollyfreaks | 6 |
The Israel Project | 6 |
Top Telugu TV | 6 |
Bollywood Pulse | 6 |
Luigino Bracci Roa | 6 |
Lakha420 | 6 |
Canale25 | 6 |
Jerry Schmidt | 6 |
Nano GoleSorkh | 6 |
David Silva | 6 |
Campus Party | 6 |
Video Events Production | 6 |
Hayatın Sesi - Video | 6 |
vlogbrothers | 6 |
Marc Chabot YT | 6 |
bulbul ahmed joy | 6 |
The Arsenio Hall Show | 6 |
Department National Defense Philippines | 6 |
Larva 2015 | 6 |
dreamersradioID | 6 |
DJJaviFASHION | 6 |
Yekesfahani Darparis | 6 |
powtvdotnet | 6 |
Podemos Andalucía | 6 |
Tribuna News | 6 |
Willian Silva Oficial | 6 |
Flora Martirosian | 6 |
Liberated Galaxy | 6 |
Alex Jones Best Off | 6 |
YekEsfahaniDarParis 4 | 6 |
The City Club of Cleveland | 5 |
Kosan Rempong | 5 |
HouseResourceOrg | 5 |
newsdiechina | 5 |
SpurredOn | 5 |
Festival of Arts | 5 |
rikedenimes | 5 |
Mars Films | 5 |
ahmet tufan | 5 |
IIED | 5 |
Greens EFA | 5 |
Janam TV | 5 |
Samuel Metias | 5 |
NewsClickin | 5 |
Campur Songo 9 | 5 |
GlobalPunjabTV | 5 |
Dil Raju | 5 |
BreakingWorldsNews | 5 |
Bloggueros - Congreso de los Diputados | 5 |
HeavenlyPeach 천도복숭아 | 5 |
Shirka Kerala | 5 |
Confidencial | 5 |
United States Institute of Peace | 5 |
Deputada Manuela | 5 |
thedavosquestion | 5 |
The Emerald | 5 |
Gotawa Chenel | 5 |
The Berkman Klein Center for Internet & Society | 5 |
SOAS University of London | 5 |
STAR STAR | 5 |
amsh5555 | 5 |
Levántate Bolivia | 5 |
EstateAgencyEvents | 5 |
Idris Brasil | 5 |
Pavel Guzev | 5 |
Vasant HariOm | 5 |
iCanStudioLive | 5 |
Yakın Tarih | 5 |
tiredfornothing | 5 |
The4cylinder | 5 |
Helvy Tiana Rosa | 5 |
Utrikesdepartementet | 5 |
radioamlotv | 5 |
GMC Entertainment | 5 |
rCent - Music | 5 |
Mohammed Assaf | 5 |
ONU Brasil | 4 |
Union pour la Tunisie (Uni*T) | 4 |
VT1035 kiroiva | 4 |
tv Carta | 4 |
ALICE CES | 4 |
wushipindao2 | 4 |
The MacGuffin | 4 |
XPRIZE | 4 |
Yoann Durand | 4 |
RoyalblogNL | 4 |
peace & love | 4 |
VICENTE LIMA FC-GRUPO DEBAIXO DOS CACHOS | 4 |
Michelle Bachelet | 4 |
TheLBJLibrary | 4 |
MeleTOP | 4 |
WketDZ | 4 |
NoRosesForMe | 4 |
Francis Chan Sermons | 4 |
FootageWorld | 4 |
FISUTV | 4 |
BollywoodHitMovies | 4 |
Báo Vẹm | 4 |
TVMOV | 4 |
UhuruKenyattaTV | 4 |
Boxing VHS | 4 |
jumpgta | 4 |
Ann Mary | 4 |
sevak operator | 4 |
Rayyisse | 4 |
Confederation of British Industry | 4 |
Bárbara Mori | 4 |
Antonio Jose | 4 |
International Rice Research Institute | 4 |
Super Sekali | 4 |
Fasol Prod | 4 |
gcshotime2 | 4 |
Atheist Digest | 4 |
tyrlop | 4 |
INKtalks | 4 |
THE REMIX - HÒA ÂM ÁNH SÁNG | 4 |
Kapione | 4 |
albatros | 4 |
Yuu Mii Nguyễn | 4 |
Atracaodivulga | 4 |
thanhson11115 | 4 |
TheWikiLeaksChannel | 4 |
Com Cay | 4 |
هداية لليوتيوب الإسلامي | 4 |
Star Academy Arabia | 4 |
Music Playlist | 4 |
TD Jakes | 4 |
Igor Kichuk | 4 |
Pearl Talent | 4 |
Creative Makers | 4 |
DemocracyNowEs | 4 |
A Graceful Watchman | 4 |
Peter SJ Dramas 2 | 4 |
M5SRoma | 4 |
Retro-Inspiracja | 4 |
Canal 44 | 4 |
MektebiSultani | 4 |
Gheorghe Zamfir | 4 |
PalaciodoPlanalto | 4 |
Kitty Geny | 4 |
wojo4hitz | 4 |
barackobamadotcomII | 4 |
UNHCR-ACNUR Américas | 4 |
RationalFaith609 | 4 |
MasecaTV | 4 |
NOWCastSA | 4 |
Neelie Kroes | 4 |
bentleysw | 4 |
Praise the Lord | 4 |
Shilly Marshall | 4 |
Music Channels Hits | 4 |
Сергей Мавроди | 4 |
YekEsfahaniDarParis 1 | 4 |
David Jackmanson in Moe Gippsland Australia | 3 |
Scoop | 3 |
Guy Ntoto | 3 |
imivids | 3 |
U.S. Embassy Tel Aviv | 3 |
MsMilkytheclown1 | 3 |
UrbanAge | 3 |
Whaleoil | 3 |
mpbebossa | 3 |
ЧГТРК Грозный | 3 |
Rossijskaya Gazeta | 3 |
ALDE ADLE | 3 |
nydivide | 3 |
blackhawk | 3 |
Kâmil VARINCA | 3 |
Principio Esperanza | 3 |
SiliconANGLE | 3 |
NADiRinforma | 3 |
Turkued | 3 |
WeLoveMusicfr | 3 |
La Chica de Morado | 3 |
Nadir Kamal | 3 |
Don Johnson | 3 |
SurvivingScientology | 3 |
Audi Dublin International Film Festival | 3 |
Periodismo Ciudadano | 3 |
Rachel Madddow Show | 3 |
Company Star | 3 |
Sidrah Zaheer | 3 |
New America | 3 |
Bollywood In Bikini | 3 |
Redsilverj | 3 |
C chen | 3 |
1964sjw | 3 |
The British Library | 3 |
GujLitFest | 3 |
Gawaahi | 3 |
Home Office | 3 |
yasmeen rubi | 3 |
To help understand how commercial, military, and academic organizations have used IJB-C, existing publicly available research citing IARPA Janus Benchmark C was collected, verified, and geocoded to show how AI training data has proliferated around the world.
If you reference or use any data from the Exposing.ai project, cite our original research as follows:
@online{Exposing.ai, author = {Harvey, Adam and LaPlace, Jules}, title = {Exposing.ai}, year = 2021, url = {https://exposing.ai}, urldate = {2021-01-01} }
If you reference or use any data from IJB-C, cite the authors' work:
@article{Whitelam2017IARPAJB, author = "Whitelam, Cameron and Taborsky, Emma and Blanton, Austin and Maze, Brianna and Adams, Jocelyn C. and Miller, Tim and Kalka, Nathan D. and Jain, Anil K. and Duncan, James A. and Allen, Kristen E and Cheney, Jordan and Grother, Patrick", title = "IARPA Janus Benchmark-B Face Dataset", journal = "2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)", year = "2017", pages = "592-600" }