Helen

Helen is a dataset of photos used for face recognition. The dataset was published in 2012 and contains 2,330 total images. Exposing.ai located 1,854 original photos from Flickr used to build Helen.

Helen is a dataset of annotated face images used for facial component localization, a process used during face recognition. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio".[^orig_paper]

The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis."[^orig_paper]

Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential parts of many facial recognition processing systems. Facial landmarks are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to a template.
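To make the idea of aligning a face to a template concrete, here is a minimal sketch in Python. It is not taken from the Helen paper or from any particular library: the landmark names, pixel coordinates, and template eye positions are all hypothetical, and it uses only two eye landmarks to solve for a similarity transform (rotation, uniform scale, and translation), whereas real systems typically fit many landmarks at once.

```python
# Minimal sketch: align a face to a template using two eye landmarks.
# All coordinates are made-up (x, y) pixel positions for illustration only.

def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """Return complex (scale_rot, shift) so that z -> scale_rot * z + shift
    maps src_a onto dst_a and src_b onto dst_b.

    Treating (x, y) as the complex number x + iy, multiplication by a complex
    number is a rotation plus uniform scale, and addition is a translation.
    """
    s1, s2 = complex(*src_a), complex(*src_b)
    t1, t2 = complex(*dst_a), complex(*dst_b)
    if s1 == s2:
        raise ValueError("source landmarks must be distinct")
    scale_rot = (t2 - t1) / (s2 - s1)
    shift = t1 - scale_rot * s1
    return scale_rot, shift


def apply_transform(points, scale_rot, shift):
    """Apply the similarity transform to an iterable of (x, y) points."""
    out = []
    for x, y in points:
        z = scale_rot * complex(x, y) + shift
        out.append((z.real, z.imag))
    return out


# Hypothetical landmarks detected in a photo (the head is slightly tilted).
landmarks = {
    "left_eye": (120.0, 150.0),
    "right_eye": (200.0, 130.0),
    "nose_tip": (165.0, 190.0),
}

# Hypothetical template: where the eyes should sit in a 100x100 aligned crop.
TEMPLATE_LEFT_EYE = (30.0, 40.0)
TEMPLATE_RIGHT_EYE = (70.0, 40.0)

scale_rot, shift = similarity_from_two_points(
    landmarks["left_eye"], landmarks["right_eye"],
    TEMPLATE_LEFT_EYE, TEMPLATE_RIGHT_EYE,
)
aligned = dict(zip(landmarks, apply_transform(landmarks.values(), scale_rot, shift)))

for name, (x, y) in aligned.items():
    print(f"{name}: ({x:.1f}, {y:.1f})")
```

After the transform, both eyes land on the template positions and every other landmark is carried along by the same rotation, scale, and shift, which is what lets downstream recognition models compare faces in a common frame.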

An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic © 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.

This analysis shows that, since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition, with the vast majority of research taking place in China.

Commercial users include IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook.

Organization	Paper	Link	Year	Used Duke MTMC
SenseTime, Amazon	Look at Boundary: A Boundary-Aware Face Alignment Algorithm	2018	year	✔
SenseTime	ReenactGAN: Learning to Reenact Faces via Boundary Transfer	2018	year	✔

The dataset was used to train the OpenFace software: "we used the HELEN and LFPW training subsets for training and the rest for testing" (https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets).

The popular dlib facial landmark detector was trained using HELEN.

In addition to the 200+ verified citations, the HELEN dataset was used for

It's been converted into new datasets including

The original site: http://www.ifp.illinois.edu/~vuongle2/helen/

Information Supply Chain

To help understand how the Helen dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.

Academic
Commercial
Military / Government

Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an estimated overview of how and where images were used, based on institutional affiliations. Thicker lines represent more citations. Please zoom in to see all institutions, as cities may have multiple points very close together.

Helen Attributes
Dataset Name	Helen
Dataset Name Full	Helen
Total Images	2,330
Initial Purpose	Face feature localization for recognition
Year Published	2012
Dataset Website	http://www.ifp.illinois.edu/~vuongle2/helen/

Photos from Flickr.com in Helen
Total Flickr Photos	1,854
Total Flickr Users	912
Active on Flickr.com*	1,854
API Data Accessed	October 2019
Included in YFCC100M	845
Photos w/ Geo Data	639
Searchable on Exposing.ai	1,854
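The counts in the two tables above are easier to compare as proportions. The short Python sketch below derives a few shares from the figures listed on this page; the variable and key names are illustrative only, and the percentages are simple arithmetic on the published counts rather than additional findings.

```python
# Figures copied from the Helen attribute and Flickr tables on this page.
TOTAL_IMAGES = 2330    # Total Images
FLICKR_PHOTOS = 1854   # Total Flickr Photos
FLICKR_USERS = 912     # Total Flickr Users
IN_YFCC100M = 845      # Included in the YFCC100M dataset
WITH_GEO = 639         # Photos w/ Geo Data


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


summary = {
    "located_pct_of_dataset": share(FLICKR_PHOTOS, TOTAL_IMAGES),
    "yfcc100m_pct_of_located": share(IN_YFCC100M, FLICKR_PHOTOS),
    "geo_pct_of_located": share(WITH_GEO, FLICKR_PHOTOS),
    "photos_per_user": round(FLICKR_PHOTOS / FLICKR_USERS, 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

On these numbers, roughly four in five Helen images were traced back to an original Flickr photo, a little under half of those also appear in YFCC100M, and about a third carry geographic data.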

Helen Copyright Distribution

Helen copyright distribution | Download Data (CSV) | Download Chart (SVG)

Helen Creative Commons License Distribution

Helen Creative Commons license distribution | Download Data (CSV) | Download Chart (SVG)

Helen Image Upload Year Distribution

Helen image upload year distribution | Download Data (CSV) | Download Chart (SVG)

Top 10 Helen Image #Tags

Top 10 image #tags used in Helen | Download Data (CSV) | Download Chart (SVG)

Top 10 Geocoded Cities Helen

Top 10 cities for geocoded photos in Helen | Download Data (CSV) | Download Chart (SVG)

Citing This Work

If you reference or use any data from the Exposing.ai project, cite our original research as follows:

@online{Exposing.ai,
  author = {Harvey, Adam},
  title = {Exposing.ai},
  year = 2021,
  url = {https://exposing.ai},
  urldate = {2021-01-01}
}

If you reference or use any data from Helen, cite the authors' work:

@inproceedings{Le2012InteractiveFF,
    author = "Le, Vuong and Brandt, Jonathan and Lin, Zhe L. and Bourdev, Lubomir D. and Huang, T.",
    title = "Interactive Facial Feature Localization",
    booktitle = "ECCV",
    year = "2012"
}

References

[^orig_paper]: Vuong Le, et al. "Interactive Facial Feature Localization". ECCV, 2012.