
Helen

Helen is a dataset of photos used for face recognition. The dataset was published in 2012 and contains 2,330 total images. Exposing.ai located 1,854 original photos from Flickr used to build Helen.

Helen is a dataset of annotated face images used for facial component localization, a process used during face recognition. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio".[^orig_paper]

The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis."[^orig_paper]

Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential parts of many facial recognition systems. Facial landmarks are used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to a template.
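
The alignment step can be illustrated with a short sketch. It assumes landmarks have already been detected and fits a similarity transform (rotation, uniform scale, and translation) that maps them onto a fixed template in the least-squares sense. This is a common technique, not necessarily the one any particular system uses. Points are treated as complex numbers so the closed-form solution needs only the standard library; the function name `align_to_template` and the three-point template are hypothetical.

```python
def align_to_template(landmarks, template):
    """Map 2D landmarks onto a template with the best-fit similarity transform.

    Hypothetical helper for illustration. Points are handled as complex numbers
    (x + yj), so a similarity transform is w = a*z + b, where the complex factor
    a encodes rotation and uniform scale and b encodes translation. The fit is
    least squares and does not allow reflections.
    """
    if len(landmarks) != len(template) or len(landmarks) < 2:
        raise ValueError("need two equally sized sets of at least 2 points")
    z = [complex(x, y) for x, y in landmarks]
    w = [complex(x, y) for x, y in template]
    n = len(z)
    # Center both point sets so translation drops out of the fit for a.
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    zc = [p - z_mean for p in z]
    wc = [q - w_mean for q in w]
    denom = sum(abs(p) ** 2 for p in zc)
    if denom == 0:
        raise ValueError("degenerate landmarks (all points identical)")
    # Closed-form minimizer of sum |a*zc_i - wc_i|^2 over complex a.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / denom
    # Translation that maps the landmark centroid onto the template centroid.
    b = w_mean - a * z_mean
    return [((a * p + b).real, (a * p + b).imag) for p in z]


# Hypothetical template (two eye points and a nose point) and a detected set
# that is the same shape rotated 90 degrees, scaled by 2, and shifted.
template = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
detected = [(5.0, 5.0), (5.0, 9.0), (3.0, 7.0)]
print(align_to_template(detected, template))
```

Representing points as complex numbers turns rotation plus scaling into a single complex multiplication, which is why the fit reduces to one division after centering.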

An example annotation from the HELEN dataset showing 194 points that were originally annotated by Mechanical Turk workers. Graphic © 2019 MegaPixels.cc based on data from HELEN dataset by Le, Vuong et al.
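
Because each annotation is a fixed-length list of 194 (x, y) points, loading one is straightforward. The sketch below assumes each annotation file stores the image name on its first line followed by one `x , y` coordinate pair per line; that layout is an assumption to check against the distributed files, and the function name `parse_helen_annotation` and the sample text are hypothetical.

```python
def parse_helen_annotation(text, expected_points=194):
    """Parse one annotation file's text into (image_name, [(x, y), ...]).

    Hypothetical helper. ASSUMED layout: first non-empty line is the image
    name; every following non-empty line is an "x , y" pair separated by a
    comma (surrounding whitespace is ignored).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty annotation")
    image_name = lines[0]
    points = []
    for line in lines[1:]:
        # Exactly one comma is expected; anything else raises ValueError.
        x_str, y_str = line.split(",")
        points.append((float(x_str), float(y_str)))
    if expected_points is not None and len(points) != expected_points:
        raise ValueError(f"expected {expected_points} points, got {len(points)}")
    return image_name, points


# Synthetic example text, not real HELEN data.
sample = "example_image\n" + "\n".join(f"{i}.5 , {i * 2}.25" for i in range(194))
name, points = parse_helen_annotation(sample)
print(name, len(points), points[:2])
```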

This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition, with the vast majority of research taking place in China.

Commercial users include IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook.

Organization | Paper | Year
SenseTime, Amazon | Look at Boundary: A Boundary-Aware Face Alignment Algorithm | 2018
SenseTime | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | 2018

The dataset was used to train the OpenFace software. Its documentation states: "we used the HELEN and LFPW training subsets for training and the rest for testing" (https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets).

The popular dlib facial landmark detector was trained using HELEN.

In addition to the 200+ verified citations, the HELEN dataset was used for

It's been converted into new datasets including

The original site

Information Supply Chain

To help understand how the Helen dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing Helen was collected, verified, and geocoded to show how AI training data has proliferated around the world.

Citation data is collected using SemanticScholar.org, then dataset usage is verified and geolocated. Citations are used to provide an estimated overview of how and where images were used based on institutional affiliations. Thicker lines represent more citations.
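
The aggregation behind the map can be sketched as follows, assuming each citation has already been verified and geocoded to a location. The record fields (`location`, `verified`), the width range, the linear scaling, and the function name `citation_line_widths` are hypothetical choices for illustration, not the project's documented pipeline.

```python
from collections import Counter


def citation_line_widths(citations, min_width=1.0, max_width=8.0):
    """Map each location to a line width that grows with its citation count.

    Hypothetical helper. Only records marked as verified are counted; widths
    scale linearly so the most-cited location gets max_width.
    """
    counts = Counter(c["location"] for c in citations if c["verified"])
    if not counts:
        return {}
    top = max(counts.values())
    return {
        loc: min_width + (max_width - min_width) * n / top
        for loc, n in counts.items()
    }


# Hypothetical sample records, not real citation data.
records = [
    {"location": "City A", "verified": True},
    {"location": "City A", "verified": True},
    {"location": "City B", "verified": True},
    {"location": "City B", "verified": False},  # unverified: ignored
]
print(citation_line_widths(records))
```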

Helen Copyright Distribution
[Chart: Helen copyright distribution]

Helen Creative Commons License Distribution
[Chart: Helen Creative Commons license distribution]

Helen Image Upload Year Distribution
[Chart: Helen image upload year distribution]

Top 10 Helen Image #Tags
[Chart: top 10 image #tags used in Helen]

Top 10 Geocoded Cities Helen
[Chart: top 10 cities for geocoded photos in Helen]

Citing This Work

If you reference or use any data from the Exposing.ai project, cite our original research as follows:

@online{Exposing.ai,
  author = {Harvey, Adam and LaPlace, Jules},
  title = {Exposing.ai},
  year = 2021,
  url = {https://exposing.ai},
  urldate = {2021-01-01}
}

If you reference or use any data from Helen, cite the authors' work:

@inproceedings{Le2012InteractiveFF,
    author = "Le, Vuong and Brandt, Jonathan and Lin, Zhe L. and Bourdev, Lubomir D. and Huang, T.",
    title = "Interactive Facial Feature Localization",
    booktitle = "ECCV",
    year = "2012"
}

References

  • 1 Vuong Le, et al. "Interactive Facial Feature Localization." ECCV, 2012.