Helen is a dataset of photos used for face recognition. The dataset was published in 2012 and contains 2,330 total images. Exposing.ai located 1,854 original photos from Flickr used to build Helen.
Helen is a dataset of annotated face images used for facial component localization, a process used during face recognition. It includes 2,330 images from Flickr found by searching for "portrait" combined with terms such as "family", "wedding", "boy", "outdoor", and "studio".[^orig_paper]
The dataset was published in 2012 with the primary motivation listed as facilitating "high quality editing of portraits". However, the paper's introduction also mentions that facial feature localization "is an essential component for face recognition, tracking and expression analysis."[^orig_paper]
Regardless of the authors' primary motivations, the HELEN dataset has become one of the most widely used datasets for training facial landmark algorithms, which are essential parts of many facial recognition systems. Facial landmarking is used to isolate facial features such as the eyes, nose, jawline, and mouth in order to align faces to a template.
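To make the alignment step concrete, the following is a minimal sketch of how detected landmarks can be mapped onto a template with a least-squares similarity transform (scale, rotation, translation). It is an illustration only, not the method of the Helen authors or of any particular recognition system. The five-point template, its coordinates, and the function names `estimate_similarity` and `apply_similarity` are assumptions made for this example, and the "detected" landmarks are simulated rather than taken from the dataset.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama-style) so that
    dst is approximately scale * rotation @ src + translation."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    u, s, vt = np.linalg.svd(cov)

    # Guard against a reflection so the result is a proper rotation.
    d = np.ones(src.shape[1])
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1.0

    rotation = u @ np.diag(d) @ vt
    src_var = (src_c ** 2).sum() / len(src)
    scale = (s * d).sum() / src_var
    translation = mu_dst - scale * rotation @ mu_src
    return scale, rotation, translation


def apply_similarity(points, scale, rotation, translation):
    """Apply the transform to an (N, 2) array of landmark coordinates."""
    return scale * np.asarray(points, dtype=float) @ rotation.T + translation


# Hypothetical template: eye centers, nose tip, mouth corners in a 100x100 crop.
TEMPLATE = np.array(
    [[30.0, 40.0], [70.0, 40.0], [50.0, 60.0], [35.0, 80.0], [65.0, 80.0]]
)

# Simulated detections: the template rotated 20 degrees, scaled 1.5x, and shifted.
theta = np.deg2rad(20.0)
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
detected = 1.5 * TEMPLATE @ rot.T + np.array([120.0, 45.0])

scale, rotation, translation = estimate_similarity(detected, TEMPLATE)
aligned = apply_similarity(detected, scale, rotation, translation)
```

In practice the same transform would also be applied to the image pixels, so that every face reaches the later recognition stages in a consistent position and size.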
This analysis shows that since its initial publication in 2012, the HELEN dataset has been used in over 200 research projects related to facial recognition with the vast majority of research taking place in China.
Commercial users include IBM, NVIDIA, NEC, Microsoft Research Asia, Google, Megvii, Microsoft, Intel, Daimler, Tencent, Baidu, Adobe, and Facebook.
Organization | Paper | Link | Year | Used Helen |
---|---|---|---|---|
SenseTime, Amazon | Look at Boundary: A Boundary-Aware Face Alignment Algorithm | | 2018 | ✔ |
SenseTime | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | | 2018 | ✔ |
The dataset was used to train the OpenFace software, whose documentation states: "we used the HELEN and LFPW training subsets for training and the rest for testing" (https://github.com/TadasBaltrusaitis/OpenFace/wiki/Datasets).
The popular dlib facial landmark detector was trained using HELEN.
In addition to the 200+ verified citations, the HELEN dataset was used for
It's been converted into new datasets including
The original site
To help understand how the Helen dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world.
If you reference or use any data from the Exposing.ai project, cite our original research as follows:
@online{Exposing.ai, author = {Harvey, Adam and LaPlace, Jules}, title = {Exposing.ai}, year = 2021, url = {https://exposing.ai}, urldate = {2021-01-01} }
If you reference or use any data from Helen, cite the authors' work:
@inproceedings{Le2012InteractiveFF, author = "Le, Vuong and Brandt, Jonathan and Lin, Zhe L. and Bourdev, Lubomir D. and Huang, T.", title = "Interactive Facial Feature Localization", booktitle = "ECCV", year = "2012" }