Exposing.ai
PIPA Dataset
Example face images from the PIPA face recognition dataset. Faces are blurred to protect privacy. Visualization by Adam Harvey / Exposing.ai licensed under CC-BY-NC with original images licensed and attributed under Creative Commons CC-BY (attribution required, no commercial use).

People in Photo Albums

People in Photo Albums (PIPA) is a dataset of photos used for face recognition. The dataset was published in 2015 and contains 60,000 face images of about 2,000 individuals, of which 32,518 photos were taken from Flickr.com.

According to the dataset authors, PIPA was designed to help recognize people's identities in photo albums in an unconstrained setting. But face recognition has applications far beyond personal photo album processing, and sharing a dataset of face images for building face analysis tools contributes to unexpected applications. For example, in 2018 researchers from a military research university in China used the PIPA dataset for their research on "Understanding Humans in Crowded Scenes". The dataset was also used by researchers affiliated with the surveillance company SenseTime and the American surveillance company Facebook.

Because the dataset consists primarily of people's semi-public photos shared online, it contains many images of children, family dinners, weddings, and other photos that are personal in nature. As of January 2020, Berkeley is no longer distributing the dataset, though the Max Planck Institute in Germany still provides it for free and unrestricted download at https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/people-detection-pose-estimation-and-tracking/person-recognition-in-personal-photo-collections.

The charts below show an analysis of the most frequent image tags that were used for the Flickr images in the PIPA dataset. Thousands of images include tags for #DoD (Department of Defense), #Military, and #ArmedForces.
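A tag-frequency analysis of this kind could be sketched as follows. This is a minimal illustration only, assuming each image carries a list of tags. The function name `top_tags` and the sample records are hypothetical and are not taken from the Exposing.ai codebase or the PIPA metadata.

```python
from collections import Counter

def top_tags(image_tags, n=10):
    """Return the n most frequent tags, counting each tag once per image."""
    counts = Counter()
    for tags in image_tags:
        # Drop a leading '#', normalize case, and de-duplicate within one image.
        normalized = {t.lstrip("#").lower() for t in tags if t.lstrip("#")}
        counts.update(normalized)
    return counts.most_common(n)

# Hypothetical sample data: one list of tags per image.
sample = [
    ["#DoD", "Military", "ArmedForces"],
    ["dod", "military"],
    ["wedding", "family"],
    ["DoD", "DoD"],
]

print(top_tags(sample, n=2))
```

Counting each tag once per image keeps a single heavily tagged photo from dominating the ranking; whether the original analysis did the same is not stated in the source.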

Information Supply Chain

To help understand how the PIPA dataset has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the People in Photo Albums dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world.

Citation data is collected using SemanticScholar.org; dataset usage is then verified and geolocated. Citations are used to provide an estimated overview of how and where images were used based on institutional affiliations. Thicker lines represent more citations, and cities may have multiple institutions very close together.
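The step from verified citations to line thickness could look roughly like the sketch below. It assumes each citation record carries an institution name and a verification flag, and it uses a simple linear scaling between a minimum and maximum width. The record fields, the function name `citation_line_widths`, and the scaling rule are all assumptions made for illustration, not the project's actual method.

```python
from collections import Counter

def citation_line_widths(citations, min_width=1.0, max_width=8.0):
    """Map each institution to a line width that grows with its verified citation count."""
    # Only citations whose dataset usage was verified are counted.
    counts = Counter(c["institution"] for c in citations if c["verified"])
    if not counts:
        return {}
    top = max(counts.values())
    # Linear scaling: the most-cited institution gets max_width.
    return {
        inst: min_width + (max_width - min_width) * (n / top)
        for inst, n in counts.items()
    }

# Hypothetical records; institution names are placeholders.
records = [
    {"institution": "Institution A", "verified": True},
    {"institution": "Institution A", "verified": True},
    {"institution": "Institution A", "verified": True},
    {"institution": "Institution A", "verified": True},
    {"institution": "Institution B", "verified": True},
    {"institution": "Institution C", "verified": False},
]

widths = citation_line_widths(records)
print(widths)
```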

PIPA Copyright Distribution

[Chart: PIPA Creative Commons license distribution]

PIPA Creative Commons License Distribution

[Chart: PIPA Creative Commons license distribution]

PIPA Image Upload Year Distribution

[Chart: PIPA image upload year distribution]

Top 10 PIPA Image #Tags

[Chart: Top 10 image #tags used in PIPA]

Top 10 Geocoded Cities PIPA

[Chart: Top 10 cities for geocoded photos in PIPA]

Citing This Work

If you reference or use any data from the Exposing.ai project, cite our original research as follows:

@online{Exposing.ai,
  author = {Harvey, Adam and LaPlace, Jules},
  title = {Exposing.ai},
  year = 2021,
  url = {https://exposing.ai},
  urldate = {2021-01-01}
}

If you reference or use any data from PIPA, cite the authors' work:

@article{Zhang2015BeyondFF,
    author = "Zhang, Ning and Paluri, Manohar and Taigman, Yaniv and Fergus, Rob and Bourdev, Lubomir D.",
    title = "Beyond frontal faces: Improving Person Recognition using multiple cues",
    journal = "2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
    year = "2015",
    pages = "4804-4813"
}

References

  • [1] Ning Zhang et al. "Beyond frontal faces: Improving Person Recognition using multiple cues". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4804-4813.