PuppyIR Web Corpus

This dataset consists of 1565 websites that were crawled during 2010 as part of the PuppyIR project. Each page was manually judged in terms of:

  • Its suitability for children (ages 3-12)
  • Its suitability for young children (ages 3-6)
  • The topic’s general relevance for children
  • Whether the website was specifically designed for children
  • The page’s general quality

Each score is the average over at least five individual human assessors per page. The dataset is available for download at: http://www.carsten-eickhoff.com/files/corpora/puppyir-web.tar.gz
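
The internal layout of the archive is not described here, so a reasonable first step after downloading is to list its contents before extracting anything. The following is a minimal sketch using Python's standard tarfile module; the helper name list_archive_members and the limit parameter are illustrative assumptions, not part of the dataset's own tooling.

```python
import tarfile


def list_archive_members(archive_path, limit=20):
    """Return up to `limit` member names from a gzip-compressed tar archive.

    Hypothetical helper: it makes no assumption about how the corpus is
    organised inside the archive and only reports the stored names.
    """
    names = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Iterating the archive reads member headers lazily, in stored order.
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names
```

Assuming the file has been saved locally as puppyir-web.tar.gz, calling list_archive_members("puppyir-web.tar.gz") should show the first few entries, which helps in locating the pages and the judgment scores.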

If you would like to cite the dataset, please refer to the paper in which it was originally described and used: