Pensoft

This is a canonical data source description.

The current dataSourceVersion described by this documentation is 1. The dataSource name for this data is Pensoft.

Coverage: see the list of journals below; articles are dated between 2010 and 2020
Size: 6,610 articles
Copyright: varies per article, but all articles are Open Access and most are licensed CC-BY
Credits: Stijn Conix, C.H. Pence

How we got it

This data was provided to us directly by Pensoft via a bulk XML download. We cannot extend this coverage without obtaining further articles from Pensoft.

Journals included

The following journals are available in this dataset (with number of articles listed for each):

  • Journal of Hymenoptera Research [382]
  • MycoKeys [315]
  • PhytoKeys [820]
  • ZooKeys [4940]
  • Zoosystematics and Evolution [153]

Processing

  • Metadata and Plain Text: Extracted directly from high-quality NLM-XML source files.
  • PMIDs, PMCIDs, and PubMed Manuscript IDs: Obtained by scraping PubMed.
  • Keywords and Tags: Many papers have author-provided keywords, which are collected in the keywords field. The tags field contains:
    • paper types
    • paper categories
    • journal-generated tags (often taxonomic groups or species)
    • all taxon or species names present in any taxonomic treatments encoded in the article
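The extraction of metadata, plain text, keywords, and tags from NLM-XML can be sketched with Python's standard-library ElementTree. This is a minimal illustration, not our actual import code. It assumes common NLM/JATS element names (`article-meta`, `article-title`, `kwd-group/kwd`, `article-categories/subj-group/subject`) and assumes that the paper type comes from the root `article-type` attribute; the sample document, its DOI, and the helper names `text_of` and `extract_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal NLM/JATS-style article, for illustration only.
SAMPLE = """<article article-type="research-article">
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.1</article-id>
      <article-categories>
        <subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group>
        <subj-group subj-group-type="scientific_subject"><subject>Hymenoptera</subject></subj-group>
      </article-categories>
      <title-group><article-title>An example title</article-title></title-group>
      <kwd-group><kwd>new species</kwd><kwd>taxonomy</kwd></kwd-group>
    </article-meta>
  </front>
  <body><sec><title>Introduction</title><p>Some text.</p></sec></body>
</article>"""


def text_of(elem):
    """Return all text inside an element with whitespace collapsed (None if absent)."""
    if elem is None:
        return None
    return " ".join("".join(elem.itertext()).split())


def extract_record(xml_string):
    """Build a simple record dict from one article (hypothetical field mapping)."""
    root = ET.fromstring(xml_string)
    meta = root.find("front/article-meta")
    doi = meta.find("article-id[@pub-id-type='doi']")
    body = root.find("body")
    record = {
        "doi": doi.text if doi is not None else None,
        "title": text_of(meta.find("title-group/article-title")),
        # Author-provided keywords go into the keywords field.
        "keywords": [text_of(k) for k in meta.findall("kwd-group/kwd")],
        "tags": [],
        # Plain text: paragraph contents from the body, joined with spaces.
        "fulltext": " ".join(text_of(p) for p in body.iter("p")) if body is not None else "",
    }
    # Assumed mapping: paper type from the article-type attribute...
    if root.get("article-type"):
        record["tags"].append(root.get("article-type"))
    # ...and paper categories from the article-categories subjects.
    for subject in meta.findall("article-categories/subj-group/subject"):
        record["tags"].append(text_of(subject))
    return record
```

For example, `extract_record(SAMPLE)["tags"]` collects the paper type and both category subjects into a single list, while the keywords stay in their own field.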

To elaborate on the final item in the tags list: Pensoft XML journal articles include XML-encoded taxonomic treatment information. We do not store the bulk of this data, as it is available from other databases containing taxonomic treatments. We do, however, capture all of the taxonomic group and species names included in these treatments, and save them in the tags field.
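Collecting names only from within taxonomic treatments can be sketched as follows. This is a hedged illustration, not our production code: it assumes the treatments use the TaxPub extension with namespace `http://www.plazi.org/taxpub` and the elements `tp:taxon-treatment`, `tp:taxon-name`, and `tp:taxon-name-part`. The sample document, the placeholder names "Aus", "bus", and "Cus", and the function `treatment_taxon_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed TaxPub namespace used by Pensoft's treatment markup.
TP = "http://www.plazi.org/taxpub"
NS = {"tp": TP}

# Hypothetical sample: two treatments, plus one taxon name outside any treatment.
SAMPLE = """<article xmlns:tp="http://www.plazi.org/taxpub">
  <body>
    <p><tp:taxon-name><tp:taxon-name-part taxon-name-part-type="genus">Cus</tp:taxon-name-part></tp:taxon-name></p>
    <tp:taxon-treatment>
      <tp:nomenclature>
        <tp:taxon-name>
          <tp:taxon-name-part taxon-name-part-type="genus">Aus</tp:taxon-name-part>
          <tp:taxon-name-part taxon-name-part-type="species">bus</tp:taxon-name-part>
        </tp:taxon-name>
      </tp:nomenclature>
    </tp:taxon-treatment>
    <tp:taxon-treatment>
      <tp:nomenclature>
        <tp:taxon-name>
          <tp:taxon-name-part taxon-name-part-type="genus">Aus</tp:taxon-name-part>
        </tp:taxon-name>
      </tp:nomenclature>
    </tp:taxon-treatment>
  </body>
</article>"""


def treatment_taxon_names(xml_string):
    """Return unique taxon names found inside treatments, in document order."""
    root = ET.fromstring(xml_string)
    names = {}  # dict used as an insertion-ordered set
    # Only descend into treatment elements; names elsewhere are ignored.
    for treatment in root.iter(f"{{{TP}}}taxon-treatment"):
        for name in treatment.iterfind(".//tp:taxon-name", NS):
            parts = [p.text.strip() for p in name.findall("tp:taxon-name-part", NS) if p.text]
            # Fall back to the raw text if the name is not split into parts.
            full = " ".join(parts) or " ".join("".join(name.itertext()).split())
            if full:
                names[full] = None
    return list(names)
```

Deduplicating while preserving order keeps the tags stable across re-imports; here both the species name and the genus-level group name are kept, while the name outside any treatment is skipped.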

Changelog

  • Data Source Version 1 (2022-05-13): First import of Pensoft data.