The BioGen task will use the latest annual baseline snapshot of PubMed as its dataset; this snapshot covers PubMed content up to approximately the end of 2023. For more information about this dataset, see the relevant PubMed documentation page.
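As a rough illustration of working with the bulk download, the sketch below streams (PMID, abstract) pairs out of a single gzip-compressed baseline XML file. It assumes the standard PubMed XML layout (`PubmedArticle` > `MedlineCitation` > `PMID`, with abstract text under `Article/Abstract/AbstractText`); the function name `iter_abstracts` is hypothetical and not part of any official tooling.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def iter_abstracts(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (pmid, abstract_text) pairs from a gzip-compressed PubMed XML file.

    Hypothetical helper; assumes standard PubMed XML element names.
    Articles without an abstract are skipped.
    """
    with gzip.open(path, "rb") as fh:
        # iterparse streams the file, so a whole baseline file need not be loaded at once
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag != "PubmedArticle":
                continue
            pmid = elem.findtext("MedlineCitation/PMID")
            # Structured abstracts split their text across several AbstractText elements;
            # itertext() also picks up text inside inline markup such as <i>
            parts = [
                "".join(node.itertext()).strip()
                for node in elem.iterfind("MedlineCitation/Article/Abstract/AbstractText")
            ]
            if pmid and parts:
                yield pmid, " ".join(parts)
            # Drop the processed article's children to release most of its memory
            elem.clear()
```

For example, `for pmid, text in iter_abstracts("pubmed_baseline_file.xml.gz"): ...` (the file name is a placeholder) processes one baseline file article by article; looping over all baseline files covers the whole snapshot.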
Additionally, we are providing a pre-processed list of 20,727,695 PMIDs representing the abstracts contained in the 2023 snapshot. You are free to use these PMIDs to download small numbers of abstracts ad hoc through the PubMed API, but please do not attempt to download the entire snapshot this way; use the bulk download linked above instead.
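For ad-hoc lookups, one option is NCBI's E-utilities `efetch` endpoint. The sketch below only builds the request URL for a small batch of PMIDs; the helper name and the per-request cap `MAX_IDS_PER_REQUEST` are hypothetical choices made here to discourage bulk use, not limits defined by the task or by NCBI.

```python
from typing import Sequence
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
# Hypothetical cap: keeps lookups small, in the spirit of "ad-hoc" use
MAX_IDS_PER_REQUEST = 20


def build_efetch_url(pmids: Sequence[str]) -> str:
    """Build an efetch URL returning plain-text abstracts for a few PMIDs."""
    if not pmids:
        raise ValueError("need at least one PMID")
    if len(pmids) > MAX_IDS_PER_REQUEST:
        raise ValueError("too many PMIDs; use the bulk download for large sets")
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),  # efetch takes a comma-separated ID list
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"
```

The resulting URL can be fetched with `urllib.request.urlopen(url).read().decode("utf-8")`. NCBI also rate-limits E-utilities (a few requests per second without an API key), which is another reason to rely on the bulk files for anything beyond spot checks.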
The PMID list is provided both as a newline-delimited text file (UNIX-style line endings) and as a JSON file whose root element is a list of the PMIDs. Both files are gzip-compressed.
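Both formats can be read directly with Python's standard library, without decompressing them first. This is a minimal sketch: the function names are hypothetical, and since it is not stated whether the JSON list stores PMIDs as strings or integers, the loader normalizes every entry to a string so that both files yield comparable lists.

```python
import gzip
import json
from typing import List


def load_pmids_txt(path: str) -> List[str]:
    """Read a gzip-compressed, newline-delimited PMID file (one PMID per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        # Skip blank lines, e.g. a trailing newline at the end of the file
        return [line.strip() for line in fh if line.strip()]


def load_pmids_json(path: str) -> List[str]:
    """Read a gzip-compressed JSON file whose root element is a list of PMIDs."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: entries may be strings or integers; normalize to strings
    return [str(p) for p in data]
```

As a sanity check, both loaders should return the same PMIDs, and `len(...)` on either result should come out to 20,727,695 if the files match the description above.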