Participating groups will be provided with 40 topics (biomedical questions) and a link to a stable version of PubMed documents. The task is to generate runs containing LLM output in which each sentence (assertion) carries up to three attributions (cited references), with no more than 50 documents cited per answer. Each cited document should be referenced in the run by its PMID in square brackets, as shown below.
Here is a human-readable display of an example topic and output; note that actual topics and submissions will use a specified machine-readable format.
Topic: iron and ferritin levels in COVID-19
Question: why is transferrin and iron low in covid patients but ferritin high?
Narrative: The patient is interested in the link between iron and infection, the role iron plays in infection and the implications for COVID-19 course.
Sample Answer [to be returned by the system]: During infections, a battle for iron takes place between the human body and the invading viruses [34389110]. The immune system cells need iron to defend the body against the infection [34389110]. The virus needs iron to reproduce [35240553]. If iron balance is disrupted by the infection, ferritin levels are high [34883281], which signals the disease is severe and may have unfavorable outcomes [34048587, 32681497]. Ferritin is maintaining the body’s iron level [35008695]. Some researchers believe that high levels of ferritin not only show the body struggles with infection, but that it might add to the severity of disease [34924800]. To help covid patients, the doctors may lower the ferritin levels that are too high using drugs that capture iron [32681497].
References [to be returned by the system]: 32681497, 33380357, 34048587, 34389110, 34883281, 34924800, 34960751, 35008695, 35136706, 35240553
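The citation limits stated above (up to three PMIDs per sentence, at most 50 documents per answer) can be checked before a run is submitted. The following is a minimal Python sketch that works on the human-readable answer text, not on the official machine-readable submission format, which is not specified here. The function names, the regular expression, and the naive sentence splitter are all assumptions made for illustration; a splitter this simple will mis-split on abbreviations such as "et al.".

```python
import re

# Matches one bracketed citation group such as "[34048587, 32681497]".
CITATION_GROUP = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")

MAX_PER_SENTENCE = 3   # limit taken from the task description
MAX_PER_ANSWER = 50    # limit taken from the task description


def split_sentences(answer):
    """Naive splitter (assumption): break after '.', '?' or '!' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", answer) if s.strip()]


def pmids_in(sentence):
    """Return the PMIDs cited in one sentence, in order, without duplicates."""
    found = []
    for group in CITATION_GROUP.findall(sentence):
        for pmid in re.split(r"\s*,\s*", group):
            if pmid not in found:
                found.append(pmid)
    return found


def check_answer(answer):
    """Return a list of constraint violations; the list is empty if none are found."""
    problems = []
    all_pmids = set()
    for i, sentence in enumerate(split_sentences(answer), start=1):
        cited = pmids_in(sentence)
        all_pmids.update(cited)
        if len(cited) > MAX_PER_SENTENCE:
            problems.append(f"sentence {i} cites {len(cited)} documents")
    if len(all_pmids) > MAX_PER_ANSWER:
        problems.append(f"answer cites {len(all_pmids)} distinct documents")
    return problems
```

Calling `check_answer` on the sample answer above should report no violations, since no sentence in it cites more than three PMIDs.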
The set of PMIDs used for relevance assessment and scoring of runs will be those in the fixed PubMed extract we plan to provide. However, participating groups can use the live PubMed interface for their runs if they choose. Documents cited in the LLM output that are not in the PubMed extract will be ignored for assessment and scoring purposes.
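Because citations outside the fixed extract are ignored, a group retrieving from live PubMed may want to see which of its cited PMIDs would actually count. A minimal Python sketch follows, assuming the extract's PMIDs have already been loaded into a set of strings; how they are loaded depends on the extract's file format, which is not described here. The function name `split_by_extract` and the PMID `99999999` in the usage example are hypothetical.

```python
def split_by_extract(cited_pmids, extract_pmids):
    """Partition cited PMIDs into those inside and outside the fixed extract.

    cited_pmids   -- list of PMID strings cited in one answer
    extract_pmids -- set of PMID strings present in the fixed PubMed extract
    Returns (scored, ignored), each preserving the original citation order.
    """
    scored = [p for p in cited_pmids if p in extract_pmids]
    ignored = [p for p in cited_pmids if p not in extract_pmids]
    return scored, ignored


# Usage: the second PMID is a made-up placeholder that is absent from the extract.
scored, ignored = split_by_extract(["34389110", "99999999"], {"34389110", "35240553"})
```

Here `scored` holds the citations that would be assessed, and `ignored` holds those that would be dropped under the rule above.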
The assessors will first identify the assertions and their attributions, and then judge whether each attribution supports its assertion, i.e., whether it is relevant or not.