The topics will be available from the TREC Active Participants Site as a newline-delimited set of JSON objects with the following entries:
topic_id
: string, corresponding to the topic identifier.
title
: two to three terms that summarize the information needs of a patient or clinician.
question
: an actual question posted by patients to MedlinePlus logs or derived by clinicians, as described here.
narrative
: a longer description of the information need that instigated the question.

Each submission will take the form of a JSON document, whose root object has the following entries:
team_id
: string, matching the team’s identifier assigned at registration for the track
run_name
: string, arbitrarily specified by submitting team
contact_email
: string, purpose self-explanatory
results
: list of output objects (see below)

Output objects are to be JSON objects with the following entries:
topic_id
: string, corresponding to the topic identifiers in the topics file
answer
: string, containing the text of the system’s output. This text will be processed according to the method described below.
references
: list of strings, each string being assumed to be a PMID corresponding to a PMID cited in the associated answer entry.

Submissions consisting of malformed or invalid content will be discarded.
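
As a rough illustration of the structure described above, the following sketch builds a submission document and checks that the required entries are present with the expected types. The function name `validate_submission`, the sample values, and the specific checks are assumptions made for this example; the organizers' actual validation procedure is not specified here and may differ.

```python
import json

# Entry names taken from the specification above.
ROOT_KEYS = {"team_id", "run_name", "contact_email", "results"}
OUTPUT_KEYS = {"topic_id", "answer", "references"}


def validate_submission(doc):
    """Return a list of structural problems; an empty list means none were found.

    Hypothetical helper: it only checks entry names and basic types.
    """
    if not isinstance(doc, dict):
        return ["root is not a JSON object"]
    problems = [f"missing root entry: {k}" for k in sorted(ROOT_KEYS - set(doc))]
    for key in ("team_id", "run_name", "contact_email"):
        if key in doc and not isinstance(doc[key], str):
            problems.append(f"{key} is not a string")
    results = doc.get("results", [])
    if not isinstance(results, list):
        return problems + ["results is not a list"]
    for i, out in enumerate(results):
        if not isinstance(out, dict):
            problems.append(f"results[{i}] is not an object")
            continue
        for key in sorted(OUTPUT_KEYS - set(out)):
            problems.append(f"results[{i}] missing entry: {key}")
        for key in ("topic_id", "answer"):
            if key in out and not isinstance(out[key], str):
                problems.append(f"results[{i}].{key} is not a string")
        refs = out.get("references", [])
        if not isinstance(refs, list) or not all(isinstance(r, str) for r in refs):
            problems.append(f"results[{i}].references is not a list of strings")
    return problems


# All values below are placeholders, not real identifiers or PMIDs.
submission = {
    "team_id": "example_team",
    "run_name": "example_run_1",
    "contact_email": "someone@example.org",
    "results": [
        {
            "topic_id": "1",
            "answer": "Example answer sentence [12345678].",
            "references": ["12345678"],
        }
    ],
}

problems = validate_submission(submission)
serialized = json.dumps(submission, ensure_ascii=False, indent=2)
```

Running a check of this kind before submitting may help catch missing entries early, since malformed submissions are discarded.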
Text in the answer entry will be sentence-tokenized using the spaCy sentence tokenizer. Any text within square brackets (“[”, “]”) will be treated as a citation list; the structure of a citation list will be assumed to be a comma-delimited list of PMID citations. As described in the task description, only the first three entries in each citation list will be considered; any remaining entries will be discarded.
Each citation PMID is to correspond to one of the entries in the references list. Further, each citation will be assumed to be “assigned” to its enclosing sentence. Square-bracket runs that occur after sentence-final punctuation (i.e., outside of a sentence) will be discarded. Square-bracket runs that occur at the beginning of the document (i.e., before any sentences) will be discarded.
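
The citation-list handling described above can be sketched for a single, already-tokenized sentence as follows. This is a simplified illustration rather than the organizers' processing code: the function name `citations_in_sentence` and the regular expression are assumptions, and sentence splitting (including the discarding of bracket runs that fall outside any sentence) is left to the sentence tokenizer and not modeled here.

```python
import re

# Matches one square-bracketed run with no nested brackets (an assumption).
CITATION_LIST = re.compile(r"\[([^\[\]]*)\]")
MAX_CITATIONS = 3  # only the first three entries of each citation list are considered


def citations_in_sentence(sentence):
    """Return the PMIDs cited in one sentence, keeping at most three per citation list."""
    pmids = []
    for match in CITATION_LIST.finditer(sentence):
        # A citation list is treated as a comma-delimited list of PMIDs.
        entries = [e.strip() for e in match.group(1).split(",") if e.strip()]
        pmids.extend(entries[:MAX_CITATIONS])
    return pmids


# Placeholder PMIDs, for illustration only.
sentence = "Aspirin reduces fever [111, 222, 333, 444] and pain [555]."
cited = citations_in_sentence(sentence)
# cited is ["111", "222", "333", "555"]; "444" is dropped as a fourth entry.
```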
Citation numbers in the answer text are subject to the following constraints:
- Each PMID cited in the answer text must appear in the references list; e.g., if the text contains 12 PMIDs but there are only 11 entries in the references list, the PMID that does not appear in the references list would represent an erroneous citation.

PMIDs included in the references lists are subject to the following constraints:
- Each PMID in the references list must be referred to at least once by a citation in the answer text

Please ask on the mailing list if there are any questions or issues with this specification.
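
The relationship between citations in the answer text and the references list can be checked with a small set comparison. The sketch below is illustrative only; the function name `check_citation_consistency` and the shape of its return value are assumptions, and it expects the cited PMIDs to have already been extracted from the answer text.

```python
def check_citation_consistency(cited_pmids, references):
    """Compare PMIDs cited in the answer text with the references list.

    Hypothetical helper: reports PMIDs that are cited but missing from the
    references list, and PMIDs that are listed but never cited.
    """
    cited = set(cited_pmids)
    listed = set(references)
    return {
        "cited_but_not_listed": sorted(cited - listed),
        "listed_but_not_cited": sorted(listed - cited),
    }


# Placeholder PMIDs, for illustration only.
report = check_citation_consistency(["111", "222", "333"], ["111", "222", "444"])
```

Under this sketch, an output object is consistent when both lists in the report are empty.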