Useful Resources

DATA

All versions of the datasets for all BioASQ tasks are available on the BioASQ Datasets page of the BioASQ Participants Area.

BioASQ Task a: Large-scale online biomedical semantic indexing

Existing PubMed documents can be used as training data for this task. Pre-processed PubMed documents are provided, but participants are encouraged to do their own pre-processing.

Evaluation will take place on new documents that are added to PubMed and have not been annotated by curators at the time of submission.

BioASQ Task b: Biomedical Semantic QA (involves IR, QA, summarization)

The test dataset for this task will be released in five batches, each containing approximately 100 questions. Separate winners will be announced for each batch. Participation in the task can be partial; for example, it is acceptable to participate in only some of the batches, to return only relevant articles (and no concepts, triples, or article snippets), or to return only exact answers (or only 'ideal' answers). System responses will be evaluated both automatically and manually.

BioASQ Task MESINESP: Medical Semantic Indexing in Spanish

Existing IBECS and LILACS documents can be used as training data for this task. Pre-processed IBECS and LILACS documents are provided, but participants are encouraged to do their own pre-processing.

Evaluation will take place on new documents that are added to IBECS and LILACS and have not been annotated by curators at the time of submission.

Sample data for all three tasks can be downloaded from the BioASQ Participants Area (no registration required).

 

TOOLS

 

HEMKit software (zip), a collection of hierarchical evaluation measures.
BioASQ Releases Continuous Space Word Vectors Obtained by Applying Word2Vec to PubMed Abstracts.
BioASQ Annotation and assessment tools

Tutorial

BioASQ social network

Tutorial
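HEMKit, listed above, bundles hierarchical evaluation measures. The idea behind one common family of such measures can be illustrated in a few lines. The sketch below is not HEMKit's code and does not reproduce its interface; it assumes a single-parent (tree) label hierarchy given as a hypothetical child-to-parent dict, and computes ancestor-augmented hierarchical precision, recall and F1: both the predicted and the gold label sets are extended with all of their ancestors before they are compared. The labels A, A1, A1a and A2 are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Child -> parent map; a root maps to None. Assumes a single-parent (tree) hierarchy.
Hierarchy = Dict[str, Optional[str]]

# Hypothetical toy hierarchy used only for illustration (not real BioASQ data).
TOY_PARENT: Hierarchy = {"A": None, "A1": "A", "A1a": "A1", "A2": "A"}


def with_ancestors(label: str, parent: Hierarchy) -> Set[str]:
    """Return the label together with every ancestor up to the root."""
    result: Set[str] = set()
    node: Optional[str] = label
    # Stop at a root (None), or if a node repeats, which guards against cycles.
    while node is not None and node not in result:
        result.add(node)
        node = parent.get(node)  # labels missing from the map are treated as roots
    return result


def augment(labels: Iterable[str], parent: Hierarchy) -> Set[str]:
    """Union of the ancestor-augmented sets of all labels."""
    out: Set[str] = set()
    for label in labels:
        out |= with_ancestors(label, parent)
    return out


def hierarchical_prf(predicted: Iterable[str], gold: Iterable[str],
                     parent: Hierarchy) -> Tuple[float, float, float]:
    """Ancestor-augmented hierarchical precision, recall and F1."""
    pred_aug = augment(predicted, parent)
    gold_aug = augment(gold, parent)
    overlap = len(pred_aug & gold_aug)
    hp = overlap / len(pred_aug) if pred_aug else 0.0
    hr = overlap / len(gold_aug) if gold_aug else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf


if __name__ == "__main__":
    # Near miss: the parent of the gold label was predicted.
    print(hierarchical_prf({"A1"}, {"A1a"}, TOY_PARENT))
    # Wrong branch: only the shared root A overlaps with the gold set.
    print(hierarchical_prf({"A2"}, {"A1a"}, TOY_PARENT))
```

With the toy hierarchy, predicting A1 when the gold label is A1a gives hP = 1.0, hR = 2/3 and hF = 0.8, and predicting the sibling branch A2 still earns partial credit (hF = 0.4) through the shared root A, whereas a flat exact-match measure would score both predictions 0. Hierarchies in which a label can have several parents are not handled by this single-parent sketch.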