Ambiguity and Variability of Database and Software Names in Bioinformatics
Geraint Duck, Robert Stevens, David Robertson and Goran Nenadic
In: Ananiadou, Sophia; Pyysalo, Sampo; Rebholz-Schuhmann, Dietrich; Rinaldi, Fabio; Salakoski, Tapio. Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine (SMBM); Zurich; 2012. p. 2-9. http://www.zora.uzh.ch/64476/
Abstract
There are now numerous options available for accomplishing various tasks in bioinformatics but, as yet, little progress has been made in capturing common practice by analysing usage and mentions of databases and tools within the literature. In this paper we analyse the variability and ambiguity of database and software name mentions and provide a set of 30 full-text documents manually annotated on the mention level. Our analyses show that identification of mentions of databases and tools is not a task that can be achieved through dictionary matching alone: our baseline dictionary look-up achieved an F-score of just over 50%. This is primarily because of the high variability and ambiguity of database and software mentions within the literature, and because of the extensive number of resources available. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature.
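To illustrate why name variability limits dictionary matching, the following is a minimal sketch of an exact-match dictionary look-up scored against gold mention spans. It is not the authors' baseline: the sentence, the three-entry dictionary, the gold annotations, and the function names `dictionary_lookup` and `f_score` are all hypothetical, chosen only to show how spelling variants ("ClustalW" vs. "Clustal W") and expanded names ("Protein Data Bank" vs. "PDB") reduce recall.

```python
import re


def dictionary_lookup(text, names):
    """Return sorted (start, end) spans of exact, case-sensitive, whole-token matches."""
    spans = set()
    for name in names:
        # Lookarounds stop a name from matching inside a longer token.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            spans.add((match.start(), match.end()))
    return sorted(spans)


def f_score(predicted, gold):
    """Return (precision, recall, F1) for exact span matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical sentence and dictionary, for illustration only.
TEXT = "We aligned reads with BLAST and ClustalW, then queried the Protein Data Bank."
NAMES = ["BLAST", "Clustal W", "PDB"]

# Hypothetical gold annotations: every resource mention a human annotator would mark.
GOLD_MENTIONS = ["BLAST", "ClustalW", "Protein Data Bank"]
gold = [(TEXT.index(m), TEXT.index(m) + len(m)) for m in GOLD_MENTIONS]

predicted = dictionary_lookup(TEXT, NAMES)
precision, recall, f1 = f_score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case only "BLAST" is found, so precision is perfect but two of the three gold mentions are missed. Ambiguity, the other issue the abstract names, would show up as the opposite error: a dictionary entry that is also a common word matching text that is not a resource mention, which lowers precision.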