
Ambiguity and Variability of Database and Software Names in Bioinformatics

Geraint Duck, Robert Stevens, David Robertson and Goran Nenadic

In: Ananiadou, Sophia; Pyysalo, Sampo; Rebholz-Schuhmann, Dietrich; Rinaldi, Fabio; Salakoski, Tapio. Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine (SMBM); Zurich; 2012. p. 2-9. http://www.zora.uzh.ch/64476/

Abstract

There are now numerous options available to achieve various tasks in bioinformatics but, as yet, little progress has been made to capture the common practice by analysing usage and mentions of databases and tools within the literature. In this paper we analyse the variability and ambiguity of database and software name mentions and provide a set of 30 full-text documents manually annotated on the mention level. Our analyses show that identification of mentions of databases and tools is not a task that can be achieved through dictionary matching alone: our baseline dictionary look-up achieved an F-score of just over 50%. This is primarily because of the high variability and ambiguity of database and software mentions in the literature, and because of the large number of resources available. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature.
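
To illustrate the kind of dictionary look-up baseline the abstract refers to, the following is a minimal sketch, not the authors' implementation. The sentence, the four-entry dictionary, the gold annotations and the function names `dictionary_lookup` and `f_score` are all hypothetical, and the case-insensitive whole-word matching is an assumption chosen for the example. It shows how name variability ("Clustal W" against the dictionary entry "ClustalW") lowers recall, and how ambiguity (the ordinary word "string" against the entry "STRING") lowers precision. The resulting score is a toy figure and is unrelated to the result reported in the paper.

```python
import re


def dictionary_lookup(text, names):
    """Return sorted (start, end) spans of case-insensitive whole-word matches."""
    spans = set()
    for name in names:
        # Lookarounds stop a name from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end()))
    return sorted(spans)


def f_score(predicted, gold):
    """Balanced F-score (F1) over exact span matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_of(text, mention):
    """Locate the first occurrence of a mention and return its (start, end) span."""
    start = text.index(mention)
    return (start, start + len(mention))


# Hypothetical sentence and dictionary, invented for this example.
text = ("Genes from Ensembl were searched with BLAST and aligned with "
        "Clustal W. Each string was then trimmed.")
dictionary = ["BLAST", "ClustalW", "Ensembl", "STRING"]

# Hypothetical manual annotation: three true mentions of resources.
gold = [span_of(text, m) for m in ("Ensembl", "BLAST", "Clustal W")]

predicted = dictionary_lookup(text, dictionary)
score = f_score(predicted, gold)

# "Clustal W" is missed (variability); "string" is a false hit (ambiguity).
print(round(score, 3))  # prints 0.667
```

In this toy case the look-up finds two of the three annotated mentions and returns one false positive, so precision and recall are both 2/3.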

Bibliographic metadata

Conference title:
5th International Symposium on Semantic Mining in Biomedicine (SMBM)
Conference venue:
Zurich
Place of publication:
http://www.zora.uzh.ch/64476/
Proceedings start page:
2
Proceedings end page:
9
Proceedings pagination:
2-9
Contribution total pages:
8

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:175435
Created by:
Duck, Geraint
Created:
8th October, 2012, 17:16:36
Last modified by:
Duck, Geraint
Last modified:
4th March, 2015, 20:47:52
