An Approach to Answer: "How Tree-Like is a Network"

Geraint Duck

[Dissertation]. UK: The University of Manchester; 2010.


Abstract

This report aims to answer the question “how networky is my data?”. The answer has repercussions for all types of phylogenetic data analysis, and can help to highlight biologically significant events such as hybridization, reticulation and gene flow. These events are all biologically interesting, but each can cause the tree assumption underlying the analysis to no longer hold. The result of this project is a tested and verified methodology for measuring data “networkyness” through a parametric bootstrap approach, which provides a statistical measure to answer that question. This entirely novel methodology is packaged within a Java application for easy, everyday use. The application has been thoroughly tested, with each component of the methodology validated in turn.

Bibliographic metadata

Degree type:
MSc
Place of publication:
UK
Total pages:
84
Table of contents:
List of Figures
List of Tables
List of Abbreviations
1 Abstract
2 Preamble
  2.1 Declaration
  2.2 Copyright Statement
  2.3 Acknowledgements
  2.4 The Author
3 Introduction
  3.1 Classical Phylogenetics
    3.1.1 Phylogenetic Trees
    3.1.2 Sequence Alignments
    3.1.3 Biological Insight
    3.1.4 Substitution Models
    3.1.5 Statistical Inference in Phylogenetics
    3.1.6 Hypothesis Testing
  3.2 Alternative Phylogenetics
    3.2.1 Terminology
    3.2.2 Phylogenetic Trees: The Return
    3.2.3 Phylogenetic Networks
    3.2.4 Networks verses Trees
  3.3 Related Work
  3.4 Aims and Objectives
    3.4.1 Initial Perl Pipeline
    3.4.2 Primary Java Application
    3.4.3 Miscellaneous
4 Materials and Methods
  4.1 Methodology and Implementation
    4.1.1 Sequence Pair-Wise Distances
    4.1.2 Tree Based Distance Matrix
    4.1.3 Least Squares Distances
    4.1.4 The Bootstrap
    4.1.5 Statistical Confidence – P-Values
  4.2 Datasets Used
    4.2.1 Mitochondrial DNA of Primates
    4.2.2 Mitochondrial DNA of Mosquitoes
  4.3 Software Used
    4.3.1 BEAST
    4.3.2 Other
5 Results
  5.1 Methodology Validation
    5.1.1 Initial Set-up
    5.1.2 Optimisation
    5.1.3 Statistical
  5.2 Optimiser Consistency Check
  5.3 Additional Testing Notes
  5.4 Example Real Data Analysis
  5.5 Benchmark Tests
6 Interpretation and Discussion
  6.1 Analysis of the Methodology
  6.2 Application Completion
  6.3 Example Dataset Discussion
  6.4 Scope and Limitations
7 Future Work and Extensions
8 Conclusion
Bibliography
Appendix
A Parameter Convergence to Sequence Length - Data
  A.1 GTR Parameters
  A.2 Branch Lengths
Language:
eng

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:161612
Created by:
Duck, Geraint
Created:
25th May, 2012, 13:38:11
Last modified by:
Duck, Geraint
Last modified:
4th March, 2015, 20:47:43
