
    Bioinformatic Approaches to Detect Transposable Element Insertions in High Throughput Sequence Data from Saccharomyces and Drosophila

    Nelson, Michael

    [Thesis]. Manchester, UK: The University of Manchester; 2016.

    Abstract

    Transposable elements (TEs) are mutagenic mobile DNA sequences whose excision and insertion are powerful drivers of evolution. Some TE families are known to target specific genome features, and studying their insertion preferences can provide information about both TE biology and the state of the genome at these locations. Investigating this requires collecting large numbers of TE insertion sites from natural populations. Genome resequencing data can potentially provide a rich source of such insertion sites. Detecting these “non-reference” TE insertions is an active area of research, with many methods being released and no comprehensive review performed. To drive forward knowledge of TE biology and the field of non-reference TE detection, we created McClintock, an integrated pipeline of six TE detection methods. McClintock lowers the barriers to using these methods by automating the creation of the diverse range of input files required, setting up all methods to run simultaneously, and standardising the output. To test McClintock and its component methods, it was run on both simulated and real Saccharomyces cerevisiae data. Tests on simulated data reveal the general properties of the component methods’ predictions as well as the limitations of simulated data for testing software systems. Overlap between results from the McClintock component methods shows many insertions detected by only one method, highlighting the need to run multiple TE detection methods to fully understand a resequenced sample. Because the properties of S. cerevisiae TE insertion preferences are well characterised, real yeast population resequencing data can act as a biological validation for the predictions of McClintock. All component methods recreated previously known biological properties of S. cerevisiae TE insertions in natural population data. To demonstrate the versatility of McClintock, we applied the system to Drosophila melanogaster resequencing data.
    Twenty-seven Schneider’s cell lines were sequenced and analysed with McClintock. In addition to demonstrating the scalability of McClintock to larger genomes with more TE families, this exposed ongoing transposition in S2 cell lines. Likewise, the use of non-reference TE insertions as variable sites allowed us to recreate the relationships between S2 sub-lines, confirming that S1, S2, and S3 were most likely established separately. The results also suggest that there are several S2 sub-lines in use and that these sub-lines can differ from each other in TE content by hundreds of non-reference TE copies. Overall, this thesis demonstrates that the McClintock pipeline can highlight problems in TE detection from genome data and reveals that much can still be learned from this data source.

    Bibliographic metadata

    Type of resource:
    Content type:
    Form of thesis:
    Type of submission:
    Degree programme:
    PhD Wellcome Trust - Molecular and Cell Biology
    Publication date:
    Location:
    Manchester, UK
    Total pages:
    169
    Thesis main supervisor(s):
    Thesis co-supervisor(s):
    Funder(s):
    Language:
    en

    Record metadata

    Manchester eScholar ID:
    uk-ac-man-scw:295594
    Created by:
    Nelson, Michael
    Created:
    20th January, 2016, 23:49:46
    Last modified by:
    Nelson, Michael
    Last modified:
    2nd February, 2018, 13:52:39
