List of complementary files for "An Evaluation of Machine-learning for Predictive Genome Wide Association Studies"

1. yeast_pheno.RData: modified phenotypic data, fold indices, and train/test indices for the main analysis
2. geno.rds, geno_fuse.rds, geno_genes.rds: the unaltered genoset used in the study and the two alternative genosets
3. alternative_genosets_creation.r: script that creates geno_genes.rds and geno_fuse.rds
4. data_preparation.r: code that produces yeast_pheno.RData and geno.rds
5. functions.r: collection of various functions used in the analysis
6. xpack.r: collection of cross-validation (CV) procedures for several R packages used in the study (glmnet, randomForest, gbm)
7. added_noise_experiment.r: code for the experiment in which phenotypic noise is introduced into the data
8. reduced_dataset_experiment.r: code for the experiment investigating the importance of the number of sample points
9. reduced_markerset_experiment.r: code for the experiment investigating the importance of the number of attributes
10. correction_code.r: code for correcting the cross-validation procedure of Bloom et al.
11. analysis_main.r: code for the main analysis in the paper (results in Table 1)
12. analysis_fused_genoset.r, analysis_genes_genoset.r: code for the analyses on the two alternative genosets
13. cross.RData, pheno_raw.RData: original phenotypic and genotypic data of Bloom et al.

NOTE: If a file downloads as a generic octet-stream, rename it to its original name and extension (e.g. 'xpack.r'). This makes it usable in R and lets the code in the other files find it.

Files with the extension .R (or .r) can be read into R with 'source' (e.g. source('xpack.r')) or inspected in any text editor.

.RData files usually contain several R objects and can be loaded with e.g. load('yeast_pheno.RData').

Each .rds file contains a single R object. readRDS returns that object rather than restoring it under a name, so assign it to a name with '<-' (e.g. geno <- readRDS('geno.rds')).
xpack.r and functions.r are collections of functions. They are meant to be sourced, although you can also inspect them if you are interested. The remaining .r files are scripts for the analyses presented in the paper. If you have any problems or questions about using these files, do not hesitate to contact me at nastasiya.grinberg@manchester.ac.uk