Predicting genome sizes and restriction enzyme recognition-sequence probabilities across the eukaryotic tree of life

Authors

Herrera, Santiago
Reyes Herrera, Paula H.
Shank, Timothy M.

Abstract

High-throughput sequencing of reduced-representation libraries obtained through digestion with restriction enzymes, generically known as restriction-site associated DNA sequencing (RAD-seq), is a common strategy for generating genome-wide genotypic and sequence data from eukaryotes. A critical design element of any RAD-seq study is knowledge of the approximate number of genetic markers that can be obtained for a taxon using different restriction enzymes, as this number determines the scope of a project and ultimately defines its success. This number can be determined directly only if a reference genome sequence is available, or it can be estimated if the genome size and the probabilities of the restriction recognition sequences are known. However, both scenarios are uncommon for non-model species. Here, we performed systematic in silico surveys of recognition sequences for diverse and commonly used type II restriction enzymes across the eukaryotic tree of life. Our observations reveal that recognition-sequence frequencies for a given restriction enzyme are strikingly variable among broad eukaryotic taxonomic groups, being largely determined by phylogenetic relatedness. We demonstrate that genome sizes can be predicted from cleavage frequency data obtained with restriction enzymes targeting ‘neutral’ elements. Models based on genomic compositions are also effective tools for accurately calculating probabilities of recognition sequences across taxa, and can be applied to species for which reduced-representation data are available (including transcriptomes and ‘neutral’ RAD-seq datasets). The analytical pipeline developed in this study, PredRAD (https://github.com/phrh/PredRAD), and the resulting databases constitute valuable resources that will help guide the design of any study using RAD-seq or related methods.
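
The estimate described in the abstract (expected number of recognition sites from a genome size and a recognition-sequence probability) can be illustrated with a minimal sketch. This is not the PredRAD code and does not reproduce its models; it assumes the simplest possible composition model, in which bases are independent and their frequencies are set by GC content alone. The function names, the GC-content parameter, and the example genome size are all hypothetical choices made for illustration.

```python
# Minimal sketch under an ASSUMED mononucleotide (GC-content) model.
# Not taken from PredRAD; all names and values here are illustrative.

def site_probability(recognition_seq: str, gc_content: float) -> float:
    """Probability that the recognition sequence starts at a given position,
    assuming independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    base_prob = {
        "G": gc_content / 2,
        "C": gc_content / 2,
        "A": (1 - gc_content) / 2,
        "T": (1 - gc_content) / 2,
    }
    prob = 1.0
    for base in recognition_seq.upper():
        if base not in base_prob:
            raise ValueError(f"unsupported base in recognition sequence: {base!r}")
        prob *= base_prob[base]
    return prob


def expected_sites(recognition_seq: str, genome_size_bp: int, gc_content: float) -> float:
    """Expected number of recognition sites in a genome of the given size."""
    positions = max(genome_size_bp - len(recognition_seq) + 1, 0)
    return positions * site_probability(recognition_seq, gc_content)


def genome_size_from_sites(recognition_seq: str, observed_sites: float, gc_content: float) -> float:
    """Invert the same toy model: genome size implied by an observed site count."""
    prob = site_probability(recognition_seq, gc_content)
    if prob == 0.0:
        raise ValueError("recognition sequence has zero probability under this composition")
    return observed_sites / prob + len(recognition_seq) - 1


if __name__ == "__main__":
    # EcoRI (GAATTC) and SbfI (CCTGCAGG) in a hypothetical 1 Gbp genome with 40% GC.
    for name, seq in [("EcoRI", "GAATTC"), ("SbfI", "CCTGCAGG")]:
        print(name, round(expected_sites(seq, 1_000_000_000, 0.40)))
```

Because the abstract reports that observed recognition-sequence frequencies vary strongly among taxonomic groups, a single-parameter model like this one should be read only as a first approximation of the idea, not as a substitute for the composition models evaluated in the study.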

Part of

bioRxiv (July 2015), pp. 1-35.

Creative Commons License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International