

Internode Certainty and Related Measures

Internode Certainty (IC) and related measures use information theory to quantify the degree of conflict or incongruence among all nontrivial bipartitions present in a set of trees. IC-based measures can be calculated from any type of data that contains nontrivial bipartitions, from bootstrap replicate trees to gene trees or individual characters. Given a set of phylogenetic trees, the IC of an internode reflects the degree of incongruence at that specific internode. IC-based measures can be applied to both full and partial taxon data sets, and they are implemented and freely available in the programs QuartetScores and RAxML.
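To make the information-theoretic idea concrete, here is a minimal sketch of the basic two-bipartition form of IC, assuming the definition in Salichos & Rokas (2013): IC = log2(2) + P(X1) log2 P(X1) + P(X2) log2 P(X2), where X1 is the most prevalent bipartition at an internode and X2 is its most prevalent conflicting bipartition. The function name and arguments are hypothetical and are not taken from RAxML or QuartetScores.

```python
import math


def internode_certainty(count_main, count_conflict):
    """Two-bipartition IC sketch (hypothetical helper, not a RAxML/QuartetScores API).

    count_main     -- number of trees supporting the most prevalent bipartition
    count_conflict -- number of trees supporting its most prevalent conflicting bipartition
    """
    total = count_main + count_conflict
    if total <= 0:
        raise ValueError("at least one supporting tree is required")

    ic = 1.0  # log2(2): maximum entropy for two alternatives
    for count in (count_main, count_conflict):
        p = count / total
        if p > 0:  # treat 0 * log2(0) as 0
            ic += p * math.log2(p)
    return ic


# No conflict gives IC = 1; equal support for both bipartitions gives IC = 0.
print(internode_certainty(100, 0))
print(internode_certainty(50, 50))
```

Under this form, values near 1 indicate that the internode is almost free of conflict, and values near 0 indicate that the two alternatives are about equally supported. Related measures and partial-taxon extensions are not covered by this sketch.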

Publications: Zhou et al. (2019) Syst. Biol.; Kobert et al. (2016) Mol. Biol. Evol.; Salichos et al. (2014) Mol. Biol. Evol.; Salichos & Rokas (2013) Nature


Phylogenies are rapidly becoming larger due to the rising number of available genomes. Beyond broad patterns, large phylogenies often become uninterpretable because tip labels are too small to read or are excluded altogether. As a result, large phylogenies can be difficult for other researchers to use, especially those interested in the relationships among a subset of taxa. To address this issue, we have developed treehouse, a user-friendly Shiny app that allows researchers to obtain subtrees from large-scale phylogenies. Treehouse is populated with a handful of large-scale phylogenies available through treehouseDB. Treehouse also features a function, userTree, which allows users to upload and parse their own phylogeny. The app can be downloaded from GitHub.
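The idea of obtaining a subtree for a subset of taxa can be illustrated with a short sketch. This is not treehouse's code: it assumes a toy representation in which a tree is a nested tuple and a tip is a string, it ignores branch lengths, and the function name `prune_to_taxa` is hypothetical.

```python
def prune_to_taxa(tree, keep):
    """Return the subtree induced by the tip labels in `keep`, or None if no tip is kept.

    `tree` is a nested tuple of children; a tip is a plain string (toy format,
    an assumption for illustration only).
    """
    if isinstance(tree, str):
        return tree if tree in keep else None

    # Prune each child, dropping branches that contain no requested taxa.
    kept = [sub for sub in (prune_to_taxa(child, keep) for child in tree)
            if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse an internal node left with a single child
    return tuple(kept)


full_tree = ((("A", "B"), "C"), ("D", "E"))
print(prune_to_taxa(full_tree, {"A", "C", "E"}))
```

The relationships among the retained taxa are preserved, and the rest of the tree is dropped.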

Publications: Steenwyk & Rokas (2019) BMC Res. Notes

CTDGFinder: Clusters of Tandemly Duplicated Genes Finder

Closely spaced clusters of tandemly duplicated genes (CTDGs, pronounced “CaTDoGs”) contribute to the diversity of many phenotypes, including chemosensation, snake venom, and animal body plans. CTDGs have traditionally been identified subjectively as genomic neighborhoods containing several gene duplicates in close proximity; however, CTDGs are often highly variable with respect to gene number, intergenic distance, and synteny. This lack of formal definition hampers the study of CTDG evolutionary dynamics and the discovery of novel CTDGs in the exponentially growing body of genomic data. CTDGFinder is a homology-based algorithm that formalizes and automates the identification of CTDGs by examining the physical distribution of individual members of families of duplicated genes across chromosomes.
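As a rough illustration of what "examining the physical distribution" of a gene family along chromosomes can mean, the sketch below groups members of one family that lie on the same chromosome within a fixed distance of each other. This is a deliberately simplified heuristic, not the CTDGFinder algorithm: the function name and the `max_gap` and `min_members` thresholds are hypothetical, and family membership is assumed to have been established beforehand (for example, by a homology search).

```python
from collections import defaultdict


def candidate_clusters(genes, max_gap=100_000, min_members=2):
    """Group same-family genes that sit close together on a chromosome.

    genes -- iterable of (gene_id, chromosome, start) for ONE gene family
    max_gap, min_members -- illustrative thresholds (assumptions, not CTDGFinder defaults)
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])  # order by start coordinate
        current = [members[0]]
        for prev, cur in zip(members, members[1:]):
            if cur[0] - prev[0] <= max_gap:
                current.append(cur)  # close enough: extend the current run
            else:
                if len(current) >= min_members:
                    clusters.append((chrom, [g for _, g in current]))
                current = [cur]  # start a new run
        if len(current) >= min_members:
            clusters.append((chrom, [g for _, g in current]))
    return clusters


family = [("a", "chr1", 1_000), ("b", "chr1", 50_000),
          ("c", "chr1", 900_000), ("d", "chr2", 10)]
print(candidate_clusters(family))
```

A fixed distance cutoff like this is exactly the kind of ad hoc criterion that makes subjective cluster definitions hard to compare across genomes, which is the motivation for a formal, automated definition.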

Publication: Ortiz & Rokas (2017) Mol. Biol. Evol.

iWGS: in silico Whole Genome Sequencer and Analyzer

iWGS is an automated pipeline for guiding the choice of an appropriate sequencing strategy and assembly protocol. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS gives users great flexibility in testing the range of experimental designs available for genome sequencing projects, and it supports all major sequencing technologies and popular assembly tools.

Publication: Zhou et al. (2016) G3


The GEneSTATION database integrates diverse types of omics data across mammals to advance understanding of the genetic basis of gestation and pregnancy-associated phenotypes and accelerate the translation of discoveries from model organisms to humans. GEneSTATION contains curated life history information on pregnancy and reproduction from 23 mammalian genomes. For every human gene, GEneSTATION contains diverse evolutionary (e.g., gene age, population genetic and molecular evolutionary statistics), organismal (e.g., tissue-specific gene and protein expression, differential gene expression, disease phenotype), and molecular data types (e.g., protein interactions), as well as links to many general and pregnancy disease-specific databases.

Publication: Kim et al. (2016) Nucleic Acids Res.