Correlation of fitness estimates for different clusters of viral clades

Interactive plot of the correlation between fitness estimates made using different subsets of the sequence data. Each point represents a fitness estimate for a different amino-acid mutation. The Pearson correlation coefficient and the number of mutations being correlated are shown in the upper left of the scatter plot.
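As a rough illustration of the two numbers shown in the upper left of the plot, the sketch below computes a Pearson correlation coefficient and the number of mutations being correlated for two sets of estimates. The mutation labels, the values, and the names `pearson_r`, `subset_a` and `subset_b` are made up for this example and are not taken from the project's code or data.

```python
import math


def pearson_r(xs, ys):
    """Return (r, n): the Pearson correlation of paired values and the number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y), n


# Hypothetical fitness estimates from two subsets of the sequence data,
# keyed by an illustrative "gene:mutation" label.
subset_a = {"gene1:A10V": 0.8, "gene1:T25I": 2.0, "gene2:G7S": 1.2, "gene2:L30F": -0.5}
subset_b = {"gene1:A10V": 0.6, "gene1:T25I": 2.3, "gene2:G7S": 1.0, "gene3:P4H": -3.0}

# Only mutations with an estimate in both subsets can be correlated.
shared = sorted(set(subset_a) & set(subset_b))
r, n = pearson_r([subset_a[m] for m in shared], [subset_b[m] for m in shared])
print(f"r = {r:.2f}, n = {n}")
```

Mutations that have an estimate in only one subset drop out of the comparison, so the number of points can be smaller than the number of estimates in either subset.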

You can mouse over points for details.

The minimum predicted count slider below the plot sets how many expected counts of an amino acid we require before making a fitness estimate. Larger values yield more accurate estimates but for fewer amino acids. Move the slider to the left to show estimates for more amino acids at lower confidence, or to the right to show estimates for fewer amino acids at higher confidence.
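The trade-off controlled by the slider can be sketched as a simple threshold filter. This is only an illustration: the records, the function name `filter_by_min_expected_count`, and the choice of an inclusive (`>=`) comparison are assumptions made for the example, not details taken from the plot's implementation.

```python
# Hypothetical records: (mutation label, expected count, fitness estimate).
records = [
    ("gene1:A10V", 3.0, -0.4),
    ("gene1:T25I", 12.0, 0.1),
    ("gene2:G7S", 45.0, -1.3),
    ("gene2:L30F", 150.0, 0.6),
]


def filter_by_min_expected_count(records, min_expected_count):
    """Keep only records whose expected count meets the threshold (assumed inclusive)."""
    return [rec for rec in records if rec[1] >= min_expected_count]


# Raising the threshold keeps fewer mutations, each backed by more expected counts.
for threshold in (1, 10, 100):
    kept = filter_by_min_expected_count(records, threshold)
    print(f"min expected count {threshold}: {len(kept)} mutations shown")
```

With these made-up records, thresholds of 1, 10 and 100 keep 4, 3 and 1 mutations respectively.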

This plot only shows the cluster of clades with the largest numbers of sequences.

You can click or shift-click genes in the legend below the plot to show only mutations for those genes.

See Haddox H.K. et al. (2024) for a paper describing the work.

See https://github.com/neherlab/SARS2-mut-fitness-v2 for full computer code, data and a short summary of the theoretical framework.

See https://neherlab.github.io/SARS2-mut-fitness-v2/ for links to all interactive plots.