Issue about Figure S1

> A technical question regarding the Figure S1 simulation: If you simulate under INDELible (which uses a slightly different substitution model, e.g., GY94 with possibly different equilibrium frequencies) and infer with FUBAR, what exactly are you comparing via RMSE? FUBAR was not really designed to estimate true point values of dN and dS at a site super-accurately; it is meant to estimate the posterior probability of a site belonging to a specific rate class on a grid. A much more sensible approach would be to calculate the Wasserstein distance between the simulated distribution of dN/dS and the inferred FUBAR grid density.


I could not find any statement in [paper associated with FUBAR](https://academic.oup.com/mbe/article/27/3/520/998895?) to the effect that it is "not really designed to estimate true point values of dN and dS at a site" with accuracy.  The [HyPhy 2.5 paper](https://academic.oup.com/mbe/article/37/1/295/5555420) cites FUBAR as one of the available methods for measuring site-specific selection.  In addition, [Murrell et al. 2016](https://journals.asm.org/doi/full/10.1128/jvi.00972-16) stated:
> FUBAR was originally designed for identifying sites under positive selection, and the accuracy of θ̂ is relatively unimportant for site-specific inference, which governed some of the design choices behind FUBAR.

FUBAR also remains the recommended option on the Datamonkey webserver for detecting pervasive (non-episodic) site-specific selection.

I also think this statement is inaccurate: "[FUBAR] is meant to estimate the posterior probability of a site belonging to a specific rate class on a grid".  The grid in FUBAR is a means to an end.  It makes the estimation of site-specific selection more efficient by constraining dN and dS values to a fixed grid of combined values.  In contrast, the general bivariate method used in Pond et al. needs to determine the dN and dS values associated with each rate classes, as well as its probability.  

We could take the posterior distribution (grid values) from the FUBAR analyses and compare them to the "true" distributions from the model simulation.  Since INDELible only simulates variation in dN/dS (omega), this distribution will be constant with respect to dS. 

Alternatively, we could redo this analysis using SLAC on the same simulated alignments.  (If we still have these.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue about Figure S1 #128

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue about Figure S1 #128

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions