A technical question regarding the Figure S1 simulation: If you simulate under INDELible (which uses a slightly different substitution model, e.g., GY94 with possibly different equilibrium frequencies) and infer with FUBAR, what exactly are you comparing via RMSE? FUBAR was not really designed to estimate true point values of dN and dS at a site super-accurately; it is meant to estimate the posterior probability of a site belonging to a specific rate class on a grid. A much more sensible approach would be to calculate the Wasserstein distance between the simulated distribution of dN/dS and the inferred FUBAR grid density.
I could not find any statement in paper associated with FUBAR to the effect that it is "not really designed to estimate true point values of dN and dS at a site" with accuracy. The HyPhy 2.5 paper cites FUBAR as one of the available methods for measuring site-specific selection. In addition, Murrell et al. 2016 stated:
FUBAR was originally designed for identifying sites under positive selection, and the accuracy of θ̂ is relatively unimportant for site-specific inference, which governed some of the design choices behind FUBAR.
FUBAR also remains the recommended option on the Datamonkey webserver for detecting pervasive (non-episodic) site-specific selection.
I also think this statement is inaccurate: "[FUBAR] is meant to estimate the posterior probability of a site belonging to a specific rate class on a grid". The grid in FUBAR is a means to an end. It makes the estimation of site-specific selection more efficient by constraining dN and dS values to a fixed grid of combined values. In contrast, the general bivariate method used in Pond et al. needs to determine the dN and dS values associated with each rate classes, as well as its probability.
We could take the posterior distribution (grid values) from the FUBAR analyses and compare them to the "true" distributions from the model simulation. Since INDELible only simulates variation in dN/dS (omega), this distribution will be constant with respect to dS.
Alternatively, we could redo this analysis using SLAC on the same simulated alignments. (If we still have these.)
I could not find any statement in paper associated with FUBAR to the effect that it is "not really designed to estimate true point values of dN and dS at a site" with accuracy. The HyPhy 2.5 paper cites FUBAR as one of the available methods for measuring site-specific selection. In addition, Murrell et al. 2016 stated:
FUBAR also remains the recommended option on the Datamonkey webserver for detecting pervasive (non-episodic) site-specific selection.
I also think this statement is inaccurate: "[FUBAR] is meant to estimate the posterior probability of a site belonging to a specific rate class on a grid". The grid in FUBAR is a means to an end. It makes the estimation of site-specific selection more efficient by constraining dN and dS values to a fixed grid of combined values. In contrast, the general bivariate method used in Pond et al. needs to determine the dN and dS values associated with each rate classes, as well as its probability.
We could take the posterior distribution (grid values) from the FUBAR analyses and compare them to the "true" distributions from the model simulation. Since INDELible only simulates variation in dN/dS (omega), this distribution will be constant with respect to dS.
Alternatively, we could redo this analysis using SLAC on the same simulated alignments. (If we still have these.)