Support non-normalized probability matrices in hungarian #50
Merged
luxaritas added a commit to eternagame/EternaJS that referenced this pull request Feb 28, 2025
Conventionally, dot plots are probability matrices in which, for any base, the probabilities of pairing with each other base sum to 1 minus the probability that the base is unpaired. The unpaired probability itself is not stored, so to get it we sum the row and subtract from 1. For an algorithm like ribonanzanet, however, the dot plot is instead a likelihood matrix in which the likelihood of each pair is independent of every other pair, meaning a row could sum to more than 1. We now clip P(unp) so that if the row sums to a value > 1, P(unp) does not drop below 0, which wouldn't make sense (it is still ambiguous what P(unp) should be if you have two pairs at .75 likelihood, but this is at least _better_). This reflects a change made in WaymentSteeleLab/arnie#50
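As a rough illustration of the clipping described above, here is a minimal pure-Python sketch. It is not the code from this commit or from arnie: the function name `unpaired_probabilities`, the list-of-lists matrix layout, and the choice to clip only at a lower bound of 0 are all assumptions made for the example.

```python
def unpaired_probabilities(bpp):
    """Return one P(unp) value per base, floored at 0.

    `bpp` is a square list of lists where bpp[i][j] is the pairing
    probability (or likelihood) of bases i and j. Hypothetical helper,
    named here only for illustration.
    """
    p_unp = []
    for row in bpp:
        remaining = 1.0 - sum(row)         # conventional: 1 minus the row sum
        p_unp.append(max(0.0, remaining))  # assumed clip: never below 0
    return p_unp


# Likelihood-style matrix: base 0 has two independent pairs at 0.75 each,
# so its row sums to 1.5 and the unclipped value would be -0.5.
likelihood = [
    [0.0, 0.75, 0.75],
    [0.75, 0.0, 0.0],
    [0.75, 0.0, 0.0],
]

print(unpaired_probabilities(likelihood))
```

For rows that already sum to at most 1, the floor has no effect, so conventionally normalized matrices are treated exactly as before.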
This was implemented by @jandersonlee to support RNet, which outputs a confidence matrix rather than a probability matrix (i.e., in RNet, rows/columns do not necessarily sum to 1). In that case hungarian would wind up outputting invalid structures (with unbalanced pairs), as it would attempt to pair bases multiple times.
Reviewed by @rkretsch, and I also ran tests against our PK50 dataset to confirm that results were unchanged for vienna/contrafold/eternafold bpps.