
Paraphrase#6178

Open
dnil wants to merge 41 commits into main from paraphrase

@dnil
Member

dnil commented Apr 2, 2026

This PR adds a functionality or fixes a bug.

Paraphrase JSON files are parsed and added to individuals. Gene symbols are parsed and hgnc_genes added, links to SNV and SV buttons are working. Phase_region entries are parsed and used for creating appropriate IGV gDNA buttons. The SMN CN view changes name to SMN - Dark regions and includes an accordion with a plain display of Paraphrase info. Cases get a new status tag category to accommodate findings outside SMN.
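As a rough illustration of the phase_region step, here is a minimal sketch of turning a region string into an IGV locus. It assumes, purely for illustration, that a phase_region is a string of the form `chrom:start-end` with an optional `build:` prefix; the actual Paraphrase JSON layout may differ, and `parse_phase_region` and `igv_locus` are hypothetical names, not functions from this PR.

```python
import re
from typing import Optional

# Assumed format (not confirmed by the PR): optional "<build>:" prefix,
# followed by "chrom:start-end".
PHASE_REGION_RE = re.compile(
    r"^(?:(?P<build>\d+):)?(?P<chrom>[\w.]+):(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_phase_region(phase_region: str) -> Optional[dict]:
    """Split a region string into its parts; return None if it does not match."""
    match = PHASE_REGION_RE.match(phase_region.strip())
    if match is None:
        return None
    return {
        "build": match.group("build"),  # None when no build prefix is present
        "chrom": match.group("chrom"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
    }


def igv_locus(region: dict) -> str:
    """Format a parsed region as a locus string of the form chrom:start-end."""
    return f"{region['chrom']}:{region['start']}-{region['end']}"
```

Returning None for strings that do not match lets the caller skip the IGV button for that entry instead of failing the whole case load.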

Minimal formatting so far: if it looks like the Paraphase/Paraphrase format is settling a bit we can give it more work. But ideas for simple, nice formatting are most welcome!

Screenshot 2026-04-07 at 08 35 27
Testing on cg-vm1 server (Clinical Genomics Stockholm)

Prepare for testing

  1. Make sure the PR is pushed and available on Docker Hub
  2. First book your testing time using the Pax software available at https://pax.scilifelab.se/. The resource you are going to call dibs on is scout-stage and the server is cg-vm1.
  3. ssh <USER.NAME>@cg-vm1.scilifelab.se
  4. sudo -iu hiseq.clinical
  5. ssh localhost
  6. (optional) Find out which scout branch is currently deployed on cg-vm1: podman ps
  7. Stop the service with current deployed branch: systemctl --user stop scout@<name_of_currently_deployed_branch>
  8. Start the scout service with the branch to test: systemctl --user start scout@<this_branch>
  9. Make sure the branch is deployed: systemctl --user status scout.target
  10. After testing is done, repeat procedure at https://pax.scilifelab.se/, which will release the allocated resource (scout-stage) to be used for testing by other users.
Testing on hasta server (Clinical Genomics Stockholm)

Prepare for testing

  1. ssh <USER.NAME>@hasta.scilifelab.se
  2. Book your testing time using the Pax software. us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
  3. (optional) Find out which scout branch is currently deployed on hasta: conda activate S_scout; pip freeze | grep scout-browser
  4. Deploy the branch to test: bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
  5. Make sure the branch is deployed: us; scout --version
  6. After testing is done, repeat the paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.

How to test:

  1. how to test it, possibly with real cases/data

Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.

Review:

  • code approved by
  • tests executed by

@codecov

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 95.95960% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.88%. Comparing base (49a8db2) to head (5d8c2ce).

Files with missing lines | Patch % | Lines
scout/parse/case.py | 94.87% | 2 Missing ⚠️
scout/server/blueprints/cases/controllers.py | 95.65% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6178      +/-   ##
==========================================
+ Coverage   83.84%   83.88%   +0.04%     
==========================================
  Files         343      343              
  Lines       21304    21351      +47     
==========================================
+ Hits        17862    17910      +48     
+ Misses       3442     3441       -1     

@dnil dnil marked this pull request as ready for review April 10, 2026 07:58
@dnil
Member Author

dnil commented Apr 10, 2026

This is overall a bit experimental, but I would appreciate some feedback. The general idea for dealing with paraphase output, and the very genuine difficulty of representing the dark areas of the genome that aren't really in the non-graph builds, is to have paraphrase parse the paraphase output in the pipeline, and then display it rather generically in scout, complementing inspection of the paraphase alignment bamlets. There is still no real solution for marking individual variants caught, but there are linkouts to the regular SNVs and SVs to see what is possibly there already from the lists here, and a case level tag for causative status.

Python >3
Args:
dictionary: dict
def clean_recursive(obj, preserve_keys=None):
Member Author

This is a bit more general, but still with the option to keep some empty keys. Filling in potential paraphrase list keys in the previous functions got out of hand.
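For readers without the diff at hand, a minimal sketch of what a recursive cleaner with this signature could look like. Only the signature `clean_recursive(obj, preserve_keys=None)` comes from the diff context above; the body and the choice of what counts as "empty" (None, empty string, empty list, empty dict) are assumptions and not the PR's actual implementation.

```python
def clean_recursive(obj, preserve_keys=None):
    """Return a copy of obj with empty values removed.

    Keys listed in preserve_keys are kept even when their value is empty.
    Assumption: "empty" means None, "", [] or {}; 0 and False are kept.
    """
    preserve_keys = set(preserve_keys or [])
    empty = (None, "", [], {})

    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            value = clean_recursive(value, preserve_keys)
            if key in preserve_keys or value not in empty:
                cleaned[key] = value
        return cleaned
    if isinstance(obj, list):
        items = [clean_recursive(item, preserve_keys) for item in obj]
        return [item for item in items if item not in empty]
    # Scalars are returned unchanged.
    return obj
```

Children are cleaned before the emptiness check, so a dict that contains only empty values collapses and is dropped by its parent unless its key is preserved.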

@northwestwitch
Member

Since you asked for ideas.. :)

  • What about creating a table for each gene which resembles what you have on top for SMN. What one notices is that there is a lot of white space on the right side of these expandable DIVs:
image

It could be more compact and show one individual per line, same as above?

  • I like the expandable DIVs, but the links to IGV positioned there are not really visible, I think, because of the big space the divs take when expanded, at the moment. With one individual per line it would look much more readable. Same as SMN above!

  • Good riddance with this part and the comments. They're elsewhere anyway!

image
  • Perhaps put a link to the software or state clearly where the respective data (SMN and DARK regions) come from?

@dnil
Member Author

dnil commented Apr 13, 2026

Since you asked for ideas.. :)

Super, thank you!

  • What about creating a table for each gene which resembles what you have on top for SMN. What one notices is that there is a lot of white space on the right side of these expandable DIVs:
    It could be more compact and show one individual per line, same as above?

They are pretty empty on the right side! Do you have some ideas for how? Notice that the JSON for different loci has somewhat different structure, some with a lot of info, some with just a bit. I found it looked rather messy when I tried just filling the lines with them.

Screenshot 2026-04-13 at 12 21 54 Screenshot 2026-04-13 at 12 21 45

Maybe with a scrollable table? Like for the causatives. I haven't tried that!

  • I like the expandable DIVs, but the links to IGV positioned there are not really visible, I think, because of the big space the divs take when expanded, at the moment. With one individual per line it would look much more readable. Same as SMN above!

Mm, yes, they are a little hidden. It was confusing to have them on the top of the foldout, at least with the current layout. Maybe we could have those on the right side instead. 🤔

  • Good riddance with this part and the comments. They're elsewhere anyway!

The idea is that one can add comments about the variants on the case level, as these cannot be pinned. And have a reference for the family structure/names of individuals handy.

  • Perhaps put a link to the software or state clearly where the respective data (SMN and DARK regions) come from?

Absolutely will, good call!

@northwestwitch
Member

northwestwitch commented Apr 13, 2026

They are pretty empty on the right side! Do you have some ideas for how? Notice that the JSON for different loci has somewhat different structure, some with a lot of info, some with just a bit.

These don't have to be one under the other, but perhaps they can be comma separated - it would compress the line
image

Also these different fields look like a repetition of the same thing, but what do I know?

image

EDIT - sorry, managed to edit here instead of in the reply... 🙄 // D

@dnil
Member Author

dnil commented Apr 13, 2026

They are pretty empty on the right side! Do you have some ideas for how? Notice that the JSON for different loci has somewhat different structure, some with a lot of info, some with just a bit.

These don't have to be one under the other, but perhaps they can be comma separated - it would compress the line

image

Sorry, I think I'm missing something you feel is obvious! Yes, comma separation would compact this line, but it would make the total per-sample line much longer? They would be huge already?

Also these different fields look like a repetition of the same thing, but what do I know?

They look different to me, but I would have to read up on the locus again to see if there is something I'm missing! I only remember that it was important what kind of copy is first in each array, since that is closer to the initial regulatory site.

But in general there are certainly things one could do with each kind of locus, if we spend time specialising on each of them. E.g. we have a new SMN locus here that could be mapped back on the old. The new one has some stuff the old doesn't and vice versa. The idea was to make something generic for starters. There are hundreds of loci in paraphase, but only a subset with clinical implications that we deal with in paraphrase. The latter are typically the more complex.

I'll consider making a try with tables. They will be broad and scrolly both up and down, but it could work, and if so it would be easier to compare if there were, say, multiple affected individuals. The typical case is of course the singleton or plain trio.

@northwestwitch
Member

northwestwitch commented Apr 13, 2026

Sorry, I think I'm missing something you feel is obvious! Yes, comma separation would compact this line, but it would make the total per-sample line much longer? They would be huge already?

Sorry, I should have explained it better: if you make it a table you can assign a certain width to that cell and have all these hap strings either comma-separated (with a space) or just space-separated, and you can make them wrap onto multiple lines. It would be more compact.

Something like this, if it doesn't make it too difficult to read

image

@dnil
Member Author

dnil commented Apr 13, 2026

Sorry, I should have explained it better: if you make it a table you can assign a certain width to that cell and have all these hap strings either comma-separated (with a space) or just space-separated, and you can make them wrap onto multiple lines. It would be more compact.

Mm, only I don't know beforehand quite which fields will be big or small, so I'd have to figure that out dynamically? But let's see I guess. I'll sketch something and see if I understand then.

@dnil
Member Author

dnil commented Apr 13, 2026

Ok, here goes! I took help from copilot (Claude Haiku 4.5) for ideas and especially for the compacting CSS.

Screenshot 2026-04-13 at 17 34 11

By collecting all top level keys, we get a cleanly aligned table when there isn't much sub-tabling going on.
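The key-collection idea can be sketched roughly as follows. This is an illustration only: it assumes each individual is a dict carrying its per-locus info under a key, and the names `collect_columns`, `build_rows`, `info_key` and `name_key` are hypothetical, not taken from the PR.

```python
def collect_columns(individuals, info_key="paraphrase"):
    """Return the union of top-level keys across individuals, in first-seen order."""
    columns = []
    seen = set()
    for ind in individuals:
        for key in ind.get(info_key) or {}:
            if key not in seen:
                seen.add(key)
                columns.append(key)
    return columns


def build_rows(individuals, columns, info_key="paraphrase", name_key="display_name"):
    """Build one row per individual, padding missing keys with an empty string."""
    rows = []
    for ind in individuals:
        info = ind.get(info_key) or {}
        rows.append([ind.get(name_key, "")] + [info.get(col, "") for col in columns])
    return rows
```

Because every row is padded to the same set of columns, a template can loop over `columns` for the header and over each row for the cells, and the columns line up even when an individual lacks a key.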

It still gets a little messy for complex loci, with differences between individuals on the sub-sub key levels.

Screenshot 2026-04-13 at 17 34 18

One could go with fixed columns, or have certain keys predefined as massive ones that need extra space, all the way to something like pre-rendering the table, counting space and making fixed columns accordingly. Or just make it scrollable with a large extra margin. But maybe it's fine anyway. We usually focus mostly on the affected, and almost all the time deal with singletons anyway.
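A cheap approximation of the "count space" option, short of actually pre-rendering, is to measure the longest serialized value per key and flag the big ones. This is a sketch under assumptions: `classify_columns` and the 40-character `wide_threshold` are hypothetical, and character count is only a proxy for rendered width.

```python
import json


def classify_columns(rows, columns, wide_threshold=40):
    """Label each column 'wide' or 'narrow' from its longest serialized value.

    rows is a list of dicts (one per individual); values missing from a row
    count as zero width.
    """
    widths = {}
    for col in columns:
        longest = 0
        for row in rows:
            value = row.get(col, "")
            # Strings are measured as-is; nested structures via their JSON text.
            text = value if isinstance(value, str) else json.dumps(value)
            longest = max(longest, len(text))
        widths[col] = "wide" if longest > wide_threshold else "narrow"
    return widths
```

The resulting labels could then be mapped to CSS classes with different fixed widths, so that only the columns that need the room get it.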

@sonarqubecloud

Quality Gate failed

Failed conditions
B Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


Development

Successfully merging this pull request may close these issues.

Parse and display paraphase json

3 participants