Paraphrase by dnil · Pull Request #6178 · Clinical-Genomics/scout

dnil · 2026-04-02T14:28:50Z

This PR adds a functionality or fixes a bug.

Fix Parse and display paraphase json #4519

Paraphrase JSON files are parsed and added to individuals. Gene symbols are parsed and hgnc_genes added, links to SNV and SV buttons are working. Phase_region entries are parsed and used for creating appropriate IGV gDNA buttons. The SMN CN view changes name to SMN - Dark regions and includes an accordion with a plain display of Paraphrase info. Cases get a new status tag category to accommodate findings outside SMN.

Minimal formatting so far: if it looks like the Paraphase/Paraphrase format is settling a bit we can give it more work. But ideas for simple, nice formatting are most welcome!

Testing on cg-vm1 server (Clinical Genomics Stockholm)

Prepare for testing

Make sure the PR is pushed and available on Docker Hub
First book your testing time using the Pax software available at https://pax.scilifelab.se/. The resource you are going to call dibs on is scout-stage and the server is cg-vm1.
ssh <USER.NAME>@cg-vm1.scilifelab.se
sudo -iu hiseq.clinical
ssh localhost
(optional) Find out which scout branch is currently deployed on cg-vm1: podman ps
Stop the service with current deployed branch: systemctl --user stop scout@<name_of_currently_deployed_branch>
Start the scout service with the branch to test: systemctl --user start scout@<this_branch>
Make sure the branch is deployed: systemctl --user status scout.target
After testing is done, repeat procedure at https://pax.scilifelab.se/, which will release the allocated resource (scout-stage) to be used for testing by other users.

Testing on hasta server (Clinical Genomics Stockholm)

Prepare for testing

ssh <USER.NAME>@hasta.scilifelab.se
Book your testing time using the Pax software. us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
(optional) Find out which scout branch is currently deployed on hasta: conda activate S_scout; pip freeze | grep scout-browser
Deploy the branch to test: bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
Make sure the branch is deployed: us; scout --version
After testing is done, repeat the paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.

How to test:

how to test it, possibly with real cases/data

Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.

Review:

code approved by
tests executed by

scout/parse/case.py

codecov · 2026-04-02T14:46:18Z

Codecov Report

❌ Patch coverage is 95.95960% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.88%. Comparing base (49a8db2) to head (5d8c2ce).

Files with missing lines	Patch %	Lines
scout/parse/case.py	94.87%	2 Missing ⚠️
scout/server/blueprints/cases/controllers.py	95.65%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6178      +/-   ##
==========================================
+ Coverage   83.84%   83.88%   +0.04%     
==========================================
  Files         343      343              
  Lines       21304    21351      +47     
==========================================
+ Hits        17862    17910      +48     
+ Misses       3442     3441       -1

scout/parse/case.py

dnil · 2026-04-10T08:07:47Z

This is overall a bit experimental, but I would appreciate some feedback. The general idea for dealing with paraphase output, and the very genuine difficulty of representing the dark areas of the genome that aren't really in the non-graph builds, is to have paraphrase parse the paraphase output in the pipeline and then display it rather generically in scout, complementing inspection of the paraphase alignment bamlets. There is still no real solution for marking the individual variants caught, but there are linkouts to the regular SNVs and SVs to see what is possibly already there from the lists here, and a case-level tag for causative status.

dnil · 2026-04-10T08:10:56Z

scout/parse/case.py

-        Python >3
-    Args:
-        dictionary: dict
+def clean_recursive(obj, preserve_keys=None):


This is a bit more general, but still with the option to keep some empty keys. Filling in every potential paraphrase list key in the previous functions got out of hand.
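
A minimal sketch of what such a recursive cleaner could look like. Only the signature `clean_recursive(obj, preserve_keys=None)` comes from the diff above; the body, the definition of "empty" (None, empty string, empty list, empty dict) and the example keys are assumptions for illustration, not the code in this PR.

```python
def clean_recursive(obj, preserve_keys=None):
    """Return a copy of obj with empty values removed.

    Keys listed in preserve_keys are kept even when their value is empty.
    This is an illustrative sketch, not the implementation from the PR.
    """
    preserve = set(preserve_keys or ())

    def is_empty(value):
        # Assumption: 0 and False are real values and must survive cleaning.
        return value is None or value == "" or value == [] or value == {}

    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            value = clean_recursive(value, preserve)
            if key in preserve or not is_empty(value):
                cleaned[key] = value
        return cleaned

    if isinstance(obj, list):
        items = [clean_recursive(item, preserve) for item in obj]
        return [item for item in items if not is_empty(item)]

    # Scalars are returned unchanged.
    return obj


# Hypothetical input: key names are made up for the example.
raw = {"a": None, "b": {"c": "", "d": 1}, "e": [], "keep": []}
cleaned = clean_recursive(raw, preserve_keys=["keep"])
```

Nested containers are cleaned before the emptiness check, so a dict that only held empty values disappears as well, unless its key is preserved.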

northwestwitch · 2026-04-13T09:00:19Z

Since you asked for ideas.. :)

What about creating a table for each gene which resembles what you have on top for SMN? One thing you notice is that there is a lot of white space on the right side of these expandable DIVs:

It could be more compact and show one individual per line, same as above?

I like the expandable DIVs, but the links to IGV positioned there are not really visible, I think, because of the big space the divs take when expanded, at the moment. With one individual per line it would look much more readable. Same as SMN above!
Good riddance with this part and the comments. They're elsewhere anyway!

Perhaps put a link to the software or state clearly where the respective data (SMN and DARK regions) come from?

dnil · 2026-04-13T10:27:12Z

Since you asked for ideas.. :)

Super, thank you!

What about creating a table for each gene which resembles what you have on top for SMN? One thing you notice is that there is a lot of white space on the right side of these expandable DIVs:
It could be more compact and show one individual per line, same as above?

They are pretty empty on the right side! Do you have some ideas for how? Notice that the JSON for different loci has a somewhat different structure, some with a lot of info, some with just a bit. I found it looked rather messy when I tried just filling the lines with them.

Maybe with a scrollable table? Like for the causatives. I haven't tried that!

I like the expandable DIVs, but the links to IGV positioned there are not really visible, I think, because of the big space the divs take when expanded, at the moment. With one individual per line it would look much more readable. Same as SMN above!

Mm, yes, they are a little hidden. It was confusing to have them on the top of the foldout, at least with the current layout. Maybe we could have those on the right side instead. 🤔

Good riddance with this part and the comments. They're elsewhere anyway!

The idea is that one can add comments about the variants on the case level, as these cannot be pinned. And have a reference for the family structure/names of individuals handy.

Perhaps put a link to the software or state clearly where the respective data (SMN and DARK regions) come from?

Absolutely will, good call!

northwestwitch · 2026-04-13T11:19:21Z

They are pretty empty on the right side! Do you have some ideas for how? Notice that the JSON for different loci has somewhat different structure, some with a lot of info, some with just a bit.

These don't have to be one under the other, but perhaps they can be comma separated - it would compress the line
image

Also these different fields look like a repetition of the same thing, but what do I know?

image

EDIT - sorry, managed to edit here instead of in the reply... 🙄 // D

dnil · 2026-04-13T13:30:10Z

They are pretty empty on the right side! Do you have some ideas for how? Notice that the JSON for different loci has somewhat different structure, some with a lot of info, some with just a bit.

These don't have to be one under the other, but perhaps they can be comma separated - it would compress the line

Sorry, I think I'm missing something you feel is obvious! Yes, comma separation would compact this line, but it would make the total per-sample line much longer? They would be huge already?

Also these different fields look like a repetition of the same thing, but what do I know?

They look different to me, but I would have to read up on the locus again to see if there is something I'm missing! I only remember that it was important what kind of copy is first in each array, since that is closer to the initial regulatory site.

But in general there are certainly things one could do with each kind of locus, if we spend time specialising on each of them. E.g. we have a new SMN locus here that could be mapped back onto the old one. The new one has some stuff the old one doesn't, and vice versa. The idea was to make something generic for starters. There are hundreds of loci in paraphase, but only a subset with clinical implications that we deal with in paraphrase, the latter typically being the more complex.

I'll consider making a try with tables. They will be broad and scrolly both up and down, but it could work, and if so would make comparison easier if there were, say, multiple affected individuals. The typical case is of course the singleton or plain trio.

northwestwitch · 2026-04-13T13:48:34Z

Sorry, I think I'm missing something you feel is obvious! Yes, comma separation would compact this line, but it would make the total per-sample line much longer? They would be huge already?

Sorry, I should have explained it better: if you make it a table you can assign a certain width to that cell and have all these hap strings either comma-separated (with a space) or just space-separated, and you can make them wrap onto multiple lines. It would be more compact.

Something like this, if it doesn't make it too difficult to read

dnil · 2026-04-13T14:05:25Z

Sorry, I should have explained it better: if you make it a table you can assign a certain width to that cell and have all these hap strings either comma-separated (with a space) or just space-separated, and you can make them wrap onto multiple lines. It would be more compact.

Mm, only I don't know beforehand quite which fields will be big or small, so I'd have to figure that out dynamically? But let's see, I guess. I'll sketch something and see if I understand then.

dnil · 2026-04-13T15:40:53Z

Ok, here goes! I got help from Copilot (Claude Haiku 4.5) for ideas and especially for the compacting CSS.

By collecting all top level keys, we get a cleanly aligned table when there isn't much sub-tabling going on.
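
The key-collection step could be sketched roughly as follows. The function names, the shape of the per-individual dicts and the example keys are hypothetical; the actual controller and template code in the PR may be organised differently.

```python
def collect_columns(rows):
    """Return the ordered union of top-level keys across all rows."""
    columns = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                columns.append(key)
    return columns


def build_table(individuals):
    """Build an aligned header and body from per-individual dicts.

    individuals maps a display name to that individual's parsed data
    for one locus (hypothetical shape). Missing keys become empty cells,
    so every row has the same number of columns.
    """
    columns = collect_columns(individuals.values())
    header = ["individual"] + columns
    body = [
        [name] + [data.get(column, "") for column in columns]
        for name, data in individuals.items()
    ]
    return header, body


# Hypothetical example: two individuals with partly different keys.
header, body = build_table(
    {
        "proband": {"total_cn": 4, "haplotypes": ["h1"]},
        "mother": {"total_cn": 3, "note": "x"},
    }
)
```

Because the columns are the union of all keys, individuals with sparse data still line up under the same headers as those with more fields.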

It still gets a little messy for complex loci, with differences between individuals on the sub-sub key levels.

One could go with fixed columns, or have certain keys predefined as massive ones that need extra space, all the way to something like pre-rendering the table, counting space and making fixed columns accordingly. Or just make it scrollable with a large extra margin. But maybe it's fine anyway. We usually focus mostly on the affected, and almost all the time deal with singletons anyway.
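
As a rough illustration of the "count space" option, one could measure the longest rendered cell per column and cap it. This is only a sketch under stated assumptions: `column_widths` and `max_width` are hypothetical names, character counts stand in for real rendered widths, and nothing like this is in the PR.

```python
def column_widths(header, body, max_width=40):
    """Estimate a width (in characters) for each column.

    Each width is the longest stringified cell in that column,
    including the header, capped at max_width so that one massive
    field cannot stretch the whole table.
    """
    widths = [len(str(cell)) for cell in header]
    for row in body:
        for index, cell in enumerate(row):
            widths[index] = max(widths[index], len(str(cell)))
    return [min(width, max_width) for width in widths]


# Hypothetical example: one very long haplotype string.
widths = column_widths(
    ["individual", "haplotypes"],
    [["proband", "a" * 100], ["mother", "b"]],
)
```

Columns that hit the cap would then need wrapping or scrolling inside the cell.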

sonarqubecloud · 2026-04-13T16:06:38Z

Quality Gate failed

Failed conditions
B Security Rating on New Code (required ≥ A)

dnil added 14 commits April 1, 2026 15:59

start adding paraphrase

6a9c64e

demo data

1ff9f58

add more

88b33c0

panel hits SMN paraphase. maybe we should have the other loci as well?

26341a8

skip double reading smn. parse paraphrase json

5568e33

remove unused removal function

c0b433f

simplify recursion cleaner. add individual build.

43d81b1

Add a dark category for other causative case tags

fc6091b

Display regions and status

a6c2778

status bg color

02ed0b1

working colors

f677131

Collapses, IGV, SNV and SV buttons

967f54a

tooltip with status reason, start printing other keys

13f4302

Values shown

1d13dd8

github-advanced-security bot found potential problems Apr 2, 2026

scout/parse/case.py: Fixed

scout/parse/case.py: Fixed

repurpose tests

75a6623

dnil added 3 commits April 2, 2026 18:29

use a recursion for display as well

d7e3a30

polish formatting a little

f0710a0

add parsing paraphrase and smn to conftest adapters, cleanup in tests…

416789c

… enroute

github-advanced-security bot found potential problems Apr 7, 2026

scout/parse/case.py: Fixed

dnil and others added 9 commits April 7, 2026 11:32

vulture notes

8a842a1

Merge branch 'main' into paraphrase

96fa581

Merge branch 'main' into paraphrase

f230732

Merge branch 'main' into paraphrase

e8494dc

Merge branch 'main' into paraphrase

58bf879

Merge branch 'main' into paraphrase

761b443

Merge branch 'main' into paraphrase

026c1e5

Merge branch 'main' into paraphrase

28d923e

Merge branch 'main' into paraphrase

b36d837

dnil added 7 commits April 9, 2026 07:56

Merge branch 'main' into paraphrase

8ca0818

Merge branch 'main' into paraphrase

53fa579

Merge branch 'main' into paraphrase

586ec37

operator precedence

4ddae03

refactor to lower apparent mccabe complexity

94feeb9

break out one more function

e9e12c9

Merge branch 'main' into paraphrase

a951feb

dnil marked this pull request as ready for review April 10, 2026 07:58

dnil and others added 2 commits April 10, 2026 13:18

changelog

f08248a

Merge branch 'main' into paraphrase

067edc4

Merge branch 'main' into paraphrase

666792c

dnil added 2 commits April 13, 2026 15:56

add links to callers for page and docs

fe04ecc

random blank point

e376588

Convert to table, compact a bit

0fafe82

Add in a sex column

5d8c2ce

dnil wants to merge 41 commits into main from paraphrase

3 participants
