Handling of X amino acids/masked sequences

Hi,

While working through https://github.com/PoonLab/sierra-local/issues/108, I hit another interesting case on commit [2896b833af1259e9cd5907fa6819c3da53beb88b](https://github.com/PoonLab/sierra-local/commit/2896b833af1259e9cd5907fa6819c3da53beb88b):

As an example, I took K03455.1 and masked the start of PR with `N`s:

[K03455_masked.fna.gz](https://github.com/user-attachments/files/20593569/K03455_masked.fna.gz)
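
For anyone who wants to build a similar test case, masking of this kind can be sketched roughly as below. The helper name `mask_region` and the toy sequence are hypothetical, and the coordinates are illustrative only; they are not the ones used to produce the attached file.

```python
def mask_region(nt_seq, start, end, mask_char="N"):
    """Replace 1-based inclusive positions start..end with mask_char.

    Hypothetical helper for building test inputs; not part of sierra-local.
    """
    if not 1 <= start <= end <= len(nt_seq):
        raise ValueError("start/end must satisfy 1 <= start <= end <= len(nt_seq)")
    masked_length = end - start + 1
    return nt_seq[:start - 1] + mask_char * masked_length + nt_seq[end:]


# Toy example: mask positions 3-6 of a 10-nt sequence.
toy = "ACGTACGTAC"
masked_toy = mask_region(toy, 3, 6)
```

Masking whole codons this way is what produces `X` residues after translation, since a codon made entirely of `N`s cannot be resolved to a single amino acid.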

Running it through Stanford HIVDB correctly identifies that amino acids 1-33 of PR are missing:

![Image](https://github.com/user-attachments/assets/bb97e600-24f6-490b-8cf5-88f88f9c8fd2)

With sierra-local, I see a full alignment of PR in the JSON and no warnings: 

[K03455_masked_results.json](https://github.com/user-attachments/files/20593611/K03455_masked_results.json)

I get the same or similar results with both `post-align` and `nucamino`. It seems to me that `X` amino acids should be handled differently from other amino acids, more like HIVDB does. Many users mask low- or no-coverage bases in their sequences, so from a quality perspective it is important to flag that variants at those sites may be missed.
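
To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the aligned amino acid sequence for a gene is available as a plain string; `masked_ranges` and `coverage_warnings` are hypothetical names, not existing sierra-local functions, and the warning wording is made up rather than copied from HIVDB.

```python
from itertools import groupby


def masked_ranges(aa_seq, first_aa=1, mask_char="X"):
    """Return inclusive (start, end) amino acid ranges that are masked.

    first_aa is the gene position of the first residue in aa_seq.
    Hypothetical helper; not part of sierra-local.
    """
    ranges = []
    pos = first_aa
    # Walk the sequence as alternating runs of masked / unmasked residues.
    for is_masked, run in groupby(aa_seq, key=lambda aa: aa == mask_char):
        length = len(list(run))
        if is_masked:
            ranges.append((pos, pos + length - 1))
        pos += length
    return ranges


def coverage_warnings(gene, aa_seq, first_aa=1):
    """Build one human-readable warning per masked range (wording is illustrative)."""
    return [
        f"{gene}: amino acids {start}-{end} are masked (X) and were not assessed"
        for start, end in masked_ranges(aa_seq, first_aa)
    ]


# Toy input: 33 masked residues followed by 66 placeholder residues.
toy_pr = "X" * 33 + "A" * 66
warnings = coverage_warnings("PR", toy_pr)
```

With the toy input above, the single masked run at the start of the gene is reported as one range, which is the information HIVDB surfaces in the screenshot. Whether masked positions should be treated as missing or only produce a warning is a design decision for the maintainers.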

Handling of X amino acids/masked sequences #110
