Skip to content

Apply cutoffs for domains #65

@apcamargo

Description

@apcamargo

This is more of a question than a bug.

My understanding is that when the user set a cutoff, PyHMMER will not report hits that would have hit.included == False. However, the same is not applied at the domain level, so there are cases where the hit is reported because one of the domains passes the cutoffs, but domains that do not satisfy the cutoffs are also reported.

For example:

>P03865 1-315
MATTKLGNTKSASRAINYAEKRAEEKSGLNCDVDYAKSAFKQTRALYGKEDGIQAHTVIQSFKPGEVTPEQCNQLGLELA
EKIAPNHQVAVYTHTDKDHYHNHIVINSVDLETGKKYQSNKKQRDLVKKENDNICREHGLSVTERGIAKMRYTQAEKGIV
FDRDEYSWKDELRDLIENAKTHTSNLETFSEHLEEKGVGVKLRGETISYKPENENKWVRGRTLGSEYEKGAIDHEHERHQ
KQQREPEYADEFKINWDAVEQHTEQLKQRRVERAQETKQAHSKISSRDTRESENQRERAKGNNIRIERGDEGLSR
with open("hmmsearch_plasmid_hallmarks.tsv", "w") as fo:
    fo.write("query\thit\tstart\tend\tevalue\tdomain_score\tdomain_included\n")
    
    for p in Path(".").glob("*.faa"):
        with pyhmmer.easel.SequenceFile(p, digital=True) as seq_file:
            seqs = seq_file.read_block()

        with pyhmmer.plan7.HMMFile("Pfam-A.hmm") as hmm_file:
            for hits in pyhmmer.hmmsearch(hmm_file, seqs, bit_cutoffs="gathering"):
                for hit in hits:
                    for domain in hit.domains:
                        fout.write(
                            f"{hits.query_accession.decode()}\t"
                            f"{hit.name.decode()}\t"
                            f"{domain.env_from}\t"
                            f"{domain.env_to}\t"
                            f"{hit.evalue:.2E}\t"
                            f"{domain.score:.2f}\t"
                            f"{domain.included}\n"
                        )

Generates this output:

query        hit      start   end   evalue     domain_score   domain_included
PF03432.18   P03865   8       239   3.25E-90   288.01         True
PF03432.18   P03865   240     314   3.25E-90   -0.23          False

To be honest, I don't really know what determines if a domain is included or not, given that the cutoffs are applied to the hit. I expected that when a cutoff is applied, domains were also filtered.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingquestionFurther information is requested

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions