feat(metric): Add CUFD (Codon Usage Frequency Distribution) metric and comprehensive test suite#1

Open
Rakshitha-Ireddi wants to merge 1 commit into Kingsford-Group:main from Rakshitha-Ireddi:feat/cufd-metric-and-test-suite

@Rakshitha-Ireddi

This PR introduces the Codon Usage Frequency Distribution (CUFD) metric to the ARCADE calculator module, along with a comprehensive test suite covering all calculator functions.

Key Changes:

  • New Metric: Added cufd_kl_divergence and cufd_cosine_similarity to calculator/calculator.py. These metrics measure how closely a sequence matches a target organism's full codon usage distribution.
  • Data: Added calculator/data/codon_usage_human.json containing human codon frequencies from the Kazusa database.
  • Steering: Extended scripts/utils/mutation.py with a cufd mode to generate contrastive pairs (high vs. low CUFD) for steering vector computation.
  • Testing: Added a full test suite in tests/ (previously nonexistent) covering:
    ◦ All calculator metrics (MFE, CAI, GC, CpG, UpA, CUFD)
    ◦ Codon ↔ amino acid mappings
    ◦ Tokenizer utilities
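The diff for calculator/calculator.py is not reproduced here, so the following is only a minimal sketch of how the two CUFD scores can be computed, not the code in this PR. It assumes: the distribution spans all 64 codons (stop codons included); the KL direction is KL(P_seq ‖ Q_ref); zero-frequency codons are handled by mixing in a small uniform mass (the hypothetical ALPHA); and the reference JSON is a flat {codon: frequency} map. The signatures of cufd_kl_divergence and cufd_cosine_similarity, as well as the helpers load_reference, smoothed_distribution and sequence_distribution, are illustrative assumptions.

```python
import itertools
import json
import math
from collections import Counter

# All 64 codons in a fixed order so distributions can be compared index by index.
BASES = "TCAG"
CODONS = ["".join(triplet) for triplet in itertools.product(BASES, repeat=3)]

ALPHA = 1e-6  # hypothetical smoothing weight so that no codon has probability 0


def load_reference(path):
    """Hypothetical loader, assuming the JSON is a flat {codon: frequency} map."""
    with open(path) as fh:
        raw = json.load(fh)
    return {codon.upper().replace("U", "T"): float(freq) for codon, freq in raw.items()}


def smoothed_distribution(weights, alpha=ALPHA):
    """Normalize codon weights to sum to 1, then mix in a little uniform mass."""
    total = sum(weights.get(c, 0.0) for c in CODONS)
    if total <= 0:
        raise ValueError("no recognizable codons")
    return [(1 - alpha) * weights.get(c, 0.0) / total + alpha / len(CODONS) for c in CODONS]


def sequence_distribution(seq):
    """Codon usage of an in-frame DNA/RNA sequence; a trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    return smoothed_distribution(counts)


def cufd_kl_divergence(seq, reference):
    """KL(P_seq || Q_ref) in nats: 0 for an exact match, larger means farther away."""
    p = sequence_distribution(seq)
    q = smoothed_distribution(reference)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def cufd_cosine_similarity(seq, reference):
    """Cosine similarity of the two 64-dimensional vectors: 1 means identical shape."""
    p = sequence_distribution(seq)
    q = smoothed_distribution(reference)
    dot = sum(pi * qi for pi, qi in zip(p, q))
    return dot / (math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q)))
```

With a toy reference such as {"ATG": 0.5, "GCC": 0.5}, the sequence ATGGCC gets a KL divergence of 0.0 and a cosine similarity of 1.0 (up to rounding). ATGGCG encodes the same peptide (Met-Ala) with a different alanine codon and scores worse on both. Smoothing is needed because KL is infinite whenever the sequence uses a codon the reference assigns zero probability. Normalizing before mixing in the uniform mass also makes the result independent of scale, so raw counts and per-thousand frequencies yield the same vector.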

Verification

Added a tests/ directory containing 61 tests.
Run pytest tests/ to verify; all tests pass locally.

Authors

  • Ireddi Rakshitha
  • Yaswanth Devavarapu
