Releases: danielzmbp/remag

v0.4.0

05 Mar 21:51

  • release: bump version to 0.4.0
  • style: apply formatting-only updates across modules and tests
  • remove automated bioconda PR, generate recipe as artifact only; fix recipe (Python >=3.9, add scipy)
  • clean up duplicate imports, stale comments, and misplaced import re
  • remove unused imports and stale comments in hyenadna files
  • remove dead variables and unused import in clustering.py
  • fix: --filter-only flag was silently ignored (missing from args Namespace)
  • remove dead _leiden_clustering_on_graph() function
  • remove dead PathManager class from utils.py
  • docs: update README to match current pipeline and fix cli OPTION_GROUPS ghost reference
  • remove dead greedy_min_score parameter (F1 scores are always >= 0)
  • Revert "fix: tighten greedy clustering contamination cap to 5%"
  • fix: tighten greedy clustering contamination cap to 5%
  • fix: block rescue merges above 10% duplication
  • feat: expose rescue duplication limits in CLI and enforce max total duplication
  • Revert "refactor: update encoder architecture to be dynamic and balanced"
  • refactor: update encoder architecture to be dynamic and balanced
  • fix: rate limit unconditional fusion debug logs
  • fix: rate limit debug logging for gating weights
  • fix: switch debug prints to logger.debug
  • feat: add debug prints for gating weights
  • Revert to state after 'feat: add short-reads/sr mode' (77f1ea5)
  • Update rescue threshold: 0.9 for single sample, 0.7 for coassemblies
  • Add 0.05 to default Leiden resolutions for larger clusters
  • Fix indentation bug in rescue loop causing duplicate merges
  • Update rescue strategy: use global SCG count for contamination and enforce 10% ceiling
  • feat: add short-reads/sr mode to enforce 1000bp min contig length
  • feat: add contamination filter to greedy clustering
  • tune: implement F1-score based bin quality metric inspired by SemiBin2
  • tune: add 0.1 to greedy clustering resolutions
  • tune: score bins by total core genes (unique families) - 5*dups
  • tune: set miniprot query coverage 0.6 identity 0.4, revert global -N/-p
  • fix: use query coverage instead of target coverage for miniprot filtering
  • Revert "Increase contamination penalty from 5 to 7 in bin quality scoring"
  • Revert "Remove Singleton Rescue step"
  • Revert "Increase contamination penalty from 7 to 10"
  • Revert "Add 0.1 to default greedy clustering resolutions"
  • Revert "tune: tighten miniprot thresholds"
  • Revert "chore: remove agents instructions"
  • Revert "tune: lower contamination penalty"
  • Revert "tune: score bins by total core genes"
  • Revert "tune: relax miniprot filters"
  • Revert "tune: restore miniprot outs"
  • tune: restore miniprot outs
  • tune: relax miniprot filters
  • tune: score bins by total core genes
  • tune: lower contamination penalty
  • chore: remove agents instructions
  • tune: tighten miniprot thresholds
  • Add 0.1 to default greedy clustering resolutions
  • Increase contamination penalty from 7 to 10
  • Remove Singleton Rescue step
  • Increase contamination penalty from 5 to 7 in bin quality scoring
  • Add debug logging for singleton rescue
  • Restore exact logging format
  • Revert greedy clustering parallelization
  • Replace joblib with concurrent.futures
  • Parallelize greedy clustering resolution search
  • Reduce rescue duplication tolerance to 3%
  • Implement Singleton Rescue step
  • Remove refinement step and related CLI options
  • Restore missing _leiden_clustering_on_graph function
  • Replace adaptive resolution with greedy Leiden clustering
  • fix: handle duplicate alignment filenames by using parent directory for disambiguation
  • Fix: Remove unexpected 'use_header_cache' argument from check_core_gene_duplications calls
  • feat: Log command line arguments for reproducibility
  • test: add reproduction test for issue verification
  • feat: implement bin rescue strategy
  • chore: update .gitignore
  • test: Add tests for CLI default parameters
  • fix(tests): Resolve existing test suite failures
  • feat: Update coassembly defaults for learning rate and lambda
  • Remove standalone directory
  • Tune coassembly resolution
  • chore: remove AGENTS documentation

Full Changelog: v0.3.4...v0.4.0
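
The scoring rule named in the entry "tune: score bins by total core genes (unique families) - 5*dups" can be illustrated with a short sketch. This is not taken from remag's source: the function name `bin_score`, the input format (one core-gene family ID per hit found in a bin), and the reading of "dups" as every copy beyond the first of a family are all assumptions made for illustration.

```python
from collections import Counter


def bin_score(core_gene_hits, dup_penalty=5):
    """Score a bin as (unique core-gene families) - dup_penalty * (duplicate copies).

    core_gene_hits: iterable of family IDs, one entry per hit in the bin
    (hypothetical input format, assumed for this sketch).
    """
    counts = Counter(core_gene_hits)
    unique_families = len(counts)
    # Assumption: every copy beyond the first of a family counts as one duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return unique_families - dup_penalty * duplicates


# Three families, one of them present twice: 3 - 5 * 1 = -2
print(bin_score(["rpoB", "recA", "recA", "gyrA"]))
```

Under this reading, a single duplicated family outweighs several unique ones, which is the kind of behaviour a contamination penalty is meant to produce.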

v0.3.4

01 Dec 15:38

  • chore: place license-files under project section
  • chore: fix license files config
  • fix: make license field build-compatible
  • Prepare v0.3.4 release
  • Merge branch '2modelapproach-wip'
  • Lower coassembly auto barlow lambda to 0.005
  • Add filter-only CLI mode
  • Skip euk filter in single-cell mode
  • Set coassembly min contig length to 4096
  • Raise min contig length for coassemblies
  • Dial back single-cell k-NN default
  • Increase single-cell k-NN default
  • Fix single-cell resolution to 0.01
  • Lower single-cell resolution sweep to <=0.5
  • Cap single-cell resolution sweep at 1.0
  • Add mode presets and single-cell clustering behavior
  • Seed training for reproducibility
  • Adjust LR and auto barlow defaults
  • Raise miniprot alignment thresholds
  • Loosen completeness early-stop threshold to 50%
  • Drop resolution 3.0 from coassembly sweep
  • Prefer cluster count when duplications tie
  • Drop over-split resolutions and prioritize completeness in selection
  • Add completeness-driven early stop in resolution sweep
  • Refine resolution selection tie-breakers
  • Raise coassembly resolution floor to 0.6
  • Quiet cluster sizes to debug level
  • Adjust resolution sweep for coassemblies
  • Test higher Leiden resolution for coassemblies
  • Limit high-resolution testing to coassemblies
  • Filter singleton bins from clustering summary log
  • Add auto default for Barlow lambda based on sample count
  • Add CLI knob for Barlow Twins lambda
  • Refine bin splitting with Leiden and update defaults
  • Switch refinement to k-means
  • Seed training setup deterministically
  • Simplify duplication checks
  • Seed HyenaDNA predictor randomness
  • Refactor feature calculations
  • Write bins after filtering
  • Vectorize clustering utilities
  • Update CLI defaults
  • Fix adaptive resolution grid
  • Prune unused helpers in utils
  • Vectorize cluster contig mapping
  • Clean HyenaDNA model comments
  • Update plot features import
  • Remove AGENTS.md
  • feat: change default refinement threshold to conservative (min_dup=2)
  • wip: save current 2-model approach experiments
  • refactor: show training progress bar only in verbose mode
  • fix: revert test_resolutions to v0.3.0 values (3 resolutions)
  • revert: restore v0.3.0 fusion layer architecture
  • revert: restore v0.3.0 adaptive refinement strategy
  • fix: restore v0.3.0 coverage encoder layer sizes
  • refactor: revert to duplication minimization for resolution selection
  • feat: auto-adjust batch size when dataset is smaller
  • feat: add LRU cache for k-mer mapping and save-bins-before-refinement option
  • feat: improve reproducibility with seeded random number generation
  • fix: add missing losses.py module
  • perf: reuse cached feature tensor in SequenceDataset
  • refactor: remove redundant eukaryotic classification logging from clustering
  • refactor: remove obsolete reclustering check log message
  • refactor: remove unused _leiden_clustering import
  • fix: use hashlib for deterministic fragment generation
  • refactor: remove unused extract_base_contig_name import
  • refactor: remove unused tqdm import from output module
  • refactor: replace transformers tokenizer with standalone implementation
  • refactor: remove transformers dependency from core requirements
  • feat: add version number to params.json output
  • fix: correct BUSCO gene family parsing and lower quality thresholds
  • perf: add deterministic fragments and k-NN graph disk caching
  • fix: inline gene count extraction to avoid missing import
  • fix: restore batch size default to 2048
  • fix: enable refinement to work without -k flag
  • perf: cache k-NN graph in adaptive resolution + log cleanup
  • fix: remove logic that assigns unembedded contigs to largest cluster
  • perf: add graph caching helper for refinement
  • refactor: count only single-copy genes for completeness metric
  • refactor: round resolution value in Leiden log message
  • refactor: remove redundant resolution log message
  • fix: add --save-filtered-contigs to Filtering & Processing section
  • refactor: simplify CLI by removing parameter range restrictions
  • chore: remove unused dependencies (psutil, biopython, joblib)
  • refactor: remove 3 redundant refinement log messages
  • refactor: merge graph info messages into one
  • refactor: round resolution values in Leiden clustering message
  • refactor: remove 'Adaptive resolution determination' message
  • refactor: remove duplicate message and increase batch size
  • perf: optimize miniprot execution with caching
  • refactor: remove redundant filtered FASTA message
  • refactor: clean up output logging
  • refactor: move fragment augmentation message to debug
  • refactor: simplify miniprot logging message
  • refactor: clean up BAM processing progress display
  • refactor: improve BAM processing logging and progress display
  • feat: improve logging and add save-filtered-contigs option
  • feat: update HyenaDNA classifier to use improved model
  • refactor: improve CLI UX with positional args and optional output
  • fix: add run_exports to bioconda recipe template
  • fix: update bioconda recipe generation to fix test failures
  • fix: resolve deprecation warnings in workflows and package config
  • refactor: move calculate_raw_coverage.py to scripts directory
  • refactor: centralize miniprot thresholds and adjust to 0.30/0.50
  • chore: remove unused code from refinement and utils modules
  • feat: auto-adjust batch size when dataset is smaller
  • chore: add large data files to .gitignore
  • docs: prepare release 0.3.0
  • refactor: clean up unused adaptive strategy parameters
  • refactor: reduce console output verbosity
  • docs: add repository guidelines for development workflow
  • feat: add standalone HyenaDNA predictor
  • feat: add Barlow Twins training diagnostics
  • feat: add adaptive resolution and performance optimizations
  • Fix xgboost import and classification column names
  • Replace XGBoost classifier with HyenaDNA LLM-based model
  • Prepare 0.2.5 release
  • chore: drop unused setup_logging import
  • fix: retain best checkpoint state
  • fix: guard zero-read coverage normalization
  • chore: require python 3.9+
  • Always write embeddings.csv regardless of --keep-intermediate flag
  • Update fusion layer architecture
  • Release v0.2.4
  • Merge branch 'main' of https://github.com/danielzmbp/remag
  • refinement: skip bins without duplication data; remove '(conservative approach)' from warnings
  • Merge branch 'main' of https://github.com/danielzmbp/remag
  • feat: implement conservative refinement strategy to preserve completeness
  • Merge branch 'main' of https://github.com/danielzmbp/remag
  • Fix critical edge cases and test infrastructure issues
  • Merge branch 'main' of https://github.com/danielzmbp/remag
  • Eliminate code duplication in DataFrame column initialization
  • Fix security vulnerability and optimize performance

Full Changelog: v0.3.3...v0.3.4
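
Several entries in this release concern reproducibility, among them "fix: use hashlib for deterministic fragment generation". A common reason for such a change is that Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is set, so seeds derived from it differ between runs. The sketch below shows the general idea only; the function names, the `base_seed` parameter, and the fragment-sampling scheme are hypothetical and not taken from remag.

```python
import hashlib
import random


def stable_seed(contig_name, base_seed=42):
    """Derive a reproducible 32-bit seed from a contig name (hypothetical helper)."""
    digest = hashlib.sha256(f"{base_seed}:{contig_name}".encode("utf-8")).digest()
    # The first four bytes of the digest give an integer in [0, 2**32).
    return int.from_bytes(digest[:4], "big")


def sample_fragment_starts(contig_name, contig_length, fragment_length, n_fragments):
    """Pick fragment start positions that are identical on every run."""
    rng = random.Random(stable_seed(contig_name))
    # Clamp to 0 so contigs shorter than one fragment still yield a valid start.
    max_start = max(contig_length - fragment_length, 0)
    return [rng.randint(0, max_start) for _ in range(n_fragments)]
```

Because the seed depends only on the contig name and `base_seed`, two runs over the same input sample the same fragments, independent of process or hash randomization.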

v0.3.3

04 Nov 13:06

v0.3.2

30 Oct 22:47

  • chore: bump version to 0.3.2
  • fix: suppress pandas FutureWarning for fillna downcasting
  • refactor: remove obsolete reclustering check log message
  • refactor: quality-aware resolution selection and improved refinement
  • fix: use hashlib for deterministic fragment generation
  • tune: adjust learning rate, coverage threshold, and refinement rounds
  • refactor: remove unused tqdm import from output module
  • refactor: replace transformers tokenizer with standalone implementation
  • refactor: remove transformers dependency from core requirements
  • feat: add version number to params.json output
  • fix: correct BUSCO gene family parsing and lower quality thresholds
  • perf: add deterministic fragments, k-NN caching, and simplified logging
  • refactor: improve resolution testing and fix refinement without -k
  • perf: cache k-NN graph in adaptive resolution + improve metrics
  • fix: remove logic that assigns unembedded contigs to largest cluster
  • perf: cache k-NN graph during refinement + extend resolution range + find most conservative solution
  • fix: remove minimum resolution threshold in refinement
  • refactor: simplify refinement with fixed resolution steps
  • refactor: fix k-neighbors and threshold during refinement
  • refactor: count only single-copy genes for completeness metric
  • refactor: round resolution value in Leiden log message
  • refactor: fix clustering parameters during resolution testing
  • refactor: widen auto-resolution testing range
  • refactor: increase refinement resolution steps for faster convergence
  • chore: increase default minimum bin size to 300kb
  • chore: change default base learning rate to 0.001
  • refactor: remove redundant resolution log message
  • refactor: improve auto-resolution metrics and refinement
  • fix: add --save-filtered-contigs to Filtering & Processing section
  • refactor: simplify CLI by removing parameter range restrictions
  • refactor: remove experimental SCG loss functionality
  • chore: remove unused dependencies (psutil, biopython, joblib)
  • refactor: remove 3 redundant refinement log messages
  • refactor: merge graph info messages into one
  • refactor: round resolution values in Leiden clustering message
  • refactor: remove 'Adaptive resolution determination' message
  • refactor: remove duplicate message and increase batch size
  • perf: optimize miniprot execution with caching
  • refactor: move SCG feature matrix message to debug
  • refactor: remove redundant filtered FASTA message
  • refactor: clean up output logging
  • refactor: move fragment augmentation message to debug
  • refactor: move SCG gene mappings message to debug
  • refactor: simplify miniprot logging message
  • refactor: clean up BAM processing progress display
  • refactor: improve BAM processing logging and progress display
  • feat: improve logging and add save-filtered-contigs option
  • feat: update HyenaDNA classifier to use improved model
  • refactor: improve CLI UX with positional args and optional output
  • refactor: move calculate_raw_coverage.py to scripts directory
  • feat: add SCG-aware contrastive learning with consolidated miniprot execution
  • fix: add run_exports to bioconda recipe template

Full Changelog: v0.3.1...v0.3.2

v0.3.1

21 Oct 09:40

  • chore: bump version to 0.3.1
  • fix: update bioconda recipe generation to fix test failures
  • fix: resolve deprecation warnings in workflows and package config

Full Changelog: v0.3.0...v0.3.1

v0.3.0

21 Oct 09:04

v0.2.5

17 Oct 11:42

  • Prepare 0.2.5 release
  • chore: drop unused setup_logging import
  • fix: retain best checkpoint state
  • fix: guard zero-read coverage normalization
  • chore: require python 3.9+
  • Always write embeddings.csv regardless of --keep-intermediate flag
  • Update fusion layer architecture

Full Changelog: v0.2.4...v0.2.5
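
The entry "fix: guard zero-read coverage normalization" describes a typical division-by-zero guard. A minimal sketch follows, assuming coverage is normalized as each contig's share of the sample's total reads; that normalization scheme and the function name `normalize_coverage` are assumptions for illustration, not remag's actual implementation.

```python
def normalize_coverage(read_counts):
    """Return each contig's share of the sample's total reads.

    Assumed scheme: count / total. A sample with zero mapped reads
    yields all zeros instead of raising ZeroDivisionError.
    """
    total = sum(read_counts)
    if total == 0:
        # Guard: no reads mapped in this sample, so there is nothing to normalize.
        return [0.0] * len(read_counts)
    return [count / total for count in read_counts]
```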

v0.2.4

14 Sep 16:14

Full Changelog: v0.2.3...v0.2.4

v0.2.3

20 Aug 09:45

  • Prepare v0.2.3 release - Bug fixes and dependency updates
  • Fix undefined variable error in clustering and clean up imports
  • Update recipe dependencies for bioconda
  • Update README.md for v0.2.2 - Remove obsolete k-means references and update version examples

Full Changelog: v0.2.2...v0.2.3

v0.2.2

14 Aug 09:33

  • Update CHANGELOG.md for v0.2.2 release
  • Prepare v0.2.2 release - Final cleanup and version bump
  • Fix undefined variable error in clustering
  • Refactor codebase for improved maintainability and reduced complexity
  • clean up redundant code and comments
  • update dependencies for bioconda
  • remove small euk db
  • add args parameter to _construct_knn_graph function
  • add --outs=0.95 parameter to miniprot command
  • remove unnecessary noise handling in leiden clustering
  • limit bins.csv to contig and cluster columns
  • fix: remove duplicate v prefix in Zenodo title

Full Changelog: v0.2.1...v0.2.2