Releases: danielzmbp/remag
v0.4.0
- release: bump version to 0.4.0
- style: apply formatting-only updates across modules and tests
- remove automated bioconda PR, generate recipe as artifact only; fix recipe (Python >=3.9, add scipy)
- clean up duplicate imports, stale comments, and misplaced import re
- remove unused imports and stale comments in hyenadna files
- remove dead variables and unused import in clustering.py
- fix: --filter-only flag was silently ignored (missing from args Namespace)
- remove dead _leiden_clustering_on_graph() function
- remove dead PathManager class from utils.py
- docs: update README to match current pipeline and fix cli OPTION_GROUPS ghost reference
- remove dead greedy_min_score parameter (F1 scores are always >= 0)
- Revert "fix: tighten greedy clustering contamination cap to 5%"
- fix: tighten greedy clustering contamination cap to 5%
- fix: block rescue merges above 10% duplication
- feat: expose rescue duplication limits in CLI and enforce max total duplication
- Revert "refactor: update encoder architecture to be dynamic and balanced"
- refactor: update encoder architecture to be dynamic and balanced
- fix: rate limit unconditional fusion debug logs
- fix: rate limit debug logging for gating weights
- fix: switch debug prints to logger.debug
- feat: add debug prints for gating weights
- Revert to state after 'feat: add short-reads/sr mode' (77f1ea5)
- Update rescue threshold: 0.9 for single sample, 0.7 for coassemblies
- Add 0.05 to default Leiden resolutions for larger clusters
- Fix indentation bug in rescue loop causing duplicate merges
- Update rescue strategy: use global SCG count for contamination and enforce 10% ceiling
- feat: add short-reads/sr mode to enforce 1000bp min contig length
- feat: add contamination filter to greedy clustering
- tune: implement F1-score based bin quality metric inspired by SemiBin2
- tune: add 0.1 to greedy clustering resolutions
- tune: score bins by total core genes (unique families) - 5*dups
- tune: set miniprot query coverage 0.6 identity 0.4, revert global -N/-p
- fix: use query coverage instead of target coverage for miniprot filtering
- Revert "Increase contamination penalty from 5 to 7 in bin quality scoring"
- Revert "Remove Singleton Rescue step"
- Revert "Increase contamination penalty from 7 to 10"
- Revert "Add 0.1 to default greedy clustering resolutions"
- Revert "tune: tighten miniprot thresholds"
- Revert "chore: remove agents instructions"
- Revert "tune: lower contamination penalty"
- Revert "tune: score bins by total core genes"
- Revert "tune: relax miniprot filters"
- Revert "tune: restore miniprot outs"
- tune: restore miniprot outs
- tune: relax miniprot filters
- tune: score bins by total core genes
- tune: lower contamination penalty
- chore: remove agents instructions
- tune: tighten miniprot thresholds
- Add 0.1 to default greedy clustering resolutions
- Increase contamination penalty from 7 to 10
- Remove Singleton Rescue step
- Increase contamination penalty from 5 to 7 in bin quality scoring
- Add debug logging for singleton rescue
- Restore exact logging format
- Revert greedy clustering parallelization
- Replace joblib with concurrent.futures
- Parallelize greedy clustering resolution search
- Reduce rescue duplication tolerance to 3%
- Implement Singleton Rescue step
- Remove refinement step and related CLI options
- Restore missing _leiden_clustering_on_graph function
- Replace adaptive resolution with greedy Leiden clustering
- fix: handle duplicate alignment filenames by using parent directory for disambiguation
- Fix: Remove unexpected 'use_header_cache' argument from check_core_gene_duplications calls
- feat: Log command line arguments for reproducibility
- test: add reproduction test for issue verification
- feat: implement bin rescue strategy
- chore: update .gitignore
- test: Add tests for CLI default parameters
- fix(tests): Resolve existing test suite failures
- feat: Update coassembly defaults for learning rate and lambda
- Remove standalone directory
- Tune coassembly resolution
- chore: remove AGENTS documentation
Full Changelog: v0.3.4...v0.4.0
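
The v0.4.0 notes above replace a linear bin score ("total core genes (unique families) - 5*dups") with an F1-score based metric, and later drop `greedy_min_score` because "F1 scores are always >= 0". The sketch below is one possible reading of that change, not the project's actual code. The function names, the argument names, and the choice of recall and precision proxies are all assumptions made for illustration.

```python
def linear_bin_score(unique_families, duplicated_hits, penalty=5):
    """Linear score in the spirit of 'unique families - 5*dups' (hypothetical helper)."""
    return unique_families - penalty * duplicated_hits


def f1_bin_score(unique_families, duplicated_hits, total_families):
    """F1-style bin score (hypothetical helper; the proxies below are assumptions).

    recall    ~ completeness: fraction of the marker families found in the bin
    precision ~ purity: fraction of marker hits that are not duplicates
    """
    if unique_families <= 0 or total_families <= 0:
        return 0.0
    recall = unique_families / total_families
    precision = unique_families / (unique_families + duplicated_hits)
    # Harmonic mean of two values in (0, 1], so the result is never negative.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A bin with 80 unique families and 20 duplicated hits out of 100 families.
    print(linear_bin_score(80, 20))
    print(f1_bin_score(80, 20, 100))
```

Under these assumptions the linear score can go negative for a heavily duplicated bin, while the F1-style score stays within [0, 1], which is consistent with a minimum-score cutoff becoming unnecessary.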
v0.3.4
- chore: place license-files under project section
- chore: fix license files config
- fix: make license field build-compatible
- Prepare v0.3.4 release
- Merge branch '2modelapproach-wip'
- Lower coassembly auto barlow lambda to 0.005
- Add filter-only CLI mode
- Skip euk filter in single-cell mode
- Set coassembly min contig length to 4096
- Raise min contig length for coassemblies
- Dial back single-cell k-NN default
- Increase single-cell k-NN default
- Fix single-cell resolution to 0.01
- Lower single-cell resolution sweep to <=0.5
- Cap single-cell resolution sweep at 1.0
- Add mode presets and single-cell clustering behavior
- Seed training for reproducibility
- Adjust LR and auto barlow defaults
- Raise miniprot alignment thresholds
- Loosen completeness early-stop threshold to 50%
- Drop resolution 3.0 from coassembly sweep
- Prefer cluster count when duplications tie
- Drop over-split resolutions and prioritize completeness in selection
- Add completeness-driven early stop in resolution sweep
- Refine resolution selection tie-breakers
- Raise coassembly resolution floor to 0.6
- Quiet cluster sizes to debug level
- Adjust resolution sweep for coassemblies
- Test higher Leiden resolution for coassemblies
- Limit high-resolution testing to coassemblies
- Filter singleton bins from clustering summary log
- Add auto default for Barlow lambda based on sample count
- Add CLI knob for Barlow Twins lambda
- Refine bin splitting with Leiden and update defaults
- Switch refinement to k-means
- Seed training setup deterministically
- Simplify duplication checks
- Seed HyenaDNA predictor randomness
- Refactor feature calculations
- Write bins after filtering
- Vectorize clustering utilities
- Update CLI defaults
- Fix adaptive resolution grid
- Prune unused helpers in utils
- Vectorize cluster contig mapping
- Clean HyenaDNA model comments
- Update plot features import
- Remove AGENTS.md
- feat: change default refinement threshold to conservative (min_dup=2)
- wip: save current 2-model approach experiments
- refactor: show training progress bar only in verbose mode
- fix: revert test_resolutions to v0.3.0 values (3 resolutions)
- revert: restore v0.3.0 fusion layer architecture
- revert: restore v0.3.0 adaptive refinement strategy
- fix: restore v0.3.0 coverage encoder layer sizes
- refactor: revert to duplication minimization for resolution selection
- feat: auto-adjust batch size when dataset is smaller
- feat: add LRU cache for k-mer mapping and save-bins-before-refinement option
- feat: improve reproducibility with seeded random number generation
- fix: add missing losses.py module
- perf: reuse cached feature tensor in SequenceDataset
- refactor: remove redundant eukaryotic classification logging from clustering
- refactor: remove obsolete reclustering check log message
- refactor: remove unused _leiden_clustering import
- fix: use hashlib for deterministic fragment generation
- refactor: remove unused extract_base_contig_name import
- refactor: remove unused tqdm import from output module
- refactor: replace transformers tokenizer with standalone implementation
- refactor: remove transformers dependency from core requirements
- feat: add version number to params.json output
- fix: correct BUSCO gene family parsing and lower quality thresholds
- perf: add deterministic fragments and k-NN graph disk caching
- fix: inline gene count extraction to avoid missing import
- fix: restore batch size default to 2048
- fix: enable refinement to work without -k flag
- perf: cache k-NN graph in adaptive resolution + log cleanup
- fix: remove logic that assigns unembedded contigs to largest cluster
- perf: add graph caching helper for refinement
- refactor: count only single-copy genes for completeness metric
- refactor: round resolution value in Leiden log message
- refactor: remove redundant resolution log message
- fix: add --save-filtered-contigs to Filtering & Processing section
- refactor: simplify CLI by removing parameter range restrictions
- chore: remove unused dependencies (psutil, biopython, joblib)
- refactor: remove 3 redundant refinement log messages
- refactor: merge graph info messages into one
- refactor: round resolution values in Leiden clustering message
- refactor: remove 'Adaptive resolution determination' message
- refactor: remove duplicate message and increase batch size
- perf: optimize miniprot execution with caching
- refactor: remove redundant filtered FASTA message
- refactor: clean up output logging
- refactor: move fragment augmentation message to debug
- refactor: simplify miniprot logging message
- refactor: clean up BAM processing progress display
- refactor: improve BAM processing logging and progress display
- feat: improve logging and add save-filtered-contigs option
- feat: update HyenaDNA classifier to use improved model
- refactor: improve CLI UX with positional args and optional output
- fix: add run_exports to bioconda recipe template
- fix: update bioconda recipe generation to fix test failures
- fix: resolve deprecation warnings in workflows and package config
- refactor: move calculate_raw_coverage.py to scripts directory
- refactor: centralize miniprot thresholds and adjust to 0.30/0.50
- chore: remove unused code from refinement and utils modules
- feat: auto-adjust batch size when dataset is smaller
- chore: add large data files to .gitignore
- docs: prepare release 0.3.0
- refactor: clean up unused adaptive strategy parameters
- refactor: reduce console output verbosity
- docs: add repository guidelines for development workflow
- feat: add standalone HyenaDNA predictor
- feat: add Barlow Twins training diagnostics
- feat: add adaptive resolution and performance optimizations
- Fix xgboost import and classification column names
- Replace XGBoost classifier with HyenaDNA LLM-based model
- Prepare 0.2.5 release
- chore: drop unused setup_logging import
- fix: retain best checkpoint state
- fix: guard zero-read coverage normalization
- chore: require python 3.9+
- Always write embeddings.csv regardless of --keep-intermediate flag
- Update fusion layer architecture
- Release v0.2.4
- Merge branch 'main' of https://github.com/danielzmbp/remag
- refinement: skip bins without duplication data; remove '(conservative approach)' from warnings
- Merge branch 'main' of https://github.com/danielzmbp/remag
- feat: implement conservative refinement strategy to preserve completeness
- Merge branch 'main' of https://github.com/danielzmbp/remag
- Fix critical edge cases and test infrastructure issues
- Merge branch 'main' of https://github.com/danielzmbp/remag
- Eliminate code duplication in DataFrame column initialization
- Fix security vulnerability and optimize performance
Full Changelog: v0.3.3...v0.3.4
v0.3.3
Full Changelog: https://github.com/danielzmbp/remag/commits/v0.3.3
v0.3.2
- chore: bump version to 0.3.2
- fix: suppress pandas FutureWarning for fillna downcasting
- refactor: remove obsolete reclustering check log message
- refactor: quality-aware resolution selection and improved refinement
- fix: use hashlib for deterministic fragment generation
- tune: adjust learning rate, coverage threshold, and refinement rounds
- refactor: remove unused tqdm import from output module
- refactor: replace transformers tokenizer with standalone implementation
- refactor: remove transformers dependency from core requirements
- feat: add version number to params.json output
- fix: correct BUSCO gene family parsing and lower quality thresholds
- perf: add deterministic fragments, k-NN caching, and simplified logging
- refactor: improve resolution testing and fix refinement without -k
- perf: cache k-NN graph in adaptive resolution + improve metrics
- fix: remove logic that assigns unembedded contigs to largest cluster
- perf: cache k-NN graph during refinement + extend resolution range + find most conservative solution
- fix: remove minimum resolution threshold in refinement
- refactor: simplify refinement with fixed resolution steps
- refactor: fix k-neighbors and threshold during refinement
- refactor: count only single-copy genes for completeness metric
- refactor: round resolution value in Leiden log message
- refactor: fix clustering parameters during resolution testing
- refactor: widen auto-resolution testing range
- refactor: increase refinement resolution steps for faster convergence
- chore: increase default minimum bin size to 300kb
- chore: change default base learning rate to 0.001
- refactor: remove redundant resolution log message
- refactor: improve auto-resolution metrics and refinement
- fix: add --save-filtered-contigs to Filtering & Processing section
- refactor: simplify CLI by removing parameter range restrictions
- refactor: remove experimental SCG loss functionality
- chore: remove unused dependencies (psutil, biopython, joblib)
- refactor: remove 3 redundant refinement log messages
- refactor: merge graph info messages into one
- refactor: round resolution values in Leiden clustering message
- refactor: remove 'Adaptive resolution determination' message
- refactor: remove duplicate message and increase batch size
- perf: optimize miniprot execution with caching
- refactor: move SCG feature matrix message to debug
- refactor: remove redundant filtered FASTA message
- refactor: clean up output logging
- refactor: move fragment augmentation message to debug
- refactor: move SCG gene mappings message to debug
- refactor: simplify miniprot logging message
- refactor: clean up BAM processing progress display
- refactor: improve BAM processing logging and progress display
- feat: improve logging and add save-filtered-contigs option
- feat: update HyenaDNA classifier to use improved model
- refactor: improve CLI UX with positional args and optional output
- refactor: move calculate_raw_coverage.py to scripts directory
- feat: add SCG-aware contrastive learning with consolidated miniprot execution
- fix: add run_exports to bioconda recipe template
Full Changelog: v0.3.1...v0.3.2
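
One entry above reads "fix: use hashlib for deterministic fragment generation". Python's built-in `hash()` is salted per process for strings unless `PYTHONHASHSEED` is fixed, so seeds derived from it change between runs. The sketch below shows the general technique of deriving a stable per-contig seed from a `hashlib` digest. The function names, the `base_seed` parameter, and the sampling scheme are assumptions for illustration, not the project's implementation.

```python
import hashlib
import random


def contig_seed(contig_name, base_seed=42):
    """Derive a stable integer seed from a contig name (hypothetical helper)."""
    digest = hashlib.sha256(f"{base_seed}:{contig_name}".encode("utf-8")).digest()
    # The first 8 bytes of the digest are enough for a seed.
    return int.from_bytes(digest[:8], "big")


def fragment_starts(contig_name, contig_length, fragment_length, n_fragments):
    """Sample fragment start positions that repeat exactly across runs."""
    rng = random.Random(contig_seed(contig_name))
    max_start = max(contig_length - fragment_length, 0)
    return [rng.randint(0, max_start) for _ in range(n_fragments)]


if __name__ == "__main__":
    print(fragment_starts("contig_1", 10_000, 1_000, 4))
```

Because the seed depends only on the contig name and `base_seed`, re-running the script in a fresh interpreter yields the same start positions.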
v0.3.1
- chore: bump version to 0.3.1
- fix: update bioconda recipe generation to fix test failures
- fix: resolve deprecation warnings in workflows and package config
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Full Changelog: https://github.com/danielzmbp/remag/commits/v0.3.0
v0.2.5
- Prepare 0.2.5 release
- chore: drop unused setup_logging import
- fix: retain best checkpoint state
- fix: guard zero-read coverage normalization
- chore: require python 3.9+
- Always write embeddings.csv regardless of --keep-intermediate flag
- Update fusion layer architecture
Full Changelog: v0.2.4...v0.2.5
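
The v0.2.5 entry "fix: guard zero-read coverage normalization" points at a common failure mode: dividing per-sample coverage by a total that is zero. A minimal sketch of such a guard follows. The function name and the choice to return all zeros for a contig with no reads are assumptions for this example, since the release notes do not describe the actual behavior.

```python
def normalize_coverage(read_counts):
    """Scale per-sample coverage so it sums to 1 (hypothetical helper).

    A contig with no mapped reads is returned as all zeros instead of
    raising ZeroDivisionError or producing NaN values.
    """
    total = sum(read_counts)
    if total <= 0:
        return [0.0 for _ in read_counts]
    return [count / total for count in read_counts]


if __name__ == "__main__":
    print(normalize_coverage([2, 2, 4]))
    print(normalize_coverage([0, 0, 0]))
```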
v0.2.4
- Release v0.2.4
- Merge branch 'main' of https://github.com/danielzmbp/remag
- refinement: skip bins without duplication data; remove '(conservative approach)' from warnings
- Merge branch 'main' of https://github.com/danielzmbp/remag
- feat: implement conservative refinement strategy to preserve completeness
- Merge branch 'main' of https://github.com/danielzmbp/remag
- Fix critical edge cases and test infrastructure issues
- Merge branch 'main' of https://github.com/danielzmbp/remag
- Eliminate code duplication in DataFrame column initialization
- Fix security vulnerability and optimize performance
- remove comment
- README: remove explicit conda install of miniprot; clarify it’s an automatic dependency
Full Changelog: v0.2.3...v0.2.4
v0.2.3
- Prepare v0.2.3 release - Bug fixes and dependency updates
- Fix undefined variable error in clustering and clean up imports
- Update recipe dependencies for bioconda
- Update README.md for v0.2.2 - Remove obsolete k-means references and update version examples
Full Changelog: v0.2.2...v0.2.3
v0.2.2
- Update CHANGELOG.md for v0.2.2 release
- Prepare v0.2.2 release - Final cleanup and version bump
- Fix undefined variable error in clustering
- Refactor codebase for improved maintainability and reduced complexity
- clean up redundant code and comments
- update dependencies for bioconda
- remove small euk db
- add args parameter to _construct_knn_graph function
- add --outs=0.95 parameter to miniprot command
- remove unnecessary noise handling in leiden clustering
- limit bins.csv to contig and cluster columns
- fix: remove duplicate v prefix in Zenodo title
Full Changelog: v0.2.1...v0.2.2