Skip to content

QuickLifeSolutions/actor-dedup-datasets

 
 

Repository files navigation

Easy Data Processor: Merge, Clean, and Transform Your Data

This powerful and optimized actor is designed for efficient merging, cleaning, and transformation of large datasets.

Why Use This Actor

  • Speed: Experience blazing-fast data processing with parallelized workloads—up to 20x faster than standard methods.
  • Efficiency: Simultaneously read from multiple datasets, making it ideal for merging data after scraping.
  • Reliability: Actor migration proof with persisted steps, ensuring no repeated work or duplicated data.
  • Memory Management: 'Dedup as loading' mode allows for efficient memory usage, even with huge datasets (10M+ items).
  • Flexibility: Remove duplicates using multiple fields and nested objects/arrays with deep equality checks.
  • Storage Options: Store results in key-value store records.
  • Fast Blank Runs: Quickly identify duplicates without processing data.

Merging

Combine items from multiple datasets into a single dataset or key-value store output. In 'Dedup after load' mode, the order of items retains the order of the provided datasets.

Cleaning (Deduplication)

Specify fields for deduplication to remove duplicate items based on field values. Combine multiple fields for more precise deduplication. Deep comparison is used for objects and arrays.

Transformation

Perform custom data transformations before and after deduplication with preDedupTransformFunction and postDedupTransformFunction. These functions take an array of items and return a modified array.

Access helper variables and the Apify SDK reference within transformation functions. Customize transformations to suit your needs—whether filtering, adding, or modifying items.


Start optimizing your data processing workflow today with the Easy Data Processor and handle large datasets with ease!

Connect With Us

Support


Start enhancing your data processing today with QuickLifeSolutions' Easy Data Processor!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • JavaScript 96.4%
  • Dockerfile 3.6%