fix: implement pagination in scan command to retrieve all records #279

Open
farzadmf wants to merge 2 commits into awslabs:main from farzadmf:main

@farzadmf farzadmf commented Oct 18, 2025

Description of changes:

  • Add pagination loop to scan function to handle DynamoDB's 1MB response limit
  • Collect items across multiple scan operations until requested limit is reached
  • Maintain backward compatibility with existing output formats
  • Bump version to 0.3.1

Issue #, if available:

Closes #278

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
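The pagination loop described in the bullet points above can be sketched roughly as follows. This is not the PR's actual diff: `Page`, `scan_page`, and `scan_up_to` are hypothetical names, and an in-memory slice with a `page_size` parameter stands in for a DynamoDB table and its 1MB response cap.

```rust
/// One page of a simulated scan: the items returned plus the key to resume from.
struct Page {
    items: Vec<u32>,
    last_evaluated_key: Option<usize>,
}

/// Hypothetical stand-in for a single Scan call. `page_size` plays the role of
/// the 1MB response cap; `limit` caps how many items this call may return.
fn scan_page(table: &[u32], start: Option<usize>, limit: usize, page_size: usize) -> Page {
    let begin = start.unwrap_or(0);
    let end = (begin + limit.min(page_size)).min(table.len());
    Page {
        items: table[begin..end].to_vec(),
        // No resume key once the simulated table is exhausted.
        last_evaluated_key: if end < table.len() { Some(end) } else { None },
    }
}

/// Keep requesting pages until `limit` items are collected or the table ends.
fn scan_up_to(table: &[u32], limit: usize, page_size: usize) -> Vec<u32> {
    assert!(page_size > 0, "a page must be able to hold at least one item");
    let mut collected = Vec::new();
    let mut start = None;
    while collected.len() < limit {
        // Ask only for what is still missing so the total never exceeds `limit`.
        let remaining = limit - collected.len();
        let page = scan_page(table, start, remaining, page_size);
        collected.extend(page.items);
        match page.last_evaluated_key {
            Some(key) => start = Some(key),
            None => break,
        }
    }
    collected
}
```

With a ten-item table and `page_size` of 3, `scan_up_to(&table, 7, 3)` makes three calls and returns the first seven items, while a single call would have returned only three.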

@farzadmf farzadmf requested a review from a team as a code owner October 18, 2025 20:14
@StoneDot (Contributor) commented:

Thank you for the PR! I appreciate you taking the time to address this issue.

While I agree that there are valid use cases for paginating through scan results, I have some concerns about the current approach:

  1. The --limit option should remain a limit: The --limit parameter is meant to be a constraint, and I don't think we should derive pagination behavior from it implicitly.
  2. Filter expressions could cause issues: While dy scan doesn't currently support filter expressions, if we add them in the future, this approach could lead to confusing situations where users get very few or no results despite there being more items to scan.
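One way to read the second concern: in DynamoDB, a scan's limit counts the items evaluated, and a filter expression is applied afterwards, so a call can match far fewer items than the limit even though matching items exist further along in the table. The sketch below illustrates this with an in-memory slice; `scan_with_filter` is a hypothetical helper, not part of dy.

```rust
/// Hypothetical single scan call: evaluate at most `limit` items first,
/// then drop the ones that fail the filter.
fn scan_with_filter(table: &[u32], limit: usize, keep: impl Fn(&u32) -> bool) -> Vec<u32> {
    table
        .iter()
        .take(limit) // the limit is applied before filtering
        .copied()
        .filter(|item| keep(item))
        .collect()
}
```

For a table `[1, 3, 5, 7, 2, 4, 6, 8]` and a filter that keeps even numbers, a limit of 4 returns nothing, although four matching items sit just past the evaluated range.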

Instead, I'm thinking a more explicit design might work better:

  • Add a dedicated --paginate (or similar) flag that explicitly enables pagination
  • When there are still items to retrieve and --paginate is not specified, the command could display a message suggesting that the user pass --paginate to retrieve all remaining items
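A minimal sketch of how the proposed behavior might look, assuming a boolean `paginate` option. The `--paginate` flag does not exist in dy at the time of this conversation, and `ScanOutcome`, `scan`, and the hint wording are all hypothetical; an in-memory slice split into fixed-size chunks stands in for paged DynamoDB responses.

```rust
/// Result of a scan: the items plus an optional hint to show the user.
struct ScanOutcome {
    items: Vec<u32>,
    hint: Option<String>,
}

/// Return only the first page unless `paginate` is set; when items remain
/// and `paginate` is off, attach a hint pointing at the flag.
fn scan(table: &[u32], page_size: usize, paginate: bool) -> ScanOutcome {
    assert!(page_size > 0, "a page must be able to hold at least one item");
    let mut items = Vec::new();
    let mut pages = table.chunks(page_size);

    // The first page is always returned.
    if let Some(first) = pages.next() {
        items.extend_from_slice(first);
    }

    if paginate {
        // Explicit opt-in: follow every remaining page.
        for page in pages {
            items.extend_from_slice(page);
        }
        return ScanOutcome { items, hint: None };
    }

    let hint = if items.len() < table.len() {
        Some("More items are available; rerun with --paginate to retrieve them.".to_string())
    } else {
        None
    };
    ScanOutcome { items, hint }
}
```

The design keeps the default call cheap (one page) and makes the extra read cost an explicit choice, which is the separation the comment above argues for.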

That said, the dy export command can already retrieve all items from a table if that fits the use case.

What do you think about this approach?

@farzadmf (Author) commented:

Thank you @StoneDot for the reply. I honestly didn't think much before opening the PR; it was a quick and dirty "hack" to make scan match what is probably just my own intuition: that it should return everything.

The --paginate approach is a good suggestion; however, I think that if we leave it to the user to:

  • Specify --paginate
  • See what dy scan returns
  • Decide whether or not they need/want to call again with/without --paginate

it will add a bit of friction to using dy. Given how simple dy is, I think it aims for the opposite: removing friction.

That being said, in my mind at least, it's like "I want this number of results" (in other words, I want to limit my results to that number), so I think having to go through pages to return that number of results is an "internal" detail.

So, maybe the command descriptions etc. could explain something like "if you specify a high limit, you may consume extra capacity units and pay more, so make sure you know what you're doing". Still, I think limit should act as its name suggests: the limit on the number of records returned.

@StoneDot (Contributor) commented:

I've added a comment in #278 explaining my thinking about the separation between dy scan and dy export.

After reading that, has your perspective on the --paginate approach changed at all? I'd like to hear your updated thoughts.

Development

Successfully merging this pull request may close these issues.

dy scan doesn't return all the rows from the table