Skip to content

Conversation

@geruh
Copy link
Contributor

@geruh geruh commented Jan 25, 2026

closes #2945

Rationale for this change

This PR removes the SortedContainers dependency. Looking at the behavior of sorted containers we can simplify the logic for merging manigests and collecting the results while maintaining identical behavior.

What the logic today was doing:

  1. Submit all manifest merge tasks to thread pool (pallelism starts with executor)
  2. Collect futures as they complete using as_completed() which is out of order
  3. Store completed futures in a SortedList to maintain order by submission
  4. Extract all results from the sorted futures
  5. Flatten and return

What we do now:

  1. Submit all manifest merge tasks to thread pool (pallelism starts with executor)
  2. Iterate through futures in submission order, calling .result() on each
  3. Flatten and return

This shows we must collect the results before the next step. So we can iterate futures directly and call .result() in order. This blocks the main thread until each future completes, but doesn't block worker threads and they all continue running in parallel.

Are these changes tested?

All existing tests pass.

Are there any user-facing changes?

No

completed_futures.add(future)

bin_results: list[list[ManifestFile]] = [f.result() for f in completed_futures if f.result()]
bin_results: list[list[ManifestFile]] = [r for f in futures if (r := f.result())]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use a walrus operator to avoid calling .result() twice per future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

completed_futures.add(future)

bin_results: list[list[ManifestFile]] = [f.result() for f in completed_futures if f.result()]
bin_results: list[list[ManifestFile]] = [r for f in futures if (r := f.result())]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@kevinjqliu kevinjqliu merged commit a26fca7 into apache:main Jan 25, 2026
11 checks passed
@geruh geruh deleted the sorted branch January 25, 2026 18:19
@Fokko
Copy link
Contributor

Fokko commented Jan 25, 2026

Sweet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

remove sortedcontainers

3 participants