Skip to content

Conversation

@rlratzel
Copy link
Contributor

Adds image curation benchmark to nightly run. This uses the image curation "getting started" tutorial.

rlratzel and others added 14 commits December 12, 2025 21:46
…images with :latest by default, adds session name to slack report.

Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
…a_updates

Signed-off-by: rlratzel <rratzel@nvidia.com>
…atzel/curator into 2602_benchmark_infra_updates

Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
…g script to allow for more flexibility.

Signed-off-by: rlratzel <rratzel@nvidia.com>
…n-readable output is needed, updates paths to benchmark output dir.

Signed-off-by: rlratzel <rratzel@nvidia.com>
…sults

Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 21, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: rlratzel <rratzel@nvidia.com>
…laceholders were silently ignored, comment cleanup.

Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
@rlratzel rlratzel marked this pull request as ready for review January 7, 2026 17:44
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 7, 2026

Greptile Summary

Added image curation benchmark to nightly runs using the getting-started tutorial script, along with required MSCOCO datasets configuration.

Key Changes:

  • Added two new dataset entries (mscoco and mscoco_model_weights) to support image curation benchmarking
  • Added image_curation benchmark entry with comprehensive configuration for image processing pipeline
  • Refactored placeholder substitution logic in entry.py to support the new {curator_repo_dir} reserved placeholder, enabling benchmarks to reference scripts outside the benchmarking/scripts directory
  • Improved separation of concerns by splitting placeholder substitution into substitute_reserved_placeholders() (for {session_entry_dir}, {curator_repo_dir}, {dataset:...}) and substitute_container_or_host_paths() (for {datasets_path}, {results_path})
  • Enhanced validation by adding allow_other_placeholders parameter to catch configuration errors early

Implementation Quality:

  • The refactoring properly handles the two-stage placeholder resolution: reserved placeholders are resolved first (which may introduce path placeholders), then path placeholders are resolved
  • The script path resolution correctly handles absolute paths returned from {curator_repo_dir} substitution using Python's Path / operator behavior
  • Error handling is improved with stricter validation when allow_other_placeholders=False

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-structured and logically sound. The refactoring improves code organization by separating placeholder substitution concerns, and the new {curator_repo_dir} placeholder enables a legitimate use case (referencing tutorial scripts). The two-stage substitution correctly handles nested placeholders. No logical errors, security vulnerabilities, or breaking changes were identified.
  • No files require special attention

Important Files Changed

Filename Overview
benchmarking/nightly-benchmark.yaml Added mscoco datasets and image_curation benchmark entry with proper configuration
benchmarking/runner/entry.py Refactored placeholder substitution logic, added curator_repo_dir support, improved code organization
benchmarking/runner/session.py Minor comment clarification, no functional changes

Sequence Diagram

sequenceDiagram
    participant Config as nightly-benchmark.yaml
    participant Session as Session.create_from_dict()
    participant Entry as Entry.get_command_to_run()
    participant Reserved as substitute_reserved_placeholders()
    participant Paths as substitute_container_or_host_paths()
    participant DatasetResolver
    participant PathResolver

    Config->>Session: Load config with image_curation entry
    Session->>Entry: Create Entry with script="{curator_repo_dir}/..."
    
    Note over Entry: Process script field
    Entry->>Reserved: substitute_reserved_placeholders(script)
    Reserved->>Reserved: Replace {curator_repo_dir} with absolute path
    Reserved-->>Entry: /github/nvidia-nemo/curator/tutorials/.../script.py
    
    Entry->>Paths: substitute_container_or_host_paths(script)
    Paths-->>Entry: script unchanged (no path placeholders)
    
    Entry->>Entry: script_path = base_path / script<br/>(returns script since absolute)
    
    Note over Entry: Construct full command
    Entry->>Entry: cmd = f"python {script_path} {args}"
    
    Note over Entry: Process full command
    Entry->>Reserved: substitute_reserved_placeholders(cmd)
    Reserved->>DatasetResolver: resolve("mscoco", "wds")
    DatasetResolver-->>Reserved: "{datasets_path}/mscoco/wds/..."
    Reserved->>Reserved: Replace {session_entry_dir}
    Reserved-->>Entry: cmd with {datasets_path} still present
    
    Entry->>Paths: substitute_container_or_host_paths(cmd)
    Paths->>PathResolver: resolve("datasets_path")
    PathResolver-->>Paths: /path/to/datasets
    Paths-->>Entry: Fully resolved command
    
    Entry-->>Session: Return executable command
Loading

Signed-off-by: rlratzel <rratzel@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants