
Conversation

@wwwjn (Contributor) commented on Jan 2, 2026

Stack from ghstack (oldest at bottom):

  1. Add job_config.py to extend the current JobConfig. One open issue: the trainer's and generator's configs are not symmetric, e.g. Parallelism vs. Generation.parallelism.
  2. Use the job config system as the centralized, source-of-truth config, loading it from the run_configs/qwen3_0.6b.toml file.
  3. Refactor the generator to use EngineArgs() and LLMEngine() instead of LLM() (see the sketch after this list).
  4. Rename simple_rl_multiprocess -> simple_grpo to be more descriptive
  5. Clean up unused code branch
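
For reference, a minimal sketch of the EngineArgs()/LLMEngine() flow from item 3; the model name, request, and sampling values below are placeholders rather than this PR's actual settings.

```python
# Sketch only: drive vLLM through EngineArgs/LLMEngine instead of the LLM() wrapper.
from vllm import EngineArgs, LLMEngine, SamplingParams

engine_args = EngineArgs(model="Qwen/Qwen3-0.6B", tensor_parallel_size=1)
engine = LLMEngine.from_engine_args(engine_args)

sampling_params = SamplingParams(temperature=1.0, max_tokens=64, n=4)
engine.add_request("req-0", "What is 2 + 2?", sampling_params)

# Step the engine manually until every request finishes.
finished = []
while engine.has_unfinished_requests():
    for output in engine.step():
        if output.finished:
            finished.append(output)

for output in finished:
    for completion in output.outputs:
        print(completion.text)
```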

Test: (trainer DDP = 2, n_generator = 1)
[Screenshot: test run, 2025-12-30 7:34 PM]

Follow-up refactors:

  • Refactor 2: vLLM model registration, using setup.py and a plugin instead of an import
  • Refactor 3: weight updater, directly passing the state_dict (DTensor) between trainer and generator (see the sketch after this list)
  • Refactor 4: use the torchtitan Trainer and modularize each component
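
A rough sketch of the Refactor 3 direction, assuming the trainer hands its sharded state_dict to the generator process; the helper name and plumbing are hypothetical, while DTensor.full_tensor() and vLLM's load_weights() convention are existing APIs.

```python
# Hypothetical weight updater for Refactor 3; sync_weights_to_generator and its
# calling convention are assumptions, not an API defined in this repo.
from torch.distributed.tensor import DTensor

def sync_weights_to_generator(trainer_state_dict, generator_model):
    """Push the trainer's weights into the generator's (vLLM) model."""
    weights = []
    for name, param in trainer_state_dict.items():
        # Gather DTensor shards into a plain tensor before handing them over;
        # a later version could pass the shards directly and skip the all-gather.
        tensor = param.full_tensor() if isinstance(param, DTensor) else param
        weights.append((name, tensor.detach().cpu()))
    # vLLM model classes accept an iterable of (name, tensor) pairs.
    generator_model.load_weights(weights)
```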

@meta-cla bot added the CLA Signed label on Jan 2, 2026
@wwwjn closed this on Jan 2, 2026
@wwwjn reopened this on Jan 2, 2026
@wwwjn changed the title from "config sys v1" to "[rl] Using JobConfig as the centralized config system for inference and simple GRPO" on Jan 2, 2026
@allenwang28 (Contributor) left a comment:

I like this direction, thanks! Mostly nits here

```
Right now we only support VLLM_COMPAT mode, which could achieve trainer and generator bitwise identical. We are working on support UNIFIED mode, which uses a unified model definition for trainer and generator.
We uses a unified model definition for trainer and generator, which could achieve trainer and generator bitwise identical.
```

Suggested change
```diff
- We uses a unified model definition for trainer and generator, which could achieve trainer and generator bitwise identical.
+ We use a unified model definition for the trainer and generator, ensuring bitwise-identical models to address a class of subtle correctness bugs in RL for LLMs.
```

nit

```python
    math_reward_function if use_real_dataset else trivial_reward_function
)
# Reward function. TODO: Use a real reward function
self.reward_fn = trivial_reward_function
```

I guess the RL job definition would need to define a callable here, is this the idea for later?
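
If useful, one possible shape for that callable; the (prompts, completions) -> rewards signature and the dotted-path loading idea are assumptions about a later interface, not something this PR defines.

```python
# Hypothetical reward callable; the signature is an assumption, not this PR's API.
from typing import List

def trivial_reward_function(prompts: List[str], completions: List[str]) -> List[float]:
    # Placeholder: reward 1.0 for a non-empty completion, 0.0 otherwise.
    return [1.0 if completion.strip() else 0.0 for completion in completions]

# A job config could then reference a reward by dotted path, e.g.
# reward_fn = "my_rewards.math_reward_function", resolved at startup and called
# exactly like trivial_reward_function above.
```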

```python
sampling_params = SamplingParams(
    temperature=temperature,
    max_tokens=max_new_tokens,
    n=n_samples_per_prompt,
```

n_samples_per_prompt fulfills the same purpose as group_size, I assume. I see below that we're preferring to submit a prompt multiple times instead of relying on vLLM. Is this due to batch invariance, or something else? I'd assume that letting vLLM do it is better performance-wise.
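
For concreteness, a sketch of the two options being compared; the SamplingParams fields are real vLLM arguments, while llm, prompts, and G are illustrative.

```python
# Two ways to collect G samples per prompt (G = group size / n_samples_per_prompt).
from vllm import SamplingParams

G = 8

# Option A: let vLLM fan one prompt out into G samples.
params_n = SamplingParams(temperature=1.0, max_tokens=128, n=G)
# outputs = llm.generate(prompts, params_n)

# Option B: submit each prompt G times with n=1, as the current code does.
params_1 = SamplingParams(temperature=1.0, max_tokens=128, n=1)
# outputs = llm.generate([p for p in prompts for _ in range(G)], params_1)
```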

```python
    type=int,
    default=1,
    help="Number of GPUs for tensor parallelism (default: 1 for single GPU)",
def infer():
```

Suggested change
```diff
- def infer():
+ def generate():
```

nit, but infer() makes me think of like getting the logits

```python
# Create process meshes
trainer_mesh = this_host().spawn_procs(per_host={"gpus": 2})
trainer_mesh = this_host().spawn_procs(
    per_host={"gpus": trainer_ddp_size * trainer_tp_size}
```

in the future we should deduce the total number of GPUs needed for a given trainer parallelism
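
A tiny hypothetical helper along those lines; the helper and argument names are assumptions, not part of this PR.

```python
# Hypothetical helper: deduce the trainer's GPU count from its parallelism degrees.
def trainer_gpu_count(ddp: int = 1, tp: int = 1, pp: int = 1) -> int:
    # World size is the product of all parallelism degrees.
    return ddp * tp * pp

# e.g. instead of hard-coding the mesh size:
# trainer_mesh = this_host().spawn_procs(
#     per_host={"gpus": trainer_gpu_count(ddp=trainer_ddp_size, tp=trainer_tp_size)}
# )
```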

Labels: ciflow/8gpu, CLA Signed
