Thanks for your great work.
I'm running the math_tool training example. I first encountered this error:
  File "envs/rllm2/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "RL/rllm/rllm/engine/agent_execution_engine.py", line 394, in run_agent_trajectory_async
    prompt_tokens, response_tokens, response_masks, is_valid_trajectory = self.assemble_steps(episode_steps)
                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "RL/rllm/rllm/engine/agent_execution_engine.py", line 453, in assemble_steps
    current_completion_ids = step["completion_ids"]
                             ~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'completion_ids'
I changed line 253 in rllm.engine.agent_execution_engine to "completion_ids": model_output.completion_ids, and I guess this is a small mistake that should be corrected in the repo.
Then I saw this warning during training:
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:06,390:When assemble steps, detect the trajectory not accumulative at position 4688. Expected: [1699, 25310, 367, 284, 33122], Got: [59, 811, 446, 367, 284]. Setting response_masks to all 0s. This is likely due to retokenization.
(TaskRunner pid=32375) Trajectory 50 completed due to: ENV_DONE. Reward is 0.0.
(TaskRunner pid=32375)
(TaskRunner pid=32375) Number of Trajectories 199/256 completed
(TaskRunner pid=32375) Trajectory 53 completed due to: ENV_DONE. Reward is 0.0.
(TaskRunner pid=32375)
(TaskRunner pid=32375) Number of Trajectories 200/256 completed
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:10,227:When assemble steps, detect the trajectory not accumulative at position 7240. Expected: [1699, 1830, 284, 220, 15], Got: [59, 1016, 629, 284, 220]. Setting response_masks to all 0s. This is likely due to retokenization.
(TaskRunner pid=32375) Trajectory 55 completed due to: ENV_DONE. Reward is 0.0.
(TaskRunner pid=32375)
(TaskRunner pid=32375) Number of Trajectories 201/256 completed
(TaskRunner pid=32375) Trajectory 49 completed due to: MAX_STEPS. Reward is 0.
(TaskRunner pid=32375)
(TaskRunner pid=32375) Number of Trajectories 202/256 completed
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:16,377:When assemble steps, detect the trajectory not accumulative at position 6668. Expected: [1699, 5035, 284, 220, 15], Got: [59, 406, 2370, 284, 220]. Setting response_masks to all 0s. This is likely due to retokenization.
I wonder why this happens. Is this an implementation error, or is it caused by tool calling?
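To make the question concrete, here is a minimal sketch of how I understand the check behind this warning. It is not the repo's code: `first_divergence` is a hypothetical helper, and the token lists are toy data that only reuse the IDs from the first warning. My assumption is that each step's prompt tokens are expected to start with all tokens accumulated so far, and the warning fires at the first index where they differ.

```python
from typing import Optional, Sequence


def first_divergence(accumulated: Sequence[int], next_prompt: Sequence[int]) -> Optional[int]:
    """Return the first index where next_prompt stops extending accumulated.

    Hypothetical helper: returns None when next_prompt starts with every
    token in accumulated (the trajectory is "accumulative").
    """
    for i, (a, b) in enumerate(zip(accumulated, next_prompt)):
        if a != b:
            return i
    # next_prompt is shorter than what was accumulated, so it cannot be an extension.
    if len(next_prompt) < len(accumulated):
        return len(next_prompt)
    return None


# Toy data: two shared tokens, then the IDs from the first warning.
accumulated = [10, 11, 1699, 25310, 367, 284, 33122]
next_prompt = [10, 11, 59, 811, 446, 367, 284, 33122, 99]

pos = first_divergence(accumulated, next_prompt)
expected = list(accumulated[pos:pos + 5])
got = list(next_prompt[pos:pos + 5])
print(f"not accumulative at position {pos}. Expected: {expected}, Got: {got}")
```

If this is roughly what happens, a mismatch would mean that the text of an earlier step was tokenized differently when it was re-encoded as part of the next prompt, which is what I would like to confirm.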
I would appreciate it if you could reply. Thanks again for your great contribution!