[Warning] When assemble steps, detect the trajectory not accumulative #352

@kxfan2002

Thanks for your great work.

I'm running the math_tool training example. I first encountered this error:

  File "envs/rllm2/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "RL/rllm/rllm/engine/agent_execution_engine.py", line 394, in run_agent_trajectory_async
    prompt_tokens, response_tokens, response_masks, is_valid_trajectory = self.assemble_steps(episode_steps)
                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "RL/rllm/rllm/engine/agent_execution_engine.py", line 453, in assemble_steps
    current_completion_ids = step["completion_ids"]
                             ~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'completion_ids'

I changed line 253 of rllm.engine.agent_execution_engine to "completion_ids": model_output.completion_ids, so that each step carries the key that assemble_steps reads. I guess this is a small mistake that should be corrected in the repo.

After that change, I see this warning during training:

(TaskRunner pid=32375) WARNING:2025-12-17 21:07:06,390:When assemble steps, detect the trajectory not accumulative at position 4688. Expected: [1699, 25310, 367, 284, 33122], Got: [59, 811, 446, 367, 284]. Setting response_masks to all 0s. This is likely due to retokenization.
(TaskRunner pid=32375) Trajectory 50 completed due to: ENV_DONE. Reward is 0.0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 199/256 completed
(TaskRunner pid=32375) Trajectory 53 completed due to: ENV_DONE. Reward is 0.0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 200/256 completed
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:10,227:When assemble steps, detect the trajectory not accumulative at position 7240. Expected: [1699, 1830, 284, 220, 15], Got: [59, 1016, 629, 284, 220]. Setting response_masks to all 0s. This is likely due to retokenization.
(TaskRunner pid=32375) Trajectory 55 completed due to: ENV_DONE. Reward is 0.0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 201/256 completed
(TaskRunner pid=32375) Trajectory 49 completed due to: MAX_STEPS. Reward is 0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 202/256 completed
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:16,377:When assemble steps, detect the trajectory not accumulative at position 6668. Expected: [1699, 5035, 284, 220, 15], Got: [59, 406, 2370, 284, 220]. Setting response_masks to all 0s. This is likely due to retokenization.

I wonder why this happens. Is this an implementation error, or is it caused by tool calling?
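To make the question concrete, here is a minimal sketch of what the warning appears to check, judging only by its text: each step's token ids should start with the token ids accumulated so far. This is not the actual rLLM code. The names `find_divergence` and `assemble` are hypothetical, and the leading tokens `1, 2` and trailing token `7` are made-up padding around the ids copied from the first warning.

```python
def find_divergence(previous_ids, current_ids):
    """Return the first index where current_ids stops extending previous_ids.

    Returns None when current_ids starts with previous_ids (accumulative case).
    """
    for i, expected in enumerate(previous_ids):
        if i >= len(current_ids) or current_ids[i] != expected:
            return i
    return None


def assemble(step_token_ids, window=5):
    """Toy cumulative-prefix check over per-step token ids (hypothetical)."""
    accumulated = []
    for ids in step_token_ids:
        pos = find_divergence(accumulated, ids)
        if pos is not None:
            # Mirror the fields that the warning message reports.
            return {
                "valid": False,
                "position": pos,
                "expected": accumulated[pos:pos + window],
                "got": ids[pos:pos + window],
            }
        accumulated = list(ids)
    return {"valid": True, "tokens": accumulated}


# Step 2 re-encodes the same history differently, so it no longer extends step 1.
step_1 = [1, 2, 1699, 25310, 367, 284, 33122]
step_2 = [1, 2, 59, 811, 446, 367, 284, 33122, 7]
result = assemble([step_1, step_2])
print(result["position"], result["expected"], result["got"])
```

In this toy case the history diverges at index 2, and the reported windows match the `Expected` and `Got` lists from the first warning. If the real check works this way, the warning would fire whenever the text of an earlier step is encoded into different token ids on a later step, which is what the message calls retokenization.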

I would appreciate it if you could reply. Thanks again for your great contribution!

Labels: bug
