[Warning] When assemble steps, detect the trajectory not accumulative

Thanks for your great work.

I'm running the math_tool training example. The first problem I hit was this error:
```
  File "envs/rllm2/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "RL/rllm/rllm/engine/agent_execution_engine.py", line 394, in run_agent_trajectory_async
    prompt_tokens, response_tokens, response_masks, is_valid_trajectory = self.assemble_steps(episode_steps)
                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "RL/rllm/rllm/engine/agent_execution_engine.py", line 453, in assemble_steps
    current_completion_ids = step["completion_ids"]
                             ~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'completion_ids'
```

To work around it, I changed line 253 of `rllm.engine.agent_execution_engine` to `"completion_ids": model_output.completion_ids,`. I suspect this is a small bug that should be fixed in the repo.

Then I see this warning during training:

```
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:06,390:When assemble steps, detect the trajectory not accumulative at position 4688. Expected: [1699, 25310, 367, 284, 33122], Got: [59, 811, 446, 367, 284]. Setting response_masks to all 0s. This is likely due to retokenization.
(TaskRunner pid=32375) Trajectory 50 completed due to: ENV_DONE. Reward is 0.0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 199/256 completed
(TaskRunner pid=32375) Trajectory 53 completed due to: ENV_DONE. Reward is 0.0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 200/256 completed
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:10,227:When assemble steps, detect the trajectory not accumulative at position 7240. Expected: [1699, 1830, 284, 220, 15], Got: [59, 1016, 629, 284, 220]. Setting response_masks to all 0s. This is likely due to retokenization.
(TaskRunner pid=32375) Trajectory 55 completed due to: ENV_DONE. Reward is 0.0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 201/256 completed
(TaskRunner pid=32375) Trajectory 49 completed due to: MAX_STEPS. Reward is 0. 
(TaskRunner pid=32375) 
(TaskRunner pid=32375) Number of Trajectories 202/256 completed
(TaskRunner pid=32375) WARNING:2025-12-17 21:07:16,377:When assemble steps, detect the trajectory not accumulative at position 6668. Expected: [1699, 5035, 284, 220, 15], Got: [59, 406, 2370, 284, 220]. Setting response_masks to all 0s. This is likely due to retokenization.
```
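
To make the warning concrete, here is a minimal sketch of how I understand an "accumulative" check: the token IDs of each step's prompt should begin with the full token sequence accumulated from the earlier steps. This is not the actual `assemble_steps` code. `first_divergence` and the toy IDs are hypothetical, for illustration only.

```python
from typing import Optional, Sequence


def first_divergence(accumulated: Sequence[int], current: Sequence[int]) -> Optional[int]:
    """Return the first index where `current` stops extending `accumulated`.

    Returns None when `accumulated` is a prefix of `current`, i.e. the
    trajectory is accumulative.
    """
    for i, expected in enumerate(accumulated):
        if i >= len(current) or current[i] != expected:
            return i
    return None


# Toy IDs (hypothetical): tokens accumulated so far vs. the next step's prompt.
accumulated = [10, 11, 12, 13]
ok_prompt = [10, 11, 12, 13, 14, 15]   # extends the accumulated sequence
bad_prompt = [10, 11, 99, 12, 13, 14]  # diverges at index 2

print(first_divergence(accumulated, ok_prompt))   # None
print(first_divergence(accumulated, bad_prompt))  # 2
```

If this reading is right, the "position" in the warning would be the index where the two sequences first disagree.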

I wonder why this happens. Is it a bug in the implementation, or is it caused by tool calling?

```
When assemble steps, detect the trajectory not accumulative at position 4688. Expected: [1699, 25310, 367, 284, 33122], Got: [59, 811, 446, 367, 284]. Setting response_masks to all 0s. This is likely due to retokenization.
```
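
My guess at what "retokenization" means here, as a toy sketch: if the history is decoded to text and then encoded again, the tokenizer may split the same text differently from the way the model originally generated it, so the new IDs no longer start with the old ones. The vocabulary and the `greedy_encode` / `decode` helpers below are made up for illustration. They are not the real tokenizer or any rllm code.

```python
# Toy vocabulary (hypothetical IDs) for a longest-match greedy tokenizer.
VOCAB = {"a": 1, "b": 2, "ab": 3, "c": 4}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def greedy_encode(text: str) -> list[int]:
    """Encode by always taking the longest vocabulary entry that matches."""
    ids, i = [], 0
    max_len = max(len(tok) for tok in VOCAB)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += size
                break
        else:
            raise ValueError(f"cannot encode {text[i]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Concatenate the text of each token ID."""
    return "".join(ID_TO_TEXT[token_id] for token_id in ids)


# The model sampled "a" and "b" as two separate tokens.
generated = [1, 2]

# The next prompt is rebuilt from text, so "ab" is merged into one token.
retokenized = greedy_encode(decode(generated) + "c")

print(generated)    # [1, 2]
print(retokenized)  # [3, 4]
print(retokenized[:len(generated)] == generated)  # False
```

Both sequences decode to text that starts with "ab", but the IDs differ, which is the kind of mismatch the warning seems to describe.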

I would appreciate it if you could reply. Thanks again for your great contribution!




[Warning] When assemble steps, detect the trajectory not accumulative #352
