fix: align generated dataset columns to model's feature_names #2223

majiayu000 · 2025-12-29T11:12:00Z

Summary

Add column name alignment to the LLM generators so that generated datasets match the model's expected feature_names.
This fixes the KeyError raised when different detectors generate datasets with inconsistent column names.

Root Cause

Different LLM-based detectors were generating datasets with different column names:

sycophancy: query
harmfulness: user_input
stereotypes: user_input
etc.

When a model specifies feature_names=["message"], the prepare_dataframe() method fails with KeyError because the generated dataset doesn't contain the expected column names.
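The failure mode can be reproduced in isolation with plain pandas. The snippet below is a minimal sketch, not the Giskard code path: `select_features` is a hypothetical stand-in for the column selection that prepare_dataframe() is assumed to perform, and the sample row is invented for illustration.

```python
import pandas as pd

# Hypothetical dataset as a detector might generate it: the single
# column is named "query" (the sycophancy naming from the list above).
generated = pd.DataFrame({"query": ["Is this answer correct?"]})

# The model declares a different feature name.
feature_names = ["message"]


def select_features(df: pd.DataFrame, names: list) -> pd.DataFrame:
    # Stand-in (assumption) for the selection done in prepare_dataframe():
    # indexing with a list of labels raises KeyError if any label is missing.
    return df[names]


try:
    select_features(generated, feature_names)
    error = None
except KeyError as exc:
    error = exc

print(type(error).__name__)
```

The column holds the right data under the wrong name, which is why a rename step before the selection is enough to avoid the error.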

Solution

Add _align_columns_to_feature_names() method that:

Reorders columns if they match feature_names but are in a different order
Renames columns if the column count matches (positional mapping)
Handles cases with more generated columns than expected features
Logs warnings for edge cases
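The four behaviours listed above could be sketched roughly as follows. This is an illustrative standalone function, not the method added in this PR: the name `align_columns_to_feature_names`, the logger setup, and in particular the handling of extra columns (keep the leading columns and rename them positionally) are assumptions, since the description does not spell out those details.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def align_columns_to_feature_names(df: pd.DataFrame, feature_names) -> pd.DataFrame:
    """Hypothetical sketch of aligning generated columns to feature_names."""
    if not feature_names:
        return df  # nothing to align against

    feature_names = list(feature_names)
    columns = list(df.columns)

    # Same names, possibly in a different order: reorder only.
    if len(columns) == len(feature_names) and set(columns) == set(feature_names):
        return df[feature_names]

    # Same count but different names: rename by position.
    if len(columns) == len(feature_names):
        logger.warning("Renaming columns %s to %s by position", columns, feature_names)
        return df.rename(columns=dict(zip(columns, feature_names)))

    # More generated columns than features (assumed policy): keep the
    # leading columns and rename them by position, dropping the rest.
    if len(columns) > len(feature_names):
        kept = columns[: len(feature_names)]
        logger.warning("Dropping extra generated columns %s", columns[len(kept):])
        return df[kept].rename(columns=dict(zip(kept, feature_names)))

    # Fewer generated columns than features: cannot align, leave unchanged.
    logger.warning("Cannot align columns %s to %s", columns, feature_names)
    return df


aligned = align_columns_to_feature_names(
    pd.DataFrame({"user_input": ["hello"]}), ["message"]
)
print(list(aligned.columns))
```

Positional renaming is only safe when the generator emits columns in the same order as the model's features, which is why the sketch logs a warning whenever it renames or drops anything.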

Test plan

Added unit tests for column alignment functionality
Existing tests should continue to pass

Fixes #2213

Add column name alignment in LLM generators to ensure generated datasets match the model's expected feature_names.

This fixes the KeyError that occurs when different detectors generate datasets with inconsistent column names (e.g., 'user_input', 'query', 'user_question') while the model expects specific feature names.

The new `_align_columns_to_feature_names` method:
- Reorders columns if they match feature_names but in different order
- Renames columns if count matches (positional mapping)
- Handles cases with more generated columns than expected features
- Logs warnings for edge cases

Fixes Giskard-AI#2213

Signed-off-by: majiayu000 <1835304752@qq.com>
