-
Notifications
You must be signed in to change notification settings - Fork 625
PIDML: use columns' labels from JSON file instead of hardcoding them #7371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PIDML: use columns' labels from JSON file instead of hardcoding them #7371
Conversation
|
Error while checking build/O2Physics/o2 for 6ce8bb0 at 2024-08-20 19:26: Full log here. |
|
passing table reference would not be needed - changes in progress |
|
IOBinding not used because of additional memory copying and no gain in computational time. |
Tools/PIDML/pidOnnxModel.h
Outdated
| } | ||
|
|
||
| template <typename T, typename... C> | ||
| std::optional<float> getColumnValueByLabel(o2::framework::pack<C...>, const T& rowIterator, const std::string& columnLabel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it would be typename C::type as the return type; for ML usage, hardcoded float cast was reasonable - but if it would be in O2, then not anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tools/PIDML/pidOnnxModel.h
Outdated
| } | ||
|
|
||
| template <typename C, typename T> | ||
| typename C::type getPersistentValue(const T& rowIterator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also could replace the O2 function in ASOA.h, which now gets the value at the arrow level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this PR is merged, it would be possible to use a common getter also for dynamic columns - and the code would look like that: static_cast<C>(rowIterator).get()
Tools/PIDML/pidOnnxModel.h
Outdated
| } | ||
|
|
||
| template <typename... P1, typename... P2> | ||
| static constexpr bool is_equal_size(o2::framework::pack<P1...>, o2::framework::pack<P2...>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you really need this function? Isn't it just pack_size({some pack}) == pack_size({some othe pack})
|
Please, open a PR there with what you want to add, and we discuss it. |
|
@ktf this was just discussed in the AIP meeting and we are worried that the performance impact of getting columns by name is significant. See also slide 31 from https://indico.cern.ch/event/1452758/contributions/6115208/attachments/2923455/5131519/Final%20PID%20ML%20presentation.pdf |
|
Indeed, I would have an API which allows you to get a "token" associated to a column (which can be retrieved by name) and then use that token to get the binding in a more effective way. As I said, let's open a PR to O2 and see what can be done. |
|
Sounds good. And important is that the json (which is the desired use case here) does not do the mapping from |
| input_shape[0] = batch_size; | ||
|
|
||
| std::vector<float> inputTensorValues = createInputsSingle(track); | ||
| std::vector<float> inputTensorValues = getValues(track); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw that you are still converting our columns to rows. I though you said you managed to do it with bindings. Did I understand it wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e., I understood you found a way to setup an OrtValue directly from a signle IoBinding which is attached to the values of the column, without an intermediate representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Maja said previously in the discussion in this PR, we do not use IOBinding. We wanted to use it, but we still needed to make a copy of a value to standardize it. Anton suggested using gandiva to copy and standardize the whole column and then use IOBinding, so I implemented the first part of it: standardization using gandiva, but there was an issue. It copies the whole column, but even our example task used filtered iterator and just needed to iterate over only 1/10 of the column, because of the filtering.
- So copying it and IOBinding would mean copying x (10 for my example) times more data than we need and then we can't use IOBinding, because it needs contiguous memory.
- Another solution can be copying only filtered rows of column to the new column and then making a second copy to gandiva standardized column (we need continuous memory to use IOBinding), but for us, it still seems not ideal. (of course, we can skip gandiva stage and copy it once with standard arithmetic operations, but I am not sure which one would be faster)
- We could also say that we ban Filtered<> from our PIDML API and the user would need to prepare the table contiguous, but this would be hard to use our API then
- On top of that it wouldn't support dynamic columns unless we evaluate lambda and save it to a newly created column
So we decided not to use IOBinding.
|
About the performance of this solution: I think that @jgrosseo is correct, and implementing that would be a performance issue, but I will make a draft PR in O2 because I have already written it. I will add a benchmark there and tell you the numbers. If it comes to our OnnxModel, we should still be able to get some vector of getters in our constructor (string comparison would be only there), and then we would call this handler during iteration. Performance should be similar to hard-coded implementation, but it wouldn't be possible for other projects to reuse our code. |
|
This PR has not been updated in the last 30 days. Is it still needed? Unless further action is taken, it will be closed in 5 days. |
No description provided.