Modify RobertaEmbedding forward as custom op method #996
michalkuligowski merged 9 commits into habana_main
@kzawora-intel Since @michalkuligowski is out at the moment, could you please review this PR? Thanks.
@michalkuligowski @kzawora-intel Could you please advise how you would like this PR changed? Our customer is waiting for RoBERTa embedding enablement.
Branch updated from 964be54 to 0ddcf15
/run-gaudi-tests
According to the two failed test logs, the tests actually passed, but the process could not exit normally and ended with this message: "Received notify event: Due to an error on node g3-srv179-c03w-idc a jira ticket https://jira.habana-labs.com/browse/SW-225420 was opened, your resource vllm-fork-996-79cqyb8h7e-tfjob might be effected". I don't think these are real issues. The same PR for the v1.21.0-next branch, #1049, passed all CI checks.
Same PR as #996, but for the v1.21.0_next branch.
/run-gaudi-tests
…ition_id creation method
Branch updated from 2b3ca17 to 81cd1ba
/run-gaudi-tests
/skip-gaudi-tests
The test passed; the job was removed only because it reached its maximum duration:
2025-04-24T00:15:06Z tensorflow === PASSED MODEL: Meta-Llama-3.2-11B-Vision-Instruct-mss.yaml ===
INFO Received notify event: your resource vllm-fork-996-q2mlrdm3tk-tfjob will reach its max duration in 30 minutes and will be deleted
WARNING received notify kill event: your resource vllm-fork-996-q2mlrdm3tk-tfjob has reached it's max duration 1h0m0s, it's going to be destroyed
WARNING workload removed from cluster
SUCCESS successfully removed failed workload vllm-fork-996-q2mlrdm3tk-tfjob
This is a custom op change, following up on PR #786.
The RobertaEmbedding class is removed from the model file and reimplemented as a CustomOp class in a new file.
forward_cuda() keeps the original forward function, and forward_hpu() contains our HPU-specific change.
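The idea behind this change is that a CustomOp subclass provides one forward_* method per backend, and the base class routes forward() to the right one. Below is a minimal, dependency-free sketch of that dispatch pattern. CustomOpSketch, ToyRobertaEmbedding and the plain platform string argument are hypothetical stand-ins, not vLLM's actual CustomOp class or its platform detection. The toy methods only return a tag instead of computing embeddings.

```python
class CustomOpSketch:
    """Hypothetical base class: picks one forward_* method at construction time."""

    def __init__(self, platform: str) -> None:
        # 'platform' is an assumed stand-in for real platform detection.
        # Bound methods resolve to subclass overrides, so a subclass only
        # needs to implement the backends it cares about.
        self._forward_method = {
            "cuda": self.forward_cuda,
            "hpu": self.forward_hpu,
        }.get(platform, self.forward_native)

    def forward(self, *args, **kwargs):
        # Every call is routed to the method selected in __init__.
        return self._forward_method(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        raise NotImplementedError("no native implementation provided")

    def forward_cuda(self, *args, **kwargs):
        # Default: fall back to the native implementation.
        return self.forward_native(*args, **kwargs)

    def forward_hpu(self, *args, **kwargs):
        # Default: fall back to the native implementation.
        return self.forward_native(*args, **kwargs)


class ToyRobertaEmbedding(CustomOpSketch):
    """Hypothetical op mirroring the PR layout: the original logic lives in
    forward_cuda and the HPU-specific variant in forward_hpu."""

    def forward_cuda(self, input_ids):
        # Stand-in for the original forward logic.
        return ("original", list(input_ids))

    def forward_hpu(self, input_ids):
        # Stand-in for the HPU-specific forward logic.
        return ("hpu", list(input_ids))
```

Selecting the method once in the constructor keeps the per-call overhead to a single attribute lookup, and each backend overrides only the path it needs while the model code keeps calling forward().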