[IR2Vec] Add embeddings mode to llvm-ir2vec tool #147844
Merged
This was referenced Jul 9, 2025
boomanaiden154
left a comment
Premerge failures here also look relevant.
✅ With the latest revision this PR passed the Python code formatter.
boomanaiden154
approved these changes
Jul 14, 2025
boomanaiden154
left a comment
Minor style nits, otherwise LGTM.
    using namespace llvm;
    using namespace ir2vec;
    using namespace llvm::ir2vec;
Instead of using statements, it might be better to wrap everything outside of main in an anonymous namespace inside the llvm::ir2vec namespace. I'm not sure what the coding standards are, but that's the pattern I see in other tools like llvm-exegesis.
    // Generate embeddings based on the specified level
    switch (Level) {
    case FunctionLevel: {
Does clang-format not let you indent here?
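For context on the indentation question: as far as I know, clang-format's LLVM preset leaves `IndentCaseLabels` off, so `case` labels stay in the same column as `switch`. The sketch below shows that layout on a small, self-contained dispatch. Only `Level` and `FunctionLevel` appear in the snippet under review; the enum name, the other enumerators, and `describeLevel` are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical enum; only FunctionLevel is taken from the reviewed snippet.
enum EmbeddingLevel { InstructionLevel, BasicBlockLevel, FunctionLevel };

// Returns a short description of what each level would produce.
std::string describeLevel(EmbeddingLevel Level) {
  // Case labels are not indented relative to the switch keyword.
  switch (Level) {
  case InstructionLevel: {
    return "one embedding per instruction";
  }
  case BasicBlockLevel: {
    return "one embedding per basic block";
  }
  case FunctionLevel: {
    return "one embedding per function";
  }
  }
  // Reached only if Level holds a value outside the enumerators.
  return "unknown level";
}
```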
This was referenced Jul 16, 2025
svkeerthy added a commit that referenced this pull request Jul 17, 2025
Add a new LLVM tool `llvm-ir2vec`. The tool is primarily intended to generate triplets for training the vocabulary (#141834) and, potentially, to generate embeddings in a standalone manner. This PR introduces the tool with triplet generation functionality. In upcoming PRs I'll add scripts under `utils/mlgo` to complete the vocabulary tooling. #147844 adds embedding generation logic to the tool. (Tracking issue: #141817)
This was referenced Jul 23, 2025

Add embedding generation functionality to the llvm-ir2vec tool, complementing the existing triplet generation mode.
This change completes the IR2Vec tool by adding the embedding generation functionality, which was previously mentioned as a TODO item. The tool now supports both triplet generation for vocabulary training and embedding generation using a trained vocabulary.
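To make "embedding generation using a trained vocabulary" concrete, here is a toy sketch, not the tool's implementation. It assumes a vocabulary that maps entity names (here, opcode strings) to fixed-size vectors and builds the embedding of a sequence by summing the vectors of its entities. The type aliases, the 3-dimensional vectors, and `embedSequence` are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical types: a tiny 3-dimensional embedding and a name -> vector map.
using Embedding = std::array<double, 3>;
using Vocabulary = std::map<std::string, Embedding>;

// Sums the vocabulary vectors of all known entities in the sequence.
// Entities missing from the vocabulary contribute nothing in this sketch.
Embedding embedSequence(const Vocabulary &Vocab,
                        const std::vector<std::string> &Entities) {
  Embedding Sum{}; // value-initialized to all zeros
  for (const std::string &Name : Entities) {
    auto It = Vocab.find(Name);
    if (It == Vocab.end())
      continue;
    for (std::size_t I = 0; I < Sum.size(); ++I)
      Sum[I] += It->second[I];
  }
  return Sum;
}
```

In this picture the triplet generation mode supplies the training data from which such a vocabulary is learned, while the embeddings mode consumes the finished vocabulary.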