User-Defined P-Code Graph Rewriting (building on RULECOMPILE) #8758

osogi · 2025-12-11T11:02:23Z

Duplicate of #8742 (I didn't know about GitHub's "Discussions" feature).

I am working on a project to implement P-Code graph rewriting in Ghidra, specifically to support "macro folding" (collapsing P-Code patterns into simplified forms).

My current design concept is similar to the existing RULECOMPILE feature (https://msm.lt/re/ghidra/rulecompile/#rulecompile), but with two major extensions:

Extensibility: The ability to add new transformation rules from the user environment without recompiling the core.
CFG support: Expanding the rule grammar to include Basic Blocks, allowing the rewriter to address not just local data-flow, but also control-flow patterns.
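To make "macro folding" concrete, here is a minimal sketch of rule-based rewriting over a toy expression tree. It is not Ghidra's API and not the RULECOMPILE grammar: the nested-tuple representation, the `?x` variable syntax, and the function names are all assumptions made for illustration. Only the opcode names (INT_SUB, INT_OR, INT_AND, INT_XOR, INT_ADD) are real P-Code operations. The example rule folds the obfuscation pattern (x | y) - (x & y) into x ^ y.

```python
# Toy terms: ("OPCODE", input, ...) tuples, varnode names as strings,
# constants as ints. Strings starting with "?" are pattern variables.
# This representation is hypothetical, not Ghidra's.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def match(pattern, term, env):
    """Return an extended binding dict if `term` matches `pattern`, else None."""
    if is_var(pattern):
        if pattern in env:
            # A repeated variable must bind to the same subterm.
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple):
        if (not isinstance(term, tuple) or len(term) != len(pattern)
                or term[0] != pattern[0]):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None

def substitute(template, env):
    """Instantiate a replacement template with the bindings in `env`."""
    if is_var(template):
        return env[template]
    if isinstance(template, tuple):
        return (template[0],) + tuple(substitute(c, env) for c in template[1:])
    return template

def rewrite(term, rules):
    """One bottom-up pass: rewrite children first, then try each rule once."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(rewrite(c, rules) for c in term[1:])
    for pattern, template in rules:
        env = match(pattern, term, {})
        if env is not None:
            return substitute(template, env)
    return term

# User-supplied rule: (x | y) - (x & y)  =>  x ^ y
RULES = [
    (("INT_SUB", ("INT_OR", "?x", "?y"), ("INT_AND", "?x", "?y")),
     ("INT_XOR", "?x", "?y")),
]

expr = ("INT_ADD",
        ("INT_SUB", ("INT_OR", "eax", "ebx"), ("INT_AND", "eax", "ebx")),
        1)
folded = rewrite(expr, RULES)
```

Because the rules are plain data, they could in principle be loaded at runtime rather than compiled into the core, which is the extensibility point above. A real implementation would also have to handle data-flow graphs with shared nodes and, for the CFG extension, patterns over basic blocks, which this tree-only sketch ignores.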

I would like to gather some community feedback and have a few specific questions for the maintainers:

Status of RULECOMPILE: Why is the current RULECOMPILE functionality disabled/unmaintained? Were there fundamental performance or architectural issues discovered during its development?
Utility: Does the described feature feel like a valuable (or at least usable) addition?
Feasibility: Are there obvious show-stopper problems with this whole idea that I’m missing?

I would appreciate any opinions, architectural advice, or warnings about potential "blind spots" in this approach.

Any thoughts, war stories, or pointers are very welcome!

osogi · 2025-12-11T11:05:24Z

Related: #2991

msm-code · 2025-12-11T23:19:42Z

Hi! Nice idea. I'm the author of the linked blog post :) I'm happy to see someone is working on this.

Status of RULECOMPILE: Why is the current RULECOMPILE functionality disabled/unmaintained? Were there fundamental performance or architectural issues discovered during its development?

I'm just guessing here, but the current implementation of RULECOMPILE is not really production quality - it's very easy to crash the decompiler process with relatively small mistakes (such as referencing an undefined node). Perhaps it was developed as an experiment but never reached production maturity?
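One way a rule engine could avoid that class of crash is to validate each rule before it is ever applied. The sketch below is an assumption-laden illustration, not how RULECOMPILE works: rules are modelled as nested tuples `("OPCODE", input, ...)`, pattern variables are strings starting with `?`, and `check_rule` is a hypothetical helper that rejects a replacement referring to a node the pattern never binds.

```python
def variables(term):
    """Collect pattern variables (strings starting with '?') in a toy term."""
    if isinstance(term, str):
        return {term} if term.startswith("?") else set()
    if isinstance(term, tuple):
        found = set()
        for child in term[1:]:  # term[0] is the opcode name
            found |= variables(child)
        return found
    return set()  # constants carry no variables

def check_rule(pattern, replacement):
    """Raise ValueError if the replacement uses a variable the pattern never binds."""
    unbound = variables(replacement) - variables(pattern)
    if unbound:
        raise ValueError(f"rule references undefined node(s): {sorted(unbound)}")

# Accepted: every variable on the right-hand side is bound on the left.
check_rule(("INT_XOR", "?x", "?x"), 0)

# Rejected at load time instead of crashing during rewriting.
try:
    check_rule(("INT_AND", "?x", "?y"), ("INT_OR", "?x", "?z"))
    rejected = False
except ValueError:
    rejected = True
```

Rejecting a malformed rule when it is loaded turns a process crash into an error message the rule author can act on.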

The second reason is that (looking from the outside) the Ghidra decompiler is undermaintained/underfunded/undermanned. Take a look at any PR aimed at improving the decompiler, for example these two recent PRs. Or the whole contribution history of @LukeSerne. Notice two patterns:

They are all assigned to a single Ghidra maintainer
They are all unmerged :)

My point being, decompiler PRs take forever to review.

Having said that,

Utility: Does the described feature feel like a valuable (or at least usable) addition?

Yes, the ability to extend and improve the decompiler output dynamically is one of my most wanted Ghidra features :). I work with obfuscated code a lot and being able to leverage Ghidra core instead of one-shot scripts would be great. Right now Ghidra is a bit limited here, because we can control the decompiler output only via annotations, attributes, variable types etc, but never directly.

Feasibility: Are there obvious show-stopper problems with this whole idea that I’m missing?

Getting it merged is one. Do I understand correctly that you plan to implement this feature as a large PR to Ghidra? CONTRIBUTING.md sets expectations here and mentions that large changes may take a while to review (paraphrasing).

I think the chances of upstreaming your feature will rise significantly if you manage to minimize your changes to the core and implement most of your project as a Ghidra plugin. But you need at least some hook API in the decompiler, so I'm not sure how realistic that is.

Good luck!

2 replies

thixotropist Dec 12, 2025

It's not that hard to enable processor-specific plugins within the decompiler. As a very rough proof of concept demo, you can take a look at https://github.com/thixotropist/ghidra_decompiler_plugins. That patches a decompiler plugin manager into a released Ghidra tarball, then builds a sample RISC-V plugin capable of recognizing simple vectorizations of memcpy and such. Recognizing pcode patterns isn't very hard - but deciding how to edit pcode block graph structures and safely trim the dependencies of scratch registers that might be interpreted as subsequent parameter registers can be hard to generalize.

osogi Dec 13, 2025
Author

Thanks for the advice. I'll definitely think it over and try to apply it during implementation.

Yes, the ability to extend and improve the decompiler output dynamically is one of my most wanted Ghidra features :). I work with obfuscated code a lot and being able to leverage Ghidra core instead of one-shot scripts would be great

Just to clarify: my current plan is to focus on a relatively simple rewrite grammar / rule-based rewriting approach (similar to what rulecompile provides), rather than building a full-fledged API for working with p-code directly.

If my original post gave the wrong impression, sorry about that. What would you suggest I add/change to make this limitation explicit so it doesn’t confuse others? And would this kind of “rule-based rewriting” feature still be interesting/useful to you in your workflow?

LukeSerne · 2025-12-12T10:17:59Z

I think quite a few decompiler issues can be resolved / mitigated if the interface of the decompiler binary is changed to operate purely on pcode. In that model, unoptimised ("low") pcode goes into the decompiler, and optimised ("high") pcode comes out. Then, the translation of that high pcode into C-like pseudocode can be done in Java (or left in the decompiler, with the Java side making two calls to the decompiler - one to optimise the pcode, and one to transform pcode into C).

This has several major benefits:

The formatting of the decompiled output will be more configurable (even user scripts can make changes if wanted), which simplifies the settings that need to be passed to the decompiler - a lot of those are used to configure the output formatting. It will also allow users to more easily display the output of the decompiler in a format they like - perhaps moving away from trying to look like C (like Binary Ninja's HLIL), or perhaps trying to more closely match C by rewriting possible remaining pcode operations into their C equivalents (like CONCAT and LZCOUNT operations).
User scripts can perform some post-processing on the high pcode before rendering it into pseudocode. Since a lot of the structure of the high pcode is removed, it seems like it would be a lot easier to work with the high pcode instead of the C code. While this might be possible now, I'm not aware of a way to then transform that (modified) high pcode into C again.
User operations that have no influence on the actual pcode (such as renaming a variable) will be able to bypass the whole interaction with the decompiler, and just require re-rendering the high pcode. This would help with better user experience working with decompilation of huge functions (not just a "faster decompiler" request) #5730.
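The third benefit can be sketched as a two-stage pipeline with a cache between the stages. Everything here is hypothetical and for illustration only: `DecompileSession`, `toy_optimise`, and `toy_render` are invented names, and the tuple encoding of pcode ops is a stand-in for whatever the real interface would exchange. The point it shows is that a rename only re-runs the cheap rendering stage.

```python
class DecompileSession:
    """Caches high pcode so that renames only trigger re-rendering."""

    def __init__(self, optimise, render):
        self._optimise = optimise  # low pcode -> high pcode (expensive stage)
        self._render = render      # high pcode + names -> text (cheap stage)
        self._high = None
        self._names = {}
        self.optimise_calls = 0    # counter to show when the expensive stage runs

    def load(self, low_pcode):
        self._high = self._optimise(low_pcode)
        self.optimise_calls += 1
        return self.text()

    def rename(self, old, new):
        # No effect on the pcode itself, so the optimiser is not called again.
        self._names[old] = new
        return self.text()

    def text(self):
        return self._render(self._high, self._names)

# Toy stages. Each op is (output, opcode, inputs) with string varnode names.
def toy_optimise(low):
    # Stand-in for the real optimiser: drop copies of a varnode onto itself.
    return [op for op in low if not (op[1] == "COPY" and op[2] == (op[0],))]

def toy_render(high, names):
    def show(v):
        return names.get(v, v)
    return "\n".join(
        f"{show(out)} = {opcode}({', '.join(show(i) for i in ins)})"
        for out, opcode, ins in high
    )

session = DecompileSession(toy_optimise, toy_render)
low = [("r0", "COPY", ("r0",)), ("r1", "INT_ADD", ("r0", "r2"))]
first = session.load(low)
renamed = session.rename("r1", "sum")
```

A user script performing post-processing (the second benefit) would fit between the two stages, transforming the cached high pcode before it is rendered.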

The only major drawback is that it seems like this requires quite a lot of changes, which will stay in PR review forever unless the Ghidra team indicates that they also want to change the interface of the decompiler. It might be hard to maintain such a PR for a long time, since it will probably easily lead to merge conflicts if the upstream files are changed.

What do other people think about this idea? Do you think it's a good idea? Do you have a suggestion to improve it? Am I missing something that makes this harder? Please let me know!

On a positive note, we got 3 decompiler commits this week, which is more than we've had in a long time 🙂

2 replies

fmagin Dec 12, 2025

Handling problems like control flow structuring in the JVM part would be nice. In theory this should already be possible with the kind of fairly non-invasive changes I made for changing the syntax tree via a service (that is then supplied by a separate extension).

The problem with that is still the limitation of the actual syntax tree as we discussed in #4086 but generating an entirely new syntax tree that is backwards compatible should also be possible.

osogi Dec 13, 2025
Author

Phew…! The idea sounds awesome, but I’m worried it’s too ambitious and probably outside the scope of the problem I’m trying to solve.

To be fair, I still haven’t fully thought through how the interaction between the decompiler and Ghidra should be structured. So if it turns out that the proposed approach isn’t significantly more complex than other alternatives, I might give it a try. But for now, I find that a bit hard to believe :)
