Replies: 3 comments 5 replies
-
@astrelsky @nsadeveloper789 is doubtless the best person to answer your questions, but he's out-of-office for the next two weeks so I'll take a stab. As you've probably already deduced, the trace-rmi.proto spec is used to generate TraceRmi.java in Ghidra/Debug/Debugger-rmi-trace/build/generated/source/proto/main/java. (I believe this happens as part of "gradle assemblePyPackage", but don't hold me to that.) If you peek at this class, you'll see a slew of subclasses. Most of these are either builders for various objects that might get passed over the wire (addresses, primitives, arrays, objects) or request/reply pairs.

Most of the request/reply pairs relate to what I would call state methods: read/write registers, read/write memory, and various bookkeeping functions, e.g. create trace. These also handle "objects", which are very generic. We're following the dbgmodel approach here, i.e. you specify a schema describing parents and their children, and you populate objects with elements or attributes. Elements are generally keyed objects of a single type; attributes are a mixed bag of key/value pairs of different types. Processes, threads, modules, etc. are all objects in this sense. Process, thread, and module containers have processes, threads, and modules as elements. Things like processes typically have only attributes.

It should be noted that almost all of the request/reply pairs follow a push model. The target pushes information to Ghidra, based on either some event, such as a break, or a request from Ghidra, such as a refresh. Data is generally not returned directly in the reply. In the agent implementations, the event logic generally lives in a file like hooks.py, whereas the refresh logic is kept in commands.py. Hooks will reference methods in commands. Methods.py comprises all of the logic we used to refer to as control/extended methods. Methods (to finally get to your initial question) include step, break, continue, set a breakpoint, etc.
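To make the elements/attributes distinction concrete, here's a minimal, self-contained sketch of that dbgmodel-style object tree. The class and method names (`TraceObject`, `add_element`, `set_attr`) are illustrative only, not the actual Ghidra or ghidratrace API:

```python
class TraceObject:
    """A generic object with keyed elements and named attributes."""

    def __init__(self, path):
        self.path = path
        self.elements = {}    # keyed children of a single type, e.g. Threads[1]
        self.attributes = {}  # mixed bag of key/value pairs, e.g. _pid

    def add_element(self, key):
        # Elements are keyed children; the path convention here mimics
        # the bracketed keys you see in the object tree.
        child = TraceObject(f"{self.path}[{key}]")
        self.elements[key] = child
        return child

    def set_attr(self, name, value):
        self.attributes[name] = value


# A Processes container has processes as elements; a process has only
# attributes, including child containers like Threads.
root = TraceObject("Processes")
proc = root.add_element(0)
proc.set_attr("_pid", 1234)
threads = TraceObject(f"{proc.path}.Threads")
proc.set_attr("Threads", threads)
threads.add_element(1).set_attr("_state", "STOPPED")
```

The schema (not modeled here) is what tells Ghidra which element type and which attributes to expect at each level of this tree.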
The methods are not predetermined. Methods are registered on start-up by the agent and made available to various objects, e.g. in the tree, or to the command line. The annotation logic directs the registration process, i.e. specifies parameters, display characteristics, object associations, etc. In the Python logic, look for @REGISTRY.method refs. Some of the "action" annotations are "special", i.e. have built-in behaviors, such as "refresh". Also of note, the XRequest/ReplyInvokeMethod pair is handled slightly differently than the other pairs (hence the X).

As a bit of history, we used to split methods into state (read/write), control, and extended. "Extended" methods were target-specific commands not generally supported by all debuggers. It turns out most of the control commands really fall into the same bucket, i.e. they don't generalize very well. Breakpoints are the worst case in this respect. Even for something like gdb, the low-level RSP interface is not well standardized. The new design is considerably more flexible and extensible, but also a bit harder to understand because of its genericness. Similarly, we used to always have specific objects for process, thread, module, and breakpoint, but the more generic model adapts to new debuggers, debugger variants, and targets more seamlessly.

For more info, if you haven't already, take a look at the B5 sections of the class. I tried to document the process of writing a new agent as I implemented it (vs. after), the example being "drgn". Also, we have one non-Python agent, which still uses protobuf despite technically not needing it: Debugger-jpda, which handles Java/Dalvik. Of course, I'm happy to answer more questions - you might be the first person to attempt writing a custom debugger agent, so we're super-interested in your feedback.
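A toy sketch of what @REGISTRY.method-style registration does under the hood: the decorator records the callable plus its display metadata, so that on start-up the agent can hand the whole table to Ghidra, which then offers the methods on matching objects and on the command line. `MethodRegistry` and its fields here are hypothetical stand-ins, not the real ghidratrace code:

```python
class MethodRegistry:
    """Collects agent methods plus metadata for the GUI (illustrative)."""

    def __init__(self):
        self.methods = {}

    def method(self, action=None, display=None):
        def register(func):
            self.methods[func.__name__] = {
                "callable": func,
                # "special" actions like "refresh" get builtin GUI behavior
                "action": action or func.__name__,
                "display": display or func.__name__,
            }
            return func
        return register


REGISTRY = MethodRegistry()


@REGISTRY.method(action="step_into", display="Step Into")
def step_into(thread):
    # A real agent would drive the target debugger here.
    return f"stepped {thread}"
```

The point is that nothing is predetermined: whatever the agent registers at start-up is what Ghidra exposes, which is why new debuggers and targets slot in without protocol changes.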
-
Let me see if I can simplify things... In our current working model, there are effectively always two communication streams: one between the GUI and the agent, and one between the agent and the target. If the agent is guaranteed to be on the same machine as Ghidra proper (i.e. the GUI) and you plan on writing the agent in Java, there is technically no need for the first stream. That said (having done this for Java/Dalvik), it's still MUCH easier to implement that stream. Why? Because the GUI already assumes that model.

In the case of the second stream, we by-and-large do not implement the protocol. These protocols are very brittle in most cases and change frequently. (I will avoid horror stories here....) We used to implement the agents in Java, but this implementation was also inherently brittle, i.e. we used JNA to wrap the dbgeng DLLs, JNI/SWIG to talk to the lldb libraries, and the MI/2 interface to talk with gdb. All of these targets have well-defined python3 interfaces, however, so we made the shift.

From your description, it sounds like you may be in the position of implementing both streams. I think you'll still want to avoid making the first stream direct. The first-stream logic you'll want to implement will parallel that in Debug/Debugger-jpda/src/main/java/ghidra/dbg/jdi/rmi/jpda, i.e. you'll probably want JdiCommands/Methods/Hooks equivalents. The code in ghidra/dbg/jdi/manager is essentially the second stream, i.e. all the actual JDI/JPDA logic. For Java, obviously, this second stream is virtual: no protocol involved, just direct access.
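The Commands/Hooks split described above can be sketched roughly like this: Commands pushes state up to Ghidra over the first stream, while Hooks reacts to target events arriving on the second stream by calling back into Commands. All class and method names here are illustrative stand-ins, not the real agent code:

```python
class Commands:
    """Stream 1 side: refresh logic that pushes state up to Ghidra."""

    def __init__(self, gui_stream):
        self.gui_stream = gui_stream  # stand-in for a Trace RMI connection

    def put_threads(self, threads):
        # A real agent would batch object writes over trace-rmi here.
        self.gui_stream.append(("put_threads", list(threads)))


class Hooks:
    """Stream 2 side: event logic, e.g. reacting to a stop/break."""

    def __init__(self, commands):
        self.commands = commands

    def on_stop(self, target_state):
        # Hooks reference methods in Commands, just as hooks.py calls
        # into commands.py in the Python agents.
        self.commands.put_threads(target_state["threads"])


gui = []  # stand-in for the Ghidra-side connection
hooks = Hooks(Commands(gui))
hooks.on_stop({"threads": [1, 2]})  # target event -> state pushed to Ghidra
```

Note the data flow matches the push model: the event on the target side drives a write to Ghidra, rather than Ghidra polling for the answer.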
-
Oh, and, perhaps obvious (?), ghidratrace is the middleman for the first stream. (And the type stuff is mostly there to keep us from shooting ourselves in the foot with Python.)
-
I've been looking through this and am a bit confused, because there doesn't seem to be anything for the server to know when to create breakpoints, step, and stuff like that.
Using protobuf provides a unique opportunity for systems where Python is not supported, gdb isn't supported, and/or a proprietary debug library/protocol is present. I can run golang on the system without a problem because I ported the toolchain, so this seems like the obvious thing to do. I'm just really confused about how this system is supposed to work.
I'm still mostly just thinking out loud at this point, so no rush on answering this. I have to fully reverse engineer the debug protocol for my target and find out if I can even use the breakpoints without causing a system panic.