Skip to content

debug.superh.aud: Add SuperH AUD-II read support #985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

pd0wm
Copy link

@pd0wm pd0wm commented Aug 1, 2025

AUD-II is an interesting protocol as it seems to be present on all SH-2A SuperH chips. Unlike JTAG (H-UDI) it looks like it cannot be turned off or password protected. The protocol is meant for real time tracing of data and branches, however it also has a second mode: "RAM Monitoring Mode". This is a simple mode where you can peek/poke at memory in 1, 2 or 4 byte chunks.

This applet only implements the "Read" command of the "RAM Monitoring Mode", as I was using it dump the flash of one of these SuperH chips from an automotive part.

Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.

This is my first time writing a glasgow applet, so feel free to let me know if there are things I could have done better or more efficient. Also happy to remove some comments if you think they are superfluous. Let me know if it's in the right place in the debugmodule, it's a debug/tracing protocol but I'm currently only using it to read memory.

PR is in Draft until I confirm that the docs still build properly.

Logic Analyzer trace of reading 4 bytes:
2025-08-01-105310_2361x1066_scrot

Reference from datasheet:
2025-08-01-105324_554x291_scrot

@pd0wm pd0wm marked this pull request as ready for review August 4, 2025 10:02
@pd0wm
Copy link
Author

pd0wm commented Aug 4, 2025

Confirmed the docs build locally and the new applet shows up. Let me know what you think!

@whitequark
Copy link
Member

I'll try to do a review today; was busy with other things, sorry!

Copy link
Member

@whitequark whitequark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I wrote over 2100 words of review comments, please don't take this to mean that I dislike your PR. On the contrary, I'm incredibly impressed: this is one of the nicest PRs from a first-time contributor I've seen for Glasgow. You've clearly taken the time to examine at existing code, understand its intent, and replicate it in your applet, even though there's basically no documentation, and without being explicitly directed by a reviewer. I would be very happy to have you as a repeat contributor, which is why I spent a few hours making sure we're on the same page going forward.

Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.

I'd say that the complexity is worth not having to dump half a megabyte over a slower-than-9600-baud interface alone (or maybe I just lack patience). Impatience aside, being able to do things like wait state polling on the FPGA is how Glasgow ends up being so much nicer than other tools, and is what the FPGA is there for. (Our probe-rs applet is one of the fastest, possibly the fastest, supported SWD debug probe, usually with a good margin too. This kind of thing is what I want the project to be known for in general.)

It is true that there is some increase in complexity due to having to replicate the nibble read state machine and its timeout handling, but you could manage that by refactoring the code a little, which I explain further in the comments below.

@@ -9,3 +9,4 @@ MCU debugging
:maxdepth: 3

arm7
superh
Copy link
Member

@whitequark whitequark Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the appropriate taxon is program even though it is technically using an interface with "Debug" in its name:

The program taxon groups applets implementing interfaces to memory technology devices (volatile and non-volatile) that are directly connected to programmable logic, such as a microcontroller or a gate array.

I tried to figure out how you would make a debugger using AUD-II and it's not clear to me that you can. The built-in modes are tracing and arbitrary read/write. You have a breakpoint module (UBC) that seems programmable via monitoring mode which could support single-stepping and breakpoints. There is however no way to get an arbitrary CPU register via AUD-II (or execute an arbitrary instruction, or even just get the value of R15), so you'd need help from the firmware: an ISR that dumps the CPU context into some predefined RAM area that is used for communicating with the debugger. Even then, UBC has no way to tell which breakpoint has fired on any particular cycle, so even though it supports data breakpoints you can't use it for watchpoints: you wouldn't know whether a breakpoint or a watchpoint has fired, which breaks single-stepping. You could work around that by adding an SH-2A emulator and an architecture model and evaluating every configured breakpoint/watchpoint against the post-breakpoint architectural state, if you wanted to create an abomination and had a vast excess of free time on your hands.

I think it's safe to say nobody will ever contribute a debugger for this chip.

from amaranth.lib.wiring import In, Out

from glasgow.abstract import AbstractAssembly, GlasgowPin
from glasgow.applet import GlasgowAppletV2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We leave two blank lines between top-level entities (like class or class and import block).

I really should set up automation for this; I'll actually do this later via https://pre-commit.ci.

data_pins = []
for i, pin in enumerate(self._ports.audata):
m.submodules[f"audata{i}_buffer"] = pin = io.Buffer("io", pin)
data_pins.append(pin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you need individual OE control, you don't need this: all Amaranth buffers are multi-pin capable, with shared OE.

Comment on lines +52 to +68
audata_o = Signal(4)
audata_oe = Signal(4)
audata_i = Signal(4)
audsync = Signal()
audrst = Signal()
audck = Signal()

# Connect the IO buffers to the signals
m.d.comb += m.submodules.audmd_buffer.o.eq(1) # Always in RAM monitoring mode
m.d.comb += m.submodules.audrst_buffer.o.eq(audrst)
m.d.comb += m.submodules.audck_buffer.o.eq(audck)
m.d.comb += m.submodules.audsync_buffer.o.eq(audsync)
m.d.comb += [
Cat(pin.oe for pin in data_pins).eq(audata_oe),
Cat(pin.o for pin in data_pins).eq(audata_o),
]
m.submodules += cdc.FFSynchronizer(Cat(pin.i for pin in data_pins), audata_i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this isn't a wrong way to implement this part of the applet, I would suggest the following as more compact and more idiomatic:

m.submodules.audsync_buffer = audsync_buffer = \
    io.Buffer("o", self._ports.audsync)

# ...

with m.FSM():
    # ...
        m.d.sync += audsync_buffer.o.eq(1)

There are two general styles in use in the codebase:

  1. With a *Bus class: in this style, an additional component called (e.g. in this case) AudBus would have audata_o: In(4); audata_i: Out(4); audsync: In(1); ... as its signature members. The *Bus class contains little to no logic, and is used immediately in some other component, which generates strobes, sequences, and so on.
  2. Without a *Bus class: in this style:
    a. Typically you would either use IOStreamer with a Enframer+Deframer+Controller architecture, or
    b. just an FSM for very simple protocols without high performance demands.

This applet looks like a mixture of (1) and (2b). I think we may actually have some applets like this in the codebase because it's sometimes impractical to fully refactor a complete but obscure applet when migrating to a new API model, but ideally we shouldn't. I think (2b) is probably the most appropriate here, but the code you have is also not wrong, just slightly unidiomatic.

(But also, see the other suggestions, which may supersede the ones in this comment.)

with m.If(self.i_stream.valid):
m.d.comb += self.i_stream.ready.eq(1)
m.d.sync += data.eq(self.i_stream.payload >> 4)
with m.If((self.i_stream.payload & 0xF) == AUDCommand.Reset):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could reduce repetition here by using

with m.Switch(self.i_stream.payload & 0xF):
    with m.Case(AUDCommand.Reset):


self.logger.info("Reading data")
bs = 4
with open(args.output, 'wb') as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use type=argparse.FileType('wb') for add_argument(), which causes argparse itself to ensure that the filename can be opened. It also saves you having to manage this resource yourself, which can get annoying with more complex applets.

def add_build_arguments(cls, parser, access):
access.add_voltage_argument(parser)

access.add_pins_argument(parser, "audata", width=4, required=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In applets like this one, I suggest using required=True, default=True to avoid having to specify the pinout each time. This way, if you connect one of the harnesses to your DUT, you don't need to guess which pinout it is: it is the default one.

The order of the arguments that applets use is:

  • reset(s)
  • strap(s)
  • clock(s)
  • controls
  • data

This ordering is done per functional block, e.g. for RGMII you would have PHY reset, then RX control/data, then TX control/data (inputs before outputs).

For this applet, I suggest:

  • AUDRST#
  • AUDMD
  • AUDCK
  • AUDSYNC
  • AUDATA


@synthesis_test
def test_build(self):
self.assertBuilds(self.hardware_args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a replay-hardware-in-loop test using @applet_v2_hardware_test(mocks=["aud_iface"]. I don't have the (fairly esoteric) hardware you're using, so if I'm doing a codebase-wide refactoring, I would have little ability to check if I broke it.

The way this decorator works is that the first time you run the test, it talks to your hardware and records the function calls done on the mocked class. The second time you run the test, it ensures the same arguments are passed, and provides the same return values. Provided no actual algorithms change, this gives a quick and easy way to validate that the applet is still functional, and permit a (limited) degree of refactorings, avoiding fossilization of the codebase.

with m.If(self.o_stream.ready):
m.next = "RECV-COMMAND"

return m
Copy link
Member

@whitequark whitequark Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general comments on the AUDComponent:

  1. You're using all of the pins synchronously; also, the bus itself is fully synchronous except for AUDRST# (and AUDMD). I think you can use io.FFBuffer and drop FFSynchronizer (since AUDATA isn't asynchronous to your sampling clock).
  2. This protocol includes wait states ("Not Ready" output). Polling for the end of a wait state in software is one of the worst inefficiencies a tool like Glasgow could include. I think you should, at the very least, have a command that toggles AUDCK and polls AUDATA until it becomes non-zero and then returns that value to the host. The timeout would become an input to the component, which is tied to a register in the AUDInterface class. If you implement polling in gateware, you will be able to submit a sequence of reads as a single bulk operation, without spending USB roundtrips on bus synchronization.
  3. I took a careful look at the entire §21 and it appears that both the tracing mode and the monitor mode were designed to allow a byte-oriented implementation: for tracing, last nibble of CMD1 and first nibble of CMD2 form a header byte, and for monitoring, command/direction nibble and zeroes form a header byte. This means that an implementation using octet-based communication with the host is feasible, provided that wait state polling is implemented in hardware. Being able to use fast primitives like bytearray concatenation, int.{to,from}_bytes, etc without having to pack nibbles in Python is very good for performance; this applet deals with enough data that it starts to really matter. If you implement byte-packing in hardware, you will be able to retrieve the result of a sequence of reads as a single await self._pipe.read(size * count) operation, which is dramatically faster for large quantities of data.
  4. I suggest at least making a small stream-based component that takes two commands: "read nibble with given sync" and "write nibble with given sync", and encapsulates toggling the clock. The AUDComponent will then only take care of command handling, byte-packing, as well as wait state polling.
  5. Since it is so similar to reads and quite easy to implement and test, I think it would be nice if you added write support for completeness. Also, writes allow you to set up a predictable state in RAM for hardware-in-the-loop testing (so you don't have to modify your assertions if you have a different firmware image).

Of these suggestions, (1) and (2) are I'd say required for a high-quality implementation, while (3), (4), and (5) would be nice but I would not reject the applet if you choose not to implement them.

await self.aud_iface.init()

self.logger.info("Reading data")
bs = 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you do not ensure that the address provided by the user is longword-aligned, this would have to be bs = 1, or you'll end up with bus errors. It's also fine to enforce that the address is aligned in def address(x):.

I assume there are some registers that can only be accessed in certain modes (byte/word/longword). This doesn't need to be surfaced in the CLI since the vast majority of people won't use it and those who do can use the AUDInterface class API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants