Towards a defined log/trace output from the Sail model #545

rsnikhil · 2024-09-09T13:28:06Z

This first comment just describes the issue(s). Follow-up comments
discuss possible solutions.

Currently the Sail model writes out a log file in ASCII format. Issues:

- Logs can become very large (a binary format would be much smaller).
- Downstream tools (verification, trace-driven simulation, ...) have
  to parse ASCII text to recreate the log data.
- We haven't been precise about defining a log format, and we update
  it as needed in an ad hoc fashion. As a result, downstream tools
  break when we make such changes.

rsnikhil · 2024-09-09T13:31:55Z

I have looked at 4 public RISC-V specifications for "trace/log", and
none of them seem suitable for a full trace output from the Sail
model.

I recommend the Sail model use a 5th alternative, "Bluespec trace",
which I will discuss in a follow-up comment.

1. RVI E-Trace ("Efficient Trace")
   https://github.com/riscv-non-isa/riscv-trace-spec/releases/download/v2.0.3/riscv-trace-spec-asciidoc.pdf
   From RVI; ratified
2. RVI N-Trace ("Nexus Trace") https://github.com/riscv/tg-nexus-trace
   From RVI; ratified (or nearing ratification)
3. RVFI ("RISC-V Formal Verification Interface")
   https://github.com/SymbioticEDA/riscv-formal From SymbioticEDA
4. RVVI ("RISC-V Verification Interface")
   https://github.com/riscv-verification/RVVI From Imperas originally
   (now part of Synopsys), but now public. Generated by the
   Imperas/Synopsys simulator. Used by OpenHWGroup.

E-Trace and N-Trace:

- focus on compressed control-flow trace (recording only
  "unpredictable" branch/jump/trap/xRET)
- rely on access to the original ELF file in order to reconstruct the
  full control trace
- are not suitable for self-modifying code/JIT where there is no ELF file
- do not capture arch-state updates (GPRs, FPRs, CSRs). E-Trace has a
  section for capturing memory updates.

RVFI and RVVI are similar (RVVI's README calls itself a kind of
superset of RVFI). They specify (nearly) all architectural state
updates that we'd need in a log, but they are not very suitable for
recording in a file, because it is VERY wide (e.g., all 32 GPR values,
all 32 FPR values, all defined CSR values, ...).

rsnikhil · 2024-09-09T13:33:15Z

Bluespec, Inc. has defined a trace model (PDF attached) which could be used by the Sail model.

Bluespec, Inc. is ready to contribute this spec to RVI as an open spec.

- It is a binary format
- Fields are optional, so can be dialled from minimal trace to verbose trace
- Can capture all standard arch state ("GC") updates (not yet spec'd for Vector)
- Can capture traps/interrupts
- Can capture additional intermediate state (e.g., CSR attempted and actual update,
  AMO pre- and post- values, ...)

This has been in use for some years inside Bluespec, Inc. for a number
of HW designs and simulators.

2020-03-10_trace-protocol.pdf

rsnikhil · 2024-09-09T13:42:17Z

Bluespec, Inc. is also happy to contribute to RVI a C program (which already exists) which reads and parses the binary trace and prints it in human-readable format.

This could also be restructured into a read-and-parse library + a human-readable-printer, so that the former can be used in other tools that need to read the trace.

PeterSewell · 2024-09-09T13:50:52Z

What format does the TestRig stuff use? Peter


PeterRugg · 2024-09-09T14:38:09Z

@PeterSewell In TestRIG we use RVFI, as documented here https://github.com/CTSRD-CHERI/TestRIG/blob/master/RVFI-DII.md#rvfi-dii-execution-packet-88-bytes. I guess a pro is that it's supported by the model already, though may need a little work to ensure there's nothing TestRIG specific.

but they are not very suitable for
recording in a file, because it is VERY wide (e.g., all 32 GPR values,
all 32 FPR values, all defined CSR values, ...)

That's certainly not true for RVFI as we implement it. Maybe that's new with RVVI? I can't see anything like that in https://github.com/SymbioticEDA/riscv-formal/blob/master/docs/rvfi.md either, except it does have optional ports per supported CSR, since lots of them can be updated as side-effects. Each packet is 88 bytes, though some updates, most notably CSR writes, are not directly reported in that trace.

Fields are optional, so can be dialled from minimal trace to verbose trace

That's a neat feature. We currently don't have that for TestRIG, and it would break the binary format to support.

Bluespec, Inc. is also happy to contribute to RVI a C program (which already exists) which reads and parses the binary trace and prints it in human-readable format.

We parse RVFI packets and print them in https://github.com/CTSRD-CHERI/QuickCheckVEngine/blob/master/src/QuickCheckVEngine/RVFI_DII/RVFI.hs, though that's in Haskell, and is mostly focussed on checking equivalence between two traces.

rsnikhil · 2024-09-09T15:23:13Z

Thanks for the link for TestRIG's RVFI spec (I had not seen that before).

That's certainly not true for RVFI as we implement it

You're right. In RVFI, a GPR update is defined as:

output [NRET *    5 - 1 : 0] rvfi_rd_addr
output [NRET * XLEN - 1 : 0] rvfi_rd_wdata

with 'rvfi_rd_addr' == 0 if Rd is not updated, (and NRET>1 only for superscalar retirement).

But it's still an XLEN+5-wide output even if Rd is not updated (signalled by 'rvfi_rd_addr' == 0)

For CSRs, there seem to be four XLEN-wide buses for each defined CSR.

In RVVI (file rvviTrace.sv), they have:

// X Registers
    wire [31:0][(XLEN-1):0]   x_wdata    [(NHART-1):0][(RETIRE-1):0];   // X data value
    wire [31:0]               x_wb       [(NHART-1):0][(RETIRE-1):0];   // X data writeback (change) flag

// Control and Status Registers
    wire [4095:0][(XLEN-1):0] csr        [(NHART-1):0][(RETIRE-1):0];   // Full CSR Address range
    wire [4095:0]             csr_wb     [(NHART-1):0][(RETIRE-1):0];   // CSR writeback (change) flag

i.e., all 32 registers, each with a valid bit (and similarly for FPRs), and all 4096 possible CSRs.
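To put rough numbers on the width difference, here is a back-of-envelope sketch. It counts only the four RVVI signals quoted above (per hart, per retirement slot) and the RVFI `rd` pair, so it is a lower bound for RVVI; the function names are made up for illustration.

```python
# Width of the RVVI signals quoted above, per hart and per retirement
# slot. FPRs, vector registers and every other RVVI signal are left out,
# so the real interface is wider than this.

def rvvi_bits(xlen: int) -> int:
    x_wdata = 32 * xlen      # wire [31:0][(XLEN-1):0] x_wdata
    x_wb = 32                # wire [31:0] x_wb
    csr = 4096 * xlen        # wire [4095:0][(XLEN-1):0] csr
    csr_wb = 4096            # wire [4095:0] csr_wb
    return x_wdata + x_wb + csr + csr_wb

def rvfi_rd_bits(xlen: int, nret: int = 1) -> int:
    # rvfi_rd_addr is NRET*5 bits wide, rvfi_rd_wdata is NRET*XLEN bits.
    return nret * (5 + xlen)

if __name__ == "__main__":
    for xlen in (32, 64):
        bits = rvvi_bits(xlen)
        print(f"XLEN={xlen}: RVVI X+CSR signals = {bits} bits "
              f"({bits // 8} bytes); RVFI rd = {rvfi_rd_bits(xlen)} bits")
```

For XLEN=64 the RVVI subset alone comes to 268320 bits (33540 bytes) per retired instruction if dumped verbatim, which is why a change-only encoding matters for a log file.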

rsnikhil · 2024-09-09T15:25:31Z

We parse RVFI packets and print them in though that's in Haskell

(I like that! but) I think the Sail model audience's preference for such a reader is likely to be C and Python.

allenjbaum · 2024-09-09T18:45:08Z

What we want from SAIL is, for each instruction executed:

- The instruction fetch address and data, along with any implicit memory accesses (PTEs) with their level and whether each is a read or write (A/D bit accesses). There is currently a corner case in Sail when there is an instruction access fault, where CSR updates aren't exposed. The disassembled instruction is rather invaluable for debugging, but might make coverage easier to write.
- Each register/CSR number of any explicit (e.g. Rd) or implicit (generally CSR side effects) writes, along with the value update. Note that this has to deal with multiple registers being updated by an op (unaligned ld/st, register pairs, vector load/store), and all register types (Xregs, Vregs, CSRs).
- All data memory read or write addresses. Note that this has to deal with multiple accesses (as above).

We certainly don't want to see all state at each instruction boundary, just the changes. The Sail log also outputs mode changes (interrupts/exceptions/returns). We can probably figure that out by looking at the CSR updates (e.g. xCause being written), but there are corner cases that make it difficult. NRET is probably not applicable for SAIL.


jrtc27 · 2024-09-09T18:51:03Z

For RVFI it's worth noting that what's in the spec is heavily aimed towards the physical hardware interface, where it's simpler to have a whole bunch of wires that may or may not be used. Having a smarter encoding of that same data for software would make a lot of sense (e.g. not encoding wdata if rd is 0).
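A minimal sketch of that idea, assuming a made-up byte layout (one byte for `rd_addr`, then XLEN/8 little-endian data bytes only when `rd_addr` is non-zero). Neither the layout nor the function names come from RVFI or TestRIG.

```python
def encode_rd_write(rd_addr: int, rd_wdata: int, xlen: int = 64) -> bytes:
    """Encode one GPR write; the data word is omitted when rd_addr is 0."""
    if not 0 <= rd_addr < 32:
        raise ValueError("rd_addr must be in 0..31")
    header = bytes([rd_addr])
    if rd_addr == 0:
        return header                      # nothing written, no wdata
    return header + rd_wdata.to_bytes(xlen // 8, "little")

def decode_rd_write(buf: bytes, offset: int = 0, xlen: int = 64):
    """Return (rd_addr, rd_wdata or None, next_offset)."""
    rd_addr = buf[offset]
    offset += 1
    if rd_addr == 0:
        return 0, None, offset
    nbytes = xlen // 8
    value = int.from_bytes(buf[offset:offset + nbytes], "little")
    return rd_addr, value, offset + nbytes
```

With this layout an instruction that writes no GPR costs one byte instead of a full XLEN+5-bit field.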

PeterSewell · 2024-09-09T19:00:21Z

@Alasdair Armstrong ***@***.***> should we just use (a handy representation of) new-interface events?


allenjbaum · 2024-09-09T19:02:35Z

I completely agree. But you still need a tag to identify the register number and which kind of register it is (currently Xreg, Vreg, and CSR; soon maybe Matrix regs). That's just now, and we'll probably need to look ahead to 48/64b instructions, where the number of Vregs might expand, there may be separate Mask regs, etc. So, right now it's a minimal 5-bit reg# + 2-bit tag, though you could compress out the tag to a single bit if you're clever. Not sure it is worth the bother. It might be easiest to just define an 8b tag to simplify implementation.
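One way the "just use an 8b tag" option could look, as a sketch only: a one-byte register-kind tag followed by a 16-bit index, which leaves room for the 4096 CSR numbers and for new kinds. The tag values and the 16-bit index width are assumptions made here, not something proposed in the thread.

```python
from enum import IntEnum

class RegKind(IntEnum):
    # Hypothetical tag values; the thread does not assign any numbers.
    X = 0
    V = 1
    CSR = 2

def pack_reg_id(kind: RegKind, index: int) -> bytes:
    """One tag byte followed by a 16-bit little-endian register index."""
    if not 0 <= index < 1 << 16:
        raise ValueError("index does not fit in 16 bits")
    return bytes([kind]) + index.to_bytes(2, "little")

def unpack_reg_id(buf: bytes):
    """Inverse of pack_reg_id for a 3-byte buffer."""
    return RegKind(buf[0]), int.from_bytes(buf[1:3], "little")
```

A whole byte for the tag wastes a few bits compared with the 5+2-bit packing, but adding a Matrix or Mask kind later is just a new enum value.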


Timmmm · 2024-09-15T21:07:53Z

I had a brief skim through Bluespec's format. It looks pretty well designed. I don't think we can use it as-is though, and I would favour an encoding that was more standard - I think Protobuf is probably the best choice.

I think there are three things that have to happen:

1. We need to complete #449, "Implementing callbacks in sail-riscv for state-changing events" (state callbacks).
2. We need to define the semantics of what is captured in the binary log.
3. We need to define the encoding.

With regard to the semantics, I can comment on some random things I've learned from my experience with Codasip's system:

1. Registers aren't just 32/64-bit. Vector registers can be very long. There's also CHERI (65 or 129 bits). Basically you need to be able to store an arbitrarily long bit vector.
2. You need to be able to store any number of register writes and memory accesses in each step.
3. Store register identities by index and type (i.e. X, 5, CSR, 53, V, 14) not by name (mstatus, x3).
4. Our system doesn't do this, but it would be extremely helpful if register writes also have a flag to say whether or not the value written to registers actually changed them. Otherwise you find yourself having to align models on things like whether fflags gets recorded as a register write in different situations.
5. It's very helpful to record physical and virtual addresses for each memory operation, and also PMAs but that's a little trickier since they aren't standard.
6. We calculate architectural coverage from the log file, however one thing that makes it difficult is that it doesn't store register input values for instructions. Technically that could be redundant information - you can maybe go back in the trace and figure it out - but that's a bit painful. It would be much nicer if you could just record that information directly in each step.
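To illustrate why a "did this write actually change the register" flag helps when aligning two models, here is a small sketch of a per-step comparison. The `RegWrite` record and `steps_match` function are hypothetical, and the sketch assumes at most one write per register per step.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RegWrite:
    kind: str                # register type, e.g. "X", "CSR", "V"
    index: int               # register number within that type
    value: bytes             # arbitrarily long value
    unchanged: bool = False  # True if the write left the value as it was

def steps_match(writes_a, writes_b) -> bool:
    """Compare the register writes of one step from two traces.

    A write present in only one trace is tolerated if it is flagged
    as unchanged; writes present in both must carry equal values.
    """
    by_reg_a = {(w.kind, w.index): w for w in writes_a}
    by_reg_b = {(w.kind, w.index): w for w in writes_b}
    for key in by_reg_a.keys() | by_reg_b.keys():
        wa, wb = by_reg_a.get(key), by_reg_b.get(key)
        if wa is not None and wb is not None:
            if wa.value != wb.value:
                return False
        else:
            present = wa if wa is not None else wb
            if not present.unchanged:
                return False
    return True
```

For example, if one model logs a write to fflags (CSR 0x001) that leaves it unchanged and the other logs nothing, the step still compares equal as long as the write carries the flag.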

allenjbaum · 2024-09-15T23:06:41Z

I believe ISAC keeps a running "cache" of register values (of all types, based on the immediately previous write) and uses that if necessary. I'd object if we had to expand the size of the log with that information, because it would expand it quite a lot. I hadn't thought about exposing the VA; that can be extracted from the running log by using the previous value of the base reg and offset, at the expense of parsing the opcode (which SAIL already does for you in the disassembly, right?). I'm not sure where the 53 comes from in the above: there are 4K potential indices, plus the indirect ones (which can be kept track of by the previous write of the select CSR). I'm not sure where the 14 comes from either: there are 32 vector registers.
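The running-cache approach can be sketched as follows. This is not ISAC's code; the step layout (dicts with `"reads"` and `"writes"` keys) is invented for the example.

```python
def replay_sources(steps):
    """Yield (step, source_values) using only earlier register writes.

    Each step is a dict with hypothetical keys:
      "reads":  list of (kind, index) pairs the instruction uses as operands
      "writes": list of (kind, index, value) triples the instruction performed
    A source value of None means the register was never written earlier
    in the trace, so its value cannot be recovered from the log alone.
    """
    cache = {}
    for step in steps:
        # Look up operands before applying this step's own writes.
        sources = {reg: cache.get(reg) for reg in step["reads"]}
        yield step, sources
        for kind, index, value in step["writes"]:
            cache[(kind, index)] = value
```

The `None` case is where the log by itself falls short: operands that were last set by reset or by anything outside the trace have to come from somewhere else.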


Timmmm · 2024-09-16T08:05:19Z

Yes all this stuff can often be figured out from the previous events, but it makes it a fair bit more complex and fragile. But I agree with your approach that most things should be optional so we can just make this information optional too if people want to make that trade-off.

By the way I forgot one extra thing:

You need a "reset" event, and that event should include the writes due to reset. The trace should begin with one of those events, and the first one should include the values of all registers.

Im not sure where the 53 comes from in the above

Those were just examples of register indices.

allenjbaum · 2024-09-16T16:49:18Z

RISC-V explicitly limits reset values to be undefined unless otherwise stated, and there are very few registers that have defined reset values. Implementations, on the other hand, almost always define the reset value, and (in the ACT case at least) the config file can list those reset values. The intent is that SAIL read the config file and initialize those reset values (or leave them undefined), so those don't need to be output in the log; they're constants. Boot code may want/need to explicitly initialize otherwise uninitialized CSRs, and in that case the results of that code would naturally be output in the trace.


Timmmm · 2024-09-16T20:12:27Z

and there are very few registers that have defined reset values.

Indeed. But the ones that are should be in the log. (Incidentally the way the model handles resets at the moment is a bit of a mess, but I have some code to clean that up that I'll send when I get around to it).

The intent is that SAIL read the config file and initialize those reset
values (or leave them undefined), so those don't need to be output in the
log. - they're constants.

Sail can't leave them undefined - at least the simulator output has no concept of undefined like SystemVerilog's x. And again while you may be able to reconstruct their values from the config file, it's much more convenient if they are in the log file directly. It also means interoperability with other systems that want to use this file format is much better.

allenjbaum · 2024-09-16T21:29:51Z

Huh - Sail really should have a concept like "X" in logic simulators (for this, anyway). But I don't think we need anything in the log that we can't easily deduce without it. ISAC could even check to see if a register was used as an operand before being written if that were important. Anyone who has access to the config file (and everyone must in order to run the ACTs) will easily be able to query for their value. I don't understand the interoperability issues though.


Alasdair · 2024-09-16T22:25:16Z

undefined in Sail is similar to X in Verilog, but it propagates more precisely. For constructing an executable simulator, however, each unknown bit will generally just become 0, so undefined values are mostly useful for theorem provers and symbolic execution. Most of the languages we want to target don't natively have 4-value logic like Verilog, so we have to be more conservative.

allenjbaum · 2024-09-16T22:27:13Z

That should work for our purposes.


Timmmm · 2024-09-17T12:09:19Z

Anyone who has access to the config file (and everyone must in order to run the ACTs) will easily be able to query for their value. I don't understand the interoperability issues though.

Right but I think this format should be usable beyond the Sail model, and in other systems that may generate a trace they won't necessarily be using the same config format. In any case it simplifies things with no downside that I can see.

Here is my rough suggestion. I think it has most of the things from our format and the Bluespec format. Pretty much everything can be optional.

It won't encode quite as efficiently as the Bluespec format but it should still be quite efficient, and has the advantage of being a standard format. Nobody has to write any encoder/decoder code.

It's missing PMAs (probably not necessary initially), a reset event, NMIs, and debug entry (needed if debug is entered due to an external event).

syntax = "proto3";

enum Privilege {
    User = 0;
    Supervisor = 1;
    Machine = 2;
    Debug = 3;
}

message Event {
    // Optional sequential event ID starting at 0.
    optional uint64 id = 1;

    // Optional clock count (`mcycle`) for this event.
    optional uint64 cycle = 2;

    // Privilege mode for this event.
    optional Privilege privilege = 3;

    // Set to true for CHERI cores in Capability Pointer mode.
    bool capability_pointer_mode = 4;

    // Optional PC (virtual address). May not be set for some events, e.g. debug.
    optional uint64 pc = 5;

    // Optional opcode. May not be set for some events, e.g. interrupts.
    optional uint32 opcode = 6;

    // Optional details about a trap (if any) for this event.
    optional Trap trap = 7;

    // Memory reads and writes. AMOs are logged as a read and write.
    repeated MemoryOperation memReads = 8;
    repeated MemoryOperation memWrites = 9;

    // Register reads and writes.
    repeated RegisterAccess regReads = 10;
    repeated RegisterAccess regWrites = 11;
}

message Trap {
    uint32 cause = 1;
    bool is_interrupt = 2;
}

enum RegisterType {
    X = 0;
    F = 1;
    V = 2;
    CSR = 3;
}

message RegisterAccess {
    RegisterType type = 1;
    uint32 index = 2;
    bytes value = 3;

    // Optional CHERI tag.
    bool tag = 4;

    // Can be optionally set to `true` to indicate the value wasn't actually
    // changed by this write. This means when doing tandem verification that
    // it won't be an error for the other trace to not include a write to
    // this register.
    bool unchanged = 5;
}

message MemoryOperation {
    optional uint64 virtual_address = 1;
    optional uint64 physical_address = 2;
    optional bytes data = 3;
}
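One detail the schema above leaves open is how many `Event` messages are stored in a single log file: Protobuf messages are not self-delimiting, so a stream of them needs some framing. A common convention is to prefix each serialized message with its length as a varint. The sketch below shows that framing over opaque byte payloads, without depending on the protobuf library; the function names are made up, and using this framing for the Sail log is an assumption, not something decided in this thread.

```python
import io

def write_varint(out, value: int) -> None:
    """Write an unsigned integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            out.write(bytes([byte]))
            return

def read_varint(inp):
    """Read one varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        raw = inp.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7

def write_record(out, payload: bytes) -> None:
    """Append one length-prefixed record (e.g. a serialized Event)."""
    write_varint(out, len(payload))
    out.write(payload)

def read_records(inp):
    """Yield each record payload until the stream ends."""
    while True:
        length = read_varint(inp)
        if length is None:
            return
        payload = inp.read(length)
        if len(payload) != length:
            raise EOFError("truncated record")
        yield payload
```

A reader can then hand each payload to the generated `Event` parser in whatever language it uses, and can skip records without decoding them.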

allenjbaum · 2024-09-17T15:07:21Z

This seems to be missing information that the existing log has (or that we need added), e.g. (off the top of my head):

- On a trap, from which mode to which mode? (I would make "Mode" optional, and only list it when it changes, even if the change is to the same mode, e.g. trap M->M.)
- On a memory reference: is it implicit or explicit, and what kind (e.g. ifetch, PTE fetch level L)? The "L" is important. (You do list the PC, but it is unclear which PC, and none of the PTE fetches that lead up to it.)
- On a memory reference, is it a VA, GVA, or PA?

This will cause a bit of churn, primarily in ISAC and maybe in CTG (I don't know enough about that), but should make it easier to maintain. You'll eventually need to add a matrix type.


arichardson · 2024-09-17T15:11:56Z

I like the idea of using protobuf here since it is a nicely extensible format. It's also quite dense when encoded and has bindings for lots of languages. And if you don't have a binding you can always convert it to text format and parse that...

Basically that's what I would have liked for the RVFI trace format v2 that I added, but I didn't know of any format that was nicely supported in Haskell, C, C++ (and OCaml, although with the simulator going away that is less of an issue), so I went for a minimal extension of the existing RVFI trace.

Alasdair · 2024-09-17T19:50:25Z

Internally we have the following type for memory read operations (and a similar one for writes)

enum Access_variety = {
  AV_plain,
  AV_exclusive,
  AV_atomic_rmw
}

enum Access_strength = {
  AS_normal,
  AS_rel_or_acq, // Release or acquire
  AS_acq_rcpc // Release-consistency with processor consistency
}

struct Explicit_access_kind = {
  variety : Access_variety,
  strength : Access_strength
}

union Access_kind('arch_ak : Type) = {
  AK_explicit : Explicit_access_kind,
  AK_ifetch : unit, // Instruction fetch
  AK_ttw : unit, // Translation table walk
  AK_arch : 'arch_ak // Architecture specific type of access
}

struct Mem_read_request('n : Int, 'vasize : Int, 'pa : Type, 'ts : Type, 'arch_ak: Type), 'n > 0 = {
  access_kind : Access_kind('arch_ak),
  // There may not always be a virtual address, e.g. when translation is off.
  // Additionally, translate reads don't have a (VA, PA) pair in the
  // translation relation anyway.
  va : option(bits('vasize)),
  pa : 'pa,
  translation : 'ts,
  size : int('n),
  tag : bool
}

and there are other types for things like cache operations, etc. It seems like that could be translated into some kind of protobuf message format?
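As a rough illustration of such a translation, here is a sketch that flattens a simplified read request into the kind of optional fields a log message could carry. The Python types only loosely mirror the Sail ones above (variety, strength, translation state and the architecture-specific kind are dropped), and the output field names are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AccessKind(Enum):
    EXPLICIT = "explicit"   # AK_explicit
    IFETCH = "ifetch"       # AK_ifetch
    TTW = "ttw"             # AK_ttw (translation table walk)

@dataclass
class MemRead:
    access_kind: AccessKind
    pa: int
    size: int
    va: Optional[int] = None   # absent, e.g. when translation is off
    tag: bool = False

def to_log_fields(req: MemRead) -> dict:
    """Flatten a read request, leaving out fields that carry no information."""
    fields = {"kind": req.access_kind.value, "pa": req.pa, "size": req.size}
    if req.va is not None:
        fields["va"] = req.va
    if req.tag:
        fields["tag"] = True
    return fields
```

Carrying the access kind in the log record would also answer the earlier question of whether a memory reference is an ifetch, a PTE fetch or an explicit access.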

Timmmm · 2024-10-23T00:06:17Z

Learned about this today:

https://github.com/sparcians/stf_lib

I haven't looked into it at all yet but it may be worth checking out. We could of course support multiple trace formats if we want.
