Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Towards a defined log/trace output from the Sail model #545

Open
rsnikhil opened this issue Sep 9, 2024 · 24 comments
Open

Towards a defined log/trace output from the Sail model #545

rsnikhil opened this issue Sep 9, 2024 · 24 comments

Comments

@rsnikhil
Copy link
Collaborator

rsnikhil commented Sep 9, 2024

This first comment just describes the issue(s). Follow-up comments
discuss possible solutions.

Currently the Sail model writes out a log file in ASCII format. Issues:

  • Logs can become very large (a binary format would be much smaller)

  • Downstream tools (verification, trace-driven simulation, ...) have
    to parse ASCII text to recreate the log data

  • We haven't been precise about defining a log format, and we update
    it as needed in an ad hoc fashion. As a result downstream tools
    break when we make such changes.

@rsnikhil
Copy link
Collaborator Author

rsnikhil commented Sep 9, 2024

I have looked at 4 public RISC-V specifications for "trace/log", and
none of them seem suitable for a full trace output from the Sail
model.

I recommend the Sail model use a 5th alternative, "Bluespec trace",
which I will discuss in a follow-up comment.

  1. RVI E-Trace ("Efficient Trace")
    https://github.com/riscv-non-isa/riscv-trace-spec/releases/download/v2.0.3/riscv-trace-spec-asciidoc.pdf
    From RVI; ratified

  2. RVI N-Trace ("Nexus Trace") https://github.com/riscv/tg-nexus-trace
    From RVI; ratified (nor nearing ratification)

  3. RVFI ("RISC-V Formal Verification Interface")
    https://github.com/SymbioticEDA/riscv-formal From SymbioticEDA

  4. RVVI ("RISC-V Verification Interface")
    https://github.com/riscv-verification/RVVI From Imperas originally
    (now part of Synopsys), but now public. Generated by the
    Imperas/Synopsys simulator. Used by OpenHWGroup.

E-Trace and N-Trace:

  • focus on compressed control-flow trace (recording only
    "unpredictable" branch/jump/trap/xRET)

  • rely on access to the original ELF file in order to reconstruct the
    full control trace

  • are not suitable for self-modifying code/JIT where there is no ELF file

  • do not capture arch-state updates (GPRs, FPRs, CSRs). E-Trace has a
    section for capturing memory updates.

RVFI and RVVI are similar (RVVI's README calls itself a kind of
superset of RVFI). They specify (nearly) all architectural state
updates that we'd need in a log, but they are not very suitable for
recording in a file, because it is VERY wide (e.g., all 32 GPR values,
all 32 FPR values, all defined CSR values, ...).

@rsnikhil
Copy link
Collaborator Author

rsnikhil commented Sep 9, 2024

Bluespec, Inc. has defined a trace model (PDF attached) which could be used by the Sail model.

Bluespec, Inc. is ready to contribute this spec to RVI as an open spec.

  • It is a binary format
  • Fields are optional, so can be dialled from minimal trace to verbose trace
  • Can capture all standard arch state ("GC") updates (not yet spec'd for Vector)
  • Can capture traps/interrupts
  • Can capture additional intermediate state (e.g., CSR attempted and actual update,
    AMO pre- and post- values, ...)

This has been in use for some years inside Bluespec, Inc. for a number
of HW designs and simulators.

2020-03-10_trace-protocol.pdf

@rsnikhil
Copy link
Collaborator Author

rsnikhil commented Sep 9, 2024

Bluespec, Inc. is also happy to contribute to RVI a C program (which already exists) which reads and parses the binary trace and prints it in human-readable format.

This could also be restructured into a read-and-parse library + a human-readable-printer, so that the former can be used in other tools that need to read the trace.

@PeterSewell
Copy link
Collaborator

PeterSewell commented Sep 9, 2024 via email

@PeterRugg
Copy link
Contributor

@PeterSewell In TestRIG we use RVFI, as documented here https://github.com/CTSRD-CHERI/TestRIG/blob/master/RVFI-DII.md#rvfi-dii-execution-packet-88-bytes. I guess a pro is that it's supported by the model already, though may need a little work to ensure there's nothing TestRIG specific.

but they are not very suitable for
recording in a file, because it is VERY wide (e.g., all 32 GPR values,
all 32 FPR values, all defined CSR values, ...)

That's certainly not true for RVFI as we implement it. Maybe that's new with RVVI? I can't see anything like that in https://github.com/SymbioticEDA/riscv-formal/blob/master/docs/rvfi.md either, except it does have optional ports per supported CSR, since lots of them can be updated as side-effects. Each packet is 88 bytes, though some updates, most notably CSR writes, are not directly reported in that trace.

Fields are optional, so can be dialled from minimal trace to verbose trace

That's a neat feature. We currently don't have that for TestRIG, and it would break the binary format to support.

Bluespec, Inc. is also happy to contribute to RVI a C program (which already exists) which reads and parses the binary trace and prints it in human-readable format.

We parse RVFI packets and print them in https://github.com/CTSRD-CHERI/QuickCheckVEngine/blob/master/src/QuickCheckVEngine/RVFI_DII/RVFI.hs, though that's in Haskell, and is mostly focussed on checking equivalence between two traces.

@rsnikhil
Copy link
Collaborator Author

rsnikhil commented Sep 9, 2024

Thanks for the link for TestRIG's RVFI spec (I had not seen that before).

That's certainly not true for RVFI as we implement it

You're right. In RVFI, a GPR update is defined as:

output [NRET *    5 - 1 : 0] rvfi_rd_addr
output [NRET * XLEN - 1 : 0] rvfi_rd_wdata

with 'rvfi_rd_addr' == 0 if Rd is not updated, (and NRET>1 only for superscalar retirement).

But it's still an XLEN+5-wide output even if Rd is not updated (signalled by 'rvfi_rd_addr' == 0)

For CSRs, there seems to be four XLEN-wide buses for each defined CSR.


In RVVI (file rvviTrace.sv), they have:

// X Registers
    wire [31:0][(XLEN-1):0]   x_wdata    [(NHART-1):0][(RETIRE-1):0];   // X data value
    wire [31:0]               x_wb       [(NHART-1):0][(RETIRE-1):0];   // X data writeback (change) flag

// Control and Status Registers
    wire [4095:0][(XLEN-1):0] csr        [(NHART-1):0][(RETIRE-1):0];   // Full CSR Address range
    wire [4095:0]             csr_wb     [(NHART-1):0][(RETIRE-1):0];   // CSR writeback (change) flag

i.e., all 32 registers, each with a valid bit (and similarly for FPRs), and all 4096 possible CSRs.

@rsnikhil
Copy link
Collaborator Author

rsnikhil commented Sep 9, 2024

We parse RVFI packets and print them in though that's in Haskell

(I like that! but) I think the Sail model audience's preference for such a reader is likely to be C and Python.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Sep 9, 2024 via email

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 9, 2024

For RVFI it's worth noting that what's in the spec is heavily aimed towards the physical hardware interface, where it's simpler to have a whole bunch of wires that may or may not be used. Having a smarter encoding of that same data for software would make a lot of sense (e.g. not encoding wdata if rd is 0).

@PeterSewell
Copy link
Collaborator

PeterSewell commented Sep 9, 2024 via email

@allenjbaum
Copy link
Collaborator

allenjbaum commented Sep 9, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Sep 15, 2024

I had a brief skim through Bluespec's format. It looks pretty well designed. I don't think we can use it as-is though, and I would favour an encoding that was more standard - I think Protobuf is probably the best choice.

I think there are three things that have to happen:

  1. We need to complete Implementing callbacks in sail-riscv for state-changing events #449 (state callbacks).
  2. We need to define the semantics of what is captured in the binary log.
  3. We need to define the encoding.

With regard to the semantics, I can comment on some random things I've learned from my experience with Codasip's system:

  1. Registers aren't just 32/64-bit. Vector registers can be very long. There's also CHERI (65 or 129 bits). Basically you need to be able to store an arbitrarily long bit vector.
  2. You need to be able to store any number of register writes and memory accesses in each step.
  3. Store register identities by index and type (i.e. X, 5, CSR, 53, V, 14) not by name (mstatus, x3).
  4. Our system doesn't do this, but it would be extremely helpful if register writes also have a flag to say whether or not the value written to registers actually changed them. Otherwise you find yourself having to align models on things like whether fflags gets recorded as a register write in different situations.
  5. It's very helpful to record physical and virtual addresses for each memory operation, and also PMAs but that's a little trickier since they aren't standard.
  6. We calculate architectural coverage from the log file, however one thing that makes it difficult is that it doesn't store register input values for instructions. Technically that could be redundant information - you can maybe go back in the trace and figure it out - but that's a bit painful. It would be much nicer if you could just record that information directly in each step.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Sep 15, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Sep 16, 2024

Yes all this stuff can often be figured out from the previous events, but it makes it a fair bit more complex and fragile. But I agree with your approach that most things should be optional so we can just make this information optional too if people want to make that trade-off.

By the way I forgot one extra thing:

  1. You need a "reset" event, and that event should include the writes due to reset. The trace should begin with one of those events, and the first one should include the values of all registers.

Im not sure where the 53 comes from in the above

Those were just examples of register indices.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Sep 16, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Sep 16, 2024

and there are very few registers that have defined reset values.

Indeed. But the ones that are should be in the log. (Incidentally the way the model handles resets at the moment is a bit of a mess, but I have some code to clean that up that I'll send when I get around to it).

The intent is that SAIL read the config file and initialize those reset
values (or leave them undefined), so those don't need to be output in the
log. - they're constants.

Sail can't leave them undefined - at least the simulator output has no concept of undefined like SystemVerilog's x. And again while you may be able to reconstruct their values from the config file, it's much more convenient if they are in the log file directly. It also means interoperability with other systems that want to use this file format is much better.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Sep 16, 2024 via email

@Alasdair
Copy link
Collaborator

undefined in Sail is similar X in Verilog, but it propagates more precisely. For constructing an executable simulator, each unknown bit will generally just become 0 however, so undefined values are mostly useful for theorem provers and symbolic execution. Most of the languages we want to target don't natively have 4-value logic like Verilog, so we have to be more conservative.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Sep 16, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Sep 17, 2024

Anyone who has access to the config file (and everyone must in order to run the ACTs) will easily be able to query for their value. I don't I understand the interoperability issues though.

Right but I think this format should be usable beyond the Sail model, and in other systems that may generate a trace they won't necessarily be using the same config format. In any case it simplifies things with no downside that I can see.


Here is my rough suggestion. I think it has most of the things from our format and the Bluespec format. Pretty much everything can be optional.

It won't encode quite as efficiently as the Bluespec format but it should still be quite efficient, and has the advantage of being a standard format. Nobody has to write any encoder/decoder code.

It's missing PMAs (probably not necessary initially), a reset event, NMIs, and debug entry (needed if debug is entered due to an external event).

syntax = "proto3";

enum Privilege {
    User = 0;
    Supervisor = 1;
    Machine = 2;
    Debug = 3;
}

message Event {
    // Optional sequential event ID starting at 0.
    optional uint64 id = 1;

    // Optional clock count (`mcycle`) for this event.
    optional uint64 cycle = 2;

    // Privilege mode for this event.
    optional Privilege privilege = 3;

    // Set to true for CHERI cores in Capability Pointer mode.
    bool capability_pointer_mode = 4;

    // Optional PC (virtual address). May not be set for some events, e.g. debug.
    optional uint64 pc = 5;

    // Optional opcode. May not be set for some events, e.g. interrupts.
    optional uint32 opcode = 6;

    // Optional details about a trap (if any) for this event.
    optional Trap trap = 7;

    // Memory reads and writes. AMOs are logged as a read and write.
    repeated MemoryOperation memReads = 8;
    repeated MemoryOperation memWrites = 9;

    // Register reads and writes.
    repeated RegisterAccess regReads = 10;
    repeated RegisterAccess regWrites = 11;
}

message Trap {
    uint32 cause = 1;
    bool is_interrupt = 2;
}

enum RegisterType {
    X = 0;
    F = 1;
    V = 2;
    CSR = 3;
}

message RegisterAccess {
    RegisterType type = 1;
    uint16 index = 2;
    bytes value = 3;

    // Optional CHERI tag.
    bool tag = 4;

    // Can be optionally set to `true` to indicate the value wasn't actually
    // changed by this write. This means when doing tandem verification that
    // it won't be an error for the other trace to not include a write to
    // this register.
    bool unchanged = 5;
}

message MemoryOperation {
    optional uint64 virtual_address = 1;
    optional uint64 physical_address = 2;
    optional bytes data = 3;
}

@allenjbaum
Copy link
Collaborator

allenjbaum commented Sep 17, 2024 via email

@arichardson
Copy link
Collaborator

I like the idea of using protobuf here since it is a nicely extensible format. It's also quite dense when encoded and has bindings for lots of languages. And if you don't have a binding you can always convert it to text format and parse that...

Basically that's what I would have liked for the RVFI trace format v2 that I added, but I didn't know of any format that was nicely supported in Haskell, C, C++ (and OCaml, although with the simulator going away that is less of an issue), so I went for a minimal extension of the existing RVFI trace.

@Alasdair
Copy link
Collaborator

Internally we have the following type for memory read operations (and a similar one for writes)

enum Access_variety = {
  AV_plain,
  AV_exclusive,
  AV_atomic_rmw
}

enum Access_strength = {
  AS_normal,
  AS_rel_or_acq, // Release or acquire
  AS_acq_rcpc // Release-consistency with processor consistency
}

struct Explicit_access_kind = {
  variety : Access_variety,
  strength : Access_strength
}

union Access_kind('arch_ak : Type) = {
  AK_explicit : Explicit_access_kind,
  AK_ifetch : unit, // Instruction fetch
  AK_ttw : unit, // Translation table walk
  AK_arch : 'arch_ak // Architecture specific type of access
}

struct Mem_read_request('n : Int, 'vasize : Int, 'pa : Type, 'ts : Type, 'arch_ak: Type), 'n > 0 = {
  access_kind : Access_kind('arch_ak),
  // There may not always be a virtual address, e.g. when translation is off.
  // Additionally, translate reads don't have a (VA, PA) pair in the
  // translation relation anyway.
  va : option(bits('vasize)),
  pa : 'pa,
  translation : 'ts,
  size : int('n),
  tag : bool
}

and their are other types for things like cache operations, etc. It seems like that could be translated into some kind of protobuf message format?

@Timmmm
Copy link
Collaborator

Timmmm commented Oct 23, 2024

Learned about this today:

https://github.com/sparcians/stf_lib

I haven't looked into it at all yet but it may be worth checking out. We could of course support multiple trace formats if we want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants