TZX updates - mostly for easier JSON parsing #990

Merged
merged 4 commits into from
Aug 19, 2024
Changes from 2 commits
56 changes: 44 additions & 12 deletions format/tap/tap.go
@@ -6,6 +6,7 @@ import (
"bufio"
"bytes"
"embed"
"fmt"

"golang.org/x/text/encoding/charmap"

@@ -54,9 +55,13 @@ func decodeTapBlock(d *decode.D) {
// read header, fragment, or data block
switch length {
case 0:
// fragment with no data
d.Fatalf("TAP fragments with 0 bytes are not supported")
case 1:
d.FieldRawLen("data", 8)
d.FieldStruct("data", func(d *decode.D) {
d.FieldArray("bytes", func(d *decode.D) {
Owner

I wonder if an alternative would be to improve how raw fields are turned into JSON? It's probably easiest to demo how it works at the moment. Here I use [1,2,3] | tobytes to get a raw field, aka a "binary":

$ fq -n -o bits_format=md5 '[1,2,3] | tobytes'
   │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d│0123456789abcd│
0x0│01 02 03│                                │...│          │.: raw bits 0x0-0x3 (3)
$ fq -n -o bits_format=base64 -V '[1,2,3] | tobytes'
"AQID"
$ fq -n -o bits_format=string -V '[1,2,3] | tobytes' # this is the default
"\u0001\u0002\u0003"

The alternative could be to add something like -o bits_format=array that turns a binary into a byte array, so it would look something like this:

$ fq -n -o bits_format=array '[1,2,3] | tobytes'
[1,2,3]

But maybe array is not a good name, maybe "bytes_array"? 🤔
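
To make the trade-off concrete, here is a rough standalone Go sketch of three ways the same three bytes could end up in JSON. It is not fq's implementation: the helper renderBinary and its mode strings are hypothetical names that only mirror the bits_format values discussed above, and the "string" mode assumes one code point per byte.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// renderBinary is a hypothetical helper (not part of fq). It returns the JSON
// text for raw bytes in one of three illustrative modes.
func renderBinary(b []byte, mode string) (string, error) {
	var v any
	switch mode {
	case "string":
		// Assumption: each byte becomes one code point, so control bytes
		// show up as \u00XX escapes in the JSON string.
		r := make([]rune, len(b))
		for i, c := range b {
			r[i] = rune(c)
		}
		v = string(r)
	case "base64":
		v = base64.StdEncoding.EncodeToString(b)
	case "bytes_array":
		// Widen to []int, because encoding/json base64-encodes a []byte.
		a := make([]int, len(b))
		for i, c := range b {
			a[i] = int(c)
		}
		v = a
	default:
		return "", fmt.Errorf("unknown mode %q", mode)
	}
	out, err := json.Marshal(v)
	return string(out), err
}

func main() {
	for _, mode := range []string{"string", "base64", "bytes_array"} {
		s, err := renderBinary([]byte{1, 2, 3}, mode)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-12s %s\n", mode, s)
	}
}
```

The array form is the only one of the three that a JSON consumer can index byte by byte without a decoding step, which is the convenience this thread is after.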

Contributor Author

I did see tobytes, but it seems to be useful only when you want to pull out a single field. For my tests I would just run fq -d tzx -V d file.tzx, but that messes up the JSON.

Is there a way to use that tobytes when generating the whole thing as a JSON?

Owner

Yep, the bits_format option controls how binary values should be represented as JSON (note that they store bits, not bytes, but are usually zero-bit padded to be byte aligned). I made a quick hack that adds a bytes_array option here: https://github.com/wader/fq/tree/bits-format-bytes-array

$ go run . -o bits_format=bytes_array -d tzx -V d format/tzx/testdata/basic_prog1.tzx
{
  "blocks": [
    <cut>
    {
      "pause": 1000,
      "tap": {
        "blocks": [
          {
            "data": {
              "checksum": 182,
              "data": [
                0,
                10,
                20,
                0,
                <cut>
              ],
              "flag": "standard_speed_data"
            },
            "length": 40
          }
        ]
      },
      "type": "standard_speed_data"
    }
  ],
  "major_version": 1,
  "minor_version": 20,
  "signature": [
    90,
    88,
    84,
    97,
    112,
    101,
    33,
    26
  ]
}

The tobytes function can be used to convert things (strings, arrays of numbers, etc.) into a binary. You can also use the explode function on a binary to get its bytes... there is even a tobits:

$ fq -cn '"åäö" | ., tobytes, tobits | explode'
[229,228,246] # codepoints in string, normal jq behaviour
[195,165,195,164,195,182] # bytes
[1,1,0,0,0,0,1,1,1,0,1,0,0,1,0,1,1,1,0,0,0,0,1,1,1,0,1,0,0,1,0,0,1,1,0,0,0,0,1,1,1,0,1,1,0,1,1,0] # bits

Sorry that all this is a bit undocumented. It's still a bit of a work in progress, but I haven't worked on it much lately.
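
For readers more at home in Go than jq, the three views of "åäö" in the explode example above can be reproduced with a small standalone sketch. The helper names are hypothetical and nothing here uses fq itself; it only shows code points vs. UTF-8 bytes vs. bits (most significant bit first).

```go
package main

import "fmt"

// codepoints returns the Unicode code points of s.
func codepoints(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, int(r))
	}
	return out
}

// bitsMSBFirst expands each byte into eight 0/1 values, most significant bit first.
func bitsMSBFirst(b []byte) []int {
	out := make([]int, 0, len(b)*8)
	for _, c := range b {
		for i := 7; i >= 0; i-- {
			out = append(out, int(c>>uint(i))&1)
		}
	}
	return out
}

func main() {
	// Escapes are used so the literal is unambiguous: å, ä, ö (precomposed).
	s := "\u00e5\u00e4\u00f6"

	fmt.Println(codepoints(s))           // code points in the string
	fmt.Println([]byte(s))               // UTF-8 encoded bytes
	fmt.Println(bitsMSBFirst([]byte(s))) // the same bytes as individual bits
}
```

Three characters become six bytes and 48 bits, which matches the lengths of the three arrays in the fq output.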

Contributor Author

No problem Mattias, and thanks for the info. I'll check it out later.

Owner

I think that would be best. It's a bit sad that JSON has no binary-safe type :( so how to represent it is a bit of a tradeoff; always using a byte array would probably be inconvenient with other formats.

But it's probably a good idea to document your use cases as examples.

Contributor Author

Hi Mattias, is this something I should leave to you, or do you want me to add it to my PR?

Regarding the naming: bytes_array seems reasonable.

Owner

Great 👍 I will review tomorrow, and you can leave it to me to do a proper bytes_array PR tomorrow with docs and tests and also

d.FieldU8("byte")
})
})
Owner

Should be d.FieldRawLen again?

Owner

Sorry, mis-read. Is the struct around it needed?

Contributor Author

@mrcook mrcook Aug 19, 2024

I wanted to wrap each type of block to be able to distinguish between header and data blocks.

A 1-byte data block has no flag or checksum byte, so it cannot be passed to decodeDataBlock() unless that function also handles the length. The struct is therefore still useful for identifying the block type, and it aids JSON parsing.

I suppose another block type could be introduced, but that seems overkill for what is actually a rare occurrence.

Edited to add: I know of no useful purpose for such a 1-byte block
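
As a rough illustration of the length-based dispatch being discussed, the following standalone Go sketch classifies a block by its length alone. It does not use fq's decode package; blockKind and classifyBlock are hypothetical names, and treating every length other than 0, 1, and 19 as a flag + data + checksum block is an assumption based on the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// blockKind is a hypothetical label for the shape of a TAP block.
type blockKind string

const (
	kindHeader blockKind = "header"
	kindData   blockKind = "data"
)

// classifyBlock returns the block kind and the number of payload bytes,
// i.e. the length without the flag and checksum bytes where those exist.
func classifyBlock(length uint64) (blockKind, uint64, error) {
	switch length {
	case 0:
		return "", 0, errors.New("TAP fragments with 0 bytes are not supported")
	case 1:
		// Rare case: a lone byte with no room for a flag or checksum.
		return kindData, 1, nil
	case 19:
		// flag + 17 header bytes + checksum
		return kindHeader, 17, nil
	default:
		// Assumed layout: flag + data + checksum (data may be empty).
		return kindData, length - 2, nil
	}
}

func main() {
	for _, n := range []uint64{0, 1, 2, 19, 40} {
		kind, payload, err := classifyBlock(n)
		if err != nil {
			fmt.Printf("length %d: error: %v\n", n, err)
			continue
		}
		fmt.Printf("length %d: %s block, %d payload byte(s)\n", n, kind, payload)
	}
}
```

Handling the 1-byte case before the default branch is what keeps length - 2 from underflowing on an unsigned length.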

Owner

Aha i see, makes sense

case 19:
d.FieldStruct("header", func(d *decode.D) {
decodeHeader(d)
@@ -72,15 +77,34 @@ func decodeTapBlock(d *decode.D) {
func decodeHeader(d *decode.D) {
blockStartPosition := d.Pos()

// Always 0: byte indicating a standard ROM loading header
d.FieldU8("flag", scalar.UintMapSymStr{0: "standard_speed_data"})
// flag indicating the type of header block, usually 0 (standard speed data)
d.FieldU8("flag", scalar.UintFn(func(s scalar.Uint) (scalar.Uint, error) {
if s.Actual == 0x00 {
s.Sym = "standard_speed_data"
} else {
s.Sym = "custom_data_block"
}
return s, nil
}))

// Header type
dataType := d.FieldU8("data_type", scalar.UintMapSymStr{
0x00: "program",
0x01: "numeric",
0x02: "alphanumeric",
0x03: "data",
})
dataType := d.FieldU8("data_type", scalar.UintFn(func(s scalar.Uint) (scalar.Uint, error) {
switch s.Actual {
case 0x00:
s.Sym = "program"
case 0x01:
s.Sym = "numeric"
case 0x02:
s.Sym = "alphanumeric"
case 0x03:
s.Sym = "data"
default:
// unofficial header types
s.Sym = fmt.Sprintf("unknown%02X", s.Actual)
}
return s, nil
}))

// Loading name of the program. Filled with spaces (0x20) to 10 characters.
d.FieldStr("program_name", 10, charmap.ISO8859_1)

@@ -120,7 +144,10 @@ func decodeHeader(d *decode.D) {
// UnusedWord: 32768.
d.FieldU16("unused")
default:
d.Fatalf("invalid TAP header type, got: %d", dataType)
// Unofficial header types
d.FieldU16("data_length")
d.FieldU16("unknown1", scalar.UintHex)
d.FieldU16("unknown2", scalar.UintHex)
}

// Simply all bytes XORed (including flag byte).
@@ -140,7 +167,12 @@ func decodeDataBlock(d *decode.D, length uint64) {
return s, nil
}))
// The essential data: length minus the flag/checksum bytes (may be empty)
d.FieldRawLen("data", int64(length-2)*8)
d.FieldArray("bytes", func(d *decode.D) {
for i := uint64(0); i < length-2; i++ {
d.FieldU8("byte")
}
})

// Simply all bytes (including flag byte) XORed
d.FieldU8("checksum", d.UintValidate(calculateChecksum(d, blockStartPosition, d.Pos()-blockStartPosition)), scalar.UintHex)
}
42 changes: 39 additions & 3 deletions format/tap/testdata/basic_prog1.fqtest
@@ -15,7 +15,43 @@ $ fq -d tap dv basic_prog1.tap
0x10| 28 00 | (. | length: 40 0x15-0x17 (2)
| | | data{}: 0x17-0x3f (40)
0x10| ff | . | flag: "standard_speed_data" (255) 0x17-0x18 (1)
0x10| 00 0a 14 00 20 f5 22 66| .... ."f| data: raw bits 0x18-0x3e (38)
0x20|71 20 69 73 20 74 68 65 20 62 65 73 74 21 22 0d|q is the best!".|
0x30|00 14 0a 00 ec 31 30 0e 00 00 0a 00 00 0d |.....10....... |
| | | bytes[0:38]: 0x18-0x3e (38)
0x10| 00 | . | [0]: 0 byte 0x18-0x19 (1)
0x10| 0a | . | [1]: 10 byte 0x19-0x1a (1)
0x10| 14 | . | [2]: 20 byte 0x1a-0x1b (1)
0x10| 00 | . | [3]: 0 byte 0x1b-0x1c (1)
0x10| 20 | | [4]: 32 byte 0x1c-0x1d (1)
0x10| f5 | . | [5]: 245 byte 0x1d-0x1e (1)
0x10| 22 | " | [6]: 34 byte 0x1e-0x1f (1)
0x10| 66| f| [7]: 102 byte 0x1f-0x20 (1)
0x20|71 |q | [8]: 113 byte 0x20-0x21 (1)
0x20| 20 | | [9]: 32 byte 0x21-0x22 (1)
0x20| 69 | i | [10]: 105 byte 0x22-0x23 (1)
0x20| 73 | s | [11]: 115 byte 0x23-0x24 (1)
0x20| 20 | | [12]: 32 byte 0x24-0x25 (1)
0x20| 74 | t | [13]: 116 byte 0x25-0x26 (1)
0x20| 68 | h | [14]: 104 byte 0x26-0x27 (1)
0x20| 65 | e | [15]: 101 byte 0x27-0x28 (1)
0x20| 20 | | [16]: 32 byte 0x28-0x29 (1)
0x20| 62 | b | [17]: 98 byte 0x29-0x2a (1)
0x20| 65 | e | [18]: 101 byte 0x2a-0x2b (1)
0x20| 73 | s | [19]: 115 byte 0x2b-0x2c (1)
0x20| 74 | t | [20]: 116 byte 0x2c-0x2d (1)
0x20| 21 | ! | [21]: 33 byte 0x2d-0x2e (1)
0x20| 22 | " | [22]: 34 byte 0x2e-0x2f (1)
0x20| 0d| .| [23]: 13 byte 0x2f-0x30 (1)
0x30|00 |. | [24]: 0 byte 0x30-0x31 (1)
0x30| 14 | . | [25]: 20 byte 0x31-0x32 (1)
0x30| 0a | . | [26]: 10 byte 0x32-0x33 (1)
0x30| 00 | . | [27]: 0 byte 0x33-0x34 (1)
0x30| ec | . | [28]: 236 byte 0x34-0x35 (1)
0x30| 31 | 1 | [29]: 49 byte 0x35-0x36 (1)
0x30| 30 | 0 | [30]: 48 byte 0x36-0x37 (1)
0x30| 0e | . | [31]: 14 byte 0x37-0x38 (1)
0x30| 00 | . | [32]: 0 byte 0x38-0x39 (1)
0x30| 00 | . | [33]: 0 byte 0x39-0x3a (1)
0x30| 0a | . | [34]: 10 byte 0x3a-0x3b (1)
0x30| 00 | . | [35]: 0 byte 0x3b-0x3c (1)
0x30| 00 | . | [36]: 0 byte 0x3c-0x3d (1)
0x30| 0d | . | [37]: 13 byte 0x3d-0x3e (1)
0x30| b6| | .|| checksum: 0xb6 (valid) 0x3e-0x3f (1)
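
The "(valid)" marker on the checksum line can be double-checked by hand. Below is a minimal standalone Go sketch, assuming the rule stated in the diff comment (all bytes XORed, including the flag byte); xorChecksum and sampleBlock are hypothetical names, and the bytes are copied from the dump above (flag 0xff followed by the 38 data bytes).

```go
package main

import "fmt"

// sampleBlock is the flag byte plus the 38 data bytes from the dump above.
var sampleBlock = []byte{
	0xff, // flag
	0x00, 0x0a, 0x14, 0x00, 0x20, 0xf5, 0x22, 0x66,
	0x71, 0x20, 0x69, 0x73, 0x20, 0x74, 0x68, 0x65,
	0x20, 0x62, 0x65, 0x73, 0x74, 0x21, 0x22, 0x0d,
	0x00, 0x14, 0x0a, 0x00, 0xec, 0x31, 0x30, 0x0e,
	0x00, 0x00, 0x0a, 0x00, 0x00, 0x0d,
}

// xorChecksum XORs all bytes of b together, starting from zero.
func xorChecksum(b []byte) byte {
	var sum byte
	for _, c := range b {
		sum ^= c
	}
	return sum
}

func main() {
	// The result should equal the stored checksum byte, 0xb6.
	fmt.Printf("0x%02x\n", xorChecksum(sampleBlock))
}
```

XORing the stored checksum into the same running value yields zero, which is another common way to validate such a block in a single pass.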