TZX updates - mostly for easier JSON parsing #990

Merged: 4 commits, Aug 19, 2024

@mrcook mrcook commented Aug 15, 2024

While trying to test a bunch of tape images (over 21K files) I encountered some issues. Parsing the JSON output was also a bit annoying, so I made changes to make that easier.

  • quite a few tapes utilise unofficial TAP block types, which don't trouble the emulators, so I've added support for these.
  • all of the data fields have been changed to be an array of U8 values.
  • I've wrapped the TAP blocks in a FieldStruct so the JSON can be more easily parsed.
  • the Hardware block field names were too cryptic, so I've renamed them.

To make it easier to parse the JSON output the header/data blocks are
now wrapped in a FieldStruct, and data fields changed to be an array
of uint8 values
case 1:
d.FieldRawLen("data", 8)
d.FieldStruct("data", func(d *decode.D) {
d.FieldArray("bytes", func(d *decode.D) {

Owner

I wonder if an alternative to this would be to improve how raw fields are turned into JSON? Maybe it's easiest to demo how it works at the moment. Here I use [1,2,3] | tobytes to get a raw field, aka a "binary":

$ fq -n -o bits_format=md5 '[1,2,3] | tobytes'
   │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d│0123456789abcd│
0x0│01 02 03│                                │...│          │.: raw bits 0x0-0x3 (3)
$ fq -n -o bits_format=base64 -V '[1,2,3] | tobytes'
"AQID"
$ fq -n -o bits_format=string -V '[1,2,3] | tobytes' # this is the default
"\u0001\u0002\u0003"

An alternative could be to add something like -o bits_format=array that turns a binary into a byte array, so it would look something like this:

$ fq -n -o bits_format=array '[1,2,3] | tobytes'
[1,2,3]

but maybe array is not a good name, maybe "bytes_array"? 🤔
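
For readers who think in Go rather than fq, here is a minimal sketch of the three JSON representations discussed above, applied to the same three bytes. The helper names are hypothetical and this is not fq's implementation; in particular, mapping each byte to the code point of the same value is only an assumption about how the "string" form behaves, checked here for small byte values only.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// asString maps every byte to the code point with the same value.
// Assumption: this mimics the "string" representation for small values only.
func asString(b []byte) string {
	r := make([]rune, len(b))
	for i, v := range b {
		r[i] = rune(v)
	}
	return string(r)
}

// asBase64 returns the standard base64 encoding of the bytes.
func asBase64(b []byte) string {
	return base64.StdEncoding.EncodeToString(b)
}

// asArray widens every byte to an int so that encoding/json emits a
// JSON array of numbers instead of a base64 string.
func asArray(b []byte) []int {
	out := make([]int, len(b))
	for i, v := range b {
		out[i] = int(v)
	}
	return out
}

// mustJSON marshals v and panics on error (fine for a demo).
func mustJSON(v interface{}) string {
	j, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(j)
}

func main() {
	raw := []byte{1, 2, 3}
	fmt.Println(mustJSON(asString(raw))) // "\u0001\u0002\u0003"
	fmt.Println(mustJSON(asBase64(raw))) // "AQID"
	fmt.Println(mustJSON(asArray(raw)))  // [1,2,3]
}
```

The array form is the only one of the three that a JSON consumer can index byte by byte without a decoding step, which is the property the proposed option is after.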

Contributor Author

I did see that tobytes, but it seems to only be useful when you want to pull out just one field. For my tests I would just run fq -d tzx -V d file.tzx but that messes up the JSON.

Is there a way to use that tobytes when generating the whole thing as a JSON?

Owner

Yep, the bits_format option controls how binary values (note they store bits, not bytes, but are usually zero-padded to be byte aligned) are represented as JSON. I made a quick hack that adds a bytes_array option here: https://github.com/wader/fq/tree/bits-format-bytes-array

$ go run . -o bits_format=bytes_array -d tzx -V d format/tzx/testdata/basic_prog1.tzx
{
  "blocks": [
    <cut>
    {
      "pause": 1000,
      "tap": {
        "blocks": [
          {
            "data": {
              "checksum": 182,
              "data": [
                0,
                10,
                20,
                0,
                <cut>
              ],
              "flag": "standard_speed_data"
            },
            "length": 40
          }
        ]
      },
      "type": "standard_speed_data"
    }
  ],
  "major_version": 1,
  "minor_version": 20,
  "signature": [
    90,
    88,
    84,
    97,
    112,
    101,
    33,
    26
  ]
}
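
As a rough illustration of why this shape is easy to consume, here is a minimal Go sketch that parses output like the above. The struct names are hypothetical, only the fields visible in the sample are modelled, and the embedded JSON is a hand-trimmed copy of that output (the "data" array is truncated, so the checksum is not expected to match).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is a hand-trimmed copy of the output shown above.
const sample = `{
  "blocks": [{
    "pause": 1000,
    "tap": {"blocks": [{
      "data": {"checksum": 182, "data": [0, 10, 20, 0], "flag": "standard_speed_data"},
      "length": 40
    }]},
    "type": "standard_speed_data"
  }],
  "major_version": 1,
  "minor_version": 20,
  "signature": [90, 88, 84, 97, 112, 101, 33, 26]
}`

// Hypothetical types covering only the fields present in the sample.
type tapData struct {
	Checksum int    `json:"checksum"`
	Data     []int  `json:"data"`
	Flag     string `json:"flag"`
}

type tapBlock struct {
	Data   *tapData `json:"data"`
	Length int      `json:"length"`
}

type tzxBlock struct {
	Type  string `json:"type"`
	Pause int    `json:"pause"`
	Tap   *struct {
		Blocks []tapBlock `json:"blocks"`
	} `json:"tap"`
}

type tzxFile struct {
	Blocks       []tzxBlock `json:"blocks"`
	MajorVersion int        `json:"major_version"`
	MinorVersion int        `json:"minor_version"`
	Signature    []int      `json:"signature"`
}

// parseTZX decodes the JSON document into the structs above.
func parseTZX(b []byte) (tzxFile, error) {
	var f tzxFile
	err := json.Unmarshal(b, &f)
	return f, err
}

// signatureText turns the signature byte values back into a string.
func signatureText(f tzxFile) string {
	out := make([]byte, len(f.Signature))
	for i, v := range f.Signature {
		out[i] = byte(v)
	}
	return string(out)
}

func main() {
	f, err := parseTZX([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("signature %q, version %d.%d\n", signatureText(f), f.MajorVersion, f.MinorVersion)
	for _, b := range f.Blocks {
		if b.Tap == nil {
			continue
		}
		for _, t := range b.Tap.Blocks {
			if t.Data != nil {
				fmt.Printf("%s block: %d data bytes in sample\n", b.Type, len(t.Data.Data))
			}
		}
	}
}
```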

The tobytes function can be used to convert things (strings, arrays of numbers, etc.) into a binary. One can then use the explode function on a binary to get its bytes... there is even a tobits

$ fq -cn '"åäö" | ., tobytes, tobits | explode'
[229,228,246] # codepoints in string, normal jq behaviour
[195,165,195,164,195,182] # bytes
[1,1,0,0,0,0,1,1,1,0,1,0,0,1,0,1,1,1,0,0,0,0,1,1,1,0,1,0,0,1,0,0,1,1,0,0,0,0,1,1,1,0,1,1,0,1,1,0] # bits
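
The same three views can be reproduced outside fq. This is a minimal Go sketch with hypothetical helper names; it assumes the string is UTF-8 encoded and that bits are listed most significant bit first, which matches the output above.

```go
package main

import "fmt"

// codepoints returns the Unicode code points of s.
func codepoints(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, int(r))
	}
	return out
}

// utf8Bytes returns the UTF-8 encoded bytes of s as numbers.
func utf8Bytes(s string) []int {
	var out []int
	for _, b := range []byte(s) {
		out = append(out, int(b))
	}
	return out
}

// bits returns the bits of the UTF-8 bytes of s, most significant bit first.
func bits(s string) []int {
	var out []int
	for _, b := range []byte(s) {
		for i := 7; i >= 0; i-- {
			out = append(out, int(b>>uint(i))&1)
		}
	}
	return out
}

func main() {
	s := "\u00e5\u00e4\u00f6" // "åäö", written with escapes to stay encoding-safe
	fmt.Println(codepoints(s)) // [229 228 246]
	fmt.Println(utf8Bytes(s))  // [195 165 195 164 195 182]
	fmt.Println(bits(s))
}
```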

Sorry that all this is a bit undocumented. It's still a bit of a work in progress, but I haven't worked on it much lately.

Contributor Author

No problem Mattias, and thanks for the info. I'll check it out later.

Owner

I think that would be best, it's a bit sad that JSON has no binary safe type :( so it's a bit of a tradeoff how to represent it, always a byte array would probably be inconvenient with other formats.

But it's probably a good idea to document your use cases as examples

Contributor Author

Hi Mattias, is this something I should leave to you, or do you want me to add it to my PR?

Regarding the naming: bytes_array seems reasonable.

Owner

Great 👍 will review tomorrow, and you can leave it to me to do a proper bytes_array PR tomorrow with docs and tests and also

d.FieldRawLen("data", 8)
d.FieldStruct("data", func(d *decode.D) {
d.FieldRawLen("bytes", 8)
})

Owner

Should be d.FieldRawLen again?

Owner

Sorry, mis-read. Is the struct around it needed?

Contributor Author

@mrcook mrcook Aug 19, 2024

I wanted to wrap each type of block to be able to distinguish between header and data blocks.

A 1-byte data block has no flag or checksum byte, so it cannot be passed to decodeDataBlock() unless that function also handles the length. So the struct is still useful for identifying the block type, and it aids JSON parsing.

I suppose another block type could be introduced, but that seems overkill for what is actually a rare occurrence.

Edited to add: I know of no useful purpose for such a 1-byte block
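
To make the 1-byte case concrete, here is a standalone Go sketch, not the PR's decoder and not using fq's decode API. It assumes the commonly documented TAP block layout (flag byte, payload, then a checksum that is the XOR of all preceding bytes) and assumes a 19-byte block with flag 0x00 is a header; the type and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// tapBlock is a hypothetical view of one TAP block body, i.e. the bytes
// that follow the 2-byte length field.
type tapBlock struct {
	Kind       string // "header", "data", or "raw"
	Flag       byte
	Payload    []byte
	Checksum   byte
	ChecksumOK bool
}

// splitTAPBlock splits a block body into flag, payload, and checksum.
// A 1-byte block has room for neither flag nor checksum, so it is
// returned as raw bytes instead of being forced into that layout.
func splitTAPBlock(b []byte) (tapBlock, error) {
	switch len(b) {
	case 0:
		return tapBlock{}, errors.New("empty TAP block")
	case 1:
		return tapBlock{Kind: "raw", Payload: b}, nil
	}
	last := len(b) - 1
	var x byte
	for _, v := range b[:last] {
		x ^= v // assumed checksum rule: XOR of flag and payload bytes
	}
	kind := "data"
	if len(b) == 19 && b[0] == 0x00 {
		kind = "header" // assumption: flag + 17 header bytes + checksum
	}
	return tapBlock{
		Kind:       kind,
		Flag:       b[0],
		Payload:    b[1:last],
		Checksum:   b[last],
		ChecksumOK: x == b[last],
	}, nil
}

func main() {
	for _, b := range [][]byte{{0xAA}, {0xFF, 0x01, 0x02, 0xFC}} {
		blk, err := splitTAPBlock(b)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %d payload byte(s), checksum ok: %v\n",
			blk.Kind, len(blk.Payload), blk.ChecksumOK)
	}
}
```

Tagging the result with a kind plays the same role as the wrapping struct in the JSON output: a consumer can tell the block types apart without inferring it from the length.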

Owner

Aha i see, makes sense

Owner

@wader wader left a comment

Looks good to me 👍 just had one comment about a struct

Owner

wader commented Aug 19, 2024

Ok to merge?

Contributor Author

mrcook commented Aug 19, 2024

Ok to merge?

Yes, please do! And again, thanks Mattias!

@wader wader merged commit 1fac951 into wader:master Aug 19, 2024
5 checks passed