Images in gt tables are not processed correctly by quarto into Typst #11829

Open
Jocarnail opened this issue Jan 9, 2025 · 16 comments
Labels: bug, tables, typst, upstream


Jocarnail commented Jan 9, 2025

Bug description

I have found an issue while trying to use gt for tables with plots in Typst.
The plots that are created in the text_transform function are copied into the filename_files/mediabag folder incorrectly. Some or all of them are broken and appear incomplete: PNG and SVG files seem truncated, and JPG files have a broken encoding.

I was able to reproduce the behaviour with the following gt functions: ggplot_image, local_image, fmt_image.

The issue may be related to Pandoc and the HTML processing that is performed on tables. However, when I tried using the html-table-processing: none option the table simply did not appear.

The .typ output file shows that the image call points to a broken copy of the picture. This happens even when a static image is provided.

Steps to reproduce

---
format: 
    typst:
        keep-typ: true

---

```{r, echo=FALSE, warning=FALSE, message=FALSE}
plot_timeline <- function(T){
    tibble(x = seq(1,5), y = x^2) |> 
        ggplot(aes(
            x = x,
            y = y
        )) + 
            geom_line()
}
```

```{r, echo=FALSE, warning=FALSE, message=FALSE}
#| label: tbl-example
#| tbl-cap: This is an example table

library(gt)
library(tidyverse)

tibble(Things= seq(1,5)) |>  
    mutate(trans = Things) |> 
    gt() |> 
    text_transform(
        locations = cells_body(columns="trans"),
        fn = function(col){
            local_image(test_image("png"))
        }
    )
```

Expected behavior

Pictures that are copied into filename_files/mediabag should be identical to the originals.

Actual behavior

Pictures that are copied into filename_files/mediabag are broken, possibly truncated.
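One quick way to tell whether a copied PNG has been cut off at the end is to check its fixed header and trailer bytes. The sketch below is only an illustration: `pngLooksComplete` and `matchesAt` are hypothetical helper names, not part of Quarto or gt. It relies on the PNG format's fixed 8-byte signature and fixed 12-byte IEND chunk, so it catches files that lost their tail but not corruption in the middle of the data.

```typescript
// Every PNG starts with this 8-byte signature...
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
// ...and ends with an empty IEND chunk: length 0, "IEND", then its fixed CRC.
const PNG_IEND = [0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4e, 0x44, 0xae, 0x42, 0x60, 0x82];

// True when `expected` appears in `bytes` starting at `offset`.
function matchesAt(bytes: Uint8Array, offset: number, expected: number[]): boolean {
  if (offset < 0 || offset + expected.length > bytes.length) return false;
  return expected.every((b, i) => bytes[offset + i] === b);
}

// Hypothetical helper: a PNG whose tail was dropped will fail the IEND check.
function pngLooksComplete(bytes: Uint8Array): boolean {
  return (
    matchesAt(bytes, 0, PNG_SIGNATURE) &&
    matchesAt(bytes, bytes.length - PNG_IEND.length, PNG_IEND)
  );
}
```

Under Deno, the bytes could come from `Deno.readFileSync(path)` for each file in the mediabag folder. Comparing the copy with the original byte for byte remains the stricter test.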

Error message

processing file: report.qmd
1/4                  
2/4 [unnamed-chunk-1]
3/4                  
4/4 [tbl-example]    
output file: report.knit.md

pandoc 
  to: typst
  output-file: report.typ
  standalone: true
  shift-heading-level-by: -1
  default-image-extension: svg
  wrap: none
  citeproc: false
  
[typst]: Compiling report.typ to report.pdf...error: failed to decode image (Format error decoding Png: Corrupt deflate stream. BadCodeLengthHuffmanTree)
    ┌─ report.typ:306:199
    │
306 │   table.cell(align: horizon + right, stroke: (top: (paint: rgb("#d3d3d3"), thickness: 0.75pt)))[1], table.cell(align: horizon + right, stroke: (top: (paint: rgb("#d3d3d3"), thickness: 0.75pt)))[#box(image("report_files/mediabag/H8NW74WLNzJwAAAAABJR.png"))],
    │                                                                                                                                                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your environment

  • IDE: Positron Version: 2025.01.0 (Universal) build 152
  • OS: macOS 15.0.1 (24A348)

Quarto check output

Quarto 1.7.7
[✓] Checking environment information...
      Quarto cache location: /Users/filippofronza/Library/Caches/quarto
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.4.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.46.3: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.7.7
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.10
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /Users/filippofronza/Library/TinyTeX/bin/universal-darwin
      Version: 2024

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.13.1
      Path: /opt/homebrew/opt/python@3.13/bin/python3.13
      Jupyter: 5.7.2
      Kernels: python3

(|) Checking Jupyter engine render....Traceback (most recent call last):
  File "/Applications/quarto/share/jupyter/jupyter.py", line 21, in <module>
    from notebook import notebook_execute, RestartKernel
  File "/Applications/quarto/share/jupyter/notebook.py", line 15, in <module>
    from yaml import safe_load as parse_string
ModuleNotFoundError: No module named 'yaml'
[✓] Checking Jupyter engine render....OK

Note: I am on the pre-release version right now because I updated in the hope of resolving the issue. The issue was the same in version 1.6, which I was using before.

Jocarnail added the bug label Jan 9, 2025
Collaborator

cscheid commented Jan 9, 2025

I can repro this, thanks for the report. That's very strange.

cscheid added the tables and typst labels Jan 9, 2025
Collaborator

cscheid commented Jan 9, 2025

> The issue may be related to Pandoc and the HTML processing that is performed on tables. However, when I tried using the html-table-processing: none option the table simply did not appear.

That's not surprising. GT emits HTML and Quarto needs to convert the table to Pandoc's AST format in order for it to have any chance to show up in Typst!

Collaborator

cscheid commented Jan 9, 2025

GT (understandably) emits these images as data URIs; I wonder if this is fundamentally the issue.

Collaborator

cscheid commented Jan 9, 2025

A markdown image encoded in a data URI does not cause Typst to crash:

Input

---
format: typst
---

![Test](...)

Output

Image

@gordonwoodhull
Contributor

I did not attempt to test images inside gt when implementing Typst CSS / HTML table processing for Typst. It would be nice to have sparklines and such. I would have thought data URIs would work.

Collaborator

cscheid commented Jan 9, 2025

They do work in a simplified setting outside of GT:

---
format: typst
---

```{=html}
<table><tr><td>1</td><td>
<img role="img" src="" />
</td></tr></table>
```

That is a wholly-white PNG image, and this compiles to a table without a problem.

Collaborator

cscheid commented Jan 9, 2025

Copy-pasting the image from the GT output into my simple table reintroduces the bug. So this is not GT: it's either Pandoc or Quarto.

Contributor

gordonwoodhull commented Jan 9, 2025

I can repro with your simplified example. On a hunch, I disabled juice, and both examples succeed.

So it's either juice or my glue code around it.

Collaborator

cscheid commented Jan 9, 2025

@gordonwoodhull It's a juice bug 😬

Instrumenting parsehtml.lua to write juice's input and output shows it:

  local function handle_raw_html_as_table(el)
    local eltext
    -- write el.text to disk
    local f = io.open("/tmp/juice-input.html", "w")
    f:write(el.text)
    f:close()

    if(_quarto.format.isTypstOutput()) then
      eltext = juice(el.text)
      local f = io.open("/tmp/juice-output.html", "w")
      f:write(eltext)
      f:close()
    else
      eltext = el.text
    end
...

Then we can already see the problem with the file sizes:

$ ls -lrt
total 128
drwx------  3 cscheid  wheel     96 Dec 19 17:01 com.apple.launchd.U2JGfjR1u6
drwx------  3 cscheid  wheel     96 Dec 19 17:01 com.apple.launchd.C5QqYNWiie
drwxr-xr-x  2 root     wheel     64 Jan  8 19:47 powerlog
-rw-r--r--@ 1 cscheid  wheel  46711 Jan  9 14:07 juice-input.html
-rw-r--r--@ 1 cscheid  wheel  15313 Jan  9 14:07 juice-output.html
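To narrow down which images shrank, rather than comparing only whole-file sizes, the data URI lengths in the two dumps can be compared one by one. This is a rough sketch, not part of Quarto: `dataUriLengths` and `findTruncated` are hypothetical names, and it assumes double-quoted `src` attributes that appear in the same order in both files.

```typescript
// Collect the length of every src="data:..." value, in document order.
// Assumption: attributes are double-quoted, as in the gt output discussed here.
function dataUriLengths(html: string): number[] {
  const pattern = /src="(data:[^"]*)"/g;
  const lengths: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    lengths.push((match[1] ?? "").length);
  }
  return lengths;
}

// Return the indices of images whose data URI got shorter or went missing.
function findTruncated(inputHtml: string, outputHtml: string): number[] {
  const before = dataUriLengths(inputHtml);
  const after = dataUriLengths(outputHtml);
  const shrunk: number[] = [];
  before.forEach((len, i) => {
    const afterLen = after[i];
    if (afterLen === undefined || afterLen < len) shrunk.push(i);
  });
  return shrunk;
}
```

With the two files above, this could be called on the contents of juice-input.html and juice-output.html, for example after reading them with `Deno.readTextFileSync`.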

cscheid added the upstream label Jan 9, 2025
Collaborator

cscheid commented Jan 9, 2025

It should be possible for us to protect data-URI-encoded images to work around juice, especially since we know we don't want it to touch these images at all.

cscheid added this to the v1.7 milestone Jan 9, 2025
Contributor

gordonwoodhull commented Jan 9, 2025

Yeah, seeing the same thing. I think it could be a line length issue (> 43300 characters).

Agree, any kind of placeholder should work. Will try.

Collaborator

cscheid commented Jan 9, 2025

Interestingly (?), calling juice from Deno directly doesn't trigger the bug:

$ deno
> import juice from "npm:juice"
undefined
> juice.version
"11.0.0"
> let inp = Deno.readTextFileSync("/tmp/juice-input.html")
undefined
> Deno.writeTextFileSync("/tmp/juice-output-deno.html", juice(inp))
undefined
$ ls -lrt
total 224
drwx------  3 cscheid  wheel     96 Dec 19 17:01 com.apple.launchd.U2JGfjR1u6
drwx------  3 cscheid  wheel     96 Dec 19 17:01 com.apple.launchd.C5QqYNWiie
drwxr-xr-x  2 root     wheel     64 Jan  8 19:47 powerlog
-rw-r--r--@ 1 cscheid  wheel  46711 Jan  9 14:07 juice-input.html
-rw-r--r--@ 1 cscheid  wheel  15313 Jan  9 14:07 juice-output.html
-rw-r--r--  1 cscheid  wheel  46711 Jan  9 14:19 juice-output-deno.html

Collaborator

cscheid commented Jan 9, 2025

Oh, that must be a bug they fixed in 11.0.0; we're running juice 10.0.0.

EDIT: No, that's not it. Running juice directly from deno on those inputs doesn't trigger the bug (on version 10, or 11, either from npm:.. or skypack). So it's our invocation of juice through quarto run, somehow.

@gordonwoodhull
Contributor

Nod, I still see truncation if I change juice.ts to

import juice from "npm:juice"

and version is 11

Contributor

gordonwoodhull commented Jan 9, 2025

Okay I misread this at first:

https://stackoverflow.com/questions/695151/data-protocol-url-size-limitations

There isn't any fixed limit of 32K, but it does seem that juice is deciding to truncate data URIs to under 15K characters for its own reasons.

Collaborator

cscheid commented Jan 9, 2025

Ok cool. Our workaround then is pretty clear: we intercept data URIs, replace them with UUIDs before calling juice, and then swap the original URIs back in afterwards.
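A minimal sketch of that placeholder idea, under stated assumptions: `withProtectedDataUris` is a hypothetical name, not the function Quarto ships, and `transform` stands in for the juice call. For simplicity it uses counter-based tokens instead of UUIDs (under Deno, `crypto.randomUUID()` could generate them) and only handles double-quoted `src` attributes.

```typescript
type Transform = (html: string) => string;

// Hide data URIs from `transform`, run it, then put the original URIs back.
function withProtectedDataUris(html: string, transform: Transform): string {
  const saved = new Map<string, string>();
  let counter = 0;

  const protectedHtml = html.replace(
    /src="(data:[^"]*)"/g,
    (_whole: string, uri: string) => {
      // The "-end" suffix keeps token 1 from being a prefix of token 10.
      const token = `data-uri-placeholder-${counter++}-end`;
      saved.set(token, uri);
      return `src="${token}"`;
    },
  );

  let result = transform(protectedHtml);

  saved.forEach((uri, token) => {
    // split/join swaps every occurrence and never interprets "$" in the URI.
    result = result.split(token).join(uri);
  });
  return result;
}
```

Because the transform never sees the long base64 payload, whatever shortens it in the current pipeline has nothing to act on. This only holds if the transform leaves the placeholder text itself untouched.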
