[Breaking Changes] attribute-based node protection #107

zkamvar · 2024-04-29T19:27:03Z

As discussed in #105 (comment), since I have some time on my hands, I wanted to give this a go.

This shifts the paradigm from splitting out the nodes that need protection to one where attributes tell us where the protection needs to be applied.

Not Ready to Merge

I'm not quite ready to merge this yet because it has ripple effects for both {babeldown}, which explicitly relies on asis and curly nodes being separated out to keep them from entering the translation fields:

  ## protect content inside curly braces and math ----
  woolish$body <- tinkr::protect_math(woolish$body)
  woolish$body <- tinkr::protect_curly(woolish$body)
  curlies <- xml2::xml_find_all(woolish$body, "//*[@curly]")
  purrr::walk(curlies, protect_curly)
  maths <- xml2::xml_find_all(woolish$body, "//*[@asis='true']")
  purrr::walk(maths, protect_math)

and {pegboard}, whose link transformation routines (from Jekyll -> pandoc) explicitly assume that the asis nodes exist, as shown in the documentation (fix_links.R#L38-L48):

#' However, if a link uses liquid templating for a variable such as: 
#' `[Home]({{ page.root }}/index.html) and other text`, it will appear in XML as
#'
#' ```xml
#' ...
#' <text asis="true">[</text>
#' <text>Home</text>
#' <text asis="true">]</text>
#' <text>({{ page.root }}/index.html) and other text</text>
#' ...
#' ```

I've modified the escape-text function to escape text based on whether or not it exists in an escapable range. This commit implements a proof of concept that protects the first escapable character and will not pass check.

In this version, we no longer need to split nodes in order to protect them if we also want them to be continuous. I've taken the XSL template "escape-text" and modified it so that it takes in three new parameters:

1. `pos`: the position of the current character
2. `protect.pos`: a space-separated list of starting positions for protection
3. `protect.end`: a space-separated list of ending positions for protection

I've also added three new helper templates to handle list contents: `peek` returns the top of the list, `trim` trims off the first element of the list (or returns the value if it's not a list), and `adjust-range` trims a list depending on if the current value is within range.

There's a lot of printing here because I wasn't too confident with debugging, but based on my test in inst/extdata/xml_protect.xml, it produces results correctly.
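To make the range logic easier to follow, here is a minimal sketch in base R of the same idea: walk the text one character at a time and escape a character only when its position falls outside every protected range. This is not the XSL template from this PR. The function name `escape_outside_ranges()`, its argument names, and the set of characters treated as escapable are all assumptions made for illustration; positions are assumed to be 1-based and inclusive, and the ranges are assumed to be sorted and non-overlapping.

```r
# Hypothetical helper (not part of tinkr): escape "special" characters in
# `txt` unless they sit inside a protected range. `protect_start` and
# `protect_end` are space-separated lists of positions, mirroring the
# `protect.pos` / `protect.end` parameters described above.
escape_outside_ranges <- function(txt, protect_start = "", protect_end = "",
                                  special = c("[", "]", "*", "_")) {
  # Turn "1 5 9" into c(1L, 5L, 9L); an empty string means no ranges.
  parse_positions <- function(x) {
    x <- trimws(x)
    if (!nzchar(x)) return(integer(0))
    as.integer(strsplit(x, " +")[[1]])
  }
  starts <- parse_positions(protect_start)
  ends <- parse_positions(protect_end)
  stopifnot(length(starts) == length(ends))

  chars <- strsplit(txt, "")[[1]]
  out <- character(length(chars))
  idx <- 1L  # a single index into both range lists
  for (pos in seq_along(chars)) {
    # Loosely analogous to `adjust-range`: skip ranges that ended before `pos`.
    while (idx <= length(ends) && pos > ends[idx]) idx <- idx + 1L
    protected <- idx <= length(starts) && pos >= starts[idx]
    ch <- chars[pos]
    out[pos] <- if (!protected && ch %in% special) paste0("\\", ch) else ch
  }
  paste(out, collapse = "")
}

# Positions 1 to 3 are protected, so the brackets around "a" are left alone
# and the brackets around "b" each gain a backslash.
escape_outside_ranges("[a] and [b]", protect_start = "1", protect_end = "3")
```

Advancing one index through the sorted ranges keeps the loop linear in the length of the text, which is the same motivation as tracking a single index in the stylesheet.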

I had initially found a tokenize template and had contacted the author about license information (she gave permission): <https://exslt.github.io/str/functions/tokenize/str.tokenize.template.xsl.html>. While working with it, I found that the function already exists as part of libxml because it bundles EXSLT functions, which allows me to do this more easily and efficiently by tracking and modifying a single index instead of a pair of strings.

This will address #105

The square bracket _should_ be escaped since it's outside of the protected range.

This begins to address limitations of the attribute-based protection by providing a way to separate and rejoin nodes that were previously split.

The previous iteration was not quite correct because it had assumed that the sourcepos would match up exactly with the protection ranges, but these were two separate numbers. This does the following:

1. when a protected range spans the entire node, then it is labeled "asis"
2. `split_sourcepos()` now reflects the actual end of the sourcepos instead of the computed end
3. an awkward catch for single nodes in `join_split_nodes()` is now eliminated
4. `join_split_nodes()` no longer recomputes the protected ranges from the sourcepos
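The split-and-rejoin idea can be illustrated on a plain string, independent of XML nodes and sourcepos. The sketch below is only an illustration under stated assumptions, not the implementation behind `split_sourcepos()` or `join_split_nodes()`: the names `split_by_ranges()` and `join_fragments()` are hypothetical, ranges are assumed to be 1-based, inclusive, sorted, and non-overlapping, and the text is assumed to be non-empty.

```r
# Hypothetical sketch (not tinkr's code): cut a string into fragments at the
# boundaries of its protected ranges and flag which fragments are protected.
split_by_ranges <- function(text, start = integer(0), end = integer(0)) {
  n <- nchar(text)
  # A single range spanning the whole string needs no splitting; this mirrors
  # the case where an entire node would be labeled "asis".
  if (length(start) == 1L && start == 1L && end == n) {
    return(data.frame(text = text, protected = TRUE, stringsAsFactors = FALSE))
  }
  # Every fragment begins at position 1, at a range start, or just after a
  # range end.
  frag_start <- sort(unique(c(1L, start, end + 1L)))
  frag_start <- frag_start[frag_start <= n]
  frag_end <- c(frag_start[-1] - 1L, n)
  data.frame(
    text = substring(text, frag_start, frag_end),
    protected = frag_start %in% start,
    stringsAsFactors = FALSE
  )
}

# Rejoining is just concatenation, so a split followed by a join is lossless.
join_fragments <- function(frags) paste(frags$text, collapse = "")

frags <- split_by_ranges("see {{ x }} now", start = 5L, end = 11L)
frags
join_fragments(frags)
```

Because the fragments are derived from the ranges themselves, rejoining never has to recompute a range from a source position, which is the separation that item 4 above describes.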

This allows us to search for internal nodes using their identities

zkamvar added 30 commits April 18, 2024 12:38

testing attribute-based protection [skip ci]

b45845f

move helper templates; document; remove comments [skip ci]

f1c57ce

update comments

31e5a0f

add range updator

90561ef

update math protection to use labels.

a7b33c8

make sure node protection exists with an empty set

aeb3f58

use attributes to protect square bracket nodes

8aa181b

use attributes to protect curly nodes

e51eceb

update tests

1b9ae6a

add NEWS; bump description

32d2ffd

export protection functions

cd95f58

Merge branch 'main' into fix-105-unprotect

a4b84fc

add test for #105

7c8d35b

ensure output of get_protected_ranges is integer

fd45728

fix off-by-one errors

f79c123

rerun snaps

29b9b3d

Merge branch 'fix-105-unprotect' of https://github.com/ropensci/tinkr into fix-105-unprotect

cc8ab53

add extra checks for add_protected_ranges()

bbc1e61

fix failing CI test

b7c94ee

add text node boolean functions

b478e97

add comments to test file

741fbd7

add protection tests

2066ba7

update documentation a bit

739020a

document node protection

9357b95

rename protect.pos -> protect.start

2614c20

create utils for playing with sourcepos

afef059

add capability to split and rejoin protected nodes

54c1d1f

fix doc booboo

febfa2f

zkamvar added 9 commits May 2, 2024 09:31

add find_between_nodes from pegboard

9e7de72

use helpers for asis

acd560f

remove "curly" attr when setting a math node

f22d2aa

update curly processing to better handle alt text

7875dea

update processing of alt text to remove breaks

37332f7

fix typo

bb0032c

simplify node joinery

4db2189

update tests for curly

37fc979

This was referenced May 8, 2024

Improve show method; add nodelist show functions #108

Merged

tinkr 0.3.0 roadmap #109

Open

Merge branch 'main' into fix-105-unprotect

6a6b3bc

zkamvar mentioned this pull request May 9, 2024

add get_protected() #111

Merged
