Slate/HTML serializers and deserializers #101
base: main
@tiberiuichim thanks for creating this Pull Request and helping to improve Plone! TL;DR: Finish pushing changes, pass all other checks, then paste a comment:
To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when you are done pushing changes and all other checks pass, either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically. Happy hacking!
@jenkins-plone-org please run jobs
Jenkins is complaining that I've added a new dependency that isn't pinned. What's the procedure for this? Resiliparse is the only Python HTML parser that represents its data as DOM nodes (Document, Element, Inlines and Text nodes), so it's the only one I could use conveniently to preserve the perception-based rendering of HTML in the browser. When I initially created eea.volto.slate back in 2021, this parser didn't exist (and back then I was ignorant of all the complexities of this problem space). So I'm ready to be pointed to another option, but only if it can do browser-style DOM parsing. html5lib won't cut it: it doesn't expose Text nodes, and all text between tags is exposed as `node.tail`. A wrapper could be built on top of it, but it's more work than it's worth.
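To illustrate the `.tail` issue: html5lib's default `etree` tree builder produces `xml.etree.ElementTree` elements, where text hangs off `.text` and `.tail` instead of being separate Text nodes. The sketch below uses the standard library's ElementTree directly on a well-formed snippet (an assumption, so it runs without html5lib) and shows the kind of wrapper mentioned above: a hypothetical `child_nodes()` generator that yields text runs and elements in document order, DOM-style.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Union

# A "node" here is either a text run (str) or an element
Node = Union[str, ET.Element]


def child_nodes(element: ET.Element) -> Iterator[Node]:
    """Hypothetical helper: yield an element's children in DOM order.

    ElementTree stores the text before the first child in ``element.text``
    and the text after each child in ``child.tail``; this interleaves them
    so that text runs appear as separate nodes, like DOM Text nodes.
    """
    if element.text:
        yield element.text
    for child in element:
        yield child
        if child.tail:
            yield child.tail


p = ET.fromstring("<p>Hello <b>bold</b> world</p>")
b = p.find("b")

# ElementTree's model: no text nodes, only .text and .tail attributes
print(repr(p.text), repr(b.text), repr(b.tail))  # 'Hello ' 'bold' ' world'

# DOM-style view produced by the wrapper
print([n if isinstance(n, str) else f"<{n.tag}>" for n in child_nodes(p)])
# ['Hello ', '<b>', ' world']
```

This only covers interleaving text and elements; recovering from malformed markup the way a browser does (implied end tags, misnested inline elements) is the part that still requires a real HTML5 parser.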
@tiberiuichim I'm not familiar with the motivation for this work, so can you say a bit more about why you want to store blocks as HTML? My initial gut reaction is that it sounds like it adds unnecessary work at the time of serialization and deserialization, and also adds complexity. Is there a benefit that makes that worthwhile? Regarding Resiliparse, I would want to avoid adding a new dependency if possible. Did you consider BeautifulSoup (bs4), which is already a dependency of Plone? It looks like it exposes each bit of text as a `NavigableString` (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring), but maybe there is some other thing it doesn't handle in the way you need?
The html2slate.py and slate2html.py modules don't depend on Plone at all. If anyone needs them in their own projects, they can just copy them.
@jenkins-plone-org please run jobs |
I think it's ready. |
@jenkins-plone-org please run jobs |
@tiberiuichim Here's my first pass on reviewing. So far I looked at the tests to make sure that what it's doing makes sense, but I didn't yet look at how it's doing it.
@@ -0,0 +1 @@
<p style="text-align:center" class="styled"><b><span data-slate-data="{"type":"dataentity","data":{"column":"number_total_sites","provider_url":"/data/countries-protected-areas-statistics"}}"><span class="primary-big-text">1565</span></span></b> Protected areas</p>
There's no json file for tests 3 and 4?
@tiberiuichim are these used in the tests?
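The fixture above stores Slate element data as JSON inside a `data-slate-data` attribute. Because that JSON contains double quotes, valid HTML needs them written as `&quot;` inside the double-quoted attribute. A minimal round-trip sketch using only the standard library (`json`, `html`, `html.parser`); the helper names are hypothetical and this is not how the PR's html2slate parses the attribute:

```python
import html
import json
from html.parser import HTMLParser
from typing import Any, Dict, List


def slate_data_attribute(data: Dict[str, Any]) -> str:
    """Hypothetical helper: render Slate data as an HTML-safe attribute."""
    # html.escape(quote=True) turns the JSON's double quotes into &quot;
    return f'data-slate-data="{html.escape(json.dumps(data), quote=True)}"'


class SlateDataCollector(HTMLParser):
    """Hypothetical parser: collect the decoded data-slate-data payloads."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Dict[str, Any]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes attribute values in already entity-decoded
        raw = dict(attrs).get("data-slate-data")
        if raw is not None:
            self.found.append(json.loads(raw))


data = {
    "type": "dataentity",
    "data": {
        "column": "number_total_sites",
        "provider_url": "/data/countries-protected-areas-statistics",
    },
}
markup = f'<span {slate_data_attribute(data)}><span class="primary-big-text">1565</span></span>'

collector = SlateDataCollector()
collector.feed(markup)
collector.close()
print(collector.found == [data])  # True
```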
Minor grammar fixes to README.rst
Co-authored-by: David Glick <[email protected]>
Co-authored-by: Steve Piercy <[email protected]>
@jenkins-plone-org please run jobs
@tiberiuichim Sorry for taking so long to get back to this. I've now done a pass reading through the implementation. For something like this it's hard to say when enough testing has been done, since there are always edge cases. It would be possible to borrow some more test cases from blocks-conversion-tool. But maybe it makes just as much sense to merge it as is and let someone try it with real data.
- Run `make build` to build the Plone backend.
- Run `make start` to start the Plone backend.
- Run `make test` to run the tests.
- Run `bin/zope-testrunner --auto-color --auto-progress --test-path src -t name_of_test` to run a particular test.
@tiberiuichim I usually use the coredev buildout for this (as well as for working on other packages in Plone core). I think either way is fine, but for the sake of comparison:
- clone `git@github.com:plone/buildout.coredev.git`
- edit `checkouts.cfg` to specify which packages to check out from GitHub
- run `make` to build
- edit in `src/plone.volto`
- run `bin/instance fg` to start the backend
- run `bin/test -s plone.volto` to run all tests from plone.volto
- run `bin/test -t [name]` to run a specific test
These two classes can be inherited and extended for your custom elements and
plugins. To handle any custom element, you need to provide a method called
``handle_tag_<elementname>``. For example, if you have a custom element of
``@type`` "a", you can do::
Thanks, this helps me understand why they are classes instead of functions.
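A self-contained sketch of why classes suit this: a base serializer looks up `handle_tag_<type>` with `getattr`, so a subclass can add a tag handler without touching the base class. `ToySlate2HTML`, `LinkAwareSlate2HTML`, and the `data["url"]` layout for links are hypothetical stand-ins, not the actual slate2html API from this PR:

```python
from html import escape


class ToySlate2HTML:
    """Hypothetical, minimal Slate-to-HTML serializer (not plone.volto's class).

    Elements are dispatched to ``handle_tag_<type>`` methods via getattr,
    falling back to ``handle_default`` when no specific handler exists.
    """

    def serialize(self, node: dict) -> str:
        if "text" in node:  # Slate leaf (text) node
            return escape(node["text"])
        handler = getattr(self, f"handle_tag_{node['type']}", self.handle_default)
        return handler(node)

    def children(self, node: dict) -> str:
        return "".join(self.serialize(child) for child in node.get("children", []))

    def handle_default(self, node: dict) -> str:
        tag = node["type"]
        return f"<{tag}>{self.children(node)}</{tag}>"


class LinkAwareSlate2HTML(ToySlate2HTML):
    """Subclass adding a handler for Slate elements of type "a"."""

    def handle_tag_a(self, node: dict) -> str:
        # Assumption for illustration: the URL lives in node["data"]["url"]
        url = escape(node.get("data", {}).get("url", ""), quote=True)
        return f'<a href="{url}">{self.children(node)}</a>'


value = {
    "type": "p",
    "children": [
        {"text": "See "},
        {"type": "a", "data": {"url": "https://plone.org"}, "children": [{"text": "Plone"}]},
    ],
}
print(LinkAwareSlate2HTML().serialize(value))
# <p>See <a href="https://plone.org">Plone</a></p>
```

With plain functions, adding the `a` handler would mean editing or wrapping the dispatch logic; with a class, overriding one method is enough.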
FIRST_ANY_SPACE = re.compile(r"^\s", re.M)
FIRST_ALL_SPACE = re.compile(r"^\s+", re.M)
ANY_SPACE_AT_END = re.compile(r"\s$", re.M)
ANY_WHITESPACE = re.compile(r"\s|\t|\n", re.M)
Be aware that `\s` includes non-breaking whitespace, which might not be desired. Tab and newline are also included in `\s`, so I don't think including them separately does anything here.
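A quick check of both points, as a sketch: in a `str` pattern, Python 3's `\s` matches any Unicode whitespace, including U+00A0 (NO-BREAK SPACE), and the `re.ASCII` flag narrows it to `[ \t\n\r\f\v]`. `SIMPLE_WHITESPACE` and `ASCII_WHITESPACE` are illustrative names, not from the PR.

```python
import re

# Pattern from the diff, and a simplified equivalent without the redundant
# alternatives. re.M only changes how ^ and $ behave, so it has no effect here.
ANY_WHITESPACE = re.compile(r"\s|\t|\n", re.M)
SIMPLE_WHITESPACE = re.compile(r"\s")
# Restrict \s to ASCII whitespace so a non-breaking space is left alone
ASCII_WHITESPACE = re.compile(r"\s", re.ASCII)

NBSP = "\u00a0"
sample = f"a\tb\nc d{NBSP}e"

same_matches = ANY_WHITESPACE.findall(sample) == SIMPLE_WHITESPACE.findall(sample)
nbsp_unicode = SIMPLE_WHITESPACE.search(NBSP) is not None
nbsp_ascii = ASCII_WHITESPACE.search(NBSP) is not None

print(same_matches, nbsp_unicode, nbsp_ascii)  # True True False
```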
def normalize(self, value):
    """Normalize value to match Slate constraints"""

    assert isinstance(value, list)
If this isn't the case, the program will end with a bare `AssertionError`, which won't be super helpful. If this is something that should never happen, using type annotations and static analysis in your editor might be a better way to go. If we expect that we might actually get something other than a list here from certain user input, then we should give the user a more helpful error about what they did wrong.
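One way to act on the suggestion above, sketched under assumptions: a hypothetical standalone `validate_slate_value()` (not part of the PR) that raises a `TypeError` describing what it received. Unlike `assert`, this check also still runs under `python -O`, which strips assert statements.

```python
from typing import Any, Dict, List


def validate_slate_value(value: Any) -> List[Dict[str, Any]]:
    """Hypothetical check: a Slate value must be a list of element dicts.

    Raises TypeError with a descriptive message instead of a bare
    AssertionError.
    """
    if not isinstance(value, list):
        raise TypeError(
            f"Expected a Slate value (list of nodes), got {type(value).__name__}"
        )
    for index, node in enumerate(value):
        if not isinstance(node, dict):
            raise TypeError(
                f"Slate node at index {index} should be a dict, got {type(node).__name__}"
            )
    return value


try:
    validate_slate_value("<p>Hello</p>")
except TypeError as exc:
    print(exc)  # Expected a Slate value (list of nodes), got str
```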
The real value here is in html2slate and slate2html.
For the curious, there's an `htmlblock.zcml` that you can include, which overrides the way Slate blocks are stored, making them always be stored as HTML in the ZODB, with a Python-side serializer/deserializer.
Checklist: