This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Problems with getting RDF from XHTML5 served as application/xhtml+xml? #6

Open
christianhujer opened this issue Oct 8, 2016 · 3 comments

Comments

@christianhujer

The tool seems to have problems extracting the schema.org RDFa data from the following page: http://nelkinda.com/blog/user-stories-are-not-always-user-stories/
The page is written in XHTML5, delivered as application/xhtml+xml, and gzip-encoded, and it seems that pyMicrodata is unable to extract any information from it.
I have successfully used the following tools with said page:

By the way, the tools from Microsoft also have problems with this page.

@christianhujer
Author

The following attachment is a zip archive with the XHTML page that isn't processed successfully.

sample.zip

@iherman
Contributor

iherman commented Oct 10, 2016

@christianhujer: there were two problems. One was yours and the other was mine...

  • Your code is not based on microdata; it is in RDFa. It seems that the tools that you refer to interpret both microdata and RDFa, and hence produce proper output. However, pyMicrodata is strictly for microdata and not for RDFa.
  • On the other hand, there is an RDFa distiller, too. There is a service at W3C, and there is also an RDFLib library to handle that, namely pyrdfa3. There was a bug in that code, and it was indeed related to the fact that you served XHTML. I have found that bug and have updated that repository. The aforementioned service has also been updated, and it does interpret your file; see http://bit.ly/2dWuXBr
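
To illustrate the distinction between the two syntaxes, here is a minimal sketch (standard library only) that counts which family of attributes a page uses. It is a rough heuristic and not how pyMicrodata or pyrdfa3 work internally. The names `MarkupSniffer`, `sniff_markup`, and `SAMPLE` are hypothetical, and the sample markup is made up for illustration; it is not taken from the reported page.

```python
from html.parser import HTMLParser

# Attribute names that hint at each syntax (a heuristic, not a full parser).
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"vocab", "typeof", "property", "prefix", "resource"}


class MarkupSniffer(HTMLParser):
    """Counts microdata and RDFa attributes seen on start tags."""

    def __init__(self):
        super().__init__()
        self.microdata = 0
        self.rdfa = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        names = {name for name, _ in attrs}
        self.microdata += len(names & MICRODATA_ATTRS)
        self.rdfa += len(names & RDFA_ATTRS)


def sniff_markup(document: str) -> dict:
    """Return how many microdata and RDFa attributes the document contains."""
    sniffer = MarkupSniffer()
    sniffer.feed(document)
    sniffer.close()
    return {"microdata": sniffer.microdata, "rdfa": sniffer.rdfa}


# Hypothetical RDFa snippet using schema.org terms.
SAMPLE = """<article vocab="http://schema.org/" typeof="BlogPosting">
  <h1 property="headline">Example</h1>
</article>"""

if __name__ == "__main__":
    print(sniff_markup(SAMPLE))
```

A page whose counts are all on the `rdfa` side is a candidate for an RDFa distiller rather than a microdata one.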

Thanks for the bug report!

@christianhujer
Author

@iherman Ahaha, thanks for clearing it up! I actually used the service at W3C, but when reporting the bug I must have confused the two libraries (RDFa vs microdata). And I can confirm that the bug is now fixed. I have, however, found another small glitch, which I will report at pyrdfa3.
