Description
Please describe the bug
Per Nokogiri's documentation, the `Node#inner_text` method (aliased as `text` and `content` as well) is meant to return "the plaintext content for this Node." The name `inner_text` implies that it works like the JavaScript property of the same name, `innerText`. However, the JavaScript property explicitly excludes the contents of any `<style>` elements nested inside the given node, while `Node#inner_text` includes them.
Help us reproduce what you're seeing
Example URL: https://www.binance.com/en/terms (note that you need to `curl` this link to reproduce the behavior below, not simply examine it in the browser dev tools, because JavaScript running in the page changes the underlying DOM structure).
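```ruby
require 'httparty'
require 'nokogiri'
# Fetch the raw server-rendered HTML (no JavaScript is executed).
x = HTTParty.get('https://www.binance.com/en/terms').body
# Plain text of <main>; this also picks up the CSS from nested <style> elements.
y = Nokogiri::HTML.parse(x).at_css("body div main").inner_text
puts y[0, 40] # show how the extracted text begins
```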
Expected behavior
Expected: the result should start with `Binance Terms of Use ...`
Actual: it starts with `.css-13trade{box-sizing:border-box ...`
Per the MDN documentation for `innerText`, the contents of `<style>` elements are explicitly excluded. See the example here: https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText
Environment
```yaml
# Nokogiri (1.11.3)
---
warnings: []
nokogiri:
  version: 1.11.3
  cppflags:
  - "-I/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri"
  - "-I/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri/include"
  - "-I/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri/include/libxml2"
  ldflags: []
ruby:
  version: 2.7.1
  platform: x86_64-darwin19
  gem_platform: x86_64-darwin-19
  description: ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
  engine: ruby
libxml:
  source: packaged
  precompiled: true
  patches:
  - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
  - 0002-Remove-script-macro-support.patch
  - 0003-Update-entities-to-remove-handling-of-ssi.patch
  - 0004-libxml2.la-is-in-top_builddir.patch
  - 0005-Fix-infinite-loop-in-xmlStringLenDecodeEntities.patch
  - 0006-htmlParseComment-treat-as-if-it-closed-the-comment.patch
  - 0007-use-new-htmlParseLookupCommentEnd-to-find-comment-en.patch
  - '0008-use-glibc-strlen.patch'
  - '0009-avoid-isnan-isinf.patch'
  - 0010-parser.c-shrink-the-input-buffer-when-appropriate.patch
  - 0011-update-automake-files-for-arm64.patch
  libxml2_path: "/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri"
  iconv_enabled: true
  compiled: 2.9.10
  loaded: 2.9.10
libxslt:
  source: packaged
  precompiled: true
  patches:
  - 0001-update-automake-files-for-arm64.patch
  compiled: 1.1.34
  loaded: 1.1.34
other_libraries:
  zlib: 1.2.11
  libiconv: '1.15'
```
Additional context
N/A