[feature] Node#inner_text should not capture <style> tag contents #2292

@jaypinho

Please describe the bug

Per Nokogiri's documentation, the Node#inner_text method (also aliased as text and content) is meant to capture "the plaintext content for this Node." The name inner_text implies that it behaves like the JavaScript innerText property.

However, the JavaScript property explicitly excludes the contents of any <style> elements inside the given node, while Node#inner_text includes them.

Help us reproduce what you're seeing

Example URL: https://www.binance.com/en/terms (note that you need to fetch this link with curl or an HTTP client to reproduce the result below, not simply examine it in browser dev tools, because runtime JS changes the underlying DOM structure).

require 'httparty'
require 'nokogiri'
x = HTTParty.get('https://www.binance.com/en/terms').body
y = Nokogiri::HTML.parse(x).at_css("body div main").inner_text

Expected behavior

Expectation: the result should start with Binance Terms of Use...

Actual: it starts with .css-13trade{box-sizing:border-box...

Per the JS docs for innerText, <style> elements are explicitly excluded. See the example here: https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText

Environment

# Nokogiri (1.11.3)
    ---
    warnings: []
    nokogiri:
      version: 1.11.3
      cppflags:
      - "-I/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri"
      - "-I/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri/include"
      - "-I/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 2.7.1
      platform: x86_64-darwin19
      gem_platform: x86_64-darwin-19
      description: ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      - 0002-Remove-script-macro-support.patch
      - 0003-Update-entities-to-remove-handling-of-ssi.patch
      - 0004-libxml2.la-is-in-top_builddir.patch
      - 0005-Fix-infinite-loop-in-xmlStringLenDecodeEntities.patch
      - 0006-htmlParseComment-treat-as-if-it-closed-the-comment.patch
      - 0007-use-new-htmlParseLookupCommentEnd-to-find-comment-en.patch
      - '0008-use-glibc-strlen.patch'
      - '0009-avoid-isnan-isinf.patch'
      - 0010-parser.c-shrink-the-input-buffer-when-appropriate.patch
      - 0011-update-automake-files-for-arm64.patch
      libxml2_path: "/Users/jpinho/.rvm/gems/ruby-2.7.1/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri"
      iconv_enabled: true
      compiled: 2.9.10
      loaded: 2.9.10
    libxslt:
      source: packaged
      precompiled: true
      patches:
      - 0001-update-automake-files-for-arm64.patch
      compiled: 1.1.34
      loaded: 1.1.34
    other_libraries:
      zlib: 1.2.11
      libiconv: '1.15'

Additional context

N/A
