x/net/html: html entity escape error #48237

Closed
WebFeng opened this issue Sep 8, 2021 · 6 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.

@WebFeng

WebFeng commented Sep 8, 2021

What version of Go are you using (go version)?

$ go version
go version go1.14.1 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/xxx/Library/Caches/go-build"
GOENV="/Users/xxx/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/xxx/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/fnwcoder/Code/go-filter-ad2/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/hh/kr4769hj0tb4kjckqhlbhgww0000gn/T/go-build221547048=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
	"fmt"
	"log"
	"os"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	root, err := html.Parse(strings.NewReader("<body><!--<p></p>&lt;!--[video]--&gt;--></body>"))
	if err != nil {
		log.Fatal(err)
	}

	c := findCommentNode(root)
	fmt.Printf("Comment Node Type: %d; Data: %q\n\n", c.Type, c.Data)

	fmt.Println("html.Render:")
	if err := html.Render(os.Stdout, c); err != nil {
		log.Fatal(err)
	}
	fmt.Println()
}

func findCommentNode(n *html.Node) *html.Node {
	if n.Type == html.CommentNode {
		return n
	}
	for n := n.FirstChild; n != nil; n = n.NextSibling {
		if nn := findCommentNode(n); nn != nil {
			return nn
		}
	}
	return nil
}

What did you expect to see?

Comment Node Type: 4; Data: "<p></p><!--[video]-->"

html.Render:
<!--<p></p>&lt;!--[video]--&gt;-->

What did you see instead?

Comment Node Type: 4; Data: "<p></p><!--[video]-->"

html.Render:
<!--<p></p><!--[video]-->-->

Reference: PuerkitoBio/goquery#391

@gopherbot gopherbot added this to the Unreleased milestone Sep 8, 2021
@cagedmantis cagedmantis changed the title from "x/net/html: html entity escap error" to "x/net/html: html entity escape error" Sep 16, 2021
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 16, 2021
@cagedmantis
Contributor

/cc @bradfitz @ianlancetaylor

@gopherbot
Contributor

Change https://golang.org/cl/354929 mentions this issue: html: regard the comment token's data as not escaped

@namusyaka
Member

namusyaka commented Oct 9, 2021

Thanks for the report, @WebFeng.

Your expectation at least appears to match Chrome's behavior.
The issue arises because the comment token's data is treated as though it can contain escaped entities.

As far as I can tell from reading this page, there is no mention of escaped entities in comments, and I don't think it's necessary to treat a comment token's data as escaped.

cc @nigeltao
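
As a rough sketch of the round-trip the reporter expected, the snippet below escapes comment data before writing it between the comment delimiters. It uses only the standard library; renderComment and commentEscaper are hypothetical names, and this is not the x/net/html implementation or the fix in the linked changes. It assumes that escaping "&", "<!--" and "-->" is enough for the example in this issue, and it does not cover every sequence that can end a comment.

```go
package main

import (
	"fmt"
	"strings"
)

// commentEscaper is a hypothetical escaper for comment data.
// It escapes "&" so existing entities survive an unescaping parser,
// and rewrites nested comment delimiters so they cannot close the comment.
var commentEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<!--", "&lt;!--",
	"-->", "--&gt;",
)

// renderComment is a hypothetical helper, not part of x/net/html.
// It wraps the escaped data in comment delimiters.
func renderComment(data string) string {
	return "<!--" + commentEscaper.Replace(data) + "-->"
}

func main() {
	// The comment node Data reported in this issue.
	fmt.Println(renderComment("<p></p><!--[video]-->"))
	// prints: <!--<p></p>&lt;!--[video]--&gt;-->
}
```

For the data in this report, the result matches the output listed under "What did you expect to see?".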

@gonejack

gonejack commented Apr 5, 2022

Hi, I ran into a similar issue with golang.org/x/net v0.0.0-20220403103023-749bd193bc2b, which produces a broken result.

package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	var htm = `
<html>
<body>
<!--!-->
<div>text</div>
</body>
</html>
`
	node, err := html.Parse(strings.NewReader(htm))
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(node.LastChild.LastChild.LastChild.Data)
}

output:

!-->
<div>text</div>
</body>
</html>

@gopherbot
Contributor

Change https://go.dev/cl/419334 mentions this issue: html: escape comment and doctype tokens' data

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jul 25, 2022
@gopherbot
Contributor

Change https://go.dev/cl/442496 mentions this issue: html: properly handle explanation marks in comments

WeiminShang added a commit to WeiminShang/net that referenced this issue Nov 16, 2022
Fixes golang/go#48237

Change-Id: I309e3ad30684fb71b9b3e67dfac156da08dbc69b
Reviewed-on: https://go-review.googlesource.com/c/net/+/419334
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Oct 12, 2023