Crash parsing HTML document #126
Comments
Hey @retorquere you can run |
I'm an idiot. Of course I wanted level 0, sorry. The offending page is at https://gist.github.com/6c955708ecfa70ff55d363c485f9eb1e |
No wait -- the log ends at
so which of these two is likely the culprit? |
I have another file on which it consistently crashes, but if I test only that file, it passes. |
It'll be |
It crashes on more files now. I've removed test/index.html since, but I still have others. My current run ends with
but it may not be something about that file in particular; if I set DirectoryPath to |
My site is available as a tarball on https://0x0.st/z2DV.gz |
(but that tarball was produced on MacOS, which means that |
Thanks! I'm on holiday next week but will try to have a look at this at some point in July. |
Thanks! Is there anything I can do in the interim to help debugging this? |
I've tried this on a linux system and it runs without issue there. |
Ah, that's very interesting. I'm no expert on osx (only access I have is the Travis test runners). Right now, without looking at code, unfortunately I don't have any ideas. |
No issue. When you're back I'd be happy to run an instrumented version that may give more insight. |
I'm seeing the same on Linux on Ubuntu Focal:
node@791983aec7ee:~/antora-base$ ./bin/htmltest -l0 public
htmltest started at 09:18:46 on public
========================================================================
0: DirectoryPath string = public
1: DirectoryIndex string = index.html
2: FilePath string =
3: FileExtension string = .html
4: CheckDoctype bool = true
5: CheckAnchors bool = true
6: CheckLinks bool = true
7: CheckImages bool = true
8: CheckScripts bool = true
9: CheckMeta bool = true
10: CheckGeneric bool = true
11: CheckExternal bool = true
12: CheckInternal bool = true
13: CheckInternalHash bool = true
14: CheckMailto bool = true
15: CheckTel bool = true
16: CheckFavicon bool = false
17: CheckMetaRefresh bool = true
18: EnforceHTML5 bool = false
19: EnforceHTTPS bool = false
20: IgnoreURLs []interface {} = []
21: IgnoreDirs []interface {} = []
22: IgnoreInternalEmptyHash bool = false
23: IgnoreEmptyHref bool = false
24: IgnoreCanonicalBrokenLinks bool = true
25: IgnoreExternalBrokenLinks bool = false
26: IgnoreAltMissing bool = false
27: IgnoreDirectoryMissingTrailingSlash bool = false
28: IgnoreSSLVerify bool = false
29: IgnoreTagAttribute string = data-proofer-ignore
30: HTTPHeaders map[interface {}]interface {} = map[Accept:*/* Range:bytes=0-0]
31: TestFilesConcurrently bool = false
32: DocumentConcurrencyLimit int = 128
33: HTTPConcurrencyLimit int = 16
34: LogLevel int = 0
35: LogSort string = document
36: ExternalTimeout int = 15
37: StripQueryString bool = true
38: StripQueryExcludes []string = [fonts.googleapis.com]
39: EnableCache bool = true
40: EnableLog bool = true
41: OutputDir string = tmp/.htmltest
42: OutputCacheFile string = refcache.json
43: OutputLogFile string = htmltest.log
44: CacheExpires string = 336h
45: NoRun bool = false
46: VCREnable bool = false
47: Version string = 0.12.1
testDocument on Home/faq.html
Home/faq.html
DOCTYPE html []
--- Home/faq.html --> <nil>
from cache --- Home/faq.html --> https://docs.tpwiki.com/Home/faq.html
OK --- Home/faq.html --> https://docs.tpwiki.com/Home/faq.html
from cache --- Home/faq.html --> https://docs.tpwiki.com
OK --- Home/faq.html --> https://docs.tpwiki.com
target does not exist --- Home/faq.html --> /oauth2/sign_out
testDocument on Home/index.html
Home/index.html
DOCTYPE html []
--- Home/index.html --> <nil>
from cache --- Home/index.html --> https://docs.tpwiki.com/Home/index.html
OK --- Home/index.html --> https://docs.tpwiki.com/Home/index.html
from cache --- Home/index.html --> https://docs.tpwiki.com
OK --- Home/index.html --> https://docs.tpwiki.com
target does not exist --- Home/index.html --> /oauth2/sign_out
testDocument on SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html
SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html
DOCTYPE html []
--- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> <nil>
from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://docs.tpwiki.com
OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://docs.tpwiki.com
target does not exist --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> /oauth2/sign_out
from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/tree/master
OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/tree/master
from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/issues
OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/issues
from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/compare/master...master
OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/compare/master...master
testDocument on SEL751_Arc_Flash_Protection_Settings/unstable/setting_guide/Setting_Guide.html
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x51c0d7]
goroutine 1 [running]:
github.com/wjdp/htmltest/htmldoc.(*Document).Parse(0x0)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmldoc/document.go:47 +0x37
github.com/wjdp/htmltest/htmldoc.(*Document).IsHashValid(...)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmldoc/document.go:112
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkInternalHash(0xc0000ce240, 0xc0003210b0)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:325 +0xb0
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkInternal(0xc0000ce240, 0xc0003210b0)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:299 +0x15d
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkLink(0xc0000ce240, 0xc0000fe480, 0xc0001ed0a0)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:97 +0x5ec
github.com/wjdp/htmltest/htmltest.(*HTMLTest).testDocument(0xc0000ce240, 0xc0000fe480)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:204 +0x18c
github.com/wjdp/htmltest/htmltest.(*HTMLTest).testDocuments(0xc0000ce240)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:183 +0x65
github.com/wjdp/htmltest/htmltest.Test(0xc000013950, 0xc000010018, 0xc0000f9d48, 0x1)
/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:143 +0x89b
main.run(0xc000013950, 0xc000013950)
/home/travis/gopath/src/github.com/wjdp/htmltest/main.go:159 +0x207
main.main()
/home/travis/gopath/src/github.com/wjdp/htmltest/main.go:66 +0x268
My system is: Linux 791983aec7ee 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 x86_64 x86_64 GNU/Linux running within a Docker container. Happy to provide further information. This error is highly consistent and always occurs. |
My directory is also |
This seems to be an issue with parsing HTML. I know this issue is very old but @danyill do you have a copy of the files that caused the crash? |
@wjdp Sorry for the slow response, time is getting away on me. I have a copy of a very similar one which also crashes on the latest version of htmltest. I can't share this publicly but am happy to provide it to you privately. What is the easiest way to send it? Can I email it to your commit address? (1.5 MB file with embedded images.) |
Hi, @wjdp. I was able to replicate this error. In my case, I have 2 pages, first page has an anchor link to another page
page 2
The links are valid: when I serve the site on localhost or on a server, they work fine. UPDATED (27.10.21): once I remove
|
Been digging into this while watching tv 😄 I'm quite sure I've narrowed down the culprit: htmltest/htmltest/check-link.go Line 336 in d3ffce7
Debugging shows me that The way I'm currently fairly sure I can hit this issue is one of two ways:
From there, any call to member functions will panic if they reference internal members. I'll keep digging, but I wanted to report on my progress in case it spurred someone else to see the correct path through to resolving this issue. |
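The failure mode described above matches the trace earlier in the thread, where `(*Document).Parse(0x0)` shows a nil receiver. A minimal sketch of that mechanism follows. The `Document` type, its fields, and the `panicked` helper are simplified stand-ins invented for illustration, not htmltest's actual code.

```go
package main

import "fmt"

// Document is a simplified, hypothetical stand-in for htmldoc.Document.
type Document struct {
	parsed bool
	hashes map[string]bool
}

// Parse reads a field of the receiver, so it panics when the receiver is nil.
func (d *Document) Parse() {
	if d.parsed { // nil pointer dereference happens here when d == nil
		return
	}
	d.hashes = map[string]bool{"intro": true}
	d.parsed = true
}

// IsHashValid parses lazily, then looks the hash up.
func (d *Document) IsHashValid(hash string) bool {
	d.Parse()
	return d.hashes[hash]
}

// panicked runs f and reports whether it panicked.
func panicked(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	var missing *Document // what a failed lookup can hand back
	fmt.Println("nil receiver panics:", panicked(func() { missing.IsHashValid("intro") }))
	fmt.Println("real document panics:", panicked(func() { (&Document{}).IsHashValid("intro") }))
}
```

Go allows the method call on a nil pointer. The crash only happens once the method touches a field, which is why the panic surfaces inside `Parse` rather than at the call site.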
So easy enough fix for the panic, check the The next issue I run into, is that this reference I have should resolve, but it doesn't because (I assume) the reference it points to isn't available in |
Okay! I think I got it working! I had to keep the list of all PR coming shortly! |
This fix includes two things:
1. Check for `ok` value from `ResolveRef` in `checkInternalHash`. If a value was ignored but was a valid link, it would panic as it was not found.
2. Change the behaviour of `discoverRecurse` such that it keeps all found `Document`s, but adds a new `IgnoreTest` attribute such that we can track if it should be skipped on test, but still referenced in a test.

Closes wjdp#126
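The two changes can be sketched roughly as below. This is an illustration under assumptions, not the code from the PR: `Store`, `Add`, `TestablePaths`, the `Hashes` field, and the function signatures are hypothetical. Only the names `ResolveRef`, `checkInternalHash` (exported here as `CheckInternalHash`), `Document`, and `IgnoreTest` are taken from the description.

```go
package main

import (
	"fmt"
	"sort"
)

// Document is a hypothetical, simplified stand-in for htmldoc.Document.
type Document struct {
	Path       string
	IgnoreTest bool            // discovered and linkable, but not tested itself
	Hashes     map[string]bool // anchors defined in the document
}

// Store is a hypothetical stand-in for the document store.
type Store struct {
	byPath map[string]*Document
}

// Add keeps every discovered document, including ignored ones.
func (s *Store) Add(d *Document) {
	if s.byPath == nil {
		s.byPath = map[string]*Document{}
	}
	s.byPath[d.Path] = d
}

// ResolveRef returns the target document and whether it was found.
func (s *Store) ResolveRef(path string) (*Document, bool) {
	d, ok := s.byPath[path]
	return d, ok
}

// CheckInternalHash returns an error instead of dereferencing a nil document.
func CheckInternalHash(s *Store, path, hash string) error {
	doc, ok := s.ResolveRef(path)
	if !ok {
		return fmt.Errorf("target does not exist: %s", path)
	}
	if !doc.Hashes[hash] {
		return fmt.Errorf("hash does not exist: %s#%s", path, hash)
	}
	return nil
}

// TestablePaths lists the documents that should actually be tested.
func (s *Store) TestablePaths() []string {
	var paths []string
	for p, d := range s.byPath {
		if !d.IgnoreTest {
			paths = append(paths, p)
		}
	}
	sort.Strings(paths)
	return paths
}

func main() {
	s := &Store{}
	s.Add(&Document{Path: "index.html"})
	s.Add(&Document{Path: "ignored/page.html", IgnoreTest: true, Hashes: map[string]bool{"top": true}})
	fmt.Println(CheckInternalHash(s, "ignored/page.html", "top"))
	fmt.Println(CheckInternalHash(s, "missing.html", "top"))
	fmt.Println(s.TestablePaths())
}
```

Keeping ignored documents in the store with a flag, rather than dropping them at discovery time, lets links into them still resolve while the documents themselves are skipped as test subjects.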
htmltest is erroring out when I run it:
To Reproduce
Steps to reproduce the behaviour:
.htmltest.yml
Please copy in your config file
Source files
I haven't been able to narrow it down yet -- my request is for htmltest to print the page it's processing to help narrow it down.
Expected behaviour
print each page as it's being processed
Versions