silent segfault #287

Closed
estin opened this issue Jul 7, 2017 · 7 comments

Comments

@estin

estin commented Jul 7, 2017

Source:

extern crate env_logger;
extern crate html5ever;

use std::default::Default;
use html5ever::parse_document;
use html5ever::rcdom::RcDom;
use html5ever::tendril::TendrilSink;

fn main() {
    env_logger::init().unwrap();

    let html = r#"
        <DOCTYPE html>
        <html>
        <head></head>
        <body>
            <div class="catalog-goods__image-view_item" data-item-id="365392"></div>
        </body>
        </html>
    "#;

    println!("Start");
    parse_document(RcDom::default(), Default::default()).one(html);
    println!("Stop");
} 

Output:

$ RUST_BACKTRACE=1 cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/html5ever-bug`
Start

$ dmesg -T | tail -n 1
[Fri Jul  7 17:04:57 2017] html5ever-bug[8461]: segfault at 7ffef5df0cf0 ip 0000562e1ee6ba5c sp 00007ffef5e0dce8 error 4 in html5ever-bug[562e1eb94000+3c2000]

$ LD_PRELOAD=libSegFault.so target/debug/html5ever-bug
Start
Segmentation fault (core dumped)

$ RUST_LOG=debug RUST_BACKTRACE=1 cargo run
..skipped..
DEBUG:html5ever::tokenizer: got character                             
DEBUG:html5ever::tokenizer: processing in state BeforeAttributeName   
DEBUG:html5ever::tokenizer: got character d                           
DEBUG:html5ever::tokenizer: processing in state AttributeName         
DEBUG:html5ever::tokenizer: got character a                           
DEBUG:html5ever::tokenizer: got character t                           
DEBUG:html5ever::tokenizer: got character a                           
DEBUG:html5ever::tokenizer: got character -                           
DEBUG:html5ever::tokenizer: got character i                           
DEBUG:html5ever::tokenizer: got character t                           
DEBUG:html5ever::tokenizer: got character e                           
DEBUG:html5ever::tokenizer: got character m                           
DEBUG:html5ever::tokenizer: got character -                           
DEBUG:html5ever::tokenizer: got character i                           
DEBUG:html5ever::tokenizer: got character d                           
DEBUG:html5ever::tokenizer: got character =                           
DEBUG:html5ever::tokenizer: processing in state BeforeAttributeValue  
DEBUG:html5ever::tokenizer: got character "                           
DEBUG:html5ever::tokenizer: processing in state AttributeValue(DoubleQuoted)                                                                
DEBUG:html5ever::tokenizer: got characters Some(NotFromSet(Tendril<UTF8>(inline: "365392")))                                                
DEBUG:html5ever::tokenizer: got characters Some(FromSet('\"'))        
DEBUG:html5ever::tokenizer: got character "                           
DEBUG:html5ever::tokenizer: processing in state AfterAttributeValueQuoted
DEBUG:html5ever::tokenizer: got character >

Environment:

$ uname -a
Linux localhost.lan 4.11.9-1-ARCH #1 SMP PREEMPT Wed Jul 5 18:23:08 CEST 2017 x86_64 GNU/Linux

$ rustc --version
rustc 1.20.0-nightly (696412de7 2017-07-06)

$ cat Cargo.toml | grep html5ever
name = "html5ever-bug"
html5ever = "0.18.0"

$ cargo --version
cargo 0.21.0-nightly (eb6cf012a 2017-07-02)

@jdm
Member

jdm commented Jul 7, 2017

Could you run this under a debugger and get a backtrace of the segfault?

@estin
Author

estin commented Jul 7, 2017

This looks like rust-lang/rust#43110:

$ gdb -q target/debug/html5ever-bug
Reading symbols from target/debug/html5ever-bug...done.
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/user/prj/tmp/html5ever-bug/target/debug/html5ever-bug.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) run
Starting program: /home/user/prj/tmp/html5ever-bug/target/debug/html5ever-bug 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Start

Program received signal SIGSEGV, Segmentation fault.
0x000055555582ba5c in compiler_builtins::probestack::__rust_probestack ()
    at /checkout/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/src/probestack.rs:55
55	/checkout/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/src/probestack.rs: No such file or directory.
(gdb) bt
#0  0x000055555582ba5c in compiler_builtins::probestack::__rust_probestack ()
    at /checkout/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/src/probestack.rs:55
#1  0x00005555557e7f9e in lazy_static::lazy::{{impl}}::get::{{closure}}<std::sync::mutex::Mutex<string_cache::atom::StringCache>,fn() -> std::sync::mutex::Mutex<string_cache::atom::StringCache>> ()
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-0.2.8/src/lazy.rs:22
#2  0x00005555557e7c3a in std::sync::once::{{impl}}::call_once::{{closure}}<closure> () at /checkout/src/libstd/sync/once.rs:227
#3  0x00005555557f471d in std::sync::once::Once::call_inner () at /checkout/src/libstd/sync/once.rs:307
#4  0x00005555557e7bac in std::sync::once::Once::call_once<closure> (
    self=0x555555b634f8 <<string_cache::atom::STRING_CACHE as core::ops::deref::Deref>::deref::__stability::LAZY+8>, f=...)
    at /checkout/src/libstd/sync/once.rs:227
#5  0x00005555557eb460 in lazy_static::lazy::Lazy<std::sync::mutex::Mutex<string_cache::atom::StringCache>>::get<std::sync::mutex::Mutex<string_cache::atom::StringCache>,fn() -> std::sync::mutex::Mutex<string_cache::atom::StringCache>> (self=<optimized out>)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-0.2.8/src/lazy.rs:22
#6  string_cache::atom::{{impl}}::deref::__stability () at <__lazy_static_internal macros>:20
#7  string_cache::atom::{{impl}}::deref (self=0x555555868193 <str.d>) at <__lazy_static_internal macros>:21
#8  0x00005555555ad988 in string_cache::atom::{{impl}}::from<markup5ever::LocalNameStaticSet> (string_to_add=...)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/string_cache-0.6.0/src/atom.rs:320
#9  0x00005555555a9324 in string_cache::atom::{{impl}}::from<markup5ever::LocalNameStaticSet> (string_to_add=...)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/string_cache-0.6.0/src/atom.rs:333
#10 0x0000555555611ba0 in html5ever::tokenizer::Tokenizer<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>>::finish_attribute<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (self=0x7fffffffde78) at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.18.0/src/tokenizer/mod.rs:504
#11 0x000055555561137e in html5ever::tokenizer::Tokenizer<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>>::emit_current_tag<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (self=0x7fffffffde78) at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.18.0/src/tokenizer/mod.rs:401
#12 0x00005555556182e3 in html5ever::tokenizer::Tokenizer<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>>::step<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (
    self=0x7fffffffde78, input=0x7fffffffe040)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.18.0/src/tokenizer/mod.rs:1009
#13 0x0000555555613eaa in html5ever::tokenizer::Tokenizer<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>>::run<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (
    self=0x7fffffffde78, input=0x7fffffffe040)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.18.0/src/tokenizer/mod.rs:362
#14 0x00005555556140d3 in html5ever::tokenizer::Tokenizer<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>>::feed<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (
    self=0x7fffffffde78, input=0x7fffffffe040)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.18.0/src/tokenizer/mod.rs:220
#15 0x00005555555aa36b in html5ever::driver::{{impl}}::process<markup5ever::rcdom::RcDom> (self=0x7fffffffde78, t=...)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.18.0/src/driver.rs:88
#16 0x00005555556372e6 in tendril::stream::TendrilSink::one<html5ever::driver::Parser<markup5ever::rcdom::RcDom>,tendril::fmt::UTF8,tendril::tendril::NonAtomic,&str> (self=..., t=...) at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/tendril-0.3.1/src/stream.rs:47
#17 0x0000555555644a52 in html5ever_bug::main () at src/main.rs:24
(gdb) Quit
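
Frame #0 above is `__rust_probestack`, a routine that touches stack pages before a large frame is used, so it can also fault when a thread genuinely runs out of stack. One way to tell the two cases apart is to repeat the failing call on a thread with a much larger stack. The sketch below uses only the standard library; `run_with_stack` and the 16 MiB figure are illustrative assumptions, not part of html5ever, and the closure body is a stand-in for the `parse_document(...).one(html)` call from the report.

```rust
use std::thread;

/// Hypothetical helper: runs `work` on a new thread with an explicit stack
/// size and returns its result, or a message if spawning failed or the
/// worker panicked.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("parse-worker".to_string())
        .stack_size(stack_bytes)
        .spawn(work)
        .map_err(|e| format!("spawn failed: {}", e))?;
    handle.join().map_err(|_| "worker panicked".to_string())
}

fn main() {
    // Stand-in input; in the real test this closure would call the parser.
    let html = String::from("<div data-item-id=\"365392\"></div>");
    // 16 MiB is an arbitrary, generous value chosen for the experiment.
    let result = run_with_stack(16 * 1024 * 1024, move || html.len());
    println!("{:?}", result);
}
```

If the crash persists with a large stack, exhaustion is an unlikely cause and a toolchain problem becomes the more plausible explanation. Note that a segfault kills the whole process, so `join` only reports panics, not the crash itself.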

@Ygg01
Contributor

Ygg01 commented Jul 10, 2017

It seems issue rust-lang/rust#43110 was closed in favor of rust-lang/rust#43102

@estin
Author

estin commented Jul 12, 2017

Yep, it was a nightly issue.
It now works on:

$ rustc --version
rustc 1.20.0-nightly (9475ae477 2017-07-11)

@estin estin closed this as completed Jul 12, 2017
@dessalines

This is occurring again on Rust 1.56.0:

LemmyNet/lemmy#1964

@jdm
Member

jdm commented Dec 1, 2021

Do you have a test case that reproduces the problem you're seeing? Ideally it would include both the code that calls html5ever and the data that it parses.

@dessalines
Copy link

Sorry, I tried this with a few non-UTF-8 characters and it did not seem to segfault. Please ignore the above.
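
For anyone assembling a test case around suspicious input bytes, a minimal sketch of the input-handling half might look like the following. It uses only the standard library; `decode_for_repro` is a hypothetical helper name, and the step that hands the text to html5ever is left as a comment because it depends on the calling code.

```rust
/// Hypothetical helper: decodes raw bytes for a reproduction case.
/// Returns the text and a flag saying whether invalid UTF-8 sequences
/// had to be replaced with U+FFFD.
fn decode_for_repro(bytes: &[u8]) -> (String, bool) {
    match std::str::from_utf8(bytes) {
        Ok(text) => (text.to_owned(), false),
        Err(_) => (String::from_utf8_lossy(bytes).into_owned(), true),
    }
}

fn main() {
    // Example input containing one byte (0xFF) that is never valid UTF-8.
    let raw: &[u8] = b"<p>caf\xff</p>";
    let (text, replaced) = decode_for_repro(raw);
    println!("replaced invalid bytes: {}", replaced);
    println!("{}", text);
    // At this point `text` would be passed to the parser under test.
}
```

Recording whether replacement happened makes it easier to say in a report exactly which bytes the parser received.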
