unable to collapse generated profile data: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") } #32

Open
jjyr opened this issue Apr 25, 2019 · 6 comments

Comments

@jjyr
jjyr commented Apr 25, 2019

It panics randomly on my Mac:

thread 'main' panicked at 'unable to collapse generated profile data: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }', src/libcore/result.rs:997:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::result::unwrap_failed
   9: flamegraph::generate_flamegraph_by_running_command
  10: cargo_flamegraph::main
  11: std::rt::lang_start::{{closure}}
  12: std::panicking::try::do_call
  13: __rust_maybe_catch_panic
  14: std::rt::lang_start_internal
  15: main
@ghost
ghost commented Jun 20, 2020

Happening on mine too, on every run:

thread 'main' panicked at 'unable to collapse generated profile data: Custom { kind: InvalidData, error: "stream did not contain valid UTF-8" }', /Users/logan/.cargo/registry/src/github.com-1ecc6299db9ec823/flamegraph-0.3.0/src/lib.rs:236:5
stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: std::io::Write::write_fmt
   3: std::panicking::default_hook::{{closure}}
   4: std::panicking::default_hook
   5: std::panicking::rust_panic_with_hook
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::option::expect_none_failed
   9: flamegraph::generate_flamegraph_for_workload
  10: cargo_flamegraph::main
  11: std::rt::lang_start::{{closure}}
  12: std::rt::lang_start_internal
  13: main
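
For context, an error of kind InvalidData with the text "stream did not contain valid UTF-8" is what Rust's standard library reports when a byte stream is read into a String and contains bytes that are not valid UTF-8. A minimal sketch that triggers the same error kind follows. `read_as_utf8` is a hypothetical helper, not a function from flamegraph, and the 0xFF byte is an arbitrary invalid byte chosen for illustration.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: read a byte stream into a `String`, the way a
/// text-only parser would. Fails if the bytes are not valid UTF-8.
fn read_as_utf8(bytes: &[u8]) -> io::Result<String> {
    let mut text = String::new();
    Cursor::new(bytes).read_to_string(&mut text)?;
    Ok(text)
}

fn main() {
    // Made-up sample: 0xFF can never appear in well-formed UTF-8.
    let sample = b"libobjc.A.dylib`objc::D\xFF\n";
    match read_as_utf8(sample) {
        Ok(text) => println!("read {} bytes of text", text.len()),
        Err(err) => println!("kind = {:?}, error = {}", err.kind(), err),
    }
}
```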

@michaelkirk
Contributor

I hit this too. I'm not sure yet what's causing it, but here are some notes:

First, I commented out the line that deletes the raw dtrace dump:

@@ -163,11 +166,11 @@ mod arch {
              output file cargo-flamegraph.stacks",
         );
 
-        std::fs::remove_file("cargo-flamegraph.stacks")
-            .expect(
-                "unable to remove cargo-flamegraph.stacks \
-                 temporary file",
-            );
+        //std::fs::remove_file("cargo-flamegraph.stacks")
+        //    .expect(
+        //        "unable to remove cargo-flamegraph.stacks \
+        //         temporary file",
+        //    );
 
         buf
     }

I tried explicitly setting the encoding to UTF-8 in case that wasn't the default, but I still get the same error intermittently.

Looking at the file, it mostly looks OK. How can I track down the offending non-UTF-8 bytes? Maybe round-trip it through iconv?

$ iconv -f utf-8 -t utf-8 < cargo-flamegraph.stacks > cargo-flamegraph.stacks.reencoded
iconv: (stdin):154433:1054: cannot convert

Ok - so maybe something is wrong on line 154433, let's take a look:

$ head -n 154433 < cargo-flamegraph.stacks | tail -n 1 > cargo-flamegraph-offending-line.stacks

The offending line looks something like this (though obviously I can't paste invalid unicode into GH's text area; here it is zipped if you're feeling brave: [offending_line.stacks.zip](https://github.com/flamegraph-rs/flamegraph/files/5320626/offending_line.stacks.zip)):

libobjc.A.dylib`bool objc::DenseMapBase<objc::DenseMap<DisguisedPtr<objc_object>, objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> >, objc::DenseMapValueInfo<objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> > >, objc::DenseMapInfo<DisguisedPtr<objc_object> >, objc::detail::DenseMapPair<DisguisedPtr<objc_object>, objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> > > >, DisguisedPtr<objc_object>, objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> >, objc::D+0x64

Hmm.. so maybe it's some exotic encoding - what does chardetect think it might be?

$ chardetect cargo-flamegraph-offending-line.stacks 
cargo-flamegraph-offending-line.stacks: ISO-8859-1 with confidence 0.73

Ok, maybe it's ISO-8859-1? Let's try to convert:

$ iconv -f ISO-8859-1 -t utf-8 < cargo-flamegraph-offending-line.stacks 
              libobjc.A.dylib`bool objc::DenseMapBase<objc::DenseMap<DisguisedPtr<objc_object>, objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> >, objc::DenseMapValueInfo<objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> > >, objc::DenseMapInfo<DisguisedPtr<objc_object> >, objc::detail::DenseMapPair<DisguisedPtr<objc_object>, objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> > > >, DisguisedPtr<objc_object>, objc::DenseMap<void const*, objc::ObjcAssociation, objc::DenseMapValueInfo<objc::ObjcAssociation>, objc::DenseMapInfo<void const*>, objc::detail::DenseMapPair<void const*, objc::ObjcAssociation> >, objc::D»

Note in particular the last entry: "objc::D»"

I've gone through this exercise a few times and don't always get the same guessed encoding, which makes me think this is some kind of corruption rather than dtrace deliberately using an obscure encoding, but who knows 🤷
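
The iconv round-trip above can also be sketched in Rust with `std::str::from_utf8` and `Utf8Error::valid_up_to`. This is only an illustration: `first_invalid_utf8` is a hypothetical helper, the sample bytes are made up, and the 0xBB byte is an assumption inferred from the "»" in the ISO-8859-1 output rather than something read from the real file. The column is a 1-based byte offset within the line, which may not match how iconv counts columns.

```rust
use std::str;

/// Hypothetical helper: return the 1-based (line, byte column) of the first
/// byte that is not valid UTF-8, or `None` if the whole buffer is valid.
fn first_invalid_utf8(bytes: &[u8]) -> Option<(usize, usize)> {
    // `from_utf8` only fails when some byte sequence is invalid.
    let err = str::from_utf8(bytes).err()?;
    // Everything before this offset is known to be valid UTF-8.
    let offset = err.valid_up_to();
    let prefix = &bytes[..offset];
    let line = prefix.iter().filter(|&&b| b == b'\n').count() + 1;
    let line_start = prefix
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    Some((line, offset - line_start + 1))
}

fn main() {
    // Made-up stand-in for the dump; the real bytes would come from
    // something like std::fs::read("cargo-flamegraph.stacks").
    let sample = b"frame_one+0x10\nlibobjc.A.dylib`objc::D\xBB\n";
    match first_invalid_utf8(sample) {
        Some((line, col)) => println!("invalid UTF-8 at line {line}, byte column {col}"),
        None => println!("input is valid UTF-8"),
    }
}
```

Because only the first failure is reported, a file with several bad lines would need the search repeated on the remainder of the buffer.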

@michaelkirk
Contributor

I'm also on a mac btw (10.15)

$ dtrace -V
dtrace: Sun D 1.15

Is anyone hitting this not on a mac?

michaelkirk added a commit to michaelkirk/flamegraph that referenced this issue Oct 2, 2020
Intermittently, invalid utf-8 is found in cargo-flamegraph.stacks, which
causes parsing to blow up with the error:

> unable to collapse generated profile data: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }

This commit doesn't fix the underlying problem, but simply works around
it by lossily re-encoding to valid utf8.

Anecdotally it seems to be macos symbol names at the end of the line that
contain the invalid utf-8 - I don't know if this is due to some error in dtrace
or if somehow the symbols actually contain non utf-8 encodings.

Note I did try explicitly specifying utf8 output, by adding to the
dtrace command invocation:

    command.arg("-x");
    command.arg("encoding=utf8");

But I ran into the same error seemingly just as often.
@michaelkirk
Contributor

I have a workaround at #101; it would be great if anyone who hits this error frequently could give it a whirl.
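
The commit message above describes the workaround as lossily re-encoding to valid UTF-8. As a rough idea of what that means in practice, here is a minimal sketch using `String::from_utf8_lossy`, which replaces each invalid sequence with U+FFFD. It is not the diff from #101; `to_valid_utf8` is a hypothetical name and the sample bytes are invented.

```rust
use std::borrow::Cow;

/// Hypothetical helper: turn arbitrary bytes into valid UTF-8 text.
/// Invalid sequences become U+FFFD (the replacement character), so the
/// affected symbol names are altered but the rest of the data survives.
fn to_valid_utf8(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}

fn main() {
    // Made-up stand-in for one line of the dtrace dump.
    let raw = b"libobjc.A.dylib`objc::D\xBB\n";
    let text = to_valid_utf8(raw);
    // The text can now be handed to a parser that expects `&str`.
    print!("{text}");
}
```

When the input is already valid, `from_utf8_lossy` borrows it instead of copying, so the common case stays cheap. The trade-off is the one mentioned in the workaround: frames containing replaced bytes may no longer match their real symbol names.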

@michaelkirk
Contributor

On the assumption that dtrace emitting invalid UTF-8 is a bug, I filed a radar (rdar://8800290) and mirrored it on Open Radar here: https://openradar.appspot.com/radar?id=5013532726788096

@austinabell

By the way, I'm only seeing this now, but I added support for non-UTF-8 input to inferno here: jonhoo/inferno#196

It hasn't been released, and the version hasn't been bumped in this repo, but if you need something that works without forking it yourself, you can use this for the time being:

cargo install --git "https://github.com/austinabell/flamegraph"

I made the change in inferno so that it can be used outside this repo as well.
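
One general way to tolerate non-UTF-8 input is to process lines as raw bytes instead of `String`s, for example with `BufRead::read_until`. The sketch below shows that idea only. It makes no claim about how jonhoo/inferno#196 is implemented, and `count_nonblank_lines` is a hypothetical helper.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: count non-blank lines without requiring the input
/// to be valid UTF-8. `read_until` fills a `Vec<u8>`, so no decoding (and
/// therefore no InvalidData error) is involved.
fn count_nonblank_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = Vec::new();
    let mut count = 0;
    loop {
        line.clear();
        // Returns Ok(0) at end of input.
        if reader.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        if !line.iter().all(|b| b.is_ascii_whitespace()) {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Made-up input containing an invalid byte (0xBB) on the first line.
    let input = Cursor::new(&b"objc::D\xBB\n\n  main+0x4\n"[..]);
    println!("{} non-blank lines", count_nonblank_lines(input)?);
    Ok(())
}
```

Working on bytes keeps every line intact, at the cost of having to decode (strictly or lossily) at the point where text is finally needed, such as when rendering labels.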

bors bot added a commit that referenced this issue Dec 7, 2020
101: Workaround #32 - fails parsing invalid utf8 dtrace output (macos only?) r=spacejam a=michaelkirk

---

This is admittedly a hack, so I understand if you don't want to merge it, but it might be helpful for folks like me experiencing #32. 

Anecdotally this commit seems to completely fix things for me. Without it I get the above error about 50% of the time — making it quite frustrating to use this otherwise very nice tool. 🙂 

The caveat is that presumably the invalid symbol names will not be correctly labeled/classified, but in practice this hasn't bitten me yet, since it seems to be a relatively small number of affected lines.

Co-authored-by: Michael Kirk <michael.code@endoftheworl.de>