-
Notifications
You must be signed in to change notification settings - Fork 20
Meta data for llnode - Node.js version 8 #50
Comments
I am interested in helping this one out. I am working on nodejs/llnode#143 to bring Errors back, and worked on thin strings & stub frames in nodejs/llnode#121 before. Also working on nodejs/llnode#144 to get the CI output the actual errors from lldb instead of silently timing out. BTW also working on the JS API nodejs/llnode#147 , although need to think about how to compile both/one of the addon and the plugin when installed by npm. |
Do you have specifics on what isn't working? For example, the errors implementation seems to have changed in V8 like Joyee mentioned, but as of nodejs/llnode@3ab289b, all of the postmortem metadata appeared to be loading properly, and the test suite was passing with TurboFan (on my machine anyway - we do need a better CI story there). |
@yunong can you list what you was not working for you ? |
ping @yunong |
@cjihrig do you know who has been updating the metadata (since it sounds like it has been done). Would like to understand the process and see if we have it documented somewhere. |
I did the bulk of the metadata update. As far as I know, it's not documented anywhere. It mostly involves seeing what metadata no longer exists (which llnode will tell you) and then jumping through V8 commits to find out what happened to it. A big problem is that the metadata can disappear from V8 and Node, and doesn't present any issues until someone is trying to do postmortem debugging with llnode. I'll work on putting together a test in core to try to catch this scenario. Another option is to make compilation fail when certain constants don't exist. This currently happens on SmartOS - constants disappear, and |
@cjihrig thanks, it would be a good step to have tests that will tell when we have problem with the metadata. How often is has an update needed ? If it is for most V8 versions then we probably want to try to have the update process integrated into the steps used to update V8 in core. |
Practically every major V8 upgrade and infrequently after minor or patch updates. Paradoxically, before the V8 team started maintaining the postmortem script we knew about most breaking changes because the build would break after an upgrade. :-)
llnode is complex enough that I wouldn't want to inflict that on someone upgrading V8. |
One might argue that by updating V8 without doing the work to ensure the post mortem metadata is correct, you're actually inflicting broken software changes on users. Ensuring the post mortem bits still function is part of the real cost of updating V8, and the process for updating V8 should reflect that. |
Easy to say when you're not doing the work. If Joyent has an interest in postmortem work, they should at least try a little harder to keep the upstream smartos build in working shape. |
This isn't really a Joyent-specific issue at all. We tend to use |
I was specifically referring to nodejs/node-v8#21, which is smartos-specific and was ultimately fixed by people with zero interest in that platform. I'll make you a deal: your company makes upgrades frictionless on the smartos front and we'll look into keeping the postmortem metadata in sync across upgrades. That's in your best interest too, right? |
I'm not really sure what motivates your apparent hostility, but nonetheless would like to note that @cjihrig works for Joyent. You'll see him earlier in this thread, where he talks about working on the metadata update. He also seems to have engaged on, and fixed, the other issue you're referencing. |
/cc @nodejs/v8 who can maybe glean some insight from the V8 side @cjihrig is updating the metadata something that could be automated? Currently for the majority of V8 updates we are using https://github.com/targos/update-v8 right now.
While I can see your point here I'm not 100% that I agree. I would like to challenge the post-mortem working group to come up with a scalable solution to this problem. We have put quite a bit of effort into keeping V8 up to date. As one of the people doing a bunch of the work I wasn't even aware that this was a problem. While I recognize that most-mortem analysis is an important feature, I don't think that we can possibly block upgrading the VM when the team working on this doesn't even know how to fix the problem. For example nodejs/node#16271 has been open for over a month and potentially broken meta data has not come up once. Without individuals from postmortem actively involved in the V8 upgrade process I don't think it is entirely fair to put to onus on those who are not actively invested in it. All that being said, I'm totally willing to chip in on the work if we can find a reasonable way to get it done |
I didn't know Colin worked at Joyent. That makes my irritation at smartos being broken with every upgrade a little less severe although I'd still like it a lot better if you proactively kept it in working shape, like IBM does for its platforms. I was going to circle back to what @jclulow wrote earlier:
But at this point I'd just be repeating what @MylesBorins said: it should be up to the stakeholders to make it work. That includes me, by the way, and could be you, @jclulow. |
No, I don't think so. It would be a lot better if V8 had some sort of test that alerted when one or more of the symbols disappeared. We don't find out that it has happened until @targos updates V8 in node-v8 and the compile on SmartOS fails. Even then, that will only catch things that are used by mdb_v8. llnode uses a mostly (but not completely) overlapping set of metadata, but gives no type of error notification until someone is trying to use it to debug a core file. |
The best solution would be for V8 to grow first-class postmortem analysis tools (i.e., maintained with V8). Obviously, that's not a small ask. But until that happens, debugging tools like mdb_v8 and llnode have to re-implement a bunch of the VM internals. They encode VM implementation details that change from version to version. (If you want to know more about this, I wrote about what's involved in this.) The metadata in V8 was created to parametrize the parts of that that are likely to change frequently and easy to parametrize. But there's no way to know that the metadata is complete and correct except to actually exercise postmortem tooling against a Node version built with that metadata. When the metadata is incorrect or incomplete, there's no way to automatically fix it -- you have to understand the semantics of what changed in V8, add metadata to describe that, and potentially update the debugging tool to use it. You don't really know you've done it correctly until you've done all of that. Trying to automate this process is like trying to automate changes to a program that depends on a library whose interface is constantly changing in massive, complex ways -- because that's essentially what's going on. One fundamental question is: is postmortem debugging (and the metadata that's used to support it) a first-class, supported part of the Node platform? If it is, then it should be tested regularly like any other supported feature. mdb_v8 has an automated test suite that could be run with each new Node version. (It doesn't have to be mdb_v8's test suite -- I just know that one well. It can be run for an arbitrary Node version, and if that passes, you have decent coverage on mdb_v8's postmortem facilities for the new Node version. Maybe the better option would be to incorporate all the major postmortem tools' test suites, since they have different sets of features that require different metadata.) In practice, postmortem support has historically been more of a second-class feature: it's best-effort, and tools like mdb_v8 have to deal with metadata that's missing. This is a pretty major pain, but on the other hand, most of us who work on mdb_v8 can't spend a ton of time dealing with every V8 update. As a result, mdb_v8 tends to be somewhat behind in support for latest Node versions, but that's okay for most of its users. |
Perhaps @nodejs/build would be interested in making a CI job with the mdb_node teat suite (or llnode) At the very least we can use that to find problems. The amount of work to keep things up to date sounds non trivial, and in unsure how to reconcile that with our current work to improve the speed of V8 updates That being said I would like to help improve the process... It seems like at the very least we can remove the element of suprise |
@MylesBorins We have a llnode CI on Jenkins https://ci.nodejs.org/view/post-mortem/job/llnode-pipeline/ , although it only works with Node.js releases at the moment. We can make it work with github refs as well so we could know in advance if an V8 update is going to break llnode. (It doesn't have to block the V8 update, but we would get the notice in advance and fix sooner) The llnode Travis CI is broken at the moment but should be good after nodejs/llnode#144 lands. Once Travis is green it should not be hard to get Jenkins working again. |
My interest is that the more a runtime is used in production the more important post-mortem tooling is and I think it is important that we move it from "second-class" to "first-class". We have at least one end user (@yunong) with a large deployment indicating that this not working will delay adoption of 8.x in their environment. As with everything the challenge is how to get any additional work done on a regular basis, but as suggested a good first step is being alerted so that we know when its broken, spreading the knowledge of what work is required and what it takes to do it and then looking to see if we have an creative ways to get it done. We had the same challenge with updates with IDDE. There were 2 parts. The first was updating the meta data. We managed to automate the generation of that. At the time it used some proprietary tools so it could not be open sourced but I believe the part that generates the meta data now has removed that part of the dependency (and possible open source in OMR) so it might be worth another look (although I've not had enough time to look at the Node.js metadata generation to know if that makes sense). The second part was the code that uses the metadata to implement the commands and those still needed to be maintained by hand. |
On nodejs/node#14901 I proposed documenting what the postmortem metadata provided by V8 and Node are, and what it can be used for. This document could be improved with information regarding what needs to be done to keep the metadata updated. Regarding how to test for missing or altered metadata after V8 is updated without relying on llnode or mdb_v8 is creating a script to compare the metadata available before and after v8 is updated. This approach would help us identify what constants were added, changed and (especially) deleted. I can write a prototype for that to see if it works. |
While I don't think V8 will come up with a postmortem tool on-par with llnode anytime soon, we are willing to at least help find issues early. However, it's not entirely clear to me what needs to happen. V8 does have a build target called I would welcome upstream contributions where a new build target is added. That would force the V8 team to make changes to But it would be a step into the right direction? |
Yang, the node.js integration buildbot already takes care of that. I haven't seen a release that outright failed to build since it went online (apart from smartos - but that's joyent's job.) The difficulty is in updating downstream consumers (e.g., llnode) but that's ultimately manual labor; @davepacheco summarized it well. That's also why...
...doesn't really help. The best step forward is arguably @joyeecheung's suggestion of extending the llnode CI. |
BTW I opened nodejs/llnode#151 which can print missing metadata required by the llnode test suites. See that OP about what is currently missing in Node v9.2.0 on MacOS (and a bit of Node v8.9.0 Linux because I use that coredump to speed up local test runs). The output there is not exhaustive because there is too much noise from invalid casts and pointer loading and I turned those off in the debug output for now. |
It should be compiled and linked. Otherwise, the metadata wouldn't wind up in the final binaries at all. Or am I misremembering how this works? |
There's a flag to compile and link the metadata. V8 has this flag set to if (v8_postmortem_support) {
sources += [ "$target_gen_dir/debug-support.cc" ]
deps += [ ":postmortem-metadata" ]
} |
V8 would not need to come up with a complete debugger (although that would be ideal). Another option would be to provide a library or tool with a stable interface that was capable interpreting basic structures and spitting out a description, maybe using the proposed format: A library could even be phrased in terms of operations like "lookup the address of this native symbol" and "read this chunk of memory". Consumers would then implement these operations. That would allow us to plug it into various existing debuggers for core files, live processes, and even existing heap dumps; and the V8 team wouldn't need to maintain that code. That's still a bunch of work, but it's much less than what's required to build a full debugger. I'm really glad that the postmortem metadata has taken off and we've been able to build multiple useful debugging tools with it -- and all of these suggestions are great! But it's always been something of a hack. It's somewhat amazing that it's worked as well as it has. Without real support from the VM, this is always going to be brittle. |
It looks like it is used if
to
there is no indication that anything is wrong. That means that if the name changes in the V8 codebase, but not in that script, we end up with broken metadata. |
@davepacheco FWIW we actually have a tool that we use to debug crash dumps that come as part of Chrome's crash report. See The crash dumps that we receive are usually not much more than 1MB in size and usually contains the raw stack and fractions of the heap. The tool shows that, annotates parts of the stack, and there is a way to tell it to assume that a certain address represents a V8 heap object. We can then deduce other known objects from that. The way the deduction works is that we keep an updated Every heap object's first field points to its map object. Every map object's first field points to the map map, or meta map. Since we know the fixed offset of the meta map, we can deduce the addresses of all other immortal immovable objects, including other maps. That way we can easily tell whether something on the heap represents a string, a shared function info, or a context, etc. However, this tool is not really polished, and deals with crash dumps that are very incomplete. |
@hashseed That's fantastic information! We could potentially use some of those same fixed addresses to bootstrap other debugging tools. Thanks for that information. |
@yunong still hoping we can get your input on a reproduce for problems. |
This is specific to mdb_v8. After I updated llnode, I started work on updating mdb_v8. There are a few implementation details (not related to the metadata) in mdb_v8 that are different from llnode, and haven't had the time to revisit. I have a WIP branch here. Most of the tests were passing with TurboFan the last time I checked. |
PR-URL: nodejs#151 Refs: nodejs/post-mortem#50 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
PR-URL: #151 Refs: nodejs/post-mortem#50 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@cjihrig do you think this needs to stay open or should we just close. |
I think we can close this. I think your original list of issues has been addressed. Of course, conversation can continue. |
In discussions at NodeConf EU I learned that we don't have the meta data needed for llnode in Node.js version 8.
It sounds like the change to the turbofan pipeline in V8 meant that there were so many changes that those who usually maintained the meta data were not able to keep up.
I think we need to
Identify who (sadly I don't know off the top of my head who was doing this valuable work recently) had been doing the work in order to get some context on how to do it efficiently.
Put together a plan and volunteers to help get the data up to date for Version 8
Identify volunteers to keep the data up to date
Put testing in place/notifications so that we have CI failures when the data is not up to date. We may want some sort of step that validates this as part of the release process.
@yunong who mentioned this to me
@MylesBorins who asked if we had a in issue covering this.
The text was updated successfully, but these errors were encountered: