[WIP] Stream based token parser #225
Conversation
The unit tests are failing right now; I'll get that fixed up ASAP. But the integration tests are all passing fine.
The latest performance numbers:
This requires some patches to the …
@arthurschreiber are you interested in becoming a maintainer on Tedious? Quite frankly, we could use a couple of new maintainers who are smart and actively using Tedious. @rossipedia and I were making significant contributions for a while, but we ended up porting the project we were using it for away from node.js, so our time and motivation to maintain Tedious has dropped off quite a bit. @patriksimek is still around here and there, but no one has the time to give it the attention it deserves. You've started tackling streams, performance, and testing concerns, which have all been on our wish list for a very long time. If your intentions are to continue contributing improvements like this, then I don't think anyone would have a problem adding you as a collaborator.
@bretcope Sure, why not. 😄 I also got added as a maintainer for the … With some more changes that I made to the …
So, for fetching a 5MB varbinary column out of the database, this is around 6 times faster, while for a 50MB varbinary, the streaming token parser is more than 67 times faster. Next steps will be to land all the performance improvements I made in the …
Any further work being done on this? This seems like great work.
Yeah, I've not forgotten about this. I'm also thinking about rewriting the part of tedious that serializes data back to the stream to make use of …

What currently bugs me a bit is that both the old and new parsers are (subjectively) super fugly. Does anyone have any idea on how we could better organize the code?
Does this mean that the horrible, nasty, ugly try/catch in token-stream-parser will go? That would be awesome. It's always bugged me. (It would also mean that the "Tokens that span Packets" page could go too.)
@pekim Yup, that'll be completely gone. That (and the ensuing reparsing of tokens) is one of the reasons memory usage and performance really go down the drain.
@arthurschreiber imho you're trading one fugly solution for another fugly solution that improves performance/memory consumption, so I'd just merge the code in and then worry about making it look pretty 😄. If you can get it passing all tests, I'd say it's prime for merging (and the …
I'll be trying to bang this up into a more acceptable state over the course of next week and to make this available for others to take on a test ride. Sorry for the delay, I know many people are really looking forward to this.
Hey all. Just wanted to let you know that I'm still working on this. I made some performance and memory improvements to the … I'll try to get at it this weekend.
This method is only used by the tests and probably should be removed in the long term.
Okay, I just rebased all the changes here against the latest … Here's the latest benchmark result:
These results are from running on Node.js 0.10.38. Overall, these are incredible performance and memory improvements over what tedious offered before, when reading large results back from the database. And there's no noticeable impact when reading small results. The next step would be to also make the sending side stream-based, but I want this to happen in a separate PR. @bretcope @pekim @patriksimek What do you guys think?
Just out of curiosity... are these changes likely to make it easier to …
@lee-houghton In theory, this might be possible. I just really don't know what the API would have to look like to support this. I'm also not familiar with ODBC.
I can't see a reason not to merge this if you're passing all the tests and you are seeing that kind of increase, but I might be missing something. So, LGTM 👍 from me!
Sounds good to me.
@arthurschreiber this is great, thanks! Please take a look at the mssql test suite, there are a few tests that don't pass. There's a quick guide on how to set up the test config. When I was trying to debug the protocol to make sure it's not a mssql issue, I found that … Please let me know if I can be helpful.
@patriksimek Thanks for the heads up. Looks like the …
@patriksimek I pushed some additional changes that should fix the …
@arthurschreiber thanks, fixed that. I had that enabled in my config by mistake. There are still some issues with local datetime. Not sure if you can see them, because it probably depends on your current time zone. I'll try to investigate it.
I have no idea how to patch this PR, so I made a gist with a changed value-parser.coffee. With that, all tests are passing.
@patriksimek Thanks. I'll try to incorporate that change ASAP.
Closed in favor of #285. |
This replaces the token parser inside tedious with a stream-based version built on the dissolve module.
This is pretty much a big ugly hack at the moment, but it passes all tedious unit and integration tests. (Also, the application I'm working on passes all its tests with the new parser.) Beware, here be dragons.
The idea behind these changes is to make the token parsing code more performant. Right now, if a token is split over multiple TDS messages, parsing is retried from scratch each time a new message is received. Obviously, this results in incredibly bad performance.
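To make the problem concrete, here's a hedged sketch of that catch-and-retry pattern (names like `parseTokens`, `onPacketData`, and `emitTokens` are made up for illustration; this is not the actual tedious source):

```coffee
# Illustrative only: a simplified version of the retry-from-scratch approach.
class IncompleteDataError extends Error

parseTokens = (buffer) ->
  # Walk `buffer` and return the parsed tokens. Throw IncompleteDataError
  # when a token runs past the end of the available bytes.
  # (Real parsing logic is omitted from this sketch.)
  []

buffered = new Buffer(0)

onPacketData = (data) ->
  buffered = Buffer.concat([buffered, data])
  try
    tokens = parseTokens(buffered)   # always restarts from byte 0
    buffered = new Buffer(0)
    emitTokens(tokens)               # hypothetical downstream handler
  catch error
    throw error unless error instanceof IncompleteDataError
    # Incomplete token: keep everything buffered and re-parse it all
    # once the next packet arrives. Cost grows with the buffer size.
```

Every retry re-scans all previously buffered bytes, so a single large value spanning many packets makes the total parsing work grow quadratically with the number of packets.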
The `dissolve` module gives us a way to do chunk-based parsing of tokens, without having to throw away parsing results even when tokens span multiple packets.

Here are some performance numbers (see `benchmarks/benchmarks.coffee`):

Old Parser

New Parser

As you can see, the switch to `dissolve` hugely improves performance when receiving data that spans multiple TDS messages, while not impacting the performance of data contained inside a single message.

There is a slight increase in the memory usage reported in the benchmarks. I'm not really sure what is causing this, but I'll take a closer look.
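For a sense of what the dissolve-based mechanics look like, here's a minimal sketch using a made-up token format (1-byte type, 2-byte little-endian length, then payload). It is not the real TDS token grammar or the code in this PR, and `handleToken`, `firstPayload`, and `secondPayload` are hypothetical placeholders:

```coffee
Dissolve = require('dissolve')

# Hypothetical token layout: type (1 byte), length (2 bytes LE), payload.
parser = Dissolve().loop ->
  @uint8('tokenType')
  @uint16le('length')
  @tap ->
    @buffer('data', @vars.length)
  @tap ->
    @push(@vars)
    @vars = {}

parser.on 'readable', ->
  while token = parser.read()
    handleToken(token)  # hypothetical consumer

# Packet payloads are written as they arrive. If a token straddles two
# payloads, the parser keeps its partial state and simply resumes when
# more bytes show up; nothing is thrown away or re-parsed.
parser.write(firstPayload)
parser.write(secondPayload)
```

The parser's position in the token grammar lives inside the dissolve state machine rather than in exceptions, which is what eliminates the re-parsing at message boundaries.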
Overall, this is definitely not ready yet, but is meant to show off some of the work I've been doing and to gather some feedback on this.