Fabian Schmied edited this page Sep 22, 2009 · 1 revision

Published on September 22nd, 2009 at 8:39

More than a couple of visitors

In a comment to a previous posting, reader Ray wrote the following:

[…] Because starting with nothing but visitors puts you more or less where you would be if you weren't using re-linq to begin with, defeating much of the whole purpose. Just trying to grasp how you would get beyond the Frans Bouma 'toy' scenario to full on sql-generating linq provider, that's all. […]

Now, let me state this: with re-linq’s QueryModel, you don't start with nothing but visitors.

When you write a LINQ provider, you have to deal with two kinds of expression trees: the outer expression tree, which holds a chain of MethodCallExpressions that correspond to the query methods you've called (or C# query clauses you've written), and lots of inner expression trees which are held by the MethodCallExpressions and correspond to the expressions you’ve passed to those query methods (or clauses).
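
To make the two kinds of trees concrete, here is a minimal C# sketch that uses only System.Linq.Expressions and no re-linq at all. The helper name GetQueryMethodNames is made up for this illustration, and the in-memory array merely stands in for a real query source. It walks the outer chain of MethodCallExpressions and then pulls one inner expression out of the last call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class OuterTreeDemo
{
    // Hypothetical helper: walks the outer chain of MethodCallExpressions from the
    // last call back to the source and returns the query method names in call order.
    public static List<string> GetQueryMethodNames(Expression expression)
    {
        var names = new List<string>();
        while (expression is MethodCallExpression call)
        {
            names.Insert(0, call.Method.Name);
            // For Queryable extension methods, argument 0 is the preceding part of the query.
            expression = call.Arguments[0];
        }
        return names;
    }

    public static void Main()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();
        var query = from x in source
                    where x > 1
                    orderby x
                    select x * 2;

        // Outer tree: one MethodCallExpression per query clause.
        Console.WriteLine(string.Join(" -> ", GetQueryMethodNames(query.Expression)));

        // Inner tree: the lambda passed to Select, wrapped in a Quote node.
        var selectCall = (MethodCallExpression)query.Expression;
        var selector = (LambdaExpression)((UnaryExpression)selectCall.Arguments[1]).Operand;
        Console.WriteLine(selector.Body.NodeType); // the node type of "x * 2"
    }
}
```

Even in this tiny query, the provider has to separate the chain structure (Where, OrderBy, Select) from the three lambdas hanging off it before it can translate anything.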

When you write a custom LINQ provider, you therefore have to interpret the outer expression tree structure, identify each query method being called, and build the structure of your target query. In addition, you have to analyze the inner expressions and translate those to your target query language. That sounds simple, but it isn’t:

  • In the outer expression tree, calls to the same query methods can mean different things. For example, Select is used both for select and for let constructs.
  • LINQ expressions can get very complex: (nearly) any query method can follow any other query method. In your target query language, that’s usually not the case. You will need to transform the queries so that they match your target query constraints. You’ll also want to transform queries in order to optimize them for your target query system.
  • Query methods can be user-defined, so the LINQ provider needs to be extensible enough to add new query methods when they are needed.
  • LINQ uses an interesting method of flowing data through the expression tree using so-called transparent identifiers. Because of that data flow mechanism, identifying what data the inner expression trees actually access is not trivial at all.
  • LINQ queries can be nested, so an inner expression tree might contain a new outer expression tree corresponding to a subquery. That outer expression tree looks similar to, but is not quite the same as, the “real” outer expression tree. The inner expression trees of the subquery can access data stemming from the outer expression trees.
  • The inner expression trees might reference closure members from the context surrounding the code building the query. You need to partially evaluate those in order to interpret their values. Partial evaluation is tricky, though – one should evaluate as much as possible, but be careful not to evaluate too much.
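
The points about Select doubling as let and about transparent identifiers can be seen in a small experiment. The following is only a sketch using plain System.Linq.Expressions; the class and method names are invented for the illustration, and the exact compiler-generated parameter name may differ between compilers.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class TransparentIdentifierDemo
{
    public static IQueryable<int> BuildQuery(IQueryable<int> source)
    {
        // 'let' has no query method of its own: the compiler emits a Select that packs
        // x and y into an anonymous type, the so-called transparent identifier.
        return from x in source
               let y = x * 2
               where y > 2
               select x + y;
    }

    public static void Main()
    {
        var query = BuildQuery(new[] { 1, 2, 3 }.AsQueryable());

        // The outer tree is Select(Where(Select(source, ...), ...), ...).
        var lastSelect = (MethodCallExpression)query.Expression;
        var where = (MethodCallExpression)lastSelect.Arguments[0];
        var letSelect = (MethodCallExpression)where.Arguments[0];

        // The Select that stands for 'let' creates the anonymous carrier object.
        var letLambda = (LambdaExpression)((UnaryExpression)letSelect.Arguments[1]).Operand;
        Console.WriteLine(letLambda.Body.NodeType);

        // The where predicate no longer sees y directly; it reaches it through a
        // member access on a parameter with a compiler-generated name.
        var predicate = (LambdaExpression)((UnaryExpression)where.Arguments[1]).Operand;
        Console.WriteLine(predicate.Body);
    }
}
```

With more from and let clauses, these carrier objects nest inside each other, which is why working out what an inner expression actually refers to gets tedious when done by hand.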

re-linq helps you with all of those things:

  • It identifies what a query method means and constructs Clause instances held by a QueryModel which express the semantics of the query (rather than the syntax of its construction, which is what the expression tree holds).
  • The QueryModel is easily transformable; you can move clauses around and change them as you want. That’s not nearly as simple (or efficient) when tried on raw expression trees.
  • re-linq’s architecture makes it easy to parse any new query method (or property) one might conceive.
  • re-linq analyzes the data flows throughout the query model; every reference to query source data links back to the clause that produces the data. That’s a big help: dealing with a QuerySourceReference that links back to a from clause is much easier than manually resolving a nested MemberExpression representing transparent identifiers. I’ve explained this part of re-linq in a previous post.
  • re-linq automatically detects sub-queries and parses them into dedicated QueryModels. References to outer data are correctly translated into QuerySourceReferences.
  • Partial evaluation is also handled by re-linq.
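
To give an idea of what partial evaluation involves, here is a deliberately naive, hand-rolled sketch. It is not re-linq’s implementation; the ClosureEvaluator class is hypothetical, it assumes the public ExpressionVisitor base class from .NET 4 or later, and it only handles the simplest case of a captured variable.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical visitor: replaces accesses to captured variables (members read from a
// constant closure object) with constants holding their current values.
public sealed class ClosureEvaluator : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression is ConstantExpression)
        {
            // Safe to evaluate: this subtree does not depend on any query parameter.
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        // Anything else (for example x.Name on a query parameter) is left untouched.
        return base.VisitMember(node);
    }
}

public static class PartialEvaluationDemo
{
    public static Expression<Func<int, bool>> BuildPredicate(int limit)
    {
        // 'limit' is captured, so the tree contains a member access on a closure object.
        return x => x > limit;
    }

    public static void Main()
    {
        var predicate = BuildPredicate(5);
        Console.WriteLine(predicate.Body);   // shows the closure access

        var evaluated = new ClosureEvaluator().Visit(predicate.Body);
        Console.WriteLine(evaluated);        // the closure access is now a constant
    }
}
```

A real evaluator also has to cope with method calls, nested member chains and subtrees that mix constants with query parameters, which is where the "not too much" part becomes tricky.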

So, that’s why you don’t start with nothing but visitors with re-linq. You get a semantically sound QueryModel, easy transformability, an extensible parser architecture, simplified inner expressions with back-references to the data sources, subquery detection, partial evaluation, and so on.

Yes, you still have to visit expression trees, but those have already been tremendously simplified at that point. Even if we parsed those for you, we’d still have to generate some kind of generalized tree structure that you’d have to traverse in order to generate your target query. So we give you an expression tree anyway, but with all the stumbling blocks removed.
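
As a rough picture of the visiting that remains, here is a small sketch of a visitor that turns a simple inner expression into SQL-like text. Everything in it is an assumption made for the example: the WhereTextVisitor and Person types are invented, property names are assumed to map one-to-one to column names, and it runs on a plain lambda rather than on re-linq’s simplified expressions.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

// Hypothetical visitor that renders a simple predicate as SQL-like text.
public sealed class WhereTextVisitor : ExpressionVisitor
{
    private readonly StringBuilder _text = new StringBuilder();

    public static string Translate(Expression expression)
    {
        var visitor = new WhereTextVisitor();
        visitor.Visit(expression);
        return visitor._text.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _text.Append("(");
        Visit(node.Left);
        _text.Append(" ").Append(ToOperator(node.NodeType)).Append(" ");
        Visit(node.Right);
        _text.Append(")");
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Assumption for this sketch: the property name is the column name.
        _text.Append(node.Member.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // Inlined literals keep the sketch short; a real backend would emit parameters.
        _text.Append(node.Value is string s
            ? "'" + s.Replace("'", "''") + "'"
            : Convert.ToString(node.Value, CultureInfo.InvariantCulture));
        return node;
    }

    private static string ToOperator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.AndAlso: return "AND";
            case ExpressionType.OrElse: return "OR";
            case ExpressionType.Equal: return "=";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.LessThan: return "<";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class VisitorDemo
{
    public static void Main()
    {
        Expression<Func<Person, bool>> predicate = p => p.Age > 18 && p.Name == "Ray";
        Console.WriteLine(WhereTextVisitor.Translate(predicate.Body));
        // prints: ((Age > 18) AND (Name = 'Ray'))
    }
}
```

A visitor of this size only stays this small if transparent identifiers, closures and subqueries have already been dealt with before it runs.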

In addition, re-linq is a library that is maintained by us, the contributors. This means: if you find a bug, we’ll fix it. If somebody adds a new feature, all users can profit from it. That’s the difference from rolling your own. Make use of all the time we’ve already invested!

- Fabian

Comments

Stefan Wenig - September 22nd, 2009 at 09:26

I’d like to add the following:

Ray’s question seems to come from a more general misunderstanding. re-linq, at its core, is not so much about generating SQL. It can do that, but that’s not the thing we’re most excited about. re-linq is about removing much of the hard work that is necessary when you generate queries in any other query language, so it is target language agnostic.

Just to give you an impression: we provide a (somewhat outdated) SQL backend; there’s the HQL sample in “re-linq|ishing the Pain: Using re-linq to Implement a Powerful LINQ Provider on the Example of NHibernate” (which generates HQL query strings for NHibernate); Steve Strong is building a production HQL backend in the NH trunk using the new HQL ASTs; and it’s quite easy to imagine XQuery or Entity SQL backends too. Anything, really.

With such a generic scope, there’s really not much more we can do. Fabian has some ideas for additional features, but basically that’s it. You have to create a mental mapping between LINQ and your target language, and each language is different. re-linq just makes dealing with the LINQ side of the equation much easier. We believe that’s a huge part of any LINQ provider though, so in many cases re-linq would actually reduce the development time to a fraction.

(That said, there are transformations that might be useful for various backends, so re-linq provides a good platform to share that code!)

If you just want to generate SQL, take a look at the SQL backend. It basically gives you entry points to define SQL dialects and the mapping of classes/properties to tables/columns. (In an early prototype, we were even able to generate HQL strings via the SQL backend, treating HQL as a SQL dialect.)

If you want to help improve the old SQL backend to use the new capabilities of re-linq, just contact us.