
Document @generated functions #10673

Merged: 7 commits, Apr 22, 2015


tomasaschan
Member

stagedfunctions have been around for a while now, and we've successfully used them in Interpolations.jl, but they are sadly without documentation. I know @timholy wants to help with it, but he has more important stuff on his plate, so I figured I'd help out. I don't claim to be an expert in the inner workings of this black magic in any way, but as an "outsider" I might even be more suitable to write these docs than someone who knows all the gory details, since I can't go into them ;) However, this also means that these docs need to be extra carefully fact-checked; I've tested all the examples locally, but I can't guarantee that I've understood why they work the way they do.

This is still just WIP - as indicated toward the end, I want to add another section with a more advanced example that demonstrates stagedfunctions doing something in a way that's more convenient than doing it with macros and/or regular functions, but I couldn't come up with a good case that didn't require a lot of background. Any suggestions are welcome.

Also, I couldn't figure out how to run the doc tests locally. I know I knew how to do this once, but I've forgotten and couldn't find the info anywhere I looked (README.md in both project root and under doc, CONTRIBUTING.md and at docs.julialang.org). If it's not just me being illiterate, maybe that too needs to be documented...

@tomasaschan
Member Author

Also, I'm really hesitant to add this at the end of an already huge section in the manual, but I didn't know where else to put it. Any suggestions here are of course also welcome!

@timholy
Member

timholy commented Mar 29, 2015

Very nice! And thanks very much.

more advanced magic than just regular functions; enter *staged functions*.
Staged functions have the capability to generate specialized code depending
on the *types* of the arguments you give them, so that you can optimize or
generalize your code in ways that aren't possible with ordinary functions.
Member

A couple of points:

  • This description makes it sound as if stagedfunctions are a little less powerful than macros, but mostly they are just different (and there are a good number of things you can do with stagedfunctions that you can't do with macros). The key distinctions between macros and stagedfunctions are:
    • macros only work with expressions, and so don't know the types of the inputs
    • macros work at parsing-time, whereas stagedfunctions get expanded on-demand, possibly differently for each set of types
  • One way I like to conceptualize stagedfunctions is that they provide a flexible framework to move work from runtime to compile-time. I'll see if I can come up with a short example below.

Contributor

I think everything is there now:

Stagedfunctions play a similar role as macros, but at a later stage between parsing and run-time. Staged functions give the capability to generate specialized code depending on the types of their arguments. Macros work with expressions at parsing-time and cannot access the types of their inputs. In contrast, a stagedfunction gets expanded at a time when the type of the arguments is known, but the function is not yet compiled.
Depending on the types of the input, a staged function returns a quoted expression which then forms the function body of the specialized function. Staged functions provide a flexible framework to move work from runtime to compile-time.

@mbauman
Member

mbauman commented Mar 29, 2015

This is a great start! Some random thoughts:

  • I definitely agree it belongs in the Meta-programming section. If it starts to feel unwieldy, we could maybe split it apart later (maybe between reflection and generation?)
  • Since it immediately follows Macros, I think it'd be good to highlight their differences more. Ah, I see Tim just made the same comment as I was typing this. I'd also add a note about why hygiene isn't a concern.
  • One of the trickiest things for me as I was learning how to use stagedfunctions was understanding how the arguments represent either the passed value or its type depending upon the context. Perhaps it'd be good to have an example that simply returns both the value and type, e.g., :(x, $x), as a first step.
  • It'd be good to assert that the code generation must be deterministic and type-stable to get well-defined behavior.
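
The `:(x, $x)` idea above can be sketched as follows. This is a minimal sketch, assuming the `@generated` syntax that this PR was later renamed to (current Julia) rather than the `stagedfunction` keyword used elsewhere in the thread; the name `value_and_type` is made up for illustration.

```julia
# Inside the generator body, `x` is bound to the *type* of the argument.
# Inside the returned quoted expression, a plain `x` refers to the runtime
# *value*, while `$x` splices in the type that the generator saw.
@generated function value_and_type(x)
    return :(x, $x)
end

value_and_type(1)       # tuple of the value 1 and its type, Int
value_and_type("boo!")  # tuple of the string and its type, String
```

Because the returned expression is a tuple of the value and the spliced-in type, one call shows both roles of `x` side by side without needing a `println` in the body.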

@timholy
Member

timholy commented Mar 29, 2015

Perhaps sub2ind is a good all-round example? One could provide 3 implementations: the current loop-based one, one based on stagedfunctions (which would illustrate building up a single expression), and one based on recursion (#10337). This might illustrate the ideas, as well as point out that there are sometimes good alternatives to using stagedfunctions (which might make @JeffBezanson happy).

An earlier version of Jutho's PR was based on stagedfunctions; does anyone know how to find a commit prior to a force-push?

@tomasaschan
Member Author

Thanks for the comments, @timholy and @mbauman!

@timholy: Yes, I agree that first paragraph isn't ideal. It was the first thing I wrote, mainly just to get started, so I'll be happy to try to re-word it. sub2ind seems like a good example function, especially since there are several other implementations to compare with. I couldn't find a stagedfunction implementation among the ones you linked to - do you know if there is one lying around somewhere, or should I invent my own?

@mbauman: Yes, the distinction between when the argument x is a type and when it's a value was the hardest bit to grasp for me as well. I tried to illustrate how this works by means of a println(x) statement in the body, before returning an expression, to avoid adding interpolation to the mix. Did you think this wasn't clear enough?
Also, deterministic code generation is a good thing to mention. An example like

stagedfunction foo(x)
    if rand() < .5
        return :(x)
    else
        return :("boo!")
    end
end

could be used both to illustrate that the code in the body is only actually run once (we get the same result every time we execute this function, but we don't know until we've done it once if it's going to be x or "boo!"...) and the returned expression is re-used for the same type after that.

Edit: sorry, managed to tag @mauro instead of @mbauman. Leaving this note here so you don't get confused over why you got a notification :)

compiled. After that, the expression returned from the ``stagedfunction`` on the
first invocation is re-used as the method body.

We can utilize this to do slightly weirder things:
Contributor

Maybe instead something like "The example staged function foo above did not do anything a normal function foo(x)=x*x could not do, except printing the type on the first invocation. However, the power of a staged function lies in its ability to compute different quoted expressions depending on the types passed to it:"
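
A hedged illustration of "different quoted expressions depending on the types": the sketch below assumes the later `@generated` syntax, and the function name `square_or_same` is hypothetical. The branch is taken once per argument type, when the method body is generated, not on every call.

```julia
# The `if` runs in the generator, where `x` is a type. Each concrete argument
# type therefore gets a method body containing only one of the two expressions.
@generated function square_or_same(x)
    if x <: Integer
        return :(x^2)   # body generated for integer argument types
    else
        return :(x)     # body generated for every other argument type
    end
end

square_or_same(4)       # integers are squared
square_or_same("baz")   # anything else is returned unchanged
```

An ordinary function with two methods could do the same here; the point is only to show where the type-dependent decision is made.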

@timholy
Member

timholy commented Mar 30, 2015

do you know if there is one lying around somewhere, or should I invent my own?

That PR originally used stagedfunctions, so somewhere there should be an old commit that is no longer on a named branch. But I don't know how you find it, and I suspect that (at least for me) it would take longer to figure that out than to rewrite it.

@ViralBShah ViralBShah added the docs This change adds or pertains to documentation label Mar 31, 2015
@ViralBShah ViralBShah added this to the 0.4.0 milestone Mar 31, 2015
@tomasaschan
Member Author

There, I clearly killed the build with whitespace errors.

Is it documented somewhere what checks are run on documentation edits, and how to run them locally?

@mauro3
Contributor

mauro3 commented Mar 31, 2015

cd doc
make html

should make a doc/_build/html/index.html which you can open in the browser. You need sphinx installed. Although, afaik whitespace is not checked for, I think that is a Travis thing.

@mauro3
Contributor

mauro3 commented Mar 31, 2015

@tomasaschan
Member Author

@mauro3, I did that and it worked without error. I was hoping there was somewhere I could read more about all the checks run by Travis, to avoid pushing stuff that won't build...

@mauro3
Contributor

mauro3 commented Apr 1, 2015

Yes, sorry, I should read the question! It looks like Travis choked on

make check-whitespace

which you can run in the top-level directory of your Julia checkout. Does that give you errors? I think that is the only extra test run, apart from the Julia unit tests. At least that is how I interpret: https://github.com/JuliaLang/julia/blob/master/.travis.yml

@tomasaschan tomasaschan force-pushed the doc-stagedfunctions branch from 7618581 to cd19d47 Compare April 6, 2015 11:10
@tomasaschan
Member Author

@timholy: I find it difficult to wrap my head around how to implement sub2ind as a stagedfunction. Any hints (or even a full implementation) would be most welcome - I can wrap it in explaining text, but I don't understand the algorithmic idea behind implementing it.


(Slightly OT below...)

I did, however, find a way to inspect all the "dangling" commits on my local git tree. I didn't have anything on there from the previous version of the PR (no surprise there) but I figured someone else might. This is what I did (in bash):

# find all the commit hashes of dangling commits
git fsck --lost-found | awk '{ print $3 }' > hashes.txt
# loop through them and open each in a browser window
for ((i=1; i<=$(wc -l hashes.txt | cut -f 1 -d ' '); i++)); do echo $i; chromium-browser "https://github.com/julialang/julia/commits/`head -$i hashes.txt | tail -1`"; done

On my machine this opened 52 new tabs, which took a while but by no means was a problem for my laptop. If you have many dangling commits, though, this might be too much; check how many you have with wc -l hashes.txt and hand-craft the loop limits if necessary to split it up in portions. The implementation from #10337 might be salvageable if we really want to find it :)

@timholy
Member

timholy commented Apr 6, 2015

Briefly it would look something like this (not tested):

stagedfunction sub2ind{N}(dims::NTuple{N}, indexes...)
    ex = :(indexes[$N]-1)
    for i = N-1:-1:1
        ex = :(indexes[$i]-1 + dims[$i]*$ex)
    end
    :($ex + 1)
end

As you can see, it's almost identical to a function version that uses loops:

function sub2ind{N}(dims::NTuple{N}, indexes...)
    ind = indexes[N]-1
    for i = N-1:-1:1
        ind = indexes[i]-1 + dims[i]*ind
    end
    ind + 1
end

but you don't actually create a runtime loop with the stagedfunction (check the final expression built by the stagedfunction).

That said, I wouldn't be shocked if LLVM might be able to do the same thing in this case.
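
For readers on a current Julia version, the untested sketch above might translate roughly as follows. This is an assumption-laden rewrite, not the code that went into the manual: it uses `@generated` with a `where {N}` clause, restricts the arguments to `Int` for simplicity, and the name `sub2ind_gen` is hypothetical.

```julia
# In the generator, `N` is a plain integer (the tuple length), so the loop
# below runs while the method body is being built and leaves behind a single
# nested arithmetic expression with no runtime loop.
@generated function sub2ind_gen(dims::NTuple{N,Int}, I::Int...) where {N}
    ex = :(I[$N] - 1)
    for i = (N - 1):-1:1
        ex = :(I[$i] - 1 + dims[$i] * $ex)
    end
    return :($ex + 1)
end

dims = (3, 4, 5)
# Cross-check against Base's linear indexing for the same subscripts.
sub2ind_gen(dims, 2, 3, 4) == LinearIndices(dims)[2, 3, 4]
```

The sketch does no bounds checking and assumes exactly `N` indices are passed; with fewer, the generated body would index past the end of `I` at run time.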

In short: don't do this.

While these examples are perhaps not so interesting, they have hopefully
elped illustrating how staged functions work, both in the definition end
Contributor

"...they have hopefully helped to illustrate how..."

Member Author

Thanks! I'll make sure to correct that.

@tomasaschan
Member Author

@timholy Fast and helpful as always :) I think it was the possibility to interpolate ex into the new ex in the loop body that I didn't think of. A few quick tests show me that it gives the same results as the other two versions, so this is definitely all I need to finalize this PR. Thanks a lot!

@tomasaschan
Member Author

There - I fixed some typos and grammar issues, and completed the example. If there are no outstanding issues with the text I feel "done" with this (for now - documentation is never finished...).

I'm happy to squash and/or rebase this if desirable.

@tomasaschan
Member Author

FWIW, I did a quick benchmark of the three approaches in the advanced example:

using Benchmark

function sub2ind_loop{N}(dims::NTuple{N}, I::Integer...)
    ind = I[N] - 1
    for i = N-1:-1:1
        ind = I[i]-1 + dims[i]*ind
    end
    ind + 1
end

sub2ind_rec(dims::()) = 1
sub2ind_rec(dims::(),i1::Integer, I::Integer...) =
    i1==1 ? sub2ind_rec(dims,I...) : throw(BoundsError())
sub2ind_rec(dims::(Integer,Integer...), i1::Integer) = i1
sub2ind_rec(dims::(Integer,Integer...), i1::Integer, I::Integer...) =
    i1 + dims[1]*(sub2ind_rec(Base.tail(dims),I...)-1)

stagedfunction sub2ind_staged{N}(dims::NTuple{N}, I::Integer...)
    ex = :(I[$N] - 1)
    for i = N-1:-1:1
        ex = :(I[$i] - 1 + dims[$i]*$ex)
    end
    :($ex + 1)
end

loop() = sub2ind_loop((101,235,1249,325,1992,123,59), 67, 129, 875, 125, 11, 89, 46)
rec() = sub2ind_rec((101,235,1249,325,1992,123,59), 67, 129, 875, 125, 11, 89, 46)
staged() = sub2ind_staged((101,235,1249,325,1992,123,59), 67, 129, 875, 125, 11, 89, 46)

#correctness and warmup
@assert loop() == rec() == staged()

#bench
compare(10_000, loop, rec, staged)

After a gzillion ambiguity warnings from Benchmark.jl, the results seem to indicate that the loop and the recursive approach are comparable in performance (running workspace(); include("bench.jl") a couple of times yields a different winner between the two between runs) and the staged function approach is slightly faster. I have no idea how much of this is because of inlining and other compiler magic, though, so the benchmark might be altogether rubbish...

@timholy
Member

timholy commented Apr 7, 2015

If you define benchmarks like this:

function run_loop(n)
    s = 0
    for j = 1:n
        for I in CartesianRange(CartesianIndex((5,5,5,5)))
            s += sub2ind_loop((5,5,5,5), I[1], I[2], I[3], I[4])
        end
    end
    s
end

then you'll see that sub2ind_loop is much slower than the other two (which are equivalent):

julia> @time run_loop(10^4)
elapsed time: 0.442542374 seconds (368 MB allocated, 4.93% gc time in 17 pauses with 0 full sweep)
1956250000

julia> @time run_rec(10^4)
elapsed time: 0.024556659 seconds (224 bytes allocated)
1956250000

julia> @time run_staged(10^4)
elapsed time: 0.02548805 seconds (224 bytes allocated)
1956250000

That turns out to be because of splatting (which you can tell because of the memory allocation).
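
A rough way to reproduce this kind of comparison on a current Julia version is sketched below. Everything here is an assumption layered on the thread: `CartesianIndices` stands in for the older `CartesianRange`, the names `sub2ind_unrolled` and `run_with` are hypothetical, and no timing or allocation figures are claimed, since those depend on the Julia version and compiler optimizations.

```julia
# Loop-based version, written with `where {N}` instead of the old `{N}` syntax.
function sub2ind_loop(dims::NTuple{N,Int}, I::Int...) where {N}
    ind = I[N] - 1
    for i = (N - 1):-1:1
        ind = I[i] - 1 + dims[i] * ind
    end
    return ind + 1
end

# Generated version: the loop runs at generation time and is fully unrolled.
@generated function sub2ind_unrolled(dims::NTuple{N,Int}, I::Int...) where {N}
    ex = :(I[$N] - 1)
    for i = (N - 1):-1:1
        ex = :(I[$i] - 1 + dims[$i] * $ex)
    end
    return :($ex + 1)
end

# Harness in the style of `run_loop`: sum the linear index of every element
# of a 5x5x5x5 array, `n` times over, using the implementation `f`.
function run_with(f, n)
    dims = (5, 5, 5, 5)
    s = 0
    for _ = 1:n
        for I in CartesianIndices(dims)
            s += f(dims, I[1], I[2], I[3], I[4])
        end
    end
    return s
end

run_with(sub2ind_loop, 1) == run_with(sub2ind_unrolled, 1)
```

Each pass sums the integers 1 through 625, which is 195625, so 10^4 passes give the 1956250000 shown in the REPL output above. Wrapping the calls in `@time` after a warm-up call would then show whether the two versions still differ in allocation.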

@tomasaschan
Member Author

That makes sense - thanks for pointing it out!

@tomasaschan tomasaschan force-pushed the doc-stagedfunctions branch from c42321b to 704aa64 Compare April 7, 2015 15:30
@mbauman
Member

mbauman commented Apr 7, 2015

It's so difficult to benchmark just the compiler magic you want to occur in typical situations without getting too much or too little magic.

@pao
Member

pao commented Apr 7, 2015

@tlycken For the future, the syntax to skip AppVeyor is [av skip] with a space, not a dash. This commit did run on AV (and passed).

@tomasaschan
Member Author

@pao, right, thanks. I noticed that it ran, but figured I'd do more harm than good trying to figure it out by trial and error. Next time I'll get it right! :)

@tomasaschan
Member Author

Is this waiting on me to do anything more here?

:($ex + 1)
end

**What code will this staged function generate?**
Member

it feels like there needs to be an @code_expand or something similar that will do this without the extra work described below. it may be worth having an issue to revisit this later.
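
One possible workaround in the meantime, sketched here under assumptions, is to keep the expression-building code in an ordinary function and have the generated function merely call it; the ordinary function can then be called directly to see the code. The names `sub2ind_expr` and `sub2ind_from_expr` are hypothetical, and the helper takes the tuple length as an integer, which may differ from how the documentation's own helper is written.

```julia
# Ordinary function: builds the expression for a given number of dimensions.
# It must be defined before the generated function that uses it.
function sub2ind_expr(N::Int)
    ex = :(I[$N] - 1)
    for i = (N - 1):-1:1
        ex = :(I[$i] - 1 + dims[$i] * $ex)
    end
    return :($ex + 1)
end

# The generated function only forwards the static parameter `N`.
@generated function sub2ind_from_expr(dims::NTuple{N,Int}, I::Int...) where {N}
    return sub2ind_expr(N)
end

# Inspect the code that would be generated for three dimensions.
println(sub2ind_expr(3))
```

Calling `sub2ind_expr` by hand involves no compilation of the generated method, so it is cheap to try for several values of `N`.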

@tomasaschan
Member Author

@vtjnash Thanks for a very thorough proof-reading! :)

@tomasaschan tomasaschan force-pushed the doc-stagedfunctions branch 3 times, most recently from 1b9500b to 6c544ca Compare April 17, 2015 09:01
@mbauman
Member

mbauman commented Apr 20, 2015

New name! Do you mind going back through and updating the vocabulary, @tlycken?

the types of the arguments are known, but the function is not yet compiled.

Depending on the types of the arguments, a staged function returns a quoted
expression which then forms the function body of the specialized function.
Member

I'm not a computer science expert, but I think you want to be careful about the distinction between functions and methods. I think this would be more correctly stated as "a staged function returns … which forms the method body of the specialized method". But I may be wrong or splitting hairs not worth splitting.

Member Author

You're absolutely correct - we should be careful, especially since this is quite a difficult topic to wrap one's head around in the first place. I'll fix it when I go over and change the terminology to generated functions.

Tomas Lycken and others added 4 commits April 20, 2015 22:23
This is a rebase/squash of the following commits:

8ffd923 Doc: stagedfunctions. Basic functionality and simple example
471befa Update stagedfunction doc according to comments
cd19d47 Start on advanced example
e02d21f Correct syntax typos
293ec59 Correct typo and improve grammar
c42321b Complete advanced example
* Make `sub2ind_staged_impl` define `N` correctly
* Add a note that staging *might* occur more than once.
@tomasaschan tomasaschan force-pushed the doc-stagedfunctions branch from 6c544ca to c4719d2 Compare April 20, 2015 20:24
@tomasaschan
Member Author

@mbauman Done!

I'd be grateful for a new round of proofreading - I might very well have missed some things in the rewrite.

*types* of the arguments, not their values.

3. Instead of calculating something or performing some action, you return
from a *quoted expression* which, when evaluated, does what you want.
Member

"you return from a quoted expression"

@tomasaschan
Member Author

Thanks @mbauman for the speedy review!

@tomasaschan tomasaschan force-pushed the doc-stagedfunctions branch from 331a156 to 59a8905 Compare April 20, 2015 20:54
@tomasaschan tomasaschan changed the title WIP: Document stagedfunctions WIP: Document @generated functions Apr 20, 2015
@mschauer
Contributor

"Specialized method" is not really a clear concept. A method is a function specialized on type and for generating functions the generation part can be itself a method or a function, which returns then "further specialized methods", right?

According to comment by @mshauer - I have attempted to adjust this passage
to the Julia terminology, hoping that I didn't make it difficult to read in the process :)
@tomasaschan
Member Author

@mschauer I failed to spell your handle right in the commit message, but that last commit is supposed to fix the wording re: "specialized methods". I hope it's better now :)

@mbauman
Member

mbauman commented Apr 21, 2015

👍 That is less jargon-y and more readable, I think. Nicely done.

One last bug: the PR title. Do you still consider this a WIP?

@tomasaschan tomasaschan changed the title WIP: Document @generated functions Document @generated functions Apr 21, 2015
@tomasaschan
Member Author

Nope - not more than any documentation ever ;)

@prcastro
Contributor

👍

jakebolewski added a commit that referenced this pull request Apr 22, 2015
@jakebolewski jakebolewski merged commit aafdc50 into JuliaLang:master Apr 22, 2015