Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eliminate stack overflow #1639

Open
andrewrk opened this issue Oct 6, 2018 · 12 comments
Open

eliminate stack overflow #1639

andrewrk opened this issue Oct 6, 2018 · 12 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Oct 6, 2018

This is a proposal to solve #157.

First we must introduce a new kind of value, one that is neither comptime nor runtime.
It is a value that is known at compile-time, but only after all semantic analysis is complete,
and so it cannot participate in comptime code. For the purposes of this proposal,
this kind of value is called a "post-comptime" value. An actual comptime value can be used
where a post-comptime value is expected. Simple arithmetic such as +, -, and * can be
used on post-comptime values, to produce another post-comptime value. There also will be
some way to do a max() operation on post-comptime values.

External Functions

extern function declarations have a default stack size requirement,
which is dependent on the target. For example, on x86_64 Linux,
the default is 8388608 bytes (8 MiB). This matches the default stack
size on Linux for executables as well as default pthread stack sizes.
Zig treats every extern function as if it needs at least this much
stack space.

When an external function is known to require a larger or smaller amount
of stack space, it can be annotated:

extern<912> fn foo()void;

Here, foo is known to require only 912 bytes of stack space. In this example,
we provided a comptime value, but the expression can be a post-comptime value.

Function Pointers

One does not simply call a function pointer.

One must use @ptrCall(worst_case_stack_size, function, ...), where
worst_case_stack_size is a post-comptime value.

This inserts a runtime safety check - once compilation completes,
all function stack size requirements are known, and inserted into
the binary. A function pointer call checks if the provided
worst_case_stack_size value is correct.

This ensures that during semantic analysis, the compiler can rely on
this value when calculating worst case stack size.

Most APIs will want to accept comptime functions rather than function
pointers, as this avoids @ptrCall. For example, std.mem.Allocator,
std.io.InStream, std.io.OutStream, and std.rand.Random.

Recursion

Recursion, whether direct or indirect will be solved with the coroutines rewrite.
See #1260.

Call graph analysis detects loops:

fn fib(x: i32) i32 {
    if (x <= 1) return x;
    return fib(x - 1) + fib(x - 2); // error: unbounded stack growth
}

This will be fixed with coroutines:

async fn fib(allocator: *Allocator, x: i32) !i32 {
    if (x <= 1) return x;
    const frame = try allocator.create(@Frame(fib));
    defer allocator.destroy(frame);
    frame.* = async fib(x - 1);
    const value1 = await frame.*;
    frame.* = async fib(x - 2);
    const value2 = await frame.*;
    return value1 + value2;
}

Calls to this function grow heap memory, not stack memory. The function can
return error.OutOfMemory. This is a leaf node in the call graph analysis;
it does not store any information on the stack.

Annotating Additional Stack Usage (Inline Assembly)

In some cases, it may be necessary to annotate that more stack is used, even
when it does not come from a function call. One such example is inline
assembly that directly accesses the stack.

For these cases there will be a function @declareStackUsage(byte_count), where
byte_count is a post-comptime value.

Because Zig is not able to trace how this builtin call moves with LLVM's function
inlining optimization passes, the value is globally added to the root node of
the call graph.

Threads

Currently in std.os.spawnThread, Zig allocates a hard coded 8 MiB per thread.

With this proposal, it will use a new builtin function @stackSize(function).
This builtin returns a post-comptime number of bytes of the worst case stack
size required to call function.

Main Function

Zig knows the start symbol for the target. This is how it avoids including
std/special/bootstrap.zig when a program manually exports the start symbol.

During semantic analysis, Zig uses @stackSize(start_function) internally
to determine stack size requirements, and passes this to the linker.

Exported Functions

Zig internally runs @stackSize on all exported functions, in order to
run the call graph analysis and emit compile errors for recursion, etc.
The computed stack size requirements are included in generated .h files,
in comments, since C has no way to annotate stack size requirements for
functions.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 6, 2018
@andrewrk andrewrk added this to the 0.4.0 milestone Oct 6, 2018
@Hejsil
Copy link
Contributor

Hejsil commented Oct 22, 2018

With the new coroutine semantics, what reason is there to prefix calls with async? There is no longer a need to provide an allocator at callsite so the prefix async is redundant. Related #1676

@andrewrk
Copy link
Member Author

It would not be strictly necessary. However it makes it obvious that the function call does not return the return type of the function but instead returns the coroutine frame.

It would also be possible for async functions to call other async functions without any special syntax at all; the question becomes, is it better to force the suspend point to be explicitly pointed out, or not?

@shawnl
Copy link
Contributor

shawnl commented May 23, 2019

The computed stack size requirements are included in generated .h files,
in comments, since C has no way to annotate stack size requirements for
functions.

It should be a new attribute: __attribute__((__stack_size__(256))) This would enhance the existing no_split_stack https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Aug 27, 2019
@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Dec 9, 2019
@daurnimator
Copy link
Contributor

One must use @ptrCall(worst_case_stack_size, function, ...), where
worst_case_stack_size is a post-comptime value.

This would now (since #3856) be a new option to @call.

@daurnimator
Copy link
Contributor

What would we use as the stack size for vdso functions? If the kernel doesn't already, we could propose that they include a valid PT_GNU_STACK p_memsz field?

@anosovic
Copy link

would it be required to heap allocate stack frames of functions that are tail call optimized/stack size is bounded at comptime?

@ghost
Copy link

ghost commented Jan 26, 2021

Re. post-comptime: I think this is an idea worth exploring on its own. In trying to work out how to better optimise struct layout, I've stumbled upon another use case; see #7888.

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@churchofthought
Copy link

I thought you guys might want to know there is a huge discussion about automatic transformation of recursive algos to heap allocated rather than stack allocated, on Hacker News.

I think Zig, being so progressive, and being more of a "safe" language than C could benefit by fixing the stack overflow flaw in C.

https://news.ycombinator.com/item?id=28684098

Cheers @andrewrk . Thanks for pioneering Zig

@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
@yaksher
Copy link

yaksher commented Sep 12, 2023

The syntax for recursive functions via coroutines seems both confusing and extremely annoying to write, making recursion incredibly tedious to use. Using async/await to write recursive functions seems very confusing (and would also hamper performance). The minimal idea I have to make it better would be to have a builtin alternative to @call like @alloc_call which takes an allocator along with the usual arguments and can return an error (this could be a standard library function if the generics system Zig has allows it, I haven't written enough to know off the top of my head if you can forward variadic arguments). So the Fibonacci function would look like

async fn fib(allocator: *Allocator, x: i32) !i32 {
    if (x <= 1) return x;
    return try @alloc_call(allocator, fib, x - 1) + try @alloc_call(allocator, fib, x - 2);
}

It would still be messy, have the overhead of heap allocation, and mark the function as async for some reason even though it practically isn't, but at least it would provide a nice way to write it.

My alternative suggestion, however, is to provide some form of explicit stack recursion support. If it's okay for the function to say it's out of memory, then it's okay for the function to fail because it hit the stack limit and return error.WouldOverflowStack or whatever, as long as this can be efficiently detected at runtime.

I have two main ideas for the syntax for this, both with their tradeoffs. One is a rec keyword applied to functions, like async. Functions which are declared rec need to also be prefixed rec at the callsite and a rec call can return an error.WouldOverflowStack and otherwise evaluates to the value of the function called.

rec fn fib(x: i32) !i32 {
    if (x <= 1) return x;
    return try rec fib(x - 1) + try rec fib(x - 2)
}

From an implementation perspective, I think rec functions would have a @max_frame_size instead of an @stack_size. This describes the max stack depth they can reach without making another checked, recursive call, which has size 0 in the normal stack analysis because it's runtime checked instead. This would factor in any non-recursive functions they call. When you make a call with rec, it checks that current_stack_pointer - @max_frame_size(func) is still above min_stack_pointer and returns error.WouldOverflowStack if this check fails, otherwise it calls the function.

An alternative option is to move the burden of dynamically allocating a stack frame inside the recursive function. This might remove the need for a rec keyword, I'm not sure exactly how you'd implement it. I'm imagining it would look something like this:

fn fib(x: i32) !i32 {
    try @build_next_frame();
    if (x <= 1) return x;
    return try fib(x - 1) + try fib(x - 2);
}

The @build_next_frame() builtin could be placed anywhere in a function, and even multiple times if you wanted to, and it would segment the stack frame into several sections, and it would extend to the next one at each call, returning error.WouldOverflowStack on failure. The compile time stack size of a function with any number of @build_next_frame() calls in it is the size up to the first one. Notably, this means that the above function would have a compile time stack size of 1 or 2 words (depending on the CPU architecture) for pushing the return pointer to the stack. This means that the size of the rest of the frame can include any space used (at compile time) by the recursive calls, which is again just 1 or 2 words. This has several advantages over the previous suggestion, such as making recursive functions into just normal functions that can return an error. It makes recursive functions arguably cleaner to write because they only have to acknowledge that they're recursive in one place (and otherwise just handle that they can return an error) instead of in the definition and in every place where they make a recursive call. It would also allow you to make conditional stack allocations that don't factor into the compile time stack size of the function, which could be useful for large buffers that aren't always used.

fn example(x: i32) i32 {
    // do stuff
    if (x = 0) {
        @build_next_frame() catch @panic("");
        const y: [100000]i32 = undefined;
    }
}

The obvious disadvantage over the previous suggestion is that a rec keyword makes relative sense and is fairly intuitive while this try @build_next_frame() BS is something nobody has ever seen. Additionally, something would need to be done to ensure that the return value of @build_next_frame is actually handled, because obviously just continuing into the rest of the function would be invalid. A potential alternative formulation would be some kind of block, like

fn fib(x: i32) !i32 {
    try @with_dynamic_frame {
        if (x <= 1) return x;
        return try fib(x - 1) + try fib(x - 2);
    };
}

This variant would probably need to be a keyword (like dynamic_scope) or whatever rather than a builtin, since builtins are things with function-like syntax, but nonetheless.

The rec keyword would also be a better hint to the compiler when performing static analysis; ideally, the compiler would be able to determine that fib(x) has a call depth of x and then it can elide all the checks except at the final callsite, where it just checks that the space used by the entire thing is fine. This would be inconsistent at best (e.g., handling recursive list traversal this way would probably be impossible), but it's worth considering and rec might be better suited to it.

Either way, even without static analysis like that, I think either of these is more readable, easier to write, and more performant than coroutine-based recursion on the heap, while being exactly as safe.

It would also I think be reasonable to supplement this with an @tailcall built in, which tells the compiler that if it can't compile this recursion to a loop, it should throw a compiler error. This allows you to write a recursive function while guaranteeing that it actually has a known stack size. The most basic version of this would be extremely simple on the implementation side in that it would not try to do anything clever. If it's literally a tail call already and you can just replace call .label; ret with jmp .label, you do that and succeed, and otherwise you fail. So

fn fact(x: i32) i32 {
    fn helper(x: i32, acc: i32) i32 {
        if (x <= 1) return acc;
        return @tailcall(fact, x - 1, acc * x);
    }
    return helper(x, 1);
}

Would compile just fine, because it's a literal tail call and trivially optimizable to a loop while

fn fact(x: i32) i32 {
    if (x <= 1) return 1;
    return @tailcall(fact, x - 1) * x;
}

would not be supported in a most basic version because an accumulator variable would need to be added by the compiler. That said, it would be nice to try to support some stuff like the latter, it seems like a rather low priority. Either way, the crux would be that it lets you write a "recursive-like" function which uses a compile time known stack space.

The final issue I see is that the current handling of extern functions would make it extremely difficult to work with C interop, while also not actually being safe anyway. One of the big draws of Zig is being able to automatically generate C bindings, but this would make it impossible to call a C function on any system where the stack size is limited to the architecture default or less, I believe? I'm not sure what happens if you run an executable that asks for a specific stack size on a system with ulimits set below what it asks for. Providing accurate stack size annotations to C functions would be extremely difficult. On the flip side, the default stack size is not actually guaranteed to be safe for C functions anyway, because those can have unchecked recursion whose depth depends on the arguments. It's unclear to me how best to handle this because you can't actually check a non-cooperative function's stack usage. At best, you can protect the page at the end of the stack and set a signal handler to make a stack overflow runtime checked undefined behavior instead of just "undefined behavior, good luck." Unlike with the recursive functions, I wasn't able to come up with anything that feels like a good solution, but it felt worth noting that extern functions, since they can be recursive, are still inherently unsafe even if you allocate the amount of stack size they're supposed to run within.

@AnataBakka
Copy link

AnataBakka commented Feb 24, 2024

From an implementation perspective, I think rec functions would have a @max_frame_size instead of an @stack_size. This describes the max stack depth they can reach without making another checked, recursive call, which has size 0 in the normal stack analysis because it's runtime checked instead. This would factor in any non-recursive functions they call. When you make a call with rec, it checks that current_stack_pointer - @max_frame_size(func) is still above min_stack_pointer and returns error.WouldOverflowStack if this check fails, otherwise it calls the function.

Something similar to this^. Everything should be gucci if you are able to tell a function how many times it can loop around itself, and if it goes beyond that then throw a runtime error that we can check against. The compiler should guarantee that there is enough stack allocated for any defered operations so that the developer defined recursing index can be reached without a stack overflow.

This is to mimic what we could be doing manually, i.e allocate arrays of maxNestSize for each required variable in the function, add an extra array of maxNestSize to store identifiers of different branches of entry with different deferred operations, confirm that the arrays don't overflow, and create a reversed loop to process the defferred operations, but in some cases it feels clunky, so would be nice if the compiler were finally able to automize the fix for the out of date design of deferred operations.

@ni-vzavalishin
Copy link

ni-vzavalishin commented Apr 1, 2024

My alternative suggestion, however, is to provide some form of explicit stack recursion support. If it's okay for the function to say it's out of memory, then it's okay for the function to fail because it hit the stack limit and return error.WouldOverflowStack or whatever, as long as this can be efficiently detected at runtime.

I was having exactly the same idea. Furthermore, maybe this would allow @alloca to be reintroduced, since it will be able to fail in the same fashion.

TBH I'm rather sceptical about the feasibility of static stack analysis. There are definitely cases where it could work, but I doubt it can cover a sufficiently wide spectrum of situations in a reasonable/usable way, so it'd be nice to be able to not count on it.

@mlugg mlugg added this to Safety Aug 22, 2024
@mlugg mlugg moved this to To do in Safety Aug 22, 2024
@chirino
Copy link

chirino commented Nov 9, 2024

segmented stacks/stack splitting seems like a simpler approach to the recursion issue vs converting to a coroutine and then doing heap allocations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
Status: To do
Development

No branches or pull requests

10 participants