use case: generalize the format string mechanism so code other than printf can use it #250

Closed
andrewrk opened this issue Feb 6, 2017 · 1 comment


andrewrk commented Feb 6, 2017

/// Return value of output is a boolean indicating whether to continue.
pub fn format(output: fn([]const u8)->bool, comptime fmt: []const u8, args: ...) {
    comptime var start_index: usize = 0;
    comptime var state = State.Start;
    comptime var next_arg: usize = 0;
    inline for (fmt) |c, i| {
        switch (state) {
            State.Start => switch (c) {
                '{' => {
                    if (start_index < i) if (!output(fmt[start_index...i])) return;
                    state = State.OpenBrace;
                },
                '}' => {
                    if (start_index < i) if (!output(fmt[start_index...i])) return;
                    state = State.CloseBrace;
                },
                else => {},
            },
            State.OpenBrace => switch (c) {
                '{' => {
                    state = State.Start;
                    start_index = i;
                },
                '}' => {
                    if (!formatValue(args[next_arg], output)) return;
                    next_arg += 1;
                    state = State.Start;
                    start_index = i + 1;
                },
                else => @compileError("Unknown format character: " ++ []u8{c}),
            },
            State.CloseBrace => switch (c) {
                '}' => {
                    state = State.Start;
                    start_index = i;
                },
                else => @compileError("Single '}' encountered in format string"),
            },
        }
    }
    comptime {
        if (args.len != next_arg) {
            @compileError("Unused arguments");
        }
        if (state != State.Start) {
            @compileError("Incomplete format string: " ++ fmt);
        }
    }
    if (start_index < fmt.len) {
        if (!output(fmt[start_index...fmt.len])) return;
    }
}

fn formatValue(value: var, output: fn([]const u8)->bool) -> bool {
    const T = @typeOf(value);
    if (@isInteger(T)) {
        return formatInt(T, value, output);
    } else if (@isFloat(T)) {
        return formatFloat(T, value, output);
    } else if (@canImplicitCast([]const u8, value)) {
        const casted_value = ([]const u8)(value);
        return output(casted_value);
    } else if (T == void) {
        return output("void");
    } else {
        @compileError("Unable to format type '" ++ @typeName(T) ++ "'");
    }
}

pub fn printf(self: &Self, comptime fmt: []const u8, args: ...) -> %void {
    var result: %void = {};
    fn writeBytes(buf: []const u8) -> bool {
        self.write(buf) %% |err| {
            result = err;
            return false;
        };
        return true;
    }
    format(writeBytes, fmt, args);
    %return result;
    %return self.flush();
}

Requirements:

The type of writeBytes is a bound function, so that it can access
the result variable. We can still allow users to pass bound functions
to functions which expect normal function pointers. Whenever a function
expects a function pointer, if it is ever potentially called with a
bound function, we can secretly make it expect a bound function. Normal
functions passed to it get a null instance pointer, which is safe
because a normal function never accesses it.
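
As a rough illustration of that representation, here is a sketch in C (the language used for the comparison further down). Everything in it is an assumption for illustration: the names `BoundFn`, `bind`, `call`, `Writer`, `writer_write`, and `plain_write` are made up, and in the actual proposal the compiler would generate this behind the scenes rather than the programmer writing it by hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lowering of a bound function: code pointer plus instance pointer. */
typedef struct {
    bool (*fn)(void *instance, const char *buf, size_t len);
    void *instance; /* NULL when the callee is a normal function */
} BoundFn;

/* State that a bound callback wants to reach, like `result` in printf above. */
typedef struct {
    size_t bytes_seen;
} Writer;

/* A "bound" callee: it reads its instance pointer. */
bool writer_write(void *instance, const char *buf, size_t len) {
    Writer *w = instance;
    (void)buf;
    w->bytes_seen += len;
    return true;
}

/* A "normal" callee: the instance slot is passed but never read. */
size_t plain_calls = 0;
bool plain_write(void *unused_instance, const char *buf, size_t len) {
    (void)unused_instance;
    (void)buf;
    (void)len;
    plain_calls += 1;
    return true;
}

BoundFn bind(bool (*fn)(void *, const char *, size_t), void *instance) {
    BoundFn b = { fn, instance };
    return b;
}

/* The caller treats both kinds of callee uniformly. */
bool call(BoundFn f, const char *text) {
    return f.fn(f.instance, text, strlen(text));
}
```

A normal function would be wrapped as `bind(plain_write, NULL)` and a bound one as `bind(writer_write, &writer)`; `call` cannot tell the difference.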

The point of all this is that the format code is now decoupled from
I/O, and it could be used for other purposes.

For example, in a kernel, one might define a function like this:

pub fn kernelLog(comptime fmt: []const u8, args: ...) {
    format(kernelLogBytes, fmt, args);
}

Also, in general, it makes callbacks more viable, because it solves the
void * context unsafety problem.
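
To make the void * context problem concrete, here is a rough C sketch of the same decoupling. It is not a translation of the Zig code: it runs at runtime instead of comptime, only handles string arguments, and the names `OutputFn`, `format_strings`, `Buffer`, and `buffer_output` are made up for illustration. Note the unchecked cast in the callback, which is exactly the part a bound function would make safe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns false to stop formatting early, like `output` in the Zig sketch. */
typedef bool (*OutputFn)(void *context, const char *buf, size_t len);

/* Hypothetical runtime analogue of format(): passes literal text to `output`
 * and replaces each "{}" with the next string in `args`. "{{" and "}}" emit a
 * single literal brace. Returns false on a malformed format string, on too
 * few or unused arguments, or when `output` asks to stop. */
bool format_strings(void *context, OutputFn output, const char *fmt,
                    const char *const *args, size_t arg_count) {
    size_t len = strlen(fmt);
    size_t next_arg = 0;
    size_t start = 0; /* start of the pending literal run */
    size_t i = 0;
    while (i < len) {
        char c = fmt[i];
        if (c != '{' && c != '}') {
            i += 1;
            continue;
        }
        /* Flush the literal text that precedes the brace. */
        if (start < i && !output(context, fmt + start, i - start)) return false;
        if (i + 1 < len && fmt[i + 1] == c) {
            /* Escaped brace: emit one copy. */
            if (!output(context, fmt + i, 1)) return false;
        } else if (c == '{' && i + 1 < len && fmt[i + 1] == '}') {
            if (next_arg >= arg_count) return false;
            const char *arg = args[next_arg++];
            if (!output(context, arg, strlen(arg))) return false;
        } else {
            return false; /* lone brace or unknown format character */
        }
        i += 2;
        start = i;
    }
    if (start < len && !output(context, fmt + start, len - start)) return false;
    return next_arg == arg_count;
}

/* One possible sink: append into a fixed-size, NUL-terminated buffer. */
typedef struct {
    char data[64];
    size_t len;
} Buffer;

bool buffer_output(void *context, const char *buf, size_t len) {
    Buffer *b = context; /* unchecked: nothing verifies this really is a Buffer */
    if (b->len + len >= sizeof b->data) return false;
    memcpy(b->data + b->len, buf, len);
    b->len += len;
    b->data[b->len] = '\0';
    return true;
}
```

With a zeroed `Buffer b` and `args = {"world", "42"}`, calling `format_strings(&b, buffer_output, "hello {}, {{{}}}", args, 2)` leaves `hello world, {42}` in `b.data`. Passing a pointer to any other type as the context would compile just as well, which is the unsafety in question.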

There is a flaw, however: the result is not always optimal in space. Here is some C:

struct SoundIo {
    void *userdata;
    void (*on_devices_change)(void *userdata);
    void (*on_backend_disconnect)(void *userdata, enum SoundIoError err);
    void (*on_events_signal)(void *userdata);
};

Here, the struct stores userdata only once. It represents the pointer we
would be binding to in a bound function. With automatic bound functions
as proposed above, the struct would look like this:

const SoundIo = struct {
    on_devices_change: fn(),
    on_backend_disconnect: fn(err: SoundIoError),
    on_events_signal: fn(),
};

With the above automatic bound function proposal, if the programmer only set
these fields to normal functions, we would actually end up saving space
over C.

But if the programmer set all 3 fields at some point to bound functions,
for example all bound to the same instance, that instance pointer would
be stored 3 times, wasting 2 pointers' worth of space.

However, this isn't quite analogous to the C example, because the Zig version
allows the programmer to set each function to a different instance of
potentially even different types. If you did that in C you would end up
introducing the extra pointers anyway.
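
The space trade-off can be checked with a small C sketch. This assumes a bound function lowers to an instance pointer plus a code pointer; the names `BoundFn`, `SharedContext`, `PerCallbackContext`, `shared_slots`, and `per_callback_slots` are made up for illustration, and `int err` stands in for the error enum. It only models the worst case where every field holds a bound function.

```c
#include <stddef.h>

/* Hypothetical lowering of a bound function. The real callbacks have
 * differing signatures; one signature is enough to compare sizes. */
struct BoundFn {
    void *instance;
    void (*fn)(void *instance);
};

/* C style: one shared userdata pointer for all three callbacks. */
struct SharedContext {
    void *userdata;
    void (*on_devices_change)(void *userdata);
    void (*on_backend_disconnect)(void *userdata, int err);
    void (*on_events_signal)(void *userdata);
};

/* Automatic bound functions: every field carries its own instance pointer. */
struct PerCallbackContext {
    struct BoundFn on_devices_change;
    struct BoundFn on_backend_disconnect;
    struct BoundFn on_events_signal;
};

/* Size of each layout, measured in data-pointer-sized slots. */
size_t shared_slots(void) {
    return sizeof(struct SharedContext) / sizeof(void *);
}

size_t per_callback_slots(void) {
    return sizeof(struct PerCallbackContext) / sizeof(void *);
}
```

On platforms where data pointers and function pointers have the same size, the shared layout takes 4 slots and the per-callback layout takes 6, which is the 2 extra pointers mentioned above.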

A completely different take on this:

/// Return value of output is a boolean indicating whether to continue.
pub fn format(context: var, output: fn(@typeOf(context), []const u8)->bool, comptime fmt: []const u8, args: ...) {
    comptime var start_index: usize = 0;
    comptime var state = State.Start;
    comptime var next_arg: usize = 0;
    inline for (fmt) |c, i| {
        switch (state) {
            State.Start => switch (c) {
                '{' => {
                    if (start_index < i) if (!output(context, fmt[start_index...i])) return;
                    state = State.OpenBrace;
                },
                '}' => {
                    if (start_index < i) if (!output(context, fmt[start_index...i])) return;
                    state = State.CloseBrace;
                },
                else => {},
            },
            State.OpenBrace => switch (c) {
                '{' => {
                    state = State.Start;
                    start_index = i;
                },
                '}' => {
                    if (!formatValue(context, args[next_arg], output)) return;
                    next_arg += 1;
                    state = State.Start;
                    start_index = i + 1;
                },
                else => @compileError("Unknown format character: " ++ []u8{c}),
            },
            State.CloseBrace => switch (c) {
                '}' => {
                    state = State.Start;
                    start_index = i;
                },
                else => @compileError("Single '}' encountered in format string"),
            },
        }
    }
    comptime {
        if (args.len != next_arg) {
            @compileError("Unused arguments");
        }
        if (state != State.Start) {
            @compileError("Incomplete format string: " ++ fmt);
        }
    }
    if (start_index < fmt.len) {
        if (!output(context, fmt[start_index...fmt.len])) return;
    }
}

fn formatValue(context: var, value: var, output: fn(@typeOf(context), []const u8)->bool) -> bool {
    const T = @typeOf(value);
    if (@isInteger(T)) {
        return formatInt(context, output, value);
    } else if (@isFloat(T)) {
        return formatFloat(context, output, value);
    } else if (@canImplicitCast([]const u8, value)) {
        const casted_value = ([]const u8)(value);
        return output(context, casted_value);
    } else if (T == void) {
        return output(context, "void");
    } else {
        @compileError("Unable to format type '" ++ @typeName(T) ++ "'");
    }
}

const Context = struct {
    result: %void,
    self: &Self,
};

fn writeBytes(context: &Context, buf: []const u8) -> bool {
    context.self.write(buf) %% |err| {
        context.result = err;
        return false;
    };
    return true;
}

pub fn printf(self: &Self, comptime fmt: []const u8, args: ...) -> %void {
    var context = Context {
        .result = {},
        .self = self,
    };
    format(&context, writeBytes, fmt, args);
    %return context.result;
    %return self.flush();
}

This is more explicit - requiring the definition to explicitly support a
context - and it has none of the requirements of the first example (except completing var args a little more #77 (comment)).

One downside is that it generates wasteful template instantiations. But
since most will use a pointer which is passed verbatim, we should be
able to avoid most of the waste when we can detect that these
instantiations can codegen to the same thing.

For completeness, the kernel log example with the 2nd proposal:

fn kernelLogBytesIgnoreVoidParam(nothing: void, buf: []const u8) -> bool {
    return kernelLogBytes(buf);
}
pub fn kernelLog(comptime fmt: []const u8, args: ...) {
    format({}, kernelLogBytesIgnoreVoidParam, fmt, args);
}
andrewrk added the enhancement label Feb 6, 2017
andrewrk added this to the 0.1.0 milestone Feb 6, 2017

andrewrk commented Feb 6, 2017

Another thing I wanted to point out is that this demonstrates a use case for the error type being used somewhere that is not the return value of a function, which contradicts #83.
