use case: generalize the format string mechanism so code other than printf can use it #250

andrewrk · 2017-02-06T21:08:36Z

/// Return value of output is a boolean indicating whether to continue.
pub fn format(output: fn([]const u8)->bool, comptime fmt: []const u8, args: ...) {
    comptime var start_index: usize = 0;
    comptime var state = State.Start;
    comptime var next_arg: usize = 0;
    inline for (fmt) |c, i| {
        switch (state) {
            State.Start => switch (c) {
                '{' => {
                    if (start_index < i) if (!output(fmt[start_index...i])) return;
                    state = State.OpenBrace;
                },
                '}' => {
                    if (start_index < i) if (!output(fmt[start_index...i])) return;
                    state = State.CloseBrace;
                },
                else => {},
            },
            State.OpenBrace => switch (c) {
                '{' => {
                    state = State.Start;
                    start_index = i;
                },
                '}' => {
                    if (!formatValue(args[next_arg], output)) return;
                    next_arg += 1;
                    state = State.Start;
                    start_index = i + 1;
                },
                else => @compileError("Unknown format character: " ++ c),
            },
            State.CloseBrace => switch (c) {
                '}' => {
                    state = State.Start;
                    start_index = i;
                },
                else => @compileError("Single '}' encountered in format string"),
            },
        }
    }
    comptime {
        if (args.len != next_arg) {
            @compileError("Unused arguments");
        }
        if (state != State.Start) {
            @compileError("Incomplete format string: " ++ fmt);
        }
    }
    if (start_index < fmt.len) {
        if (!output(fmt[start_index...fmt.len])) return;
    }
}

fn formatValue(value: var, output: fn([]const u8)->bool) -> bool {
    const T = @typeOf(value);
    if (@isInteger(T)) {
        return formatInt(T, value, output);
    } else if (@isFloat(T)) {
        return formatFloat(T, value, output);
    } else if (@canImplicitCast([]const u8, value)) {
        const casted_value = ([]const u8)(value);
        return output(casted_value);
    } else if (T == void) {
        return output("void");
    } else {
        @compileError("Unable to format type '" ++ @typeName(T) ++ "'");
    }
}

pub fn printf(self: &Self, comptime fmt: []const u8, args: ...) -> %void {
    var result: %void = {};
    fn writeBytes(buf: []const u8) -> bool {
        self.write(buf) %% |err| {
            result = err;
            return false;
        };
        return true;
    }
    format(writeBytes, fmt, args);
    %return result;
    %return self.flush();
}

Requirements:

functions inside other functions (closures) (#229)
implicit function calling convention resolution (#105)
bound functions (bound methods, #141)
passing var args as an argument (support functions with variable length arguments, #77)

The type of writeBytes is a bound method, so that it can access
the result variable. We can still allow users to pass it to functions
which expect normal function pointers. Whenever a function expects a
function pointer, if it is ever potentially called with a bound
function, we can secretly make it expect a bound function. Normal
functions can be passed the same way, because we can set the instance
pointer value to null, since it is never accessed in a normal function.
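As a rough illustration of that lowering, here is a minimal C sketch. It assumes a bound function is represented as a code pointer plus an instance pointer; the names BoundOutput, count_bytes, append_bytes and emit_twice are hypothetical and are not part of the proposal. The normal function ignores the instance slot, so passing null for it is harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lowering of a bound function: code pointer plus instance. */
typedef struct {
    bool (*call)(void *instance, const char *buf, size_t len);
    void *instance; /* NULL when the callee is a normal function */
} BoundOutput;

/* A "normal" function: it never reads the instance slot. */
size_t plain_bytes_seen = 0;

bool count_bytes(void *instance, const char *buf, size_t len) {
    (void)instance;
    (void)buf;
    plain_bytes_seen += len;
    return true;
}

/* A "bound" function: its state lives behind the instance pointer. */
typedef struct {
    char data[16];
    size_t len;
} SmallBuf;

bool append_bytes(void *instance, const char *buf, size_t len) {
    SmallBuf *sb = instance;
    if (len > sizeof sb->data - sb->len) {
        return false; /* no room left: ask the caller to stop */
    }
    memcpy(sb->data + sb->len, buf, len);
    sb->len += len;
    return true;
}

/* The callee is written once and accepts both kinds of function. */
bool emit_twice(BoundOutput out, const char *s) {
    assert(out.call != NULL);
    size_t n = strlen(s);
    return out.call(out.instance, s, n) && out.call(out.instance, s, n);
}
```

Calling emit_twice with { count_bytes, NULL } or with { append_bytes, &some_buf } goes through the same code path, which is the property the paragraph above relies on.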

The point of all this is that the format code is now decoupled from
I/O, and it could be used for other purposes.

For example, in a kernel, one might define a function like this:

pub fn kernelLog(comptime fmt: []const u8, args: ...) {
    format(kernelLogBytes, fmt, args);
}

Also, in general, it makes callbacks more viable, because it solves the
void * context unsafety problem.

There is a flaw, however: this is not always optimal in space. Here is some C:

struct SoundIo {
    void *userdata;
    void (*on_devices_change)(void *userdata);
    void (*on_backend_disconnect)(void *userdata, enum SoundIoError err);
    void (*on_events_signal)(void *userdata);
};

Here, the struct efficiently stores userdata once - representing the pointer
we are binding to in a bound function. If we had automatic bound functions
as proposed above, the Zig equivalent would be:

const SoundIo = struct {
    on_devices_change: fn(),
    on_backend_disconnect: fn(err: SoundIoError),
    on_events_signal: fn(),
};

With the above automatic bound function proposal, if the programmer only set
these fields to normal functions, we would actually end up saving space
over C.

But if the programmer set all 3 fields at some point to bound functions,
for example, to the same instance, that would waste 2 pointers worth of
space.

However, this isn't quite analogous to the C example, because the Zig version
allows the programmer to set each function to a different instance of
potentially even different types. If you did that in C you would end up
introducing the extra pointers anyway.
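The pointer counts above can be checked with a small C sketch. This is only an illustration under stated assumptions: function pointers and data pointers have the same size, the structs have no padding (true on mainstream platforms), and all three callbacks share one simplified signature. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified: every callback takes only the instance pointer. */
typedef void (*Callback)(void *instance);

/* The C layout from above: one userdata pointer shared by all callbacks. */
struct SharedUserdata {
    void *userdata;
    Callback on_devices_change;
    Callback on_backend_disconnect;
    Callback on_events_signal;
};

/* Hypothetical layout if every field had to hold a bound function. */
struct BoundCallback {
    Callback fn;
    void *instance;
};

struct AllBound {
    struct BoundCallback on_devices_change;
    struct BoundCallback on_backend_disconnect;
    struct BoundCallback on_events_signal;
};

/* Hypothetical layout if only normal functions are ever stored. */
struct AllPlain {
    Callback on_devices_change;
    Callback on_backend_disconnect;
    Callback on_events_signal;
};

/* The sketch assumes code and data pointers are the same size. */
void check_layout_assumption(void) {
    assert(sizeof(Callback) == sizeof(void *));
}

/* Pointers wasted when all three fields are bound to the same instance. */
size_t extra_pointers_when_all_bound(void) {
    return (sizeof(struct AllBound) - sizeof(struct SharedUserdata)) / sizeof(void *);
}

/* Pointers saved over the C layout when no field is ever bound. */
size_t pointers_saved_when_all_plain(void) {
    return (sizeof(struct SharedUserdata) - sizeof(struct AllPlain)) / sizeof(void *);
}
```

Under those assumptions the all-bound layout costs six pointers against four for the shared-userdata layout, and the all-plain layout costs three.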

A completely different take on this:

/// Return value of output is a boolean indicating whether to continue.
pub fn format(context: var, output: fn(@typeOf(context), []const u8)->bool, comptime fmt: []const u8, args: ...) {
    comptime var start_index: usize = 0;
    comptime var state = State.Start;
    comptime var next_arg: usize = 0;
    inline for (fmt) |c, i| {
        switch (state) {
            State.Start => switch (c) {
                '{' => {
                    if (start_index < i) if (!output(context, fmt[start_index...i])) return;
                    state = State.OpenBrace;
                },
                '}' => {
                    if (start_index < i) if (!output(context, fmt[start_index...i])) return;
                    state = State.CloseBrace;
                },
                else => {},
            },
            State.OpenBrace => switch (c) {
                '{' => {
                    state = State.Start;
                    start_index = i;
                },
                '}' => {
                    if (!formatValue(context, args[next_arg], output)) return;
                    next_arg += 1;
                    state = State.Start;
                    start_index = i + 1;
                },
                else => @compileError("Unknown format character: " ++ c),
            },
            State.CloseBrace => switch (c) {
                '}' => {
                    state = State.Start;
                    start_index = i;
                },
                else => @compileError("Single '}' encountered in format string"),
            },
        }
    }
    comptime {
        if (args.len != next_arg) {
            @compileError("Unused arguments");
        }
        if (state != State.Start) {
            @compileError("Incomplete format string: " ++ fmt);
        }
    }
    if (start_index < fmt.len) {
        if (!output(context, fmt[start_index...fmt.len])) return;
    }
}

fn formatValue(context: var, value: var, output: fn(@typeOf(context), []const u8)->bool) -> bool {
    const T = @typeOf(value);
    if (@isInteger(T)) {
        return formatInt(context, output, value);
    } else if (@isFloat(T)) {
        return formatFloat(context, output, value);
    } else if (@canImplicitCast([]const u8, value)) {
        const casted_value = ([]const u8)(value);
        return output(context, casted_value);
    } else if (T == void) {
        return output(context, "void");
    } else {
        @compileError("Unable to format type '" ++ @typeName(T) ++ "'");
    }
}

const Context = struct {
    result: %void,
    self: &Self,
};

fn writeBytes(context: &Context, buf: []const u8) -> bool {
    context.self.write(buf) %% |err| {
        context.result = err;
        return false;
    };
    return true;
}

pub fn printf(self: &Self, comptime fmt: []const u8, args: ...) -> %void {
    var context = Context {
        .result = {},
        .self = self,
    };
    format(&context, writeBytes, fmt, args);
    %return context.result;
    %return self.flush();
}

This is more explicit - requiring the definition to explicitly support a
context - and it has none of the requirements of the first example (except
completing var args a little more; see the comment on #77).

One downside is that it generates wasteful template instantiations. But
since most will use a pointer which is passed verbatim, we should be
able to avoid most of the waste when we can detect that these
instantiations can codegen to the same thing.
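For comparison, this is roughly what a single shared instantiation looks like when written by hand in C, where the context is an untyped pointer passed through verbatim. It is a runtime sketch, not the compile-time Zig design: it only substitutes string arguments, it reports malformed format strings by returning false instead of a compile error, and it does not distinguish those from an early stop. The names OutputFn, format_strs, BufContext and buf_write are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return value of output is a boolean indicating whether to continue. */
typedef bool (*OutputFn)(void *context, const char *bytes, size_t len);

/* "{}" consumes the next string argument; "{{" and "}}" emit one brace.
 * Returns false if output asked to stop, a brace is stray or unclosed,
 * or the argument count does not match the number of "{}". */
bool format_strs(void *context, OutputFn output, const char *fmt,
                 const char *const *args, size_t arg_count) {
    assert(output != NULL && fmt != NULL);
    size_t n = strlen(fmt);
    size_t start = 0; /* start of the pending literal run */
    size_t next_arg = 0;
    size_t i = 0;
    while (i < n) {
        char c = fmt[i];
        if (c != '{' && c != '}') {
            i++;
            continue;
        }
        /* Flush the literal text before the brace. */
        if (start < i && !output(context, fmt + start, i - start)) return false;
        if (i + 1 < n && fmt[i + 1] == c) {
            /* Escaped brace: "{{" or "}}". */
            if (!output(context, &c, 1)) return false;
        } else if (c == '{' && i + 1 < n && fmt[i + 1] == '}') {
            if (next_arg >= arg_count) return false; /* too few arguments */
            const char *arg = args[next_arg++];
            if (!output(context, arg, strlen(arg))) return false;
        } else {
            return false; /* stray or unclosed brace */
        }
        i += 2;
        start = i;
    }
    if (start < n && !output(context, fmt + start, n - start)) return false;
    return next_arg == arg_count; /* unused arguments are an error */
}

/* Example context: a fixed buffer plus an error flag stored outside
 * the return value, mirroring Context.result in the Zig version. */
typedef struct {
    char data[32];
    size_t len;
    bool overflowed;
} BufContext;

bool buf_write(void *context, const char *bytes, size_t len) {
    BufContext *ctx = context;
    if (len > sizeof ctx->data - 1 - ctx->len) {
        ctx->overflowed = true;
        return false;
    }
    memcpy(ctx->data + ctx->len, bytes, len);
    ctx->len += len;
    ctx->data[ctx->len] = '\0';
    return true;
}
```

The cost of the untyped pointer is the cast inside buf_write, which the compiler cannot check; the Zig signature with @typeOf(context) keeps that type information.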

For completeness, the kernel log example with the 2nd proposal:

fn kernelLogBytesIgnoreVoidParam(nothing: void, buf: []const u8) -> bool {
    return kernelLogBytes(buf);
}
pub fn kernelLog(comptime fmt: []const u8, args: ...) {
    format({}, kernelLogBytesIgnoreVoidParam, fmt, args);
}


andrewrk · 2017-02-06T21:24:38Z

Another thing I wanted to point out is that this demonstrates a use case for the error type being used somewhere that is not the return value of a function, which contradicts #83.

andrewrk added the enhancement label Feb 6, 2017

andrewrk added this to the 0.1.0 milestone Feb 6, 2017

thejoshwolfe mentioned this issue Feb 7, 2017

multiple expression return values, error type redesign, introduction of copyable property of types #83

Closed

andrewrk closed this as completed in 47f267d Mar 10, 2017
