A printf replacement #10610

Closed
simonbyrne opened this issue Mar 23, 2015 · 14 comments
Labels
domain:display and printing Aesthetics and correctness of printed representations of objects. status:help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@simonbyrne
Contributor

Now that we have stagedfunctions, I think we could replace @printf with something much more useful and flexible.

Each basic format would be a parametric type, e.g.

abstract AbstractFormat

immutable DecimalFixed{fmt} <: AbstractFormat
end

The parameter would contain the format information. There are a couple of ways we could do this:

  • as a symbol, containing the format string, e.g. DecimalFixed{symbol("+10.2f")}
  • use multiple parameters, e.g. DecimalFixed{10,2,'+','f'} or something like that
  • use a tuple of pairs, e.g. DecimalFixed{(:width=>10,:prec=>2,:plus=>true)}

or perhaps something else.

We use stagedfunctions to overload print at compile time,

stagedfunction print{fmt}(io::IO, ::DecimalFixed{fmt}, x::Real)
   # current @printf magic goes here
end

We can glue all this together using an extra type

type CompositeFormat{T} <: AbstractFormat
    formats::T
end

which takes a tuple of either strings or formats as types:

CompositeFormat(("The sum is ",DecimalFixed{symbol("+10.2f")}()))

and we can tie all this together with a nonstandard string macro to generate these things:

print(fmt"The sum is %+10.2f",sum(x))

We can similarly overload string to replace @sprintf, and we could do much more than is possible currently. For example, we could define:

immutable CSVFormat{T<:AbstractFormat} <:AbstractFormat
end

for printing arrays of comma-separated values in a specific format.

As the formats would be objects in their own right, we would solve many of the problems that currently arise; for example, this one:
https://groups.google.com/d/msg/julia-users/iG6qwZ_GWzA/lSE6iDBdayIJ
could be done using

fmt"%15s - " * fmt"  %7.4f"^17

via appropriately overloading * and ^ as is done with strings.
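For readers on current Julia, the pieces proposed above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses today's `struct` and `abstract type` syntax instead of the 2015 keywords, it delegates the per-value work to `Printf.Format` and `Printf.format` (available in Julia 1.6 and later) instead of a staged function, and `render`, `pieces` and `fmtstring` are hypothetical helper names (the proposal itself overloads `print` and `string`). The `fmt"..."` string macro is left out.

```julia
using Printf

abstract type AbstractFormat end

# The printf-style spec (without the leading '%') lives in a Symbol type parameter.
struct DecimalFixed{fmt} <: AbstractFormat end

# A tuple of literal strings and single-value formats, applied left to right.
struct CompositeFormat{T<:Tuple} <: AbstractFormat
    formats::T
end

# Stand-in for the "@printf magic": parse the spec, then format one value.
# (Hypothetical helper; the proposal would overload `print` via a staged function.)
render(io::IO, ::DecimalFixed{fmt}, x::Real) where {fmt} =
    Printf.format(io, Printf.Format("%" * String(fmt)), x)

# Literal strings are printed as-is; each format consumes the next argument.
function render(io::IO, c::CompositeFormat, args...)
    i = 0
    for piece in c.formats
        if piece isa AbstractString
            print(io, piece)
        else
            i += 1
            render(io, piece, args[i])
        end
    end
end

# Flattened view of a format's parts, so that * and ^ never nest composites.
pieces(f::AbstractFormat) = (f,)
pieces(c::CompositeFormat) = c.formats

# Concatenation and repetition, mirroring * and ^ on strings.
Base.:*(a::AbstractFormat, b::AbstractFormat) =
    CompositeFormat((pieces(a)..., pieces(b)...))
Base.:^(a::AbstractFormat, n::Integer) =
    CompositeFormat((Iterators.flatten(Iterators.repeated(pieces(a), n))...,))

# Hypothetical analogue of overloading `string` to replace @sprintf.
fmtstring(f::AbstractFormat, args...) = sprint(render, f, args...)

sumfmt = CompositeFormat(("The sum is ", DecimalFixed{Symbol("+10.2f")}()))
fmtstring(sumfmt, 3.14159)                                  # "The sum is      +3.14"
fmtstring(DecimalFixed{Symbol("7.4f")}()^3, 1.0, 2.0, 3.0)  # " 1.0000 2.0000 3.0000"
```

Note that this sketch re-parses the spec on every call; the staged-function version in the proposal would do that work once per distinct `fmt` parameter.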

@JeffBezanson
Sponsor Member

I think this would be great --- except perhaps for using staged functions. Using formats determined at run time is currently one of the problems with printf. However, staged functions are not much better at that. You would still incur the overhead of generating and compiling new code. The overhead of a run-time format should at most be either a dynamic dispatch, or the cost of parsing a format string. Both of those are way cheaper than compilation.

All the information in a DecimalFixed can be fields, i.e. DecimalFixed(10,2,'+','f'). That should be just fine for avoiding parsing overhead.
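A rough sketch of the fields-based alternative, assuming current Julia and its `Printf.Format` API (which arrived in Julia 1.6, long after this comment). `FixedSpec` and `formatvalue` are hypothetical names. Width and precision are ordinary run-time values here, and the only per-call cost in this sketch is assembling and parsing a short format string.

```julia
using Printf

# Hypothetical: all format information is stored in fields, not type parameters.
struct FixedSpec
    width::Int
    precision::Int
    plus::Bool      # always print a sign
end

# Assemble a printf-style spec from the fields and apply it to one value.
function formatvalue(spec::FixedSpec, x::Real)
    fmt = string('%', spec.plus ? "+" : "", spec.width, '.', spec.precision, 'f')
    return Printf.format(Printf.Format(fmt), x)
end

formatvalue(FixedSpec(10, 2, true), 3.14159)   # "     +3.14"
formatvalue(FixedSpec(7, 4, false), 1.0)       # " 1.0000"
```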

@JeffBezanson
Sponsor Member

See also #5866

@simonbyrne
Contributor Author

@JeffBezanson My thinking was that people tend to only ever use a small number of formats in any piece of code, so writing fmt" %7.4f"^17 would only generate one DecimalFixed method.

@JeffBezanson
Sponsor Member

If we're going to bother overhauling printf, why not get full flexibility, and make it just as easy and fast to change the 7 and 4 at run time?

@simonbyrne
Contributor Author

I don't know how much of a performance difference it would make, but @printf does actually generate different code depending on things like width and precision, e.g.

julia/base/printf.jl

Lines 326 to 338 in 67ed8d1

if precision > 0 || '#' in flags
    width -= precision+1
end
if '+' in flags || ' ' in flags
    width -= 1
    if width > 1
        padding = :($width-(pt > 0 ? pt : 1))
    end
else
    if width > 1
        padding = :($width-(pt > 0 ? pt : 1)-neg)
    end
end

@JeffBezanson
Sponsor Member

I know that, and I think it's arguably a problem with the current approach. It's really fast for a fixed static format, but if you want even the tiniest part to be dynamic there's a huge performance cliff.

It would be an interesting experiment to see how much is gained by specializing on everything, vs. just the basic structure of the format. My guess is there's no need to specialize code on the exact number of digits wanted.

@simonbyrne
Contributor Author

That's probably true, but I suspect that it might be beneficial to specialise on some things like left or right alignment.

@StefanKarpinski
Sponsor Member

One thought I've had bouncing around since I first perpetrated the printf code is to take left/right alignment statements as a prediction that the strings will fit into the allotted space and use that to do less work when printing. You can fill in a "template" buffer with all the stuff that doesn't change from printing to printing, assuming that aligned strings fit, and then only write the parts that need filling in. You could bail out to a slow path in the cases where any of the strings to be printed don't fit. This would be maximally effective when there's more than one field to be formatted. I'm also not sure if this will make a practical difference when doing actual I/O.
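The template idea could look something like the sketch below. Everything here is hypothetical (`Template`, `make_template`, `fill_template!`, and the convention that fields arrive already converted to strings). It only illustrates pre-filling the constant text once and overwriting fixed-width, right-aligned slots on each call, with a `false` return signalling that the caller should fall back to a slow path.

```julia
# Hypothetical template: constant text plus fixed-width, right-aligned slots.
struct Template
    buf::Vector{UInt8}             # literal text, with slots pre-filled with spaces
    slots::Vector{UnitRange{Int}}  # byte range of each slot within `buf`
end

# Build a template from alternating literals and slot widths,
# e.g. make_template("x = ", 5, ", y = ", 5).
function make_template(parts::Union{AbstractString,Int}...)
    buf = UInt8[]
    slots = UnitRange{Int}[]
    for p in parts
        if p isa Int
            push!(slots, (length(buf) + 1):(length(buf) + p))
            append!(buf, fill(UInt8(' '), p))
        else
            append!(buf, codeunits(p))
        end
    end
    return Template(buf, slots)
end

# Overwrite each slot in place. Returns false, leaving the buffer untouched,
# if any field does not fit; the caller should then use a general (slow) path.
function fill_template!(t::Template, fields::AbstractString...)
    all(ncodeunits(s) <= length(r) for (r, s) in zip(t.slots, fields)) || return false
    for (r, s) in zip(t.slots, fields)
        n = ncodeunits(s)
        t.buf[first(r):(last(r) - n)] .= UInt8(' ')          # left padding
        copyto!(t.buf, last(r) - n + 1, codeunits(s), 1, n)  # right-aligned text
    end
    return true
end

t = make_template("x = ", 5, ", y = ", 5)
fill_template!(t, "1.5", "22.25") && println(String(copy(t.buf)))  # x =   1.5, y = 22.25
```

Whether this beats straightforward formatting once real I/O is involved is exactly the open question raised above; the sketch makes no performance claim.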

On Mar 23, 2015, at 10:15 PM, Simon Byrne notifications@github.com wrote:

That's probably true, but I suspect that it might be beneficial to specialise on some things like left or right alignment.



@vtjnash
Sponsor Member

vtjnash commented Mar 24, 2015

If we're going to bother overhauling printf, why not get full flexibility, and make it just as easy and fast to change the 7 and 4 at run time?

the full printf spec allows for that with the * specification: http://linux.die.net/man/3/printf

@ihnorton ihnorton added the domain:io Involving the I/O subsystem: libuv, read, write, etc. label Mar 24, 2015
@simonbyrne
Contributor Author

@vtjnash I don't think the * is standard (at least it isn't in the C-2011 standard, I haven't looked in the POSIX standard). Some implementations also support using an apostrophe ' for a thousands separator (i.e. 12,345), which is another common request on the mailing list, but this technically depends on locale.

@mason-bially
Contributor

So I was fixing #14331 and improving code coverage for printf.jl and I have some thoughts.

First, the use of DIGITS seems problematic, especially with threading landing soon. It might be nice to have a threadlocal macro (and/or dynamic vars which would also fix the problem) available sooner rather than later.

I'm also not at all sure what the optimization strategy (or, ahem, the documentation) for the printf.jl code is. I think a fixed buffer size, computed from the type information that should be available at compile time in any stable code, would be quite helpful. My thoughts for improving speed in this function actually mirror @StefanKarpinski's.

But I also don't think printf is worthy of inclusion in Julia Base; I think it would fit better in a core package like JuliaLang/Formatting.jl, especially as part of #5155. I see the primary purpose of printf as a compatibility tool (that can potentially run quite fast). Instead, Base should include a simple, powerful, extensible system (making full use of Julia's strengths) as the default formatting system; I agree with some of the thoughts presented here. People are then free to choose the formatting system they prefer among printf, fmt, markdown and whatever else is out there, depending on their project (porting C code vs. porting Python code vs. generating documents) and personal preferences.

I like Julia's use of string interpolation; I think expanding that with some sort of system for describing how to stringify the values (say, mixed with changes to show?) would solve the problem.

@musm
Contributor

musm commented Nov 14, 2016

Is it planned to get rid of the printf/sprintf macros in 0.6? That is, what milestone is this slated for?

@StefanKarpinski
Sponsor Member

If someone does it, yes. I may if I have time, but it's looking unlikely given that I'm working on Pkg3.

@StefanKarpinski StefanKarpinski added the status:help wanted Indicates that a maintainer wants help on an issue or pull request label Nov 14, 2016
@JeffBezanson JeffBezanson added domain:display and printing Aesthetics and correctness of printed representations of objects. and removed domain:io Involving the I/O subsystem: libuv, read, write, etc. labels Jan 3, 2017
@quinnj
Member

quinnj commented Sep 8, 2020

It turns out I implemented something very similar to @simonbyrne's original proposal in this issue in #32859. But instead of making the format objects parametric on the specifiers and all the modifiers, they're only parametric on the specifier characters. I think that's a good compromise with @JeffBezanson's original concern here, since it means we can effectively precompile each format specifier's code once, and it's a fixed set of specifiers, so there's no real fear of continuous compilation when tweaking different modifiers.
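As an illustration, the `Printf.Format` API in current Julia (1.6 and later) can be used along these lines. Because a format object is built from an ordinary string, the repeated-field example from the top of this issue becomes plain string concatenation at run time. The column count, widths and values below are made up for the example.

```julia
using Printf

# A format object is built from an ordinary string, so it can be assembled at run time.
ncols = 3
rowfmt = Printf.Format("%5s - " * " %7.4f"^ncols)

# Reuse the same format object for every row.
Printf.format(rowfmt, "row", 1.0, 2.0, 3.0)       # "  row -   1.0000  2.0000  3.0000"

# Changing width or precision only means building another format object.
Printf.format(Printf.Format("%10.2f"), 3.14159)   # "      3.14"
```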

@quinnj quinnj closed this as completed Sep 8, 2020