
would it be possible to reduce the dependencies of core_bench? #18

Open
gasche opened this issue Jul 11, 2020 · 10 comments
Labels
forwarded-to-js-devs This report has been forwarded to Jane Street's internal review system.

Comments


gasche commented Jul 11, 2020

I would like to use the faster-map benchmark to test ongoing work on tail-recursion-modulo-cons (ocaml/ocaml#181); this means creating an empty switch from an experimental compiler, then building faster-map's dependencies.

The dependencies are: dune, batteries, containers, core_bench. Building them from an empty switch is fairly slow on my machine, and most of the time is spent compiling the dependencies of core_bench:

$ time opam install dune
real	0m23.240s
user	0m37.759s
sys	0m10.028s

$ time opam install batteries  # deps: ocamlbuild, ocamlfind, num
real	0m34.083s
user	0m28.428s
sys	0m9.122s

$ time opam install containers # deps: dune-private-libs, dune-configurator
real	0m49.980s
user	0m32.759s
sys	0m37.364s

$ time opam install core_bench # deps: core, ppx_jane, many others
real	6m48.733s
user	11m16.617s
sys	4m7.479s

Would there be interest among the core_bench maintainers in making it more pleasant to use for compiler-optimization microbenchmarking? It may not be that much work to reduce the dependency surface. (I understand that you need access to fine-grained timers, but it is not clear, for example, why core_bench depends on ppx_jane and many ppx extensions.)

(cc @ksromanov who kindly updated faster-map for TRMC testing.)

gasche changed the title from "would be be possible to reduce the dependencies of core_bench?" to "would it be possible to reduce the dependencies of core_bench?" on Jul 11, 2020
github-iron added the forwarded-to-js-devs label on Jul 14, 2020

yminsky commented Jul 14, 2020

It's possible, for sure, but it's a real amount of work. Core_bench depends on Core, as well as on Command, and it's tricky to get all of the command-line tool's functionality, including the autocompletion support, without depending on Core, or at least Core_kernel. Also, a lot of the libraries it uses are packaged together: e.g., ansi_kernel, which is in the dependency chain for creating pretty tables full of results, is packaged together with core_kernel, which means that even if we move its dependency from core_kernel to base, requesting it in opam will still build core_kernel for you. I think a story for doing this would probably require:

  • Factoring out a kernel library that doesn't include command-line support
  • Moving a bunch of dependencies to core_kernel, and maybe base if we can swing it
  • Moving a bunch of packages around so that they're not in core_kernel

We've left a lot of packages in core_kernel that aren't strictly necessary there because we didn't want to overwhelm opam with lots of tiny packages. There are around 35 little libraries in core_kernel's opam package that aren't part of core_kernel proper. The downside is that building core_kernel is a lot slower than it would otherwise need to be. Maybe this is something that we could fix more easily?

By the way, I suspect you'll find that most of the extra time is the building of Core and Core_kernel, and not the PPXs.

How urgent is a fix here? The earliest time that I think you'd get a useful version of the fix is at our next stable release anyway. Am I correct in assuming you're not using the bleeding edge version?


gasche commented Jul 14, 2020

Thanks for the quick reply!

There is nothing urgent. It's more about making something that is a bit inconvenient right now nicer to use. Maybe it would be worth asking your compiler-hacking people whether they would also benefit from such a change. (It may be that other people felt the need but did not bother reporting it.)

Having looked briefly at the Core_bench sources recently, I realize that there are a lot of features in there that require external-library support (for example, export to various formats, etc.). Going along with your reply: maybe rather than trying to slim down core_bench with its current feature set, it would be easier to carve out the core logic we typically use for barebones microbenchmarking (precise timing, which is already a separate dependency; benchmark estimation; running benchmarks; and minimal reporting) into a separate package (base_bench? :-). It would offer only a more restricted feature set, on top of which core_bench could come with all the bells and whistles.
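
To make this concrete, something in the following spirit would already cover the barebones use case. This is only a sketch (the benchmark name, batch sizes, and the best-of-batches estimate below are made up for illustration; it is not core_bench's regression-based estimator), using nothing beyond the standard library and the unix library:

let time_ns f =
  (* wall-clock time of [f ()], in nanoseconds *)
  let t0 = Unix.gettimeofday () in
  f ();
  let t1 = Unix.gettimeofday () in
  (t1 -. t0) *. 1e9

let bench ~name ~batches f =
  (* Run [f] in batches of increasing size and keep the best ns/run estimate;
     this crudely filters out scheduler and GC noise. *)
  let best = ref infinity in
  List.iter
    (fun n ->
       let elapsed = time_ns (fun () -> for _ = 1 to n do f () done) in
       best := min !best (elapsed /. float_of_int n))
    batches;
  Printf.printf "%-24s %10.2f ns/run\n" name !best

let () =
  let l = List.init 100 (fun i -> i) in
  bench ~name:"stdlib List.map" ~batches:[1_000; 10_000; 100_000]
    (fun () -> ignore (List.map succ l))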


yminsky commented Jul 14, 2020

Yeah, that's essentially what I meant when I talked about breaking out a "kernel" library. I guess an open question is what features would actually be required to make such a library useful. Notably, how much of the pretty tabular reporting would one be willing to jettison?


gasche commented Jul 14, 2020

Here is a pretty table produced by core_bench:

  Name                     Time/Run     mWd/Run   mjWd/Run   Prom/Run   Percentage  
 ---------------------- ------------ ----------- ---------- ---------- ------------ 
  stdlib                 1_350.65ns     536.00w      1.62w      1.62w       82.06%  
  naive-tail-recursive   1_641.85ns   1_072.00w      3.80w      3.80w       99.75%  
  trmc                   1_305.17ns     536.00w      2.17w      2.17w       79.30%  

I can reproduce this output exactly with the following code:

type justification = Left | Right 

type table = {
  arity: int;
  justif: justification array;
  header: string array;
  rows: string array list;
}

let max_width ~column table =
  List.fold_left
    (fun w r -> max w (String.length r.(column)))
    (String.length table.header.(column)) table.rows

let to_pretty_string table =
  let widths = Array.init table.arity (fun i -> max_width ~column:i table) in
  (* each column is printed with a space on both sides,
     so the apparent width is widths.(i) + 2 *)
  let buf = Buffer.create 42 in
  let printf fmt = Printf.bprintf buf fmt in
  let print_text ~column:i s =
    printf
      (match table.justif.(i) with
       | Right -> " %.*s "
       | Left -> " %-.*s ")
      widths.(i) s in
  let print_row row =
    row |> Array.iteri (fun i s ->
      printf " ";
      print_text ~column:i s);
    printf " \n";
  in
  let print_header_line () =
    for i = 0 to table.arity - 1 do
      printf " %s" (String.make (1 + widths.(i) + 1) '-');
    done;
    printf " \n";
  in
  print_row table.header;
  print_header_line ();
  List.iter print_row table.rows;
  Buffer.contents buf
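
For reference, the table above corresponds to the following value (the strings are copied by hand from the core_bench output shown earlier), printed with to_pretty_string:

let example =
  {
    arity = 6;
    justif = [| Left; Right; Right; Right; Right; Right |];
    header = [| "Name"; "Time/Run"; "mWd/Run"; "mjWd/Run"; "Prom/Run"; "Percentage" |];
    rows = [
      [| "stdlib"; "1_350.65ns"; "536.00w"; "1.62w"; "1.62w"; "82.06%" |];
      [| "naive-tail-recursive"; "1_641.85ns"; "1_072.00w"; "3.80w"; "3.80w"; "99.75%" |];
      [| "trmc"; "1_305.17ns"; "536.00w"; "2.17w"; "2.17w"; "79.30%" |];
    ];
  }

let () = print_string (to_pretty_string example)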


ghost commented Jul 14, 2020

@gasche I expect that you could re-implement the whole library and provide the same set of features with a fraction of the overall amount of code required to build core_bench. In fact, it is the same story as when you compare dune and jenga: jenga is big (29MB) and takes a long time to build (2m), while dune is much smaller (9MB) and faster to compile (8s). Yet they do pretty much the same thing. However, keeping dune lean requires much more work and sometimes duplication of effort, and we can't realistically apply the same policy to all our packages. We only do it for really central packages such as dune or ppxlib.

For the rest, it's not that we are not interested in fast compilation times or small binaries; in fact we are putting a lot of effort into both. However, we are mostly interested in solutions we can apply systematically. Duplicating functionality already provided by another library feels a bit too ad hoc, even if it is (initially) not that much code. It would strictly increase the amount of code we have to maintain, which doesn't feel right.

The right way to go about all this is to first understand where the time is being spent. 6m to build core_bench feels pretty high, even considering the number of dependencies. We have been working on assembling a buildable repository that contains all our code:
https://github.com/janestreet/universe

It's not yet quite ready, but soon you'll be able to go there and run "dune build core_bench/core_bench.install", which will build the minimum amount of code needed for the core_bench package. Comparing the timing of this command with "opam install core_bench" will give us a good idea of how fast things could be in an ideal world, and then we can work towards that.


ghost commented Jul 14, 2020

Actually, I just gave it a try. You just have to clone ppxlib and it works:

$ git clone https://github.com/janestreet/universe
$ cd universe
$ opam source ppxlib.0.13.0
$ time dune build core_bench/core_bench.install

The last command takes just a bit less than 1 minute on the machine I'm using. By comparison, "opam install core_bench" with everything pre-downloaded takes 2m42s. So we could win a factor of almost 3 simply by making the tooling more clever.


ghost commented Jul 15, 2020

Another data point, I did the following:

$ cat >dune <<EOF
(alias
 (name default)
 (deps (package core_bench)))
EOF
$ time dune build --profile release

I'll skip the details, but essentially it builds exactly the same thing as "opam install core_bench", including all the unnecessary code in core/core_kernel. The last command takes 1m15s for me, compared to the 2m42s with opam.

So simply by changing the way opam builds and installs things, we could cut installation times by more than a factor of 2.

@avsm mentioned the idea that rather than building and installing packages one by one, opam could build them all at once with a single dune invocation. Such a change would be a big improvement here.

@ceastlund

I have updated core_bench in our internal repo to drop most of the ppx dependencies, except for the few it actually uses. The public release process should push out the change before long.

I didn't clean up any other dependencies - I think the biggest source left is core. Currently all our time representations come from core, so until we separate those I think we're not going to do much better.


gasche commented Sep 3, 2021

Thanks!


aalekseyev commented Oct 25, 2021

The change @ceastlund mentions landed in f319e14. I'm not sure whether this "closes" the issue, as there seem to be a lot of ideas still being discussed.
