Skip to content

Speed Comparisons

okamsn edited this page Aug 21, 2021 · 5 revisions

NOTE: As Loopy develops, these examples and results might become out of date. For the most up-to-date information, refer to this package’s documentation, which is installed with the package as the Info file loopy.

Additionally, this information can/should probably be expanded to compare to several other approaches.


This file contains benchmarks comparing cl-loop, loopy, some of the built-in C functions like mapcar and mapcan, some of CL Lib’s functions like cl-mapcar and cl-mapcan, and some of Dash’s functions like -map and -mapcat.

In general, for simple cases, they can be sorted from fastest to slowest as

  1. C functions
  2. Dash’s functions (usually)
  3. loopy and cl-loop.

The C functions are fastest because they are in C, and Dash’s functions are faster because they try to be pure and side-effect free (which allows for better optimizations). loopy tries to follow the same logic as cl-loop, so they are usually about the same speed.

loopy and cl-loop work best for more procedural (as opposed to functional) approaches, but come with some overhead for being good at that case. If you’re working with simple functions, it might be best to use the built-in functions or Dash. If working with complex functions or special forms like dolist or dotimes, your code might be easier to write with loopy or cl-loop.

Table of Contents

Benchmark Process

loopy tries to follow the same logic as cl-loop, but in being more flexible in some cases, it cannot always implement concepts as efficiently as cl-loop. However, for general cases, they should be more or less the same.

These comparisons are made using macros listed in user Alphapapa’s useful Emacs Package Dev Handbook. For completeness, they are copied below.

(require 'cl-lib)

;;;###autoload
(cl-defmacro bench (&optional (times 100000) &rest body)
  "Call `benchmark-run-compiled' on BODY with TIMES iterations, returning list suitable for Org source block evaluation.
Garbage is collected before calling `benchmark-run-compiled' to
avoid counting existing garbage which needs collection."
  (declare (indent defun))
  `(progn
     (garbage-collect)
     (list '("Total runtime" "# of GCs" "Total GC runtime")
           'hline
           (benchmark-run-compiled ,times
             (progn
               ,@body)))))

;; TODO: Consider not using `-on' here to avoid the extra dependency.
(require 'dash-functional)

;;;###autoload
(cl-defmacro bench-multi (&key (times 1) forms ensure-equal raw)
  "Return Org table as a list with benchmark results for FORMS.
Runs FORMS with `benchmark-run-compiled' for TIMES iterations.

When ENSURE-EQUAL is non-nil, the results of FORMS are compared,
and an error is raised if they aren't `equal'. If the results are
sequences, the difference between them is shown with
`seq-difference'.

When RAW is non-nil, the raw results from
`benchmark-run-compiled' are returned instead of an Org table
list.

If the first element of a form is a string, it's used as the
form's description in the bench-multi-results; otherwise, forms
are numbered from 0.

Before each form is run, `garbage-collect' is called."
  ;; MAYBE: Since `bench-multi-lexical' byte-compiles the file, I'm not sure if
  ;; `benchmark-run-compiled' is necessary over `benchmark-run', or if it matters.
  (declare (indent defun))
  (let*((keys (gensym "keys"))
        (result-times (gensym "result-times"))
        (header '(("Form" "x faster than next" "Total runtime" "# of GCs" "Total GC runtime")
                  hline))
        ;; Copy forms so that a subsequent call of the macro will get the original forms.
        (forms (cl-copy-list forms))
        (descriptions (cl-loop for form in forms
                               for i from 0
                               collect (if (stringp (car form))
                                           (prog1 (car form)
                                             (setf (nth i forms) (cadr (nth i forms))))
                                         i))))
    `(unwind-protect
         (progn
           (defvar bench-multi-results nil)
           (let* ((bench-multi-results (make-hash-table))
                  (,result-times (sort (list ,@(cl-loop for form in forms
                                                        for i from 0
                                                        for description = (nth i descriptions)
                                                        collect `(progn
                                                                   (garbage-collect)
                                                                   (cons ,description
                                                                         (benchmark-run-compiled ,times
                                                                           ,(if ensure-equal
                                                                                `(puthash ,description ,form bench-multi-results)
                                                                              form))))))
                                       (lambda (a b)
                                         (< (cl-second a) (cl-second b))))))
             ,(when ensure-equal
                `(cl-loop with ,keys = (hash-table-keys bench-multi-results)
                          for i from 0 to (- (length ,keys) 2)
                          unless (equal (gethash (nth i ,keys) bench-multi-results)
                                        (gethash (nth (1+ i) ,keys) bench-multi-results))
                          do (if (sequencep (gethash (car (hash-table-keys bench-multi-results)) bench-multi-results))
                                 (let* ((k1) (k2)
                                        ;; If the difference in one order is nil, try in other order.
                                        (difference (or (setq k1 (nth i ,keys)
                                                              k2 (nth (1+ i) ,keys)
                                                              difference (seq-difference (gethash k1 bench-multi-results)
                                                                                         (gethash k2 bench-multi-results)))
                                                        (setq k1 (nth (1+ i) ,keys)
                                                              k2 (nth i ,keys)
                                                              difference (seq-difference (gethash k1 bench-multi-results)
                                                                                         (gethash k2 bench-multi-results))))))
                                   (user-error "Forms' bench-multi-results not equal: difference (%s - %s): %S"
                                               k1 k2 difference))
                               ;; Not a sequence
                               (user-error "Forms' bench-multi-results not equal: %s:%S %s:%S"
                                           (nth i ,keys) (nth (1+ i) ,keys)
                                           (gethash (nth i ,keys) bench-multi-results)
                                           (gethash (nth (1+ i) ,keys) bench-multi-results)))))
             ;; Add factors to times and return table
             (if ,raw
                 ,result-times
               (append ',header
                       (bench-multi-process-results ,result-times)))))
       (unintern 'bench-multi-results nil))))

(defun bench-multi-process-results (results)
  "Return sorted RESULTS with factors added."
  (setq results (sort results (-on #'< #'cl-second)))
  (cl-loop with length = (length results)
           for i from 0 to (1- length)
           for description = (car (nth i results))
           for factor = (if (< i (1- length))
                            (format "%.2f" (/ (cl-second (nth (1+ i) results))
                                              (cl-second (nth i results))))
                          "slowest")
           collect (append (list description factor)
                           (list (format "%.6f" (cl-second (nth i results)))
                                 (cl-third (nth i results))
                                 (if (> (cl-fourth (nth i results)) 0)
                                     (format "%.6f" (cl-fourth (nth i results)))
                                   0)))))

Benchmark Results

For Clauses

The following clauses don’t have exact equivalents in Loopy, but most can be mimicked by using Emacs’s built-in functions and the list loop command. See the section For Clauses under Translating … for the relevant function.

  • for VAR from EXPR1 to EXPR2 by EXPR3
  • for VAR being the symbols
  • for VAR being the hash-keys
  • for VAR being the hash-values
  • for VAR being the key-codes
  • for VAR being the key-bindings
  • for VAR being the key-bindings
  • for VAR being the key-seqs
  • for VAR being the overlays
  • for VAR being the intervals
  • for VAR being the frames
  • for VAR being the windows
  • for VAR being the buffers
  • for VAR in LIST by FUNCTION:
    (let ((l (number-sequence 1 100)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i in l))
                ("loopy" (loopy ((list i l))))
                ;; These shouldn’t be different.
                ("cl-loop with function" (cl-loop for i in l by #'cdr))
                ("loopy with function" (loopy ((list i l #'cdr)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    cl-loop 1.04 0.027221 0 0
    loopy 1.00 0.028386 0 0
    cl-loop with function 1.00 0.028422 0 0
    loopy with function slowest 0.028553 0 0
  • for VAR on LIST by FUNCTION:
    (let ((l (number-sequence 1 100)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i on l))
                ("loopy" (loopy ((cons i l))))
                ;; These shouldn’t be different.
                ("cl-loop with function" (cl-loop for i on l by #'cdr))
                ("loopy with function" (loopy ((cons i l #'cdr)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    loopy 1.02 0.017559 0 0
    loopy with function 1.00 0.017893 0 0
    cl-loop with function 1.01 0.017907 0 0
    cl-loop slowest 0.018028 0 0
  • for VAR in-ref LIST by FUNCTION:
    (let ((l (number-sequence 1 100)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i in-ref l))
                ("loopy" (loopy ((list-ref i l))))
                ;; These shouldn’t be different.
                ("cl-loop with function" (cl-loop for i in-ref l by #'cdr))
                ("loopy with function" (loopy ((list-ref i l #'cdr)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    cl-loop 1.00 0.017619 0 0
    loopy 1.04 0.017678 0 0
    loopy with function 1.01 0.018400 0 0
    cl-loop with function slowest 0.018612 0 0
  • for VAR across ARRAY:
    (let ((a (make-vector 1000 1)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i across a))
                ("loopy" (loopy ((array i a)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    loopy 1.08 0.500913 0 0
    cl-loop slowest 0.541114 0 0
  • for VAR across ARRAY:
    (let ((a (make-vector 1000 1)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i across-ref a))
                ("loopy" (loopy ((array-ref i a)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    loopy 1.16 0.342828 0 0
    cl-loop slowest 0.397609 0 0
  • for VAR being the elements of SEQUENCE:
    (let ((a (make-vector 1000 1))
          (l (make-list 1000 1)))
      (bench-multi
        :times 10000
        :forms (("cl-loop array" (cl-loop for i being the elements of a))
                ("loopy array" (loopy ((seq i a))))
                ("cl-loop list" (cl-loop for i being the elements of l))
                ("loopy list" (loopy ((seq i l)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    cl-loop list 1.01 0.667158 0 0
    loopy list 1.25 0.674926 0 0
    loopy array 1.01 0.845188 0 0
    cl-loop array slowest 0.852027 0 0
  • for VAR being the elements of-ref SEQUENCE:
    (let ((a (make-vector 1000 1))
          (l (make-list 1000 1)))
      (bench-multi
        :times 10000
        :forms (("cl-loop array" (cl-loop for i being the elements of-ref a))
                ("loopy array" (loopy ((seq-ref i a))))
                ("cl-loop list" (cl-loop for i being the elements of-ref l))
                ("loopy list" (loopy ((seq-ref i l)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    loopy array 1.00 0.291586 0 0
    cl-loop array 1.04 0.292248 0 0
    cl-loop list 1.02 0.302581 0 0
    loopy list slowest 0.309345 0 0
  • for VAR = EXPR1 then EXPR2:
    (bench-multi
      :times 10000
      :forms (("cl-loop" (cl-loop for i = 0 then (1+ i)
                                  repeat 100))
              ("loopy" (loopy ((expr i 0 (1+ i))
                               (repeat 100))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    cl-loop 1.02 0.005220 0 0
    loopy slowest 0.005346 0 0

Iteration Clauses

The clauses always, never, thereis, and iter-by are currently not implemented in Loopy, but their usages can be matched with existing (though currently less convenient) loop commands.

  • repeat:
    (bench-multi
      :times 1000
      :forms (("cl-loop" (cl-loop repeat 100))   ; Uses `>=' and `1-'.
              ("loopy" (loopy ((repeat 100)))))) ; Uses `<' and `1+'.
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    cl-loop 1.03 0.002867 0 0
    loopy slowest 0.002954 0 0
  • while:

    while conditions in loopy are checked separately from the loops normal continuance conditions in an if form. In cl-loop, they in the same code section that sets variables in an and form. This is probably the cause of the speed difference.

    (bench-multi
      :times 10000
      :forms (("cl-loop v1" (cl-loop while t repeat 100))
              ("loopy v1" (loopy ((while t) (repeat 100))))
              ;; Try opposite order.
              ("cl-loop v2" (cl-loop repeat 100 while t))
              ("loopy v2" (loopy ((repeat 100) (while t))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    cl-loop v2 1.05 0.028047 0 0
    cl-loop v1 1.11 0.029580 0 0
    loopy v2 1.03 0.032949 0 0
    loopy v1 slowest 0.034065 0 0
  • until:
    (bench-multi
      :times 10000
      :forms (("cl-loop v1" (cl-loop until nil repeat 100))
              ("loopy v1" (loopy ((until nil) (repeat 100))))
              ;; Try opposite order.
              ("cl-loop v2" (cl-loop repeat 100 until nil))
              ("loopy v2" (loopy ((repeat 100) (until nil))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    cl-loop v2 1.02 0.027976 0 0
    cl-loop v1 1.14 0.028576 0 0
    loopy v2 1.01 0.032519 0 0
    loopy v1 slowest 0.032985 0 0

Accumulation Clauses

cl-loop uses special considerations for append, collect, and nconc using combinations append, nconc, push, nreverse, and reverrse depending on how variables are being declared and used. loopy is currently much simpler, such as using push and nreverse for collect when no explicitly variable is given (as in (collect some-val)) and using append otherwise.

You should note that naming the variables into which you append, collect, or nconc can be much slower, since the macros can no longer use tricks with reverse and nreverse. If they did, you might try to access the accumulation variable before it was reversed.

  • collect:
    (let ((l (number-sequence 1 1000)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i in l
                                    collect (* (1+ i) 2)))
                ("loopy" (loopy ((list i l)
                                 (collect (* (1+ i) 2)))))
                ("mapcar" (mapcar (lambda (i) (* (1+ i) 2)) l))
                ("cl-mapcar" (cl-mapcar (lambda (i) (* (1+ i) 2)) l))
                ("dash" (-map (lambda (i) (* (1+ i) 2)) l))
                ("anaphoric dash" (--map (* (1+ it) 2) l))
                ("dolist" (let ((result))
                            (dolist (i l (nreverse result))
                              (push (* (1+ i) 2) result)))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    dolist 1.27 0.669873 0 0
    cl-loop 1.05 0.849984 0 0
    loopy 1.09 0.896564 0 0
    mapcar 1.02 0.974399 0 0
    cl-mapcar 1.01 0.989138 0 0
    anaphoric dash 1.43 0.995825 0 0
    dash slowest 1.425671 0 0
  • append and nconc:
    (let ((l (number-sequence 1 1000)))
      (bench-multi
        :times 10000
        :forms (("cl-loop append" (cl-loop for i in l
                                           append (list (* (1+ i) 2)
                                                        (* (1+ i) 3))))
                ("cl-loop nconc" (cl-loop for i in l
                                          nconc (list (* (1+ i) 2)
                                                      (* (1+ i) 3))))
                ("loopy append" (loopy ((list i l)
                                        (append (list (* (1+ i) 2)
                                                      (* (1+ i) 3))))))
                ("loopy nconc" (loopy ((list i l)
                                       (nconc (list (* (1+ i) 2)
                                                    (* (1+ i) 3))))))
                ("mapcan" (mapcan (lambda (i) (list (* (1+ i) 2)
                                                    (* (1+ i) 3)))
                                  l)) ; C function.
                ("cl-mapcan" (cl-mapcan (lambda (i) (list (* (1+ i) 2)
                                                          (* (1+ i) 3)))
                                        l))
                ("dash" (-mapcat (lambda (i) (list (* (1+ i) 2)
                                                   (* (1+ i) 3)))
                                 l))
                ("anaphoric dash" (--mapcat (list (list (* (1+ it) 2)
                                                        (* (1+ it) 3)))
                                            l)))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    mapcan 1.01 1.554805 0 0
    cl-mapcan 1.10 1.576170 0 0
    cl-loop nconc 1.01 1.733220 0 0
    loopy nconc 1.12 1.758132 0 0
    anaphoric dash 1.02 1.966069 0 0
    cl-loop append 1.01 2.014755 0 0
    loopy append 2.58 2.034501 0 0
    dash slowest 5.250537 1 0.432886
  • concat:

    This difference might be because cl-loop uses cl-callf, while loopy currently uses concat and setq in all cases.

    seq-mapcat and mapconcat might be much faster because they only call concat once after mapping across all values. On the other hand, loopy and cl-loop currently call concat for each iteration. On the other hand, cl-loop calls concat for each iteration, as does loopy if the accumulation variable is explicitly named.

    (let ((l (make-list 100 (make-string 100 ?a))))
      (bench-multi
        :times 1000
        :forms (("cl-loop" (cl-loop for i in l concat (capitalize i)))
                ;; Here, `loopy' calls `concat' once:
                ("loopy implicit" (loopy (list i l) (concat (capitalize i))))
                ;; Here, `loopy', calls `concat' each iteration:
                ("loopy explicit" (loopy (list i l) (concat my-str (capitalize i))))
                ("mapconcat" (mapconcat #'capitalize l "")) ; C function.
                ("concat" (apply #'concat (mapcar #'capitalize l)))
                ;; `cl-concatenate' is alias of `seq-concatenate'.
                ("cl-concatenate" (apply #'cl-concatenate 'string
                                         (mapcar #'capitalize l)))
                ("seq-macpcat" (seq-mapcat #'capitalize l 'string)))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    concat 1.01 0.235516 0 0
    cl-concatenate 1.01 0.237146 0 0
    seq-macpcat 1.00 0.239688 0 0
    mapconcat 1.01 0.240340 0 0
    loopy implicit 3.28 0.243335 0 0
    loopy explicit 1.01 0.797231 0 0
    cl-loop slowest 0.803007 0 0
  • vconcat:

    As with concat above, cl-loop calls concat for each iteration, while other functions create a list of sequences and then call concat once.

    (let* ((l (make-list 100 (number-sequence 1 1000))))
      (bench-multi
        :times 1000
        :forms (("cl-loop" (cl-loop for i in l
                                    vconcat (cdr i)))
                ("loopy implicit" (loopy (list i l) (vconcat (cdr i))))
                ("loopy explicit" (loopy (list i l) (vconcat my-vect (cdr i))))
                ("vconcat" (apply #'vconcat (mapcar #'cdr l)))
                ("cl-concatenate" (apply #'cl-concatenate 'vector
                                         (mapcar #'cdr l)))
                ;; This concatenates once after the `cdr' of each list is gotten.
                ("seq-macpcat" (seq-mapcat #'cdr l 'vector)))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    vconcat 1.00 0.425114 0 0
    seq-macpcat 1.00 0.425321 0 0
    cl-concatenate 1.00 0.426510 0 0
    loopy implicit 61.55 0.427345 0 0
    loopy explicit 1.00 26.303019 37 4.140312
    cl-loop slowest 26.337261 37 4.138096
  • count:
    (let* ((l (mapcar #'cl-evenp (number-sequence 1 1000))))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i in l
                                    count i))
                ("loopy" (loopy ((list i l)
                                 (count i))))
                ("cl-count" (cl-count t l))
                ("cl-count-if" (cl-count-if #'identity l))
                ("seq-count" (seq-count #'identity l))
                ("-count" (-count #'identity l)))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    -count 1.55 0.238401 0 0
    cl-loop 1.03 0.369262 0 0
    loopy 1.03 0.378889 0 0
    cl-count 1.65 0.389203 0 0
    cl-count-if 1.19 0.641331 0 0
    seq-count slowest 0.762337 0 0
  • sum:
    (let* ((l (number-sequence 1 1000)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i in l
                                    sum (1+ i)))
                ("loopy" (loopy ((list i l)
                                 (sum (1+ i)))))
                ("mapcar and apply" (apply #'+ (mapcar #'1+ l)))
                ("cl-reduce and apply" (cl-reduce #'+ (mapcar #'1+ l)))
                ("-reduce and apply" (-reduce #'+ (mapcar #'1+ l))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    mapcar and apply 1.31 0.432464 0 0
    loopy 1.05 0.565446 0 0
    cl-loop 1.18 0.591978 0 0
    -reduce and apply 1.65 0.699388 0 0
    cl-reduce and apply slowest 1.153148 0 0
  • maximize:
    (let* ((l (number-sequence 1 1000)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i in l
                                    maximize (1- i)))
                ("loopy" (loopy ((list i l)
                                 (max (1- i)))))
                ("mapcar and apply" (apply #'max (mapcar #'1- l)))
                ("mapcar and cl-reduce" (cl-reduce #'max (mapcar #'1- l)))
                ("mapcar and -reduce" (-reduce #'max (mapcar #'1- l))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    mapcar and apply 1.17 0.505320 0 0
    loopy 1.29 0.591463 0 0
    mapcar and -reduce 1.35 0.761307 0 0
    cl-loop 1.20 1.028224 0 0
    mapcar and cl-reduce slowest 1.233562 0 0
  • minimize:
    (let* ((l (number-sequence 1 1000)))
      (bench-multi
        :times 10000
        :forms (("cl-loop" (cl-loop for i in l
                                    minimize (1- i)))
                ("loopy" (loopy ((list i l)
                                 (min (1- i)))))
                ("mapcar and apply" (apply #'min (mapcar #'1- l)))
                ("mapcar and cl-reduce" (cl-reduce #'min (mapcar #'1- l)))
                ("mapcar and -reduce" (-reduce #'min (mapcar #'1- l))))))
        
    Form x faster than next Total runtime # of GCs Total GC runtime
    mapcar and apply 1.21 0.491284 0 0
    loopy 1.24 0.595057 0 0
    mapcar and -reduce 1.43 0.737828 0 0
    cl-loop 1.13 1.051751 0 0
    mapcar and cl-reduce slowest 1.193562 0 0

Destructuring

(let ((alist (cl-mapcar #'cons
                        (number-sequence 0 1000)
                        (number-sequence 2000 3000))))
  (bench-multi
    :times 10000
    :forms (("cl-loop" (cl-loop for (i . j) in alist
                                collect (+ i j)))
            ("loopy" (loopy ((list (i . j) alist)
                             (collect (+ i j)))))
            ("pcase-lambda" (mapcar (pcase-lambda (`(,i . ,j)) (+ i j)) alist))
            ("-lambda" (mapcar (-lambda ((i . j)) (+ i j)) alist)))))
Form x faster than next Total runtime # of GCs Total GC runtime
cl-loop 1.12 0.790794 0 0
loopy 2.08 0.882744 0 0
-lambda 1.13 1.835495 0 0
pcase-lambda slowest 2.081928 0 0
Clone this wiki locally