Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Categorical Performs > x10 worse than Matlab (Octave) #25

Open
idontgetoutmuch opened this issue Aug 23, 2014 · 2 comments
Open

Categorical Performs > x10 worse than Matlab (Octave) #25

idontgetoutmuch opened this issue Aug 23, 2014 · 2 comments

Comments

@idontgetoutmuch
Copy link
Member

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import Data.Random.Source.PureMT
import Data.Random
import Data.Random.Distribution.Categorical

import Control.Monad.State

testCategorical :: [Double] -> Int -> RVar [Int]
testCategorical ws n = replicateM (fromIntegral n) $
                       categorical (zip ws [0..(fromIntegral (length ws - 1))])

n :: Int
n = 1000000

main :: IO ()
main =
  print $
    sum $
    evalState (sample (testCategorical (replicate n (recip (fromIntegral n))) (fromIntegral n)))
              (pureMT 2)
~/Dropbox/Private/Multinomial $ ./Multinomial +RTS -s
500118181962
   2,145,547,624 bytes allocated in the heap
     625,841,120 bytes copied during GC
     103,975,464 bytes maximum residency (11 sample(s))
       3,574,920 bytes maximum slop
             259 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4082 colls,     0 par    0.44s    0.45s     0.0001s    0.0064s
  Gen  1        11 colls,     0 par    0.26s    0.37s     0.0334s    0.0879s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.07s  (  2.07s elapsed)
  GC      time    0.70s  (  0.81s elapsed)
  EXIT    time    0.00s  (  0.02s elapsed)
  Total   time    2.78s  (  2.90s elapsed)

  %GC     time      25.3%  (28.1% elapsed)

  Alloc rate    1,035,723,896 bytes per MUT second

  Productivity  74.7% of total user, 71.5% of total elapsed

The same in Octave

num_particles = 1000000;
likelihood = zeros(num_particles,1);
likelihood(:,1) = 1/num_particles;

tic ();
[ns,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);

s = sum(index);
toc ()
octave> num_particles = 1000000;
octave> likelihood = zeros(num_particles,1);
octave> likelihood(:,1) = 1/num_particles;
octave> 
octave> tic ();
octave> [ns,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);
octave> 
octave> s = sum(index);
octave> toc ()
Elapsed time is 0.247516 seconds.
@idontgetoutmuch
Copy link
Member Author

So this might not be a problem just with Categorical.

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import Data.Random.Source.PureMT
import Data.Random
import Data.Random.Distribution.Categorical
import Data.Random.Distribution.T

import Control.Monad.State

testUniform :: MonadRandom m => Int -> m [Double]
testUniform n = replicateM (fromIntegral n) (sample stdUniform)

n :: Int
n = 10^7

us :: [Double]
us = evalState (testUniform n) (pureMT $ fromIntegral arbSeed)

mean :: Double
mean = sum us / fromIntegral n

main :: IO ()
main = do
  print mean
~/Dropbox/Private/Multinomial $ ./Multinomial +RTS -s
0.4999607889729769
   5,122,925,704 bytes allocated in the heap
       1,213,648 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10015 colls,     0 par    0.03s    0.03s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.15s  (  1.16s elapsed)
  GC      time    0.03s  (  0.03s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.18s  (  1.20s elapsed)

  %GC     time       2.3%  (2.8% elapsed)

  Alloc rate    4,452,445,320 bytes per MUT second

  Productivity  97.7% of total user, 96.1% of total elapsed

Here's the matlab equivalent.

num_particles = 10^7;

tic();
s = sum(rand(num_particles,1));
toc()
octave> num_particles = 10^7;
octave> tic();
octave> s = sum(rand(num_particles,1));
octave> toc()
Elapsed time is 0.149504 seconds.

According to http://octave.sourceforge.net/octave/function/rand.html, octave uses the Mersenne Twister with a period of 2^19937-1. I thought that was what random-fu also used. So my next step will be to look at the random number generator and see if that is causing the 10x slowdown.

@idontgetoutmuch
Copy link
Member Author

I've spent a bit of time investigating mwc-random see: haskell/mwc-random#35 and I can generate uniform samples about x10 faster but with what appears to be a space leak.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant