Known Python Issues
For floating point numbers and a given increment, numpy's np.arange
is dangerous / numerically unstable (further reading, or #395). Use np.linspace
instead.
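A quick illustration (a sketch; the exact values can vary slightly with platform and numpy version): with a float step, the length of an np.arange result depends on rounding of the accumulated step, while np.linspace fixes the number of samples explicitly.
import numpy as np

# With a float step the end point may or may not be included,
# depending on accumulated rounding errors:
a = np.arange(0.0, 1.0, 0.1)
print len(a), a[-1]   # usually 10 samples, last value only close to 0.9

# linspace fixes the number of samples and the end point explicitly:
b = np.linspace(0.0, 0.9, 10)
print len(b), b[-1]   # always 10 samples, last value exactly 0.9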
Allocation of numpy arrays is much faster using the numpy.empty
function instead of numpy.zeros.
By far the worst method is to use the numpy.array
function on a list object: for a list the data type must be stored for each element, in contrast to numpy arrays, where the data type is stored only once for the whole array. The following ipython
lines show some performance tests; the differences in speed are on the order of 100.
In [1]: import numpy as np
In [2]: N = int(1e7)
In [3]: %timeit np.empty(N)
100000 loops, best of 3: 12.2 µs per loop
In [4]: %timeit np.zeros(N)
10 loops, best of 3: 70.4 ms per loop
In [5]: %timeit x = np.empty(N); x[:] = np.NaN
10 loops, best of 3: 86.1 ms per loop
In [6]: %timeit x = np.array([np.NaN]*N)
10 loops, best of 3: 1.57 s per loop
numpy is designed for vector operations, so for each expression it needs to check whether the argument is a vector or not. This consumes time and is one of the reasons why the math library is faster for scalar (non-vector) arguments.
In [1]: import math as M
In [2]: import numpy as np
In [3]: %timeit M.floor(3.5)
1000000 loops, best of 3: 396 ns per loop
In [4]: %timeit np.floor(3.5)
100000 loops, best of 3: 3.72 µs per loop
Note: the math library cannot handle vectors.
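A quick check (the exact exception message may differ between Python and numpy versions):
import math
import numpy as np

# math functions expect scalars; passing an array raises a TypeError:
try:
    math.floor(np.array([1.5, 2.5]))
except TypeError as e:
    print e   # e.g. "only length-1 arrays can be converted to Python scalars"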
- NULL pointers in ctypes are generally of type None. However, sometimes C functions need addresses passed in directly. In such cases a typed NULL pointer, e.g. to an integer, must be passed with the construct ctypes.POINTER(ctypes.c_int)() (see the sketch after this list).
- For cross-platform compilation of C code the #ifdef statement can be really useful. To see all variables available for #ifdef, type in a shell
echo "" | gcc -E -dM -c -
In order to compile the C extensions on 64-bit platforms, add the compiler options -m64 -fPIC.
Data types can have different sizes on 64-bit platforms. Avoid using the type long
in Python; better use int32, int64,
or, for a platform-dependent size, int,
because long changes a lot between 32-bit and 64-bit platforms.
[Read more ...](http://www.unix.org/whitepapers/64bit.html)
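The size differences can be checked directly. A small sketch (the printed values assume a typical 64-bit Linux build; on 64-bit Windows c_long stays at 4 bytes):
import ctypes
import numpy as np

print ctypes.sizeof(ctypes.c_long)   # 8 on 64-bit Linux, 4 on 64-bit Windows
print np.dtype(np.int32).itemsize    # always 4
print np.dtype(np.int64).itemsize    # always 8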
Different operating systems deliver different output for the exponential format of floats. Here we ensure delivery in a SEED-valid format independent of the OS. For speed, we simply cut any number ending in E+0XX or E-0XX down to E+XX or E-XX. This fails for exponents XX > 99, but such values should not occur, since the SEED standard does not allow them either.
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45)
[MSC v.1310 32 bit (Intel)] on win32
>>> '%E' % 2.5
'2.500000E+000'
Python 2.5.2 (r252:60911, Apr 2 2008, 18:38:52)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
>>> '%E' % 2.5
'2.500000E+00'
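A minimal sketch of the described trimming (the helper name _trim_exponent is made up for illustration; the actual ObsPy code may differ):
def _trim_exponent(s):
    # Cut 'E+0XX' / 'E-0XX' down to 'E+XX' / 'E-XX'. Exponents above
    # 99 are not handled, as the SEED standard does not allow them.
    mantissa, _, exponent = s.partition('E')
    sign, digits = exponent[0], exponent[1:]
    if len(digits) == 3 and digits.startswith('0'):
        digits = digits[1:]
    return '%sE%s%s' % (mantissa, sign, digits)

print _trim_exponent('2.500000E+000')   # '2.500000E+00'
print _trim_exponent('2.500000E+00')    # already fine, unchanged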
Python’s handling of default parameter values is one of a few things that tends to trip up most new Python programmers (but usually only once).
What causes the confusion is the behaviour you get when you use a “mutable” object as a default value; that is, a value that can be modified in place, like a list or a dictionary.
An example:
>>> def function(data=[]):
... data.append(1)
... return data
...
>>> function()
[1]
>>> function()
[1, 1]
>>> function()
[1, 1, 1]
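The usual way around this is to use None as the default value and create a fresh list inside the function:
>>> def function(data=None):
...     if data is None:
...         data = []   # a new list for every call
...     data.append(1)
...     return data
...
>>> function()
[1]
>>> function()
[1]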
Microseconds cannot be calculated with the modulo operator, because modulo with a positive divisor always returns a non-negative result, even for negative input.
An example:
>>> -0.5 % 1
0.5
>>> -0.2 % 1
0.80000000000000004
A preferred way is to use the modf function:
>>> from math import modf
>>> sec = 1.5
>>> msec, dsec = modf(sec)
>>> msec *= 1000
>>> print dsec, msec
1.0 500.0
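Unlike the modulo operator, modf preserves the sign of its argument, so negative times keep working:
>>> modf(-1.5)
(-0.5, -1.0)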
The following example shows that the effect of casting (of the sampling interval 0.01) can be circumvented by dividing by a number that underwent the same cast.
>>> import numpy as np
>>> np.float32(0.01)
0.0099999998
>>> 1.0 / np.float32(0.01)
100.00000223517424
>>> np.float32(1.0) / np.float32(0.01)
100.0
Just to avoid problems with negative numbers: int() truncates toward zero, whereas math.floor() rounds toward negative infinity.
>>> import math as M
>>> M.floor(3.5)
3.0
>>> int(3.5)
3
>>> M.floor(-3.5)
-4.0
>>> int(-3.5)
-3
A known problem is locale settings that make the Python shell use a comma instead of a dot as the decimal separator. In this case the ctypes library could cause problems (Read more ...). As soon as this problem occurs with ObsPy, please let us know.
Using convenient indexing operations on numpy ndarrays can lead to problems when writing data via external functions with ctypes (C/Fortran). Consider the following example...
import numpy as np
from obspy.core import read, Trace
x = np.arange(10)
y = x[:5]    # contiguous slice
z = x[::2]   # strided slice -> non-contiguous view
tr1 = Trace(data=y)
tr2 = Trace(data=z)
print tr1.data
print tr2.data
...which shows that in Python the data of Traces 1 and 2 is:
[0 1 2 3 4]
[0 2 4 6 8]
But after writing and reading the data again...
tr1.write("/tmp/tr1.tmp", "MSEED")
tr2.write("/tmp/tr2.tmp", "MSEED")
tr1 = read("/tmp/tr1.tmp")[0]
tr2 = read("/tmp/tr2.tmp")[0]
print tr1.data
print tr2.data
...it is obvious that there was a problem with the reference to the original data array: during the write operation the wrong data were written to file:
[0 1 2 3 4]
[0 1 2 3 4]
This can be avoided by ensuring that the data is C-contiguous. Use either np.require(data, dtype=data.dtype, requirements='C_CONTIGUOUS')
or np.ascontiguousarray(data)
for this purpose:
z_safe = np.ascontiguousarray(z)
# or: z_safe = np.require(z, dtype=z.dtype, requirements='C_CONTIGUOUS')
tr1 = Trace(data=y)
tr2 = Trace(data=z_safe)
tr1.write("/tmp/tr1.tmp", "MSEED")
tr2.write("/tmp/tr2.tmp", "MSEED")
tr1 = read("/tmp/tr1.tmp")[0]
tr2 = read("/tmp/tr2.tmp")[0]
print tr1.data
print tr2.data
which gives the expected result:
[0 1 2 3 4]
[0 2 4 6 8]
Whether the data is safe for operations with ctypes can be checked by looking at the ndarray's C_CONTIGUOUS flag:
print y.flags['C_CONTIGUOUS']
print z.flags['C_CONTIGUOUS']
print z_safe.flags['C_CONTIGUOUS']
True
False
True
To summarize: when data arrays are created via e.g. indexing operations on other arrays, it should be checked whether the correct data are passed on during ctypes calls. Also refer to bugs #192 and #193.
- Python documentation page on floating point arithmetic
- Numpy information on machine limits for floating point types: numpy.finfo()