Disables AVX and fp16 intrinsics by default
Should fix #24
poke1024 committed Aug 16, 2019
1 parent 7720bb0 commit dab29f7
Showing 3 changed files with 65 additions and 12 deletions.
19 changes: 17 additions & 2 deletions README.md
@@ -31,7 +31,7 @@ will fix the links to the lib and asset folders, so you can run e.g. `love demos

## Building TÖVE Yourself

On Linux and macOS:
### On Linux and macOS

```
git clone --recurse-submodules https://github.com/poke1024/tove2d
@@ -40,16 +40,31 @@ scons
love demos/hearts
```

On Windows:
### On Windows

```
# in git bash:
git clone --recurse-submodules https://github.com/poke1024/tove2d
cd tove2d
# in regular cmd shell:
cd path/to/tove2d
setup.bat /C
```

Take a look inside `setup.bat` to learn more about installing a compiler environment.

### Experimental

Use `scons --arch=sandybridge` or `scons --arch=haswell` to compile for a custom CPU architecture. On Windows, this enables AVX extensions,
which mainly speeds up the rasterizer dithering code.

On POSIX, in addition, you can enable hardware intrinsics for 16-bit floating point operations by using `scons --f16c`. These intrinsics
might give small performance benefits in the `gpux` mode.

Since enabling custom options causes crashes on some CPUs (https://github.com/poke1024/tove2d/issues/24), I strongly recommend sticking to
the defaults, as they ensure that your binary will run on many CPUs.

## Roadmap

I keep my ideas of what might eventually end up in TÖVE in [TÖVE's Public Trello Board](https://trello.com/b/p5nWCZVC/t%C3%B6ve).
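The README's advice depends on whether a CPU actually has the features these options assume. The Python sketch below is a rough, hypothetical helper and is not part of TÖVE. It reads the `flags` line from Linux's `/proc/cpuinfo` and suggests `--arch`/`--f16c` only when certain features are present. Those features are an assumed, deliberately incomplete subset of what GCC's `-march=sandybridge` and `-march=haswell` imply. A match is therefore necessary but not sufficient. It also describes only the machine running the check, not the machines a shipped binary will end up on.

```python
# Hypothetical helper, not part of TÖVE: suggest experimental scons options
# from the "flags" line of Linux's /proc/cpuinfo.

# Assumed, incomplete subsets of the CPU features implied by GCC's -march values.
SANDYBRIDGE = {"sse4_2", "popcnt", "avx", "aes", "pclmulqdq"}
HASWELL = SANDYBRIDGE | {"avx2", "fma", "bmi1", "bmi2", "f16c", "movbe"}


def cpu_flags(cpuinfo_text):
    """Return the feature flags listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def suggested_scons_args(flags):
    """Map detected features to --arch/--f16c; an empty list means keep the defaults."""
    args = []
    if HASWELL <= flags:  # every assumed Haswell feature is present
        args.append("--arch=haswell")
    elif SANDYBRIDGE <= flags:
        args.append("--arch=sandybridge")
    if "f16c" in flags:
        args.append("--f16c")
    return args
```

To use it, pass the text of `/proc/cpuinfo` to `cpu_flags` and hand the resulting set to `suggested_scons_args`. Returning an empty list falls back to the defaults, which matches the README's recommendation.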
54 changes: 46 additions & 8 deletions SConstruct
@@ -13,6 +13,21 @@ AddOption(
help='build static library',
default=False)

AddOption(
'--arch',
dest='arch',
type='string',
nargs=1,
action='store',
help='CPU architecture to use (e.g. sandybridge, haswell)',
default=None)

AddOption(
'--f16c',
action='store_true',
help='assume hardware support for 16-bit floating point numbers',
default=False)

sources = [
"src/cpp/version.cpp",
"src/cpp/interface/api.cpp",
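SCons documents `AddOption` as accepting the same arguments as `add_option` in Python's `optparse` module, so the two new options can be tried out in isolation. The sketch below uses a plain `optparse.OptionParser` as a stand-in for SCons. This is an assumption for illustration: inside SConstruct the values are read with `GetOption`. Both options default to the conservative setting, which is what the commit relies on to keep AVX and F16C off unless they are requested.

```python
from optparse import OptionParser

# Stand-in for SCons's AddOption(): same optparse-style keyword arguments.
parser = OptionParser()
parser.add_option(
    "--arch",
    dest="arch",
    type="string",
    nargs=1,
    action="store",
    help="CPU architecture to use (e.g. sandybridge, haswell)",
    default=None,
)
parser.add_option(
    "--f16c",
    dest="f16c",
    action="store_true",
    help="assume hardware support for 16-bit floating point numbers",
    default=False,
)

# Without any flags, both options keep their conservative defaults.
defaults, _ = parser.parse_args([])
print(defaults.arch, defaults.f16c)  # None False

# Opting in explicitly, as the README's "Experimental" section describes.
opts, _ = parser.parse_args(["--arch=haswell", "--f16c"])
print(opts.arch, opts.f16c)  # haswell True
```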
@@ -38,32 +53,55 @@ sources = [

if env["PLATFORM"] == 'win32':
env["CCFLAGS"] = ' /EHsc /std:c++17 '

if not GetOption('tovedebug'):
env["CCFLAGS"] += ' /Oi /Ot /Oy /Ob2 /GF /Gy /fp:fast /arch:AVX '

# enabling /O2 (or /Og above) will crash demos/retro for some reason.
# env["CCFLAGS"] += ' /O2 /fp:fast '

env["CCFLAGS"] += ' /Oi /Ot /Oy /Ob2 /GF /Gy /fp:fast '
# enabling /O2 (or /Og) will crash demos/retro for some reason.


# concerning CPU architectures and AVX:

# by default, we do not specify a value for /arch - even though in theory, /arch:AVX should be
# fine for processors not older than 2011 (https://en.wikipedia.org/wiki/Advanced_Vector_Extensions),
# in practice setting AVX causes issues (see https://github.com/poke1024/tove2d/issues/24).
# for /arch:AVX2, we even need Haswell architectures.

# the main benefit for enabling AVX2 would probably be src/thirdparty/nanosvg/tove/svgrast.cpp,
# where various dithering operations are optimizable for SIMD (AVX can then be seen in asm code).

arch = GetOption('arch')
if arch == 'sandybridge':
env["CCFLAGS"] += ' /arch:AVX '
elif arch == 'haswell':
env["CCFLAGS"] += ' /arch:AVX2 '
elif arch is not None:
raise RuntimeError("please specify sandybridge or haswell for arch")


env["LINKFLAGS"] = ' /OPT:REF '
else:
env["CCFLAGS"] += ' /DEBUG:FULL '
env["LINKFLAGS"] = ' /DEBUG:FULL '

else:
# might want to trigger these with additional options:
# -march=haswell
# -Wreorder -Wunused-variable

CCFLAGS = ' -std=c++17 -fvisibility=hidden -funsafe-math-optimizations '

if GetOption('arch'):
CCFLAGS += ' -march=%s ' % GetOption('arch')

if GetOption('tovedebug'):
CCFLAGS += '-g '
else:
CCFLAGS += '-O3 '

if env["PLATFORM"] == 'posix':
if env["PLATFORM"] == 'posix' and GetOption('f16c'):
CCFLAGS += ' -mf16c '

if GetOption('f16c'):
CCFLAGS += ' -DTOVE_F16C=1 '

env["CCFLAGS"] = CCFLAGS

env["CPPPATH"] = "src/thirdparty/fp16/include"
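Taken together, the SConstruct changes add architecture-specific flags only when an option is given. The pure functions below are a hypothetical restatement of that logic. They are useful for checking which flags a given combination of options produces. `msvc_arch_flag` and `posix_ccflags` are illustrative names, not functions in TÖVE, and the Windows function covers only the `/arch` part of the flags.

```python
# Hypothetical restatement of the flag selection in this commit's SConstruct.


def msvc_arch_flag(arch):
    """Return the MSVC /arch flag for --arch, or '' to keep the compiler default."""
    if arch is None:
        return ""  # default: no /arch, so the binary also runs on pre-AVX CPUs
    flags = {"sandybridge": "/arch:AVX", "haswell": "/arch:AVX2"}
    if arch not in flags:
        raise RuntimeError("please specify sandybridge or haswell for arch")
    return flags[arch]


def posix_ccflags(arch=None, f16c=False, debug=False, platform="posix"):
    """Rebuild the non-Windows CCFLAGS string the way the new SConstruct does."""
    flags = " -std=c++17 -fvisibility=hidden -funsafe-math-optimizations "
    if arch:
        flags += " -march=%s " % arch  # GCC/Clang take the arch name directly
    flags += "-g " if debug else "-O3 "
    if platform == "posix" and f16c:
        flags += " -mf16c "  # only Linux-style builds get the F16C codegen flag
    if f16c:
        flags += " -DTOVE_F16C=1 "  # enables the intrinsic paths in common.h
    return flags
```

With no options, `posix_ccflags()` contains neither `-march` nor any F16C flag. On macOS (`platform="darwin"`), `--f16c` adds only the `TOVE_F16C` define.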
4 changes: 2 additions & 2 deletions src/cpp/common.h
@@ -63,11 +63,11 @@ inline void store_gpu_float(float &p, float x) {
p = x;
}

#if defined(__clang__)
#if defined(TOVE_F16C) && defined(__clang__)
inline void store_gpu_float(uint16_t &p, float x) {
*reinterpret_cast<__fp16*>(&p) = x;
}
#elif defined(__GNUC__)
#elif defined(TOVE_F16C) && defined(__GNUC__)
#include <x86intrin.h>
inline void store_gpu_float(uint16_t &p, float x) {
p = _cvtss_sh(x, 0);
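With `TOVE_F16C` undefined (the new default), neither the `__fp16` store nor `_cvtss_sh` is compiled in. Whatever the header uses instead lies outside this hunk. For reference, the intrinsic path stores the IEEE 754 half-precision bit pattern of `x` into a `uint16_t`, and rounding mode `0` selects round to nearest even. The Python sketch below computes the same bit pattern through `struct`'s `e` format. It illustrates the conversion and is not TÖVE's code. The overflow branch is an assumption modelled on hardware rounding out-of-range values to infinity, because `struct` raises an error instead.

```python
import struct


def float_to_half_bits(x):
    """Return the IEEE 754 binary16 bit pattern of x, rounding to nearest even."""
    try:
        # Pack as half precision, then reinterpret the two bytes as an unsigned 16-bit int.
        return struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:
        # struct rejects finite values beyond the half range; map them to +/-infinity.
        return 0xFC00 if x < 0 else 0x7C00
```

For example, `1.0` maps to `0x3C00` and the largest finite half value, `65504.0`, maps to `0x7BFF`.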
