A collection of Frequently Asked Questions about libfreevec.
Q: Who should use libfreevec?
A: libfreevec is targeted at applications that are destined to run on the
G4 PowerPC family of CPUs, or any PowerPC that includes an AltiVec unit,
and could benefit from a speed-up in commonly used routines, such as memory
and string routines. It is better suited to larger CPU-intensive apps than
to the custom 100-line C program that takes 1 second to run.
Q: What about the G5 CPUs?
A: Well, in theory the library should run on them, at least with slight
modifications, but there is no guarantee that it will perform as well as on
the G4. One reason is that the G5 has a slightly different AltiVec unit (I
will not say 'crippled' because it isn't, it's just different). The other,
more important, reason is that the G5 has a very capable 64-bit unit and a
much faster FSB, which in some cases more than makes up for the less capable
AltiVec unit. Preliminary tests on the G5, using very early versions of the
library, have shown that if there is to be any gain from using AltiVec for
such function replacements, the functions will have to be modified
specifically for the G5. It won't do to just take the 32-bit versions and
expect them to run at top speed.
Q: Who can use it and under what License?
A: Anyone, really. The license chosen is the LGPL. This means that you are
free to use it in free software projects, and you're most welcome to use it
even in closed-source/proprietary software, albeit with a few reasonable
restrictions. Read the license text for details.
Q: Why libfreevec? There is already libmotovec by Freescale!
A: Indeed. And much to their credit, I'm still struggling to reach the
performance of that library. These guys really knew what they were doing :-)
Seriously, the fact is that libmotovec is not really free software, and there
are difficulties in getting it incorporated into major distributions, e.g.
Debian will never include it under its current license. Plus, it's written in
ppc assembly, which, although great for performance, is really difficult to
maintain/debug/whatever. Not many people are that proficient with ppc asm, I
know I am not. libfreevec is written in C, using the AltiVec intrinsics
available in GCC. This means that as GCC gets better instruction scheduling
for the G4, libfreevec might benefit as well. In addition, the way it's
written right now allows for easy expansion with new functions, or even
optimization of the existing ones, just by fiddling with a couple of macro
modules. And lastly, libfreevec offers more AltiVec-optimized functions than
libmotovec, plus a consistent cache-prefetching mechanism used in all of the
available functions. Plus, if I might say so myself, some of the libfreevec
functions are even faster than the libmotovec equivalents :-)
Q: Why can't you just copy parts of libmotovec into libfreevec?
A: That's not really a solution. To be truthful, I actually have looked at
the source code of libmotovec's functions. But my knowledge of ppc asm is
very minimal, and I barely understood the reasons for the particular choice
of instructions and their sequence. I'm sure others will probably find it as
obvious as sunlight, but I assure you I didn't. Anyway, the thing is that I
can't just copy them, due to license issues. Even if it were possible, I don't
think it would be nice, at least not without extensive comments, which I can't
add anyway. And as an AltiVec guru, Holger Bettag, says quite often, "AltiVec
asm might give you an extra 2% performance, but why bother?". Holger, I
paraphrased it a little bit, I hope you don't mind! And anyway, writing it in
ppc asm would also mean that I would not get as much feedback from users as I
would like, as again not many people are experienced in PowerPC assembly.
Q: There is already liboil, why don't you put your code there?
A: Actually I intend to, but not this particular code. The goal of liboil is
slightly different: it offers its own API and a whole lot of highly optimized
routines to perform various algorithms. I, on the other hand, wanted to
optimize existing functions from GLIBC, libstring (which is heavily used in
MySQL), etc. I do plan to write some code for liboil at a later stage, but not
at this particular moment.
Q: Will my program run faster with this library? And how much faster?
A: It depends. If your program does 1 million memcpy() calls of 5 bytes each,
the library will not benefit you at all. It might even be slower, due to a
slightly bigger overhead. Actually, in truth it's quite bad design to do a
memcpy() for such a small buffer. On the other hand, depending on what your
application does, you might enjoy a significant benefit from using such a
library. E.g. the AltiVec version of swab() in this library is ~7x faster than
the scalar code. But it won't make a difference to your program if you only
call it in the initialization code for a 100-byte buffer. Also, does your app
use mainly aligned or unaligned buffers? So far, I can say that the
performance hit from unaligned addresses is mostly minimized, but it is still
a penalty. For actual results, please check the Features and Benchmarks
pages.
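If you are not sure which of the two cases applies to your buffers, a quick
check like the one below can tell you. This is only a sketch and not part of
libfreevec: the helper names is_aligned16() and alloc_aligned16() are made up
for illustration, and posix_memalign() is assumed to be available (it is on
Linux/glibc).

```c
#define _POSIX_C_SOURCE 200112L  /* makes posix_memalign() visible */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if ptr sits on a 16-byte boundary,
 * which is the size of one AltiVec vector. */
int is_aligned16(const void *ptr)
{
    return ((uintptr_t)ptr & 15u) == 0;
}

/* Hypothetical helper: allocate len bytes on a 16-byte boundary.
 * Returns NULL on failure; release the memory with free(). */
void *alloc_aligned16(size_t len)
{
    void *p = NULL;

    assert(len > 0);
    if (posix_memalign(&p, 16, len) != 0)
        return NULL;
    return p;
}
```

A pointer returned by alloc_aligned16() passes is_aligned16(), while the same
pointer plus one byte does not, so logging the result of is_aligned16() on the
buffers you hand to memcpy() and friends gives a rough idea of how often the
unaligned path would be taken.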
Q: But AltiVec is useless/not good on small buffers?
A: Quite true. AltiVec is a very powerful beast, but it needs lots of data to
feed on. Throughout the library I try to use AltiVec only where it's useful
and needed. Most of the time, I just redesign an algorithm to be more
efficient, and only above a particular size threshold does AltiVec kick in.
That way I try to get the best of both worlds.
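The size-threshold idea can be sketched in portable C roughly as follows.
This is not the actual libfreevec code: the names copy_small(), copy_bulk()
and sketch_memcpy() and the 32-byte threshold are all hypothetical, and the
bulk path moves 16-byte blocks with plain memcpy() as a stand-in for the
AltiVec loads and stores a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-over point; a real value would come from benchmarks. */
#define VECTOR_THRESHOLD 32

/* Small sizes: a simple byte loop with no setup cost. */
static void *copy_small(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
    return dst;
}

/* Large sizes: move 16-byte blocks (one vector register's worth),
 * then finish the remaining tail with the byte loop. */
static void *copy_bulk(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= 16) {
        memcpy(d, s, 16);  /* stand-in for a vector load + store */
        d += 16;
        s += 16;
        len -= 16;
    }
    copy_small(d, s, len);
    return dst;
}

/* Dispatch on size: the block path is used only above the threshold. */
void *sketch_memcpy(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    return (len < VECTOR_THRESHOLD) ? copy_small(dst, src, len)
                                    : copy_bulk(dst, src, len);
}
```

The point of the split is that a 5-byte copy never pays for the setup of the
block path, while a large copy spends almost all of its time in it.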
Q: I noticed that some of the functions are even faster for small sizes, how
is that possible?
A: Well, AltiVec is certainly not used in these cases. Most probably the
original algorithm was quite dumb (i.e. not optimized). When I was writing the
replacement functions, I always had in mind that they have to be at least as
fast as the originals for small sizes. It would look bad if a program that
uses memcpy(), but only on smaller sizes, became slower for that reason. I
tried to achieve this as much as possible, though I might have missed
something. That's why user and developer feedback is so important, so send in
those patches :-)
Q: Why does the speed drop so much with very big sizes?
A: Assuming you refer to the memory functions, it's because data has to be
fetched from the actual memory rather than the L1 or L2 caches. AltiVec has a
128-bit bus, but only to the L1 and L2 caches, not to the main memory. Still,
I use cache prefetching in most of the functions, and the performance will
still be better than that of the original functions. Don't expect miracles
though: in these cases a 20-30% performance gain is more likely than a 10x
one.
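To give a rough idea of what cache prefetching in a copy loop looks like,
here is a minimal sketch. It is not how libfreevec does it: it uses GCC's
generic __builtin_prefetch() hint rather than any PowerPC-specific
instruction, and the function name copy_with_prefetch(), the 32-byte chunk
size and the 128-byte prefetch distance are assumptions picked for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK          32   /* hypothetical bytes copied per iteration */
#define PREFETCH_AHEAD 128  /* hypothetical prefetch distance in bytes */

/* Copy len bytes, hinting the cache about source data that will be
 * needed a few iterations from now. */
void *copy_with_prefetch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    for (; i + CHUNK <= len; i += CHUNK) {
        /* Only prefetch addresses that are still inside the buffer. */
        if (i + PREFETCH_AHEAD < len)
            __builtin_prefetch(s + i + PREFETCH_AHEAD, 0, 0);
        memcpy(d + i, s + i, CHUNK);
    }
    memcpy(d + i, s + i, len - i);  /* remaining tail, possibly empty */
    return dst;
}
```

The prefetch is only a hint: the result of the copy is the same with or
without it, and whether it helps at all depends on the CPU, the buffer size
and the chosen distance, so the numbers above should be tuned by measurement.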
Q: How can I make sure I get the most of it?
A: Try to use aligned data as much as possible, e.g. using well-known tricks
like the following (the array size is only an example):
unsigned char buffer[1024] __attribute__ ((aligned(16)));
instead of just declaring a plain array. Note that the attribute has to be
applied to the buffer itself; applied to a pointer declaration, it aligns only
the pointer variable, not the memory it points to. That way, you'll skip the
time spent on handling unaligned data. Also, try to avoid useless invocations
of memcpy() or similar functions for tiny buffers. Though GCC is supposed to
inline copying code for some cases, this is not guaranteed and should not
be taken for granted all of the time. Instead try to organize your data
into bigger structures. It's better in the long run anyway.
Q: How does it work?
A: Please see the Docs section and the actual Source code for details.
Q: Ok, let's say that it's nice, what's next?
A: Well, the ultimate goal is to get as much user feedback as possible for
these functions, optimize them as much as possible, and then incorporate them
into the actual GLIBC and perhaps even the kernel. That way, Linux/powerpc
will have what MacOS X users have enjoyed for years :-)
Q: Are there any mailing lists to discuss its development?
A: There is a project on Alioth, which offers a mailing list, plus the
Forums on PPCZone. Also, great discussions on AltiVec are taking place on
simdtech.org.