-
Notifications
You must be signed in to change notification settings - Fork 0
/
MajorityElement.hh
189 lines (178 loc) · 8.91 KB
/
MajorityElement.hh
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
/****************************************************************************
* File: MajorityElement.hh
* Author: Keith Schwarz (htiek@cs.stanford.edu)
*
* A one-pass, linear time algorithm that returns the majority element of a
* range of data. The majority element is an element that appears strictly
* more than half the time. For example, in the sequence
*
* 0 1 0 0 2 0 3
*
* The number 0 is a majority element, since it appears 4/7 times. However,
* in the sequence
*
* 0 1 0 0 2 0 3 3
*
* There is no majority element, since even though 0 occurs 4/8 times, this
* isn't strictly greater than half the elements.
*
* The algorithm for finding the majority element is remarkably simple, but its
* correctness is not immediately obvious. The algorithm works as follows. At
* each step, we maintain our "guess" of what the majority element will be, and
* also a counter. We then scan across the array. At each point, if the new
* element matches our current guess we increment the counter, and otherwise
* we decrement it. If the counter is ever zero, then on the next element we
* change the counter to 1 and pick the next element as our guess. Finally, we
* output the guessed element. For example, here is the algorithm running on
* the earlier input. The topmost row shows the input, below that our guess,
* and below that the counter:
*
* INPUT 0 1 0 0 2 0 3
* GUESS ? 0 ? 0 0 0 0 0
* COUNTER 0 1 0 1 2 1 2 1
*
* Since our guess at the end is zero, we output zero. This algorithm is
* due to Boyer and Moore and is described in their paper "MJRTY - A Fast
* Majority Vote Algorithm."
*
* There are many ways to think about why this algorithm works. One good
* intuition is to think of the algorithm as breaking the input down into lots
* of stretches of consecutive copies of particular values. Incrementing the
* counter then corresponds to marking that multiple copies of the same value
* were found, while decrementing it corresponds to some other sequences of
* values "canceling out" the accumulation of values of a particular type.
*
* A formal proof of correctness of this algorithm (based on the proof in
* Boyer and Moore's paper) relies on a key lemma. In this section, we'll
* let C be the number that is currently a candidate for the majority element,
* K be its count after some number of steps, and N be the number of total
* elements.
*
* Lemma 1: For any i, 1 <= i <= N, after i steps of the algorithm, the
* elements in the range [1, i] can rearranged into two groups A and B such
* that A is K copies of C, and B is a collection of elements with at most
* i / 2 copies of any one element.
*
* Let's hold off of the proof of this lemma for now, and show that if it
* holds and there is a majority element, the algorithm must be correct. Using
* the above lemma, note that when the algorithm terminates, there must be some
* element C that was chosen with some count K. Assume for the sake of
* contradiction that C is not the majority element; then there is some other
* element C' that must be the majority element. Consequently, there are at
* least n / 2 elements of the range equal to C'. Let's consider where they
* are. By the above lemma, all the elements of the input can be broken up
* into groups A and B, where everything in group A has value C and at most
* |B| / 2 elements of |B| have value C'. Since |A| = K and |A| + |B| = N,
* this means that there are at most (N - K) / 2 copies of C', contradicting
* the fact that C' is the actual majority element. We have reached a
* contradiction, and so C must be the majority element at the end of the
* algorithm's run.
*
* We can now prove the claim of the lemma by induction on i. As a base case,
* if i = 1, then K = 1 and C is the first element of the range. Then we can
* let A be the singleton element and B be the empty set, which trivially obeys
* the criteria of the lemma. For the inductive step, assume that for some i
* the claim holds and consider the execution of the algorithm on step i + 1.
* Let A and B be the sets A and B from the ith step. Then we consider three
* possible cases:
*
* 1. On entry to this step, K = 0. Then after this step finishes, K = 1
* and C is the newest element. This means that on entry to this step,
* A was the empty set and B was some set where no element appeared more
* than i/2 times in B. If we then let A' be the singleton set containing
* the new element and B' = B, then these sets satisfy the requirements of
* the lemma and the claim holds.
* 2. On entry to this step, K > 0 and the new element matches the current
* majority element. Then we can add this element to A to get a new set
* A' meeting the lemma's requirements, so the claim holds.
* 3. On entry to this step, K > 0, and the new element does not match the
* current majority element. This means that the new K is one minus the
* previous K, but the candidate majority element does not change. If
* we then move one element from A into B, then place the new element into
* the set B, then the updated A and B will satisfy the lemma's claims.
* This is tedious but simple to check, so I'll leave it as an exercise
* to the reader. :-)
*
* In the case where there is no majority element, the element produced by the
* algorithm will be arbitrary. We can then check whether we have the majority
* element by performing a linear scan over the input range and counting the
* frequency of the element.
*/
#ifndef MajorityElement_Included
#define MajorityElement_Included
#include <functional> // For equal_to
#include <iterator> // For iterator_traits
/**
* Function: ForwardIterator MajorityElement(ForwardIterator begin,
* ForwardIterator end);
* Usage: int maj = MajorityElement(v.begin(), v.end());
* -----------------------------------------------------------------------
* Given a range of values [begin, end) where strictly more than half the
* elements have the same value, returns the value of this most-common
* element. If no element is a majority element, end is returned as a
* sentinel.
*/
template <typename ForwardIterator>
ForwardIterator MajorityElement(ForwardIterator begin, ForwardIterator end);
/**
* Function: ForwardIterator MajorityElement(ForwardIterator begin,
* ForwardIterator end,
* Comparator comp);
* Usage: int maj = MajorityElement(v.begin(), v.end(), MyComparator);
* -----------------------------------------------------------------------
* Given a range of values [begin, end) where strictly more than half the
* elements have the same value according to the binary comparator comp,
* returns the value of this most-common element. If no element is a majority
* element, end is returned as a sentinel.
*/
template <typename ForwardIterator, typename Comparator>
ForwardIterator MajorityElement(ForwardIterator begin, ForwardIterator end,
Comparator comp);
/* * * * * Implementation Below This Point * * * * */
/* Main implementation uses the parameterized comparator. */
template <typename ForwardIterator, typename Comparator>
ForwardIterator MajorityElement(ForwardIterator begin, ForwardIterator end,
Comparator comp) {
/* Initially, we have no guess and our count is zero. However, to avoid
* edge cases with the empty range, we initialize the candidate to end.
*/
ForwardIterator candidate = end;
size_t confidence = 0;
/* Scan over the input using the Boyer-Moore update rules. */
for (ForwardIterator itr = begin; itr != end; ++itr) {
/* If we have no confidence in our previous guess, update it to this new
* element.
*/
if (confidence == 0) {
candidate = itr;
++confidence;
}
/* Otherwise, increment or decrement the confidence based on whether the
* next element matches.
*/
else if (comp(*candidate, *itr))
++confidence;
else
--confidence;
}
/* Finally, do one more pass to confirm that we have a majority element. */
size_t numMatches = 0, totalElements = 0;
for (ForwardIterator itr = begin; itr != end; ++itr) {
/* Check whether this is a match and update appropriately. */
if (comp(*candidate, *itr))
++numMatches;
/* Either way, increment the total number of elements. */
++totalElements;
}
/* This is a majority element if it accounts for at least half the number
* of elements.
*/
return totalElements / 2 < numMatches ? candidate : end;
}
/* Non-comparator version implemented in terms of comparator version. */
template <typename ForwardIterator>
ForwardIterator MajorityElement(ForwardIterator begin, ForwardIterator end) {
return MajorityElement(begin, end,
std::equal_to<typename std::iterator_traits<ForwardIterator>::value_type>());
}
#endif