Given two sorted arrays nums1
and nums2
of size m
and n
respectively, return the median of the two sorted arrays.
The overall run time complexity should be O(log (m+n))
.
Example 1:
Input: nums1 = [1,3], nums2 = [2] Output: 2.00000 Explanation: merged array = [1,2,3] and median is 2.
Example 2:
Input: nums1 = [1,2], nums2 = [3,4] Output: 2.50000 Explanation: merged array = [1,2,3,4] and median is (2 + 3) / 2 = 2.5.
Constraints:
nums1.length == m
nums2.length == n
0 <= m <= 1000
0 <= n <= 1000
1 <= m + n <= 2000
-106 <= nums1[i], nums2[i] <= 106
Companies: Amazon, Google, Adobe, Bloomberg, Apple, Facebook, Yahoo, Goldman Sachs, Microsoft, TikTok, Uber, MathWorks, Accenture, VMware, LinkedIn, ServiceNow, tcs, Walmart Labs, PayPal, Oracle, Morgan Stanley, Samsung, Yandex, SAP, Arcesium, Sprinklr, Zoho, Capgemini, OLX, Zenefits, Dropbox
Related Topics:
Array, Binary Search, Divide and Conquer
Similar Questions:
// OJ: https://leetcode.com/problems/median-of-two-sorted-arrays
// Author: github.com/lzl124631x
// Time: O(M + N)
// Space: O(1)
class Solution {
public:
double findMedianSortedArrays(vector<int>& A, vector<int>& B) {
int M = A.size(), N = B.size(), a = 0, b = 0;
auto getMin = [&]() {
int ans;
if (b >= N || (a < M && A[a] <= B[b])) ans = A[a++];
else ans = B[b++];
return ans;
};
for (int i = 0; i < (M + N - 1) / 2; ++i) getMin();
if ((M + N) % 2) return getMin();
return (getMin() + getMin()) / 2.;
}
};
Because the inputs are sorted arrays and the problem asks for a logarithmic time limit, we strongly feel that binary search (or a similar approach) is a promising method. While we're not sure how to cast the same pattern as a normal binary search on this problem, let's go over some steps of a regular binary search and see if we can get any inspiration. (If you are not familiar with binary search, you can refer to our Binary Search Explore Card)
Here we use binary search to find target
in a sorted array A
:
-
Locate the middle index (element) of
A
. -
Compare the value of the middle element with
target
. -
Reduce the search space by cutting the current array in half and discarding the half which is guaranteed not to contain
target
. -
Repeat the above process until we either empty the array (move to half a the length of 0) or find
target
.
At each step, the search space is cut in half, so we can quickly get the result. Now back to this problem where we have two sorted arrays. For the sake of convenience, let's call them A
and B
.
Similarly, we can get and compare their middle values A_mid
and B_mid
. Without loss of generality in this example we assume A_mid <= B_mid
initially, as shown in the yellow boxes.
What does this comparison imply?
It implies that we can compare sections of A
and B
.
For the rest of this article, we will use ≤\le≤ to represent the relative magnitude of values in arrays. For example, Aleft≤ArightA_{\text{left}} \le A_{\text{right}}Aleft≤Aright means that every element in AleftA_{\text{left}}Aleft is no larger than any element in ArightA_{\text{right}}Aright. We also 'compare' elements in an array with a single element similarly, for example, Aleft≤AmidA_{\text{left}} \le A_{\text{mid}}Aleft≤Amid means that every element in AleftA_{\text{left}}Aleft is no larger than the element AmidA_{\text{mid}}Amid.
This may not be the most standard way of expressing it, but is easy enough to understand.
Recall that both arrays are sorted, so we know that:
- Aleft≤AmidA_{\text{left}} \le A_{\text{mid}}Aleft≤Amid
- Bmid≤BrightB_{\text{mid}} \le B_{\text{right}}Bmid≤Bright
Combine these observations with the comparison we just made:
Amid≤BmidA_{\text{mid}} \le B_{\text{mid}}Amid≤Bmid
We have the following result:
Aleft≤Amid≤Bmid≤BrightA_{\text{left}} \le A_{\text{mid}} \le B_{\text{mid}} \le B_{\text{right}}Aleft≤Amid≤Bmid≤Bright
Thus,
Aleft≤BrightA_{\text{left}} \le B_{\text{right}}Aleft≤Bright
As shown in the picture below:
Since A
is sorted, we know that Aleft≤ArightA_{\text{left}} \le A_{\text{right}}Aleft≤Aright.
Now we know that A_left
is smaller than two halves: A_right
and B_right
. Although we still don't know where exactly these elements are, what we do know is A_left
doesn't intersect with A_right + B_right
! There is an invisible boundary between the A_left
segment and the mixed segment A_right + B_right
. As shown in the picture below, the dashed line divides all sorted elements into two halves.
We can apply all the same logic to the mixed segment AleftA_{\text{left}}Aleft + BleftB_{\text{left}}Bleft and BrightB_{\text{right}}Bright, which also do not intersect. You can try to prove it yourself as an exercise.
It looks somewhat clearer, we have clearly separated some subarrays. How do we continue to leverage this knowledge and use the cut-in-half method repeatedly?
The following step is the most important one.
Remember that we are looking for the median of sorted A + B
which is one or two target values. We regard the index of the target value in the sorted(A + B)
as k
. For example:
-
If the lengths of
A
andB
are6
and5
, the target index isk = (6 + 5 + 1) / 2 = 6
, we shall look for the 6th smallest element. -
If the lengths of
A
andB
are6
and6
, the target indexes arek = (6 + 6) / 2 = 6
andk + 1 = 7
, we shall look for the 6th and the 7th smallest elements.
Depending on whether the total number of elements is odd or even, we need the kthk^{th}kth (and maybe the (k+1)th(k + 1)^{th}(k+1)th) elements. What matters is that we set an index k
at the beginning and we want to find the kthk^{th}kth smallest element using the Binary Search-like algorithm discussed previously (for convenience, we will discuss only the kthk^{th}kth element for now).
However, during the Binary Search-like algorithm, we keep removing one half of an array, so the index k
might not stay unchanged. Suppose we removed 3
elements that are smaller than the original kthk^{th}kth smallest element, we shall look for the (k−3)th(k-3)^{th}(k−3)th smallest element from the remaining arrays.
More specifically:
If k
is larger than half the total number of elements in sorted(A + B)
, it means that the kthk^{th}kth element is in the second (larger) half of sorted(A + B)
, thus AleftA_{\text{left}}Aleft (or BleftB_{\text{left}}Bleft, the smaller of the two smaller sections according to the comparison) is guaranteed not to contain this element, and we can safely cut this half, and reduce k
by the length of the removed half.
If k
is not larger than half the total number of elements in sorted(A + B)
, it means that the kthk^{th}kth element is in the first (smaller) half of sorted(A + B)
, thus BrightB_{\text{right}}Bright (or ArightA_{\text{right}}Aright, the larger of the two larger sections according to the comparison) is guaranteed not to contain this element, and we can safely discard it. Note that we don't need to modify k
this time, since we removed one larger half that doesn't affect the order of the kthk^{th}kth smallest element.
We can continue our search like above in the remaining arrays. The long arrow that starts from the bottom and points to the top-left indicates that we are repeating the process. Once we cut off part of either A
or B
, we regard the remaining arrays as modified A
and B
and restart this algorithm. Note that the following picture represents one case only: we consider the case that a_value < b_value
, thus we remove either the smaller half of A
or the larger half of B
. If the comparison result is a_value >= b_value
, we shall remove either the smaller half of B
or the larger half of A
.
That's it. We cut one of the two arrays in half at each step, so this approach has a logarithmic time complexity which we will discuss in detail later.
One more thing!
In the previous picture, we repeat all processes using the modified arrays, but this is just for the sake of understanding. We won't create copies of two arrays repeatedly, because that would introduce a linear time complexity at least. Instead, we just treat a part of the original array as the modified array for the next step, so that we can repeat the process on the original array without making any duplication. To do this, we need to maintain four pointers, two pointers for each array, e.g., a_start
and a_end
represent an inclusive range [a_start, a_end]
of A
.
Let's define a function that helps us find the kthk^{th}kth smallest element from two inclusive ranges [a_start, a_end]
and [b_start, b_end]
from arrays A
and B
.
-
If the range (for example, a range of
A
) is empty, in other wordsa_start > a_end
, it means all elements inA
are passed, we just return the(k - a_start)
-th element from the other arrayB
. Vice versa ifb_start > b_end
. -
Otherwise, get the middle indexes of the two ranges:
a_index = (a_start + a_end) / 2
,b_index = (b_start + b_end) / 2
. -
Get the middle values of the two ranges:
a_value = A[a_index]
,b_value = B[b_index]
. -
Cut one array in half, according to:
- If
a_index + b_index < k
, cut one smaller half.- If
a_value < b_value
, cut the smaller half ofA
. - Otherwise, cut the smaller half of
B
.
- If
- Otherwise, cut one larger half.
- If
b_value < a_value
, cut the larger half ofB
. - Otherwise, cut the larger half of
A
.
- If
- If
-
Repeat step 1 using the new starting and ending indexes of
A
andB
.
Then we move on to find the median elements, and get the length of both arrays na = len(A)
and nb = len(B)
.
- If the total number of elements in
A
andB
is odd, we just use the above function to find the middle element, that isk = (na + nb) / 2
. - Otherwise, we use the function to find two middle elements:
k = (na + nb) / 2 - 1
andk = (na + nb) / 2
, and return their average.
Let mmm be the size of array nums1
and nnn be the size of array nums2
.
-
Time complexity: O(log(m⋅n))O(\log(m \cdot n))O(log(m⋅n))
- At each step, we cut one half off from either
nums1
ornums2
. If one of the arrays is emptied, we can directly get the target from the other array in a constant time. Therefore, the total time spent depends on when one of the arrays is cut into an empty array. - In the worst-case scenario, we may need to cut both arrays before finding the target element.
- One of the two arrays is cut in half at each step, thus it takes logarithmic time to empty an array. The time to empty two arrays are independent of each other.
- Therefore, the time complexity is O(logm+logn)O(\log m + \log n)O(logm+logn).
O(logm+logn)=O(log(m⋅n))O(\log m + \log n) = O(\log (m\cdot n))O(logm+logn)=O(log(m⋅n))
- At each step, we cut one half off from either
-
Space complexity: O(logm+logn)O(\log m + \log n)O(logm+logn)
-
Similar to the analysis on time complexity, the recursion steps depend on the number of iterations before we cut an array into an empty array. In the worst-case scenario, we need O(logm+logn)O(\log m + \log n)O(logm+logn) recursion steps.
-
However, during the recursive self-call, we only need to maintain 4 pointers:
a_start
,a_end
,b_start
andb_end
. The last step of the function is to call itself, so if tail call optimization is implemented, the call stack always has O(1)O(1)O(1) records. -
Please refer to Tail Call for more information on tail call optimization.
-
// OJ: https://leetcode.com/problems/median-of-two-sorted-arrays
// Author: github.com/lzl124631x
// Time: O(log(MN))
// Space: O(log(MN))
class Solution {
public:
double findMedianSortedArrays(vector<int>& A, vector<int>& B) {
int M = A.size(), N = B.size();
function<int(int, int, int, int, int)> search = [&](int target, int al, int ar, int bl, int br) {
if (ar < al) return B[target - al];
if (br < bl) return A[target - bl];
int amid = (al + ar) / 2, bmid = (bl + br) / 2;
if (amid + bmid < target) {
if (A[amid] > B[bmid]) return search(target, al, ar, bmid + 1, br);
return search(target, amid + 1, ar, bl, br);
}
if (A[amid] > B[bmid]) return search(target, al, amid - 1, bl, br);
return search(target, al, ar, bl, bmid - 1);
};
if ((M + N) % 2) return search((M + N) / 2, 0, M - 1, 0, N - 1);
return (search((M + N - 1) / 2, 0, M - 1, 0, N - 1) + search((M + N) / 2, 0, M - 1, 0, N - 1)) / 2.;
}
};
// OJ: https://leetcode.com/problems/median-of-two-sorted-arrays/
// Author: github.com/lzl124631x
// Time: O(log(min(M, N)))
// Space: O(1)
// Ref: https://leetcode.com/problems/median-of-two-sorted-arrays/solution/
// Ref: https://youtu.be/LPFhl65R7ww
class Solution {
public:
double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
if (nums1.size() > nums2.size()) swap(nums1, nums2);
int M = nums1.size(), N = nums2.size(), L = 0, R = M, K = (M + N + 1) / 2;
while (true) {
int i = (L + R) / 2, j = K - i;
if (i < M && nums2[j - 1] > nums1[i]) L = i + 1;
else if (i > L && nums1[i - 1] > nums2[j]) R = i - 1;
else {
int maxLeft = max(i ? nums1[i - 1] : INT_MIN, j ? nums2[j - 1] : INT_MIN);
if ((M + N) % 2) return maxLeft;
int minRight = min(i == M ? INT_MAX : nums1[i], j == N ? INT_MAX : nums2[j]);
return (maxLeft + minRight) / 2.0;
}
}
}
};