For example, take this test case:
test("sortByKey") {
val pairs = sc.parallelize(Array((1, 0), (2, 0), (0, 0), (3, 0)), 2)
assert(pairs.sortByKey().collect() === Array((0, 0), (1, 0), (2, 0), (3, 0)))
}
Before collect() is reached, each transformation just creates another new RDD one after another, and no actual computation seems to happen; once execution enters runJob() I can no longer step through it in the debugger. The code that merges all the partition results should be the Array.concat(results: _*) call in RDD.scala:
```scala
def collect(): Array[T] = withScope {
  val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
  Array.concat(results: _*)
}
```
Your shuffle chapter has the transformation diagram for sortByKey, but where do these computation steps actually take place?
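A minimal sketch of the laziness described above, assuming a local SparkContext named sc; the println inside map is only there to make the evaluation order visible:

```scala
// None of these lines runs a task: each call only builds a new RDD that
// records its parent and the function to apply later.
val pairs  = sc.parallelize(Array((1, 0), (2, 0), (0, 0), (3, 0)), 2)
val tagged = pairs.map { kv => println(s"computing $kv"); kv } // still lazy
val sorted = tagged.sortByKey()                                // still lazy

// Only this action submits a job; the println side effects show up now,
// when the tasks iterate over the partitions, not at the lines above.
val result = sorted.collect()
```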
The computation logic lives in RDD.compute(), and the computation is pipelined. You can see the RDD dependency graph via finalRDD.debug(). I'd suggest reading the LogicalPlan and PhysicalPlan chapters carefully; then it should become clear.
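A short sketch of inspecting the lineage for the test case above, again assuming sc is available; toDebugString is the public RDD API used here to print the dependency graph:

```scala
val pairs  = sc.parallelize(Array((1, 0), (2, 0), (0, 0), (3, 0)), 2)
val sorted = pairs.sortByKey()

// Print the dependency graph that collect() will execute; expect something
// like a ShuffledRDD sitting on top of the ParallelCollectionRDD.
println(sorted.toDebugString)

// When collect() runs, each task calls compute() on its partition of the
// final RDD; compute() pulls records from the parent's iterator, so a chain
// of narrow dependencies is evaluated as one pipeline per partition.
sorted.collect()
```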