
[SPARK-1704][SQL] Fully support EXPLAIN commands as SchemaRDD. #1003

Closed
wants to merge 6 commits

Conversation

@concretevitamin (Contributor)

This PR attempts to resolve SPARK-1704 by introducing a physical plan for EXPLAIN commands, which simply prints out the debug string (containing Spark SQL's various plans) of the QueryExecution for the underlying query.
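As a usage sketch (variable name assumed; hql is the HiveContext entry point seen in the logs below), EXPLAIN becomes an ordinary query whose result can be collected like any other SchemaRDD:

// Sketch only: with this PR, EXPLAIN yields rows of plan text instead of
// throwing when the command is executed.
val explained = hiveContext.hql("EXPLAIN SELECT key FROM src")
explained.collect().foreach(println)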

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15520/

case class ExplainCommandPhysical(child: SparkPlan)
                                 (@transient context: SQLContext) extends UnaryNode {
  def execute(): RDD[Row] = {
    val lines = child.toString.split("\n").map(s => new GenericRow(Array[Any](s)))
Contributor

seems a little bit strange to have one row per line ... I think the plan should be a single value.

Contributor Author

Fixed.
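For reference, a minimal sketch of the single-Row version (commit 4318fd7 below), assuming the same GenericRow and the curried context field:

def execute(): RDD[Row] = {
  // One Row holding the entire plan string, rather than one Row per line.
  val explanation: Row = new GenericRow(Array[Any](child.toString))
  context.sparkContext.parallelize(Seq(explanation), 1)
}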

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@rxin (Contributor)

rxin commented Jun 7, 2014

This might not be your problem, but when I tried the following, I got ...

scala> c.hql("explain select key, count(value) from src group by key").collect()
14/06/06 23:58:05 INFO parse.ParseDriver: Parsing command: explain select key, count(value) from src group by key
14/06/06 23:58:05 INFO parse.ParseDriver: Parse Completed
14/06/06 23:58:05 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/06/06 23:58:05 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/06/06 23:58:05 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/06/06 23:58:05 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/06/06 23:58:05 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=src
14/06/06 23:58:05 INFO HiveMetaStore.audit: ugi=rxin    ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
14/06/06 23:58:05 INFO storage.MemoryStore: ensureFreeSpace(147699) called with curMem=737503, maxMem=1145674137
14/06/06 23:58:05 INFO storage.MemoryStore: Block broadcast_5 stored as values to memory (estimated size 144.2 KB, free 1091.8 MB)
14/06/06 23:58:05 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange
14/06/06 23:58:05 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree:
ExplainCommandPhysical 
 Aggregate false, [key#12], [key#12,SUM(PartialCount#14L) AS c_1#10L]
  Exchange (HashPartitioning [key#12:0], 150)
   Aggregate true, [key#12], [key#12,COUNT(value#13) AS PartialCount#14L]
    HiveTableScan [key#12,value#13], (MetastoreRelation default, src, None), None

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
    at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:265)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenUp(TreeNode.scala:249)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:215)
    at org.apache.spark.sql.execution.AddExchange$.apply(Exchange.scala:93)
    at org.apache.spark.sql.execution.AddExchange$.apply(Exchange.scala:89)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:62)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:60)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
    at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:60)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:52)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:52)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:275)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:275)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:260)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:248)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:85)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:90)
    at $i$$$$9579e5b89ab1eb428704b684e2e341c$$$$$.<init>(<console>:70)
    at $i$$$$9579e5b89ab1eb428704b684e2e341c$$$$$.<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
    at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
    at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
    at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
    at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
    at scala.tools.nsc.interpreter.ILoop.main(ILoop.scala:904)
    at xsbt.ConsoleInterface.run(ConsoleInterface.scala:69)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
    at sbt.compiler.AnalyzingCompiler.console(AnalyzingCompiler.scala:77)
    at sbt.Console.sbt$Console$$console0$1(Console.scala:23)
    at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply$mcV$sp(Console.scala:24)
    at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:24)
    at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:24)
    at sbt.Logger$$anon$4.apply(Logger.scala:90)
    at sbt.TrapExit$App.run(TrapExit.scala:244)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Failed to copy node.  Is otherCopyArgs specified correctly for ExplainCommandPhysical?, tree:
ExplainCommandPhysical 
 Aggregate false, [key#12], [key#12,SUM(PartialCount#14L) AS c_1#10L]
  Exchange (HashPartitioning [key#12:0], 150)
   Aggregate true, [key#12], [key#12,COUNT(value#13) AS PartialCount#14L]
    HiveTableScan [key#12,value#13], (MetastoreRelation default, src, None), None

    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:275)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:266)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
    ... 60 more

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15523/

@marmbrus (Contributor)

marmbrus commented Jun 7, 2014

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Failed to copy node. Is otherCopyArgs specified correctly for ExplainCommandPhysical?, tree:

Second parameter lists aren't automatically copied. You'll need to add override def otherCopyArgs = sc :: Nil
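Concretely, for the node in this PR the override is one line (commit 719ada9 below applies it); a sketch using the field name from the diff above:

// Catalyst's makeCopy re-applies only the first parameter list through the
// generated copy constructor, so the curried SQLContext must be exposed:
override def otherCopyArgs = context :: Nil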

@concretevitamin (Contributor, Author)

Fixed. Additionally, should output be a single column named something like "plan" with StringType?

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@marmbrus (Contributor)

marmbrus commented Jun 7, 2014

Yes, that sounds reasonable.
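A sketch of that single "plan" column (assuming Catalyst's AttributeReference and StringType; commit 1bfa379 below makes the actual change):

import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.catalyst.types.StringType

// One output column named "plan" holding the explain text.
override def output: Seq[Attribute] =
  Seq(AttributeReference("plan", StringType, nullable = false)())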

@rxin (Contributor)

rxin commented Jun 7, 2014

Can you add a test case?

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15528/

@concretevitamin (Contributor, Author)

I added a test that passes in this branch but fails on master.
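A hedged sketch of what that regression test could look like (suite and helper names assumed; the real test is commit 5b7911f below):

test("SPARK-1704: EXPLAIN runs as a query instead of throwing") {
  // On master this fails with a TreeNodeException; with this PR it returns
  // the plan text as rows.
  val rows = hql("EXPLAIN SELECT key, COUNT(value) FROM src GROUP BY key").collect()
  assert(rows.nonEmpty)
}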

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15573/

@@ -255,6 +256,11 @@ class SQLContext(@transient val sparkContext: SparkContext)
      Batch("Prepare Expressions", Once, new BindReferences[SparkPlan]) :: Nil
  }

  // TODO: or should we make QueryExecution protected[sql]?
  protected[sql] def mkQueryExecution(plan: LogicalPlan) = new QueryExecution {
Contributor

This method already exists above as executePlan.
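For reference, a sketch of the existing helper the reviewer means (shape as in SQLContext around this release):

protected[sql] def executePlan(plan: LogicalPlan): QueryExecution =
  new QueryExecution { val logical = plan }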

asfgit closed this in a9ec033 on Jun 9, 2014
asfgit pushed a commit that referenced this pull request Jun 9, 2014
This PR attempts to resolve [SPARK-1704](https://issues.apache.org/jira/browse/SPARK-1704) by introducing a physical plan for EXPLAIN commands, which simply prints out the debug string (containing Spark SQL's various plans) of the QueryExecution for the underlying query.

Author: Zongheng Yang <zongheng.y@gmail.com>

Closes #1003 from concretevitamin/explain-cmd and squashes the following commits:

5b7911f [Zongheng Yang] Add a regression test.
1bfa379 [Zongheng Yang] Modify output().
719ada9 [Zongheng Yang] Override otherCopyArgs for ExplainCommandPhysical.
4318fd7 [Zongheng Yang] Make all output one Row.
439c6ab [Zongheng Yang] Minor cleanups.
408f574 [Zongheng Yang] SPARK-1704: Add CommandStrategy and ExplainCommandPhysical.

(cherry picked from commit a9ec033)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@marmbrus (Contributor)

marmbrus commented Jun 9, 2014

Merged into 1.0 and master. Thanks!

concretevitamin deleted the explain-cmd branch on June 9, 2014 at 23:51
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@qiaohaijun

+1
