Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsql, sql, ui: update SQL metrics #12299

Merged
merged 1 commit into from
Dec 15, 2016
Merged

Conversation

rjnn
Copy link
Contributor

@rjnn rjnn commented Dec 12, 2016

Update the SQL metrics to track DistSQL queries separately from
non-DistSQL queries in the Admin UI. Also fixed a TODO to stop
double-counting transactions that are retried. This is the first PR
in a series to add all the features listed in #12143.

The Admin UI part is WIP, since we might want to list these under a new "SQL internals" page, since users may not know or care what DistSQL is. But the work in the backend is the same.


This change is Reviewable

@rjnn rjnn requested a review from cuongdo December 12, 2016 22:51
@rjnn rjnn assigned rjnn and unassigned rjnn Dec 12, 2016
@rjnn
Copy link
Contributor Author

rjnn commented Dec 12, 2016

cc @vivekmenezes @andreimatei

@cuongdo
Copy link
Contributor

cuongdo commented Dec 13, 2016

Looks mostly good to me. @andreimatei should look at the changes to the txn retry code.


Review status: 0 of 10 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/sql/metric_test.go, line 41 at r1 (raw file):

		txnBeginCount    int64
		selectCount      int64
		distSQLCount     int64

is there a test where distSQLCount > 0?


pkg/sql/metric_test.go, line 235 at r1 (raw file):

		t.Error(err)
	}
	if err := checkCounterEQ(s, sql.MetaSelect, 0); err != nil {

So, the semantics of the SELECT counter are changing? I'm not sure if this is the right thing. There's a argument to be made that counters should log all attempts instead of just successful executions. Also, don't the counters for UPDATEs and other queries now have different semantics from SELECT with your change?


pkg/ui/app/containers/nodeGraphs.tsx, line 104 at r1 (raw file):

            <LineGraph title="Reads" sources={sources} tooltip={`The average number of SELECT statements per second ${specifier}.`}>
              <Axis format={ d3.format(".1f") }>
                <Metric name="cr.node.sql.select.count" title="Non-distsql Reads" nonNegativeRate />

distsql is capitalized differently than the following line. Also, shouldn't these be DistSQL, because SQL is an acronym?


Comments from Reviewable

@andreimatei
Copy link
Contributor

I think it'd be nice if we could use different dimensions of a metric to count different related things. For example, ideally, distsql/classic would be one dimension of the statements metric. original/auto-retry/user-directed-retry would be another dimension. @mrtracy, our metrics don't support having multiple dimensions, do they?


Review status: 0 of 10 files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


pkg/internal/client/txn.go, line 482 at r1 (raw file):

// used.
func (txn *Txn) Exec(
	opt TxnExecOptions, fn func(txn *Txn, opt *TxnExecOptions, isRetry bool) error,

mmm do we really need to add this param to this core interface that just one client needs? That one client can capture a variable from the outer scope, just like it does for a bunch of other stuff.


pkg/sql/executor.go, line 199 at r1 (raw file):

	// Transient stats.
	Latency            *metric.Histogram
	SelectCount        *metric.Counter

comment whether the two overlap or not. As I've said below, I think the first should be a super-set of the 2nd.


pkg/sql/executor.go, line 978 at r1 (raw file):

//  so that nested BEGIN statements are caught.
// stmtTimestamp: Used as the statement_timestamp().
// isRetry: A boolean that is set for retries so that we don't double count in

well so the way you wrote the code, isRetry is gonna be set for automatic retries, but not for user-directed retries. I think that this is probably the behavior we want wrt counting queries in metrics, but this should be documented somewhere and this param should be named accordingly.


pkg/sql/executor.go, line 1340 at r1 (raw file):

// updateStmtCounts updates metrics for the number of times the different types of SQL
// statements have been received by this node.
func (e *Executor) updateStmtCounts(stmt parser.Statement, isRetry bool) {

nit: i'd have the parent not call this on retries


pkg/sql/executor.go, line 1350 at r1 (raw file):

		e.TxnBeginCount.Inc(1)
	case *parser.Select:
		// Do nothing as SELECTs are counted separately, depending on whether

I think it's very unfortunate to start counting these things in different places. How about we count selects here just like everything else (and so a select would always be counted), and we just count the distsql metric separately.


pkg/sql/metric_test.go, line 115 at r1 (raw file):

		}

		// Everything after this query will also fail, so quit now to avoid deluge of errors.

well if you're giving this test some love, how about changing it to the use sub-tests and checking the delta between current value and the value at the beginning on the sub-test, instead of doing this early fail?


pkg/sql/metric_test.go, line 184 at r1 (raw file):

		t.Fatal(err)
	}
	query := "INSERT INTO db.t VALUES ('key', 'marker')"

superfluous change? same below


pkg/sql/metric_test.go, line 235 at r1 (raw file):

Previously, cuongdo (Cuong Do) wrote…

So, the semantics of the SELECT counter are changing? I'm not sure if this is the right thing. There's a argument to be made that counters should log all attempts instead of just successful executions. Also, don't the counters for UPDATEs and other queries now have different semantics from SELECT with your change?

yeah, how did this change?


Comments from Reviewable

@@ -478,7 +478,9 @@ func (e *AutoCommitError) Error() string {
// to clean up the transaction before returning an error. In case of
// TransactionAbortedError, txn is reset to a fresh transaction, ready to be
// used.
func (txn *Txn) Exec(opt TxnExecOptions, fn func(txn *Txn, opt *TxnExecOptions) error) (err error) {
func (txn *Txn) Exec(
opt TxnExecOptions, fn func(txn *Txn, opt *TxnExecOptions, isRetry bool) error,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make isRetry a part of the TxnExecOptions?

@rjnn rjnn force-pushed the distsqlmetrics branch 6 times, most recently from 1408933 to a9e1c2a Compare December 14, 2016 23:19
@rjnn
Copy link
Contributor Author

rjnn commented Dec 14, 2016

Thank you for the reviews! Apologies for waiting a day, my email client didn't show me that there were reviews, so I didn't notice until I manually checked.

@andreimatei I think you're right. I've refactored it that way. Now distsql.select.count is a subset of select.count, and the semantics of select.count are left unchanged.


Review status: 0 of 5 files reviewed at latest revision, 11 unresolved discussions, some commit checks pending.


pkg/sql/executor.go, line 199 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

comment whether the two overlap or not. As I've said below, I think the first should be a super-set of the 2nd.

Done. I agree with you that DistSQLSelectCount should be a subset of SelectCount and have refactored.


pkg/sql/executor.go, line 978 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well so the way you wrote the code, isRetry is gonna be set for automatic retries, but not for user-directed retries. I think that this is probably the behavior we want wrt counting queries in metrics, but this should be documented somewhere and this param should be named accordingly.

renamed to isAutomaticRetry, and added a comment in line 598 (where it first is created).


pkg/sql/executor.go, line 1340 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: i'd have the parent not call this on retries

Done.


pkg/sql/executor.go, line 1350 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think it's very unfortunate to start counting these things in different places. How about we count selects here just like everything else (and so a select would always be counted), and we just count the distsql metric separately.

Done. You are correct, this is a much better way.


pkg/sql/metric_test.go, line 41 at r1 (raw file):

Previously, cuongdo (Cuong Do) wrote…

is there a test where distSQLCount > 0?

There is now! It's also renamed to distSQLSelectCount.


pkg/sql/metric_test.go, line 115 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well if you're giving this test some love, how about changing it to the use sub-tests and checking the delta between current value and the value at the beginning on the sub-test, instead of doing this early fail?

Done. I've earned some cleanup karma for today. We use subtests, and the test values are now defined in terms of increments so its easier to understand whats going on. PTAL.


pkg/sql/metric_test.go, line 184 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

superfluous change? same below

Done.


pkg/sql/metric_test.go, line 235 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

yeah, how did this change?

This was a mistake. Reverted. The semantics of SELECT should not change.


pkg/ui/app/containers/nodeGraphs.tsx, line 104 at r1 (raw file):

Previously, cuongdo (Cuong Do) wrote…

distsql is capitalized differently than the following line. Also, shouldn't these be DistSQL, because SQL is an acronym?

Done.


pkg/internal/client/txn.go, line 482 at r1 (raw file):

Previously, vivekmenezes wrote…

Can you make isRetry a part of the TxnExecOptions?

Followed Andrei's suggestion below instead.


pkg/internal/client/txn.go, line 482 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

mmm do we really need to add this param to this core interface that just one client needs? That one client can capture a variable from the outer scope, just like it does for a bunch of other stuff.

Yes, that's a better idea. Thanks!


Comments from Reviewable

@cuongdo
Copy link
Contributor

cuongdo commented Dec 15, 2016

@andreimatei: what do you mean by dimensions? I'm unclear whether the latest changes capture your intent.

Do you mean that the new DistSQL select count should be subordinate to the select count? If so, are you thinking about something like sql.select.distributed-count (I don't love the name, but it's an example).


Review status: 0 of 5 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor

Reviewed 1 of 10 files at r1, 9 of 9 files at r2.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


Comments from Reviewable

@andreimatei
Copy link
Contributor

:lgtm:


Review status: 4 of 5 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/sql/executor.go, line 1007 at r3 (raw file):

	// Do not double count automatically retried transactions.
	if !isAutomaticRetry {

nit: I would move calling updateStmtCounts to the parent and not add the retry param to this method. Unless you need to pass the argument to execStmt() as I was saying in a comment below.


pkg/sql/executor.go, line 1329 at r3 (raw file):

		switch stmt.(type) {
		case *parser.Select:
			e.DistSQLSelectCount.Inc(1)

don't you need to pipe the retry argument to this level, and increment the metric conditionally?


pkg/sql/metric_test.go, line 115 at r1 (raw file):

Previously, arjunravinarayan (Arjun Narayan) wrote…

Done. I've earned some cleanup karma for today. We use subtests, and the test values are now defined in terms of increments so its easier to understand whats going on. PTAL.

much better test now :)


pkg/sql/metric_test.go, line 52 at r3 (raw file):

	var testcases = []queryCounter{
		// The counts are increments for each query.

nit: I think this comment, or the comment on queryCounter struct, should say something about deltas.


Comments from Reviewable

@cuongdo
Copy link
Contributor

cuongdo commented Dec 15, 2016

:lgtm:


Review status: 4 of 5 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


Comments from Reviewable

Update the SQL metrics to track DistSQL queries separately from
non-DistSQL queries in the Admin UI. Also fixed a TODO to stop
double-counting transactions that are retried.
@rjnn
Copy link
Contributor Author

rjnn commented Dec 15, 2016

TFTRs!


Review status: 2 of 5 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


pkg/sql/executor.go, line 1007 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I would move calling updateStmtCounts to the parent and not add the retry param to this method. Unless you need to pass the argument to execStmt() as I was saying in a comment below.

Doing the latter, so keeping it here.


pkg/sql/executor.go, line 1329 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

don't you need to pipe the retry argument to this level, and increment the metric conditionally?

Yes, you're right. Done.


pkg/sql/metric_test.go, line 52 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I think this comment, or the comment on queryCounter struct, should say something about deltas.

Done.


Comments from Reviewable

@rjnn rjnn merged commit 205badf into cockroachdb:master Dec 15, 2016
@rjnn rjnn deleted the distsqlmetrics branch December 15, 2016 20:44
@andreimatei
Copy link
Contributor

Review status: 2 of 5 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/sql/executor.go, line 1329 at r3 (raw file):

Previously, arjunravinarayan (Arjun Narayan) wrote…

Yes, you're right. Done.

I think there's an error here - the execDistSQL() should not be conditional on isAutomaticRetry - only the metric.
Fixing in #14116


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants