-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SQL election module #3318
base: master
Are you sure you want to change the base?
Add SQL election module #3318
Conversation
util/election2/sql/election.go
Outdated
} | ||
defer tx.Rollback() | ||
row := tx.QueryRow( | ||
"SELECT leader, last_update FROM leader_election WHERE resource_id = $1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the $<num>
format be problematic in mysql?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, switched to ?
, which probably makes postgres-likes unhappy, but since sqlite is happy with both and there's already a MySQL backend for trillian, that makes sense.
util/election2/sql/election.go
Outdated
} | ||
|
||
func (e *Election) initializeLock(ctx context.Context) error { | ||
insert, err := e.db.Prepare("INSERT INTO leader_election (resource_id, leader, last_update) VALUES ($1, $2, $3)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we need to handle the case where more than one replica executes an initialization at the same time? Or is the idea is to crash loop, restart and continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder to actually handle the error in NewElection
. The idea is that leader_election
has a primary-key index on resource_id
, so attempting a second INSERT
with the same resource_id
will simply fail with a conflict. But we should handle other SQL errors if there's a good way to do so.
Note that we also don't have a good way to clean up resource_id
rows after they've been created... that's probably not a problem, as resource_id
s are probably pretty stable, but still...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, fixing the lint errors and adding a test to cover this case, I realize that "INSERT had a conflict" is not a standardized error, so I need to do the query first to look for sql.ErrNoRows
, which is standardized. ¯\_(ツ)_/¯
/gcbrun |
/gcbrun |
I fixed the remaining lint issue and added what I think is the right integration hooks. I'm still working on running the integration tests -- in particular, adding a test that uses the |
/gcbrun |
cmd/trillian_log_signer/main.go
Outdated
@@ -47,13 +47,15 @@ import ( | |||
"github.com/google/trillian/util/election" | |||
"github.com/google/trillian/util/election2" | |||
etcdelect "github.com/google/trillian/util/election2/etcd" | |||
"github.com/google/trillian/util/election2/sql" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment relates to the package of the new code. I recommend mysql
over sql
to be clear at which implementation it is targeting, and for consistency with the flag names used in this file to pick it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Ah! Linter found something |
/gcbrun |
cmd/trillian_log_signer/main.go
Outdated
@@ -71,6 +73,7 @@ var ( | |||
numSeqFlag = flag.Int("num_sequencers", 10, "Number of sequencer workers to run in parallel") | |||
sequencerGuardWindowFlag = flag.Duration("sequencer_guard_window", 0, "If set, the time elapsed before submitted leaves are eligible for sequencing") | |||
forceMaster = flag.Bool("force_master", false, "If true, assume master for all logs") | |||
electionBackend = flag.String("election_backend", "etcd", fmt.Sprintf("Election backend to use. One of: mysql, etcd, noop")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to electionSystem to match with quotaSystem
@@ -143,13 +146,22 @@ func main() { | |||
instanceID := fmt.Sprintf("%s.%d", hostname, os.Getpid()) | |||
var electionFactory election2.Factory | |||
switch { | |||
case *forceMaster: | |||
case *forceMaster || *electionBackend == "noop": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not fantastic that we can now control this behavior with two different flags, but I also cannot thing of a better way forward for now. On the long run, we could / should probably deprecate forceMaster. No action required!
currentLeader leaderData | ||
leaderLock sync.Cond | ||
|
||
// If a channel is supplied with the cancel, it will be signalled when the election routine has exited. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, cancel is always created by Await, and Resign always sends a message there? I'm not sure I understand what's conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this before realizing that I could re-use Resign
for Close
, and I think one of those did not wait for the error in the first implementation. I can remove the comment, and change the type from *chan error
to chan error
probably...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha - makes sense to me to remove the comment then yes!
} | ||
|
||
// Close implements election2.Election | ||
func (e *Election) Close(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is essentially a wrapper around Resign, this election object could be re-used for a new election, which is slightly at odds with the interface definition. In practice, I don't think anything relies on this behaviour, and there aren't any resources that can be released, so I think it's safe. Does that match your understanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
Do you want me to make the object unusable, to catch errors that might occur by calling other methods after Close
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can think of a way of an elegant way of doing this, sure thing. I don't think it's super super important, it's just some extra care to make sure it's aligned with the interface definition :)
// WithMastership implements election2.Election | ||
func (e *Election) WithMastership(ctx context.Context) (context.Context, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the inclusivity and diversity in mind, may I suggest replacing master
with another inclusive term?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rewrite the election2.Election
interface if you'd like. 😁
Otherwise, I think those are the only two instances of the word "master" rather than "leader".
@evankanderson thanks for this PR as it simplifies Sigstore dependencies. Do we have any plans to merge this PR any soon? Thanks |
Sorry, I got distracted by this by life. If this is still of interest, I can try to revive it after my camping break this week -- maybe by around 14 July or so. |
Adds a generic SQL leader election module, as discussed in #3219; it sounds like Sigstore is looking at possibly using this for simplicity.
Fixes #3219
Note that I haven't yet wired this up into Trillian nor added database table declarations -- I'm happy to do that in this PR or a follow-up. I'm not including this in the changelog until it's wired in, but I wanted to get feedback on how to do that first.
Checklist