Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL election module #3318

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

evankanderson
Copy link

Adds a generic SQL leader election module, as discussed in #3219; it sounds like Sigstore is looking at possibly using this for simplicity.

Fixes #3219

Note that I haven't yet wired this up into Trillian nor added database table declarations -- I'm happy to do that in this PR or a follow-up. I'm not including this in the changelog until it's wired in, but I wanted to get feedback on how to do that first.

Checklist

@evankanderson evankanderson requested a review from a team as a code owner January 27, 2024 00:13
@evankanderson evankanderson requested a review from phbnf January 27, 2024 00:13
util/election2/sql/election.go Outdated Show resolved Hide resolved
}
defer tx.Rollback()
row := tx.QueryRow(
"SELECT leader, last_update FROM leader_election WHERE resource_id = $1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the $<num> format be problematic in mysql?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, switched to ?, which probably makes postgres-likes unhappy, but since sqlite is happy with both and there's already a MySQL backend for trillian, that makes sense.

}

func (e *Election) initializeLock(ctx context.Context) error {
insert, err := e.db.Prepare("INSERT INTO leader_election (resource_id, leader, last_update) VALUES ($1, $2, $3)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we need to handle the case where more than one replica executes an initialization at the same time? Or is the idea is to crash loop, restart and continue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder to actually handle the error in NewElection. The idea is that leader_election has a primary-key index on resource_id, so attempting a second INSERT with the same resource_id will simply fail with a conflict. But we should handle other SQL errors if there's a good way to do so.

Note that we also don't have a good way to clean up resource_id rows after they've been created... that's probably not a problem, as resource_ids are probably pretty stable, but still...

Copy link
Author

@evankanderson evankanderson Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, fixing the lint errors and adding a test to cover this case, I realize that "INSERT had a conflict" is not a standardized error, so I need to do the query first to look for sql.ErrNoRows, which is standardized. ¯\_(ツ)_/¯

@JAORMX JAORMX requested review from roger2hk and AlCutter January 28, 2024 08:58
@roger2hk
Copy link
Contributor

/gcbrun

@JAORMX
Copy link
Collaborator

JAORMX commented Jan 29, 2024

/gcbrun

@evankanderson
Copy link
Author

I fixed the remaining lint issue and added what I think is the right integration hooks. I'm still working on running the integration tests -- in particular, adding a test that uses the --election_backend=mysql option.

@evankanderson
Copy link
Author

/gcbrun

@@ -47,13 +47,15 @@ import (
"github.com/google/trillian/util/election"
"github.com/google/trillian/util/election2"
etcdelect "github.com/google/trillian/util/election2/etcd"
"github.com/google/trillian/util/election2/sql"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment relates to the package of the new code. I recommend mysql over sql to be clear at which implementation it is targeting, and for consistency with the flag names used in this file to pick it.

Copy link
Collaborator

@JAORMX JAORMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JAORMX
Copy link
Collaborator

JAORMX commented Jan 30, 2024

Ah! Linter found something

@JAORMX
Copy link
Collaborator

JAORMX commented Jan 30, 2024

/gcbrun

@@ -71,6 +73,7 @@ var (
numSeqFlag = flag.Int("num_sequencers", 10, "Number of sequencer workers to run in parallel")
sequencerGuardWindowFlag = flag.Duration("sequencer_guard_window", 0, "If set, the time elapsed before submitted leaves are eligible for sequencing")
forceMaster = flag.Bool("force_master", false, "If true, assume master for all logs")
electionBackend = flag.String("election_backend", "etcd", fmt.Sprintf("Election backend to use. One of: mysql, etcd, noop"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to electionSystem to match with quotaSystem

@@ -143,13 +146,22 @@ func main() {
instanceID := fmt.Sprintf("%s.%d", hostname, os.Getpid())
var electionFactory election2.Factory
switch {
case *forceMaster:
case *forceMaster || *electionBackend == "noop":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not fantastic that we can now control this behavior with two different flags, but I also cannot thing of a better way forward for now. On the long run, we could / should probably deprecate forceMaster. No action required!

currentLeader leaderData
leaderLock sync.Cond

// If a channel is supplied with the cancel, it will be signalled when the election routine has exited.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, cancel is always created by Await, and Resign always sends a message there? I'm not sure I understand what's conditional.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this before realizing that I could re-use Resign for Close, and I think one of those did not wait for the error in the first implementation. I can remove the comment, and change the type from *chan error to chan error probably...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha - makes sense to me to remove the comment then yes!

}

// Close implements election2.Election
func (e *Election) Close(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is essentially a wrapper around Resign, this election object could be re-used for a new election, which is slightly at odds with the interface definition. In practice, I don't think anything relies on this behaviour, and there aren't any resources that can be released, so I think it's safe. Does that match your understanding?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Do you want me to make the object unusable, to catch errors that might occur by calling other methods after Close?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can think of a way of an elegant way of doing this, sure thing. I don't think it's super super important, it's just some extra care to make sure it's aligned with the interface definition :)

Comment on lines +100 to +101
// WithMastership implements election2.Election
func (e *Election) WithMastership(ctx context.Context) (context.Context, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the inclusivity and diversity in mind, may I suggest replacing master with another inclusive term?

https://developers.google.com/style/word-list#master

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rewrite the election2.Election interface if you'd like. 😁

Otherwise, I think those are the only two instances of the word "master" rather than "leader".

@vipulagarwal
Copy link

@evankanderson thanks for this PR as it simplifies Sigstore dependencies. Do we have any plans to merge this PR any soon? Thanks

@evankanderson
Copy link
Author

Sorry, I got distracted by this by life. If this is still of interest, I can try to revive it after my camping break this week -- maybe by around 14 July or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Leader election using SQL database?
6 participants