Skip to content
This repository has been archived by the owner on Nov 24, 2022. It is now read-only.

report backchannel events #57

Merged
merged 6 commits into from
Oct 7, 2021
Merged

report backchannel events #57

merged 6 commits into from
Oct 7, 2021

Conversation

merlinran
Copy link
Contributor

No description provided.

Copy link
Contributor Author

@merlinran merlinran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solicit some thoughts



message BidBotEvent {
google.protobuf.Timestamp ts = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this considering: 1) it may take longer than anticipated time for the event to arrive auctioneer; 2) the local time where bidbot runs may not be synchronized. Both are very unlikely, so not sure if it's worth adding this field for every message.

Imagine a timeline of events for a bid showing up on the UI, for consistency we may prefer to show the timestamp the events being received by auctioneer. your thoughts @asutula ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine a timeline of events for a bid showing up on the UI, for consistency we may prefer to show the timestamp the events being received by auctioneer.

💯

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's better to over collect here so we have the options of displaying either or both of the timestamps (bidbot and auctioneer).

string semantic_version = 1;
string storage_provider_id = 2;
uint64 deal_start_window = 3;
bool cid_gravity_configured = 4;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few flags we care about most. anything missing?

Copy link
Contributor

@jsign jsign Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving storage_provider_id to the top level BidBotEvent may make life easier in any way for auctioneer?
At least is self-contained and we don't need PeerID mappings to understand from where is an event (I mean for other oneof that don't contain the provider id).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point. I'll see if auctioneer need this field for storing data or logging when implementing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea good point @jsign. I was also wondering how auctioneer would know who is sending these events? Is there any auth mechanism? If we already are using libp2p and we know who the connected bitbots are there, would it be better to leverage that channel instead of grpc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, or are these protos going to be sent over libp2p? Maybe I just answered my own question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are sent over libp2p.

oneof type {
StartUp start_up = 2;
StartFetching start_fetching = 3;
ErrorFetching error_fetching = 4;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no DoneFetching because it's immediately followed by StartImporting

string bid_id = 1;
uint32 attempts = 2;
};
message EndImporting {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EndImporting can be either success or error.

Signed-off-by: Merlin Ran <merlinran@gmail.com>
Signed-off-by: Merlin Ran <merlinran@gmail.com>
Signed-off-by: Merlin Ran <merlinran@gmail.com>
@merlinran merlinran force-pushed the merlin/backchannel-event branch from 9e9cfaf to 5769ff6 Compare October 5, 2021 19:40
Signed-off-by: Merlin Ran <merlinran@gmail.com>
@merlinran merlinran force-pushed the merlin/backchannel-event branch 4 times, most recently from 25ccef2 to 1e55743 Compare October 6, 2021 13:49
Signed-off-by: Merlin Ran <merlinran@gmail.com>
@merlinran merlinran force-pushed the merlin/backchannel-event branch 4 times, most recently from 8df3dba to 213f80b Compare October 7, 2021 00:02
@merlinran merlinran changed the title add protobuf definitions for the backchannel events report backchannel events Oct 7, 2021
@merlinran merlinran force-pushed the merlin/backchannel-event branch 2 times, most recently from 9b40580 to 2c1d907 Compare October 7, 2021 00:41
Signed-off-by: Merlin Ran <merlinran@gmail.com>
@merlinran merlinran force-pushed the merlin/backchannel-event branch from 2c1d907 to 411dd01 Compare October 7, 2021 00:58
Copy link
Contributor Author

@merlinran merlinran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does the following:

  1. Added protobuf definitions about the events. already reviewed.
  2. Added a global topic for the events.
  3. Refactored the code to extract the libp2p pubsub code to a separate file, for easier adding features.
  4. Finally, code to fire up the events.

While testing with make up, I noticed a weird issue - the first auction ran fine, but the subsequent ones kept failing, because auctioneer timed out getting the response to the winning bid message, even though bidbot apparently sent it back. Eventually I found out it is not new in this PR, but rather introduced somewhere between v0.1.0 and the current main tip. Will debug as the first thing as it could affect miners if it's a real issue.

@@ -40,7 +39,6 @@ require (
github.com/textileio/crypto v0.0.0-20210929130053-08edebc3361a
github.com/textileio/go-datastore-extensions v1.0.1
github.com/textileio/go-ds-badger3 v0.0.0-20210324034212-7b7fb3be3d1c
github.com/textileio/go-ds-mongo v0.1.4
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const Topic string = "/textile/auction/0.0.1"

// BidbotEventsTopic is used by bidbots to notify auctioneers various events, mainly around the lifecycle of bids.
var BidbotEventsTopic string = path.Join(Topic, "bidbot_events")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I created one topic per bidbot, but it's tricky because auctioneer need to subscribe to them before they fire up the startup message. Using a single topic is much easier.

bool cid_gravity_configured = 4;
bool cid_gravity_strict = 5;
};
message Unhealthy {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this one. Can be extended to other problem reporting in the future. Don't like the name though

)

// CommChannel represents the communication channel with auctioneer.
type CommChannel interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted out the pubsub code along the way. I've long thought about doing this, to avoid cluttering business logic with communication details. I didn't touch the tests but later on if we add more complex logic, we can test them without firing up libp2p pubsub instances. Worth a better name though.

"google.golang.org/protobuf/types/known/timestamppb"
)

type progressReporter struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty boring. just compose and fire up the protobuf messages.

@@ -163,7 +163,6 @@ type Service struct {

ctx context.Context
finalizer *finalizer.Finalizer
lk sync.Mutex
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lock was removed when extracting the libp2p pubsub code. It was for locking Subscribe, but it's only called once, so became useless.

s.peer.Bootstrap()
err := s.commChannel.Subscribe(bootstrap, s)
if err == nil {
time.AfterFunc(30*time.Second, s.reportStartup)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasted me hours trying to find a way to see if auctioneer is connected. no luck, falls back to this hack. If bidbot doesn't hold for 30s, it probably shouldn't be considered started up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I might not be totally understanding, but can't host.Network().Connectedness(auctioneersPeerID) help answer that question?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know this one actually, thanks! Follow this path I see I can also register a Notifiee via host.Network().Notify() to avoid polling, but that involves more code. Maybe not bad to lean on the simpler side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad it was useful! Yeah, I'd lean on the simpler side. Not a big deal.

pcid, err := cid.Decode(prop.ProposalCid)
if err != nil {
log.Errorf("decoding proposal cid: %v", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error is printed by the caller of ProposalsHandler now

}
log.Infof("bidbot up %v, %d connected peers",
time.Since(startAt), len(s.peer.ListPeers()))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not very useful - just puzzled miners. Removed this and renamed the method.

@@ -554,6 +571,7 @@ func (s *Store) fetchWorker(num int) {
if b.DataURIFetchAttempts >= s.dealDataFetchAttempts {
status = BidStatusErrored
s.bytesLimiter.Withdraw(string(b.AuctionID))
s.dealProgressReporter.Errored(b.ID, b.ErrorCause)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are where the events originated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@merlinran merlinran marked this pull request as ready for review October 7, 2021 01:13
Copy link
Contributor

@jsign jsign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Looking good.

@merlinran, I'm wondering if worth tagging a version before merging this PR. At least to have a version for the closed auctions that we know some miners are running now and looks safe; and leave these changes for the next version.

s.peer.Bootstrap()
err := s.commChannel.Subscribe(bootstrap, s)
if err == nil {
time.AfterFunc(30*time.Second, s.reportStartup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I might not be totally understanding, but can't host.Network().Connectedness(auctioneersPeerID) help answer that question?

@@ -554,6 +571,7 @@ func (s *Store) fetchWorker(num int) {
if b.DataURIFetchAttempts >= s.dealDataFetchAttempts {
status = BidStatusErrored
s.bytesLimiter.Withdraw(string(b.AuctionID))
s.dealProgressReporter.Errored(b.ID, b.ErrorCause)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@merlinran
Copy link
Contributor Author

Good point on cutting a release. Will do after figuring out the weirdness of make up and before merging this.

@merlinran
Copy link
Contributor Author

Okay cut off v0.1.1 for the closed auction @jsign , and verified the make up issue is something with my auction system changes (not found out yet), irrelevant to bidbot (glad that the miners are unaffected!). Merging!

@merlinran merlinran merged commit 32d70a4 into main Oct 7, 2021
@merlinran merlinran deleted the merlin/backchannel-event branch October 7, 2021 14:53
@jsign
Copy link
Contributor

jsign commented Oct 7, 2021

Okay cut off v0.1.1 for the closed auction @jsign , and verified the make up issue is something with my auction system changes (not found out yet), irrelevant to bidbot (glad that the miners are unaffected!). Merging!

Excellent, I'll mention that tag in the notion doc where we refer to the feature. Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants