# SP Selection Improvement #545
@jcace I'm personally the biggest fan of the automated approach, number 2. We can find ways to prevent the scalability issues, e.g. stochastic sampling of the SP pool plus periodic resampling (see the sketch below). It doesn't sound too hard to write and only requires an Estuary code change, meaning brand-new Estuary nodes and SPs will get the benefit by default, and no one needs to track anything manually. |
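A minimal sketch of that sampling idea, under my own assumptions (the pool type and benchmark hook are illustrative, not Estuary code):

```go
// Hypothetical stochastic SP sampling with periodic resampling: benchmark
// only k random SPs at a time so measurement cost stays O(k), not O(pool).
package spselect

import (
	"math/rand"
	"time"
)

type SP struct{ Addr string }

type SPPool struct {
	all    []SP
	sample []SP
}

// Resample draws k random SPs from the full pool.
func (p *SPPool) Resample(k int) {
	rand.Shuffle(len(p.all), func(i, j int) { p.all[i], p.all[j] = p.all[j], p.all[i] })
	if k > len(p.all) {
		k = len(p.all)
	}
	p.sample = append([]SP(nil), p.all[:k]...)
}

// Run repeatedly benchmarks the current sample, resampling on an interval.
func (p *SPPool) Run(k int, interval time.Duration, bench func([]SP)) {
	for {
		p.Resample(k)
		bench(p.sample)
		time.Sleep(interval)
	}
}
```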
I certainly like the automated approach. I think we can apply the same logic for miner and shuttle selection: we use the multiaddresses for miners, and the Equinix API for the shuttles (I have this on my metrics API). We can get the long/lat of each address and use that as the basis. I imagine the input/output looking roughly like the sketch below.
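A hypothetical shape for that lookup; the function body and the geo source are placeholders, and none of these names are from Estuary:

```go
// Hypothetical lookup: miner multiaddress in, coordinates out.
package spselect

import "errors"

type Location struct {
	Lat, Lon float64
	City     string
	Country  string
}

// LocateMiner resolves the host in a multiaddress and geolocates it.
//
//	input:  "/ip4/1.1.1.1/tcp/1111"
//	output: Location{Lat: 43.65, Lon: -79.38, City: "Toronto", Country: "CA"} (illustrative)
func LocateMiner(multiaddr string) (Location, error) {
	// 1. parse the IP out of the multiaddress
	// 2. query the chosen IP->geo source for coordinates
	return Location{}, errors.New("not implemented")
}
```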
I imagine that if we have this, we can use it as a way to reliably look up miners for making either storage or retrieval deals. Let me know your thoughts. |
I like both approaches tbh, however if the automated solution is smart enough, we may not need the first. The scoring could probably be some combination of IP geolocation and ping/historical performance stats, with weightings as we see fit. We might also be able to make these configurable by the Estuary administrators, for example (see the sketch below):
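A minimal sketch of such a weighted score, assuming the weights would come from admin configuration (all names are illustrative):

```go
// Hypothetical weighted SP score combining geolocation and performance
// metrics; weights are imagined to be admin-configurable.
package spselect

type Weights struct {
	Ping    float64 // e.g. 0.5
	Geo     float64 // e.g. 0.2
	History float64 // e.g. 0.3
}

type Metrics struct {
	PingScore    float64 // 0..1, lower latency -> higher score
	GeoScore     float64 // 0..1, closer region -> higher score
	HistoryScore float64 // 0..1, e.g. past deal success rate
}

// Score collapses normalized metrics into a single ranking value.
func Score(w Weights, m Metrics) float64 {
	return w.Ping*m.PingScore + w.Geo*m.GeoScore + w.History*m.HistoryScore
}
```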
|
I think we can start working on a standalone library for this, sp-selection-service, and start just building the scaffolding. I assume you would need a DAO to access the DB to get the miner list. I was thinking we should have the core functions (and expose them via endpoints), along the lines of the sketch below.
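A hypothetical sketch of that scaffolding; the route, handler, and DAO shapes here are my own illustration, not the actual proposed endpoint list:

```go
// Hypothetical scaffolding for an sp-selection-service.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// MinerDAO abstracts DB access for the miner list.
type MinerDAO interface {
	ListMiners() ([]string, error)
}

// stubDAO stands in for a real DB-backed implementation.
type stubDAO struct{}

func (stubDAO) ListMiners() ([]string, error) {
	return []string{"f01234", "f05678"}, nil
}

func main() {
	dao := MinerDAO(stubDAO{})
	// e.g. GET /api/v1/miners returns the selectable miner list; an
	// Estuary-Auth middleware would wrap this in the real service.
	http.HandleFunc("/api/v1/miners", func(w http.ResponseWriter, r *http.Request) {
		miners, err := dao.ListMiners()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(miners)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```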
*Shuttles will have the same, and we can use it as an alternative to the "/viewer" endpoint. The inputs can be configurable by admin. We can add more param inputs and "protect" this endpoint with Estuary-Auth. Let me know your thoughts. |
Do we actually care where the SPs are located? For example, imagine:

- SP A: geographically distant from the shuttle, but with a fast, high-bandwidth connection
- SP B: geographically close to the shuttle, but with a slower connection

In this scenario, do we want to give SP B additional weighting because of its geographical location, despite it probably having worse download performance? Are we trying to provide a method to localize the data within a certain region, to support client needs?

Reason for considering this: the map-based geolocation forces us to rely on an external service for the IP -> coordinates mapping (e.g. ipinfo.io, ipgeolocation.io, etc.). While free for < 150K requests per month, it requires a subscription if we start doing lots of queries, and will require us to provision an API key. I couldn't find any reasonable open-source methods to do that, so it introduces a bit of centralization. Doing a strict ping/bandwidth-based selection simplifies this, as all we care about is bandwidth between the shuttle and the target SP (this metric also pertains to deal transfer / retrieval times). We forego the concept of geographical location, but I presume that it will be captured by the bandwidth tests anyway. It also simplifies things from the white-label / Estuary-fork perspective, if standing up a new instance of Estuary does not require an ipinfo API key to be provisioned. Thoughts? |
I like the microservices idea, but I wonder if it makes sense to have a single endpoint (maybe GraphQL?) giving a list of all providers with their associated stats. We could come up with some input parameters for the request (e.g. location) to help filter it down, and perhaps put in some pagination too. For example:
Returns:

```
[
  {
    "id": "f01234",
    "addr": "/ip4/1.1.1.1/tcp/1111",
    "deals": {
      "open": 23,
      "sealed": 412,
      "slashed": 6
    },
    "location": {
      "lat": -41.123,
      "lon": 110.456,
      "city": "NY",
      "region": "NY",
      "country": "USA"
    },
    "uptime_score": 0.9923, // % uptime based on some polling metric
    "shuttle_connections": [
      {
        "name": "shuttle1",
        "addr": "/ip4/2.2.2.2/tcp/2222",
        "ping": 64.21
      }
      ...
    ]
  }
  ...
]
```

The SP sorting/selection could then be built on top of this. |
Had a call with @alvin-reyes @en0ma @jlogelin:
|
Did some research re: the geo-location aspect of this design.
I think the mapping of SP -> country could be useful as a filter parameter. There are some real use cases where data residency matters, and being in the same country likely means there's good network connectivity. However, I don't think it should be our only data point, as it can be wrong in some cases and irrelevant in others. I think the ping-based metrics for determining network speed are still needed. What are everyone's thoughts on this approach to implementing the geolocation feature (country only)? It will require some work to port over the library and add IPv6 support. |
Did a bunch of research and testing recently, and formulated a revised problem statement and plan. Let me know what you think!

### Problem

Currently, Estuary lacks a system to route data streams in an intelligent way that maximizes data transfer speeds. This results in suboptimal, highly variable upload speeds for clients adding content to Estuary, and during dealmaking when files are transferred to Storage Providers.

### Solution

All file transfers (whether HTTPS upload or libp2p) use TCP as the underlying transport protocol. As TCP is a stateful protocol, latency has a direct correlation with upload speed: for a given TCP window size W, achievable throughput is bounded by roughly W/RTT, so a higher round-trip time directly caps the transfer rate. For instance, data transfer over a connection with 30 ms RTT may cap out at over 300 Mbit/s, whereas a connection with 150 ms RTT would be scaled back to 140 Mbit/s. This is a more than 50% reduction in transfer speed, and is very possible given the global distribution of Estuary nodes and participants in our ecosystem.

Thus, to solve this problem, we need to measure the latency (ping) between nodes in the Estuary platform and choose the lowest-latency destination. For content uploads, this means clients should ping all of our shuttles and upload to the one with the lowest latency. For dealmaking, this means our Storage Providers should also be pinged, and selected based on their latency to the shuttles.

### How?

#### Storage Providers

Storage Providers will be given a simple, open-source Go program or shell script to execute on their node. This program pings all of our shuttles, orders them from lowest latency up, and makes a call to an Estuary API endpoint to record the ordering.

Example script output (average ping in ms) from running on my SP node f01886797:

```
┌────────────────────────┬────────┐
│ shuttle-4.estuary.tech │ 84.00 │
│ shuttle-5.estuary.tech │ 86.00 │
│ shuttle-6.estuary.tech │ 114.80 │
│ shuttle-7.estuary.tech │ 104.33 │
│ shuttle-8.estuary.tech │ 111.00 │
│ shuttle-1.estuary.tech │ 28.00 │
│ shuttle-2.estuary.tech │ 85.00 │
│ shuttle-3.estuary.tech │ 89.00 │
└────────────────────────┴────────┘
```
Shuttle precedence order:

```
1 | shuttle-1.estuary.tech
2 | shuttle-4.estuary.tech
3 | shuttle-2.estuary.tech
4 | shuttle-5.estuary.tech
5 | shuttle-3.estuary.tech
6 | shuttle-7.estuary.tech
7 | shuttle-8.estuary.tech
8 | shuttle-6.estuary.tech
```

On the Estuary side, we'll "pivot" the data, assigning each SP node to a priority bucket/order based on each shuttle. The data will look something like the sketch below.
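A hypothetical illustration of that pivoted structure; the type and field names are mine, not from the proposal:

```go
// Hypothetical pivoted view: for each shuttle, SPs grouped into priority
// buckets derived from the precedence orders the SPs reported.
package spselect

// PriorityBuckets maps a shuttle handle to SP IDs, bucketed by priority:
// buckets[0] holds the SPs that ranked this shuttle first, buckets[1]
// the SPs that ranked it second, and so on.
type PriorityBuckets map[string][][]string

var example = PriorityBuckets{
	"shuttle-1.estuary.tech": {
		{"f01886797", "f01111"}, // priority 0: try these SPs first
		{"f02222"},              // priority 1: fall back to these
	},
	"shuttle-4.estuary.tech": {
		{"f03333"},
		{"f01886797"},
	},
}
```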
When making deals, the priority 0 / best SPs can be attempted first, before moving to the next priority bucket, and so on.

#### Note 1

Only the shuttle precedence order will be captured. Ultimately, we only need to know the ordering, not the exact latency values. As the ping is initiated client-side, we don't want any incentive for SPs to "forge requests", attempting to game the system by setting all latencies to 0 ms, for instance.

#### Note 2

Why client-side? Why not initiate the ping request server-side? In short, because it's less complicated. If we initiate pings from our side, then the burden is on SPs to ensure their networks are configured properly to respond to ICMP requests on their public-facing router/firewall. They may not want, or be permitted, to enable this functionality for security reasons, and it will result in more administrative overhead for the Estuary team as we support/troubleshoot SPs that do not show up in our system. Initiating the connection from the SP side will work every time, as we control the server side of the equation. The SP experience could be as simple as asking them to run a single command.
This is a fairly normal thing to ask Storage Providers to do. We have tons of other services that we download and run to interface with different marketplaces, like bidbot and FilSwan Provider, and Evergreen/Spade requires custom scripting to pull deals from its API.

#### Clients

The client side of the equation is quite straightforward. We'll use the same principle to benchmark connectivity between the client and the shuttles, and pick the one with the lowest latency to direct the file upload towards. This can be done in the browser, in the background, without any input from the user. I've verified that the principle works with a simple HTML page: browser-based latency benchmarking is possible with a simple websocket (a shuttle-side sketch follows below); ICMP ping will not work, as it can't be initiated from browser JavaScript.

This shuttle-ordering information could be added to the user's account metadata, so it can be automatically taken into account when direct API calls are made and the UI is bypassed.
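A minimal sketch of the shuttle-side piece, assuming the gorilla/websocket package; the browser would open a websocket to this endpoint, send timestamped messages, and measure how long each echo takes (the route name is illustrative):

```go
// Hypothetical shuttle-side websocket echo endpoint for browser latency
// benchmarking: the browser computes RTT from each echoed message.
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	// Allow cross-origin pings from the Estuary UI; tighten in production.
	CheckOrigin: func(r *http.Request) bool { return true },
}

func echoHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()
	for {
		// Echo every message straight back; the client timestamps sends
		// and computes RTT when the echo arrives.
		mt, msg, err := conn.ReadMessage()
		if err != nil {
			return
		}
		if err := conn.WriteMessage(mt, msg); err != nil {
			return
		}
	}
}

func main() {
	http.HandleFunc("/ping", echoHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```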
#### Low-level task breakdown

### Why not geolocation?

During initial investigation into this problem, we considered using traditional geolocation data points (country/city/region, latitude and longitude) to make the routing decisions. This approach would utilize an IP address <-> geolocation service. It is suboptimal for the reasons discussed earlier in this thread: it ties every Estuary instance to an external, paid IP-geolocation service (and an API key that must be provisioned), and the resulting location data can be wrong in some cases and irrelevant in others.
Geographic location is useful information, as data residency requirements do exist in certain industries and countries. That could be a valuable use case to support, but it is a different problem from the one we're trying to solve here. The Filecoin Spade (Slingshot/Evergreen) program is an example of one that does use SP geographic information to ensure geographic distribution of stored files; however, signing up to participate requires official documentation (datacenter lease, Internet contract) to make this attestation, which introduces a considerable amount of administrative overhead.

Opinion/conclusion: as I see it, Estuary is currently an international, borderless ecosystem. We don't need to know or control whether data is going to a certain geographical region, and doing so doesn't provide the optimal SP/client experience. Our only driver right now is to maximize the speed at which clients can upload data, and the speed at which that data makes it onto the Storage Providers. That's best accomplished by optimizing for network latency above all else. |
@jcace - it looks like we can interrogate the SPs' latency through our libp2p node (Estuary main and/or shuttles). See https://filecoinproject.slack.com/archives/C016APFREQK/p1670270536095929?thread_ts=1670270103.300649&cid=C016APFREQK |
Just tested the libp2p ping approach and it works. This allows us to ping SPs from our side, as it does not require any additional configuration from them if they have Lotus set up properly. Will rework the game plan when I create tasks to use this approach instead. |
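For reference, a minimal sketch of pinging an SP over libp2p with go-libp2p's standard ping protocol; the target multiaddress is a placeholder:

```go
// Hypothetical libp2p ping of an SP from the Estuary side.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/p2p/protocol/ping"
	ma "github.com/multiformats/go-multiaddr"
)

func main() {
	host, err := libp2p.New()
	if err != nil {
		log.Fatal(err)
	}
	defer host.Close()

	// Placeholder SP multiaddress; in practice this would come from the
	// miner's published peer info.
	addr, err := ma.NewMultiaddr("/ip4/1.2.3.4/tcp/24001/p2p/12D3KooW...")
	if err != nil {
		log.Fatal(err)
	}
	info, err := peer.AddrInfoFromP2pAddr(addr)
	if err != nil {
		log.Fatal(err)
	}
	if err := host.Connect(context.Background(), *info); err != nil {
		log.Fatal(err)
	}

	// Ping returns a channel of results; each result carries the measured RTT.
	res := <-ping.Ping(context.Background(), host, info.ID)
	if res.Error != nil {
		log.Fatal(res.Error)
	}
	fmt.Println("RTT:", res.RTT)
}
```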
Expanding on the SP side of @alvin-reyes's issue on shuttle selection improvement:
### Problem
For miners, there is no way to optionally specify their location, which could give them a better chance of sealing a deal for large content from a more favourable (nearest) shuttle.
This results in an issue where SPs can receive a lot of deals from an Estuary shuttle that they have a very slow/limited connection to. Those deals have no chance of completing, and the available bandwidth is consumed trying to stream many deals all at once.
Current geographic distribution of SPs (blue) and Estuary shuttles (red):
Performance issues arise when a shuttle streams data to a distant SP (in the example above, an SP node in Vancouver being dealt 138 deals from Shuttle-5, which is in Tokyo).
When a fast shuttle streams deals to a miner it's a beautiful sight 🥹. Downloads complete in mere minutes.
### Potential Solution
#### Model Update: Shuttle Preference
The StorageMiner struct should be updated to contain an additional field, ShuttlePreference. This would be a slice containing shuttle identifiers, ordered from most preferred to least preferred (sketched below).
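A minimal sketch of that model change; only the new field is from the proposal, and the surrounding fields are assumptions, not Estuary's actual StorageMiner definition:

```go
// Hypothetical model update: StorageMiner gains a ShuttlePreference slice.
package spselect

type StorageMiner struct {
	ID      uint
	Address string // miner address, e.g. "f01234"

	// ShuttlePreference lists shuttle identifiers from most preferred
	// to least preferred, e.g. {"shuttle-1", "shuttle-4", "shuttle-2"}.
	ShuttlePreference []string
}
```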
#### Replication: Miner Selection
Currently, SP selection for dealmaking is essentially random. The miner selection code should be updated to consider shuttle preference (see the sketch below).
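A hypothetical sketch of preference-aware selection, continuing the StorageMiner sketch above: given the shuttle that holds the content, rank miners by where that shuttle appears in their ShuttlePreference (function names are illustrative):

```go
package spselect

import "sort"

// rankOf returns the position of shuttle in the miner's preference list,
// or a large sentinel when the miner expressed no preference for it.
func rankOf(m StorageMiner, shuttle string) int {
	for i, s := range m.ShuttlePreference {
		if s == shuttle {
			return i
		}
	}
	return 1 << 30 // unranked miners sort last
}

// pickMiners orders candidate miners for a deal originating from shuttle,
// most preferred first; today's behaviour is effectively a random pick.
func pickMiners(miners []StorageMiner, shuttle string) []StorageMiner {
	out := append([]StorageMiner(nil), miners...)
	sort.SliceStable(out, func(i, j int) bool {
		return rankOf(out[i], shuttle) < rankOf(out[j], shuttle)
	})
	return out
}
```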
#### Determining ShuttlePreference
##### Manual Approach
Initially, this can be accomplished with a simple UI for the miner to order shuttles on their Estuary dashboard, e.g. a reorderable list component.
The list would contain shuttle identifiers, IP addresses, and geographic regions. That way, miners could self-order by geographic proximity, but could also perform ping tests if desired and order based on network speed.
On the backend, a protected API endpoint could be added under `/miners/shuttle-preference` to allow the preference to be specified programmatically.

##### Automated Approach
A couple of ideas for automating the ShuttlePreference:

1. Build a benchmarking tool, `estuary-shuttle-bench`, and make it available for SPs to download and execute on their node. The tool would ping / run a speed test against each of the shuttles, order them, and make a call to the `/miners/shuttle-preference` API to set the shuttle preference automatically (a rough sketch follows below).
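A hypothetical sketch of such a tool, measuring latency via TCP dial time and posting the resulting order; the shuttle list, API host, and auth header are assumptions:

```go
// Hypothetical estuary-shuttle-bench: measure TCP connect latency to each
// shuttle, sort fastest first, and report the ordering to the Estuary API.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net"
	"net/http"
	"os"
	"sort"
	"time"
)

var shuttles = []string{ // assumed shuttle hostnames
	"shuttle-1.estuary.tech",
	"shuttle-2.estuary.tech",
	"shuttle-3.estuary.tech",
}

// dialLatency approximates ping using TCP connection setup time on :443.
func dialLatency(host string) (time.Duration, error) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", host+":443", 5*time.Second)
	if err != nil {
		return 0, err
	}
	conn.Close()
	return time.Since(start), nil
}

func main() {
	type result struct {
		Host string
		RTT  time.Duration
	}
	var results []result
	for _, s := range shuttles {
		rtt, err := dialLatency(s)
		if err != nil {
			log.Printf("skipping %s: %v", s, err)
			continue
		}
		results = append(results, result{s, rtt})
	}
	sort.Slice(results, func(i, j int) bool { return results[i].RTT < results[j].RTT })

	order := make([]string, len(results))
	for i, r := range results {
		order[i] = r.Host
	}

	// POST the ordering to the (proposed) preference endpoint.
	body, _ := json.Marshal(map[string][]string{"shuttle_preference": order})
	req, _ := http.NewRequest("POST", "https://api.estuary.tech/miners/shuttle-preference", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("ESTUARY_TOKEN"))
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("preference set, status:", resp.Status)
}
```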