Design and Connectivity Patterns

Partitioning workloads

Modularize application to functional units.
Each module
- Handles portion of application's overall functionality
- Represents set of related concerns.
Why?
- Easier to design both current & future iterations of your application.
- Modules can also be tested & distributed and otherwise verified in isolation.

Application traffic or load is distributed among various endpoints by using algorithms.
Allows
- Multiple instances of the website can be created
- They can behave in a predictable manner
- Flexibility to grow or shrink the number of instances in application without changing the expected behavior
Load balancing strategy considerations
- Physical vs Virtual Load balancers
  - Use Virtual Load Balancers (hosted in VMs) if company requires a very specific configuration.
- Load balancing algorithm
  - round robin => Selects next instance for each request based on a predetermined order that includes all of the instances.
  - random choice
- Configurations
  - Affinity/stickiness: If subsequent requests from the same client machine should be routed to the same service instance.
  - Required when application has state.

Leads to more resilient applications.
Implemented in .NET libraries (Entity Framework, Azure SDK etc)
Transient errors=> occur due to temporary interruptions in the service or to excess latency.
Many are self-healing and can be resolved with retry policy
Retry policy
- Retry when a temporary failure occurs.
- A break in the circuit => abort retries if it's a serious issue.

Provides a degree of consistency regardless of the behavior of the modules.
Direct method invocation
- Connection is severed on transient errors
- Use 3rd party queue to persist the requests beyond a temporary failure.
  - Allows you to audit failing requests independently.

Cloud applications must be sensitive to transient faults.
- E.g. loss of network connectivity, the temporary unavailability of a service, timeouts that arise when a service is busy.
They're typically self-correcting, if the action that triggered a fault is repeated after a suitable delay, it's likely to be successful.
- DB with too many concurrent requests can have throttling (fails until workload is eased). Fixes itself after some delay.
Solution: Retry for temporarily fails.
- Remote service => retry after short wait.
  - Fails again => Limit attempts to avoid brute forcing retry again until maximum tries are reached.
    - to spread requests from multiple instances of the application as evenly as possible.

Sudden large number of requests may cause unpredictable workload.
Single consumer => risk of being flooded, or messaging system being overloaded.
Solution: asynchronous messaging with variable quantities of message producers and consumers
- Business logic in the application is not blocked while the requests are being processed.
- Handle fluctuating workloads => system can run multiple instances of the consumer service.

Problem: Cached data consistency
- A strategy is needed to ensure that the data is up-to-date & handle situations where the data in cache has become stale.
Solution: read-through and write-through caching
- Cache-aside => Effectively loads data into the cache on demand if it's not already available in the cache.
  - Not in cache? Fetch & add it to cache, modifications on cache=> write to data store.

Problem: hosting large volumes of data in a traditional singe-instance store
Some limitations
- Storage space: Upgrading disks is not easy.
- Computing resources: It's not possible to always increase more
- Network bandwidth: Network traffic might exceed
- Geography: Reduce latency of data access for different across regions.
Scaling up can postpone affects but only temporary solution
Solution: partitioning data horizontally across many nodes
- Divide data store into horizontal partitions, or shards.
- Shard: Same schema but distinct subset of data.
- Sharding can be in data access code, or storage system with transparent sharding
  - Abstracting physical location => High level of control over which shard contain which data.
    - Easier to migrate between shard without touching application logic.
    - Tradeoff => Additional data access overhead to determine the location of each data item as it's retrieved
- For optimal performance & scalability
  - Split data in a way that's appropriate for the types of queries the application performs.
  - Sharding schema will exactly match requirements of every query.
  - E.g.:
    - In multi-tenant system => You lookup with tenant id + e.g. tenant's name, Tenant's name = sharding key