Fundamentals of Distributed Systems

This article is derived from Designing Data-Intensive Applications.

Most modern systems are data-intensive rather than compute-intensive: raw compute, while important, is rarely the limiting factor for applications.

Almost all applications store, process and retrieve data in different ways, for various periods under different circumstances. Examples:

Databases store and retrieve user data.
In-memory caches store computed results for faster retrieval.
Queues store data to be processed.
Data warehouses store data so it can be processed by analytical workloads.
The list goes on & on…

To reason about these data systems, we need a few key terms & metrics.

Reliability

The system keeps working as expected when a fault occurs. These faults can be software faults, such as bugs, or hardware faults, such as disk failure. A system that can deal with faults under reasonable constraints is said to be fault-tolerant.

Scalability

Scalability is the ability of a system to keep working under increased load without its performance deteriorating. Load parameters are the variables used to describe load, since load can mean different things in different contexts. For example, a common load parameter for backend systems is the number of requests per second.
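As a rough illustration of a load parameter, the sketch below derives peak requests per second from a list of request arrival times. The function name and the sample timestamps are hypothetical, and bucketing by whole seconds is just one simple way to count.

```python
from collections import Counter


def peak_requests_per_second(timestamps):
    """Return the highest number of requests seen in any one-second bucket."""
    # Bucket each request by the whole second in which it arrived.
    per_second = Counter(int(ts) for ts in timestamps)
    # An empty log means no load at all.
    return max(per_second.values(), default=0)


# Hypothetical arrival times, in seconds since the start of the window.
arrivals = [0.1, 0.4, 0.9, 1.2, 1.3, 2.5, 2.6, 2.7, 2.8]
print(peak_requests_per_second(arrivals))
```

Tracking a number like this over time shows how load is growing, which is the input needed before asking how performance holds up.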

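The performance of a system is commonly measured by its response time. Because response times vary from request to request, they are summarized with percentiles rather than a single number. The median (p50) is found by sorting all the response times for requests in a given time window and taking the middle value: half of the requests were faster and half were slower. Higher percentiles such as p99 and p999 come from the same sorted list and show how slow the worst requests are.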

For example:

p99 = 1.1 seconds

This means that 99% of requests completed in 1.1 seconds or less. Amazon, for instance, states response-time requirements for its internal services in terms of p999 (the 99.9th percentile).
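The calculation described above can be sketched in a few lines. This is a minimal example that assumes the nearest-rank definition of a percentile (other definitions interpolate between values); the function name and the sample response times are made up for illustration.

```python
import math


def percentile(response_times, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not response_times:
        raise ValueError("need at least one response time")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(response_times)
    # 1-based rank of the sample that covers p% of the sorted list.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times (seconds) collected in one time window.
samples = [0.12, 0.15, 0.11, 0.95, 0.14, 0.13, 1.10, 0.16, 0.18, 0.17]
print("p50:", percentile(samples, 50))
print("p99:", percentile(samples, 99))
```

With an even number of samples, nearest-rank returns the lower of the two middle values for p50 instead of averaging them. With only ten samples, p99 is simply the slowest request, so high percentiles are only meaningful over a large number of requests.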

Maintainability

Every successful software project gets worked on by multiple people/teams. A system must follow certain principles for it to be considered maintainable. Here are a few things to keep in mind:

Operability: The people running the system must be able to keep it working smoothly without too much effort.
Simplicity: The system must be easy to understand for new developers. Abstractions are a good way to reduce accidental complexity.
Evolvability: Developers must be able to make changes to the system without running into major issues. This ties back to the system being operable and simple; these two characteristics go a long way toward making a system evolvable.