By Aiden Gallagher and Andy Garratt - Special thanks to Callum Jackson for his review and thoughts.
Introduction
Modern applications are built to be lightweight, easy to deploy and expendable - especially when combined with containerised deployments. Because of this, a school of thought suggests that applications should be stateless.
"My Application is Stateless."
Whilst some applications can be stateless, this does not help in scenarios where an application simply cannot be, such as one processing a booking that includes add-ons, or one processing payments with a cardholder's bank.
Even if state is held primarily in a data store, using some of the patterns outlined later in this article, processes such as transformation, security and data movement must still be performed, and these are not well suited to the data store layer.
In this article we will discuss why state is important, how stateless processes differ from stateless applications, why stateful applications can be integral to a sophisticated and reliable system, and why applications without any state have limited value.
Table of Contents
INTRODUCTION
12 FACTOR APPLICATIONS
SO, WHAT IS STATE?
CONSIDERATIONS OF STATE
STATE PATTERNS
Availability Patterns
Last Man Standing
Primary and replicas
Sharding
Storage Patterns
Data Lakes
Caching
Contention and locking
Transient State Patterns
Cargo Pattern
Event Sourcing
STATE PATTERNS IN CONTAINERS
Third Party Delegation (Volumes)
Replica Sets
Stateful Sets
MODERN STATEFUL APPLICATIONS
CONCLUSION
12 Factor Applications
One of the primary reference points for modern applications is the 12-Factor methodology, which gives advice on good practice when building software-as-a-service and web applications.
A guiding principle of the 12-Factor methodology is that applications should be stateless and that nothing is shared between processes.
"Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database." - https://12factor.net/processes
The driver behind this decision is the ability to have self-contained processes that can be deleted, rebuilt or upgraded easily and without risk to user journeys; but state is important and in many cases it cannot and should not be avoided.
Even with containers, state can be managed if it is carefully designed for. By keeping its own state, an application gains greater control and independence, rather than relying on an external service whose reliability and security the application developer must place a large amount of trust in.
So, what is State?
"The particular condition that someone or something is in at a specific time." - Oxford Dictionary
There are three types of state used in the management, creation and usage of applications, whether to handle the configuration of an application, the replication of applications for availability or the messages that pass through the application itself. These are:
· Long Term State - State that is held for a long period of time, usually in a database but occasionally in long-term memory. This state is designed to be accessible over long periods that might span generations of applications, software versions and even the underlying products themselves.
Examples: tax returns; grades received at university; birth, death and marriage records; purchase records over time.
Additionally, there is transient state in applications - live information that is subject to change whilst part of a transaction. The information is important to the complete process that is being fulfilled and without it, the transaction will fail, will be repeated or will no longer be subject to audit.
· Persisted Transient State - Whilst completing an application process, each point that the process reaches in the journey needs to be recorded for ‘in-flight recovery’ of transactions being made.
Examples: if a payment fails, did it fail before or after the payment was made? If the application fails whilst adding new stock, is the stock level showing the right amount or does it still need increasing?
· Non-Persisted Transient State - During an application process, the current progress does not need to be recorded, often because a failure will not cause any harmful effects to the systems the application interacts with.
Examples: an API that retrieves all existing supermarkets in a town; reloading a webpage with static data.
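To make persisted transient state concrete, here is a minimal sketch of 'in-flight recovery' for a payment journey. The step names, the `process_payment` function and the use of a plain dict as the checkpoint store are all illustrative assumptions; a real system would write checkpoints to durable storage such as a database table.

```python
from typing import Dict, List, Optional

# Hypothetical step names for a payment journey.
STEPS = ["reserved", "charged", "confirmed"]

# Stand-in for a durable store such as a database table (an assumption:
# a plain dict does NOT survive a real process restart).
checkpoint_store: Dict[str, List[str]] = {}
charges_made: List[str] = []  # records the side effect that must not repeat


class Crash(Exception):
    """Simulates the process dying part-way through a transaction."""


def run_step(payment_id: str, step: str) -> None:
    if step == "charged":
        charges_made.append(payment_id)  # e.g. a call to the cardholder's bank


def process_payment(payment_id: str, crash_after: Optional[str] = None) -> None:
    completed = checkpoint_store.setdefault(payment_id, [])
    for step in STEPS:
        if step in completed:
            continue  # finished before the failure, so skip it on recovery
        run_step(payment_id, step)
        completed.append(step)  # persist progress after each step
        if step == crash_after:
            raise Crash(f"process died after '{step}'")


try:
    process_payment("pay-42", crash_after="charged")
except Crash:
    pass  # an orchestrator would restart the transaction

process_payment("pay-42")  # recovery resumes from the checkpoint
print(checkpoint_store["pay-42"], charges_made)
# ['reserved', 'charged', 'confirmed'] ['pay-42']
```

The card is charged exactly once because the recovery run skips recorded steps. The remaining gap, a crash between `run_step` and recording the checkpoint, is exactly the "did it fail before or after the payment was made?" question; idempotent steps or a transactional write of step and checkpoint are common ways to close it.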
State comes into play for a variety of reasons, whether to pass relevant user information such as permissions and account details, or to ensure that transactions are persisted and their actions committed a fixed number of times. State also describes an application's make-up: its sizing, its connections, how the wider system expects it to handle traffic and how many instances are required. Without this state, applications cannot be deployed.
State can be difficult because it requires tracking a piece of data and any changes to it. It also requires thought about what should happen to that state if a failure occurs, whether in retrieving the relevant state, in the application flow, in the application itself or in the hardware running the transaction. It is this difficulty that leads to state being avoided in the first place.
As seen in the 12-Factor methodology, rather than treating state as a necessary part of an application, long-term state and persisted transient state are passed to backing services, whilst non-persisted transient state is permitted only as a single-transaction cache. Even then, cached data is not assured and is never assumed to be available to any future transaction.
Methodologies such as 12-Factor that imply state should be reduced and eliminated wherever possible misunderstand the importance of state within an application, and overlook the limited value that applications without state provide.
There are methods for handling state that do not require a blanket ban on its usage which will be explored later in the article.
Considerations of State
A number of factors determine what state an application holds and whether that state should be accepted into the application. It is important to understand how that state should be handled, both in the application and at the storage layer. Some factors to consider are:
· State 'Life' - How long does the state need to be kept? What are the ramifications of losing it? If a user had to resubmit the transaction, what would the impact be? Can the data be lost safely? How should the data be stored, and how can it be accessed if the application is lost (e.g. will the database lock)?
· Order and Sequence - Does the state rely on the order and sequence of events being adhered to? Do calls need to be synchronous or asynchronous? How does this fit into a wider modern application strategy?
· Governance – Who is permitted to use the State? How can the state be accessed? Can data be altered in transit?
In databases, state is managed by applying the ACID properties:
· Atomicity - A transaction either happens in full or does not happen at all; partial changes are never left behind
· Consistency - A transaction may only move the data from one valid state to another, enforcing the defined rules and constraints
· Isolation - Concurrency is controlled so that other transactions, including those from other parts of the same application, cannot see or interfere with a transaction while it is being completed
· Durability - Once a transaction is committed, the system it was passed to is responsible for retaining that state, even if the system subsequently shuts down or fails.
Similar techniques can be applied to applications to aid in achieving transient stateful applications.
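As a small illustration of atomicity and consistency, the sketch below uses Python's built-in `sqlite3` module, whose connection context manager commits on success and rolls back on an exception. The `accounts` table and `transfer` function are hypothetical, and the in-memory database keeps the example self-contained; durability would need a file-backed database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money between accounts as a single atomic unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, deliberately, so a failed debit shows the rollback.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the earlier credit was undone as well


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer("alice", "bob", 30), balances())   # True {'alice': 70, 'bob': 30}
print(transfer("alice", "bob", 500), balances())  # False {'alice': 70, 'bob': 30}
```

The same shape (do all the work, then commit once, or undo everything) is what an application can mimic for its own transient state.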
State Patterns
There are several patterns currently used by existing applications to ensure that state is handled correctly:
Availability Patterns
Last Man Standing
This pattern dynamically assigns voting power based on the number of servers remaining in a cluster, allowing a set of weighted applications and servers to reduce down to a single entity if issues arise in the cluster. This allows applications and state to continue to function as the number of entities in the cluster is reduced.
Advantages:
- Allows all servers in a system to be utilised for availability by readjusting a quorum
- Reduces chance of outages by allowing a single server to survive on its own
Disadvantages:
- Can lead to contention as to which is the current source of truth
- Adds complexity to the architecture of the system
- Requires that fewer than half of the quorum's votes fail in a single outage so that the quorum can be rebalanced
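The sketch below is a heavily simplified, hypothetical model of 'last man standing' voting, not any clustering product's actual algorithm. It assumes a strict majority of the current votes must survive each outage, and that when an even number of nodes survives, one vote is dropped (here, deterministically by name) to act as a tie-breaker.

```python
from typing import Iterable, Set


class DynamicQuorumCluster:
    """Toy 'last man standing' cluster (hypothetical, heavily simplified)."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.members: Set[str] = set()
        self.voters: Set[str] = set()
        self._rebalance(set(nodes))

    def _rebalance(self, survivors: Set[str]) -> None:
        ordered = sorted(survivors)
        if len(ordered) % 2 == 0:
            ordered = ordered[:-1]  # drop one vote to keep an odd count (tie-breaker)
        self.members = set(survivors)
        self.voters = set(ordered)

    def fail(self, *nodes: str) -> None:
        lost = set(nodes)
        remaining_votes = len(self.voters - lost)
        # A strict majority of the *current* votes must survive this outage.
        if remaining_votes * 2 <= len(self.voters):
            raise RuntimeError("quorum lost: cluster halts to avoid split brain")
        self._rebalance(self.members - lost)  # survivors form the new quorum


cluster = DynamicQuorumCluster(["a", "b", "c", "d", "e"])
for node in ["e", "d", "c", "b"]:  # nodes fail one at a time
    cluster.fail(node)
print(cluster.members, cluster.voters)  # {'a'} {'a'}

second = DynamicQuorumCluster(["a", "b", "c", "d", "e"])
try:
    second.fail("c", "d", "e")  # three of five votes lost at once
except RuntimeError as err:
    print(err)  # quorum lost: cluster halts to avoid split brain
```

Sequential failures let the cluster shrink to a single node, whilst losing most votes in one outage halts it, which is the disadvantage listed above.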
Primary and replicas
By having a primary source of data that either pushes its state to replicas or has its state copied by them, the primary can be used as the single source of truth for all applications. The replicas provide a backup of the data, so that if anything happens to the primary, a replica can become the primary source of data.
Advantages:
- Little to no downtime in the event of an outage at the data source.
- Replicas can be updated with changed data only, making the procedure relatively quick
Disadvantages:
- Requires a lot of additional infrastructure
- Has to contend with availability requirements
- Can be affected by disassociation between the primary and its replicas in the event of a network outage
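Here is a minimal, hypothetical sketch of asynchronous primary/replica replication. It assumes the primary keeps an ordered change log, each replica records how far through that log it has applied changes (so only new changes are shipped), and on failover the most up-to-date replica is promoted. The class and method names are illustrative only.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, int] = {}
        self.applied = 0  # how many log entries this node has applied


class ReplicatedStore:
    """Toy asynchronous primary/replica store (hypothetical)."""

    def __init__(self, primary: Node, replicas: List[Node]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self.log: List[Tuple[str, int]] = []  # ordered changes made on the primary

    def write(self, key: str, value: int) -> None:
        self.primary.data[key] = value
        self.log.append((key, value))
        self.primary.applied = len(self.log)

    def sync(self) -> None:
        # Ship only the changes each replica has not yet applied.
        for replica in self.replicas:
            for key, value in self.log[replica.applied:]:
                replica.data[key] = value
            replica.applied = len(self.log)

    def fail_over(self) -> None:
        # Promote the most up-to-date replica; unreplicated writes are lost.
        new_primary = max(self.replicas, key=lambda r: r.applied)
        self.replicas.remove(new_primary)
        del self.log[new_primary.applied:]
        self.primary = new_primary


store = ReplicatedStore(Node("p"), [Node("r1"), Node("r2")])
store.write("balance", 100)
store.sync()
store.write("balance", 150)  # primary fails before this reaches the replicas
store.fail_over()
print(store.primary.name, store.primary.data)  # r1 {'balance': 100}
```

The lost write of 150 shows why disassociation between primary and replicas matters: with asynchronous replication, anything not yet shipped disappears on failover.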
Sharding
This method separates data into independent, self-contained blocks known as shards, which can be moved across a cluster. This is also sometimes referred to as partitioning, and in a database it might be done either vertically or horizontally, i.e. by columns or by rows respectively.
Note: many of the benefits listed below also make sharding a ‘storage’ pattern
Advantages:
- Spreading the data allows the database to handle a greater volume of requests and to process them faster through horizontal scaling
- Searching can be quicker because the search is narrowed to an individual shard rather than the whole database, e.g. searching for 'Fred' in folder 'F' is much simpler than searching for the same name in folder 'Names'.
- Mitigates outages by limiting the risk to a subset of the database rather than the whole thing; each partition's availability can be handled in isolation.
Disadvantages:
- Greater risk of loss of data or data corruption if not correctly thought out and implemented
- Hard to revert back to a single database after Sharding has occurred
- Partitions can become 'unbalanced', pushing the risk into a single partition and reducing the benefits, e.g. partitions based on home address would hold significantly more data for United States addresses than for Monaco.
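The sketch below contrasts two ways of routing records to shards, using made-up customer records. `letter_shard` mirrors the 'Folder F' idea, and partitioning by country shows the imbalance problem; `hash_shard` spreads keys by a stable hash (Python's built-in `hash()` is salted per process for strings, so `hashlib` is used instead). Function names and the shard count are assumptions for illustration.

```python
import hashlib
from collections import Counter


def hash_shard(key: str, num_shards: int) -> int:
    """Route a key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def letter_shard(name: str) -> str:
    """Range-style partitioning: 'Fred' lives in folder 'F'."""
    return name[0].upper()


# Hypothetical customer records: (customer_id, country)
customers = [(f"cust-{i}", "US") for i in range(98)] + [("cust-98", "MC"), ("cust-99", "MC")]

by_country = Counter(country for _, country in customers)
by_hash = Counter(hash_shard(cid, 4) for cid, _ in customers)

print(by_country)  # Counter({'US': 98, 'MC': 2}) -> one overloaded partition
print(sorted(by_hash.items()))  # keys spread across shards 0-3
```

Note that with simple modulo routing, changing `num_shards` remaps most keys, which is one reason re-sharding is costly; consistent hashing is a common mitigation.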
Storage Patterns
Data Lakes
This method stores all data being used in a common resource in its raw format. This data can then be selected and used as and when each system requires the information.
Advantages:
- Data can be obtained at any time, in any order by the connected application giving greater flexibility
- Stores different data types in a single data store
- Can be queried more diversely making it accessible to different application types.
Disadvantages:
- Data can end up sat unused in a giant expensive system – sometimes known as a data swamp
- Storing all data in a single location can create risk as a single point of failure – even if highly available
- Querying can take time which may lead to latency in transactions
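A data lake is usually described as 'schema-on-read': raw data is stored as it arrived and each consumer imposes its own structure when it queries. The sketch below illustrates that idea with a hypothetical in-memory list of raw JSON and CSV payloads; a real lake would sit on object storage with a query engine on top.

```python
import csv
import io
import json
from decimal import Decimal
from typing import Dict, Iterator, List, Tuple

# The "lake": payloads kept exactly as they arrived, tagged with their format.
lake: List[Tuple[str, str]] = [
    ("json", '{"order_id": 1, "total": "19.99", "country": "GB"}'),
    ("csv", "order_id,total,country\n2,5.00,FR\n"),
    ("json", '{"order_id": 3, "total": "42.50", "country": "GB"}'),
]


def read_orders() -> Iterator[Dict[str, object]]:
    """Schema-on-read: raw records are only parsed when a consumer asks."""
    for fmt, raw in lake:
        if fmt == "json":
            yield json.loads(raw)
        elif fmt == "csv":
            yield from csv.DictReader(io.StringIO(raw))


# Consumer 1: a finance report needs revenue per country.
revenue: Dict[str, Decimal] = {}
for order in read_orders():
    country = str(order["country"])
    revenue[country] = revenue.get(country, Decimal("0")) + Decimal(str(order["total"]))

# Consumer 2: an audit job only needs the order IDs.
order_ids = sorted(int(str(o["order_id"])) for o in read_orders())
print(revenue, order_ids)
```

Every query re-parses the raw data, which is where the flexibility, and the query latency listed above, both come from.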
Caching
This method stores data in a temporary storage area. This is usually a relatively small amount of data, based on a previous computation or the result of a previous request, which the application or database retrieves from the cache on the next request. There are many forms of caching; below are some general advantages and disadvantages of local caching.
Advantages:
- Allows for reduced latency as data is already stored locally to the application or database meaning no waiting for further processing or a response
- Throughput can be improved as a result of speed enhancements
Disadvantages:
- Large caches consume memory in the application processing the requests, which might slow its processing speed
- Inconsistencies may be found in the cache as the true state changes
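A common way to bound the inconsistency problem is to give cache entries a time-to-live. The sketch below is a hypothetical read-through local cache with a TTL; the `TTLCache` name, the injectable clock and the dict standing in for a database are assumptions chosen so the example is deterministic.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Read-through local cache whose entries expire after a fixed time."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader    # fetches the true value on a miss
        self.ttl = ttl_seconds
        self.clock = clock
        self.misses = 0
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, expiry)

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # hit: no call to the slower source
        self.misses += 1
        value = self.loader(key)
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._entries.pop(key, None)  # drop an entry known to be stale


source = {"stock:apples": 10}  # stands in for a database
now = [0.0]                    # fake clock so the example is deterministic
cache = TTLCache(source.__getitem__, ttl_seconds=30, clock=lambda: now[0])

first = cache.get("stock:apples")  # miss: loaded from source
source["stock:apples"] = 7         # the true state changes...
stale = cache.get("stock:apples")  # ...but the cache still returns 10
now[0] = 31.0
fresh = cache.get("stock:apples")  # entry expired, so it is reloaded
print(first, stale, fresh, cache.misses)  # 10 10 7 2
```

The TTL trades freshness for speed; `invalidate` lets a writer evict an entry it knows has changed. A production cache would usually also cap its size to address the memory concern above.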
Contention and locking
This method, most commonly found in database update operations, locks an entire table, a single row or a single cell to prevent contention between two entities updating the same data at the same time. In applications, the same principles can be used to ensure that data and resources cannot be accessed whilst they are being used or altered.
Advantages:
- Ensures that conflicting data cannot be saved at the same time
- Ensures that the most up to date data is being accessed by all connecting applications
Disadvantages:
- Applications might have to 'wait' to update the data source
- Applications waiting for access to the data source might fail, timeout or become indefinitely blocked
- Applications that have locked the data source are tightly coupled to issues with the data source, e.g. if it runs out of space and needs manual intervention to proceed
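In application code, the same principle can be sketched with a lock around a read-modify-write. The hypothetical `StockLevel` class below uses `threading.Lock` and acquires it with a timeout, so a caller gives up rather than becoming indefinitely blocked, one of the disadvantages listed above.

```python
import threading


class StockLevel:
    """Shared stock count guarded by a lock (hypothetical example)."""

    def __init__(self, quantity: int) -> None:
        self.quantity = quantity
        self._lock = threading.Lock()

    def adjust(self, delta: int, timeout: float = 5.0) -> bool:
        # Wait at most `timeout` seconds instead of blocking indefinitely.
        if not self._lock.acquire(timeout=timeout):
            return False  # caller decides whether to retry or fail
        try:
            current = self.quantity          # read...
            self.quantity = current + delta  # ...and write while holding the lock
            return True
        finally:
            self._lock.release()


stock = StockLevel(0)


def worker() -> None:
    for _ in range(1000):
        stock.adjust(1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stock.quantity)  # 8000

# Simulate another party holding the lock: a short timeout gives up cleanly.
stock._lock.acquire()
print(stock.adjust(1, timeout=0.1))  # False
stock._lock.release()
```

Without the lock, two threads could read the same `current` value and one update would be lost.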
Transient State Patterns
Cargo Pattern
This method is used for ‘picking up’ and ‘dropping off’ state as part of a transaction. It determines whether state is carried to the application at a specific time, e.g. in an event, or whether the application actively retrieves the data it requires at a determined point in the transaction.
Advantages:
- State is only retained and collected in the relevant locations, rather than all state being carried everywhere
- State can be stored by the relevant parties, providing greater security of data
- Based on the state's status at each point of the transaction, it can be determined how far through the transaction a request has progressed (e.g. for non-repudiation)
Disadvantages:
- Data has to be stored between processes causing latency and overhead
- Messages and state might need to be cleaned by a secondary process
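The sketch below is one hypothetical reading of the cargo pattern for a booking with add-ons. A `Cargo` object carries the booking total between steps, whilst the card token is retrieved only at the payment step and never placed on the cargo. The `charge` function is a stub standing in for a payment provider, and all names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Cargo:
    """Transaction state carried between steps (hypothetical shape)."""
    booking_id: str
    stage: str = "received"  # shows how far the transaction has progressed
    payload: Dict[str, Any] = field(default_factory=dict)


# State owned by another party; only the payment step reads it.
customer_store = {"cust-7": {"card_token": "tok_example"}}


def charge(token: str, amount: int) -> str:
    """Stub for a call to a payment provider (hypothetical)."""
    return f"charge-{amount}"


def price_booking(cargo: Cargo, base_price: int, add_ons: Dict[str, int]) -> Cargo:
    # Picked up: the total travels with the cargo to later steps.
    cargo.payload["total"] = base_price + sum(add_ons.values())
    cargo.stage = "priced"
    return cargo


def take_payment(cargo: Cargo, customer_id: str) -> Cargo:
    # Retrieved at a determined point: the card token is fetched here, used,
    # and never placed on the cargo, so it is not carried everywhere.
    token = customer_store[customer_id]["card_token"]
    cargo.payload["payment_ref"] = charge(token, cargo.payload["total"])
    cargo.stage = "paid"
    return cargo


cargo = Cargo("booking-1")
cargo = price_booking(cargo, base_price=100, add_ons={"luggage": 20, "seat": 10})
cargo = take_payment(cargo, "cust-7")
print(cargo.stage, cargo.payload)  # paid {'total': 130, 'payment_ref': 'charge-130'}
```

The `stage` field is what lets an operator see how far a request got if it fails part-way.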
Event Sourcing
This method stores each change to state as an event object. At a basic level, this pattern ensures that changes are recorded and applied in the order in which they occur, which shows how data has changed over time, not just how it currently exists.
Advantages:
- State can be traced over time, giving a clear picture of how it has changed and a better understanding of any data loss
- Application state can be destroyed and rebuilt by rerunning an event sequence
- Greater control over how mistaken changes are corrected, e.g. by changing the sequence before replay, or by running a parallel implementation with the corrected change to understand its divergence and effect
Disadvantages:
- More data storage is required
- Event objects themselves still need retaining and their availability secured
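A minimal event-sourcing sketch, assuming a hypothetical account whose only events are deposits and withdrawals: current state is never stored directly but rebuilt by replaying the ordered event log, and a corrected copy of the log can be replayed in parallel to measure the divergence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn"
    amount: int


def apply(balance: int, event: Event) -> int:
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: List[Event]) -> Tuple[int, List[int]]:
    """Rebuild current state, and every intermediate state, from the log."""
    balance, history = 0, []
    for event in events:
        balance = apply(balance, event)
        history.append(balance)
    return balance, history


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 50)]
current, history = replay(log)
print(current, history)  # 120 [100, 70, 120]

# Suppose the withdrawal should have been 3, not 30: replay a corrected copy
# in parallel and compare, leaving the original log untouched.
corrected = [log[0], Event("withdrawn", 3), log[2]]
fixed, _ = replay(corrected)
print(fixed - current)  # 27
```

The events are frozen dataclasses because the log is append-only; corrections are made by replaying a new sequence, not by editing history.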
These patterns are not new; they can be used in most modern applications and are the basis of much software, both on containerised infrastructure and in traditional static deployments. But how do these patterns apply, and how can they be used safely to allow applications to manage and use state?
State patterns in containers
When considering state in the context of containerised applications, there are three main patterns that can be used to manage state and ensure that it is correctly handled and retained to prevent data loss. It is important to note that containers and their configuration are stateful concerns of containerised systems and must be treated as stateful artefacts on which the application depends.
Third Party Delegation (Volumes)
State is never retained on the container or its application. Instead, all stateful data is passed to a third party, from which the application can retrieve it when the state is needed again. Examples include retrieving a public certificate from a public endpoint via a REST API, or storing a container's configuration in a source repository.
Advantages:
- Applications are expendable; can be killed or might fail without too much worry
- State can be handled by non-containerised applications, modules etc.
- Configuration of containers and application data can be restored with relative ease
Disadvantages:
- Availability of the Third Party needs to be assured
- Messages lost in transit need to be completed again which adds complexity to integration flows
Replica Sets
This pattern uses a container template to ensure that a set number of identical replica containers exist at all times. It provides a load-balancing style of high availability.
Advantages:
- Availability of the container is assured
- Can access persisted data on a third party without impacting the end user
Disadvantages:
- Messages lost in transit need to be completed again which adds complexity to integration flows
- Reliance on the underlying container management system
- May be dependent on being able to retrieve updates of the configuration templates from their storage system
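The sketch below is a toy reconciliation loop in the spirit of a Kubernetes ReplicaSet, not its real implementation: a controller compares the desired replica count with the running containers and creates or removes identical copies of the template. The counter-based names and the oldest-first scale-down are simplifications; Kubernetes generates random name suffixes and uses its own rules for choosing which pods to remove.

```python
import itertools
from typing import Dict


class ReplicaSetController:
    """Toy reconciliation loop in the spirit of a Kubernetes ReplicaSet."""

    def __init__(self, template: Dict[str, str], replicas: int) -> None:
        self.template = template
        self.desired = replicas
        self.pods: Dict[str, Dict[str, str]] = {}  # name -> spec, oldest first
        self._ids = itertools.count()

    def reconcile(self) -> None:
        # Create identical, interchangeable pods until the count matches...
        while len(self.pods) < self.desired:
            name = f"{self.template['name']}-{next(self._ids)}"
            self.pods[name] = dict(self.template)
        # ...or remove pods (oldest first here) when scaled down.
        while len(self.pods) > self.desired:
            del self.pods[next(iter(self.pods))]


rs = ReplicaSetController({"name": "web", "image": "web:1.0"}, replicas=3)
rs.reconcile()
del rs.pods["web-1"]    # a container dies
rs.reconcile()          # a replacement with a *new* name appears
print(sorted(rs.pods))  # ['web-0', 'web-2', 'web-3']
```

Because the replacement has a new identity and no local data, anything held only inside the lost container is gone, which is why this pattern relies on third-party storage for state.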
Stateful Sets
An application that retains state within its container would need to be rebuilt with the same state if the container were lost. Additionally, the order and uniqueness of containers might be relevant to how the container management system redeploys them. This pattern also needs a volume; however, that volume might be locked to a container with specific characteristics.
Advantages:
- Applications are expendable; can be killed or might fail without too much worry
- Not reliant on third party availability and connectivity
- Messages not lost in transit as they are stored and retrieved from a volume
Disadvantages:
- Requires stability in naming and sourcing of the container i.e. DNS
- Requires the volume to be persisted
- There are some complexities that might require manual intervention during container deployment
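By contrast, the following toy model illustrates the two properties that make stateful sets different: stable, ordinal names (such as `db-1`) and a per-container volume that outlives the container and is reattached when it is recreated. It is a hypothetical simplification; the `data-<pod>` claim naming loosely follows the Kubernetes convention, and real controllers also wait for each pod to become ready before starting the next.

```python
from typing import Dict, List


class StatefulSetController:
    """Toy model of StatefulSet-style identity and storage (hypothetical)."""

    def __init__(self, name: str, replicas: int) -> None:
        self.name = name
        self.desired = replicas
        self.pods: Dict[int, str] = {}           # ordinal -> pod name
        self.volumes: Dict[str, List[str]] = {}  # claim -> data; outlives pods

    def reconcile(self) -> None:
        # Pods are (re)created in ordinal order with stable names: db-0, db-1...
        for ordinal in range(self.desired):
            if ordinal not in self.pods:
                pod = f"{self.name}-{ordinal}"
                self.volumes.setdefault(f"data-{pod}", [])  # reattach or create
                self.pods[ordinal] = pod

    def volume_for(self, ordinal: int) -> List[str]:
        return self.volumes[f"data-{self.name}-{ordinal}"]


db = StatefulSetController("db", replicas=3)
db.reconcile()
db.volume_for(1).append("order-123 committed")  # state written by db-1
del db.pods[1]  # the container is lost
db.reconcile()  # it returns with the same name and the same volume
print(db.pods[1], db.volume_for(1))  # db-1 ['order-123 committed']
```

The stable name is what lets other components keep addressing `db-1`, and the persisted volume is why messages are not lost when the container is replaced.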
Modern Stateful applications
In modern applications replication of both the container and the state therein needs to be considered in the context of applications that can be terminated, scaled up/down or stopped both unexpectedly and on demand.
The container management system handles a lot of this contention through the use of container stateful patterns which might include a mixture of stateful sets, replica sets and the storage of state on third party applications.
But state is not expendable, and it will continue to be important for many requirements, such as processing payments, which must never be completed twice and need to be audited. Containerised systems should still consider state patterns, because depending on external third parties simply moves stateful requirements elsewhere.
Even where state is offloaded, the availability of, and connection to, third-party storage has to be built into the application's functionality, as contention, disassociation and other data issues might occur during transit or whilst making a connection.
By using a mixture of existing state patterns alongside container management system stateful functionality it is possible to create stateful applications that fit into the modern application architecture.
Conclusion
In this article, we have shown what state is, how existing applications deal with state and how state can be effectively handled and managed by modern applications in containers. The article looks at pattern usage in these environments and describes how stateful applications might be scaled.