diff --git a/docs/API.en.epub b/docs/API.en.epub
index 42c7eac..ebadf42 100644
Binary files a/docs/API.en.epub and b/docs/API.en.epub differ
diff --git a/docs/API.en.html b/docs/API.en.html
index db219e6..d227c6b 100644
--- a/docs/API.en.html
+++ b/docs/API.en.html
@@ -135,10 +135,10 @@
The book you're holding in your hands is dedicated to developing APIs as a separate engineering task. Although many concepts we're going to discuss apply to any type of software, our primary goal is to describe those problems and approaches to solving them that are most relevant in the context of the API subject area.
-We expect that the reader possesses expertise in software engineering, so we do not provide detailed definitions and explanations of the terms that a developer should already be familiar with in our understanding. Without this knowledge, it will be rather uncomfortable to read the last section of the book (and even more so, other sections). We sincerely apologize for this but that's the only way of writing the book without tripling its size.
The book comprises the Introduction and six large sections. The first three (namely, “The API Design”, “The API Patterns”, and “The Backward Compatibility”) are fully abstract and not bound to any concrete technology. We hope they will help those readers who seek to build a systematic understanding of the API architecture in developing complex interface hierarchies. The proposed approach, as we see it, allows for designing APIs from start to finish, from a raw idea to concrete implementation.
The fourth and fifth sections are dedicated to specific technologies, namely developing HTTP APIs (in the “REST paradigm”) and SDKs (we will mostly talk about UI component libraries).
Finally, in the sixth section, which is the least technical of all, we will discuss APIs as products and focus on non-engineering aspects of the API lifecycle: doing market research, positioning the service, communicating to consumers, setting KPIs for the team, etc. We insist that the last section is equally important to both PMs and software engineers as products for developers thrive only if the product and technical teams work jointly on them.
+We expect that the reader possesses expertise in software engineering, so we do not provide detailed definitions and explanations of the terms that a developer should already be familiar with in our understanding. Without this knowledge, it will be rather uncomfortable to read the last section of the book (and even more so, other sections). We sincerely apologize for this but that's the only way of writing the book without tripling its size. We provide the list of recommended readings in the “Bibliography” section.
Let's start.
Before we start talking about the API design, we need to explicitly define what the API is. Encyclopedias tell us that “API” is an acronym for “Application Program Interface.” This definition is fine but useless, much like the “Man” definition by Plato: “Man stands upright on two legs without feathers.” This definition is fine again, but it gives us no understanding of what's so important about a Man. (Actually, it's not even “fine”: Diogenes of Sinope once brought a plucked chicken, saying “That's Plato's Man.” And Plato had to add “with broad nails” to his definition.)
What does the API mean apart from the formal definition?
@@ -154,6 +154,7 @@
What differs between a Roman aqueduct and a good API is that in the case of APIs, the contract is presumed to be programmable. To connect the two areas, writing some code is needed. The goal of this book is to help you design APIs that serve their purposes as solidly as a Roman aqueduct does.
An aqueduct also illustrates another problem with the API design: your customers are engineers themselves. You are not supplying water to end-users. Suppliers are plugging their pipes into your engineering structure, building their own structures upon it. On the one hand, you may provide access to water to many more people through them, not spending your time plugging each individual house into your network. On the other hand, you can't control the quality of suppliers' solutions, and you are to blame every time there is a water problem caused by their incompetence.
+The situation with API design becomes even more complicated when we acknowledge that modern APIs are typically interfaces to distributed systems. There is no single aqueduct but rather a collection of connections between multiple sources and destinations, often established on-demand — and your task is to make these connections work coherently so that clients don't even need to know how complex this water distribution architecture is internally.
That's why designing an API implies a larger area of responsibility. An API is a multiplier to both your opportunities and your mistakes.
Before we start laying out the recommendations for designing API architecture, we ought to specify what constitutes a “high-quality API,” and what the benefits of having a high-quality API are. Quite obviously, the quality of an API is primarily defined through its capability to solve developers' and users' problems. (Let's leave out the part where an API vendor pursues its own goals, not providing a useful product.)
So, how can a “high-quality” API design assist developers in solving their (and their users') problems? Quite simply: a well-designed API allows developers to do their jobs in the most efficient and convenient manner. The gap between formulating a task and writing working code must be as short as possible. Among other things, this means that:
@@ -266,14 +267,23 @@ Cache-Control: no-cache
Apart from HTTP API notation, we will employ C-style pseudocode, or, to be more precise, JavaScript-like or Python-like one since types are omitted. We assume such imperative structures are readable enough to skip detailed grammar explanations. HTTP API-like samples intend to illustrate the contract, i.e., how we would design an API. Samples in pseudocode are intended to illustrate how developers might work with the API in their code, or how we would implement SDKs based on the contract.
The approach we use to design APIs comprises four steps:
Defining an application field
+Separating abstraction levels
+Isolating responsibility areas
+Describing final interfaces.
+This four-step algorithm actually builds an API from top to bottom, from common requirements and use case scenarios down to a refined nomenclature of entities. In fact, moving this way will eventually lead to a ready-to-use API, and that's why we value this approach highly.
It might seem that the most useful pieces of advice are given in the last chapter, but that's not true. The cost of a mistake made at certain levels differs. Fixing the naming is simple; revising the wrong understanding of what the API stands for is practically impossible.
-NB: Here and throughout we will illustrate the API design concepts using a hypothetical example of an API that allows ordering a cup of coffee in city cafes. Just in case: this example is totally synthetic. If we were to design such an API in the real world, it would probably have very little in common with our fictional example.
Here and throughout we will illustrate the API design concepts using a hypothetical example of an API that allows ordering a cup of coffee in city cafes. Just in case: this example is totally synthetic. If we were to design such an API in the real world, it would probably have very little in common with our fictional example.
+NB. A knowledgeable reader might notice that the approach we discuss is quite similar to the concept of “Levels of Design” proposed by Steve McConnell in his definitive book.1 This is both true and not true at the same time. On one hand, as APIs are software, all the classical architecture design patterns work for them, including those described by McConnell. On the other hand, there is a major difference between exposing APIs and working on shared code: you only provide the contract to customers, as they are unable and/or unwilling to check the code itself. This shifts the focus significantly, starting from the very first of McConnell's design levels: while splitting the grand design into subsystems is your number-one task when you develop a software project as an architect, it is often undesirable to expose the notion of your subsystem split in the API, as API consumers do not need to know about it. In the following chapters, we will focus on providing a well-designed nomenclature of entities that is both convenient for external developers and allows for implementing efficient architecture under the hood.
1 McConnell, S. C. (2004), 5.2 Key Design Concepts
The key question you should ask yourself before starting to develop any software product, including an API, is: what problem do we solve? It should be asked four times, each time putting emphasis on a different word.
It is also worth mentioning that unresolvable errors are useless to a user at the time of the error occurrence (since the client couldn't react meaningfully to unknown errors). Still, providing extended error data is not excessive as a developer will read it while fixing the issue in their code.
From our own API development experience, we can tell without a doubt that the greatest final interface design mistake (and the greatest developer's pain accordingly) is the excessive overloading of entities' interfaces with fields, methods, events, parameters, and other attributes.
-Meanwhile, there is the “Golden Rule” of interface design (applicable not only to APIs but almost to anything): humans can comfortably keep 7±2 entities in short-term memory. Manipulating a larger number of chunks complicates things for most humans. The rule is also known as Miller's Law1.
+Meanwhile, there is the “Golden Rule” of interface design (applicable not only to APIs but almost to anything): humans can comfortably keep 7±2 entities in short-term memory. Manipulating a larger number of chunks complicates things for most humans. The rule is also known as Miller's Law.1
+NB. The law shouldn't be taken literally, as its direct applicability to human cognition in general and software engineering in particular is quite controversial. Still, many influential works (such as the foundational research by Victor Basili, Lionel Briand, and Walcelio Melo2 and its numerous follow-ups by other authors) claim that an increased number of methods in classes and analogous metrics indicate poor code quality. While the exact numbers are debatable, we envision the “7±2” rule as good guidance.
The only possible method of overcoming this law is decomposition. Entities should be grouped under a single designation at every concept level of the API so that developers never have to operate on more than a reasonable amount of entities (let's say, ten) at a time.
Let's take a look at the coffee machine search function response in our API. To ensure an adequate UX of the app, quite bulky datasets are required:
{
@@ -1075,7 +1086,7 @@ For example, the invalid price error is resolvable: a client could obtain a new
Such a decomposed API is much easier to read than a long list of different attributes. Furthermore, it's probably better to group even more entities in advance. For example, a place and a route could be nested fields under a synthetic location property, or offer and pricing fields might be combined into some generalized object.
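For illustration only, the regrouped response might look like this (the field composition here, including the commercial_terms name, is hypothetical, not a prescribed schema):
{
  "results": [{
    // Everything describing the place
    // is grouped under one node
    "location": {
      "place": { "name": …, "icon": … },
      "route": { "distance": …, "duration": … }
    },
    // Commercial terms are grouped as well
    "commercial_terms": {
      "offer": { "id": …, "valid_until": … },
      "pricing": { "currency_code": …, "price": … }
    }
  }]
}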
It is important to say that readability is achieved not only by merely grouping the entities. Decomposing must be performed in such a manner that a developer, while reading the interface, instantly understands, “Here is the place description of no interest to me right now, no need to traverse deeper.” If the data fields needed to complete some action are scattered all over different composites, the readability doesn't improve and even degrades.
-Proper decomposition also helps with extending and evolving an API. We'll discuss the subject in Section III.
1 Miller's Law
en.wikipedia.org/wiki/Working_memory#Capacity
Proper decomposition also helps with extending and evolving an API. We'll discuss the subject in Section III.
1 Miller's Law
en.wikipedia.org/wiki/Working_memory#Capacity
2 Basili, V., Briand, L., Melo, W. (1996) A validation of object-oriented design metrics as quality indicators
ieeexplore.ieee.org/document/544352
When all entities, their responsibilities, and their relations to each other are defined, we proceed to the development of the API itself. We need to describe the nomenclature of objects, fields, methods, and functions in detail. In this chapter, we provide practical advice on making APIs usable and understandable.
One of the most important tasks for an API developer is to ensure that code written by other developers using the API is easily readable and maintainable. Remember that the law of large numbers always works against you: if a concept or call signature can be misunderstood, it will be misunderstood by an increasing number of partners as the API's popularity grows.
NB: The examples in this chapter are meant to illustrate the consistency and readability problems that arise during API development. We do not provide specific advice on designing REST APIs (such advice will be given in the corresponding section of this book) or programming languages' standard libraries. The focus is on the idea, not specific syntax.
@@ -1725,9 +1736,9 @@ X-Idempotency-Token: <token>
If the author of this book were given a dollar each time he had to implement an additional security protocol invented by someone, he would be retired by now. API developers' inclination to create new signing procedures for requests or complex schemes of exchanging passwords for tokens is both obvious and meaningless.
-First, there is no need to reinvent the wheel when it comes to security-enhancing procedures for various operations. All the algorithms you need are already invented, just adopt and implement them. No self-invented algorithm for request signature checking can provide the same level of protection against a Manipulator-in-the-middle (MitM) attack3 as a mutual TLS authentication with certificate pinning.·4
-Second, assuming oneself to be an expert in security is presumptuous and dangerous. New attack vectors emerge daily, and staying fully aware of all actual threats is a full-time job. If you do something different during workdays, the security system you design will contain vulnerabilities that you have never heard about — for example, your password-checking algorithm might be susceptible to a timing attack5 or your webserver might be vulnerable to a request splitting attack.·6
-The OWASP Foundation compiles a list of the most common vulnerabilities in APIs every year,7 which we strongly recommend studying. We also recommend a definitive work by Andrew Hoffman·8 for everyone interested in Web security.
+First, there is no need to reinvent the wheel when it comes to security-enhancing procedures for various operations. All the algorithms you need are already invented, just adopt and implement them. No self-invented algorithm for request signature checking can provide the same level of protection against a Manipulator-in-the-middle (MitM) attack3 as a mutual TLS authentication with certificate pinning.4
+Second, assuming oneself to be an expert in security is presumptuous and dangerous. New attack vectors emerge daily, and staying fully aware of all actual threats is a full-time job. If you do something different during workdays, the security system you design will contain vulnerabilities that you have never heard about — for example, your password-checking algorithm might be susceptible to a timing attack5 or your webserver might be vulnerable to a request splitting attack.6
+The OWASP Foundation compiles a list of the most common vulnerabilities in APIs every year,7 which we strongly recommend studying. We also recommend a definitive work by Andrew Hoffman8 for everyone interested in Web security.
And just in case: all APIs must be provided over TLS 1.2 or higher (preferably 1.3).
It is equally important to provide interfaces to partners that minimize potential security problems for them.
@@ -1805,7 +1816,7 @@ X-Idempotency-Token: <token>
Sometimes explicit location passing is not enough since there are lots of territorial conflicts in the world. How the API should behave when user coordinates lie within disputed regions is a legal matter, regretfully. The author of this book once had to implement a “state A territory according to state B official position” concept.
Important: mark a difference between localization for end users and localization for developers. In the examples above, the localized_message field is meant for the user; the app should show it if no specific handler for this error exists in the client code. This message must be written in the user's language and formatted according to the user's location. But the details.checks_failed[].message is meant to be read by developers examining the problem. So it must be written and formatted in a manner that suits developers best — which usually means “in English,” as English is a de facto standard in software development.
It is worth mentioning that the localized_ prefix in the examples is used to differentiate messages to users from messages to developers. A concept like that must be, of course, explicitly stated in your API docs.
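To make the distinction tangible, an error response following this convention might look as shown below (a sketch; the exact field composition is ours):
{
  "error_reason": "wrong_parameter_value",
  // To be shown to the end user,
  // written in their language
  "localized_message":
    "Something went wrong. Please contact the developer of the application.",
  "details": {
    "checks_failed": [{
      // To be read by developers,
      // hence in English
      "message":
        "The 'offer' parameter value is not valid"
    }]
  }
}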
And one more thing: all strings must be UTF-8, no exclusions.
1 De Morgan's laws
en.wikipedia.org/wiki/De_Morgan's_laws
2 Hrala, J. Welcome to Null Island, The Most 'Visited' Place on Earth That Doesn't Actually Exist
www.sciencealert.com/welcome-to-null-island-the-most-visited-place-that-doesn-t-exist
3 Manipulator-in-the-middle Attack
owasp.org/www-community/attacks/Manipulator-in-the-middle_attack
4 Mutual Authentication. mTLS
en.wikipedia.org/wiki/Mutual_authentication#mTLS
5 Timing Attack
en.wikipedia.org/wiki/Timing_attack
6 HTTP Request Splitting
capec.mitre.org/data/definitions/105.html
7 OWASP API Security Project
owasp.org/www-project-api-security
8 Hoffman, A. (2024) Web Application Security. Second Edition
9 Universally Unique Identifier. Version 4 (random)
en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)
And one more thing: all strings must be UTF-8, no exclusions.
1 De Morgan's laws
en.wikipedia.org/wiki/De_Morgan's_laws
2 Hrala, J. Welcome to Null Island, The Most 'Visited' Place on Earth That Doesn't Actually Exist
www.sciencealert.com/welcome-to-null-island-the-most-visited-place-that-doesn-t-exist
3 Manipulator-in-the-middle Attack
owasp.org/www-community/attacks/Manipulator-in-the-middle_attack
4 Madden, N. (2020), 11.4 Mutual TLS authentication
5 Timing Attack
en.wikipedia.org/wiki/Timing_attack
6 HTTP Request Splitting
capec.mitre.org/data/definitions/105.html
7 OWASP API Security Project
owasp.org/www-project-api-security
8 Hoffman, A. (2024), Web Application Security. Second Edition
9 Universally Unique Identifier. Version 4 (random)
en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)
Let's summarize the current state of our API study.
POST /v1/offers/search
@@ -1962,9 +1973,9 @@ X-Idempotency-Token: <token>
// Terminates the runtime
POST /v1/runtimes/{id}/terminate
The concept of “patterns” in the field of software engineering was introduced by Kent Beck and Ward Cunningham in 19871 and popularized by “The Gang of Four” (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides) in their book “Design Patterns: Elements of Reusable Object-Oriented Software,” which was published in 1994.·2 According to the most widespread definition, a software design pattern is a “general, reusable solution to a commonly occurring problem within a given context.”
+The concept of “patterns” in the field of software engineering was introduced by Kent Beck and Ward Cunningham in 19871 and popularized by “The Gang of Four” (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides) in their book “Design Patterns: Elements of Reusable Object-Oriented Software,” which was published in 1994.2 According to the most widespread definition, a software design pattern is a “general, reusable solution to a commonly occurring problem within a given context.”
If we talk about APIs, especially those to which developers are end users (e.g., frameworks or operating system interfaces), the classical software design patterns are well applicable to them. Indeed, many examples in the previous Section of this book are just about applying some design patterns.
-However, if we try to extend this approach to include API development in general, we will soon find that many typical API design issues are high-level and can't be reduced to basic software patterns. Let's say, caching resources (and invalidating the cache) or organizing paginated access are not covered in classical writings.
+However, if we try to extend this approach to include API development in general (which, let us remind the reader, is typically about building interfaces to distributed systems), we will soon find that many typical API design issues are high-level and can't be reduced to basic software patterns. Let's say, caching resources (and invalidating the cache) or organizing paginated access are not covered in classical writings.
In this Section, we will specify those API design problems that we see as the most important ones. We are not aiming to encompass every problem, let alone every solution, and rather focus on describing approaches to solving typical problems with their pros and cons. We do understand that readers familiar with the works of “The Gang of Four,” Grady Booch, and Martin Fowler might expect a more systematic approach and greater depth of outreach from a section called “The API Patterns,” and we apologize to them in advance.
NB: The first such pattern we need to mention is the API-first approach to software engineering, which we described in the corresponding chapter.
1 Software Design Pattern. History
en.wikipedia.org/wiki/Software_design_pattern#History
2 Gamma, E., Helm, R., Johnson, R., Vlissides, J. (1994) Design Patterns. Elements of Reusable Object-Oriented Software
1 Software Design Pattern. History
en.wikipedia.org/wiki/Software_design_pattern#History
2 Gamma, E., Helm, R., Johnson, R., Vlissides, J. (1994), Design Patterns. Elements of Reusable Object-Oriented Software
Before we proceed further to discussing technical matters, we feel obliged to provide an overview of the problems related to authorizing API calls and authenticating clients. Based on the main principle that “an API serves as a multiplier to both your opportunities and mistakes,” organizing authorization and authentication (AA) is one of the most important challenges that any API vendor faces, especially when it comes to public APIs. It is rather surprising that there is no standard approach to this issue, as every big vendor develops its own interface to solve AA problems, and these interfaces are often quite archaic.
If we set aside implementation details (for which we strongly recommend not reinventing the wheel and using standard techniques and security protocols), there are basically two approaches to authorizing an API call:
If the API is not about providing additional access to a service for end users, it is usually much easier to opt for the second approach and authorize clients with API keys. In this case, per-endpoint granularity can be achieved (i.e., allowing partners to regulate the set of permitted endpoints for a key), while developing more granular access can be much more complex and is therefore rarely implemented.
Both approaches can be morphed into each other (e.g., allowing robot users to perform operations on behalf of any other users effectively becomes API key-based authorization; allowing binding of a limited dataset to an API key effectively becomes a user account), and there are some hybrid systems in the wild (where the request must be signed with both an API key and a user token).
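On the wire, a hybrid scheme like the one mentioned above might look as follows (the header names are illustrative; any standard way of passing both credentials would do):
POST /v1/orders
X-Api-Key: <partner API key>
Authorization: Bearer <end user token>
…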
Let's proceed to the technical problems that API developers face. We begin with the last one described in the introductory chapter: the necessity to synchronize states. Let us imagine that a user creates a request to order coffee through our API. While this request travels from the client to the coffee house and back, many things might happen. Consider the following chain of events:
+Let's proceed to the technical problems that API developers face. We begin with the last one described in the introductory chapter: the distributed nature of modern software that gives rise to the problem of synchronizing shared states. Let us imagine that a user creates a request to order coffee through our API. While this request travels from the client to the coffee house and back, many things might happen. Consider the following chain of events:
The client sends the order creation request
Because of network issues, the request propagates to the server very slowly, and the client gets a timeout
+The client requests the current state of the system and gets an empty response as the initial request still hasn't reached the server:
let pendingOrders = await
- api.getOngoingOrders(); // → []
+api.getOngoingOrders(); // → []
The client, being unaware of this, tries to create an order anew.
As the operations of reading the list of ongoing orders and of creating a new order happen at different moments of time, we can't guarantee that the system state hasn't changed in between. If we do want to have this guarantee, we must implement some synchronization strategy1. In the case of, let's say, operating system APIs or client frameworks we might rely on the primitives provided by the platform. But in the case of distributed client-server APIs, we would need to implement such a primitive of our own.
+As the operations of reading the list of ongoing orders and of creating a new order happen at different moments of time, we can't guarantee that the system state hasn't changed in between. This might happen if the application backend state is replicated (i.e., the second request reads data from a different node of the data storage) or if the customer uses two client devices simultaneously. In other words, we encountered the classical problem of state synchronization in distributed systems. To solve this issue, we need to select a consistency model1 for our application and implement some synchronization strategy.
+As clients are your customers, it is highly desirable to provide them with an API with the highest degree of robustness — strong consistency,2 which guarantees that all clients read the most recent writes. It is not universally possible, and we will discuss relaxing this constraint in the following chapters. However, with APIs the rule of thumb is: if you can provide strongly consistent interfaces, do it.
There are two main approaches to solving this problem: the pessimistic one (implementing locks in the API) and the optimistic one (resource versioning).
-NB: Generally speaking, the best approach to tackling an issue is not having the issue at all. Let's say, if your API is idempotent, the duplicating calls are not a problem. However, in the real world, not every operation is idempotent; for example, creating new orders is not. We might add mechanisms to prevent automatic retries (such as client-generated idempotency tokens) but we can't forbid users from just creating a second identical order.
+NB: Generally speaking, the best solution is not having the issue at all. Let's say, if your API is idempotent, the duplicating calls are not a problem. However, in the real world, not every operation is idempotent; for example, creating new orders is not. We might add mechanisms to prevent automatic retries (such as client-generated idempotency tokens) but we can't forbid users from just creating a second identical order.
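For reference, the idempotency token technique mentioned above amounts to the client attaching a unique key to each logical operation so that automatic retries are deduplicated (a sketch in the header notation used throughout this book):
POST /v1/orders
X-Idempotency-Token: <unique token
  generated for this specific attempt>
The server stores the token alongside the created order and, upon receiving a request with the same token again, returns the already created order instead of creating a duplicate. As noted, this protects against automatic retries but not against the user deliberately placing a second identical order.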
The first approach is to literally implement standard synchronization primitives at the API level. Like this, for example:
let lock;
try {
// Capture the exclusive
- // right to create new orders
+ // right to manipulate orders
lock = await api.
- acquireLock(ORDER_CREATION);
+ acquireLock(ORDERS_ACCESS);
// Get the list of current orders
// known to the system
let pendingOrders = await
@@ -2061,15 +2075,23 @@ X-Idempotency-Token: <token>
await lock.release();
}
-Rather unsurprisingly, this approach sees very rare use in distributed client-server APIs because of the plethora of related problems:
+This solution is quite similar to using mutexes to avoid race conditions in multithreaded systems,3 just exposed via a formal API. Rather unsurprisingly, this approach sees very rare use in distributed client-server APIs because of the plethora of related problems:
The getPendingOrders function must return the up-to-date state of the system, otherwise the duplicate order will be created anyway.
Waiting to acquire a lock introduces new latencies to the interaction that are hardly predictable and might potentially be quite significant.
+The locks themselves [i.e., the storage for lock identifiers and its API] constitute a separate subsystem of their own and require additional effort from the API vendor to implement.
+As it's partners who develop client code, we can't guarantee it always works with locks correctly. Inevitably, “lost” locks will occur in the system, and that means we need to provide some tools to partners so they can find the problem and debug it.
+A certain granularity of locks is to be developed so that partners can't affect each other. We are lucky if there are natural boundaries for a lock — for example, if it's limited to a specific user in the specific partner's system. If we are not so lucky (let's say all partners share the same user profile), we will have to develop even more complex systems to deal with potential errors in the partners' code — for example, introduce locking quotas.
-A less implementation-heavy approach is to develop an optimistic concurrency control3 system, i.e., to require clients to pass a flag proving they know the actual state of a shared resource.
+A less implementation-heavy approach is to develop an optimistic concurrency control4 system, i.e., to require clients to pass a flag proving they know the actual state of a shared resource.
// Retrieve the state
let orderState =
await api.getOrderState();
@@ -2092,11 +2114,12 @@ X-Idempotency-Token: <token>
}
}
-NB: An attentive reader might note that the necessity to implement some synchronization strategy and strongly consistent reading has not disappeared: there must be a component in the system that performs a locking read of the resource version and its subsequent change. It's not entirely true as synchronization strategies and strongly consistent reading have disappeared from the public API. The distance between the client that sets the lock and the server that processes it became much smaller, and the entire interaction now happens in a controllable environment. It might be a single subsystem in the form of an ACID-compatible4 database or even an in-memory solution.
+NB: An attentive reader might note that the necessity to implement locking has not disappeared: there must be a component in the system that performs a locking read of the resource version and its subsequent change. It's not entirely true as synchronization strategies and strongly consistent reading have disappeared from the public API. The distance between the client that sets the lock and the server that processes it became much smaller, and the entire interaction now happens in a controllable environment, being free from the problems we've described earlier.
Instead of a version, the date of the last modification of the resource might be used (which is much less reliable as clocks are not ideally synchronized across different system nodes; at least save it with the maximum possible precision!) or entity tags (ETags).
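In HTTP APIs, this is what the standard ETag and If-Match headers provide (a minimal sketch):
// Reading the state
// returns its current version
GET /v1/orders/{id}
→
ETag: "v123"
…
// A modifying request must prove
// which version it is based on
POST /v1/orders/{id}
If-Match: "v123"
…
// The server responds with
// “412 Precondition Failed”
// if the version no longer matches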
-The advantage of optimistic concurrency control is therefore the possibility to hide under the hood the complexity of implementing locking mechanisms. The disadvantage is that the versioning errors are no longer exceptional situations — it's now a regular behavior of the system. Furthermore, client developers must implement working with them otherwise the application might render inoperable as users will be infinitely creating an order with the wrong version.
-NB: Which resource to select for making versioning is extremely important. If in our example we create a global system version that is incremented after any order comes, users' chances to successfully create an order will be close to zero.
1 Synchronization (Computer Science)
en.wikipedia.org/wiki/Synchronization_(computer_science)
2 Strong consistency
en.wikipedia.org/wiki/Strong_consistency
3 Optimistic concurrency control
en.wikipedia.org/wiki/Optimistic_concurrency_control
The approach described in the previous chapter is in fact a trade-off: the API performance issues are traded for “normal” (i.e., expected) background errors that happen while working with the API. This is achieved by isolating the component responsible for controlling concurrency and only exposing read-only tokens in the public API. Still, the achievable throughput of the API is limited, and the only way of scaling it up is removing the strict consistency from the external API and thus allowing reading system state from read-only replicas:
+The advantage of optimistic concurrency control is therefore the possibility to hide under the hood the complexity of implementing locking mechanisms. The disadvantage is that the versioning errors are no longer exceptional situations — it's now a regular behavior of the system. Furthermore, client developers must implement working with them otherwise the application might render inoperable as users will be infinitely creating an order with the wrong version.
+NB: Which resource to select for making versioning is extremely important. If in our example we create a global system version that is incremented after any order comes, users' chances to successfully create an order will be close to zero.
1 See “Consistency model” · en.wikipedia.org/wiki/Consistency_model or refer to Van Steen, M., Tanenbaum A. (2024), 7.3 Client-centric consistency models
2 See “Strong Consistency” · en.wikipedia.org/wiki/Strong_consistency or refer to Gorton, I. (2022), Chapter 12. Strong Consistency
3 See “Lock” · en.wikipedia.org/wiki/Lock_(computer_science) or refer to Stevens, W. R. (1990), Chapter 7. Mutexes and Condition Variables
4 See “Optimistic concurrency control” · en.wikipedia.org/wiki/Optimistic_concurrency_control or refer to Kung, H. T., Robinson, J. T. (1981)
The approach described in the previous chapter is in fact a trade-off: the API performance issues are traded for “normal” (i.e., expected) background errors that happen while working with the API. This is achieved by isolating the component responsible for controlling concurrency and exposing only revision tokens in the public API. Still, the achievable throughput of the API is limited as strong consistency implies strict constraints on backend implementation.
+In many situations, given that the rate of writes is much less than the rate of reads (as in our case, when making two orders from two different devices under one account is rather an exceptional situation), it might make sense to stick to eventual consistency rather than the strict one.1 The typical setup on the Web often involves having asynchronously replicated databases:
// Reading the state,
// possibly from a replica
let orderState =
@@ -2113,8 +2136,14 @@ X-Idempotency-Token: <token>
…
}
-As orders are created much more rarely than read, we might significantly increase the system performance if we drop the requirement of returning the most recent state of the resource from the state retrieval endpoints. The versioning will help us avoid possible problems: creating an order will still be impossible unless the client has the actual version. In fact, we transited to the eventual consistency1 model: the client will be able to fulfill its request sometime when it finally gets the actual data. In modern microservice architectures, eventual consistency is rather an industrial standard, and it might be close to impossible to achieve the opposite, i.e., strict consistency.
-NB: Let us stress that you might choose the approach only in the case of exposing new APIs. If you're already providing an endpoint implementing some consistency model, you can't just lower the consistency level (for instance, introduce eventual consistency instead of the strict one) even if you never documented the behavior. This will be discussed in detail in the “On the Waterline of the Iceberg” chapter of “The Backward Compatibility” section of this book.
+As orders are created much more rarely than read, we might significantly increase the system performance if we drop the requirement of returning the most recent state of the resource from the state retrieval endpoints. The versioning will help us avoid possible problems: creating an order will still be impossible unless the client has the actual version. The client will be able to fulfill its request eventually when it finally gets the actual data.
+NB: Strictly speaking, in this example, we're referring to the “single-leader replication” type of eventual consistency: while reads might return outdated data, writes are nevertheless strictly ordered, and the service that physically makes writes can resolve the actual state of the system. There is also the “multi-leader replication” class of systems, where there is no such thing as “the actual state” or “the latest version,” as every leader replica handles writes independently and concurrently — which, in our case, means clients can always create duplicate orders, whatever precautions we take. Typically, such systems are only used in the following cases:
+The curious reader may refer to Martin Kleppmann's work on the subject.2
Choosing weak consistency instead of a strict one, however, brings some disadvantages. For instance, we might require partners to wait until they get the actual resource state to make changes — but it is quite unobvious for partners (and actually inconvenient) that they must be prepared to wait for changes they made themselves to propagate.
// Creates an order
let order = await api
@@ -2125,7 +2154,7 @@ X-Idempotency-Token: <token>
// The list is empty
If strict consistency is not guaranteed, the second call might easily return an empty result as it reads data from a replica, and the newest order might not have hit it yet.
-An important pattern that helps in this situation is implementing the “read-your-writes2” model, i.e., guaranteeing that clients observe the changes they have just made. The consistency might be lifted to the read-your-writes level by making clients pass some token that describes the last changes known to the client.
+An important pattern that helps in this situation is implementing the “read-your-writes3” model: it guarantees that clients observe the changes they have just made. In APIs, the read-your-writes strategy could be implemented by making clients pass some token that describes the last change known to the client.
let order = await api
.createOrder(…);
let pendingOrders = await api.
@@ -2139,28 +2168,46 @@ X-Idempotency-Token: <token>
Such a token might be:
An identifier (or identifiers) of the last modifying operations carried out by the client
+The last resource version (modification date, ETag) known to the client.
+Upon getting the token, the server must check that the response (e.g., the list of ongoing operations it returns) matches the token, i.e., that the eventual consistency has converged. If it did not (the client passed a modification date / version / last order id newer than the one known to the server), one of the following policies or their combinations might be applied (a sketch of such a check follows the list):
The server might repeat the request to the underlying DB or to another kind of data storage in order to get the newest version (eventually)
+The server might return an error that requires the client to try again later
+The server might query the main node of the DB, if such a thing exists, or otherwise initiate retrieving the master data.
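A sketch of such a check on the server side, assuming for illustration that the token is the identifier of the last order known to the client (the readReplica and masterNode objects are hypothetical):
async function getOngoingOrders(lastKnownOrderId) {
  // Reading from a replica,
  // possibly getting stale data
  let orders = await readReplica
    .getOngoingOrders();
  if (lastKnownOrderId !== undefined &&
      !orders.find((o) => o.id == lastKnownOrderId)) {
    // The replica hasn't converged yet:
    // apply one of the policies, e.g.,
    // re-read the data from the master node
    orders = await masterNode
      .getOngoingOrders();
  }
  return orders;
}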
+The advantage of this approach is client development convenience (compared to the absence of any guarantees): by preserving the version token, client developers get rid of the possible inconsistency of the data received from API endpoints. There are two disadvantages, however:
It is still a trade-off between system scalability and a constant inflow of background errors:
If you're querying master data or repeating the request upon a version mismatch, the load on the master storage is increased in a poorly predictable manner
+If you return a client error instead, the number of such errors might be considerable, and partners will need to write some additional code to deal with the errors.
+This approach is still probabilistic, and will only help in a limited number of use cases (to be discussed below).
+There is also an important question regarding the default behavior of the server if no version token was passed. Theoretically, in this case, master data should be returned, as the absence of the token might be the result of an app crash and subsequent restart or corrupted data storage. However, this implies an additional load on the master node.
Let us state an important assertion: the methods of solving architectural problems we're discussing in this section are probabilistic. Abolishing strict consistency means that even if all components of the system work perfectly, client errors will still occur. It might appear that they could be simply ignored, but in reality, doing so means introducing risks.
+First, let us stress that you might choose the approach only in the case of exposing new APIs. If you're already providing an endpoint implementing some consistency model, you can't just lower the consistency level (for instance, introduce eventual consistency instead of the strict one) even if you never documented the behavior. This will be discussed in detail in the “On the Waterline of the Iceberg” chapter of “The Backward Compatibility” section of this book.
+Second, let us state another important assertion: the methods of solving architectural problems we're discussing in this section are probabilistic. Abolishing strict consistency means that even if all components of the system work perfectly, client errors will still occur. It might appear that they could be simply ignored, but in reality, doing so means introducing risks.
Imagine that because of eventual consistency, users of our API sometimes cannot create orders with their first attempt. For example, a customer adds a new payment method in the application, but their subsequent order creation request is routed to a replica that hasn't yet received the information regarding the newest payment method. As these two actions (adding a bank card and making an order) often go in conjunction, there will be a noticeable percentage of errors — let's say, 1%. At this stage, we could disregard the situation as it appears harmless: in the worst-case scenario, the client will repeat the request.
But let's go a bit further and imagine there is an error in a new version of the application, and 0.1% of end users cannot make an order at all because the client sends a wrong payment method identifier. In the absence of this 1% background noise of consistency-bound errors, we would find the issue very quickly. However, amidst this constant inflow of errors, identifying problems like this one could be very challenging as it requires configuring monitoring systems to reliably exclude the data consistency errors, and this could be very complicated or even impossible. The author of this book, in his job, has seen several situations when critical mistakes that affect a small percentage of users were not noticed for months.
Therefore, the task of proactively lowering the number of these background errors is crucially important. We may try to reduce their occurrence for typical usage profiles.
@@ -2185,7 +2232,7 @@ X-Idempotency-Token: <token>
Mathematically, the probability of getting the error is expressed quite simply. It's the ratio between two durations: the time period needed to get the actual state to the time period needed to restart the app and repeat the request. (Keep in mind that the last failed request might be automatically repeated on startup by the client.) The former depends on the technical properties of the system (for instance, on the replication latency, i.e., the lag between the master and its read-only copies) while the latter depends on what client is repeating the call.
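To illustrate the formula with made-up numbers: if getting the actual state takes up to 0.5 seconds (the replication lag) while restarting the app and repeating the request takes around 5 seconds, the probability of hitting the stale state is roughly 0.5 / 5 = 10%; for a client that retries within milliseconds, the ratio approaches one, meaning nearly every retry will fail.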
If we talk about applications for end users, the typical restart time there is measured in seconds, which normally should be much less than the overall replication latency. Therefore, client errors will only occur in case of data replication problems / network issues / server overload.
If, however, we talk about server-to-server applications, the situation is totally different: if a server repeats the request after a restart (let's say because the process was killed by a supervisor), it's typically a millisecond-scale delay. And that means that the number of order creation errors will be significant.
-As a conclusion, returning eventually consistent data by default is only viable if an API vendor is either ready to live with background errors or capable of making the lag of getting the actual state much less than the typical app restart time.
1 Consistency Model. Eventual Consistency
en.wikipedia.org/wiki/Consistency_model#Eventual_consistency
2 Consistency Model. Read-Your-Writes Consistency
en.wikipedia.org/wiki/Consistency_model#Read-your-writes_consistency
As a conclusion, returning eventually consistent data by default is only viable if an API vendor is either ready to live with background errors or capable of making the lag of getting the actual state much less than the typical app restart time.
1 Van Steen, M., Tanenbaum A. (2024), 7.2.2 Eventual consistency
2 Kleppmann, M. (2017), Chapter 5. Replication
3 See “Consistency Model. Read-Your-Writes Consistency” · en.wikipedia.org/wiki/Consistency_model#Read-your-writes_consistency or refer to Van Steen, M., Tanenbaum A. (2024), 7.3.3 Read your writes
Let's continue working with the previous example: the application retrieves some system state upon start-up, perhaps not the most recent one. What else does the probability of collision depend on, and how can we lower it?
We remember that this probability is equal to the ratio of time periods: getting an actual state versus starting an app and making an order. The latter is almost out of our control (unless we deliberately introduce additional waiting periods in the API initialization function, which we consider an extreme measure). Let's then talk about the former.
Our usage scenario looks like this:
@@ -2230,12 +2277,20 @@ X-Idempotency-Token: <token>
Thus we naturally came to the pattern of organizing asynchronous APIs through task queues. Here we use the term “asynchronous” logically meaning the absence of mutual logical locks: the party that makes a request gets a response immediately and does not wait until the requested procedure is fully carried out being able to continue to interact with the API. Technically in modern application environments, locking (of both the client and server) almost universally doesn't happen during long-responding calls. However, logically allowing users to work with the API while waiting for a response from a modifying endpoint is error-prone and leads to collisions like the one we described above.
The asynchronous call pattern is useful for solving other practical tasks as well:
Caching operation results and providing links to them (implying that if the client needs to reread the operation result or share it with another client, it might use the task identifier to do so)
+Ensuring operation idempotency (through introducing the task confirmation step we will actually get the draft-commit system as discussed in the “Describing Final Interfaces” chapter)
+Naturally improving resilience to peak loads on the service as the new tasks will be queuing up (possibly prioritized)
+Organizing interaction in the cases of very long-lasting operations that require more time than typical timeouts (which are tens of seconds in the case of network calls) or can take unpredictable time.
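Schematically, the task queue pattern splits one modifying call into two or more (the endpoint names below are illustrative):
// Instead of carrying out the operation,
// the server immediately creates a task
POST /v1/orders
→
{ "task_id": <task identifier> }
// The result is obtained later,
// by polling or via a callback
GET /v1/tasks/{task_id}
→
{ "status": "pending" }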
-Also, asynchronous communication is more robust from a future API development point of view: request handling procedures might evolve towards prolonging and extending the asynchronous execution pipelines whereas synchronous handlers must retain reasonable execution times which puts certain restrictions on possible internal architecture.
+Also, asynchronous communication is more robust from a future API development point of view: request handling procedures might evolve towards prolonging and extending the asynchronous execution pipelines whereas synchronous handlers must retain reasonable execution times which puts certain restrictions on possible internal architecture. One might refer to the definitive work by Adam Bellemare on advantages of event-driven architectures.1
NB: In some APIs, an ambivalent decision is implemented where endpoints feature a double interface that might either return a result or a link to a task. Although from the API developer's point of view, this might look logical (if the request was processed “quickly”, e.g., served from cache, the result is to be returned immediately; otherwise, the asynchronous task is created), for API consumers, this solution is quite inconvenient as it forces them to maintain two execution branches in their code. Sometimes, a concept of providing a double set of endpoints (synchronous and asynchronous ones) is implemented, but this simply shifts the burden of making decisions onto partners.
The popularity of the asynchronicity pattern is also driven by the fact that modern microservice architectures “under the hood” operate in asynchronous mode through event queues or pub/sub middleware. Implementing an analogous approach in external APIs is the simplest solution to the problems caused by asynchronous internal architectures (the unpredictable and sometimes very long latencies of propagating changes). Ultimately, some API vendors make all API methods asynchronous (including the read-only ones) even if there are no real reasons to do so.
However, we must stress that excessive asynchronicity, though appealing to API developers, implies several quite objectionable disadvantages:
@@ -2279,7 +2334,7 @@ X-Idempotency-Token: <token>
status: "new" }]} */
-NB: Let us also mention that in the asynchronous format, it's possible to provide not only binary status (task done or not) but also execution progress as a percentage if needed.
1 Token Bucket
en.wikipedia.org/wiki/Token_bucket
NB: Let us also mention that in the asynchronous format, it's possible to provide not only binary status (task done or not) but also execution progress as a percentage if needed.
1 Bellemare, A. (2020), Building Event-Driven Microservices
In the previous chapter, we concluded with the following interface that allows minimizing collisions while creating orders:
let pendingOrders = await api
.getOngoingOrders();
@@ -2465,9 +2520,9 @@ X-Idempotency-Token: <token>
Another possible anchor to rely on is the record creation date. However, this approach is harder to implement for the following reasons:
- Creation dates for two records might be identical, especially if the records are mass-generated programmatically. In the worst-case scenario, it might happen that at some specific moment, more records were created than one request page contains, making it impossible to traverse them.
-- If the storage supports parallel writing to several nodes, the most recently created record might have a slightly earlier creation date than the second-recent one because clocks on different nodes might tick slightly differently, and it is challenging to achieve even microsecond-precision coherence.1 This breaks the monotonicity invariant, which makes it poorly fit for use in public APIs. If there is no other choice but relying on such storage, one of two evils is to be chosen:
+- If the storage supports parallel writing to several nodes (i.e., implements the “multi-leader replication” approach), the most recently created record might have a slightly earlier creation date than the second-recent one because clocks on different nodes might tick slightly differently, and it is challenging to achieve even microsecond-precision coherence.1 This breaks the monotonicity invariant, which makes it poorly fit for use in public APIs, as we discussed in the “Eventual Consistency” chapter. If there is no other choice but relying on such storage, one of two evils is to be chosen:
-- Introducing artificial delays, i.e., returning only items created earlier than N seconds ago, selecting this N to be certainly less than the clock irregularity. This technique also works in the case of asynchronously populated lists. Keep in mind, however, that this solution is probabilistic, and wrong data will be served to clients in case of backend synchronization problems.
+- Introducing artificial delays, i.e., returning only items created earlier than N seconds ago, selecting this N to be certainly more than the clock irregularity and the replication lag. This technique also works in the case of asynchronously populated lists. Keep in mind, however, that this solution is probabilistic, and wrong data will be served to clients in case of backend synchronization problems.
- Describe the instability of ordering list items in the docs (and thus make partners responsible for solving arising issues).
@@ -2573,8 +2628,8 @@ X-Idempotency-Token: <token>
}
Events themselves and the order of their occurrence are immutable. Therefore, it's possible to organize traversing the list. It is important to note that the order creation event is not the order itself: when a partner reads an event, the order might have already changed its status. However, accessing all new orders is ultimately doable, although not in the most efficient manner.
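An individual record in such an event list might look like this (a sketch; the field composition is ours):
{
  // Events are immutable
  // and strictly ordered
  "event_id": <event identifier>,
  "event_type": "order_created",
  "occurred_at": <timestamp>,
  // A reference to the order;
  // its current status must be
  // requested separately
  "order_id": <order identifier>
}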
-NB: In the code samples above, we omitted passing metadata for responses, such as the number of items in the list, the has_more_items flag, etc. Although this metadata is not mandatory (i.e., clients will learn the list size when they retrieve it fully), having it makes working with the API more convenient for developers. Therefore we recommend adding it to responses.
1 Ranganathan, K. A Matter of Time: Evolving Clock Sync for Distributed Databases
www.yugabyte.com/blog/evolving-clock-sync-for-distributed-databases
In the previous chapter, we discussed the following scenario: a partner receives information about new events occuring in the system by periodically requesting an endpoint that supports retrieving ordered lists.
+NB: In the code samples above, we omitted passing metadata for responses, such as the number of items in the list, the has_more_items flag, etc. Although this metadata is not mandatory (i.e., clients will learn the list size when they retrieve it fully), having it makes working with the API more convenient for developers. Therefore we recommend adding it to responses.
1 See “Ranganathan, K. A Matter of Time: Evolving Clock Sync for Distributed Databases” · www.yugabyte.com/blog/evolving-clock-sync-for-distributed-databases or refer to Kleppmann, M. (2017), Chapter 8. The Trouble with Distributed Systems
In the previous chapter, we discussed the following scenario: a partner receives information about new events occurring in the system by periodically requesting an endpoint that supports retrieving ordered lists.
GET /v1/orders/created-history↵
?older_than=<item_id>&limit=<limit>
→
@@ -2599,16 +2654,24 @@ X-Idempotency-Token: <token>
As various mobile platforms currently constitute a major share of all client devices, this implies significant limitations, in terms of battery (and partly traffic) saving, on the technologies for data exchange between the server and the end user. Many platform and device manufacturers monitor the resources consumed by the application and can send it to the background or close open connections. In such a situation, frequent polling should only be used in active phases of the application work cycle (i.e., when the user is directly interacting with the UI) or in controlled environments (for example, if employees of a partner company use the application in their work and can add it to system exceptions).
Three alternatives to polling might be proposed:
1. Duplex Connections
-The most obvious option is to use technologies that can transmit messages in both directions over a single connection. The best-known example of such technology is WebSockets3. Sometimes, the Server Push functionality of the HTTP/2 protocol·4 is used for this purpose; however, we must note that the specification formally does not allow such usage. There is also the WebRTC5 protocol; its main purpose is a peer-to-peer exchange of media data, and it's rarely used in client-server interaction.
+The most obvious option is to use technologies that can transmit messages in both directions over a single connection. The best-known example of such technology is WebSockets3. Sometimes, the Server Push functionality of the HTTP/2 protocol4 is used for this purpose; however, we must note that the specification formally does not allow such usage. There is also the WebRTC5 protocol; its main purpose is a peer-to-peer exchange of media data, and it's rarely used in client-server interaction.
Although the idea looks simple and attractive, its applicability to real-world use cases is limited. Popular server software and frameworks do not support server-initiated message sending (for instance, gRPC does support streamed responses6, but the client should initiate the exchange; using gRPC server streams to send server-initiated events is essentially employing HTTP/2 server pushes for this purpose, and it's the same technique as in the long polling approach, just a bit more modern), and the existing API description standards do not support it either. As WebSocket is a low-level protocol, you will need to design the interaction format on your own.
Duplex connections still suffer from the unreliability of the network and require implementing additional tricks to tell the difference between a network problem and the absence of new messages. All these issues result in limited applicability of the technology; it's mostly used in web applications.
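To illustrate the latter problem, here is a sketch of a heartbeat on top of a WebSocket connection; the endpoint, the message format, and the ping/pong convention are all our assumptions, as the protocol itself prescribes none:
// Assumption: the server echoes { "type": "pong" }
// in response to { "type": "ping" } messages
const ws = new WebSocket('wss://our-api-host.tld/v1/events');
let lastSeenAt = Date.now();
ws.onmessage = (e) => {
  // any incoming frame proves the connection is alive
  lastSeenAt = Date.now();
  const message = JSON.parse(e.data);
  if (message.type !== 'pong') {
    handleEvent(message);
  }
};
setInterval(() => {
  // silence may mean either “no new events” or a dead
  // connection; only an explicit ping can tell them apart
  if (Date.now() - lastSeenAt > 30_000) {
    ws.send(JSON.stringify({ type: 'ping' }));
  }
}, 10_000);
function handleEvent(message: unknown) {
  /* application logic */
}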
2. Separate Callback Channels
-Instead of a duplex connection, two separate connections might be used: one for sending requests to the server and one to receive notifications from the server. The most popular technology of this kind is MQTT7. Although it is considered very effective because of utilizing low-level protocols, its disadvantages follow from its advantages:
+Instead of a duplex connection, two separate channels might be used: one for sending requests to the server and one for receiving notifications from the server. Clients subscribe to message queues generated by the server (a “message broker”) or, sometimes, by other clients, typically by implementing the publisher/subscriber (“pub/sub”) pattern.7 This implies that:
-- The technology is meant to implement the pub/sub pattern, and its main value is that the server software (MQTT Broker) is provided alongside the protocol itself. Applying it to other tasks, especially bidirectional communication, might be challenging.
-- The low-level protocols force you to develop your own data formats.
+- The client sends requests either through regular API calls or by publishing events to a queue (or queues).
+- The client receives callback notifications by listening for events on a queue. It might be the same queue the client used for sending events or a completely different queue (or queues).
-There is also a Web standard for sending server notifications called Server-Sent Events8 (SSE). However, it's less functional than WebSocket (only text data and unidirectional flow are allowed) and rarely used.
+Therefore, this approach follows neither the request-response pattern (even if a callback event is a direct response to the client's actions, it is received asynchronously, requiring the client to match the response to its requests) nor the duplex connection pattern. However, we must note that this is a logical distinction made for the convenience of client developers as, under the hood, the underlying messaging framework typically relies on WebSockets or implements polling.
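A sketch of this two-channel logic with a hypothetical broker client (a real application would use an MQTT or AMQP library instead):
interface BrokerClient {
  publish(queue: string, message: object): Promise<void>;
  subscribe(
    queue: string,
    onMessage: (message: any) => void
  ): void;
}
// the broker client is assumed to be configured elsewhere
declare const broker: BrokerClient;
// channel one: the client publishes its requests…
await broker.publish('orders.commands', {
  // a hypothetical correlation field
  request_id: 'abc123',
  command: 'create_order'
});
// …channel two: callbacks arrive asynchronously, and the client
// itself matches them to the requests it made earlier
broker.subscribe('orders.events', (message) => {
  if (message.request_id === 'abc123') {
    // this event is the response to our request
  }
});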
+The most popular technology of this kind is MQTT8. Although it is considered highly efficient due to its use of low-level protocols, its disadvantages stem from its advantages:
+
+- The technology is designed to implement the pub/sub pattern, and its primary value lies in the fact that the server software (MQTT Broker) is provided alongside the protocol itself. Applying it to other tasks, especially bidirectional communication, can be challenging.
+- The use of low-level protocols requires developers to define their own data formats.
+
+Another popular technology for organizing message queues is the Advanced Message Queuing Protocol (AMQP). AMQP is an open standard for implementing message queues,9 with many independent client and server (broker) implementations. One notable broker implementation is RabbitMQ,10 while AMQP clients are typically implemented as libraries for specific client platforms and programming languages.
+There is also a web standard for sending server notifications called Server-Sent Events11 (SSE). However, SSE is less functional than WebSockets (supporting only text data and unidirectional flow) and is rarely used.
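For completeness, a minimal SSE interaction might look as follows (the endpoint URL is our assumption); note that the payload is always text:
// client side: the browser-native EventSource API
const source = new EventSource('/v1/orders/updates');
source.onmessage = (e) => {
  // e.data is always a string; any structure must be
  // serialized into text by the server
  console.log(e.data);
};
// the server, in turn, responds with a text/event-stream
// of blank-line-separated blocks like the following:
//   data: {"order_id": 123, "status": "ready"}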
+A curious reader may refer to the corresponding chapter in Ian Gorton’s influential book12 or to Adam Bellemare’s compendium on the topic.13
3. Third-Party Push Notifications
One of the notorious problems with the long polling / WebSocket / SSE / MQTT technologies is the necessity to maintain an open network connection between the client and the server, which might be a problem for mobile applications and IoT devices in terms of performance and battery life. One option that allows for mitigating the issue is delegating sending push notifications to a third-party service (the most popular choice today is Google's Firebase Cloud Messaging) that delivers notifications through the built-in mechanisms of the platform. Using such integrated services takes most of the load of maintaining open connections and checking their status off the developer's shoulders. The disadvantages of using third-party services are the necessity to pay for them and strict limits on message sizes.
Also, sending push notifications to end-user devices suffers from one important issue: the percentage of successfully delivered messages never reaches 100%; the message drop rate might reach tens of percent. Taking into account the message size limitations, it's actually better to implement a mixed model than a pure push model: the client continues polling the server, just less frequently, and push notifications merely trigger ahead-of-time polling. (This problem is actually applicable to any notification delivery technology. Low-level protocols offer more options to set delivery guarantees; however, given the situation with the forceful closing of open connections by OSes, having low-frequency polling as a precaution in an application is almost never a bad thing.)
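A sketch of the mixed model described above; pollOrders and onPushNotification are hypothetical application functions:
declare function pollOrders(): Promise<void>;
declare function onPushNotification(cb: () => void): void;
// low-frequency polling works as a safety net…
const POLL_INTERVAL_MS = 5 * 60 * 1000;
let timer = setTimeout(poll, POLL_INTERVAL_MS);
async function poll() {
  await pollOrders();
  timer = setTimeout(poll, POLL_INTERVAL_MS);
}
// …while a push notification merely triggers
// an ahead-of-time poll
onPushNotification(() => {
  clearTimeout(timer);
  poll();
});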
@@ -2636,14 +2699,14 @@ X-Idempotency-Token: <token>
What is important is that there must be a formal contract (preferably in the form of a specification) for the webhook's request and response formats and all the errors that might happen.
2. Agree on Authorization and Authentication Methods
-As a webhook is a callback channel, you will need to develop a separate authorization system to deal with it as it's partners duty to check that the request is genuinely coming from the API backend, not vice versa. We reiterate here our strictest recommendation to stick to existing standard techniques, for example, mTLS9; though in the real world, you will likely have to use archaic methods like fixing the caller server's IP address.
+As a webhook is a callback channel, you will need to develop a separate authorization system for it, as it is now the partner's duty to verify that requests genuinely originate from the API backend, not vice versa. We reiterate here our strictest recommendation to stick to existing standard techniques, such as mTLS; though in the real world, you will likely have to use archaic methods like pinning the caller server's IP address.
3. Develop an Interface for Setting the URL of a Webhook
As the callback endpoint is developed by partners, we do not know its URL beforehand. It implies some interface must exist for setting this URL and authorized public keys (probably in the form of a control panel for partners).
Importantly, the operation of setting a webhook URL is to be treated as a potentially hazardous one. It is highly desirable to request a second authentication factor to authorize the operation, as a potential attacker could wreak a lot of havoc if there is a vulnerability in the procedure:
- By setting an arbitrary URL, the perpetrator might get access to all of the partner's orders (and the partner might lose access)
- This vulnerability might be used for organizing DoS attacks on third parties
-- If an internal URL might be set as a webhook, a SSRF attack10 might be directed toward the API vendor's own infrastructure.
+- If an internal URL can be set as a webhook, an SSRF attack14 might be directed toward the API vendor's own infrastructure.
Typical Problems of Webhook-Powered Integrations
Bidirectional data flows (both client-server and server-server ones, though the latter to a greater extent) bear quite undesirable risks for an API provider. With regular APIs, the quality of the integration primarily depends on the API developers; with callback-based integrations, it's vice versa: the integration quality depends on how partners implemented the webhook. We might face numerous problems with the partners' code:
@@ -2662,7 +2725,7 @@ X-Idempotency-Token: <token>
Help partners write proper code by describing in the documentation all the unobvious subtleties that inexperienced developers might be unaware of (see the example call after this list):
- Idempotency keys for every operation
-- Delivery guarantees (“at least once,” “exactly ones,” etc.; see the reference description11 on the example of Apache Kafka API)
+- Delivery guarantees (“at least once,” “exactly once,” etc.; see the reference description15 using the example of the Apache Kafka API)
- Possibility of the server generating parallel requests and the maximum number of such requests at a time
- Guarantees of message ordering (i.e., the notifications are always delivered ordered from the oldest one to the newest one) or the absence of such guarantees
- The sizes of all messages and message fields in bytes
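For instance, a webhook call that makes these guarantees explicit might look like this; the payload fields and the partner's host are our assumptions:
POST /partner/webhook HTTP/1.1
Host: partners-backend.tld
X-Idempotency-Token: <token>
Content-Type: application/json

{
  // a monotonically growing identifier makes the
  // ordering guarantees (or their absence) verifiable
  "event_id": 1234,
  "occurred_at": "<timestamp>",
  "payload": { … }
}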
@@ -2677,8 +2740,8 @@ X-Idempotency-Token: <token>
Message Queues
-As for internal APIs, the webhook technology (i.e., the possibility to programmatically define a callback URL) is either not needed at all or is replaced with the Service Discovery12 protocol as services comprising a single backend are symmetrically able to call each other. However, the problems of callback-based integration discussed above are equally actual for internal calls. Requesting an internal API might result in a false-negative mistake, internal clients might be unaware that ordering is not guaranteed, etc.
-To solve these problems, and also to ensure better horizontal scalability, message queues13 were developed, most notably numerous pub/sub pattern·14 implementations. At present moment, pub/sub-based architectures are very popular in enterprise software development, up to switching any inter-service communication to message queues.
+As for internal APIs, the webhook technology (i.e., the possibility to programmatically define a callback URL) is typically not needed at all as services comprising a single backend are symmetrically able to call each other. However, the problems of callback-based integration discussed above are equally relevant to internal calls. Requesting an internal API might result in a false-negative error, internal clients might be unaware that ordering is not guaranteed, etc.
+To solve these problems, as with client-server interaction, message queues might be used instead of making direct calls. At the present moment, pub/sub-based architectures are very popular in enterprise software development, up to the point of switching all inter-service communication to message queues.
NB: Let us note that everything comes with a price, and these delivery guarantees and horizontal scalability are no exception:
- All communication becomes eventually consistent, with all the ensuing implications
@@ -2686,7 +2749,7 @@ X-Idempotency-Token: <token>
- Queues might accumulate unprocessed events, introducing increasing delays, and solving this issue on the subscriber's side might be quite non-trivial.
Also, in public APIs, both technologies are frequently used in conjunction: the API backend publishes a “call the webhook” task as an event to a queue, which a specially designed internal service then processes by making the actual call.
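Schematically, such an internal service is just a queue consumer that performs the delivery and re-enqueues failures; the queue client below is hypothetical:
// the queue client interface is an assumption
declare const broker: {
  subscribe(
    queue: string,
    onTask: (task: {
      url: string;
      body: object;
      attempt: number;
    }) => Promise<void>
  ): void;
  publish(queue: string, task: object): Promise<void>;
};
broker.subscribe('webhook.calls', async (task) => {
  const response = await fetch(task.url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(task.body)
  });
  if (!response.ok) {
    // re-enqueue with an increased attempt counter; after
    // N attempts the task should go to a dead-letter queue
    await broker.publish('webhook.calls', {
      ...task,
      attempt: task.attempt + 1
    });
  }
});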
-Theoretically, we can imagine an integration that exposes directly accessible message queues in one of the standard formats for partners to subscribe. However, we are unaware of any examples of such APIs.
References
1 Polling (Computer Science)
en.wikipedia.org/wiki/Polling_(computer_science)
2 Push Technology. Long Polling
en.wikipedia.org/wiki/Push_technology#Long_polling
3 WebSockets
websockets.spec.whatwg.org
4 Hypertext Transfer Protocol Version 2 (HTTP/2). Server Push
datatracker.ietf.org/doc/html/rfc7540#section-8.2
5 WebRTC: Real-Time Communication in Browsers
www.w3.org/TR/webrtc
6 gRPC. Server streaming RPC
grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc
8 HTML Living Standard. Server-Sent Events
html.spec.whatwg.org/multipage/server-sent-events.html
9 Mutual Authentication. mTLS
en.wikipedia.org/wiki/Mutual_authentication#mTLS
11 Apache Kafka. Kafka Design. Message Delivery Guarantees
docs.confluent.io/kafka/design/delivery-semantics.html
12 Web Services Discovery
en.wikipedia.org/wiki/Web_Services_Discovery
13 Message Queue
en.wikipedia.org/wiki/Message_queue
14 Publish / Subscribe Pattern
en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
Chapter 22. Multiplexing Notifications. Asynchronous Event Processing
+Theoretically, we can imagine an integration that exposes directly accessible message queues in one of the standard formats for partners to subscribe. However, we are unaware of any examples of such APIs.
References
1 Polling (Computer Science)
en.wikipedia.org/wiki/Polling_(computer_science)
2 Push Technology. Long Polling
en.wikipedia.org/wiki/Push_technology#Long_polling
3 WebSockets
websockets.spec.whatwg.org
4 Hypertext Transfer Protocol Version 2 (HTTP/2). Server Push
datatracker.ietf.org/doc/html/rfc7540#section-8.2
5 WebRTC: Real-Time Communication in Browsers
www.w3.org/TR/webrtc
6 gRPC. Server streaming RPC
grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc
7 Publish / Subscribe Pattern
en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
9 AMQP
www.amqp.org
10 RabbitMQ
www.rabbitmq.com
11 HTML Living Standard. Server-Sent Events
html.spec.whatwg.org/multipage/server-sent-events.html
12 Gorton, I. (2022), Chapter 7. Asynchronous Messaging
13 Bellemare, A. (2020), Building Event-Driven Microservices
14 See “Server Side Request Forgery” · owasp.org/www-community/attacks/Server_Side_Request_Forgery or refer to Madden, N. (2020), 10.2.7 Preventing SSRF attacks
15 Apache Kafka. Kafka Design. Message Delivery Guarantees
docs.confluent.io/kafka/design/delivery-semantics.html
Chapter 22. Multiplexing Notifications. Asynchronous Event Processing
One of the vexing restrictions of almost every technology mentioned in the previous chapter is the limited size of messages. With client push notifications, the situation is the most problematic: Google Firebase Cloud Messaging, at the moment this chapter is being written, allows no more than 4000 bytes of payload. In backend development, the restrictions are also notable; for example, Amazon SQS limits the size of messages to 256 KiB. While developing webhook-based integrations, you risk hitting the maximum body size allowed by the partner's webserver (for example, in nginx the default value is 1MB). This leads us to the necessity of making two technical decisions regarding the notification formats:
- Whether a message contains all data needed to process it or just notifies that some state change has happened
@@ -3207,7 +3270,7 @@ X-Idempotency-Token: <token>
}
This approach is much more complex to implement, but it is the only viable technique for realizing collaborative editing as it explicitly reflects the exact actions the client applied to an entity. Having the changes in this format also allows for organizing offline editing, with changes accumulated on the client side for the server to resolve the conflict later based on the revision history.
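A sketch of such a change list; the operation nomenclature and field names are our assumptions:
POST /v1/orders/{id}/changes HTTP/1.1

{
  // the revision the changes were made against,
  // so that the server can detect conflicts
  "base_revision": 123,
  "changes": [{
    "type": "set_comment",
    "comment": "Ring the doorbell"
  }, {
    "type": "remove_item",
    "item_id": "<id>"
  }]
}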
-NB: One approach to this task is developing a set of operations in which all actions are transitive (i.e., the final state of the entity does not change regardless of the order in which the changes were applied). One example of such a nomenclature is a conflict-free replicated data type (CRDT).2 However, we consider this approach viable only in some subject areas, as in real life, non-transitive changes are always possible. If one user entered new text in the document and another user removed the document completely, there is no way to automatically resolve this conflict that would satisfy both users. The only correct way of resolving this conflict is explicitly asking users which option for mitigating the issue they prefer.
1 Protocol Buffers. Field Masks in Update Operations
protobuf.dev/reference/protobuf/google.protobuf/#field-masks-updates
2 Conflict-Free Replicated Data Type
en.wikipedia.org/wiki/Conflict-free_replicated_data_type
NB: One approach to this task is developing a set of operations in which all actions are transitive (i.e., the final state of the entity does not change regardless of the order in which the changes were applied). One example of such a nomenclature is a conflict-free replicated data type (CRDT).2 However, we consider this approach viable only in some subject areas, as in real life, non-transitive changes are always possible. If one user entered new text in the document and another user removed the document completely, there is no way to automatically resolve this conflict that would satisfy both users. The only correct way of resolving this conflict is explicitly asking users which option for mitigating the issue they prefer.
1 Protocol Buffers. Field Masks in Update Operations
protobuf.dev/reference/protobuf/google.protobuf/#field-masks-updates
2 See “Conflict-Free Replicated Data Type” · en.wikipedia.org/wiki/Conflict-free_replicated_data_type or refer to Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M. (2011), Conflict-Free Replicated Data Types
In the previous chapters, we repeatedly discussed that the background level of errors is not just unavoidable, but in many cases, APIs are deliberately designed to tolerate errors to make the system more scalable and predictable.
But let's ask ourselves a question: what does a “more predictable system” mean? For an API vendor, the answer is simple: the distribution and number of errors are both indicators of technical problems (if the numbers are growing unexpectedly) and KPIs for technical refactoring (if the numbers are decreasing after the release).
However, for partner developers, the concept of “API predictability” means something completely different: how solidly they can cover the API use cases (both happy and unhappy paths) in their code. In other words, how well one can understand based on the documentation and the nomenclature of API methods what errors might arise during the API work cycle and how to handle them.
@@ -3875,7 +3938,7 @@ X-Idempotency-Token: <token>
-NB: There is nothing novel about these rules: one might easily recognize them as the SOLID architecture principles1·2. This is not surprising either, as SOLID focuses on contract-oriented development, and APIs are contracts by definition. We have simply introduced the concepts of “abstraction levels” and “informational contexts” to these principles.
+NB: There is nothing novel about these rules: one might easily recognize them as the SOLID architecture principles1. This is not surprising either, as SOLID focuses on contract-oriented development, and APIs are contracts by definition. We have simply introduced the concepts of “abstraction levels” and “informational contexts” to these principles.
However, there remains an unanswered question: how should we design the entity nomenclature from the beginning so that extending the API won't result in a mess of assorted inconsistent methods from different stages of development? The answer is quite obvious: to avoid clumsy situations during abstracting (as with the recipe properties), all the entities must be originally considered as specific implementations of a more general interface, even if there are no planned alternative implementations for them.
For example, while designing the POST /search API, we should have asked ourselves the question: what is a “search result”? What abstract interface does it implement? To answer this question, we need to decompose this entity neatly and identify which facet of it is used for interacting with which objects.
Then we would have come to the understanding that a “search result” is actually a composition of two interfaces:
@@ -3898,7 +3961,7 @@ X-Idempotency-Token: <token>
And what constitutes the “abstract representation of a search result in the UI”? Do we have other types of search? Should the ISearchItemViewParameters interface be a subtype of some even more general interface, or maybe a composition of several such interfaces?
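One possible decomposition in code; all names except ISearchItemViewParameters are our assumptions:
// the facet used for making an order
interface IOrderReference {
  offerId: string;
}
// the facet used for rendering the result in the UI
interface ISearchItemViewParameters {
  title: string;
  subtitle: string;
}
// a “search result” is then merely a composition of the two
type ISearchResult =
  IOrderReference & ISearchItemViewParameters;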
Replacing specific implementations with interfaces not only allows us to respond more clearly to many concerns that arise during the API design phase but also helps us outline many possible directions for API evolution. This approach should assist us in avoiding API inconsistency problems in the future.
2 Martin, R. C. (2023), 12. Solid
Replacing specific implementations with interfaces not only allows us to respond more clearly to many concerns that arise during the API design phase but also helps us outline many possible directions for API evolution. This approach should assist us in avoiding API inconsistency problems in the future.
1 See “SOLID” · en.wikipedia.org/wiki/SOLID or refer to Martin, R. C. (2023), 12. Solid
Apart from the abovementioned abstract principles, let us give a list of concrete recommendations on how to make changes in existing APIs to maintain backward compatibility.
Even if you haven't given any formal guarantees, it doesn't mean that you can violate informal ones. Often, just fixing bugs in APIs might render some developers' code inoperable. We can illustrate this with a real-life example that the author of this book actually faced once:
@@ -3944,8 +4007,8 @@ X-Idempotency-Token: <token>
Whatever tips and tricks from the previous chapters you employ, it is often quite probable that you can't do anything to prevent API inconsistencies from piling up. It's possible to reduce the speed of this stockpiling, foresee some problems, and have some interface durability reserved for future use. But one can't foresee everything. At this stage, many developers tend to make rash decisions, e.g., releasing a backward-incompatible minor version to fix some design flaws.
We highly recommend never doing that. Remember that the API is also a multiplier of your mistakes. What we recommend is to keep a serenity notepad — to write down the lessons learned and not to forget to apply this knowledge when a new major API version is released.
The problem of designing HTTP APIs is, unfortunately, one of the most “holywar”-inspiring issues. On one hand, it is one of the most popular technologies; on the other hand, it is quite complex and difficult to comprehend due to the large and fragmented standard split into many RFCs. As a result, the HTTP specification is doomed to be poorly understood and imperfectly interpreted by millions of software engineers and thousands of textbook writers. Therefore, before proceeding to the useful part of this Section, we must clarify exactly what we are going to discuss.
-Let's start with a short historical overview. Performing users' requests on a remote server has been one of the basic tasks in software engineering since mainframes, and it naturally gained additional momentum with the development of ARPANET. The first high-level protocol for network communication worked in the paradigm of sending messages over the network (as an example, see the DEL protocol that was proposed in one of the very first RFCs — RFC-5 published in 19691). However, scholars quickly understood that it would be much more convenient if calling a remote server and accessing remote resources wasn't any different from working with local memory and resources in terms of function signatures. This concept was strictly formulated under the name “Remote Procedure Call” (RPC) by Bruce Nelson, an employee of the famous Xerox Palo Alto Research Center in 1981.2 Nelson was also the co-author of the first practical implementation of the proposed paradigm, namely Sun RPC·3·4, which still exists as ONC RPC.
-First widely adopted RPC protocols (such as the aforementioned Sun RPC, Java RMI5, and CORBA·6) strictly followed the paradigm. The technology allowed achieving exactly what Nelson was writing about — that is, making no difference between local and remote code execution. The “magic” is hidden within tooling that generates the implementation of working with remote servers, and developers don't need to know how the protocol works.
+Let's start with a short historical overview. Performing users' requests on a remote server has been one of the basic tasks in software engineering since mainframes, and it naturally gained additional momentum with the development of ARPANET. The first high-level protocol for network communication worked in the paradigm of sending messages over the network (as an example, see the DEL protocol that was proposed in one of the very first RFCs — RFC-5 published in 19691). However, scholars quickly understood that it would be much more convenient if calling a remote server and accessing remote resources wasn't any different from working with local memory and resources in terms of function signatures. This concept was strictly formulated under the name “Remote Procedure Call” (RPC) by Bruce Nelson, an employee of the famous Xerox Palo Alto Research Center in 1981.2 Nelson was also the co-author of the first practical implementation of the proposed paradigm, namely Sun RPC3·4, which still exists as ONC RPC.
+First widely adopted RPC protocols (such as the aforementioned Sun RPC, Java RMI5, and CORBA6) strictly followed the paradigm. The technology allowed achieving exactly what Nelson was writing about — that is, making no difference between local and remote code execution. The “magic” is hidden within tooling that generates the implementation of working with remote servers, and developers don't need to know how the protocol works.
However, the convenience of using the technology became its Achilles heel:
We will refer to such APIs as “HTTP APIs” or “JSON-over-HTTP APIs.” We understand that this is a loose interpretation of the term, but we prefer to live with that rather than using phrases like “JSON-over-HTTP endpoints utilizing the semantics described in the HTTP and URL standards” or “a JSON-over-HTTP API complying with the REST architectural constraints” each time. As for the term “REST API,” it lacks a consistent definition (as we will discuss in the corresponding chapter), so we would avoid using it as well.
1 RFC-5. DEL
datatracker.ietf.org/doc/html/rfc5
2 Nelson, B. J. (1981) Remote Procedure Call
3 Birrell, A. D., Nelson, B. J. (1984) Implementing Remote Procedure Calls. ACM Transactions on Computer Systems (TOCS), Volume 2, Issue 1. Pages 39 - 59
4 RPC: Remote Procedure Call Protocol Specification
datatracker.ietf.org/doc/html/rfc1050
5 Remote Method Invocation (RMI)
www.oracle.com/java/technologies/javase/remote-method-invocation-home.html
6 CORBA
www.corba.org
7 The Original HTTP as defined in 1991
www.w3.org/Protocols/HTTP/AsImplemented.html
We will refer to such APIs as “HTTP APIs” or “JSON-over-HTTP APIs.” We understand that this is a loose interpretation of the term, but we prefer to live with that rather than using phrases like “JSON-over-HTTP endpoints utilizing the semantics described in the HTTP and URL standards” or “a JSON-over-HTTP API complying with the REST architectural constraints” each time. As for the term “REST API,” it lacks a consistent definition (as we will discuss in the corresponding chapter), so we would avoid using it as well.
1 RFC-5. DEL
datatracker.ietf.org/doc/html/rfc5
2 Nelson, B. J. (1981), Remote Procedure Call
3 Birrell, A. D., Nelson, B. J. (1984), Implementing Remote Procedure Calls
4 RPC: Remote Procedure Call Protocol Specification
datatracker.ietf.org/doc/html/rfc1050
5 Remote Method Invocation (RMI)
www.oracle.com/java/technologies/javase/remote-method-invocation-home.html
6 CORBA
www.corba.org
7 The Original HTTP as defined in 1991
www.w3.org/Protocols/HTTP/AsImplemented.html
As we discussed in the previous chapter, today, the choice of a technology for developing client-server APIs comes down to selecting either a resource-oriented approach (commonly referred to as “REST API”; let us reiterate that we will use the term “HTTP API” instead) or a modern RPC protocol. As we mentioned earlier, conceptually the difference is not that significant. However, technically these frameworks use the HTTP protocol quite differently:
First, different frameworks rely on different data formats:
It's not hard to notice that most of the claims regarding HTTP API performance are actually not about the HTTP protocol but the JSON format. There is no problem in developing an HTTP API that will utilize any binary format (including, for instance, Protocol Buffers). Then the difference between a Protobuf-over-HTTP API and a gRPC API would be just about using granular URLs, status codes, request / response headers, and the ensuing (in)ability to use integrated software tools out of the box.
However, on many occasions (including this book) developers prefer the textual JSON over binary Protobuf (Flatbuffers, Thrift, Avro, etc.) for a very simple reason: JSON is easy to read. First, it's a text format and doesn't require additional decoding. Second, it's self-descriptive, meaning that property names are included. Unlike Protobuf-encoded messages, which are basically impossible to read without a .proto file, one can make a very good guess as to what a JSON document is about at a glance. Provided that request metadata in HTTP APIs is readable as well, we ultimately get a communication format that is easy to parse and understand with just our eyes.
Apart from being human-readable, JSON features another important advantage: it is strictly formal, meaning it does not contain any constructs that can be interpreted differently in different architectures (with the possible exception of the sizes of numbers and strings), and the deserialization result aligns very well with the native data structures (i.e., indexed and associative arrays) of almost every programming language. From this point of view, we actually had no other choice when selecting a format for the code samples in this book.
+NB. To get a more thorough understanding of data formats and their features, the reader might refer to Kleppmann's overview.17
As we see, HTTP APIs and alternative RPC protocols occupy different market niches:
Otherwise, gRPC is undoubtedly one of the most advanced and efficient protocols.
-GraphQL features a curious approach that combines the concept of “resources” in HTML (i.e., it focuses on detailed descriptions of data formats and domain relations) while providing a rich query vocabulary to retrieve the needed subset of fields. Its main application is in data-heavy subject areas with complex entity hierarchies. (As evident from the name, GraphQL is more of a mechanism for distributed querying of abstract data storages than an API development paradigm.) Exposing external GraphQL APIs is rather an exotic practice as of today, mainly because managing a GraphQL service becomes increasingly challenging with growing data size and query numbers.17
-NB: in theory, an API could provide a dual interface — let's say, both JSON-over-HTTP and gRPC. Since the formal description of data formats and applicable operations is fundamental to all modern frameworks, these formats could be converted from one to another, thus making such a multi-API possible. However, in practice, we are not aware of any examples of such an API. We would venture to say that the potential benefits of increased convenience for developers do not outweigh the overhead expenses of maintaining dual interfaces.
1 JSON-RPC
www.jsonrpc.org
2 GraphQL
graphql.org
3 JSON
www.ecma-international.org/publications-and-standards/standards/ecma-404
5 Apache Thrift
thrift.apache.org
6 Apache Avro
avro.apache.org/docs
7 Protocol Buffers
protobuf.dev
8 FlatBuffers
flatbuffers.dev
10 XML-RPC
xmlrpc.com
11 Extensible Markup Language (XML)
www.w3.org/TR/xml
12 JSON-RPC 2.0 Specification. Response object
www.jsonrpc.org/specification#response_object
13 Comparing sizes of protobuf vs json
nilsmagnus.github.io/post/proto-json-sizes
14 Brotli Compressed Data Format
datatracker.ietf.org/doc/html/rfc7932
15 simdjson : Parsing gigabytes of JSON per second
github.com/simdjson/simdjson
16 BSON
bsonspec.org
17 Mehta, S., Barodiya, K. Lessons learned from running GraphQL at scale
blog.dream11engineering.com/lessons-learned-from-running-graphql-at-scale-2ad60b3cefeb
GraphQL features a curious approach that combines the concept of “resources” in HTTP APIs (i.e., it focuses on detailed descriptions of data formats and domain relations) while providing a rich query vocabulary to retrieve the needed subset of fields. Its main application is in data-heavy subject areas with complex entity hierarchies. (As evident from the name, GraphQL is more of a mechanism for distributed querying of abstract data storages than an API development paradigm.) Exposing external GraphQL APIs is rather an exotic practice as of today, mainly because managing a GraphQL service becomes increasingly challenging with growing data size and query numbers.18
+NB: in theory, an API could provide a dual interface — let's say, both JSON-over-HTTP and gRPC. Since the formal description of data formats and applicable operations is fundamental to all modern frameworks, these formats could be converted from one to another, thus making such a multi-API possible. However, in practice, we are not aware of any examples of such an API. We would venture to say that the potential benefits of increased convenience for developers do not outweigh the overhead expenses of maintaining dual interfaces.
1 JSON-RPC
www.jsonrpc.org
2 GraphQL
graphql.org
3 JSON
www.ecma-international.org/publications-and-standards/standards/ecma-404
5 Apache Thrift
thrift.apache.org
6 Apache Avro
avro.apache.org/docs
7 Protocol Buffers
protobuf.dev
8 FlatBuffers
flatbuffers.dev
10 XML-RPC
xmlrpc.com
11 Extensible Markup Language (XML)
www.w3.org/TR/xml
12 JSON-RPC 2.0 Specification. Response object
www.jsonrpc.org/specification#response_object
13 Comparing sizes of protobuf vs json
nilsmagnus.github.io/post/proto-json-sizes
14 Brotli Compressed Data Format
datatracker.ietf.org/doc/html/rfc7932
15 simdjson : Parsing gigabytes of JSON per second
github.com/simdjson/simdjson
16 BSON
bsonspec.org
17 Kleppmann, M. (2017), Chapter 4. Encoding and Evolution
18 Mehta, S., Barodiya, K. Lessons learned from running GraphQL at scale
blog.dream11engineering.com/lessons-learned-from-running-graphql-at-scale-2ad60b3cefeb
Before we proceed to discuss HTTP API design patterns, we feel obliged to clarify one more important terminological issue. Often, an API matching the description we gave in the “On the HTTP API Concept” chapter is called a “REST API” or a “RESTful API.” In this Section, we don't use any of these terms as it makes no practical sense.
What is “REST”? As we mentioned earlier, in 2000, Roy Fielding, one of the authors of the HTTP and URI specifications, published his doctoral dissertation titled “Architectural Styles and the Design of Network-based Software Architectures,” the fifth chapter of which was named “Representational State Transfer (REST).”1
As anyone can attest by reading this chapter, it features a very abstract overview of a distributed client-server architecture that is not bound to either HTTP or URL. Furthermore, it does not discuss any API design recommendations. In this chapter, Fielding methodically enumerates the restrictions that any software engineer encounters when developing distributed client-server software. Here they are:
@@ -4162,8 +4226,8 @@ X-Idempotency-Token: <token>
Do we want to say that REST is a meaningless concept? Definitely not. We only aimed to explain that it allows for quite a broad range of interpretations, which is simultaneously its main power and its main weakness.
On one hand, thanks to the multitude of interpretations, the API developers have built a perhaps vague but useful view of “proper” HTTP API architecture. On the other hand, the lack of concrete definitions has made REST API one of the most “holywar”-inspiring topics, and these holywars are usually quite meaningless as the popular REST concept has nothing to do with the REST described in Fielding's dissertation (and even more so, with the REST described in Fielding's manifesto of 2008).
The terms “REST architectural style” and its derivative “REST API” will not be used in the following chapters since, as we explained above, using them makes no practical sense. We referred to the constraints described by Fielding many times in the previous chapters because, let us emphasize it once more, it is impossible to develop distributed client-server APIs without taking them into account. However, HTTP APIs (meaning JSON-over-HTTP endpoints utilizing the semantics described in the HTTP and URL standards) as we will describe them in the following chapter align well with the “average” understanding of “REST / RESTful API” as per numerous tutorials on the Web.
1 Fielding, R. T. (2000), Chapter 5. Representational State Transfer (REST)
www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
2 Von Neumann Architecture
en.wikipedia.org/wiki/Von_Neumann_architecture
3 Fielding, R. T. REST APIs must be hypertext-driven
roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
4 HATEOAS
en.wikipedia.org/wiki/HATEOAS
5 Gupta, L. What is REST
restfulapi.net
The important exercise we must conduct is to describe the format of an HTTP request and response and explain the basic concepts. Many of these may seem obvious to the reader. However, the situation is that even the basic knowledge we require to move further is scattered across vast and fragmented documentation, causing even experienced developers to struggle with some nuances. Below, we will try to compile a structured overview that is sufficient to design HTTP APIs.
-To describe the semantics and formats, we will refer to the brand-new RFC 91101, which replaces no fewer than nine previous specifications dealing with different aspects of the technology. However, a significant volume of additional functionality is still covered by separate standards. In particular, the HTTP caching principles are described in the standalone RFC 9111·2, while the popular PATCH
method is omitted in the main RFC and is regulated by RFC 57893.
The important exercise we must conduct is to describe the format of an HTTP request and response and explain the basic concepts. Many of these may seem obvious to the reader. However, the situation is that even the basic knowledge we require to move further is scattered across vast and fragmented documentation, causing even experienced developers to struggle with some nuances. Below, we will try to compile a structured overview that is sufficient to design HTTP APIs. For a deeper understanding, a curious reader might refer to the comprehensive book by David Gourley and Brian Totty.1
+To describe the semantics and formats, we will refer to the brand-new RFC 91102, which replaces no fewer than nine previous specifications dealing with different aspects of the technology. However, a significant volume of additional functionality is still covered by separate standards. In particular, the HTTP caching principles are described in the standalone RFC 91113, while the popular PATCH method is omitted in the main RFC and is regulated by RFC 57894.
An HTTP request consists of (1) applying a specific verb to a URL, stating (2) the protocol version, (3) additional meta-information in headers, and (4) optionally, some content (request body):
POST /v1/orders HTTP/1.1
Host: our-api-host.tld
@@ -4187,10 +4251,10 @@ Content-Type: application/json
"id": 123
}
-NB: In HTTP/2 (and future HTTP/3), separate binary frames are used for headers and data instead of the holistic text format.4 However, this doesn't affect the architectural concepts we will describe below. To avoid ambiguity, we will provide examples in the HTTP/1.1 format.
+NB: In HTTP/2 (and future HTTP/3), separate binary frames are used for headers and data instead of the holistic text format.5 However, this doesn't affect the architectural concepts we will describe below. To avoid ambiguity, we will provide examples in the HTTP/1.1 format.
A Uniform Resource Locator (URL) is an addressing unit in an HTTP API. Some evangelists of the technology even use the term “URL space” as a synonym for “The World Wide Web.” It is expected that a proper HTTP API should employ an addressing system that is as granular as the subject area itself; in other words, each entity that the API can manipulate should have its own URL.
-The URL format is governed by a separate standard5 developed by an independent body known as the Web Hypertext Application Technology Working Group (WHATWG). The concepts of URLs and Uniform Resource Names (URNs) together constitute a more general entity called Uniform Resource Identifiers (URIs). (The difference between the two is that a URL allows for locating a resource within the framework of some protocol whereas a URN is an “internal” entity name that does not provide information on how to find the resource.)
+The URL format is governed by a separate standard6 developed by an independent body known as the Web Hypertext Application Technology Working Group (WHATWG). The concepts of URLs and Uniform Resource Names (URNs) together constitute a more general entity called Uniform Resource Identifiers (URIs). (The difference between the two is that a URL allows for locating a resource within the framework of some protocol whereas a URN is an “internal” entity name that does not provide information on how to find the resource.)
URLs can be decomposed into sub-components, each of which is optional. While the standard enumerates a number of legacy practices, such as passing logins and passwords in URLs or using non-UTF encodings, we will skip discussing those. Instead, we will focus on the following components that are relevant to the topic of HTTP API design:
the scheme (https:), the host, the path, the query, and the fragment.
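For instance, a typical URL from this book's examples decomposes as follows (the query parameter is hypothetical):
https://our-api-host.tld/v1/orders?limit=10
- scheme: https:
- host: our-api-host.tld
- path: /v1/orders
- query: ?limit=10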
Headers contain metadata associated with a request or a response. They might describe properties of entities being passed (e.g., Content-Length), provide additional information regarding a client or a server (e.g., User-Agent, Date, etc.), or simply contain additional fields that are not directly related to the request/response semantics (such as Authorization).
The important feature of headers is the possibility to read them before the message body is fully transmitted. This allows for altering request or response handling depending on the headers, and it is perfectly fine to manipulate headers while proxying requests; many network agents actually do this, i.e., add, remove, or modify headers on the fly. In particular, modern web browsers automatically add a number of technical headers, such as User-Agent, Origin, Accept-Language, Connection, Referer, Sec-Fetch-*, etc., and modern server software automatically adds or modifies such headers as X-Powered-By, Date, Content-Length, Content-Encoding, X-Forwarded-For, etc.
This freedom in manipulating headers can result in unexpected problems if an API uses them to transmit data, as the field names developed by an API vendor can accidentally overlap with existing conventional headers, or worse, such a collision might occur in the future at any moment. To avoid this issue, the practice of adding the prefix X- to custom header names was frequently used in the past. More than ten years ago this practice was officially discouraged (see the detailed overview in RFC 66486). Nevertheless, the prefix has not been fully abolished, and many semi-standard headers still contain it (notably, X-Forwarded-For). Therefore, using the X- prefix reduces the probability of collision but does not eliminate it. The same RFC reasonably suggests using the API vendor name as a prefix instead of X-. (We would rather recommend using both, i.e., sticking to the X-ApiName-FieldName format. Here X- is included for readability [to distinguish standard fields from custom ones], and the company or API name part helps avoid collisions with other non-standard header names.)
This freedom in manipulating headers can result in unexpected problems if an API uses them to transmit data, as the field names developed by an API vendor can accidentally overlap with existing conventional headers, or worse, such a collision might occur in the future at any moment. To avoid this issue, the practice of adding the prefix X- to custom header names was frequently used in the past. More than ten years ago this practice was officially discouraged (see the detailed overview in RFC 66487). Nevertheless, the prefix has not been fully abolished, and many semi-standard headers still contain it (notably, X-Forwarded-For). Therefore, using the X- prefix reduces the probability of collision but does not eliminate it. The same RFC reasonably suggests using the API vendor name as a prefix instead of X-. (We would rather recommend using both, i.e., sticking to the X-ApiName-FieldName format. Here X- is included for readability [to distinguish standard fields from custom ones], and the company or API name part helps avoid collisions with other non-standard header names.)
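For example, a custom header following this convention might look like this (the vendor name is, of course, hypothetical):
X-OurCoffeeAPI-Trace-Id: <id>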
Additionally, headers are used as control flow instructions for so-called “content negotiation,” which allows the client and server to agree on a response format (through Accept* headers) and to perform conditional requests that aim to reduce traffic by skipping response bodies, either fully or partially (through If-* headers, such as If-Range, If-Modified-Since, etc.).
One important component of an HTTP request is a method (verb) that describes the operation being applied to a resource. RFC 9110 standardizes eight verbs — namely, GET, POST, PUT, DELETE, HEAD, CONNECT, OPTIONS, and TRACE — of which we as API developers are interested in the first four. The CONNECT, OPTIONS, and TRACE methods are technical and rarely used in HTTP APIs (except for OPTIONS, which needs to be implemented to ensure access to the API from a web browser). Theoretically, the HEAD verb, which allows for requesting resource metadata only, might be quite useful in API design. However, for reasons unknown to us, it did not take root in this capacity.
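For example, a HEAD request could allow checking a resource's size or existence without transferring the body; the specific endpoint below is a hypothetical extension of this book's examples:
HEAD /v1/orders/123 HTTP/1.1
→
200 OK
Content-Type: application/json
Content-Length: 1234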
Apart from RFC 9110, many other specifications propose additional HTTP verbs, such as COPY, LOCK, SEARCH, etc. — the full list can be found in the registry7. However, only one of them gained widespread popularity — the PATCH method. The reasons for this state of affairs are quite trivial: the five methods (GET, POST, PUT, DELETE, and PATCH) are enough for almost any API.
Apart from RFC 9110, many other specifications propose additional HTTP verbs, such as COPY, LOCK, SEARCH, etc. — the full list can be found in the registry8. However, only one of them gained widespread popularity — the PATCH method. The reasons for this state of affairs are quite trivial: the five methods (GET, POST, PUT, DELETE, and PATCH) are enough for almost any API.
HTTP verbs define two important characteristics of an HTTP call:
Allowed symbols and escaping rules differ as well: for instance, the /, ?, and # symbols have special meaning in a path and must be escaped to be used literally.
Theoretically, it is possible to use kebab-case everywhere. However, most programming languages do not allow variable names and object fields in kebab-case, so working with such an API would be quite inconvenient.
To wrap this up, the situation with casing is so spoiled and convoluted that there is no consistent solution to employ. In this book, we follow this rule: tokens are cased according to the common practice for the corresponding request component. If a token's position changes, the casing is changed as well. (However, we're far from recommending following this approach unconditionally. Our recommendation is rather to try to avoid increasing the entropy by choosing a solution that minimizes the probability of misunderstanding.)
NB: Strictly speaking, JSON stands for “JavaScript Object Notation,” and in JavaScript, the default casing is camelCase. However, we dare to say that JSON ceased to be a format bound to JavaScript long ago and is now a universal format for organizing communication between agents written in different programming languages. Employing snake_case allows for easily moving a parameter from a query to a body, which is the most frequent case. Although the inverse solution (i.e., using camelCase in query parameter names) is also possible.
1 RFC 9110. HTTP Semantics
www.rfc-editor.org/rfc/rfc9110.html
2 RFC 9111. HTTP Caching
www.rfc-editor.org/rfc/rfc9111.html
3 PATCH Method for HTTP
www.rfc-editor.org/rfc/rfc5789.html
4 Grigorik, I. (2013), Chapter 12. HTTP/2
hpbn.co/http2
5 URL Living Standard
url.spec.whatwg.org
6 Deprecating the "X-" Prefix and Similar Constructs in Application Protocols
www.rfc-editor.org/rfc/rfc6648
7 Hypertext Transfer Protocol (HTTP) Method Registry
www.iana.org/assignments/http-methods/http-methods.xhtml
8 Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
www.rfc-editor.org/rfc/rfc3492.txt
NB: Strictly speaking, JSON stands for “JavaScript Object Notation,” and in JavaScript, the default casing is camelCase. However, we dare to say that JSON ceased to be a format bound to JavaScript long ago and is now a universal format for organizing communication between agents written in different programming languages. Employing snake_case allows for easily moving a parameter from a query to a body, which is the most frequent case. Although the inverse solution (i.e., using camelCase in query parameter names) is also possible.
1 Gourley D., Totty, B. (2002), HTTP: The Definitive Guide
2 RFC 9110. HTTP Semantics
www.rfc-editor.org/rfc/rfc9110.html
3 RFC 9111. HTTP Caching
www.rfc-editor.org/rfc/rfc9111.html
4 PATCH Method for HTTP
www.rfc-editor.org/rfc/rfc5789.html
5 Grigorik, I. (2013), Chapter 12. HTTP/2
hpbn.co/http2
6 URL Living Standard
url.spec.whatwg.org
7 Deprecating the "X-" Prefix and Similar Constructs in Application Protocols
www.rfc-editor.org/rfc/rfc6648
8 Hypertext Transfer Protocol (HTTP) Method Registry
www.iana.org/assignments/http-methods/http-methods.xhtml
9 Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
www.rfc-editor.org/rfc/rfc3492.txt
Now let's discuss the specifics: what does it mean exactly to “follow the protocol's semantics” and “develop applications in accordance with the REST architectural style”? Remember, we are talking about the following principles:
This approach might be further enhanced by introducing granular permissions to carry out specific actions, access levels, additional ACL service calls, etc.
-Importantly, the visible redundancy of the format ceases to exist: user_id
in the request is now not duplicated in the token payload as these identifiers carry different semantics: on which resource the operation is performed against who performs it. The two often coincide, but this coincidence is just a special case. Unfortunately, this doesn't negate the fact that it's quite easy simply to forget to implement this unobvious check in the code. This is the way.
1 JSON Web Token (JWT)
www.rfc-editor.org/rfc/rfc7519
Importantly, the visible redundancy of the format ceases to exist: user_id in the request is now not duplicated in the token payload as these identifiers carry different semantics: on which resource the operation is performed versus who performs it. The two often coincide, but this coincidence is just a special case. Unfortunately, this doesn't negate the fact that it's quite easy to simply forget to implement this unobvious check in the code. This is the way.
NB. A curious reader may note an important problem with this setup: the list of authorized entities (user_ids in our case) is encoded in the token itself when it is issued. If permissions change (let's say, if a specific ID is removed from the list), it will not affect existing tokens. This contributes to the general problem of invalidating stateless tokens; the usual approach to tackling it is (a) making tokens themselves short-lived so they are refreshed often, and (b) maintaining a cache of issued or revoked tokens.2 Though implementing these techniques might be challenging, it is, in any case, a more scalable solution than checking permissions on every call.
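Schematically, the payload of such a short-lived stateless token might look like this; all claim names except the standard iat / exp are our assumptions:
{
  // the list of authorized entities is fixed
  // at the moment the token is issued
  "user_ids": [ <authorized user ids> ],
  // issuing and expiration timestamps: a short
  // lifetime limits the damage of stale permissions
  "iat": 1700000000,
  "exp": 1700000600
}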
1 JSON Web Token (JWT)
www.rfc-editor.org/rfc/rfc7519
2 Madden, N. (2020), 6.5 Handling token revocation
As we noted on several occasions in the previous chapters, neither the HTTP and URL standards nor REST architectural principles prescribe concrete semantics for the meaningful parts of a URL (notably, path fragments and key-value pairs in the query). The rules for organizing URLs in an HTTP API exist only to improve the API's readability and consistency from the developers' perspective. However, this doesn't mean they are unimportant. Quite the opposite: URLs in HTTP APIs are a means of describing abstraction levels and entities' responsibility areas. A well-designed API hierarchy should be reflected in a well-designed URL nomenclature.
NB: The lack of specific guidance from the specification editors naturally led to developers inventing it themselves. Many of these spontaneous practices can be found on the Internet, such as the requirement to use only nouns in URLs. They are often claimed to be a part of the standards or REST architectural principles (which they are not). Nevertheless, deliberately ignoring such self-proclaimed “best practices” is a rather risky decision for an API vendor as it increases the chances of being misunderstood.
Traditionally, the following semantics are considered to be the default:
@@ -5006,10 +5071,10 @@ Retry-After: 5
-and include protection against these attack vectors at the webserver software level. The OWASP community provides a good cheatsheet on the best HTTP API security practices.6
+and include protection against these attack vectors at the webserver software level. The OWASP community provides a good cheatsheet on the best HTTP API security practices,6 or one may refer to Andrew Hoffman's7 and Neil Madden's8 books we've already recommended earlier.
-In conclusion, we would like to make the following statement: building an HTTP API is relying on the common knowledge of HTTP call semantics and drawing benefits from it by leveraging various software built upon this paradigm, from client frameworks to server gateways, and developers reading and understanding API specifications. In this sense, the HTTP ecosystem provides probably the most comprehensive vocabulary, both in terms of profoundness and adoption, compared to other technologies, allowing for describing many different situations that may arise in client-server communication. While the technology is not perfect and has its flaws, for a public API vendor, it is the default choice, and opting for other technologies rather needs to be substantiated as of today.
1 Fetch Living Standard. CORS protocol
fetch.spec.whatwg.org/#http-cors-protocol
2 Cross Site Request Forgery (CSRF)
owasp.org/www-community/attacks/csrf
3 Server Side Request Forgery
owasp.org/www-community/attacks/Server_Side_Request_Forgery
4 HTTP Response Splitting
owasp.org/www-community/attacks/HTTP_Response_Splitting
5 Unvalidated Redirects and Forwards Cheat Sheet
cheatsheetseries.owasp.org/cheatsheets/Unvalidated_Redirects_and_Forwards_Cheat_Sheet.html
6 REST Security Cheat Sheet
cheatsheetseries.owasp.org/cheatsheets/REST_Security_Cheat_Sheet.html
In conclusion, we would like to make the following statement: building an HTTP API is relying on the common knowledge of HTTP call semantics and drawing benefits from it by leveraging various software built upon this paradigm, from client frameworks to server gateways, and developers reading and understanding API specifications. In this sense, the HTTP ecosystem provides probably the most comprehensive vocabulary, both in terms of profoundness and adoption, compared to other technologies, allowing for describing many different situations that may arise in client-server communication. While the technology is not perfect and has its flaws, for a public API vendor, it is the default choice, and opting for other technologies rather needs to be substantiated as of today.
1 Fetch Living Standard. CORS protocol
fetch.spec.whatwg.org/#http-cors-protocol
2 Cross Site Request Forgery (CSRF)
owasp.org/www-community/attacks/csrf
3 Server Side Request Forgery
owasp.org/www-community/attacks/Server_Side_Request_Forgery
4 HTTP Response Splitting
owasp.org/www-community/attacks/HTTP_Response_Splitting
5 Unvalidated Redirects and Forwards Cheat Sheet
cheatsheetseries.owasp.org/cheatsheets/Unvalidated_Redirects_and_Forwards_Cheat_Sheet.html
6 REST Security Cheat Sheet
cheatsheetseries.owasp.org/cheatsheets/REST_Security_Cheat_Sheet.html
7 Hoffman, A. (2024) Web Application Security
8 Madden, N. (2020) API Security in Action
As we mentioned in the Introduction, the term “SDK” (which stands for “Software Development Kit”) lacks a concrete meaning. The common understanding is that an SDK differs from an API in that it provides both program interfaces and the tools to work with them. This definition is hardly satisfactory as today virtually any technology comes with a bundled toolset.
However, there is a very specific narrow definition of an SDK: it is a client library that provides a high-level interface (usually a native one) to some underlying platform (such as a client-server API). Most often, we talk about libraries for mobile OSes or Web browsers that work on top of a general-purpose HTTP API.
Among such client SDKs, one case is of particular interest to us: libraries that not only provide programmatic interfaces to work with an API but also offer ready-to-use visual components for developers. Classic examples of such SDKs are the UI libraries provided by cartographical services. Since developing a map engine, especially a vector one, is a very complex task, map API vendors provide both “wrappers” for their HTTP APIs (such as a search function) and visual components for working with geographical entities. The latter often include general-purpose elements (buttons, placemarks, context menus, etc.) that can be used independently from the main functionality of the API.
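A sketch of what such an SDK surface might look like (all names here are hypothetical; we assume a module context where top-level await is available):

// A hypothetical map SDK: a “wrapper” over the vendor's
// HTTP API plus ready-to-use visual components
const map = new mapSDK.Map(document.getElementById('map'), {
  center: [59.91, 10.75],
  zoom: 10
});
// The “wrapper” part: a native call hiding the HTTP search endpoint
const places = await mapSDK.search('coffee shop');
// The visual part: components for geographical entities…
map.add(new mapSDK.Placemark(places[0].coordinates));
// …and general-purpose elements usable on their own
map.add(new mapSDK.Button({
  text: 'Zoom in',
  onClick: () => map.zoomIn()
}));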
To avoid being wordy, we will use the term “SDK” for the former and “UI libraries” for the latter.
NB: Strictly speaking, a UI library might either include a client-server API “wrapper” or not (i.e., just provide a “pure” API to some underlying system engine). In this Section, we will mostly talk about the first option as it is the most general case and the most challenging one in terms of API design. Most SDK development patterns we will discuss are also applicable to “pure” libraries.
As UI is a high-level abstraction built upon OS primitives, there are specialized visual component frameworks available for almost every platform. Regretfully, choosing such a framework might be challenging. For instance, in the case of the Web platform, which is both low-level and highly popular, the number of competing technologies for SDK development is beyond imagination. We could mention the most popular ones today, including React1, Angular2, Svelte3, and Vue.js4, as well as those that maintain a strong presence like Bootstrap5 and Ember.6 Among these technologies, React demonstrates the most widespread adoption, though still measured in single-digit percentages.7 At the same time, components written in “pure” JavaScript/CSS often receive criticism for being less convenient to use in these frameworks as each of them implements a rigid methodology. The situation with developing visual libraries for Windows is quite similar. The question of “which framework to choose for developing UI components for these platforms” unfortunately has no simple answer: one will need to evaluate the markets and make decisions regarding each individual framework.
In the case of mobile platforms proper (and macOS), the current state of affairs is more favorable as these platforms are more homogeneous. However, a different problem arises: modern applications typically need to support several such platforms simultaneously, which leads to the duplication of code (and API nomenclatures).
One potential solution could be using cross-platform mobile (React Native8, Flutter9, Xamarin10, etc.) and desktop (JavaFX11, Qt12, etc.) frameworks, or specialized technologies for specific tasks (such as Unity13 for game development). The inherent advantages of these technologies are faster development and universality (of both the code and the software engineers). The disadvantages are obvious as well: achieving maximum performance could be challenging, and many platform tools (such as debugging and profiling) will not work. As of today, we observe rough parity between the two approaches (several independent applications for each platform vs. one cross-platform application).
2 Angular
angular.io
3 Svelte
svelte.dev
5 Bootstrap
getbootstrap.com
6 Ember
emberjs.com
7 How Many Websites Use React in 2023? (Usage Statistics)
increditools.com/react-usage-statistics
8 React Native
reactnative.dev
9 Flutter
flutter.dev
11 JavaFX
openjfx.io
The first question we need to clarify about SDKs (let us reiterate that we use this term to denote a native client library that allows for working with a technology-agnostic underlying client-server API) is why SDKs exist in the first place. In other words, why is using “wrappers” more convenient for frontend developers than working with the underlying API directly?
Several reasons are obvious:
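For instance, a wrapper hides transport-level concerns (authentication, serialization, retries) behind a native call. Compare the two options in the following sketch (method and parameter names are hypothetical):

// Working with the underlying HTTP API directly:
const token = await authProvider.getToken();
const response = await fetch(
  '/v1/offers/search?query=lungo' +
    '&latitude=59.91&longitude=10.75',
  { headers: { Authorization: `Bearer ${token}` } }
);
const { offers } = await response.json();

// Working through the SDK wrapper, which handles
// authentication and (de)serialization internally:
const foundOffers = await api.searchOffers({
  query: 'lungo',
  location: { latitude: 59.91, longitude: 10.75 }
});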
Finally, SearchBox doesn't interact with either of them and only provides a context, methods to change it, and the corresponding notifications.
By making these reductions, in fact, we end up with a setup that follows the “Model-View-Controller” (MVC) methodology. This is one of the very first patterns for designing user interfaces, proposed as early as 1979 by Trygve Reenskaug.1,2 OfferList and OfferPanel (along with the code that displays the input field) constitute a view that the user observes and interacts with. Composer is a controller that listens to the view's events and modifies a model (SearchBox itself).
NB: To follow the letter of the paradigm, we must separate the model, which will be responsible only for the data, from SearchBox itself. We leave this exercise to the reader.
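A minimal sketch of this MVC wiring (method and field names are hypothetical, and the model is separated from SearchBox as the NB above suggests):

// Model: stores semantic data only and notifies subscribers
class Model {
  constructor() { this.data = { offers: [] }; this.listeners = []; }
  subscribe(listener) { this.listeners.push(listener); }
  update(patch) {
    Object.assign(this.data, patch);
    this.listeners.forEach((l) => l(this.data));
  }
}
// View (e.g., OfferList): renders the model state
// and reports user events to the controller
class OfferListView {
  constructor(controller) { this.controller = controller; }
  render(data) { /* redraw the list of offers */ }
  onOfferClick(offerId) { this.controller.selectOffer(offerId); }
}
// Controller (the Composer): translates view events into model changes
class Composer {
  constructor(model) { this.model = model; }
  selectOffer(offerId) { this.model.update({ selectedOffer: offerId }); }
}
const model = new Model();
const composer = new Composer(model);
const view = new OfferListView(composer);
model.subscribe((data) => view.render(data));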
If we choose other options for reducing the interaction directions, we will get other MV* frameworks (such as Model-View-ViewModel, Model-View-Presenter, etc.). All of them are ultimately based on the “Model” pattern.
This rigidity, however, bears disadvantages as well. If we try to fully define the component's state, we must include such technicalities as, let's say, all animations being executed (and even the current percentages of execution). Therefore, a model will include all data of all abstraction levels for both hierarchies (semantic and visual) and also the calculated option values. In our example, this means that the model will store, for example, the currentSelectedOffer field for OfferPanel to use, the list of buttons in the panel, and even the calculated icon URLs for those buttons.
Such a full model poses a problem not only semantically and theoretically (as it mixes up heterogeneous data in one entity) but also very practically. Serialized representations of such models will be bound to a specific API or application version (as they store all the technical fields, including those not exposed publicly in the API). Changing a subcomponent's implementation will result in breaking backward compatibility as old links and cached states will become unrestorable (or a compatibility layer will have to be maintained to interpret serialized models from past versions).
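The issue could be illustrated with a sketch of such a serialized “full” model (field names are hypothetical):

// Semantic, visual, and computed data mixed in one entity:
const serializedState = {
  // High-level (semantic) state
  currentSelectedOffer: 'offer-123',
  // Visual hierarchy internals
  offerPanelButtons: ['orderButton', 'closeButton'],
  // Computed values bound to a specific implementation version;
  // restoring them in a newer version may be impossible
  buttonIconUrls: ['/icons/v2/order.svg', '/icons/v2/close.svg'],
  runningAnimations: [{ name: 'panel-slide-in', progress: 0.42 }]
};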
Another ideological problem is organizing nested controllers. If there are subordinate subcomponents in the system, all the problems that the MV* approach solved return at a higher level: we have to allow nested controllers either to modify the global model or to call their parent controllers. Both solutions imply strong coupling and require exquisite interface design skills; otherwise, reusing components will be very hard.
If we take a closer look at modern UI libraries that claim to employ MV* paradigms, we will learn they employ them quite loosely. Usually, only the main principle that a model defines the UI and can only be modified through controllers is adopted. Nested components usually have their own models (in most cases, comprising a subset of the parent model enriched with the component's own state), and the global model contains only a limited number of fields. This approach is implemented in many modern UI frameworks, including those that claim they have nothing to do with MV* paradigms (React, for instance3,4).
All these problems of the MVC paradigm were highlighted by Martin Fowler in his “GUI Architectures” essay.5 The solution he proposed is the “Model-View-Presenter” framework, in which the controller entity is replaced with a presenter. The responsibility of the presenter is not only translating events but also preparing data for views. This allows for the full separation of abstraction levels: a model now stores only semantic data while a presenter transforms it into the low-level parameters that define the UI look (the set of these parameters is called the “Application Model” or “Presentation Model” in Fowler's text).
Fowler's paradigm closely resembles the Composer concept we discussed in the previous chapter, with one notable deviation: in MVP, a presenter is stateless (with the possible exception of caches and closures) and only deduces the data needed by views from the model data. If some low-level property needs to be manipulated, such as text color, the model has to be extended in a manner that allows the presenter to calculate the text color based on some high-level model data field. This requirement significantly narrows the capability to replace subcomponents with alternate implementations.
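A sketch of a stateless presenter operating under this constraint (field names are hypothetical):

// The presenter deduces the low-level view parameters
// (the “Presentation Model”) from semantic model data only:
function present(model) {
  return {
    title: model.offer.placeName,
    priceText: `${model.offer.price.value} ${model.offer.price.currency}`,
    // The text color cannot be set directly; the model must
    // carry a high-level field it can be derived from:
    priceColor: model.offer.isDiscounted ? 'red' : 'black'
  };
}
// Usage: view.render(present(model));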
NB: Let us clarify that the author of this book is not proposing Composer as an alternative MV* methodology. The message of the previous chapter is that complex scenarios of decomposing UI components can only be solved by artificially introducing “bridges” of additional abstraction layers. What this bridge is called and which rules it brings are not as important.
1 MVC
en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
2 Reenskaug, T. (1979) MVC
folk.universitetetioslo.no/trygver/themes/mvc/mvc-index.html
3 Why did we build React?
legacy.reactjs.org/blog/2013/06/05/why-react.html
4 Mattiazzi, R. How React and Redux brought back MVC and everyone loved it
rangle.io/blog/how-react-and-redux-brought-back-mvc-and-everyone-loved-it
5 Fowler, M. (2006) GUI Architectures
Another method of reducing the complexity of building “bridges” that connect different subject areas in one component is to eliminate one of these areas. For instance, business logic could be removed: components might be entirely abstract, with the translation of UI events into useful actions hidden beyond the developer's control.
In this paradigm, the offer search code would look like this:
class SearchBox {
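  // (A hypothetical sketch of how this sample might continue;
  // all names below are illustrative.)
  constructor(container, options) {
    // The component is configured, not programmed: the
    // translation of UI events into useful actions happens
    // inside, beyond the developer's control.
    this.container = container;
    this.options = options;
  }
  render() {
    // Draws the input field, the offer list, and the offer
    // panel; clicking an offer triggers the hidden business
    // logic (e.g., placing an order).
  }
}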
Moving Forward
Finally, apart from those specific issues, your customers are bound to care about more general questions: can they trust you? Can they rely on your API evolving and absorbing modern trends, or will they eventually find their integration with your API in the scrapyard of history? Let's be honest: given all the uncertainties of the API product vision, we are very much interested in the answers as well. Even the Roman aqueduct, though remaining backward-compatible for two thousand years, has long been an archaic and unreliable way of solving customers' problems.
You might address these customer expectations by publishing roadmaps. It's quite common for companies to avoid publicly announcing their concrete plans (for a reason, of course). Nevertheless, in the case of APIs, we strongly recommend providing roadmaps, even if they are tentative and lack precise dates, especially when they concern the deprecation of functionality. Announcing such commitments (given the company keeps them, of course) is a very important competitive advantage for every kind of consumer.
With this, we would like to conclude this book. We hope that the principles and concepts we have outlined will help you create APIs that fit the needs of developers, businesses, and end users alike, and expand them (while maintaining backward compatibility) for the next two thousand years or so.
Birrell, A. D., Nelson, B. J. (1984) Implementing Remote Procedure Calls. ACM Transactions on Computer Systems (TOCS), Volume 2, Issue 1, pp. 39-59
dl.acm.org/doi/10.1145/2080.357392
Fielding, R. T. (2000) Architectural Styles and the Design of Network-based Software Architectures
ics.uci.edu/~fielding/pubs/dissertation/top.htm
Fowler, M. (2006) GUI Architectures
www.martinfowler.com/eaaDev/uiArchs.html
Gamma, E., Helm, R., Johnson, R., Vlissides, J. (1994) Design Patterns. Elements of Reusable Object-Oriented Software
ISBN 9780321700698
Grigorik, I. (2013) High Performance Browser Networking
ISBN 9781449344764
hpbn.co
Hoffman, A. (2024) Web Application Security. Second Edition
ISBN 9781098143930
Madden, N. (2020) API Security in Action
ISBN 9781617296024
Martin, R. C. (2023) Functional Design: Principles, Patterns, and Practices
ISBN 9780138176518
Nelson, B. J. (1981) Remote Procedure Call
dl.acm.org/doi/10.5555/910306