
Introduction

Chapter 1. On the Structure of This Book

The book you're holding in your hands is dedicated to developing APIs as a separate engineering task. Although many concepts we're going to discuss apply to any type of software, our primary goal is to describe those problems and approaches to solving them that are most relevant in the context of the API subject area.


The book comprises the Introduction and six large sections. The first three (namely, “The API Design”, “The API Patterns”, and “The Backward Compatibility”) are fully abstract and not bound to any concrete technology. We hope they will help those readers who seek to build a systematic understanding of the API architecture in developing complex interface hierarchies. The proposed approach, as we see it, allows for designing APIs from start to finish, from a raw idea to concrete implementation.

The fourth and fifth sections are dedicated to specific technologies, namely developing HTTP APIs (in the “REST paradigm”) and SDKs (we will mostly talk about UI component libraries).

Finally, in the sixth section, which is the least technical of all, we will discuss APIs as products and focus on non-engineering aspects of the API lifecycle: doing market research, positioning the service, communicating to consumers, setting KPIs for the team, etc. We insist that the last section is equally important to both PMs and software engineers as products for developers thrive only if the product and technical teams work jointly on them.


We expect the reader to possess expertise in software engineering, so we do not provide detailed definitions and explanations of the terms that, in our understanding, a developer should already be familiar with. Without this knowledge, reading the last section of the book (and even more so, the other sections) will be rather uncomfortable. We sincerely apologize for this, but it is the only way to write the book without tripling its size. We provide a list of recommended readings in the “Bibliography” section.

Let's start.

Chapter 2. The API Definition

Before we start talking about the API design, we need to explicitly define what the API is. Encyclopedias tell us that “API” is an acronym for “Application Program Interface.” This definition is fine but useless, much like the “Man” definition by Plato: “Man stands upright on two legs without feathers.” This definition is fine again, but it gives us no understanding of what's so important about a Man. (Actually, it's not even “fine”: Diogenes of Sinope once brought a plucked chicken, saying “That's Plato's Man.” And Plato had to add “with broad nails” to his definition.)

What does the API mean apart from the formal definition?


The difference between a Roman aqueduct and a good API is that in the case of APIs, the contract is presumed to be programmable. To connect the two areas, writing some code is needed. The goal of this book is to help you design APIs that serve their purposes as solidly as a Roman aqueduct does.

An aqueduct also illustrates another problem with the API design: your customers are engineers themselves. You are not supplying water to end-users. Suppliers are plugging their pipes into your engineering structure, building their own structures upon it. On the one hand, you may provide access to water to many more people through them, not spending your time plugging each individual house into your network. On the other hand, you can't control the quality of suppliers' solutions, and you are to blame every time there is a water problem caused by their incompetence.


The situation with API design becomes even more complicated when we acknowledge that modern APIs are typically interfaces to distributed systems. There is no single aqueduct but rather a collection of connections between multiple sources and destinations, often established on-demand — and your task is to make these connections work coherently so that clients don't even need to know how complex this water distribution architecture is internally.

That's why designing an API implies a larger area of responsibility. An API is a multiplier to both your opportunities and your mistakes.

Chapter 3. API Quality Criteria

Before we start laying out the recommendations for designing API architecture, we ought to specify what constitutes a “high-quality API,” and what the benefits of having a high-quality API are. Quite obviously, the quality of an API is primarily defined through its capability to solve developers' and users' problems. (Let's leave out the part where an API vendor pursues its own goals, not providing a useful product.)

So, how can a “high-quality” API design assist developers in solving their (and their users') problems? Quite simply: a well-designed API allows developers to do their jobs in the most efficient and convenient manner. The gap between formulating a task and writing working code must be as short as possible. Among other things, this means that:


Apart from HTTP API notation, we will employ C-style pseudocode, or, to be more precise, a JavaScript-like or Python-like one, since types are omitted. We assume such imperative structures are readable enough to skip detailed grammar explanations. HTTP API-like samples are intended to illustrate the contract, i.e., how we would design an API. Samples in pseudocode are intended to illustrate how developers might work with the API in their code, or how we would implement SDKs based on the contract.
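For illustration, a possible pair of samples in both notations (the endpoint and field names here are made up solely for this illustration):

// The contract, in HTTP API notation
POST /v1/orders
{ "coffee_machine_id": … }
→
{ "order_id": … }

// The same operation as a developer
// might call it from client code,
// in pseudocode
let order = await api
  .createOrder({ coffee_machine_id: … });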

Section I. The API Design

Chapter 9. The API Contexts Pyramid

The approach we use to design APIs comprises four steps: defining an application field, separating abstraction levels, isolating responsibility areas, and describing final interfaces.

This four-step algorithm actually builds an API from top to bottom, from common requirements and use case scenarios down to a refined nomenclature of entities. In fact, moving this way will eventually conclude with a ready-to-use API, and that's why we value this approach highly.

It might seem that the most useful pieces of advice are given in the last chapter, but that's not true. The cost of a mistake made at certain levels differs. Fixing the naming is simple; revising the wrong understanding of what the API stands for is practically impossible.


Here and throughout we will illustrate the API design concepts using a hypothetical example of an API that allows ordering a cup of coffee in city cafes. Just in case: this example is totally synthetic. If we were to design such an API in the real world, it would probably have very little in common with our fictional example.


NB. A knowledgeable reader might notice that the approach we discuss is quite similar to the concept of “Levels of Design” proposed by Steve McConnell in his definitive book.1 This is both true and not true at the same time. On one hand, as APIs are software, all the classical architecture design patterns work for them, including those described by McConnell. On the other hand, there is a major difference between exposing APIs and working on shared code: you only provide the contract to customers, as they are unable and/or unwilling to check the code itself. This shifts the focus significantly, starting from McConnell's very first design level: while splitting the grand design into subsystems is your number-one task when you develop a software project as an architect, it is often undesirable to expose your subsystem split in the API, as API consumers do not need to know about it. In the following chapters, we will focus on providing a well-designed nomenclature of entities that is both convenient for external developers and allows for implementing efficient architecture under the hood.

References

Chapter 10. Defining an Application Field

The key question you should ask yourself before starting to develop any software product, including an API, is: what problem do we solve? It should be asked four times, each time putting emphasis on a different word.

For example, the invalid price error is resolvable: a client could obtain a new price estimate and repeat the request.

    It is also worth mentioning that unresolvable errors are useless to a user at the time of the error occurrence (since the client couldn't react meaningfully to unknown errors). Still, providing extended error data is not excessive as a developer will read it while fixing the issue in their code.
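For illustration, a sketch of an error response that carries both a machine-readable reason and extended details for the developer (the status code and field values are illustrative; the localized_message and details.checks_failed fields follow the convention used later in this book):

POST /v1/orders
{ … }
→
409 Conflict
{
  // A machine-readable reason
  // the client might react to;
  // this error is resolvable by
  // obtaining a new price estimate
  "reason": "invalid_price",
  // A fallback message the app
  // may show to the end user
  "localized_message":
    "Something went wrong. Please try again.",
  // Extended data meant for the
  // developer debugging the issue
  "details": {
    "checks_failed": [{
      "message": "The order price has changed since the offer was made"
    }]
  }
}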

    Decomposing Interfaces. The “7±2” Rule

    From our own API development experience, we can tell without a doubt that the greatest final interface design mistake (and the greatest developer's pain accordingly) is the excessive overloading of entities' interfaces with fields, methods, events, parameters, and other attributes.


    Meanwhile, there is the “Golden Rule” of interface design (applicable not only to APIs but almost to anything): humans can comfortably keep 7±2 entities in short-term memory. Manipulating a larger number of chunks complicates things for most humans. The rule is also known as Miller's Law.1


    NB. The law shouldn't be taken literally, as its direct applicability to human cognition in general and software engineering in particular is quite controversial. Still, many influential works (such as the foundational research by Victor Basili, Lionel Briand, and Walcelio Melo2 and its numerous follow-ups by other authors) claim that an increased number of methods in classes and analogous metrics indicate poor code quality. While the exact numbers are debatable, we envision the “7±2” rule as good guidance.

The only possible method of overcoming this law is decomposition. Entities should be grouped under a single designation at every concept level of the API so that developers never have to operate on more than a reasonable number of entities (let's say, ten) at a time.

    Let's take a look at the coffee machine search function response in our API. To ensure an adequate UX of the app, quite bulky datasets are required:

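A condensed sketch of such a response (field names are illustrative, not the original example). First, the “flat” version with every attribute at the top level:

{
  "results": [{
    "coffee_machine_id": …,
    "coffee_machine_type": …,
    "coffee_machine_brand": …,
    "place_name": …,
    "place_location": …,
    "walking_distance": …,
    "walking_time": …,
    "offer_id": …,
    "price": …,
    "currency_code": …,
    "estimated_waiting_time": …
  }, …]
}

Grouping related attributes into nested entities, as this chapter suggests, the same response becomes:

{
  "results": [{
    "coffee_machine": { "id": …, "brand": …, "type": … },
    "place": { "name": …, "location": … },
    "route": { "distance": …, "duration": … },
    "offer": { "id": …, "estimated_waiting_time": … },
    "pricing": { "price": …, "currency_code": … }
  }, …]
}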
     

    Such a decomposed API is much easier to read than a long list of different attributes. Furthermore, it's probably better to group even more entities in advance. For example, a place and a route could be nested fields under a synthetic location property, or offer and pricing fields might be combined into some generalized object.

    It is important to say that readability is achieved not only by merely grouping the entities. Decomposing must be performed in such a manner that a developer, while reading the interface, instantly understands, “Here is the place description of no interest to me right now, no need to traverse deeper.” If the data fields needed to complete some action are scattered all over different composites, the readability doesn't improve and even degrades.


    Proper decomposition also helps with extending and evolving an API. We'll discuss the subject in Section III.

    References

    Chapter 13. Describing Final Interfaces

    When all entities, their responsibilities, and their relations to each other are defined, we proceed to the development of the API itself. We need to describe the objects, fields, methods, and functions nomenclature in detail. In this chapter, we provide practical advice on making APIs usable and understandable.

    One of the most important tasks for an API developer is to ensure that code written by other developers using the API is easily readable and maintainable. Remember that the law of large numbers always works against you: if a concept or call signature can be misunderstood, it will be misunderstood by an increasing number of partners as the API's popularity grows.

NB: The examples in this chapter are meant to illustrate the consistency and readability problems that arise during API development. We do not provide specific advice on designing REST APIs (such advice will be given in the corresponding section of this book) or programming languages' standard libraries. The focus is on the idea, not specific syntax.

    24. Don't Invent Security Practices

    If the author of this book were given a dollar each time he had to implement an additional security protocol invented by someone, he would be retired by now. API developers' inclination to create new signing procedures for requests or complex schemes of exchanging passwords for tokens is both obvious and meaningless.


    First, there is no need to reinvent the wheel when it comes to security-enhancing procedures for various operations. All the algorithms you need are already invented, just adopt and implement them. No self-invented algorithm for request signature checking can provide the same level of protection against a Manipulator-in-the-middle (MitM) attack3 as a mutual TLS authentication with certificate pinning.4


    Second, assuming oneself to be an expert in security is presumptuous and dangerous. New attack vectors emerge daily, and staying fully aware of all actual threats is a full-time job. If you do something different during workdays, the security system you design will contain vulnerabilities that you have never heard about — for example, your password-checking algorithm might be susceptible to a timing attack5 or your webserver might be vulnerable to a request splitting attack.6


    The OWASP Foundation compiles a list of the most common vulnerabilities in APIs every year,7 which we strongly recommend studying. We also recommend a definitive work by Andrew Hoffman8 for everyone interested in Web security.

    And just in case: all APIs must be provided over TLS 1.2 or higher (preferably 1.3).

    25. Help Partners With Security

    It is equally important to provide interfaces to partners that minimize potential security problems for them.


    Sometimes explicit location passing is not enough since there are lots of territorial conflicts in the world. How the API should behave when user coordinates lie within disputed regions is a legal matter, regretfully. The author of this book once had to implement a “state A territory according to state B official position” concept.

    Important: mark a difference between localization for end users and localization for developers. In the examples above, the localized_message field is meant for the user; the app should show it if no specific handler for this error exists in the client code. This message must be written in the user's language and formatted according to the user's location. But the details.checks_failed[].message is meant to be read by developers examining the problem. So it must be written and formatted in a manner that suits developers best — which usually means “in English,” as English is a de facto standard in software development.

    It is worth mentioning that the localized_ prefix in the examples is used to differentiate messages to users from messages to developers. A concept like that must be, of course, explicitly stated in your API docs.
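For illustration, a sketch of how these two kinds of messages might coexist in one error response (the values are made up):

{
  // Shown to the end user as a fallback,
  // written in the user's language
  // and formatted for their locale
  "localized_message":
    "Impossible de valider le moyen de paiement",
  "details": {
    "checks_failed": [{
      // Read by developers examining
      // the problem, therefore in English
      "message":
        "The payment method identifier is unknown"
    }]
  }
}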


    And one more thing: all strings must be UTF-8, no exclusions.

    References

    Chapter 14. Annex to Section I. Generic API Example

    Let's summarize the current state of our API study.

    1. Offer Search
    POST /v1/offers/search
     
    // Terminates the runtime
     POST /v1/runtimes/{id}/terminate
     

Section II. The API Patterns

Chapter 15. On Design Patterns in the API Context


The concept of “patterns” in the field of software engineering was introduced by Kent Beck and Ward Cunningham in 19871 and popularized by “The Gang of Four” (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides) in their book “Design Patterns: Elements of Reusable Object-Oriented Software,” which was published in 1994.2 According to the most widespread definition, a software design pattern is a “general, reusable solution to a commonly occurring problem within a given context.”

If we talk about APIs, especially those to which developers are end users (e.g., frameworks or operating system interfaces), the classical software design patterns are well applicable to them. Indeed, many examples in the previous Section of this book are just about applying some design patterns.


However, if we try to extend this approach to include API development in general (which, let us remind the reader, is typically about building interfaces to distributed systems), we will soon find that many typical API design issues are high-level and can't be reduced to basic software patterns. Let's say, caching resources (and invalidating the cache) or organizing paginated access are not covered in classical writings.

In this Section, we will specify those API design problems that we see as the most important ones. We are not aiming to encompass every problem, let alone every solution, and rather focus on describing approaches to solving typical problems with their pros and cons. We do understand that readers familiar with the works of “The Gang of Four,” Grady Booch, and Martin Fowler might expect a more systematic approach and greater depth of coverage from a section called “The API Patterns,” and we apologize to them in advance.

NB: The first such pattern we need to mention is the API-first approach to software engineering, which we described in the corresponding chapter.

The Fundamentals of Solving Typical API Design Problems

  • Operations that affect shared resources should have locking mechanisms in place for the duration of the operation.

    References

    Chapter 16. Authenticating Partners and Authorizing API Calls

    Before we proceed further to discussing technical matters, we feel obliged to provide an overview of the problems related to authorizing API calls and authenticating clients. Based on the main principle that “an API serves as a multiplier to both your opportunities and mistakes,” organizing authorization and authentication (AA) is one of the most important challenges that any API vendor faces, especially when it comes to public APIs. It is rather surprising that there is no standard approach to this issue, as every big vendor develops its own interface to solve AA problems, and these interfaces are often quite archaic.

If we set aside implementation details (for which we strongly recommend not reinventing the wheel and using standard techniques and security protocols), there are basically two approaches to authorizing an API call: the call is made either on behalf of an end user (and authorized with a user token) or on behalf of the partner's service itself (and authorized with an API key).

If the API is not about providing additional access to a service for end users, it is usually much easier to opt for the second approach and authorize clients with API keys. In this case, per-endpoint granularity can be achieved (i.e., allowing partners to regulate the set of permitted endpoints for a key), while developing more granular access is much more complex and is therefore rarely implemented.

    Both approaches can be morphed into each other (e.g., allowing robot users to perform operations on behalf of any other users effectively becomes API key-based authorization; allowing binding of a limited dataset to an API key effectively becomes a user account), and there are some hybrid systems in the wild (where the request must be signed with both an API key and a user token).
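Schematically, the two approaches might look like this (the header names are given purely for illustration; real-world APIs differ):

// Authorizing a call made on behalf
// of an end user with a user token
GET /v1/orders
Authorization: Bearer <user token>

// Authorizing a call made on behalf
// of the partner's service itself
// with an API key
GET /v1/orders
X-API-Key: <API key>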

    Chapter 17. Synchronization Strategies


Let's proceed to the technical problems that API developers face. We begin with the last one described in the introductory chapter: the distributed nature of modern software, which gives rise to the problem of synchronizing shared state. Let us imagine that a user creates a request to order coffee through our API. While this request travels from the client to the coffee house and back, many things might happen. Consider the following chain of events:

1. The client sends the order creation request.

2. Because of network issues, the request propagates to the server very slowly, and the client gets a timeout.

3. The client requests the current state of the system and gets an empty response as the initial request still hasn't reached the server:

   let pendingOrders = await 
     api.getOngoingOrders(); // → []

4. The server finally receives the initial request and creates the order.

5. The client, being unaware of this, tries to create an order anew.


    As the operations of reading the list of ongoing orders and of creating a new order happen at different moments of time, we can't guarantee that the system state hasn't changed in between. This might happen if the application backend state is replicated (i.e., the second request reads data from a different node of the data storage) or if the customer uses two client devices simultaneously. In other words, we encountered the classical problem of state synchronization in distributed systems. To solve this issue, we need to select a consistency model1 for our application and implement some synchronization strategy.


As clients are your customers, it is highly desirable to provide them with an API with the highest degree of robustness — strong consistency,2 which guarantees that all clients read the most recent writes. It is not universally possible, and we will discuss relaxing this constraint in the following chapters. However, with APIs the rule of thumb is: if you can provide strongly consistent interfaces, do it.

    There are two main approaches to solving this problem: the pessimistic one (implementing locks in the API) and the optimistic one (resource versioning).


    NB: Generally speaking, the best solution is not having the issue at all. Let's say, if your API is idempotent, the duplicating calls are not a problem. However, in the real world, not every operation is idempotent; for example, creating new orders is not. We might add mechanisms to prevent automatic retries (such as client-generated idempotency tokens) but we can't forbid users from just creating a second identical order.

    API Locks

    The first approach is to literally implement standard synchronization primitives at the API level. Like this, for example:

let lock;
try {
  // Capture the exclusive
  // right to manipulate orders
  lock = await api.
    acquireLock(ORDERS_ACCESS);
  // Get the list of current orders
  // known to the system
  let pendingOrders = await 
    api.getPendingOrders();
  // If our order is absent,
  // create it
  if (pendingOrders.length == 0) {
    let order = await api
      .createOrder(…);
  }
} finally {
  // Release the captured lock
  await lock.release();
}
     

    This solution is quite similar to using mutexes to avoid race conditions in multithreaded systems,3 just exposed via a formal API. Rather unsurprisingly, this approach sees very rare use in distributed client-server APIs because of the plethora of related problems:

• Waiting for acquiring a lock introduces new latencies to the interaction that are hardly predictable and might potentially be quite significant.

• The locks themselves [i.e., the storage for lock identifiers and its API] constitute a separate subsystem of their own and require additional effort from the API vendor to implement.

• As it's partners who develop the client code, we can't guarantee it always works with locks correctly. Inevitably, “lost” locks will occur in the system, and that means we need to provide some tools to partners so they can find the problem and debug it.

• A certain granularity of locks is to be developed so that partners can't affect each other. We are lucky if there are natural boundaries for a lock — for example, if it's limited to a specific user in the specific partner's system. If we are not so lucky (let's say all partners share the same user profile), we will have to develop even more complex systems to deal with potential errors in the partners' code — for example, introduce locking quotas.

    Optimistic Concurrency Control


    A less implementation-heavy approach is to develop an optimistic concurrency control4 system, i.e., to require clients to pass a flag proving they know the actual state of a shared resource.

// Retrieve the state
let orderState = 
  await api.getOrderState();
// The version is a part of
// the state of the resource
let version = 
  orderState.latestVersion;
try {
  // Pass the known version
  // while creating the order
  let task = await api
    .createOrder(version, …);
} catch (e) {
  // If the version is wrong, i.e.,
  // the resource state was changed
  // by another client, an error occurs
  if (Type(e) == INCORRECT_VERSION) {
    // …which should be handled
  }
}
     

    NB: An attentive reader might note that the necessity to implement locking has not disappeared: there must be a component in the system that performs a locking read of the resource version and its subsequent change. It's not entirely true as synchronization strategies and strongly consistent reading have disappeared from the public API. The distance between the client that sets the lock and the server that processes it became much smaller, and the entire interaction now happens in a controllable environment, being free from the problems we've described earlier.

Instead of a version, the date of the last modification of the resource might be used (which is much less reliable as clocks are not ideally synchronized across different system nodes; at least save it with the maximum possible precision!) or entity tags (ETags).
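In HTTP APIs, this maps naturally onto standard conditional request headers. A sketch, assuming the order resource exposes an ETag:

// The client passes the version
// it knows as an entity tag
PUT /v1/orders/{id}
If-Match: "<the ETag received earlier>"
{ … }
→
// If the resource has been changed
// by someone else in the meantime,
// the server rejects the modification
412 Precondition Failed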


    The advantage of optimistic concurrency control is therefore the possibility to hide under the hood the complexity of implementing locking mechanisms. The disadvantage is that the versioning errors are no longer exceptional situations — it's now a regular behavior of the system. Furthermore, client developers must implement working with them otherwise the application might render inoperable as users will be infinitely creating an order with the wrong version.


    NB: Which resource to select for making versioning is extremely important. If in our example we create a global system version that is incremented after any order comes, users' chances to successfully create an order will be close to zero.

    References

    Chapter 18. Eventual Consistency


The approach described in the previous chapter is in fact a trade-off: the API performance issues are traded for “normal” (i.e., expected) background errors that happen while working with the API. This is achieved by isolating the component responsible for controlling concurrency and exposing only revision tokens in the public API. Still, the achievable throughput of the API is limited as strong consistency implies strict constraints on the backend implementation.


In many situations, given that the rate of writes is much lower than the rate of reads (as in our case, when making two orders from two different devices under one account is rather an exceptional situation), it might make sense to stick to eventual consistency rather than strict consistency.1 The typical setup on the Web often involves asynchronously replicated databases:

// Reading the state,
// possibly from a replica
let orderState = 
  await api.getOrderState();
let version = 
  orderState.latestVersion;
try {
  // Creating an order still requires
  // the actual resource version
  let task = await api
    .createOrder(version, …);
} catch (e) {
  …
}
     

    As orders are created much more rarely than read, we might significantly increase the system performance if we drop the requirement of returning the most recent state of the resource from the state retrieval endpoints. The versioning will help us avoid possible problems: creating an order will still be impossible unless the client has the actual version. The client will be able to fulfill its request eventually when it finally gets the actual data.


    NB: Strictly speaking, in this example, we're referring to the “single-leader replication” type of eventual consistency: while reads might return outdated data, writes are nevertheless strictly ordered, and the service that physically makes writes can resolve the actual state of the system. There is also the “multi-leader replication” class of systems, where there is no such thing as “the actual state” or “the latest version,” as every leader replica handles writes independently and concurrently — which, in our case, means clients can always create duplicate orders, whatever precautions we take. Typically, such systems are only used in the following cases:


    The curious reader may refer to Martin Kleppmann's work on the subject.2

Choosing weak consistency instead of a strict one, however, brings some disadvantages. For instance, we might require partners to wait until they get the actual resource state to make changes — but it is quite unobvious to partners (and actually inconvenient) that they must be prepared to wait for changes they themselves made to propagate.

// Creates an order
let order = await api
  .createOrder(…);
// Immediately requests the list
// of ongoing orders
let pendingOrders = await api
  .getOngoingOrders(); // → []
  // The list is empty
     

    If strict consistency is not guaranteed, the second call might easily return an empty result as it reads data from a replica, and the newest order might not have hit it yet.


An important pattern that helps in this situation is implementing the “read-your-writes3” model: it guarantees that clients observe the changes they have just made. In APIs, the read-your-writes strategy could be implemented by making clients pass some token that describes the last change known to the client.

let order = await api
  .createOrder(…);
let pendingOrders = await api.
  getOngoingOrders({
    …,
    // Passing the last change
    // known to the client
    last_known_order_id: order.id
  });
     

Such a token might be the date of the last modification of the resource known to the client, its last known version, or the identifier of the last operation performed by the client.

    Upon getting the token, the server must check that the response (e.g., the list of ongoing operations it returns) matches the token, i.e., the eventual consistency converged. If it did not (the client passed the modification date / version / last order id newer than the one known to the server), one of the following policies or their combinations might be applied:

The advantage of this approach is client development convenience (compared to the absence of any guarantees): by preserving the version token, client developers get rid of the possible inconsistency of the data received from API endpoints. There are two disadvantages, however:

    There is also an important question regarding the default behavior of the server if no version token was passed. Theoretically, in this case, master data should be returned, as the absence of the token might be the result of an app crash and subsequent restart or corrupted data storage. However, this implies an additional load on the master node.

    Evaluating the Risks of Switching to Eventual Consistency


    First, let us stress that you might choose the approach only in the case of exposing new APIs. If you're already providing an endpoint implementing some consistency model, you can't just lower the consistency level (for instance, introduce eventual consistency instead of the strict one) even if you never documented the behavior. This will be discussed in detail in the “On the Waterline of the Iceberg” chapter of “The Backward Compatibility” section of this book.


    Second, let us state another important assertion: the methods of solving architectural problems we're discussing in this section are probabilistic. Abolishing strict consistency means that even if all components of the system work perfectly, client errors will still occur. It might appear that they could be simply ignored, but in reality, doing so means introducing risks.

    Imagine that because of eventual consistency, users of our API sometimes cannot create orders with their first attempt. For example, a customer adds a new payment method in the application, but their subsequent order creation request is routed to a replica that hasn't yet received the information regarding the newest payment method. As these two actions (adding a bank card and making an order) often go in conjunction, there will be a noticeable percentage of errors — let's say, 1%. At this stage, we could disregard the situation as it appears harmless: in the worst-case scenario, the client will repeat the request.

    But let's go a bit further and imagine there is an error in a new version of the application, and 0.1% of end users cannot make an order at all because the client sends a wrong payment method identifier. In the absence of this 1% background noise of consistency-bound errors, we would find the issue very quickly. However, amidst this constant inflow of errors, identifying problems like this one could be very challenging as it requires configuring monitoring systems to reliably exclude the data consistency errors, and this could be very complicated or even impossible. The author of this book, in his job, has seen several situations when critical mistakes that affect a small percentage of users were not noticed for months.

    Therefore, the task of proactively lowering the number of these background errors is crucially important. We may try to reduce their occurrence for typical usage profiles.


Mathematically, the probability of getting the error is expressed quite simply. It's the ratio of two durations: the time period needed to get the actual state to the time period needed to restart the app and repeat the request. (Keep in mind that the last failed request might be automatically repeated on startup by the client.) The former depends on the technical properties of the system (for instance, on the replication latency, i.e., the lag between the master and its read-only copies) while the latter depends on what kind of client is repeating the call.

    If we talk about applications for end users, the typical restart time there is measured in seconds, which normally should be much less than the overall replication latency. Therefore, client errors will only occur in case of data replication problems / network issues / server overload.

    If, however, we talk about server-to-server applications, the situation is totally different: if a server repeats the request after a restart (let's say because the process was killed by a supervisor), it's typically a millisecond-scale delay. And that means that the number of order creation errors will be significant.
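To illustrate with made-up numbers (they are not from the original text): if the replication lag is around 100 ms and an end-user application takes about 5 seconds to restart and repeat the request, the collision probability is roughly 100 / 5000 = 2%; if a server process retries within 1 ms after being restarted by a supervisor, the same 100 ms lag makes reading a stale state almost certain.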


    As a conclusion, returning eventually consistent data by default is only viable if an API vendor is either ready to live with background errors or capable of making the lag of getting the actual state much less than the typical app restart time.

    References

    Chapter 19. Asynchronicity and Time Management

    Let's continue working with the previous example: the application retrieves some system state upon start-up, perhaps not the most recent one. What else does the probability of collision depend on, and how can we lower it?

    We remember that this probability is equal to the ratio of time periods: getting an actual state versus starting an app and making an order. The latter is almost out of our control (unless we deliberately introduce additional waiting periods in the API initialization function, which we consider an extreme measure). Let's then talk about the former.

    Our usage scenario looks like this:


Thus we naturally came to the pattern of organizing asynchronous APIs through task queues. Here we use the term “asynchronous” in the logical sense, meaning the absence of mutual logical locks: the party that makes a request gets a response immediately and does not wait until the requested procedure is fully carried out, being able to continue interacting with the API. Technically, in modern application environments, locking (of both the client and the server) almost universally doesn't happen during long-responding calls. However, logically allowing users to work with the API while waiting for a response from a modifying endpoint is error-prone and leads to collisions like the one we described above.
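A sketch of what such a task-queue contract might look like in the notation used throughout the book (the endpoint and field names are illustrative):

// Creating an order returns a task
// identifier immediately, without
// waiting for the order to be
// actually created
POST /v1/orders
X-Idempotency-Token: <token>
{ … }
→
{ "task_id": <task identifier> }

// The client then polls the state
// of the task until it is completed
GET /v1/tasks/{task_id}
→
{ "status": "pending" }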

    The asynchronous call pattern is useful for solving other practical tasks as well:


Also, asynchronous communication is more robust from a future API development point of view: request handling procedures might evolve towards prolonging and extending the asynchronous execution pipelines, whereas synchronous handlers must retain reasonable execution times, which puts certain restrictions on the possible internal architecture. One might refer to the definitive work by Adam Bellemare on the advantages of event-driven architectures.1

    NB: In some APIs, an ambivalent decision is implemented where endpoints feature a double interface that might either return a result or a link to a task. Although from the API developer's point of view, this might look logical (if the request was processed “quickly”, e.g., served from cache, the result is to be returned immediately; otherwise, the asynchronous task is created), for API consumers, this solution is quite inconvenient as it forces them to maintain two execution branches in their code. Sometimes, a concept of providing a double set of endpoints (synchronous and asynchronous ones) is implemented, but this simply shifts the burden of making decisions onto partners.

    The popularity of the asynchronicity pattern is also driven by the fact that modern microservice architectures “under the hood” operate in asynchronous mode through event queues or pub/sub middleware. Implementing an analogous approach in external APIs is the simplest solution to the problems caused by asynchronous internal architectures (the unpredictable and sometimes very long latencies of propagating changes). Ultimately, some API vendors make all API methods asynchronous (including the read-only ones) even if there are no real reasons to do so.

    However, we must stress that excessive asynchronicity, though appealing to API developers, implies several quite objectionable disadvantages:


    NB: Let us also mention that in the asynchronous format, it's possible to provide not only binary status (task done or not) but also execution progress as a percentage if needed.

    References

    Chapter 20. Lists and Accessing Them

    In the previous chapter, we concluded with the following interface that allows minimizing collisions while creating orders:

    let pendingOrders = await api
       .getOngoingOrders(); 
     

    Another possible anchor to rely on is the record creation date. However, this approach is harder to implement for the following reasons:

    Events themselves and the order of their occurrence are immutable. Therefore, it's possible to organize traversing the list. It is important to note that the order creation event is not the order itself: when a partner reads an event, the order might have already changed its status. However, accessing all new orders is ultimately doable, although not in the most efficient manner.


    NB: In the code samples above, we omitted passing metadata for responses, such as the number of items in the list, the has_more_items flag, etc. Although this metadata is not mandatory (i.e., clients will learn the list size when they retrieve it fully), having it makes working with the API more convenient for developers. Therefore we recommend adding it to responses.
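For instance, a sketch of a response with such metadata attached (the exact field names are illustrative):

GET /v1/orders/created-history↵
   ?older_than=<item_id>&limit=10
→
{
  "created_history": [ … ],
  // Metadata that is not strictly
  // required but makes client
  // development more convenient
  "total_count": 352,
  "has_more_items": true
}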

    References

    Chapter 21. Bidirectional Data Flows. Push and Poll Models


    In the previous chapter, we discussed the following scenario: a partner receives information about new events occurring in the system by periodically requesting an endpoint that supports retrieving ordered lists.

    GET /v1/orders/created-history↵
       ?older_than=<item_id>&limit=<limit>
     →
     

As various mobile platforms currently constitute a major share of all client devices, battery-saving (and, to a lesser degree, traffic-saving) requirements impose significant limitations on the technologies for data exchange between the server and the end user. Many platform and device manufacturers monitor the resources consumed by the application and can send it to the background or close open connections. In such a situation, frequent polling should only be used in active phases of the application work cycle (i.e., when the user is directly interacting with the UI) or in controlled environments (for example, if employees of a partner company use the application in their work and can add it to system exceptions).

    Three alternatives to polling might be proposed:

    1. Duplex Connections

    The most obvious option is to use technologies that can transmit messages in both directions over a single connection. The best-known example of such technology is WebSockets3. Sometimes, the Server Push functionality of the HTTP/2 protocol4 is used for this purpose; however, we must note that the specification formally does not allow such usage. There is also the WebRTC5 protocol; its main purpose is a peer-to-peer exchange of media data, and it's rarely used in client-server interaction.

    Although the idea looks simple and attractive, its applicability to real-world use cases is limited. Popular server software and frameworks do not support server-initiated message sending (for instance, gRPC does support streamed responses6, but the client should initiate the exchange; using gRPC server streams to send server-initiated events is essentially employing HTTP/2 server pushes for this purpose, and it's the same technique as in the long polling approach, just a bit more modern), and the existing specification definition standards do not support it — as WebSocket is a low-level protocol, and you will need to design the interaction format on your own.

    Duplex connections still suffer from the unreliability of the network and require implementing additional tricks to tell the difference between a network problem and the absence of new messages. All these issues result in limited applicability of the technology; it's mostly used in web applications.

    2. Separate Callback Channels

Instead of a duplex connection, two separate channels might be used: one for sending requests to the server and one for receiving notifications from the server. Clients subscribe to message queues generated by the server (a “message broker”) or, sometimes, by other clients, typically by implementing the publisher/subscriber (“pub/sub”) pattern.7 In this setup:

    • The client sends requests either through regular API calls or by publishing events to a queue (or queues).
    • The client receives callback notifications by listening for events on a queue. It might be the same queue the client used for sending events or a completely different queue (or queues).

Therefore, this approach follows neither the request-response pattern (even if a callback event is a direct response to the client's actions, it is received asynchronously, requiring the client to match the response to its requests) nor the duplex connection pattern. However, we must note that this is a logical distinction for the convenience of client developers, as, under the hood, the underlying messaging framework typically relies on WebSockets or implements polling.
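A minimal pseudocode sketch of this model (the queue name and the subscribe helper are illustrative and do not belong to any particular protocol):

// The request itself is sent either
// as a regular API call or as an
// event published to a queue
let order = await api.createOrder(…);
// Callback notifications arrive on
// a separate channel; matching them
// to the requests is the client's job
api.subscribe(
  'orders/status-events',
  (event) => {
    if (event.order_id == order.id) {
      // React to the status change
      …
    }
  }
);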


    The most popular technology of this kind is MQTT8. Although it is considered highly efficient due to its use of low-level protocols, its disadvantages stem from its advantages:

    • The technology is designed to implement the pub/sub pattern, and its primary value lies in the fact that the server software (MQTT Broker) is provided alongside the protocol itself. Applying it to other tasks, especially bidirectional communication, can be challenging.
    • The use of low-level protocols requires developers to define their own data formats.
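As a minimal sketch of this two-channel setup, consider the following code that uses the mqtt.js client library. The broker address, topic names, and payload format are hypothetical; defining them is entirely the developer's responsibility:

    // Pub/sub over MQTT: one topic serves as the request channel,
    // another as the callback channel.
    import mqtt from 'mqtt';

    const client = mqtt.connect('mqtt://broker.example.com');

    client.on('connect', () => {
      // The callback channel: listen for server-generated notifications.
      client.subscribe('clients/client-1/notifications');
      // The request channel: publish an event for the server to consume.
      client.publish(
        'orders/new',
        JSON.stringify({ orderId: 'order-1', items: [] })
      );
    });

    client.on('message', (topic, payload) => {
      // Notifications arrive asynchronously; matching them to earlier
      // requests, if needed, is the client's responsibility.
      console.log(topic, JSON.parse(payload.toString()));
    });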

    Another popular technology for organizing message queues is the Advanced Message Queuing Protocol (AMQP). AMQP is an open standard for implementing message queues,9 with many independent client and server (broker) implementations. One notable broker implementation is RabbitMQ,10 while AMQP clients are typically implemented as libraries for specific client platforms and programming languages.


    There is also a web standard for sending server notifications called Server-Sent Events11 (SSE). However, SSE is less functional than WebSockets (supporting only text data and unidirectional flow) and is rarely used.
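For completeness, a minimal SSE subscription might look as follows (the endpoint URL is hypothetical); note that the channel is text-only and strictly server-to-client:

    // Server-Sent Events: a unidirectional, text-only notification channel.
    const events = new EventSource('https://api.example.com/v1/events');

    events.onmessage = (event: MessageEvent) => {
      // Each message body is plain text; structured data has to be
      // serialized (typically as JSON) by agreement between the parties.
      console.log(JSON.parse(event.data));
    };

    // Requests still have to go through the regular API (e.g., fetch calls):
    // the SSE channel cannot be used to send anything to the server.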


    A curious reader may refer to the corresponding chapter in Ian Gorton’s influential book12 or to Adam Bellemare’s compendium on the topic.13

    3. Third-Party Push Notifications

One of the notorious problems with the long polling / WebSocket / SSE / MQTT technologies is the necessity to maintain an open network connection between the client and the server, which might be a problem for mobile applications and IoT devices in terms of performance and battery life. One option that allows for mitigating the issue is delegating the sending of push notifications to a third-party service (the most popular choice today is Google's Firebase Cloud Messaging) that delivers notifications through the built-in mechanisms of the platform. Using such integrated services takes most of the load of maintaining open connections and checking their status off the developer's shoulders. The disadvantages of using third-party services are the necessity to pay for them and strict limits on message sizes.

Also, sending push notifications to end-user devices suffers from one important issue: the percentage of successfully delivered messages never reaches 100%; the message drop rate might be tens of percent. Taking into account the message size limitations, it's actually better to implement a mixed model than a pure push model: the client continues polling the server, just less frequently, and push notifications merely trigger ahead-of-time polling. (This problem is actually applicable to any notification delivery technology. Low-level protocols offer more options to control delivery guarantees; however, given that operating systems tend to forcefully close open connections, having low-frequency polling as a precaution in an application is almost never a bad thing.)
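A minimal sketch of such a mixed model might look like the following; the api, handleEvent, and pushService entities are hypothetical stand-ins for the real polling endpoint, event handler, and push subscription:

    // Hypothetical stubs standing in for the real API client, event handler,
    // and platform push notification service.
    declare const api: { getNewEvents(): Promise<unknown[]> };
    declare function handleEvent(event: unknown): void;
    declare const pushService: { onNotification(callback: () => void): void };

    // A deliberately long polling interval: the push channel is expected
    // to deliver most events ahead of schedule.
    const POLL_INTERVAL_MS = 5 * 60 * 1000;
    let pollTimer: ReturnType<typeof setTimeout> | undefined;

    async function poll(): Promise<void> {
      if (pollTimer !== undefined) {
        clearTimeout(pollTimer);
      }
      try {
        const events = await api.getNewEvents();
        events.forEach(handleEvent);
      } finally {
        // Re-arm the timer regardless of whether the poll succeeded.
        pollTimer = setTimeout(poll, POLL_INTERVAL_MS);
      }
    }

    // A push notification is treated as a hint, not as a guaranteed delivery:
    // its payload is ignored, and the client simply polls right away.
    pushService.onNotification(() => {
      poll();
    });

    poll();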


What is important is that there must be a formal contract (preferably in the form of a specification) for the webhook's request and response formats and for all the errors that might happen.

    2. Agree on Authorization and Authentication Methods

As a webhook is a callback channel, you will need to develop a separate authorization system for it, since now it is the partner's duty to check that the request genuinely comes from the API backend, not vice versa. We reiterate our strictest recommendation to stick to existing standard techniques, such as mTLS; though in the real world, you will likely have to use archaic methods like fixing the caller server's IP address.
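As an illustration of the latter (admittedly archaic) technique, here is a minimal sketch of a partner-side webhook endpoint that only accepts requests coming from a fixed API backend address; the address and the endpoint path are hypothetical:

    // A partner-side webhook endpoint that accepts calls only from
    // a fixed (hypothetical) API backend IP address.
    import { createServer } from 'node:http';

    const ALLOWED_CALLER_IP = '203.0.113.10'; // fixed by agreement with the API vendor

    createServer((req, res) => {
      // Strip the IPv6-mapped prefix that Node.js may report for IPv4 callers.
      const callerIp = (req.socket.remoteAddress ?? '').replace(/^::ffff:/, '');
      if (req.url !== '/webhooks/orders' || callerIp !== ALLOWED_CALLER_IP) {
        res.writeHead(403).end();
        return;
      }
      let body = '';
      req.on('data', (chunk) => (body += chunk));
      req.on('end', () => {
        // Process the notification according to the agreed-upon contract.
        console.log('Webhook received:', JSON.parse(body));
        res.writeHead(200).end();
      });
    }).listen(8080);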

3. Develop an Interface for Setting the URL of a Webhook

As the callback endpoint is developed by partners, we do not know its URL beforehand. This implies that some interface must exist for setting this URL and the authorized public keys (probably in the form of a control panel for partners).

Importantly, the operation of setting a webhook URL is to be treated as a potentially hazardous one. It is highly desirable to request a second authentication factor to authorize the operation, as a potential attacker can wreak a lot of havoc if there is a vulnerability in the procedure:

• By setting an arbitrary URL, the perpetrator might get access to all the partner's orders (and the partner might lose access)
    • This vulnerability might be used for organizing DoS attacks on third parties
• If an internal URL can be set as a webhook, an SSRF attack14 might be directed toward the API vendor's own infrastructure.

    Typical Problems of Webhook-Powered Integrations

Bidirectional data flows (both client-server and server-server ones, though the latter to a greater extent) bear quite undesirable risks for an API provider. In general, the quality of an integration primarily depends on the API developers. With callback-based integrations, it is the other way around: the integration quality depends on how partners implemented the webhook. We might face numerous problems with the partners' code:

  • Help partners to write proper code by describing in the documentation all unobvious subtleties that inexperienced developers might be unaware of:
  • This approach is much more complex to implement, but it is the only viable technique for realizing collaborative editing as it explicitly reflects the exact actions the client applied to an entity. Having the changes in this format also allows for organizing offline editing with accumulating changes on the client side for the server to resolve the conflict later based on the revision history.


NB: One approach to this task is developing a set of operations in which all actions are commutative (i.e., the final state of the entity does not change regardless of the order in which the changes were applied). One example of such a nomenclature is a conflict-free replicated data type (CRDT).2 However, we consider this approach viable only in some subject areas, as in real life, non-commutative changes are always possible. If one user entered new text in the document and another user removed the document completely, there is no way to automatically resolve this conflict that would satisfy both users. The only correct way of resolving this conflict is explicitly asking users which option for mitigating the issue they prefer.
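As a minimal illustration of such an order-independent operation nomenclature, consider a grow-only set, one of the simplest CRDTs: “add” operations commute, so replicas converge regardless of the order in which the operations were applied. This is a sketch for illustration purposes, not a production-grade CRDT implementation:

    // A grow-only set (G-Set): the only allowed operation is adding an item,
    // so any two replicas that have seen the same operations are identical.
    class GSet<T> {
      private items = new Set<T>();

      apply(operation: { type: 'add'; value: T }): void {
        if (operation.type === 'add') {
          this.items.add(operation.value);
        }
      }

      merge(other: GSet<T>): void {
        // Merging two replicas is a plain set union.
        other.items.forEach((value) => this.items.add(value));
      }

      values(): T[] {
        return [...this.items];
      }
    }

    // Two replicas receive the same operations in different orders…
    const operations = [
      { type: 'add' as const, value: 'paragraph-1' },
      { type: 'add' as const, value: 'paragraph-2' },
    ];
    const replicaA = new GSet<string>();
    const replicaB = new GSet<string>();
    operations.forEach((op) => replicaA.apply(op));
    [...operations].reverse().forEach((op) => replicaB.apply(op));
    // …and end up in the same state. Deleting or editing the same fragment,
    // however, would not commute and would require explicit conflict resolution.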

    References

    Chapter 25. Degradation and Predictability

    In the previous chapters, we repeatedly discussed that the background level of errors is not just unavoidable, but in many cases, APIs are deliberately designed to tolerate errors to make the system more scalable and predictable.

    But let's ask ourselves a question: what does a “more predictable system” mean? For an API vendor, the answer is simple: the distribution and number of errors are both indicators of technical problems (if the numbers are growing unexpectedly) and KPIs for technical refactoring (if the numbers are decreasing after the release).

    However, for partner developers, the concept of “API predictability” means something completely different: how solidly they can cover the API use cases (both happy and unhappy paths) in their code. In other words, how well one can understand based on the documentation and the nomenclature of API methods what errors might arise during the API work cycle and how to handle them.

  • Higher-level entities are to be the informational contexts for low-level ones, meaning they don't prescribe any specific behavior but rather translate their state and expose functionality to modify it, either directly through calling some methods or indirectly through firing events.
  • Concrete functionality, such as working with “bare metal” hardware or underlying platform APIs, should be delegated to low-level entities.

    NB: There is nothing novel about these rules: one might easily recognize them as the SOLID architecture principles1. This is not surprising either, as SOLID focuses on contract-oriented development, and APIs are contracts by definition. We have simply introduced the concepts of “abstraction levels” and “informational contexts” to these principles.

    However, there remains an unanswered question: how should we design the entity nomenclature from the beginning so that extending the API won't result in a mess of assorted inconsistent methods from different stages of development? The answer is quite obvious: to avoid clumsy situations during abstracting (as with the recipe properties), all the entities must be originally considered as specific implementations of a more general interface, even if there are no planned alternative implementations for them.

    For example, while designing the POST /search API, we should have asked ourselves a question: what is a “search result”? What abstract interface does it implement? To answer this question we need to decompose this entity neatly and identify which facet of it is used for interacting with which objects.

    Then we would have come to the understanding that a “search result” is actually a composition of two interfaces:


    And what constitutes the “abstract representation of a search result in the UI”? Do we have other types of search? Should the ISearchItemViewParameters interface be a subtype of some even more general interface, or maybe a composition of several such interfaces?
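To make the idea more tangible, such a decomposition might be sketched as follows; besides ISearchItemViewParameters, which was mentioned above, all the names and fields are hypothetical and serve illustration purposes only:

    // One facet: how a search result is rendered in the UI.
    interface ISearchItemViewParameters {
      title: string;
      subtitle: string;
      imageUrl?: string;
    }

    // Another facet: the data needed to act upon a search result,
    // e.g., to create an order from it.
    interface IOrderParameters {
      offerId: string;
      placeId: string;
    }

    // A “search result” is then merely a composition of the two interfaces,
    // and either facet can evolve or be reimplemented independently.
    interface ISearchResult extends ISearchItemViewParameters, IOrderParameters {}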


    Replacing specific implementations with interfaces not only allows us to respond more clearly to many concerns that arise during the API design phase but also helps us outline many possible directions for API evolution. This approach should assist us in avoiding API inconsistency problems in the future.

    References

    Chapter 32. The Serenity Notepad

Apart from the abovementioned abstract principles, let us give a list of concrete recommendations on how to make changes in existing APIs while maintaining backward compatibility.

    1. Remember the Iceberg's Waterline

Even if you haven't given any formal guarantees, it doesn't mean that you can violate informal ones. Often, just fixing bugs in APIs might render some developers' code inoperable. We can illustrate this with a real-life example that the author of this book actually faced once:


    Whatever tips and tricks described in the previous chapters you use, it's often quite probable that you can't do anything to prevent API inconsistencies from piling up. It's possible to reduce the speed of this stockpiling, foresee some problems, and have some interface durability reserved for future use. But one can't foresee everything. At this stage, many developers tend to make some rash decisions, e.g., releasing a backward-incompatible minor version to fix some design flaws.

    We highly recommend never doing that. Remember that the API is also a multiplier of your mistakes. What we recommend is to keep a serenity notepad — to write down the lessons learned and not to forget to apply this knowledge when a new major API version is released.

    Section IV. HTTP APIs & the REST Architectural Principles

    Chapter 33. On the HTTP API Concept. Paradigms of Developing Client-Server Communication

    The problem of designing HTTP APIs is, unfortunately, one of the most “holywar”-inspiring issues. On one hand, it is one of the most popular technologies; on the other hand, it is quite complex and difficult to comprehend due to the large and fragmented standard split into many RFCs. As a result, the HTTP specification is doomed to be poorly understood and imperfectly interpreted by millions of software engineers and thousands of textbook writers. Therefore, before proceeding to the useful part of this Section, we must clarify exactly what we are going to discuss.


    Let's start with a short historical overview. Performing users' requests on a remote server has been one of the basic tasks in software engineering since mainframes, and it naturally gained additional momentum with the development of ARPANET. The first high-level protocol for network communication worked in the paradigm of sending messages over the network (as an example, see the DEL protocol that was proposed in one of the very first RFCs — RFC-5 published in 19691). However, scholars quickly understood that it would be much more convenient if calling a remote server and accessing remote resources wasn't any different from working with local memory and resources in terms of function signatures. This concept was strictly formulated under the name “Remote Procedure Call” (RPC) by Bruce Nelson, an employee of the famous Xerox Palo Alto Research Center in 1981.2 Nelson was also the co-author of the first practical implementation of the proposed paradigm, namely Sun RPC3·4, which still exists as ONC RPC.


The first widely adopted RPC protocols (such as the aforementioned Sun RPC, Java RMI5, and CORBA6) strictly followed the paradigm. The technology allowed achieving exactly what Nelson was writing about — that is, making no difference between local and remote code execution. The “magic” is hidden within tooling that generates the implementation of working with remote servers, and developers don't need to know how the protocol works.

    However, the convenience of using the technology became its Achilles heel:


    We will refer to such APIs as “HTTP APIs” or “JSON-over-HTTP APIs.” We understand that this is a loose interpretation of the term, but we prefer to live with that rather than using phrases like “JSON-over-HTTP endpoints utilizing the semantics described in the HTTP and URL standards” or “a JSON-over-HTTP API complying with the REST architectural constraints” each time. As for the term “REST API,” it lacks a consistent definition (as we will discuss in the corresponding chapter), so we would avoid using it as well.

    References

    Chapter 34. Advantages and Disadvantages of HTTP APIs Compared to Alternative Technologies

    As we discussed in the previous chapter, today, the choice of a technology for developing client-server APIs comes down to selecting either a resource-oriented approach (commonly referred to as “REST API”; let us reiterate that we will use the term “HTTP API” instead) or a modern RPC protocol. As we mentioned earlier, conceptually the difference is not that significant. However, technically these frameworks use the HTTP protocol quite differently:

    First, different frameworks rely on different data formats:


In conclusion, we would like to make the following statement: building an HTTP API means relying on the common knowledge of HTTP call semantics and drawing benefits from it by leveraging the various software built upon this paradigm, from client frameworks to server gateways, as well as developers' ability to read and understand API specifications. In this sense, the HTTP ecosystem provides probably the most comprehensive vocabulary, both in terms of profoundness and adoption, compared to other technologies, allowing for describing many different situations that may arise in client-server communication. While the technology is not perfect and has its flaws, for a public API vendor, it is the default choice, and as of today, opting for other technologies needs to be substantiated.

    References

    Section V. SDKs & UI Libraries

    Chapter 41. On Terminology. An Overview of Technologies for UI Development

    As we mentioned in the Introduction, the term “SDK” (which stands for “Software Development Kit”) lacks concrete meaning. The common understanding is that an SDK differs from an API as it provides both program interfaces and tools to work with them. This definition is hardly satisfactory as today any technology is likely to be provided with a bundled toolset.

    However, there is a very specific narrow definition of an SDK: it is a client library that provides a high-level interface (usually a native one) to some underlying platform (such as a client-server API). Most often, we talk about libraries for mobile OSes or Web browsers that work on top of a general-purpose HTTP API.

    Among such client SDKs, one case is of particular interest to us: those libraries that not only provide programmatic interfaces to work with an API but also offer ready-to-use visual components for developers. A classic example of such an SDK is the UI libraries provided by cartographical services. Since developing a map engine, especially a vector one, is a very complex task, maps API vendors provide both “wrappers” to their HTTP APIs (such as a search function) and visual components to work with geographical entities. The latter often include general-purpose elements (such as buttons, placemarks, context menus, etc.) that can be used independently from the main functionality of the API.


    To avoid being wordy, we will use the term “SDK” for the former and “UI libraries” for the latter.

    NB: Strictly speaking, a UI library might either include a client-server API “wrapper” or not (i.e., just provide a “pure” API to some underlying system engine). In this Section, we will mostly talk about the first option as it is the most general case and the most challenging one in terms of API design. Most SDK development patterns we will discuss are also applicable to “pure” libraries.

    Selecting a Framework for UI Component Development


As UI is a high-level abstraction built upon OS primitives, there are specialized visual component frameworks available for almost every platform. Choosing such a framework can, regretfully, be challenging. For instance, in the case of the Web platform, which is both low-level and highly popular, the number of competing technologies for SDK development is beyond imagination. We could mention the most popular ones today, including React1, Angular2, Svelte3, Vue.js4, as well as those that maintain a strong presence like Bootstrap5 and Ember.6 Among these technologies, React demonstrates the most widespread adoption, which is still measured in single-digit percentages.7 At the same time, components written in “pure” JavaScript/CSS often receive criticism for being less convenient to use in these frameworks as each of them implements a rigid methodology. The situation with developing visual libraries for Windows is quite similar. The question of “which framework to choose for developing UI components for these platforms” regretfully has no simple answer. In fact, one will need to evaluate the markets and make a decision regarding each individual framework.

In the case of the actual mobile platforms (and macOS), the current state of affairs is more favorable as they are more homogeneous. However, a different problem arises: modern applications typically need to support several such platforms simultaneously, which leads to code (and API nomenclature) duplication.


    One potential solution could be using cross-platform mobile (React Native8, Flutter9, Xamarin10, etc.) and desktop (JavaFX11, QT12, etc.) frameworks, or specialized technologies for specific tasks (such as Unity13 for game development). The inherent advantages of these technologies are faster code-writing and universalism (of both code and software engineers). The disadvantages are obvious as well: achieving maximum performance could be challenging, and many platform tools (such as debugging and profiling) will not work. As of today, we rather see a parity between these two approaches (several independent applications for each platform vs. one cross-platform application).

    References

    Chapter 42. SDKs: Problems and Solutions

    The first question we need to clarify about SDKs (let us reiterate that we use this term to denote a native client library that allows for working with a technology-agnostic underlying client-server API) is why SDKs exist in the first place. In other words, why is using “wrappers” more convenient for frontend developers than working with the underlying API directly?

    Several reasons are obvious:


      Finally, SearchBox doesn't interact with either of them and only provides a context, methods to change it, and the corresponding notifications.


By making these reductions, we end up, in fact, with a setup that follows the “Model-View-Controller” (MVC) methodology, one of the very first patterns for designing user interfaces, proposed as early as 1979 by Trygve Reenskaug.1·2 OfferList and OfferPanel (as well as the code that displays the input field) constitute a view that the user observes and interacts with. Composer is a controller that listens to the view's events and modifies a model (SearchBox itself).

      NB: to follow the letter of the paradigm, we must separate the model, which will be responsible only for the data, from SearchBox itself. We leave this exercise to the reader.

MVC entities interaction chart
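A very condensed sketch of this wiring might look as follows. The method and event names are hypothetical, the real interfaces of SearchBox, Composer, and the view components are, of course, richer, and (following the NB above) the model is separated into a SearchBoxModel interface:

    // Hypothetical minimal interfaces for the views and the model.
    interface UserAction {
      type: 'offerSelected';
      offerId: string;
    }
    interface View {
      onUserAction(callback: (action: UserAction) => void): void;
      render(state: unknown): void;
    }
    interface SearchBoxModel {
      onChange(callback: (state: unknown) => void): void;
      selectOffer(offerId: string): void;
    }

    // The controller: it listens to view events, translates them into model
    // modifications, and re-renders the views when the model changes.
    class Composer {
      constructor(private model: SearchBoxModel, views: View[]) {
        views.forEach((view) =>
          view.onUserAction((action) => this.handle(action))
        );
        model.onChange((state) => views.forEach((view) => view.render(state)));
      }

      private handle(action: UserAction): void {
        if (action.type === 'offerSelected') {
          this.model.selectOffer(action.offerId);
        }
      }
    }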

      If we choose other options for reducing interaction directions, we will get other MV* frameworks (such as Model-View-Viewmodel, Model-View-Presenter, etc.). All of them are ultimately based on the “Model” pattern.


      This rigidity, however, bears disadvantages as well. If we try to fully define the component's state, we must include such technicalities as, let's say, all animations being executed (and even the current percentages of execution). Therefore, a model will include all data of all abstraction levels for both hierarchies (semantic and visual) and also the calculated option values. In our example, this means that the model will store, for example, the currentSelectedOffer field for OfferPanel to use, the list of buttons in the panel, and even the calculated icon URLs for those buttons.

      Such a full model poses a problem not only semantically and theoretically (as it mixes up heterogeneous data in one entity) but also very practically. Serializing such models will be bound to a specific API or application version (as they store all the technical fields, including those not exposed publicly in the API). Changing subcomponent implementation will result in breaking backward compatibility as old links and cached state will be unrestorable (or we will have to maintain a compatibility level to interpret serialized models from past versions).

      Another ideological problem is organizing nested controllers. If there are subordinate subcomponents in the system, all the problems that an MV* approach solved return at a higher level: we have to allow nested controllers either to modify a global model or to call parent controllers. Both solutions imply strong coupling and require exquisite interface design skill; otherwise reusing components will be very hard.


      If we take a closer look at modern UI libraries that claim to employ MV* paradigms, we will learn they employ it quite loosely. Usually, only the main principle that a model defines UI and can only be modified through controllers is adopted. Nested components usually have their own models (in most cases, comprising a subset of the parent model enriched with the component's own state), and the global model contains only a limited number of fields. This approach is implemented in many modern UI frameworks, including those that claim they have nothing to do with MV* paradigms (React, for instance3·4).


      All these problems of the MVC paradigm were highlighted by Martin Fowler in his “GUI Architectures” essay.5 The proposed solution is the “Model-View-Presenter” framework, in which the controller entity is replaced with a presenter. The responsibility of the presenter is not only translating events, but preparing data for views as well. This allows for full separation of abstraction levels (a model now stores only semantic data while a presenter transforms it into low-level parameters that define UI look; the set of these parameters is called the “Application Model” or “Presentation Model” in Fowler's text).

MVP entities interaction chart

Fowler's paradigm closely resembles the Composer concept we discussed in the previous chapter, with one notable deviation. In MVP, a presenter is stateless (with the possible exception of caches and closures) and only deduces the data needed by views from the model data. If some low-level property needs to be manipulated, such as text color, the model needs to be extended in a manner that allows the presenter to calculate the text color based on some high-level model data field. This concept significantly narrows the capability to replace subcomponents with alternative implementations.
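To illustrate, such a stateless presenter might be sketched as follows; the model fields and view parameters, including the text color example from the paragraph above, are hypothetical:

    // The model carries only high-level semantic data…
    interface OfferModel {
      price: number;
      isDiscounted: boolean;
    }
    // …while the view consumes low-level presentation parameters.
    interface OfferViewParameters {
      priceText: string;
      priceColor: string;
    }

    // A stateless presenter: it only deduces view parameters from the model.
    // To influence the text color, we extend the model with a high-level flag
    // (isDiscounted) instead of exposing the color itself.
    function presentOffer(model: OfferModel): OfferViewParameters {
      return {
        priceText: `$${model.price.toFixed(2)}`,
        priceColor: model.isDiscounted ? 'red' : 'black',
      };
    }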


NB: let us clarify that the author of this book is not proposing Composer as an alternative MV* methodology. The message of the previous chapter is that complex scenarios of decomposing UI components can only be solved with artificially introduced “bridges” of additional abstraction layers. What this bridge is called and what rules it brings are not as important.

      References

      Chapter 46. The Backend-Driven UI

      Another method of reducing the complexity of building “bridges” that connect different subject areas in one component is to eliminate one of them. For instance, business logic could be removed: components might be entirely abstract, and the translation of UI events into useful actions hidden beyond the developer's control.

      In this paradigm, the offer search code would look like this:

      class SearchBox {
       

      Moving Forward

Finally, apart from those specific issues, your customers must care about more general questions: could they trust you? Could they rely on your API evolving, absorbing modern trends, or will they eventually find the integration with your API in the scrapyard of history? Let's be honest: given all the uncertainties of the API product vision, we are very much interested in the answers as well. Even the Roman aqueduct, though remaining backward-compatible for two thousand years, has been an archaic and unreliable way of solving customers' problems for quite a long time.

      You might work with these customer expectations by publishing roadmaps. It's quite common that many companies avoid publicly announcing their concrete plans (for a reason, of course). Nevertheless, in the case of APIs, we strongly recommend providing roadmaps, even if they are tentative and lack precise dates — especially if we talk about deprecating some functionality. Announcing these promises (given the company keeps them, of course) is a very important competitive advantage for every kind of consumer.


      With this, we would like to conclude this book. We hope that the principles and the concepts we have outlined will help you in creating APIs that fit all the developers, businesses, and end users' needs and in expanding them (while maintaining backward compatibility) for the next two thousand years or so.

    Bibliography