Organizing HTTP API finished

2025-04-23 11:07:53 +02:00 · 2023-06-16 01:56:09 +03:00 · 2023-06-16 01:56:09 +03:00 · 148a9dfbff
commit 148a9dfbff
parent 9d37f03a73
3 changed files with 56 additions and 56 deletions
--- a/src/en/clean-copy/05-[Work
+++ b/src/en/clean-copy/05-[Work
@@ -1,24 +1,24 @@
 ### [Organizing an HTTP API Based on the REST Principles][http-api-rest-organizing]

-Now let's discuss the specifics: what does it mean exactly to “follow the protocol's semantics” and “develop applications in accordance to the REST architectural style.” Remember, we talk about the following principles:
+Now let's discuss the specifics: what does it mean exactly to “follow the protocol's semantics” and “develop applications in accordance with the REST architectural style”? Remember, we are talking about the following principles:
  * Operations must be stateless
  * Data must be marked as cacheable or non-cacheable
  * There must be a uniform interface of communication between components
  * Network systems are layered.

-We need to apply these principles to an HTTP-based interface, sticking to the letter and soul of the standard:
-  * A URL of an operation must point to a resource the operation is applied to, and be a cache key for `GET` operations and an idempotency key for `PUT` and `DELETE` ones.
+We need to apply these principles to an HTTP-based interface, adhering to the letter and spirit of the standard:
+  * The URL of an operation must point to the resource the operation is applied to, serving as a cache key for `GET` operations and an idempotency key for `PUT` and `DELETE` operations.
  * HTTP verbs must be used according to their semantics.
-  * Properties of the operation, such as safety, cacheability, idempotency, and also the symmetry of `GET` / `PUT` / `DELETE` methods, request and response headers, response status codes, etc., must be aligned with the specification.
+  * Properties of the operation, such as safety, cacheability, idempotency, as well as the symmetry of `GET` / `PUT` / `DELETE` methods, request and response headers, response status codes, etc., must align with the specification.

 **NB**: we're deliberately skipping many nuances of the standard:
-  * a caching key might be composite [include request headers] if the response contains the `Vary` header.
-  * an idempotency key might composite as well if the request contains the `Range` header.
-  * if there are no explicit cache control headers, the caching policy will be defined not by the HTTP verb alone, but also by the response status code, other request and response headers, and platform policies.
+  * a caching key might be composite (i.e., include request headers) if the response contains the `Vary` header.
+  * an idempotency key might also be composite if the request contains the `Range` header.
+  * if there are no explicit cache control headers, the caching policy will not be defined by the HTTP verb alone. It will also depend on the response status code, other request and response headers, and platform policies.

-      To keep the chapter size reasonable, we will not discuss these details, but we hardly recommend reading the standard thoroughly.
+      To keep the chapter size reasonable, we will not discuss these details, but we highly recommend reading the standard thoroughly.

-Let's talk about organizing HTTP APIs based on a specific example. Imagine an application start procedure: as a rule of thumb, the application requests the current user profile and the important information regarding them (in our case, ongoing orders), using the authorization token saved in the device's memory. We can propose a quite straightforward endpoint for this purpose:
+Let's talk about organizing HTTP APIs based on a specific example. Imagine an application start procedure: as a rule of thumb, the application requests the current user profile and important information regarding them (in our case, ongoing orders), using the authorization token saved in the device's memory. We can propose a straightforward endpoint for this purpose:

 ```
 GET /v1/state HTTP/1.1
@@ -29,7 +29,7 @@ HTTP/1.1 200 OK
 { "profile", "orders" }
 ```

-Upon getting such a request, the server will check the validity of the token, fetch the identifier of the user `user_id`, query the database, and return the user's profile and the list of their orders.
+Upon receiving such a request, the server will check the validity of the token, fetch the identifier of the user `user_id`, query the database, and return the user's profile and the list of their orders.

 This simple monolith API service violates several REST architectural principles:
  * There is no obvious solution for caching responses on the client side (the order state is being frequently updated and there is no sense in saving it)
@@ -40,7 +40,7 @@ While scaling the backend is not a problem, this approach works. However, with t
  * Service A checks authentication tokens
  * Service B stores user accounts
  * Service C stores orders
-  * Gateway Service D routes incoming requests to other microservices.
+  * Gateway service D routes incoming requests to other microservices.

 This implies that a request traverses the following path:
  * Gateway D receives the request and sends it to both Service B and Service C.
@@ -49,8 +49,8 @@ This implies that a request traverses the following path:

 [![CTL](/img/graphs/http-api-organizing-01.en.png "The original microservice mesh")]()

-It is quite obvious that in this setup, we put an excessive load on the authorization service as every nested microservice now needs to query it. Even if we abolish checking the authenticity of internal requests, it won't help as services B and C can't know the identifier of the user. Naturally, this leads to the idea of propagating once retrieved `user_id` through the microservice mesh:
-  * Gateway D receives a request and exchanges token for `user_id` through service A
+It is quite obvious that in this setup, we put excessive load on the authorization service as every nested microservice now needs to query it. Even if we abolish checking the authenticity of internal requests, it won't help as services B and C can't know the identifier of the user. Naturally, this leads to the idea of propagating the once-retrieved `user_id` through the microservice mesh:
+  * Gateway D receives a request and exchanges the token for `user_id` through service A
  * Gateway D queries service B:
      ```
      GET /v1/profiles/{user_id}
@@ -64,26 +64,26 @@ It is quite obvious that in this setup, we put an excessive load on the authoriz

 **NB**: we used the `/v1/orders?user_id` notation and not, let's say, `/v1/users/{user_id}/orders`, for two reasons:
  * The orders service stores orders, not users, and it would be logical to reflect this fact in URLs
-  * If in the future, we require to allow several users to share one order, the `/v1/orders?user_id` notation will better reflect the relations between entities.
+  * If, in the future, we need to allow several users to share one order, the `/v1/orders?user_id` notation will better reflect the relations between entities.

      We will discuss organizing URLs in HTTP APIs in more detail in the next chapter.

-Now both services A and B receive the request in the form that makes it redundant to perform additional actions (identifying user through service A) to obtain the result. By doing so, we refactored the interface *allowing a microservice to stay within its area of responsibility*, thus making it compliant with the stateless constraint.
+Now both services B and C receive the request in a form that makes it redundant to perform additional actions (identifying the user through service A) to obtain the result. By doing so, we refactored the interface *allowing a microservice to stay within its area of responsibility*, thus making it compliant with the stateless constraint.

 Let us emphasize that the difference between **stateless** and **stateful** approaches is not clearly defined. Microservice B stores the client state (i.e., the user profile) and therefore is stateful according to Fielding's dissertation. However, we rather intuitively agree that storing profiles and just checking token validity is a better approach than doing all the same operations plus having the token cache. In fact, we rather embrace the *logical* principle of separating abstraction levels which we discussed in detail in the [corresponding chapter](#api-design-separating-abstractions):
  * **Microservices should be designed to clearly outline their responsibility area and to avoid storing data belonging to other abstraction levels**
  * External entities should be just context identifiers, and microservices should not interpret them
-  * If operations with external data are unavoidable (for example, the authority making a call must be checked), the **operations must be organized in a way that reduces them into checking the data integrity**.
+  * If operations with external data are unavoidable (for example, the authority making a call must be checked), the **operations must be organized in a way that reduces them to checking the data integrity**.

      In our example, we might get rid of unnecessary calls to service A in a different manner — by using stateless tokens, for example, employing the [JWT standard](https://www.rfc-editor.org/rfc/rfc7519). Then services B and C would be capable of deciphering tokens and extracting user identifiers on their own.

-Let us make a step further and notice that the user profile rarely changes, so there is no need to retrieve it each time as we might cache it at the gateway level. To do so, we must form a cache key which is essentially the client identifier. We can do it taking a long way:
+Let us take a step further and notice that the user profile rarely changes, so there is no need to retrieve it each time as we might cache it at the gateway level. To do so, we must form a cache key which is essentially the client identifier. We can do this by taking a long way:
  * Before requesting service B, generate a cache key and probe the cache
  * If the data is in the cache, respond with the cached snapshot; if it is not, query service B and cache the response.

-Alternatively, we can rely on HTTP caching which is most likely already implemented in the framework we use or is easily added as a plugin. In this scenario, gateway D requests the `/v1/profiles/{user_id}` resource in service B, retrieves the data alongside the cache control headers, and caches it locally.
+Alternatively, we can rely on HTTP caching which is most likely already implemented in the framework we use or easily added as a plugin. In this scenario, gateway D requests the `/v1/profiles/{user_id}` resource in service B, retrieves the data alongside the cache control headers, and caches it locally.

-Now let's avert our attention to service C. The results retrieved from it might also be cached. However, the state of an ongoing order changes more frequently than the user's profiles, and returning an invalid state might entail objectionable consequences. However, as discussed in the “[Synchronization Strategies](#api-patterns-sync-strategies)” chapter, we need an optimistic concurrency control (i.e., the resource revision) to make the functionality work correctly, and nothing could prevent us from using this revision as a cache key. Let service C return us a tag describing the current state of the user's orders:
+Now let's shift our attention to service C. The results retrieved from it might also be cached. However, the state of an ongoing order changes more frequently than the user's profile, and returning an outdated state might entail objectionable consequences. Still, as discussed in the “[Synchronization Strategies](#api-patterns-sync-strategies)” chapter, we need optimistic concurrency control (i.e., the resource revision) to ensure the functionality works correctly, and nothing prevents us from using this revision as a cache key. Let service C return a tag describing the current state of the user's orders:

 ```
 GET /v1/orders?user_id=<user_id> HTTP/1.1
@@ -93,8 +93,8 @@ ETag: <revision>
 …
 ```

-Then gateway D might be implemented following this scenario:
-  1. Cache the `GET /v1/orders?user_id=<user_id>` response using a URL as a cache key
+Then gateway D can be implemented following this scenario:
+  1. Cache the response of `GET /v1/orders?user_id=<user_id>` using the URL as a cache key
  2. Upon receiving a subsequent request:
      * Fetch the cached state, if any
      * Query service C passing the following parameters:
@@ -102,12 +102,12 @@ Then gateway D might be implemented following this scenario:
          GET /v1/orders?user_id=<user_id> HTTP/1.1
          If-None-Match: <revision>
          ```
-      * If service C responds with `304 Not Modified`, return the cached state
-      * If service C responds with new version of the data, cache it and then return to the client.
+      * If service C responds with a `304 Not Modified` status code, return the cached state
+      * If service C responds with a new version of the data, cache it and then return it to the client.
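The gateway scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's implementation: `Response`, `GatewayCache`, and `FakeOrdersService` are hypothetical names, and service C is simulated in memory instead of being called over the network.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    etag: Optional[str] = None
    body: Optional[str] = None


class GatewayCache:
    """Caches responses by URL and revalidates them with If-None-Match."""

    def __init__(self, fetch: Callable[[str, dict], Response]):
        self._fetch = fetch  # hypothetical transport to service C
        self._cache = {}     # URL -> (etag, body)

    def get(self, url: str) -> Optional[str]:
        headers = {}
        cached = self._cache.get(url)
        if cached is not None:
            # Pass the last known revision so that service C can answer 304
            headers["If-None-Match"] = cached[0]
        response = self._fetch(url, headers)
        if response.status == 304 and cached is not None:
            return cached[1]  # the cached snapshot is still current
        # A new version arrived: cache it, then return it to the client
        self._cache[url] = (response.etag, response.body)
        return response.body


class FakeOrdersService:
    """In-memory stand-in for service C (an assumption for this sketch)."""

    def __init__(self):
        self.revision = 1
        self.full_responses = 0

    def handle(self, url: str, headers: dict) -> Response:
        etag = f'"{self.revision}"'
        if headers.get("If-None-Match") == etag:
            return Response(status=304, etag=etag)
        self.full_responses += 1
        return Response(200, etag, f"orders at revision {self.revision}")
```

With this sketch, two consecutive `get` calls for the same URL make service C produce a full response only once; the second call is answered with `304 Not Modified` and served from the cache.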

 [![CTL](/img/graphs/http-api-organizing-03.en.png "Step 2. Adding server-side caches")]()

-By employing this approach [with using URLs as caching and idempotency keys], we automatically get another pleasant bonus. We can reuse the same data in the order creation endpoint design. In optimistic concurrency control paradigm, the client must pass an actual revision of the `orders` resource to change its state:
+By employing this approach [using `ETag`s to control caching], we automatically get another pleasant bonus. We can reuse the same data in the order creation endpoint design. In the optimistic concurrency control paradigm, the client must pass the actual revision of the `orders` resource to change its state:

 ```
 POST /v1/orders HTTP/1.1
@@ -121,7 +121,7 @@ POST /v1/orders?user_id=<user_id> HTTP/1.1
 If-Match: <revision>
 ```

-If the revision is actual and the operation is executed, service C might return the updated list of orders alongside the new revision:
+If the revision is valid and the operation is executed, service C might return the updated list of orders alongside the new revision:

 ```
 HTTP/1.1 201 Created
@@ -131,16 +131,16 @@ ETag: <new revision>
 { /* The updated list of orders */ }
 ```

-and gateway D will update the cache with the actual data snapshot.
+and gateway D will update the cache with the current data snapshot.
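A minimal sketch of how service C might handle the `If-Match` check is given below. The `OrdersResource` class and its methods are hypothetical. The `412 Precondition Failed` status for a stale revision comes from the HTTP specification, not from the text above.

```python
class OrdersResource:
    """In-memory model of a user's order list with a revision counter."""

    def __init__(self):
        self.revision = 1
        self.orders = []

    def etag(self) -> str:
        return f'"{self.revision}"'

    def create_order(self, order: dict, if_match: str):
        """Returns a (status, etag, body) tuple."""
        if if_match != self.etag():
            # The client relied on an outdated revision: reject the change
            return 412, self.etag(), None
        self.orders.append(order)
        self.revision += 1
        # Return the updated list alongside the new revision
        return 201, self.etag(), list(self.orders)
```

A client (or gateway D) that receives the `412` status would need to re-read the order list to learn the current revision and then decide whether to retry.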

 [![CTL](/img/graphs/http-api-organizing-04.en.png "Creating a new order")]()

 **Importantly**, after this API refactoring, we end up with a system in which we can *remove gateway D* and make the client itself perform its duty. Nothing prevents the client from:
-  * Storing `user_id` on its side (or retrieve it from the token, if the format allows it) as well as the last known `ETag` of the order list
-  * Instead of a single `GET /v1/state` request perform two HTTP calls (`GET /v1/profiles/{user_id}` and `GET /v1/orders?user_id=<user_id>`) which might be multiplexed thanks to HTTP/2
+  * Storing `user_id` on its side (or retrieving it from the token, if the format allows it) as well as the last known `ETag` of the order list
+  * Performing two HTTP calls (`GET /v1/profiles/{user_id}` and `GET /v1/orders?user_id=<user_id>`) instead of a single `GET /v1/state` request; these calls might be multiplexed thanks to HTTP/2
  * Caching the result on its side using standard libraries and/or plugins.

-From the perspective of implementing services B and C, the presence of a gateway affects nothing, with an exception of security checks. Vice versa, we might add a nested gateway to, let's say, split order storage into “cold” and “hot” ones, or make either service B or C work as a gateway themselves.
+From the perspective of implementing services B and C, the presence of a gateway affects nothing, with the exception of security checks. Vice versa, we might add a nested gateway to, let's say, split order storage into “cold” and “hot” ones, or make either service B or C work as a gateway themselves.

 If we refer to the beginning of the chapter, we will find that we designed a system fully compliant with the REST architectural principles:
  * Requests to services contain all the data needed to process the request
@@ -149,21 +149,9 @@ If we refer to the beginning of the chapter, we will find that we designed a sys

 Let us reiterate once more that we can achieve exactly the same qualities with RPC protocols by designing formats for describing caching policies, resource versions, reading and modifying operation metadata, etc. However, the author of this book would, firstly, express doubts regarding the quality of such a custom solution and, secondly, emphasize the considerable amount of code that would need to be written to implement all the functionality stated above.

-**NB**: passing variables as either query parameters or path fragments affects not only readability. If gateway D is implemented as a stateless proxy with a declarative configuration, than receiving a request like:
-  * `GET /v1/state?user_id=<user_id>`
-
-      and transforming into a pair of nested sub-requests:
-
-  * `GET /v1/profiles?user_id=<user_id>`
-  * `GET /v1/orders?user_id=<user_id>`
-
-      would be much more convenient than extracting identifiers from the path or some header and putting them into query parameters. The former operation [replacing one path with another] is easily described declaratively and is supported by most server software out of the box. And vice versa, retrieving data from various components and rebuilding requests is a complex functionality that most likely requires a gateway supporting scripting languages and/or plugins for such manipulations. conversely, automated creation of monitoring panels in serives like Prometheus+Grafana bundle is much easier to organize by path prefix than by a synthetic key computed from request parameters.
-
-      All this leads us to a conclusion than maintaining identical URL structure when only path changes while custom parameters passed in queries will lead to even more uniform interface, although less readable and semantical. In internal systems, preferring convenience of usage over readability is sometimes an obvious decision. In public APIs, we would rather discourage implementing this approach.
-
 #### Authorizing Stateless Requests

-Let's elaborate a bit over the solution without an authorizing service (or, to be more precise, with authorizing functionality being implemented as a library or a local SDK within services B, C, and D) with all the data embedded in the authorization token itself, In this scenario, every service performs the following actions:
+Let's elaborate a bit on the solution without an authorizing service (or, to be more precise, with the authorizing functionality implemented as a library or a local daemon inside services B, C, and D), in which all the data is embedded in the authorization token itself. In this scenario, every service performs the following actions:
  1. Receives a request like this:
      ```
      GET /v1/profiles/{user_id}
@@ -181,7 +169,7 @@ Let's elaborate a bit over the solution without an authorizing service (or, to b
      ```
  3. Checks that the permissions stated in the token payload match the operation parameters (in our case, compares `user_id` passed as a query parameter with `user_id` encrypted in the token itself) and decides on the validity of the operation.
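The steps above can be sketched as follows. This is a simplified illustration, not a JWT implementation: the token format (Base64-encoded JSON plus an HMAC-SHA256 signature), the shared `SECRET`, and the function names are all assumptions made for the example. A real service would more likely use an established JWT library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical key shared by the services


def issue_token(payload: dict) -> str:
    """Builds a signed token of the form <base64 payload>.<hex signature>."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def authorize(token: str, requested_user_id: str) -> bool:
    """Verifies the signature, then compares the two user identifiers."""
    body, _, signature = token.partition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # the token was forged or corrupted
    payload = json.loads(base64.urlsafe_b64decode(body))
    # The operation is valid only if the token was issued for the same user
    return payload.get("user_id") == requested_user_id
```

For example, `authorize(issue_token({"user_id": "123"}), "123")` succeeds, while the same token presented with `/v1/profiles/456` is rejected, and no call to service A is needed in either case.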

-The necessity to compare two `user_id`s might appear illogical and redundant. However, this opinion is invalid; it originates in the widespread (anti)pattern we started the chapter with, namely the stateful determining of operation parameters:
+The necessity to compare two `user_id`s might appear illogical and redundant. However, this opinion is invalid; it originates from the widespread (anti)pattern we started the chapter with, namely the stateful determining of operation parameters:

 ```
 GET /v1/profile
--- a/src/ru/clean-copy/05-[В
+++ b/src/ru/clean-copy/05-[В
@@ -107,7 +107,7 @@ ETag: <ревизия>

 [![CTL](/img/graphs/http-api-organizing-03.ru.png "Шаг 2. Добавление серверного кэширования")]()

-Использовав такое решение [с формированием URL как ключа кэширования и идемпотентности], мы автоматически получаем ещё один приятный бонус: эти же данные пригодятся нам, если пользователь попытается создать новый заказ. Если мы используем оптимистичное управление параллелизмом, то клиент должен передать в запросе актуальную ревизию ресурса `orders`:
+Использовав такое решение [функциональность управления кэшом через `ETag` ресурсов], мы автоматически получаем ещё один приятный бонус: эти же данные пригодятся нам, если пользователь попытается создать новый заказ. Если мы используем оптимистичное управление параллелизмом, то клиент должен передать в запросе актуальную ревизию ресурса `orders`:

 ```
 POST /v1/orders HTTP/1.1
@@ -149,18 +149,6 @@ ETag: <новая ревизия>

 Повторимся, что мы можем добиться того же самого, использовав RPC-протоколы или разработав свой формат описания статуса операции, параметров кэширования, версионирования ресурсов, приписывания и чтения метаданных и параметров операции. Но автор этой книги позволит себе, во-первых, высказать некоторые сомнения в качестве получившегося решения, и, во-вторых, отметить значительное количество кода, которое придётся написать для реализации всего вышеперечисленного.

-**NB**: отметим, что передача параметров в виде пути или query-параметра в URL влияет не только на читабельность. Если представить, что гейтвей D реализован в виде stateless прокси с декларативной конфигурацией, то получать от клиента запрос в виде:
-  * `GET /v1/state?user_id=<user_id>`
-
-      и преобразовывать в пару вложенных запросов
-
-  * `GET /v1/profiles?user_id=<user_id>`
-  * `GET /v1/orders?user_id=<user_id>`
-
-      гораздо удобнее, чем извлекать идентификатор из path и преобразовывать его в query-параметр. Первую операцию [замена одного path целиком на другой] достаточно просто описать декларативно, и в большинстве ПО для веб-серверов она поддерживается из коробки. Напротив, извлечение данных из разных компонентов и полная пересборка запроса — достаточно сложная функциональность, которая, скорее всего, потребует от гейтвея поддержки скриптового языка программирования и/или написания специального модуля для таких манипуляций. Аналогично, автоматическое построение мониторинговых панелей в популярных сервисах типа связки Prometheus+Grafana гораздо проще организовать по path, чем вычленять из данных запроса какой-то синтетический ключ группировки запросов.
-
-      Всё это приводит нас к соображению, что поддержание одинаковой структуры URL, в которой меняется только путь или домен, а параметры всегда находятся в query и именуются одинаково, приводит к ещё более унифицированному интерфейсу, хотя бы и в ущерб читабельности и семантичности URL. Во многих внутренних системах выбор в пользу удобства выглядит самоочевидным, хотя во внешних API мы бы такой подход не рекомендовали.
-
 #### Авторизация stateless-запросов

 Рассмотрим подробнее подход, в котором авторизационного сервиса A фактически нет (точнее, он имплементируется как библиотека или локальный демон в составе сервисов B, C и D), и все необходимые данные зашифрованы в самом токене авторизации. Тогда каждый сервис должен выполнять следующие действия:
--- a/src/ru/drafts/05-Раздел
+++ b/src/ru/drafts/05-Раздел
@@ -41,6 +41,30 @@
  * семантика HTTP-глагола приоритетнее ложного предупреждения о небезопасности/неидемпотентности (в частности, если операция является безопасной, но ресурсозатратной, с нашей точки зрения вполне разумно использовать метод `POST` для индикации этого факта);
  * для выполнения кросс-доменных операций предпочтительнее завести специальный ресурс, выполняющий операцию (т.е. в примере с кофе-машинами и рецептами автор этой книги выбрал бы вариант `/prepare?coffee_machine_id=<id>&recipe=lungo`).

+**NB**: отметим, что передача параметров в виде пути или query-параметра в URL влияет не только на читабельность. Если представить, что гейтвей D реализован в виде stateless прокси с декларативной конфигурацией, то получать от клиента запрос в виде:
+  * `GET /v1/state?user_id=<user_id>`
+
+      и преобразовывать в пару вложенных запросов
+
+  * `GET /v1/profiles?user_id=<user_id>`
+  * `GET /v1/orders?user_id=<user_id>`
+
+      гораздо удобнее, чем извлекать идентификатор из path и преобразовывать его в query-параметр. Первую операцию [замена одного path целиком на другой] достаточно просто описать декларативно, и в большинстве ПО для веб-серверов она поддерживается из коробки. Напротив, извлечение данных из разных компонентов и полная пересборка запроса — достаточно сложная функциональность, которая, скорее всего, потребует от гейтвея поддержки скриптового языка программирования и/или написания специального модуля для таких манипуляций. Аналогично, автоматическое построение мониторинговых панелей в популярных сервисах типа связки Prometheus+Grafana гораздо проще организовать по path, чем вычленять из данных запроса какой-то синтетический ключ группировки запросов.
+
+      Всё это приводит нас к соображению, что поддержание одинаковой структуры URL, в которой меняется только путь или домен, а параметры всегда находятся в query и именуются одинаково, приводит к ещё более унифицированному интерфейсу, хотя бы и в ущерб читабельности и семантичности URL. Во многих внутренних системах выбор в пользу удобства выглядит самоочевидным, хотя во внешних API мы бы такой подход не рекомендовали.
+
+**NB**: passing variables as either query parameters or path fragments affects not only readability. If gateway D is implemented as a stateless proxy with a declarative configuration, then receiving a request like:
+  * `GET /v1/state?user_id=<user_id>`
+
+      and transforming it into a pair of nested sub-requests:
+
+  * `GET /v1/profiles?user_id=<user_id>`
+  * `GET /v1/orders?user_id=<user_id>`
+
+      would be much more convenient than extracting identifiers from the path or some header and putting them into query parameters. The former operation [replacing one path with another] is easily described declaratively and is supported by most server software out of the box. On the other hand, retrieving data from various components and rebuilding requests is a complex functionality that most likely requires a gateway supporting scripting languages and/or plugins for such manipulations. Similarly, automated creation of monitoring panels in services like the Prometheus+Grafana bundle is much easier to organize by path prefix than by a synthetic key computed from request parameters.
+
+      All this leads us to the conclusion that maintaining an identical URL structure, in which only the path changes while custom parameters are always passed in the query, leads to an even more uniform interface, although a less readable and semantic one. In internal systems, preferring convenience of usage over readability is sometimes an obvious decision. In public APIs, we would rather discourage implementing this approach.
+
 #### CRUD-операции

 Одно из самых популярных приложений HTTP API — это реализация CRUD-интерфейсов. Акроним CRUD (**C**reate, **R**ead, **U**pdate, **D**elete) был популяризирован ещё в 1983 году Джеймсом Мартином, но с развитием HTTP API обрёл второе дыхание. Ключевая идея соответствия CRUD и HTTP заключается в том, что каждой из CRUD-операций соответствует один из глаголов HTTP: