The HTTP Status Problem translated into Russian

2025-01-05 10:20:22 +02:00 · 2020-12-17 16:23:24 +03:00 · 2020-12-17 16:23:24 +03:00 · f033ec8d52
commit f033ec8d52
parent 77767898d5
3 changed files with 82 additions and 10 deletions
--- a/src/en/drafts/HTTP
+++ b/src/en/drafts/HTTP
@@ -2,17 +2,17 @@

 The situation with HTTP status codes has demonstrated, like nothing before it, a disastrous collision of a well-meaning specification design with ruthless reality. This collision actually comes from three sides.

-As we discussed in the [Chapter 10](https://twirl.github.io/The-API-Book/docs/API.en.html#chapter-10), one goal of making errors semantic is to help clients understand, what caused an error. HTTP errors, outlined in the corresponding RFCs (most recently in the [RFC 7231](https://tools.ietf.org/html/rfc7231#section-6)), are specifically designed bearing this purpose in mind. Furthermore, the REST architectural constraints, as defined by Fielding, imply that not only end user agents should understand error code, but also every network proxy between a client and a server (the ‘layered’ architecture principle). And, in accordance to Fielding's writings, HTTP status code nomenclature does extensively describe virtually every situation which could happen with your HTTP request: wrong `Accept-*` headers value, `Content-Length` is absent, HTTP method is unsupported, URI too long, etc.
+As we discussed in [Chapter 10](https://twirl.github.io/The-API-Book/docs/API.en.html#chapter-10), one goal of making errors semantic is to help clients understand what caused an error. HTTP errors, outlined in the corresponding RFCs (most recently in [RFC 7231](https://tools.ietf.org/html/rfc7231#section-6)), are specifically designed with this purpose in mind. Furthermore, the REST architectural constraints, as [defined by Fielding](https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm), imply that error codes should be understood not only by end user agents but also by every network proxy between a client and a server (the ‘layered’ architecture principle). And, in accordance with Fielding's writings, the HTTP status code nomenclature does extensively describe virtually every situation which could happen to your HTTP request: invalid `Accept-*` header values, a missing `Content-Length`, an unsupported HTTP method, a URI that is too long, etc.

 What the RFC fails to describe is what to do with the error. As we discussed, errors can be either resolvable or not. If the error is unresolvable, all this status-code-and-header stuff is simply irrelevant to clients, and even more so to interim proxies. In fact, three error codes are enough:
  * `400` to denote persistent situations (the error can't be resolved by simply repeating the request);
  * `404` to denote ‘uncertainty’ cases (the request could be repeated, possibly with a different outcome);
  * `500` to denote server-side problems, with `Retry-After` header to indicate the desirable retry period.

-**Aside note:** mark a design flaw here. All `4xx` status codes are by default not cacheable, except for `404`, `405`, `410` and `414`, which are cacheable. We presume that editors of the spec did this with the best intentions, but the number of people who knows this nuance is probably quite close to the number of the spec editors. As a result, there are lots of cases (the author of this book had to deal with one) when `404` was returned erroneously and cached on clients, thus prolonging the outage for an indefinite time.
+**Aside note:** note a design flaw here. All `4xx` status codes are by default not cacheable, except for `404`, `405`, `410` and `414`, which are cacheable (i.e., caches may store them heuristically even without explicit caching headers). We presume that the editors of the spec did this with the best intentions, but the number of people who know this nuance is probably quite close to the number of spec editors. As a result, there are lots of cases (the author of this book had to deal with one) when `404`s were returned erroneously and cached on clients, thus prolonging the outage for an indefinite time.

 As for *resolvable* errors, having a status code nomenclature helps partially. Some codes are concrete, like `411 Length Required`; others are not. There are several situations where just having a code is not enough:
-  * `400 Bad Request` code when some parameters are invalid or missing. This error make absolutely no sense to clients unless specific missing or invalid field is specified — but that's exactly the thing the standard does nothing with! There are no conventional standards to specify which parameter is wrong exactly. Yes, we can, of course, invite a standard of ourselves, but that would contradict the REST idea of protocol transparency.  
+  * `400 Bad Request` when some parameters are invalid or missing. This error makes absolutely no sense to clients unless the specific missing or invalid field is specified, but that's exactly the thing the standard does nothing about! There is no conventional way to specify which parameter exactly is wrong. Yes, we can, of course, invent a standard of our own, but that would contradict the REST idea of protocol transparency.  
    **NB**: some purists insist that a `400` code indicates a problem with the request itself, i.e. a malformed URI, header, body, etc. Sometimes `422 Unprocessable Entity` or `412 Precondition Failed` are claimed to be the ‘right’ code for invalid parameter errors. It doesn't change anything, of course.
  * `403 Forbidden` when some authorization or authentication error occurs. There are several quite different `Forbidden` situations, which require quite different actions from the client:
      * an authorization token is missing — the user must be invited to log in;
@@ -27,28 +27,29 @@ As for *resolvable* errors, having status codes nomenclature partially helps. So

 So we quite naturally arrive at the idea of conveying error details in headers and/or response bodies instead of trying to invent a specific error code for each situation. Obviously, we can't design a new error code for every possibly missing parameter in the case of a `400` error, for example.

-**Aside note**: the spec authors understood this too, adding the following sentence: ‘The response message will usually contain a representation that explains the status’. We couldn't agree more, but this sentence not only renders the entire spec section redundant (why use status codes at all?), but also contradicts to the REST paradigm: other agents in the layered system couldn't understand what the response message explains, thus making the error appear opaque to them.
+**Aside note**: the spec authors understood this too, adding the following sentence: ‘The response message will usually contain a representation that explains the status’. We couldn't agree more, but this sentence not only renders the entire spec section redundant (why use status codes in the first place?), but also contradicts the REST paradigm: other agents in the layered system can't understand what the response message explains, which makes the error opaque to them.

-The conclusion seems to be: use status codes just to indicate a general error class, expressed in the HTTP protocol terms, and fill the response body with details. But here the third collision occurs: the implementation practice. From the very beginning of the Web frameworks and server software started relying on status codes for logging and monitoring purposes. I think I wouldn't exaggerate gravely if I said that there were literally no platform which natively supported building charts and graphs using custom semantic data in the responses, not status codes. One severe implication is that developers started inventing new codes to have their monitoring work correctly, cutting off insignificant errors and escalating vital ones.
+The conclusion seems to be: use status codes just to indicate a general error class, expressed in HTTP protocol terms, and fill the response body with details. But here the third collision occurs: the implementation practice. From the very beginning of the Web, frameworks and server software started relying on status codes for logging and monitoring purposes. I don't think I'd be gravely exaggerating if I said that there is literally no platform which natively supports building charts and graphs from custom semantic data in responses rather than from status codes. One severe implication is that developers started inventing new codes to make their monitoring work correctly, cutting off insignificant errors and escalating vital ones.

 Not only did the number of status codes soar, their semantic meaning also blurred. Many developers simply never read specs. The most evident example is the `401 Unauthorized` code: the spec prescribes that servers return a `WWW-Authenticate` header, which they never do, for obvious reasons, since the only usable value for this header is `Basic`. Furthermore, the spec is extensible at this point: new authentication schemes could be introduced and standardized, but nobody cares. Right now, using `401` to indicate the absence of authorization headers is common practice, omitting the `WWW-Authenticate` header, of course.

-In a modern world we have to deal with a literal mess: HTTP status codes are used not for the protocol's purity sake, but to build the graphs; their semantic meaning forgotten; and clients often don't event try to get some useful information from the status codes, reducing them to the first digit. It's also a common practice to return resolvable errors as `200`.
+In the modern world, we have to deal with a literal mess: HTTP status codes are used not for the sake of protocol purity but to build graphs; their semantic meaning is forgotten; and clients often don't even try to get any useful information from the status codes, reducing them to the first digit. It's also common practice to return resolvable errors as `200`s.

 #### So, what are you proposing, pal?

 Actually, there are three different approaches to this situation.

  * Abandon the REST paradigm and stick to pure RPC. Use HTTP status codes only to indicate problems with the HTTP network layer itself. So you would actually need just two of them:
-    * `200 OK` if the server got the request, regardless of the result — execution errors are to be returned as `200`
+    * `200 OK` if the server got the request, regardless of the result — execution errors are to be returned as `200`s;
    * `500 Internal Server Error` if the request can't reach the server.  
-    You may employ `400 Bad Request` also to denote client errors; it slightly complicates the setup, but allows for using some interim software like API gateways.
+    You may also employ `400 Bad Request` to denote client errors; it slightly complicates the setup but allows for using interim software like API gateways.

-  * ‘Run with scissors’, using common practices, just cautiously, avoiding violating HTTP semantics. Use HTTP status codes to separate graphs (sometimes using exotic codes). Describe errors semantically and make sure clients don't try to detect anything valuable from the status codes.
+  * ‘Run with scissors’: follow common practices, just cautiously, without violating HTTP semantics. Use HTTP status codes to separate graphs (sometimes resorting to exotic codes). Describe errors semantically and make sure clients don't try to infer anything valuable from the status codes.  
+    **NB**: some industrial-grade platforms manage to do both, i.e. combine a pure RPC-style approach with extensive use of various HTTP status codes to indicate a subset of problems (`403`s and `429`s, for instance, which are purely business logic-bound and have nothing to do with HTTP itself). Though this approach seems to work in a practical sense, it's very hard to tell which problems they face in modern environments rich in smart proxies, not to mention the aesthetic impression.

  * Try to organize the mess, including, but not limited to:
    * using HTTP codes to indicate problems expressible in HTTP terms (like using `406 Not Acceptable` to indicate an invalid `Accept-Language` request header value);
-    * defining additional machine-readable error response details, preferably in a form of HTTP headers (since reading them doesn't require parsing the entire response body, so interim proxies and API gateways might operate them less expensively); for example, use something like an `X-My-API-Error-Reason` header containing with pre-defined semantic errors nomenclature;
+    * defining additional machine-readable error response details, preferably in the form of HTTP headers (since reading them doesn't require parsing the entire response body, interim proxies and API gateways can process them more cheaply, and they are easy to log); for example, use something like an `X-My-API-Error-Reason` header limited to a predefined nomenclature of semantic errors;
    * customizing graphs and monitoring to use this data in addition to, or instead of, the status codes;
    * making sure that clients treat status codes and error details correctly, especially when dealing with unknown errors.
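
Below is a minimal client-side sketch of the ‘organize the mess’ option, written in Python. It assumes a hypothetical `X-My-API-Error-Reason` header whose values come from a closed list defined by the API. The reason names and the actions they map to are invented for illustration. The client acts on a reason only if it recognizes it. For an unknown or absent reason, it falls back to the coarse class given by the status code and honours the delta-seconds form of `Retry-After` for `5xx` errors.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical closed nomenclature for the X-My-API-Error-Reason header;
# the reason names and the actions they map to are illustrative only.
KNOWN_REASONS = {
    "token_missing": "log_in",
    "token_expired": "refresh_token",
    "token_revoked": "log_in",
    "invalid_parameter": "fix_request",
}


@dataclass
class Decision:
    action: str                        # what the client should do next
    retry_after: Optional[int] = None  # seconds to wait before retrying, if known


def _retry_after_seconds(headers: Mapping[str, str]) -> Optional[int]:
    # Only the delta-seconds form of Retry-After is handled in this sketch;
    # the HTTP-date form is ignored and yields None.
    value = headers.get("retry-after")
    if value is not None and value.strip().isdecimal():
        return int(value.strip())
    return None


def classify(status: int, headers: Mapping[str, str]) -> Decision:
    # HTTP header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in headers.items()}
    if 200 <= status < 300:
        return Decision("ok")
    reason = headers.get("x-my-api-error-reason")
    if reason in KNOWN_REASONS:
        return Decision(KNOWN_REASONS[reason])
    # Unknown or absent reason: fall back to the status code class.
    if 500 <= status < 600:
        return Decision("retry_later", _retry_after_seconds(headers))
    if 400 <= status < 500:
        return Decision("give_up")
    return Decision("unexpected")
```

For example, `classify(403, {"X-My-API-Error-Reason": "token_expired"})` yields the `refresh_token` action. A `400` carrying a reason the client doesn't know yet degrades to `give_up` instead of being misinterpreted. Keying the fallback on the status class lets the server add new reasons without breaking older clients.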

--- a/совместимость/01.md
+++ b/совместимость/01.md
@@ -0,0 +1,11 @@
+### Stating the Backward Compatibility Problem
+
+As usual, let's give a conceptual definition of ‘backward compatibility’ before we begin.
+
+Backward compatibility is, first of all, the property of an API system to be stable over time. It means the following: **the code that developers have written using your API continues to work functionally correctly for a long period of time**. There are two big questions about this definition, and two clarifications to them.
+
+  1. What does ‘functionally correctly’ mean? It means that the code continues to perform its function, i.e. to solve some user's problem. It doesn't mean that it continues to work identically: for example, if you provide a UI library, changing functionally insignificant design details, like shadow depth or border line style, won't break backward compatibility. Changing component sizes, however, will most likely make some users' layouts fall apart.
+
+  2. What does ‘a long period of time’ mean? From our point of view, the period of backward compatibility maintenance should be tied to the lifecycle of applications in the corresponding subject area. Platform LTS periods are a good guideline in most cases. Since the application will have to be rewritten anyway once the platform support ends, it's fine to offer a migration to a new API version at that point, too. In the main subject areas (web applications), this period spans several years.
+
+Why backward compatibility must be maintained (including at the API design stage) is clear from the definition. An application that stops working (fully or partially) through the API vendor's fault is an extremely unpleasant event, if not a disaster, for any developer, especially if they pay money for the API.
--- a/src/ru/drafts/HTTP
+++ b/src/ru/drafts/HTTP
@@ -0,0 +1,60 @@
+### HTTP Status Codes
+
+The situation with HTTP response codes deserves a place in the Bureau of Weights and Measures as the reference specimen of what happens when the good intentions of specification designers collide with ruthless reality. With two ruthless realities, in fact.
+
+As we discussed in [Chapter 10](https://twirl.github.io/The-API-Book/docs/API.en.html#chapter-10), one purpose of semantic errors is to help the client understand what caused the error. When the HTTP specification was being developed (in particular, [RFC 7231](https://tools.ietf.org/html/rfc7231#section-6)), this goal was evidently one of the main ones. Moreover, the REST architectural constraints, as Fielding described them [in his dissertation](https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm), imply that the semantics of an error must be understood not only by clients but also by all the network agents (proxies) between the client and the server in a ‘layered’ architecture. Accordingly, the HTTP status code nomenclature does describe in great detail almost any problem that could happen to an HTTP request: invalid `Accept-*` header values, a missing `Content-Length`, an unsupported HTTP method, a URI that is too long, and so on.
+
+But what the RFC doesn't help with at all is the question of what a client or a proxy should actually do with the error. As we discussed, errors can be resolvable or unresolvable. If an error is unresolvable, clients by and large couldn't care less about all this status code and header fuss, let alone interim proxies. Actually, three codes would be quite enough for that:
+  * `400` for persistent errors (simply repeating the request won't make the error go away);
+  * `404` for the state of uncertainty (repeating the request might yield a different result);
+  * `500` for server-side problems, plus the `Retry-After` header to let the client know when to come back.
+
+**Aside note**: by the way, note a design problem in the specification. By default, all `4xx` codes are not cacheable, with the exception of `404`, `405`, `410`, and `414`. We don't doubt this was done with good intentions, but we suspect that the number of people aware of this subtlety roughly equals the number of spec editors. As a result, there are plenty of situations (the author personally cleaned up after one of them) when `404`s were returned by mistake but got cached by clients, thus prolonging the outage for an indefinite time.
+
+As for *resolvable* problems, well, status codes do help to some extent. Some of them are quite specific, for example `411 Length Required`. Others are not. One could list many situations where heterogeneous errors hide under a single code:
+  * `400 Bad Request` for situations when some parameters are missing or have invalid values. This error is absolutely useless to clients unless the response specifies which exact field has an invalid value, and that's exactly what the standard doesn't standardize! Yes, of course, one could invent a standard oneself, but that at the very least contradicts the idea of transparency in REST.
+
+    **NB**: some purists believe that `400` means problems with the request itself, i.e. a malformed URI, header, invalid JSON, etc., and for logical errors in parameters they suggest using `422 Unprocessable Entity` or `412 Precondition Failed`. As you may guess, this affects next to nothing.
+
+  * `403 Forbidden` for authentication and/or authorization errors. And here we have a multitude of completely different `Forbidden`s, which require completely different actions from the client:
+      * the authorization token is missing: the client must be invited to log in;
+      * the token has expired: the token refresh procedure must be performed;
+      * the token belongs to another user: this usually indicates a stale cache;
+      * the token has been revoked: the user has logged out on all devices;
+      * an attacker is brute-forcing the authorization endpoint: some anti-fraud measures must be taken.
+
+    Each `403` is associated with its own resolution scenario; some of them (brute-forcing, for instance) have nothing in common with the others at all.
+
+  * `409 Conflict`;
+  * and thousands more.
+
+Thus, we quite naturally arrive at the idea of providing a detailed error description in headers and/or the response body instead of trying to invent a new code for every situation: it's absolutely obvious that you can't design a separate error for every potentially invalid parameter instead of a single `400`, for example.
+
+**Aside note**: the spec authors understood this too, and added the following sentence: ‘The response message will usually contain a representation that explains the status’. We fully agree with them, of course, but can't help noting that this sentence not only makes a chunk of the specification useless (why do we need the codes at all, then?), but also contradicts the REST paradigm: other agents in the layered system can't understand what exactly the error representation ‘explains’, and the error itself becomes opaque to them.
+
+It would seem we have come to a logical conclusion: use status codes to indicate the ‘class’ of an error in terms of the HTTP protocol, and put the details in the response. But this is where theory once again crashes into practice at full speed. Ever since the Web appeared, all frameworks and server software have relied on status codes for logging and monitoring. I don't think I'd be lying much if I said that there is literally no platform that can build graphs out of the box from the semantic data in an error response rather than from status codes. And this automatically makes the problem worse: to cut off insignificant errors in their monitoring and escalate significant ones, developers simply started inventing new status codes, or misusing existing ones.
+
+This, in turn, led not only to the bloating of the code nomenclature but also to the blurring of the codes' meanings. Many developers simply don't read specifications ¯\\\_(ツ)\_/¯. The most obvious example is the `401 Unauthorized` error: according to the spec, it **must** be accompanied by the `WWW-Authenticate` header, which, of course, nobody actually does, and for obvious reasons, since the only sensible value of this header is `Basic` (yes, that very login-and-password authorization from the Web 1.0 era, when the browser shows a dialog box). Moreover, the specification is extensible at this point, and nothing prevents standardizing new authentication schemes, but, as usual, nobody cares. Right now, using `401` when authorization headers are absent is the de facto industry standard, and no `WWW-Authenticate` is sent with it, of course.
+
+In the modern world, we literally live in this mess: HTTP status codes are used not to keep the protocol pure but to build graphs; their true meaning is forgotten; clients usually don't even try to draw any conclusions from the response code, reducing it to its first digit. (Honestly, it's hard to tell which is worse: ignoring the code or, conversely, building logic on top of misused codes.) And, of course, we can't help mentioning the widespread practice of returning errors inside `200`s.
+
+#### So, what do you suggest?
+
+Actually, there are three approaches to resolving this situation:
+  * abandon REST and switch to pure RPC. Use HTTP status codes only to indicate problems at the corresponding layer of the network stack. Two codes are enough:
+    * `200 OK` if the server received the request, regardless of the result; request execution errors are returned as `200`s anyway.
+    * `500 Internal Server Error` if the request didn't reach the server.
+
+    You can also use `400 Bad Request` for client errors. This slightly complicates the setup but allows using software and services for building an API gateway;
+
+  * ‘it'll do as is’: since this is the situation we're in, live with it, just carefully, without blatantly violating the standard. Build graphs from the codes; if you need to split errors by type, use some exotic code. Clients ignore the response code and look at the data in the response body.
+
+    **NB**: some recognized industry leaders manage to do both at once: use the RPC approach and, at the same time, a bunch of status codes for some specific problems (for example, `403` and `429`, which are clearly related to the business logic of the clients' work, not to HTTP itself). In a purely practical sense this approach works, although it's hard to predict which problems might lurk in modern infrastructure, where any ‘smart’ proxy tends to read the request. And the aesthetic feelings it evokes are just what you'd expect;
+
+  * clean up the mess, including, but not limited to:
+    * use HTTP codes for problems that can be described in HTTP terms (i.e. use `406 Not Acceptable` for an invalid `Accept-Language` header value, for example, and not for anything else);
+    * standardize additional machine-readable data in the response, preferably in the form of HTTP headers (because reading headers doesn't require reading and parsing the entire response body, so interim proxies and gateways can understand the error semantics at no extra cost; besides, headers can be logged); for example, use something like `X-My-API-Error-Reason` and strictly regulate its possible values;
+    * configure graphs and monitoring to work with the additional data from the previous item in addition to the status codes (or instead of them);
+    * make sure that clients interpret both the status codes and the additional data correctly, especially in the case of unknown errors.
+
+The choice is yours, but just in case, note that approach \#3 is quite expensive to implement.
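
For a rough idea of what the server side of approach #3 might involve, here is a minimal Python sketch. It assumes the same hypothetical `X-My-API-Error-Reason` header and an invented reason list. Each reason maps to exactly one status code, so the status alone still means something to proxies and monitoring. Every error response also carries `Cache-Control: no-store`, which is one way to keep a mistakenly returned `404` from being cached (the pitfall described in the aside note on cacheable `4xx` codes).

```python
import json
from http import HTTPStatus
from typing import Dict, Optional, Tuple

# Hypothetical reason nomenclature: each reason maps to exactly one coarse
# HTTP status, so the status code alone still carries meaning for proxies.
REASON_STATUS = {
    "invalid_parameter": HTTPStatus.BAD_REQUEST,
    "token_missing": HTTPStatus.FORBIDDEN,
    "token_expired": HTTPStatus.FORBIDDEN,
    "token_revoked": HTTPStatus.FORBIDDEN,
    "not_found": HTTPStatus.NOT_FOUND,
    "backend_unavailable": HTTPStatus.SERVICE_UNAVAILABLE,
}


def error_response(
    reason: str,
    details: Optional[dict] = None,
    retry_after: Optional[int] = None,
) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, body) for an error; unregistered reasons are rejected."""
    if reason not in REASON_STATUS:
        raise ValueError(f"unregistered error reason: {reason!r}")
    status = REASON_STATUS[reason]
    headers = {
        # Machine-readable reason: cheap for gateways to read and to log.
        "X-My-API-Error-Reason": reason,
        # Forbid storing the response, so that an erroneous 404 (cacheable
        # by default in HTTP) does not outlive the outage in client caches.
        "Cache-Control": "no-store",
        "Content-Type": "application/json",
    }
    # In this sketch Retry-After is only attached to server-side (5xx) errors.
    if retry_after is not None and status >= 500:
        headers["Retry-After"] = str(retry_after)
    body = json.dumps({"reason": reason, "details": details or {}}).encode("utf-8")
    return int(status), headers, body
```

Rejecting unregistered reasons on the server is a design choice that keeps the nomenclature closed. Monitoring can then safely group by the header value alongside the status code.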