mirror of
https://github.com/twirl/The-API-Book.git
synced 2025-02-16 18:34:31 +02:00
proofreading
This commit is contained in:
parent
922f927ef5
commit
2cb80dae20
BIN
docs/API.en.epub
Binary file not shown.
142
docs/API.en.html
@ -6586,170 +6586,170 @@ api.<span class="hljs-title function_">subscribe</span>(
|
||||
</ol>
|
||||
<p><strong>NB</strong>: While developing a “vertical” lineup of APIs, it is crucial to follow the principles discussed in the “<a href="#back-compat-iceberg-waterline">On the Waterline of the Iceberg</a>” chapter. You will only be able to manipulate the content and behavior of a widget if developers cannot “escape the sandbox,” meaning they do not have direct access to low-level objects encapsulated within the widget.</p>
|
||||
<p>In general, your aim should be to have each partner use the API services in a manner that maximizes your profit as an API vendor. When a partner only needs a typical solution, you would benefit from making them use widgets as they are under your direct control. This will help ease the API version fragmentation problem and allow for experimentation to reach your KPIs. When the partner possesses expertise in the subject area and develops a unique service on top of your API, you would benefit from allowing full freedom in customizing the integration. This way, they can cover specific market niches and enjoy the advantage of offering more flexibility compared to services using competing APIs.</p><h4>References</h4><ul class="references"><li><p><a href="#ref-chapter-55-no-84-back" class="back-anchor" id="ref-chapter-55-no-84"><sup>1</sup> </a><span>Mobile Deep Linking<br><a target="_blank" class="external" href="https://en.wikipedia.org/wiki/Mobile_deep_linking">https://en.wikipedia.org/wiki/Mobile_deep_linking</a></span></p></li></ul><div class="page-break"></div><h3><a href="#api-product-kpi" class="anchor" id="api-product-kpi">Chapter 56. API Key Performance Indicators</a><a href="#chapter-56" class="secondary-anchor" id="chapter-56"> </a></h3>
|
||||
<p>As we described in the previous chapters, there are many API monetization models, both direct and indirect. Importantly, most of them are fully or conditionally free for partners, and the direct-to-indirect benefits ratio tends to change during the API lifecycle. That naturally leads us to the question of how exactly shall we measure the API success and what goals are to be set for the product team.</p>
|
||||
<p>As we described in the previous chapters, there are many API monetization models, both direct and indirect. Importantly, most of them are fully or conditionally free for partners, and the direct-to-indirect benefits ratio tends to change during the API lifecycle. That naturally leads us to the question of how exactly shall we measure the API's success and what goals are to be set for the product team.</p>
|
||||
<p>Of course, the most explicit metric is money: if your API is monetized directly or attracts visitors to a monetized service, the rest of the chapter will be of little interest to you, maybe just as a case study. If, however, the contribution of the API to the company's income cannot be simply measured, you have to stick to other, synthetic, indicators.</p>
|
||||
<p>The obvious key performance indicator (KPI) #1 is the number of end users and the number of integrations (i.e., partners using the API). Normally, they are in some sense a business health barometer: if there is a normal competitive situation among the API suppliers, and all of them are more or less in the same position, then the figure of how many developers (and consequently, how many end users) are using the API is the main metric of success of the API product.</p>
|
||||
<p>The obvious key performance indicator (KPI) #1 is the number of end users and the number of integrations (i.e., partners using the API). Normally, they are in some sense a business health barometer: if there is a normal competitive situation among the API suppliers, and all of them are more or less in the same position, then the figure of how many developers (and consequently, how many end users) are using the API is the main metric of success for the API product.</p>
|
||||
<p>However, sheer numbers might be deceiving, especially if we talk about free-to-use integrations. There are several factors that make them less reliable:</p>
|
||||
<ul>
|
||||
<li>The high-level API services that are meant for point-and-click integration (see the previous chapter) are significantly distorting the statistics, especially if the competitors don't provide such services; typically, for one full-scale integration there will be tens, maybe hundreds, of those lightweight embedded widgets.
|
||||
<ul>
|
||||
<li>Thereby, it's crucial to have partners counted for each kind of the integration independently.</li>
|
||||
<li>Thereby, it's crucial to have partners counted for each kind of integration independently.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Partners tend to use the API in suboptimal ways:
|
||||
<ul>
|
||||
<li>Embed it at every website page / application screen instead of only those where end users can really interact with the API</li>
|
||||
<li>Embed it on every website page / application screen instead of only those where end users can really interact with the API</li>
|
||||
<li>Put widgets somewhere deep in the page / screen footer, or hide it behind spoilers</li>
|
||||
<li>Initialize a broad range of API modules, but use only a limited subset of them.</li>
|
||||
<li>Initialize a broad range of API modules but use only a limited subset of them.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>The greater the API audience is, the less the number of unique visitors means as at some moment the penetration will be close to 100%; for example, a regular Internet user interacts with Google or Facebook counters, well, every minute, so the daily audience of those API fundamentally cannot be increased further.</li>
|
||||
<li>The greater the API audience is, the less the number of unique visitors means as at some moment the penetration will be close to 100%; for example, a regular Internet user interacts with Google or Facebook counters, well, every minute, so the daily audience of those APIs fundamentally cannot be increased further.</li>
|
||||
</ul>
|
||||
<p>All the abovementioned problems naturally lead us to a very simple conclusion: not only the raw numbers of users and partners are to be gauged, but their engagement as well, i.e., the target actions (such as searching, observing some data, interacting with widgets) shall be determined and counted. Ideally, these target actions must correlate with the API monetization model:</p>
|
||||
<p>All the abovementioned problems naturally lead us to a very simple conclusion: not only should the raw numbers of users and partners be gauged, but their engagement as well, i.e., the target actions (such as searching, observing specific data, interacting with widgets) should be determined and counted. Ideally, these target actions must correlate with the API monetization model:</p>
|
||||
<ul>
|
||||
<li>If the API is monetized through displaying ads, then the user's activity towards those ads (e.g., clicks, interactions) is to be measured.</li>
|
||||
<li>If the API attracts customers to the core service, then count the transitions.</li>
|
||||
<li>If the API is needed for collecting feedback and gathering UGC, then calculate the number of reviews left and entities edited.</li>
|
||||
</ul>
|
||||
<p>Additionally, the functional KPIs are often employed: how frequently some API features are used. (Also, it helps with prioritizing further API improvements.) In fact, that's still measuring target actions, but those that are made by developers, not end users. It's rather complicated to gather the usage data for software libraries and frameworks, though still doable (however, you must be extremely cautious with that, as any audience rather nervously reacts to finding that some statistic is gathered automatically).</p>
|
||||
<p>The most complicated case is that of API being a tool for (tech)PR and (tech)marketing. In this case, there is a cumulative effect: increasing the API audience doesn't momentarily bring any profit to the company. <em>First</em>, you got a loyal developer community, <em>then</em> this reputation helps you to hire people. <em>First</em>, your company's logo flashes on third-party webpages and applications, <em>then</em> the top-of-mind brand knowledge increases. There is no direct method of evaluating how some action (let's say, a new release or an event for developers) affects the target metrics. In this case, you have to operate indirect metrics, such as the audience of the documentation site, the number of mentions in the relevant communication channels, the popularity of your blogs and seminars, etc.</p>
|
||||
<p>Let us summarize the paragraph:</p>
|
||||
<p>Additionally, functional KPIs are often employed: how frequently some API features are used. (Also, it helps with prioritizing further API improvements.) In fact, that's still measuring target actions, but those made by developers, not end users. It's rather complicated to gather usage data for software libraries and frameworks, though still doable (however, you must be extremely cautious with that, as any audience rather nervously reacts to finding that some statistics are gathered automatically).</p>
|
||||
<p>The most complicated case is when the API is a tool for (tech)PR and (tech)marketing. In this case, there is a cumulative effect: increasing the API audience doesn't immediately bring any profit to the company. <em>First</em>, you build a loyal developer community, <em>then</em> this reputation helps you hire people. <em>First</em>, your company's logo flashes on third-party webpages and applications, <em>then</em> top-of-mind brand awareness increases. There is no direct method of evaluating how some action (let's say, a new release or an event for developers) affects the target metrics. In this case, you have to operate with indirect metrics, such as the audience of the documentation site, the number of mentions in relevant communication channels, the popularity of your blogs and seminars, etc.</p>
|
||||
<p>To summarize the paragraph:</p>
|
||||
<ul>
|
||||
<li>Counting direct metrics such as the total number of users and partners is a must and is totally necessary for moving further, but that's not a proper KPI.</li>
|
||||
<li>The proper KPI should be formulated based on the number of target actions that are made through the platform.</li>
|
||||
<li>The definition of target action depends on the monetization model and might be quite straightforward (like the number of paying partners, or the number of paid ad clicks) or, to the contrary, pretty implicit (like the growth of the company's developer blog audience).</li>
|
||||
<li>Counting direct metrics such as the total number of users and partners is a must and is absolutely necessary for moving forward, but that's not a proper KPI.</li>
|
||||
<li>The proper KPI should be formulated based on the number of target actions made through the platform.</li>
|
||||
<li>The definition of target action depends on the monetization model and might be quite straightforward (like the number of paying partners or the number of paid ad clicks) or, conversely, pretty implicit (like the growth of the company's developer blog audience).</li>
|
||||
</ul>
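<p>The summary above can be illustrated with a minimal sketch that counts partners and target actions separately for each kind of integration. The event log format, the field names (<code>partnerId</code>, <code>kind</code>, <code>action</code>), and the <code>summarize</code> function are assumptions made for this example and are not part of any real API.</p>

```javascript
// A hypothetical event log: each record is one target action performed
// through the platform. All field names are assumptions for this sketch.
const events = [
  { partnerId: 'p1', kind: 'widget', action: 'search' },
  { partnerId: 'p2', kind: 'widget', action: 'search' },
  { partnerId: 'p1', kind: 'widget', action: 'review' },
  { partnerId: 'p3', kind: 'full', action: 'search' }
];

// Counts unique partners and target actions independently
// for each kind of integration.
function summarize(log) {
  const byKind = new Map();
  for (const { partnerId, kind } of log) {
    if (!byKind.has(kind)) {
      byKind.set(kind, { partners: new Set(), targetActions: 0 });
    }
    const entry = byKind.get(kind);
    entry.partners.add(partnerId);
    entry.targetActions += 1;
  }
  const result = {};
  for (const [kind, { partners, targetActions }] of byKind) {
    result[kind] = { partners: partners.size, targetActions };
  }
  return result;
}
```

<p>With this split, a large number of lightweight widget embeddings does not obscure the figures for full-scale integrations, and the number of target actions can be tracked alongside the raw number of partners.</p>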
|
||||
<h4>SLA</h4>
|
||||
<p>This chapter would be incomplete if we didn't mention the “hygienic” KPI — the service level and the service availability. We won't be describing the concept in detail, as the API SLA isn't any different from any other digital services SLAs. Let us just state that this metric must be tracked, especially if we talk about pay-to-use APIs. However, in many cases, API vendors prefer to offer rather loose SLAs, treating the provided functionality as a data access or content licensing service.</p>
|
||||
<p>Still, let us re-iterate once more: any problems with your API are automatically multiplied by the number of partners you have, especially if the API is vital for them, i.e., the API outage makes the main functionality of their services unavailable. (And actually, because of the above-mentioned reasons, the average quality of integrations implies that partners' services will suffer even if the availability of the API is not formally speaking critical for them, but because developers use it excessively and do not bother with proper error handling.)</p>
|
||||
<p>It is important to mention that predicting the workload for the API service is rather complicated. Sub-optimal API usage, e.g., initializing the API in those application and website parts where it's not actually needed, might lead to a colossal increase in the number of requests after changing a single line of partner's code. The safety margin for an API service must be much higher than for a regular service for end users — it must survive the situation of the largest partner suddenly starting querying the API on every page and every application screen. (If the partner is already doing that, then the API must survive doubling the load if the partner by accident starts initializing the API twice on each page / screen.)</p>
|
||||
<p>Another extremely important hygienic minimum is the informational security of the API service. In the worst-case scenario, namely, if an API service vulnerability allows for exploiting partner applications, one security loophole will in fact be exposed <em>in every partner application</em>. Needless to say that the cost of such a mistake might be overwhelmingly colossal, even if the API itself is rather trivial and has no access to sensitive data (especially if we talk about webpages where no “sandbox” for third-party scripts exists, and any piece of code might let's say track the data entered in forms). API services must provide the maximum protection level (for example, choose cryptographical protocols with a certain overhead) and promptly react to any reports regarding possible vulnerabilities.</p>
|
||||
<p>This chapter would be incomplete if we didn't mention the “hygienic” KPI — service level and availability. We won't describe the concept in detail, as the API SLA isn't any different from SLAs for other digital services. Let us just state that this metric must be tracked, especially if we talk about pay-to-use APIs. However, in many cases, API vendors prefer to offer rather loose SLAs, treating the provided functionality as data access or content licensing services.</p>
|
||||
<p>Still, let us reiterate once more: any problems with your API are automatically multiplied by the number of partners you have, especially if the API is vital for them, i.e., the API outage makes the main functionality of their services unavailable. (And actually, because of the above-mentioned reasons, the average quality of integrations implies that partners' services will suffer even if the availability of the API is not formally speaking critical for them, but because developers use it excessively and do not bother with proper error handling.)</p>
|
||||
<p>It is important to mention that predicting the workload for the API service is rather complicated. Sub-optimal API usage, e.g., initializing the API in those parts of applications and websites where it's not actually needed, might lead to a colossal increase in the number of requests after changing a single line of a partner's code. The safety margin for an API service must be much higher than for a regular service for end users — it must survive the situation of the largest partner suddenly starting to query the API on every page and every application screen. (If the partner is already doing that, then the API must survive doubling the load if the partner accidentally starts initializing the API twice on each page / screen.)</p>
|
||||
<p>Another extremely important hygienic minimum is the informational security of the API service. In the worst-case scenario, namely, if an API service vulnerability allows for exploiting partner applications, one security loophole will in fact be exposed <em>in every partner application</em>. Needless to say, the cost of such a mistake might be overwhelmingly colossal, even if the API itself is rather trivial and has no access to sensitive data (especially if we talk about webpages where no “sandbox” for third-party scripts exists, and any piece of code might, for example, track the data entered in forms). API services must provide the maximum level of protection (e.g., choose cryptographic protocols with a certain overhead) and promptly react to any reports regarding possible vulnerabilities.</p>
|
||||
<h4>Comparing to Competitors</h4>
|
||||
<p>While measuring KPIs of any service, it's important not only to evaluate your own numbers but also to match them against the state of the market:</p>
|
||||
<p>While measuring KPIs of any service, it's important not only to evaluate your own numbers but also to compare them against the state of the market:</p>
|
||||
<ul>
|
||||
<li>What is your market share, and how is it evolving over time?</li>
|
||||
<li>Is your service growing faster than the market itself or is the rate the same, or is it even less?</li>
|
||||
<li>Is your service growing faster than the market itself, or is the growth rate the same, or is it even less?</li>
|
||||
<li>What proportion of the growth is caused by the growth of the market, and what is related to your efforts?</li>
|
||||
</ul>
|
||||
<p>Getting answers to those questions might be quite non-trivial in the case of API services. Indeed, how could you learn how many integrations has your competitor had during the same period of time, and what number of target actions had happened on their platform? Sometimes, the providers of popular analytical tools might help you with this, but usually, you have to monitor the potential partners' apps and websites and gather the statistics regarding APIs they're using. The same applies to market research: unless your niche is significant enough for some analytical company to conduct a study, you will have to either commission such work or make your own estimations — conversely, through interviewing potential customers.</p><div class="page-break"></div><h3><a href="#api-product-antifraud" class="anchor" id="api-product-antifraud">Chapter 57. Identifying Users and Preventing Fraud</a><a href="#chapter-57" class="secondary-anchor" id="chapter-57"> </a></h3>
|
||||
<p>Getting answers to those questions might be quite non-trivial in the case of API services. Indeed, how could you learn how many integrations your competitor had during the same period, and what number of target actions had happened on their platform? Sometimes, the providers of popular analytical tools might help you with this, but usually, you have to monitor potential partners' apps and websites and gather statistics regarding the APIs they're using. The same applies to market research: unless your niche is significant enough for some analytical company to conduct a study, you will have to either commission such work or make your own estimations — conversely, through interviewing potential customers.</p><div class="page-break"></div><h3><a href="#api-product-antifraud" class="anchor" id="api-product-antifraud">Chapter 57. Identifying Users and Preventing Fraud</a><a href="#chapter-57" class="secondary-anchor" id="chapter-57"> </a></h3>
|
||||
<p>In the context of working with an API, we talk about two kinds of users of the system:</p>
|
||||
<ul>
|
||||
<li>Users-developers, i.e., your partners writing code atop of the API</li>
|
||||
<li>End users interacting with applications implemented by the users-developers.</li>
|
||||
</ul>
|
||||
<p>In most cases, you need to have both of them identified (in a technical sense: discern one unique customer from another) to have answers to the following questions:</p>
|
||||
<p>In most cases, you need to have both of them identified (in a technical sense: discern one unique customer from another) to answer the following questions:</p>
|
||||
<ul>
|
||||
<li>How many users are interacting with the system (simultaneously, daily, monthly, and yearly)?</li>
|
||||
<li>How many actions does each user make?</li>
|
||||
<li>How many actions does each user perform?</li>
|
||||
</ul>
|
||||
<p><strong>NB</strong>: Sometimes, when an API is very large and/or abstract, the chain linking the API vendor to end users might comprise more than one developer as large partners provide services implemented atop of the API to the smaller ones. You need to count both direct and “derivative” partners.</p>
|
||||
<p>Gathering this data is crucial because of two reasons:</p>
|
||||
<p>Gathering this data is crucial for two reasons:</p>
|
||||
<ul>
|
||||
<li>To understand the system's limits and to be capable of planning its growth</li>
|
||||
<li>To understand the number of resources (ultimately, money) that are spent (and gained) on each user.</li>
|
||||
</ul>
|
||||
<p>In the case of commercial APIs, the quality and timeliness of gathering this data are twice that important, as the tariff plans (and therefore the entire business model) depend on it. Therefore, the question of <em>how exactly</em> we're identifying users is crucial.</p>
|
||||
<p>In the case of commercial APIs, the quality and timeliness of gathering this data are twice as important because the tariff plans (and therefore the entire business model) depend on it. Therefore, the question of <em>how exactly</em> we're identifying users is crucial.</p>
|
||||
<h4>Identifying Applications and Their Owners</h4>
|
||||
<p>Let's start with the first user category, i.e., API business partners-developers. The important remark: there are two different entities we must learn to identify, namely applications and their owners.</p>
|
||||
<p>An application is, roughly speaking, a logically separate case of API usage: usually, literally an application (mobile or desktop) or a website, i.e., some technical entity. Meanwhile, an owner is a legal entity with which you have signed the API usage agreement. If the API Terms of Service (ToS) imply different limits and/or tariffs depending on the type of service or the way it uses the API, this automatically means that one owner's applications must be tracked separately.</p>
|
||||
<p>In the modern world, the de facto standard for identifying both entities is using API keys: a developer who wants to start using an API must obtain an API key bound to their contact info. Thus, the key identifies the application while the contact data identifies the owner.</p>
|
||||
<p>Though this practice is universally widespread we can't but notice that in most cases it's useless, and sometimes just destructive.</p>
|
||||
<p>Its general advantage is the necessity to supply actual contact info to get a key, which theoretically allows for contacting the application owner if needed. (In the real world, it doesn't work: key owners often don't read mailboxes they provided upon registration; and if the owner is a company, it easily might be a no-one's mailbox or a personal email of some employee that left the company a couple of years ago.)</p>
|
||||
<p>Though this practice is universally widespread we can't help but notice that in most cases it's useless, and sometimes just destructive.</p>
|
||||
<p>Its general advantage is the necessity to supply actual contact info to get a key, which theoretically allows for contacting the application owner if needed. (In the real world, it doesn't work: key owners often don't read mailboxes they provided upon registration; and if the owner is a company, it might easily be a no-one's mailbox or a personal email of some employee who left the company a couple of years ago.)</p>
|
||||
<p>The main disadvantage of using API keys is that they <em>don't</em> allow for reliably identifying both applications and their owners.</p>
|
||||
<p>If there are free limits to API usage, there is a temptation to obtain many API keys bound to different owners to fit those free limits. You may raise the cost of maintaining such multi-accounts by requiring, let's say, a phone number or bank card data, but there are popular services for automatically issuing both. Paying for a virtual SIM or credit card (to say nothing of buying stolen ones) will always be cheaper than paying the proper API tariff, unless it's the API for creating those cards. Therefore, identifying users by API keys (unless you require a physical contract to be signed) does not relieve you of the need to double-check whether users comply with the terms of service and do not issue several keys for one app.</p>
|
||||
<p>Another problem is that an API key might be simply stolen from a lawful partner; in the case of mobile or web applications, that's quite trivial.</p>
|
||||
<p>It might look like the problem is not that important in the case of server-to-server integrations, but it actually is. Imagine that a partner provides a public service of their own that uses your API under the hood. That usually means there is an endpoint in the partner's backend that performs a request to the API and returns the result, and this endpoint perfectly suits as a free replacement of direct access to the API for a cybercriminal. Of course, you might say this fraud is a problem of partners', but, first, it would be naïve to expect each partner develops their own anti-fraud system, and, second, it's just sub-optimal: obviously, a centralized anti-fraud system would be way more effective than a bunch of amateur implementations. Also, server keys might also be stolen: it's much harder than stealing client keys but doable. With any popular API, sooner or later you will face the situation of stolen keys made available to the public (or a key owner just shared it with acquaintances out of the kindness of their heart).</p>
|
||||
<p>One way or another, a problem of independent validation arises: how can we control whether the API endpoint is requested by a user in compliance with the terms of service?</p>
|
||||
<p>Mobile applications might be conveniently tracked through their identifiers in the corresponding store (Google Play, App Store, etc.), so it makes sense to require this identifier to be passed by partners as an API initialization parameter. Websites with some degree of confidence might be identified by the <code>Referer</code> and <code>Origin</code> HTTP headers.</p>
|
||||
<p>This data is not itself reliable, but it allows for making cross-checks:</p>
|
||||
<p>It might appear that the problem is not as significant in the case of server-to-server integrations, but it actually is. Imagine that a partner provides a public service of their own that uses your API under the hood. This usually means there is an endpoint in the partner's backend that makes a request to the API and returns the result, and this endpoint can be easily used by a cybercriminal as a free replacement for direct access to the API. Of course, you might argue that this fraud is the partner's problem, but firstly, it would be naïve to expect that every partner develops their own anti-fraud system, and secondly, it is sub-optimal: a centralized anti-fraud system would undoubtedly be way more effective than a collection of amateur implementations. Furthermore, server keys might also be stolen; although, it's more challenging than stealing client keys, it's still feasible. With any popular API, sooner or later you will encounter the situation of stolen keys being made available to the public (or a key owner sharing it with acquaintances out of kindness).</p>
|
||||
<p>In one way or another, the issue of independent validation arises: how can we control whether the API endpoint is being requested by a user in compliance with the terms of service?</p>
|
||||
<p>Mobile applications could be conveniently tracked through their identifiers in the corresponding store (Google Play, App Store, etc.), so it makes sense to require this identifier to be passed by partners as an API initialization parameter. Websites, with some degree of confidence, can be identified by the <code>Referer</code> and <code>Origin</code> HTTP headers.</p>
|
||||
<p>This data is not entirely reliable, but it allows for cross-checks:</p>
|
||||
<ul>
|
||||
<li>If a key was issued for one specific domain but requests are coming with a different <code>Referer</code>, it makes sense to investigate the situation and maybe ban the possibility to access the API with this <code>Referer</code> or this key.</li>
|
||||
<li>If an application initializes API by providing a key registered to another application, it makes sense to contact the store administration and ask for removing one of the apps.</li>
|
||||
<li>If a key was issued for one specific domain but requests are coming with a different <code>Referer</code>, it makes sense to investigate the situation and maybe ban the possibility of accessing the API with this <code>Referer</code> or this key.</li>
|
||||
<li>If an application initializes the API by providing a key registered to another application, it makes sense to contact the store administration and request the removal of one of the apps.</li>
|
||||
</ul>
|
||||
<p><strong>NB</strong>: Don't forget to set infinite limits for using the API with the <code>localhost</code>, <code>127.0.0.1</code> / <code>[::1]</code> <code>Referer</code>s, and also for your own sandbox if it exists. Yes, abusers will sooner or later learn this fact and will start exploiting it, but otherwise, you will ban local development and your own website much sooner than that.</p>
|
||||
<p><strong>NB</strong>: Don't forget to set infinite limits for using the API with the <code>localhost</code> and <code>127.0.0.1</code> / <code>[::1]</code> <code>Referer</code>s, and also for your own sandbox if it exists. Yes, abusers will sooner or later learn this fact and start exploiting it, but otherwise, you will ban local development and your own website much sooner than that.</p>
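<p>The cross-check described above might be sketched as follows. This is only an illustration under stated assumptions: the <code>keyDomains</code> registry, the sample key and domain, and the <code>checkReferer</code> function with its return values are hypothetical, and since the <code>Referer</code> header can be forged by non-browser clients, a mismatch is treated as a signal to investigate rather than as proof of abuse.</p>

```javascript
// Hypothetical registry: the domain each API key was issued for.
const keyDomains = new Map([['key-1', 'partner.example']]);

// Hosts that are never limited, so that local development keeps working.
const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1', '[::1]']);

// Returns 'ok', 'unknown-key', or 'suspicious' for a given request.
function checkReferer(apiKey, refererHeader) {
  let host;
  try {
    host = new URL(refererHeader).hostname;
  } catch {
    // A missing or malformed Referer cannot be matched to any domain.
    return 'suspicious';
  }
  if (LOCAL_HOSTS.has(host)) {
    return 'ok';
  }
  const expected = keyDomains.get(apiKey);
  if (expected === undefined) {
    return 'unknown-key';
  }
  // Accept the registered domain itself and its subdomains.
  if (host === expected || host.endsWith('.' + expected)) {
    return 'ok';
  }
  return 'suspicious';
}
```

<p>A <code>'suspicious'</code> result would typically be logged and reviewed before banning the key or the <code>Referer</code>, as legitimate partners sometimes move to new domains without updating their registration.</p>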
|
||||
<p>The general conclusion is:</p>
|
||||
<ul>
|
||||
<li>It is highly desirable to have partners formally identified (either through obtaining API keys or by providing contact data such as website domain or application identifier in a store while initializing the API).</li>
|
||||
<li>This information shall not be trusted unconditionally; there must be double-checking mechanisms that identify suspicious requests.</li>
|
||||
<li>It is highly desirable to have partners formally identified (either through obtaining API keys or by providing contact data such as website domain or application identifier in a store during API initialization).</li>
|
||||
<li>This information should not be blindly trusted; double-checking mechanisms are necessary to identify suspicious requests.</li>
|
||||
</ul>
|
||||
<h4>Identifying End Users</h4>
|
||||
<p>Usually, you can put forward some requirements for self-identifying of partners, but asking end users to reveal contact information is impossible in most cases. All the methods of measuring the audience described below are imprecise and often heuristic. (Even if partner application functionality is only available after registration and you do have access to that profile data, it's still a game of assumptions, as an individual account is not the same as an individual user: several different persons might use a single account, or, vice versa, one person might register many accounts.) Also, note that gathering this sort of data might be legally regulated (though we will be mostly speaking about anonymized data, there might still be some applicable law).</p>
|
||||
<p>Usually, you can impose requirements for partners to self-identify, but it's often impossible to ask end users to disclose their contact information. All the methods of measuring the audience described below are imprecise and often heuristic. (Even if partner application functionality is only available after registration and you do have access to that profile data, it's still a game of assumptions, as an individual account is not the same as an individual user: several different persons might use a single account, or, vice versa, one person might register many accounts.) Also, note that gathering such data might be subject to legal regulations, even when discussing anonymized data.</p>
|
||||
<ol>
|
||||
<li>
|
||||
<p>The most simple and obvious indicator is an IP address. It's very hard to counterfeit them (i.e., the API server always knows the remote address), and the IP address statistics are reasonably demonstrative.</p>
|
||||
<p>If the API is provided as a server-to-server one, there will be no access to the end user's IP address. However, it makes sense to require partners to propagate the IP address (for example, in a form of the <code>X-Forwarded-For</code> header) — among other things, to help partners fight fraud and unintended usage of the API.</p>
|
||||
<p>Until recently, IP addresses were also a convenient statistics indicator because it was quite expensive to get a large pool of unique addresses. However, with ipv6 advancement this restriction is no longer actual; ipv6 rather put the light on the fact that you can't just count unique addresses — the aggregates are to be tracked:</p>
|
||||
<p>The simplest and most obvious indicator is an IP address. It's very hard to counterfeit them (i.e., the API server always knows the remote address), and statistics related to IP addresses are reasonably demonstrative.</p>
|
||||
<p>If the API is provided server-to-server, there will be no access to the end user's IP address. However, it makes sense to require partners to propagate the IP address (for example, in the form of the <code>X-Forwarded-For</code> header) — among other things, to assist partners in combating fraud and unintended API usage.</p>
|
||||
<p>Until recently, IP addresses were also a convenient statistical indicator because acquiring a large pool of unique addresses was quite expensive. However, with the advancement of IPv6, this restriction is no longer applicable. IPv6 has rather shed light on the fact that you can't just count unique addresses — the aggregates are to be tracked:</p>
|
||||
<ul>
|
||||
<li>The cumulative number of requests by networks, i.e., the hierarchical calculations (the number of /8, /16, /24, etc. networks)</li>
|
||||
<li>The cumulative number of requests by networks, i.e., hierarchical calculations (the number of /8, /16, /24, etc. networks)</li>
|
||||
<li>The cumulative statistics by autonomous networks (AS)</li>
|
||||
<li>The API requests through known public proxies and TOR network.</li>
|
||||
</ul>
|
||||
<p>An abnormal number of requests in one network might be evidence of the API being actively used inside some corporative environment (or NATs being widespread in the region).</p>
|
||||
<p>An abnormal number of requests from one network might be evidence of the API being actively used within a corporate environment (or the widespread use of NATs in the region).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Additional means of tracking are users' unique identifiers, most notably cookies. However, most recently this method of gathering data got attacked from several directions: browser makers restrict third-party cookies, users are employing anti-tracker software, and lawmakers started to roll out legal requirements against data collection. In the current situation, it's much easier to drop cookie usage than to be compliant with all the regulations.</p>
|
||||
<p>All this leads to a situation when public APIs (especially those installed on free-to-use sites and applications) are very limited in the means of collecting statistics and analyzing user behavior. And that impacts not only fighting all kinds of fraud but analyzing use cases as well. This is the way.</p>
|
||||
<p>Another means of tracking is unique user identifiers, most notably cookies. However, recently this method of data gathering has been under attack from several directions: browser makers are restricting third-party cookies, users are employing anti-tracker software, and lawmakers have started rolling out legal requirements against data collection. In the current situation, it's much easier to stop using cookies than to comply with all the regulations.</p>
|
||||
<p>All this leads to a situation where public APIs (especially those installed on free-to-use sites and applications) are very limited in their ability to collect statistics and analyze user behavior. These restrictions impact not only the fight against various types of fraud but also the analysis of user scenarios. This is the way.</p>
|
||||
</li>
|
||||
</ol>
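<p>The hierarchical aggregation by networks mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name <code>count_by_prefix</code>, the sample log, and the choice of /8, /16, and /24 prefixes for IPv4 are hypothetical, and IPv6 traffic would need its own prefix lengths.</p>

```python
import ipaddress
from collections import Counter


def count_by_prefix(addresses, prefix_lengths=(8, 16, 24)):
    """Count requests per enclosing IPv4 network for each prefix length.

    Hypothetical helper: a real system would read addresses from access
    logs and would handle IPv6 with different prefix lengths.
    """
    counters = {length: Counter() for length in prefix_lengths}
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        if ip.version != 4:
            continue  # IPv6 is skipped in this sketch
        for length in prefix_lengths:
            # strict=False masks the host bits, yielding the enclosing network
            network = ipaddress.ip_network(f"{ip}/{length}", strict=False)
            counters[length][str(network)] += 1
    return counters


# Made-up sample data (documentation address ranges)
requests_log = ["203.0.113.7", "203.0.113.9", "203.0.114.1", "198.51.100.4"]
stats = count_by_prefix(requests_log)
```

<p>An unusually large counter for a single network compared to its siblings is the kind of signal worth investigating; the counters themselves say nothing about whether the cause is fraud, a corporate network, or NAT.</p>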
|
||||
<p><strong>NB</strong>: In some jurisdictions, IP addresses are considered personal data, and collecting them is prohibited as well. We don't dare to advise on how an API vendor might at the same time be able to fight prohibited content on the platform and don't have access to users' IP addresses. We presume that complying with such legislation implies storing statistics by IP address hashes. (And just in case we won't mention that building a rainbow table for SHA-256 hashes covering the entire 4-billion range of IPv4 addresses would take several hours on a regular office-grade computer.)</p><div class="page-break"></div><h3><a href="#api-product-tos-violations" class="anchor" id="api-product-tos-violations">Chapter 58. The Technical Means of Preventing ToS Violations</a><a href="#chapter-58" class="secondary-anchor" id="chapter-58"> </a></h3>
|
||||
<p>Implementing the paradigm of a centralized system of preventing partner endpoints-bound fraud, which we described in the previous chapter, in practice faces non-trivial difficulties.</p>
|
||||
<p><strong>NB</strong>: In some jurisdictions, IP addresses are considered personal data, and collecting them is prohibited as well. We don't dare to advise on how an API vendor might simultaneously fight prohibited content on the platform and not have access to users' IP addresses. We presume that complying with such legislation implies storing statistics by IP address hashes. (And just in case we won't mention that building a rainbow table for SHA-256 hashes covering the entire 4-billion range of IPv4 addresses would take several hours on a regular office-grade computer.)</p><div class="page-break"></div><h3><a href="#api-product-tos-violations" class="anchor" id="api-product-tos-violations">Chapter 58. The Technical Means of Preventing ToS Violations</a><a href="#chapter-58" class="secondary-anchor" id="chapter-58"> </a></h3>
|
||||
<p>Implementing the centralized system to prevent partner endpoint-bound fraud, as described in the previous chapter, faces practical challenges.</p>
|
||||
<p>The task of filtering out illicit API requests comprises three steps:</p>
|
||||
<ul>
|
||||
<li>Identifying suspicious users</li>
|
||||
<li>Optionally, asking for an additional authentication factor</li>
|
||||
<li>Optionally, requesting an additional authentication factor</li>
|
||||
<li>Making decisions and applying access restrictions.</li>
|
||||
</ul>
|
||||
<h5><a href="#chapter-58-paragraph-1" id="chapter-58-paragraph-1" class="anchor">1. Identifying Suspicious Users</a></h5>
|
||||
<p>Generally speaking, there are two approaches we might take, the static one and the dynamic (behavioral) one.</p>
|
||||
<p><em>Statically</em> we monitor suspicions activity surges, as described in the previous chapter, marking an unusually high density of requests coming from specific networks or <code>Referer</code>s (actually, <em>any</em> piece of information suits if it splits users into more or less independent groups: for example, OS version or system language would suffice if you can gather those).</p>
|
||||
<p><em>Behavioral</em> analysis means we're examining the history of requests made by a specific user, searching for non-typical patterns, such as “unhuman” order of traversing endpoints or too small pauses between requests.</p>
|
||||
<p><strong>Importantly</strong>, when we talk about “users,” we will have to make duplicate systems to observe them both using tokens (cookies, logins, phone numbers) and IP addresses, as malefactors aren't obliged to preserve the tokens between requests, or might keep a pool of them to impede their exposure.</p>
|
||||
<p>Generally speaking, there are two approaches we might take: the static one and the dynamic (behavioral) one.</p>
|
||||
<p><em>Statically</em> we monitor suspicious activity surges, as described in the previous chapter, marking an unusually high density of requests coming from specific networks or <code>Referer</code>s (actually, <em>any</em> piece of information suits if it splits users into more or less independent groups: for example, OS version or system language would suffice if you can gather those).</p>
|
||||
<p>Behavioral analysis involves examining the history of requests made by a specific user, i.e., searching for non-typical patterns, such as an “inhuman” order of traversing endpoints or overly short pauses between requests.</p>
|
||||
<p><strong>Importantly</strong>, when we talk about “users,” we will have to create duplicate systems to observe them using both tokens (cookies, logins, phone numbers) and IP addresses, as malefactors aren't obliged to preserve the tokens between requests or might keep a pool of them to impede their exposure.</p>
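<p>As a rough illustration of the behavioral approach, the check below flags a request history whose typical pause between requests is implausibly short. The function name, the thresholds, and the use of the median are assumptions made for this sketch, not a recommended detector; in line with the remark above, the same check would have to be run on histories keyed by token and on histories keyed by IP address.</p>

```python
from statistics import median


def looks_automated(timestamps, min_median_gap=0.5, min_requests=5):
    """Return True if pauses between requests look too short for a human.

    timestamps: request times in seconds for one user (token or IP).
    min_median_gap and min_requests are arbitrary illustrative thresholds.
    """
    if len(timestamps) < min_requests:
        return False  # too little history to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) < min_median_gap


burst = [0.0, 0.1, 0.2, 0.3, 0.4]        # five requests within half a second
browsing = [0.0, 3.0, 9.0, 14.0, 30.0]   # pauses of several seconds
```

<p>Being a heuristic, such a check is better used to select users for an additional verification step than to ban anyone outright.</p>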
|
||||
<h5><a href="#chapter-58-paragraph-2" id="chapter-58-paragraph-2" class="anchor">2. Requesting an Additional Authentication Factor</a></h5>
|
||||
<p>As both static and behavioral analyses are heuristic, it's highly desirable to not make decisions based solely on their outcome but rather ask the suspicious users to additionally prove they're making legitimate requests. If such a mechanism is in place, the quality of an anti-fraud system will be dramatically improved, as it allows for increasing system sensitivity and enabling pro-active defense, i.e., asking users to pass the tests in advance.</p>
|
||||
<p>In the case of services for end users, the main method of acquiring the second factor is redirecting to a captcha page. In the case of APIs it might be problematic, especially if you initially neglected the “Stipulate Restrictions” rule we've given in the “<a href="#api-design-describing-interfaces">Describing Final Interfaces</a>” chapter. In many cases, you will have to impose this responsibility on partners (i.e., it will be partners who show captchas and identify users based on the signals received from the API endpoints). This will, of course, significantly impair the convenience of working with the API.</p>
|
||||
<p><strong>NB</strong>: Instead of captcha, there might be other actions introducing additional authentication factors. It might be the phone number confirmation or the second step of the 3D-Secure protocol. The important part is that requesting an additional authentication step must be stipulated in the program interface, as it can't be added later in a backward-compatible manner.</p>
|
||||
<p>Other popular mechanics of identifying robots include offering a bait (“honeypot”) or employing the execution environment checks (starting from rather trivial ones like executing JavaScript on the webpage and ending with sophisticated techniques of checking application integrity checksums).</p>
|
||||
<p>As both static and behavioral analyses are heuristic, it's highly desirable not to make decisions based solely on their outcome but rather ask the suspicious users to additionally prove they're making legitimate requests. Implementing such a mechanism significantly improves the quality of an anti-fraud system, increasing system sensitivity and enabling proactive defense by requiring users to pass tests in advance.</p>
|
||||
<p>In the case of services for end users, the main method of acquiring the second factor is redirecting to a captcha page. In the case of APIs it might be problematic, especially if you initially neglected the “Stipulate Restrictions” rule we've given in the “<a href="#api-design-describing-interfaces">Describing Final Interfaces</a>” chapter. In many cases, you may need to delegate this responsibility to partners, meaning <em>partners</em> will display captchas and identify users based on signals received from the API endpoints. This will, of course, significantly impair the convenience of working with the API.</p>
|
||||
<p><strong>NB</strong>: Instead of captchas, other actions introducing additional authentication factors could be used. It might be the phone number confirmation or the second step of the 3D-Secure protocol. The important part is that requesting an additional authentication step must be stipulated in the program interface, as it can't be added later in a backward-compatible manner.</p>
|
||||
<p>Other popular mechanics of identifying robots include offering bait (“honeypot”) or employing execution environment checks (starting from rather trivial ones like executing JavaScript on the webpage and ending with sophisticated techniques of checking application integrity checksums).</p>
|
||||
<h5><a href="#chapter-58-paragraph-3" id="chapter-58-paragraph-3" class="anchor">3. Restricting Access</a></h5>
|
||||
<p>The illusion of having a broad choice of technical means of identifying fraud users should not deceive you as you will soon discover the lack of effective methods of restricting those users. Banning them by cookie / <code>Referer</code> / <code>User-Agent</code> makes little to no impact as this data is supplied by clients, and might be easily forged. In the end, you have four mechanisms for suppressing illegal activities:</p>
|
||||
<p>Don't be deceived by the illusion of having a wide range of technical means to identify fraudulent users; you will soon realize the lack of effective methods to restrict them. Banning them based on cookies / <code>Referer</code> / <code>User-Agent</code> makes little to no impact as this data is supplied by clients and can be easily forged. In the end, you have four mechanisms for suppressing illegal activities:</p>
|
||||
<ul>
|
||||
<li>Banning users by IP (networks, autonomous systems)</li>
|
||||
<li>Requiring mandatory user identification (maybe tiered: login / login with confirmed phone number / login with confirmed identity / login with confirmed identity and biometrics / etc.)</li>
|
||||
<li>Banning users by IP addresses (networks, autonomous systems)</li>
|
||||
<li>Requiring mandatory user identification (maybe tiered: login / login with a confirmed phone number / login with a confirmed identity / login with a confirmed identity and biometrics / etc.)</li>
|
||||
<li>Returning fake responses</li>
|
||||
<li>Filing administrative abuse reports.</li>
|
||||
</ul>
|
||||
<p>The problem with the first option is the collateral damage you will inflict, especially if you have to ban subnets.</p>
|
||||
<p>The second option, though quite rational, is usually inapplicable to real APIs, as not every partner will agree with the approach, and definitely not every end user. This will also require being compliant with the existing personal data laws.</p>
|
||||
<p>The third option is the most effective one in technical terms as it allows to put the ball in the malefactor's court: it is now them who need to invent how to learn if the robot was detected. But from the moral point of view (and from the legal perspective as well) this method is rather questionable, especially if we take into account the probability of false-positive signals, meaning that some real users will get the fake data.</p>
|
||||
<p>Thereby, you have only one method that really works: filing complaints to hosting providers, ISPs, or law enforcement authorities. Needless to say, this brings certain reputational risks, and the reaction time is rather not lightning fast.</p>
|
||||
<p>In most cases, you're not fighting fraud — you're actually increasing the cost of the attack, simultaneously buying yourself enough time to make administrative moves against the perpetrator. Preventing API misusage completely is impossible as malefactors might ultimately employ the expensive but bulletproof solution — to hire real people to make the requests to the API on real devices through legitimate applications.</p>
|
||||
<p>An opinion exists, which the author of this book shares, that engaging in this sword-against-shield confrontation must be carefully thought out, and advanced technical solutions are to be enabled only if you are one hundred percent sure it is worth it (e.g., if they steal real money or data). By introducing elaborate algorithms, you rather conduct an evolutional selection of the smartest and most cunning cybercriminals, counteracting to whom will be way harder than to those who just naïvely call API endpoints with <code>curl</code>. What is even more important, in the final phase — i.e., when filing the complaint to authorities — you will have to prove the alleged ToS violation, and doing so against an advanced fraudster will be problematic. So it's rather better to have all the malefactors monitored (and regularly complained against), and escalate the situation (i.e., enable the technical protection and start legal actions) only if the threat passes a certain threshold. That also implies that you must have all the tools ready, and just keep them below fraudsters' radars.</p>
|
||||
<p>Out of the author of this book's experience, the mind games with malefactors, when you respond to any improvement of their script with the smallest possible effort that is enough to break it, might continue indefinitely. This strategy, i.e., making fraudsters guess which traits were used to ban them this time (instead of unleashing the whole heavy artillery potential), annoys amateur “hackers” greatly as they lack hard engineering skills and just give up eventually.</p>
|
||||
<p>The problem with the first option is the collateral damage you will inflict, especially when banning subnets.</p>
|
||||
<p>The second option, while rational, is often impractical for real APIs because not every partner will agree with the approach, and certainly many users will churn off. This will also require compliance with existing personal data laws.</p>
|
||||
<p>The third option is the most effective one in technical terms as it allows putting the ball in the malefactor's court: it is now up to them to figure out how to determine if the robot was detected. But from a moral point of view (and from a legal perspective as well) this method is rather questionable, especially if we take into account the probability of false-positive signals, meaning that some real users will get fake data.</p>
|
||||
<p>Therefore, you have only one method that truly works: filing complaints with hosting providers, ISPs, or law enforcement authorities. Needless to say, this brings certain reputational risks, and the response is far from lightning fast.</p>
|
||||
<p>In most cases, you're not fighting fraud — you're actually increasing the cost of the attack, simultaneously buying yourself enough time to take administrative actions against the perpetrator. Preventing API misuse completely is impossible as malefactors might ultimately employ the expensive but bulletproof solution — hiring real people to make the requests to the API on real devices through legitimate applications.</p>
|
||||
<p>An opinion exists, which the author of this book shares, that engaging in this sword-against-shield confrontation must be carefully thought out, and advanced technical solutions are to be enabled only if you are one hundred percent sure it is worth it (e.g., if they steal real money or data). By introducing elaborate algorithms, you effectively conduct an evolutionary selection of the smartest and most cunning cybercriminals, who will be far harder to counteract than those who just naïvely call API endpoints with <code>curl</code>. Furthermore, in the final phase, when filing a complaint with authorities, you'll need to prove the alleged ToS violation, which can be challenging when dealing with advanced fraudsters. So it's better to have all the malefactors monitored (and regularly reported), and escalate the situation (i.e., enable technical protection and initiate legal actions) only if the threat passes a certain threshold. That also implies having all the tools ready and keeping them below infringers' radars.</p>
|
||||
<p>Based on the author of this book's experience, mind games with malefactors, where you respond to any improvement of their script with the smallest possible effort that is enough to break it, might continue indefinitely. This strategy, i.e., making fraudsters guess which traits were used to ban them this time (instead of unleashing the whole heavy artillery potential), greatly annoys amateur “hackers” as they lack hard engineering skills and eventually give up.</p>
|
||||
<h4>Dealing with Stolen Keys</h4>
|
||||
<p>Let's now move to the second type of unlawful API usage, namely using in the malefactor's applications keys stolen from conscientious partners. As the requests are generated by real users, captcha won't help, though other techniques will.</p>
|
||||
<p>Now let's address the second type of unlawful API usage, namely the use of keys stolen from conscientious partners in the malefactor's applications. Since the requests are generated by real users, captchas won't help, but other techniques will.</p>
|
||||
<ol>
|
||||
<li>
|
||||
<p>Maintaining metrics collection by IP addresses and subnets might be of use in this case as well. If the malefactor's app isn't a public one but rather targeted to some closed audience, this fact will be visible in the dashboards (and if you're lucky enough, you might also find suspicious <code>Referer</code>s, public access to which is restricted).</p>
|
||||
<p>Maintaining metrics collection by IP addresses and subnets might be useful in this case as well. If the malefactor's app isn't public but rather targeted to a closed audience, this fact will be visible on the dashboards (and if you're lucky enough, you might also find suspicious <code>Referer</code>s, public access to which is restricted).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Allowing partners to restrict the functionality available under specific API keys:</p>
|
||||
<ul>
|
||||
<li>Setting the allowed IP address range for server-to-server APIs, allowed <code>Referer</code>s and application ids for client APIs</li>
|
||||
<li>White-listing only allowed API functions for a specific key</li>
|
||||
<li>Other restrictions that make sense in your case (in our coffee API example, it's convenient to allow partners to prohibit API calls outside of countries and cities they work in).</li>
|
||||
<li>Other restrictions that make sense in your case (in our coffee API example, it's convenient to allow partners to prohibit API calls outside of the countries and cities they work in).</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<p>Introducing additional request signing:</p>
|
||||
<ul>
|
||||
<li>For example, if on the partner's website, there is a form displaying the best lungo offers, for which the partners call the API endpoint like <code>/v1/search?recipe=lungo&api_key={apiKey}</code>, then the API key might be replaced with a signature like <code>sign = HMAC("recipe=lungo", apiKey)</code>. The signature might be stolen as well, but it will be useless for malefactors as they will be able to find only lungo with it.</li>
|
||||
<li>Instead of API keys, time-based one-time passwords (TOTP) might be used. These tokens are valid for a short period of time only (typically, one minute), which makes using stolen keys much more complicated.</li>
|
||||
<li>For example, if there is a form displaying the best lungo offers on the partner's website, for which the partners call the API endpoint like <code>/v1/search?recipe=lungo&api_key={apiKey}</code>, then the API key might be replaced with a signature like <code>sign = HMAC("recipe=lungo", apiKey)</code>. The signature might be stolen as well, but it will be useless for malefactors as they will only be able to find lungo with it.</li>
|
||||
<li>Instead of API keys, time-based one-time passwords (TOTP) might be used. These tokens are valid for a short period of time only (typically, one minute), making it much more complicated to use stolen keys.</li>
|
||||
</ul>
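<p>The request signing idea can be sketched as follows. The source only gives the formula <code>sign = HMAC("recipe=lungo", apiKey)</code>; the choice of SHA-256, the hex encoding, and the function names below are assumptions made for illustration. The sketch also assumes the server can look up the partner's key by some other, non-secret identifier.</p>

```python
import hashlib
import hmac


def sign_query(query: str, api_key: str) -> str:
    """Compute an HMAC-SHA256 signature of the query, keyed by the API key."""
    return hmac.new(api_key.encode(), query.encode(), hashlib.sha256).hexdigest()


def verify_query(query: str, signature: str, api_key: str) -> bool:
    """Server-side check: recompute the signature and compare in constant time."""
    expected = sign_query(query, api_key)
    return hmac.compare_digest(expected, signature)


# The partner's backend signs the only query the form is supposed to make
api_key = "partner-secret-key"  # hypothetical key, never sent to the browser
signature = sign_query("recipe=lungo", api_key)
```

<p>A signature copied from the page verifies only for the exact query it was computed for, so a request with any other parameters fails the check.</p>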
|
||||
</li>
|
||||
<li>
|
||||
<p>Filing complaints to the administration (hosting providers, app store owners) in case the malefactor distributes their application through stores or uses a diligent hosting service that investigates abuse filings. Legal actions are also an option, and even much so compared to countering user fraud, as illegal access to the system using stolen credentials is unambiguously outlawed in most jurisdictions.</p>
|
||||
<p>Filing complaints with the administration (hosting providers, app store owners) in case the malefactor distributes their application through stores or uses a diligent hosting service that investigates abuse filings. Legal action is also an option, even more so than when countering user fraud, as illegal access to the system using stolen credentials is unambiguously outlawed in most jurisdictions.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Banning compromised API keys; the partners' reaction will be, of course, negative, but ultimately every business will prefer temporary disabling of some functionality over getting a multi-million bill.</p>
|
||||
<p>Banning compromised API keys; the partners' reaction will be, of course, negative, but ultimately every business will prefer temporary disabling of some functionality over receiving a multi-million bill.</p>
|
||||
</li>
|
||||
</ol><div class="page-break"></div><h3><a href="#api-product-customer-support" class="anchor" id="api-product-customer-support">Chapter 59. Supporting customers</a><a href="#chapter-59" class="secondary-anchor" id="chapter-59"> </a></h3>
|
||||
<p>From banning users, let's change the topic to supporting them. First of all, an important remark: when we talk about supporting API customers, we mean helping developers and to some extent business partners. End users seldom interact with APIs directly, with an exception of several non-standard cases:</p>
|
||||
|
BIN
docs/API.en.pdf
BIN
docs/API.en.pdf
Binary file not shown.
BIN
docs/API.ru.epub
BIN
docs/API.ru.epub
Binary file not shown.
BIN
docs/API.ru.pdf
BIN
docs/API.ru.pdf
Binary file not shown.
@ -4,17 +4,17 @@ In the context of working with an API, we talk about two kinds of users of the s
|
||||
* Users-developers, i.e., your partners writing code atop of the API
|
||||
* End users interacting with applications implemented by the users-developers.
|
||||
|
||||
In most cases, you need to have both of them identified (in a technical sense: discern one unique customer from another) to have answers to the following questions:
|
||||
In most cases, you need to have both of them identified (in a technical sense: discern one unique customer from another) to answer the following questions:
|
||||
* How many users are interacting with the system (simultaneously, daily, monthly, and yearly)?
|
||||
* How many actions does each user make?
|
||||
* How many actions does each user perform?
|
||||
|
||||
**NB**: Sometimes, when an API is very large and/or abstract, the chain linking the API vendor to end users might comprise more than one developer as large partners provide services implemented atop of the API to the smaller ones. You need to count both direct and “derivative” partners.
|
||||
|
||||
Gathering this data is crucial because of two reasons:
|
||||
Gathering this data is crucial for two reasons:
|
||||
* To understand the system's limits and to be capable of planning its growth
|
||||
* To understand the number of resources (ultimately, money) that are spent (and gained) on each user.
|
||||
|
||||
In the case of commercial APIs, the quality and timeliness of gathering this data are twice that important, as the tariff plans (and therefore the entire business model) depend on it. Therefore, the question of *how exactly* we're identifying users is crucial.
|
||||
In the case of commercial APIs, the quality and timeliness of gathering this data are twice as important because the tariff plans (and therefore the entire business model) depend on it. Therefore, the question of *how exactly* we're identifying users is crucial.
|
||||
|
||||
#### Identifying Applications and Their Owners
|
||||
|
||||
@ -24,9 +24,9 @@ An application is roughly speaking a logically separate case of API usage, usual
|
||||
|
||||
In the modern world, the factual standard for identifying both entities is using API keys: a developer who wants to start using an API must obtain an API key bound to their contact info. Thus the key identifies the application while the contact data identifies the owner.
|
||||
|
||||
Though this practice is universally widespread we can't but notice that in most cases it's useless, and sometimes just destructive.
|
||||
Though this practice is universally widespread, we can't help but notice that in most cases it's useless, and sometimes just destructive.
|
||||
|
||||
Its general advantage is the necessity to supply actual contact info to get a key, which theoretically allows for contacting the application owner if needed. (In the real world, it doesn't work: key owners often don't read mailboxes they provided upon registration; and if the owner is a company, it easily might be a no-one's mailbox or a personal email of some employee that left the company a couple of years ago.)
|
||||
Its general advantage is the necessity to supply actual contact info to get a key, which theoretically allows for contacting the application owner if needed. (In the real world, it doesn't work: key owners often don't read mailboxes they provided upon registration; and if the owner is a company, it might easily be a no-one's mailbox or a personal email of some employee who left the company a couple of years ago.)
|
||||
|
||||
The main disadvantage of using API keys is that they *don't* allow for reliably identifying both applications and their owners.
|
||||
|
||||
@ -34,40 +34,40 @@ If there are free limits to API usage, there is a temptation to obtain many API
|
||||
|
||||
Another problem is that an API key might be simply stolen from a lawful partner; in the case of mobile or web applications, that's quite trivial.
|
||||
|
||||
It might look like the problem is not that important in the case of server-to-server integrations, but it actually is. Imagine that a partner provides a public service of their own that uses your API under the hood. That usually means there is an endpoint in the partner's backend that performs a request to the API and returns the result, and this endpoint perfectly suits as a free replacement of direct access to the API for a cybercriminal. Of course, you might say this fraud is a problem of partners', but, first, it would be naïve to expect each partner develops their own anti-fraud system, and, second, it's just sub-optimal: obviously, a centralized anti-fraud system would be way more effective than a bunch of amateur implementations. Also, server keys might also be stolen: it's much harder than stealing client keys but doable. With any popular API, sooner or later you will face the situation of stolen keys made available to the public (or a key owner just shared it with acquaintances out of the kindness of their heart).
|
||||
It might appear that the problem is not as significant in the case of server-to-server integrations, but it actually is. Imagine that a partner provides a public service of their own that uses your API under the hood. This usually means there is an endpoint in the partner's backend that makes a request to the API and returns the result, and this endpoint can be easily used by a cybercriminal as a free replacement for direct access to the API. Of course, you might argue that this fraud is the partner's problem, but firstly, it would be naïve to expect that every partner develops their own anti-fraud system, and secondly, it is sub-optimal: a centralized anti-fraud system would undoubtedly be way more effective than a collection of amateur implementations. Furthermore, server keys might also be stolen; although it's more challenging than stealing client keys, it's still feasible. With any popular API, sooner or later you will encounter the situation of stolen keys being made available to the public (or a key owner sharing it with acquaintances out of kindness).
|
||||
|
||||
One way or another, a problem of independent validation arises: how can we control whether the API endpoint is requested by a user in compliance with the terms of service?
|
||||
One way or another, the issue of independent validation arises: how can we verify that the API endpoint is being requested by a user in compliance with the terms of service?
|
||||
|
||||
Mobile applications might be conveniently tracked through their identifiers in the corresponding store (Google Play, App Store, etc.), so it makes sense to require this identifier to be passed by partners as an API initialization parameter. Websites with some degree of confidence might be identified by the `Referer` and `Origin` HTTP headers.
|
||||
Mobile applications could be conveniently tracked through their identifiers in the corresponding store (Google Play, App Store, etc.), so it makes sense to require this identifier to be passed by partners as an API initialization parameter. Websites, with some degree of confidence, can be identified by the `Referer` and `Origin` HTTP headers.
|
||||
|
||||
This data is not itself reliable, but it allows for making cross-checks:
|
||||
* If a key was issued for one specific domain but requests are coming with a different `Referer`, it makes sense to investigate the situation and maybe ban the possibility to access the API with this `Referer` or this key.
|
||||
* If an application initializes API by providing a key registered to another application, it makes sense to contact the store administration and ask for removing one of the apps.
|
||||
This data is not entirely reliable, but it allows for cross-checks:
|
||||
* If a key was issued for one specific domain but requests are coming with a different `Referer`, it makes sense to investigate the situation and maybe ban the possibility of accessing the API with this `Referer` or this key.
|
||||
* If an application initializes the API by providing a key registered to another application, it makes sense to contact the store administration and request the removal of one of the apps.
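The first cross-check above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `KEY_DOMAINS` registry, the key value, and the function name are hypothetical, and a mismatch should only flag the request for investigation, since the `Referer` header is supplied by the client.

```python
from urllib.parse import urlsplit

# Hypothetical registry: API key -> domains the key was issued for.
KEY_DOMAINS = {
    "key-123": {"example.com"},
}


def referer_matches_key(api_key: str, referer: str) -> bool:
    """Return True if the Referer host is a registered domain or its subdomain."""
    allowed = KEY_DOMAINS.get(api_key, set())
    # A missing or malformed Referer yields an empty host, which never matches.
    host = (urlsplit(referer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)
```

A `False` result is a signal for review rather than grounds for an automatic ban.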
|
||||
|
||||
**NB**: Don't forget to set infinite limits for using the API with the `localhost`, `127.0.0.1` / `[::1]` `Referer`s, and also for your own sandbox if it exists. Yes, abusers will sooner or later learn this fact and will start exploiting it, but otherwise, you will ban local development and your own website much sooner than that.
|
||||
**NB**: Don't forget to set infinite limits for using the API with the `localhost` and `127.0.0.1` / `[::1]` `Referer`s, and also for your own sandbox if it exists. Yes, abusers will sooner or later learn this fact and start exploiting it, but otherwise, you will ban local development and your own website much sooner than that.
|
||||
|
||||
The general conclusion is:
|
||||
* It is highly desirable to have partners formally identified (either through obtaining API keys or by providing contact data such as website domain or application identifier in a store while initializing the API).
|
||||
* This information shall not be trusted unconditionally; there must be double-checking mechanisms that identify suspicious requests.
|
||||
* It is highly desirable to have partners formally identified (either through obtaining API keys or by providing contact data such as website domain or application identifier in a store during API initialization).
|
||||
* This information should not be blindly trusted; double-checking mechanisms are necessary to identify suspicious requests.
|
||||
|
||||
#### Identifying End Users
|
||||
|
||||
Usually, you can put forward some requirements for self-identifying of partners, but asking end users to reveal contact information is impossible in most cases. All the methods of measuring the audience described below are imprecise and often heuristic. (Even if partner application functionality is only available after registration and you do have access to that profile data, it's still a game of assumptions, as an individual account is not the same as an individual user: several different persons might use a single account, or, vice versa, one person might register many accounts.) Also, note that gathering this sort of data might be legally regulated (though we will be mostly speaking about anonymized data, there might still be some applicable law).
|
||||
Usually, you can impose requirements for partners to self-identify, but it's often impossible to ask end users to disclose their contact information. All the methods of measuring the audience described below are imprecise and often heuristic. (Even if partner application functionality is only available after registration and you do have access to that profile data, it's still a game of assumptions, as an individual account is not the same as an individual user: several different persons might use a single account, or, vice versa, one person might register many accounts.) Also, note that gathering such data might be subject to legal regulations, even when discussing anonymized data.
|
||||
|
||||
1. The most simple and obvious indicator is an IP address. It's very hard to counterfeit them (i.e., the API server always knows the remote address), and the IP address statistics are reasonably demonstrative.
|
||||
1. The simplest and most obvious indicator is an IP address. IP addresses are very hard to counterfeit (i.e., the API server always knows the remote address), and IP address statistics are reasonably indicative.
|
||||
|
||||
If the API is provided as a server-to-server one, there will be no access to the end user's IP address. However, it makes sense to require partners to propagate the IP address (for example, in a form of the `X-Forwarded-For` header) — among other things, to help partners fight fraud and unintended usage of the API.
|
||||
If the API is provided server-to-server, there will be no access to the end user's IP address. However, it makes sense to require partners to propagate the IP address (for example, in the form of the `X-Forwarded-For` header) — among other things, to assist partners in combating fraud and unintended API usage.
|
||||
|
||||
Until recently, IP addresses were also a convenient statistics indicator because it was quite expensive to get a large pool of unique addresses. However, with ipv6 advancement this restriction is no longer actual; ipv6 rather put the light on the fact that you can't just count unique addresses — the aggregates are to be tracked:
|
||||
Until recently, IP addresses were also a convenient statistical indicator because acquiring a large pool of unique addresses was quite expensive. However, with the advancement of IPv6, this restriction no longer applies. If anything, IPv6 has highlighted the fact that you can't just count unique addresses; aggregates must be tracked instead:
|
||||
|
||||
* The cumulative number of requests by networks, i.e., the hierarchical calculations (the number of /8, /16, /24, etc. networks)
|
||||
* The cumulative number of requests by networks, i.e., hierarchical calculations (the number of /8, /16, /24, etc. networks)
|
||||
* The cumulative statistics by autonomous systems (AS)
|
||||
* The API requests made through known public proxies and the Tor network.
|
||||
|
||||
An abnormal number of requests in one network might be evidence of the API being actively used inside some corporative environment (or NATs being widespread in the region).
|
||||
An abnormal number of requests from one network might be evidence of the API being actively used within a corporate environment (or the widespread use of NATs in the region).
|
||||
|
||||
2. Additional means of tracking are users' unique identifiers, most notably cookies. However, most recently this method of gathering data got attacked from several directions: browser makers restrict third-party cookies, users are employing anti-tracker software, and lawmakers started to roll out legal requirements against data collection. In the current situation, it's much easier to drop cookie usage than to be compliant with all the regulations.
|
||||
2. Another means of tracking is unique user identifiers, most notably cookies. However, recently this method of data gathering has been under attack from several directions: browser makers are restricting third-party cookies, users are employing anti-tracker software, and lawmakers have started rolling out legal requirements against data collection. In the current situation, it's much easier to stop using cookies than to comply with all the regulations.
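The hierarchical per-network counting mentioned in the first item of this list can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, only IPv4 addresses are expected, and grouping by autonomous system is left out because it would require an external IP-to-AS mapping.

```python
import ipaddress
from collections import Counter


def count_by_prefix(addresses, prefix_len):
    """Count requests per enclosing IPv4 network of the given prefix length."""
    counts = Counter()
    for addr in addresses:
        # strict=False masks the host bits, e.g. 192.0.2.17/24 -> 192.0.2.0/24
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts
```

Running the same function with prefix lengths 8, 16, and 24 over one request log gives the hierarchy of aggregates, in which an unusually dense network stands out even when every individual address is unique.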
|
||||
|
||||
All this leads to a situation when public APIs (especially those installed on free-to-use sites and applications) are very limited in the means of collecting statistics and analyzing user behavior. And that impacts not only fighting all kinds of fraud but analyzing use cases as well. This is the way.
|
||||
All this leads to a situation where public APIs (especially those installed on free-to-use sites and applications) are very limited in their ability to collect statistics and analyze user behavior. These restrictions impact not only the fight against various types of fraud but also the analysis of user scenarios. This is the way.
|
||||
|
||||
**NB**: In some jurisdictions, IP addresses are considered personal data, and collecting them is prohibited as well. We don't dare to advise on how an API vendor might at the same time be able to fight prohibited content on the platform and don't have access to users' IP addresses. We presume that complying with such legislation implies storing statistics by IP address hashes. (And just in case we won't mention that building a rainbow table for SHA-256 hashes covering the entire 4-billion range of IPv4 addresses would take several hours on a regular office-grade computer.)
|
||||
**NB**: In some jurisdictions, IP addresses are considered personal data, and collecting them is prohibited as well. We don't dare to advise on how an API vendor might simultaneously fight prohibited content on the platform and not have access to users' IP addresses. We presume that complying with such legislation implies storing statistics by IP address hashes. (And just in case we won't mention that building a rainbow table for SHA-256 hashes covering the entire 4-billion range of IPv4 addresses would take several hours on a regular office-grade computer.)
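To illustrate why plain hashing offers weak protection here, the sketch below recovers an address from its SHA-256 hash by enumerating a candidate network. The function names are hypothetical, and the example assumes addresses are hashed in their dotted-decimal text form without any salt or secret key.

```python
import hashlib
import ipaddress


def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of the textual form of an IP address."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(target_hash: str, network: str):
    """Try every address in the network; return the one matching the hash, if any."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr)) == target_hash:
            return str(addr)
    return None
```

The same loop over the whole IPv4 space is merely a larger enumeration, which is what makes a complete lookup table practical.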
|
||||
|
@ -1,70 +1,70 @@
|
||||
### [The Technical Means of Preventing ToS Violations][api-product-tos-violations]
|
||||
|
||||
Implementing the paradigm of a centralized system of preventing partner endpoints-bound fraud, which we described in the previous chapter, in practice faces non-trivial difficulties.
|
||||
Implementing the centralized system to prevent partner endpoint-bound fraud, as described in the previous chapter, faces practical challenges.
|
||||
|
||||
The task of filtering out illicit API requests comprises three steps:
|
||||
* Identifying suspicious users
|
||||
* Optionally, asking for an additional authentication factor
|
||||
* Optionally, requesting an additional authentication factor
|
||||
* Making decisions and applying access restrictions.
|
||||
|
||||
##### Identifying Suspicious Users
|
||||
|
||||
Generally speaking, there are two approaches we might take, the static one and the dynamic (behavioral) one.
|
||||
Generally speaking, there are two approaches we might take: the static one and the dynamic (behavioral) one.
|
||||
|
||||
*Statically* we monitor suspicions activity surges, as described in the previous chapter, marking an unusually high density of requests coming from specific networks or `Referer`s (actually, *any* piece of information suits if it splits users into more or less independent groups: for example, OS version or system language would suffice if you can gather those).
|
||||
*Statically* we monitor suspicious activity surges, as described in the previous chapter, marking an unusually high density of requests coming from specific networks or `Referer`s (actually, *any* piece of information suits if it splits users into more or less independent groups: for example, OS version or system language would suffice if you can gather those).
|
||||
|
||||
*Behavioral* analysis means we're examining the history of requests made by a specific user, searching for non-typical patterns, such as “unhuman” order of traversing endpoints or too small pauses between requests.
|
||||
Behavioral analysis involves examining the history of requests made by a specific user and searching for atypical patterns, such as an “inhuman” order of traversing endpoints or pauses between requests that are too short.
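One of the behavioral signals named above, overly short pauses between requests, can be sketched as follows. The function name and the 0.5-second threshold are assumptions made for illustration; a real system would tune the threshold per endpoint and combine this signal with others.

```python
def share_of_short_pauses(timestamps, min_pause_seconds=0.5):
    """Return the fraction of inter-request pauses shorter than the threshold.

    `timestamps` are request times in seconds for a single user.
    """
    ordered = sorted(timestamps)
    pauses = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not pauses:
        return 0.0  # zero or one request: nothing to measure
    short = sum(1 for pause in pauses if pause < min_pause_seconds)
    return short / len(pauses)
```

A value close to 1.0 over a long history suggests a script rather than a person, but it remains a heuristic and not proof.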
|
||||
|
||||
**Importantly**, when we talk about “users,” we will have to make duplicate systems to observe them both using tokens (cookies, logins, phone numbers) and IP addresses, as malefactors aren't obliged to preserve the tokens between requests, or might keep a pool of them to impede their exposure.
|
||||
**Importantly**, when we talk about “users,” we will have to create duplicate systems to observe them using both tokens (cookies, logins, phone numbers) and IP addresses, as malefactors aren't obliged to preserve the tokens between requests or might keep a pool of them to impede their exposure.
|
||||
|
||||
##### Requesting an Additional Authentication Factor
|
||||
|
||||
As both static and behavioral analyses are heuristic, it's highly desirable to not make decisions based solely on their outcome but rather ask the suspicious users to additionally prove they're making legitimate requests. If such a mechanism is in place, the quality of an anti-fraud system will be dramatically improved, as it allows for increasing system sensitivity and enabling pro-active defense, i.e., asking users to pass the tests in advance.
|
||||
As both static and behavioral analyses are heuristic, it's highly desirable not to make decisions based solely on their outcome but rather ask the suspicious users to additionally prove they're making legitimate requests. Implementing such a mechanism significantly improves the quality of an anti-fraud system, increasing system sensitivity and enabling proactive defense by requiring users to pass tests in advance.
|
||||
|
||||
In the case of services for end users, the main method of acquiring the second factor is redirecting to a captcha page. In the case of APIs it might be problematic, especially if you initially neglected the “Stipulate Restrictions” rule we've given in the “[Describing Final Interfaces](#api-design-describing-interfaces)” chapter. In many cases, you will have to impose this responsibility on partners (i.e., it will be partners who show captchas and identify users based on the signals received from the API endpoints). This will, of course, significantly impair the convenience of working with the API.
|
||||
In the case of services for end users, the main method of acquiring the second factor is redirecting to a captcha page. In the case of APIs it might be problematic, especially if you initially neglected the “Stipulate Restrictions” rule we've given in the “[Describing Final Interfaces](#api-design-describing-interfaces)” chapter. In many cases, you may need to delegate this responsibility to partners, meaning *partners* will display captchas and identify users based on signals received from the API endpoints. This will, of course, significantly impair the convenience of working with the API.
|
||||
|
||||
**NB**: Instead of captcha, there might be other actions introducing additional authentication factors. It might be the phone number confirmation or the second step of the 3D-Secure protocol. The important part is that requesting an additional authentication step must be stipulated in the program interface, as it can't be added later in a backward-compatible manner.
|
||||
**NB**: Instead of captchas, other actions introducing additional authentication factors could be used. It might be the phone number confirmation or the second step of the 3D-Secure protocol. The important part is that requesting an additional authentication step must be stipulated in the program interface, as it can't be added later in a backward-compatible manner.
|
||||
|
||||
Other popular mechanics of identifying robots include offering a bait (“honeypot”) or employing the execution environment checks (starting from rather trivial ones like executing JavaScript on the webpage and ending with sophisticated techniques of checking application integrity checksums).
|
||||
Other popular mechanics of identifying robots include offering bait (“honeypot”) or employing execution environment checks (starting from rather trivial ones like executing JavaScript on the webpage and ending with sophisticated techniques of checking application integrity checksums).
|
||||
|
||||
##### Restricting Access
|
||||
|
||||
The illusion of having a broad choice of technical means of identifying fraud users should not deceive you as you will soon discover the lack of effective methods of restricting those users. Banning them by cookie / `Referer` / `User-Agent` makes little to no impact as this data is supplied by clients, and might be easily forged. In the end, you have four mechanisms for suppressing illegal activities:
|
||||
* Banning users by IP (networks, autonomous systems)
|
||||
* Requiring mandatory user identification (maybe tiered: login / login with confirmed phone number / login with confirmed identity / login with confirmed identity and biometrics / etc.)
|
||||
Don't be deceived by the illusion of having a wide range of technical means to identify fraudulent users; you will soon realize the lack of effective methods to restrict them. Banning them based on cookies / `Referer` / `User-Agent` makes little to no impact as this data is supplied by clients and can be easily forged. In the end, you have four mechanisms for suppressing illegal activities:
|
||||
* Banning users by IP addresses (networks, autonomous systems)
|
||||
* Requiring mandatory user identification (maybe tiered: login / login with a confirmed phone number / login with a confirmed identity / login with a confirmed identity and biometrics / etc.)
|
||||
* Returning fake responses
|
||||
* Filing administrative abuse reports.
|
||||
|
||||
The problem with the first option is the collateral damage you will inflict, especially if you have to ban subnets.
|
||||
The problem with the first option is the collateral damage you will inflict, especially when banning subnets.
|
||||
|
||||
The second option, though quite rational, is usually inapplicable to real APIs, as not every partner will agree with the approach, and definitely not every end user. This will also require being compliant with the existing personal data laws.
|
||||
The second option, while rational, is often impractical for real APIs because not every partner will agree with the approach, and many end users will certainly churn. It also requires compliance with existing personal data laws.
|
||||
|
||||
The third option is the most effective one in technical terms as it allows to put the ball in the malefactor's court: it is now them who need to invent how to learn if the robot was detected. But from the moral point of view (and from the legal perspective as well) this method is rather questionable, especially if we take into account the probability of false-positive signals, meaning that some real users will get the fake data.
|
||||
The third option is the most effective one in technical terms as it allows putting the ball in the malefactor's court: it is now up to them to figure out how to determine if the robot was detected. But from a moral point of view (and from a legal perspective as well) this method is rather questionable, especially if we take into account the probability of false-positive signals, meaning that some real users will get fake data.
|
||||
|
||||
Thereby, you have only one method that really works: filing complaints to hosting providers, ISPs, or law enforcement authorities. Needless to say, this brings certain reputational risks, and the reaction time is rather not lightning fast.
|
||||
Therefore, you have only one method that truly works: filing complaints with hosting providers, ISPs, or law enforcement authorities. Needless to say, this brings certain reputational risks, and the reaction time is far from lightning fast.
|
||||
|
||||
In most cases, you're not fighting fraud — you're actually increasing the cost of the attack, simultaneously buying yourself enough time to make administrative moves against the perpetrator. Preventing API misusage completely is impossible as malefactors might ultimately employ the expensive but bulletproof solution — to hire real people to make the requests to the API on real devices through legitimate applications.
|
||||
In most cases, you're not fighting fraud — you're actually increasing the cost of the attack, simultaneously buying yourself enough time to take administrative actions against the perpetrator. Preventing API misuse completely is impossible as malefactors might ultimately employ the expensive but bulletproof solution — hiring real people to make the requests to the API on real devices through legitimate applications.
|
||||
|
||||
An opinion exists, which the author of this book shares, that engaging in this sword-against-shield confrontation must be carefully thought out, and advanced technical solutions are to be enabled only if you are one hundred percent sure it is worth it (e.g., if they steal real money or data). By introducing elaborate algorithms, you rather conduct an evolutional selection of the smartest and most cunning cybercriminals, counteracting to whom will be way harder than to those who just naïvely call API endpoints with `curl`. What is even more important, in the final phase — i.e., when filing the complaint to authorities — you will have to prove the alleged ToS violation, and doing so against an advanced fraudster will be problematic. So it's rather better to have all the malefactors monitored (and regularly complained against), and escalate the situation (i.e., enable the technical protection and start legal actions) only if the threat passes a certain threshold. That also implies that you must have all the tools ready, and just keep them below fraudsters' radars.
|
||||
An opinion exists, which the author of this book shares, that engaging in this sword-against-shield confrontation must be carefully thought out, and advanced technical solutions are to be enabled only if you are one hundred percent sure it is worth it (e.g., if the fraudsters steal real money or data). By introducing elaborate algorithms, you effectively conduct an evolutionary selection of the smartest and most cunning cybercriminals, who will be much harder to counteract than those who just naïvely call API endpoints with `curl`. Furthermore, in the final phase, when filing a complaint with authorities, you'll need to prove the alleged ToS violation, which can be challenging when dealing with advanced fraudsters. So it is better to have all the malefactors monitored (and regularly reported), and to escalate the situation (i.e., enable technical protection and initiate legal actions) only if the threat passes a certain threshold. That also implies having all the tools ready and keeping them below infringers' radars.
|
||||
|
||||
Out of the author of this book's experience, the mind games with malefactors, when you respond to any improvement of their script with the smallest possible effort that is enough to break it, might continue indefinitely. This strategy, i.e., making fraudsters guess which traits were used to ban them this time (instead of unleashing the whole heavy artillery potential), annoys amateur “hackers” greatly as they lack hard engineering skills and just give up eventually.
|
||||
Based on the author of this book's experience, mind games with malefactors, where you respond to any improvement of their script with the smallest possible effort that is enough to break it, might continue indefinitely. This strategy, i.e., making fraudsters guess which traits were used to ban them this time (instead of unleashing the whole heavy artillery potential), greatly annoys amateur “hackers” as they lack hard engineering skills and eventually give up.
|
||||
|
||||
#### Dealing with Stolen Keys
|
||||
|
||||
Let's now move to the second type of unlawful API usage, namely using in the malefactor's applications keys stolen from conscientious partners. As the requests are generated by real users, captcha won't help, though other techniques will.
|
||||
Now let's address the second type of unlawful API usage, namely the use of keys stolen from conscientious partners in the malefactor's applications. Since the requests are generated by real users, captchas won't help, but other techniques will.
|
||||
|
||||
1. Maintaining metrics collection by IP addresses and subnets might be of use in this case as well. If the malefactor's app isn't a public one but rather targeted to some closed audience, this fact will be visible in the dashboards (and if you're lucky enough, you might also find suspicious `Referer`s, public access to which is restricted).
|
||||
1. Maintaining metrics collection by IP addresses and subnets might be useful in this case as well. If the malefactor's app isn't public but rather targeted to a closed audience, this fact will be visible on the dashboards (and if you're lucky enough, you might also find suspicious `Referer`s, public access to which is restricted).
|
||||
|
||||
2. Allowing partners to restrict the functionality available under specific API keys:
|
||||
* Setting the allowed IP address range for server-to-server APIs, allowed `Referer`s and application ids for client APIs
|
||||
* White-listing only allowed API functions for a specific key
|
||||
* Other restrictions that make sense in your case (in our coffee API example, it's convenient to allow partners to prohibit API calls outside of countries and cities they work in).
|
||||
* Other restrictions that make sense in your case (in our coffee API example, it's convenient to allow partners to prohibit API calls outside of the countries and cities they work in).
|
||||
|
||||
3. Introducing additional request signing:
|
||||
* For example, if on the partner's website, there is a form displaying the best lungo offers, for which the partners call the API endpoint like `/v1/search?recipe=lungo&api_key={apiKey}`, then the API key might be replaced with a signature like `sign = HMAC("recipe=lungo", apiKey)`. The signature might be stolen as well, but it will be useless for malefactors as they will be able to find only lungo with it.
|
||||
* Instead of API keys, time-based one-time passwords (TOTP) might be used. These tokens are valid for a short period of time only (typically, one minute), which makes using stolen keys much more complicated.
|
||||
* For example, if there is a form displaying the best lungo offers on the partner's website, for which the partners call the API endpoint like `/v1/search?recipe=lungo&api_key={apiKey}`, then the API key might be replaced with a signature like `sign = HMAC("recipe=lungo", apiKey)`. The signature might be stolen as well, but it will be useless for malefactors as they will only be able to find lungo with it.
|
||||
* Instead of API keys, time-based one-time passwords (TOTP) might be used. These tokens are valid for a short period of time only (typically, one minute), making it much more complicated to use stolen keys.
|
||||
|
||||
4. Filing complaints to the administration (hosting providers, app store owners) in case the malefactor distributes their application through stores or uses a diligent hosting service that investigates abuse filings. Legal actions are also an option, and even much so compared to countering user fraud, as illegal access to the system using stolen credentials is unambiguously outlawed in most jurisdictions.
|
||||
4. Filing complaints to the administration (hosting providers, app store owners) in case the malefactor distributes their application through stores or uses a diligent hosting service that investigates abuse filings. Legal actions are also an option, much more so compared to countering user fraud, as illegal access to the system using stolen credentials is unambiguously outlawed in most jurisdictions.
|
||||
|
||||
5. Banning compromised API keys; the partners' reaction will be, of course, negative, but ultimately every business will prefer temporary disabling of some functionality over getting a multi-million bill.
|
||||
5. Banning compromised API keys; the partners' reaction will be, of course, negative, but ultimately every business will prefer temporary disabling of some functionality over receiving a multi-million bill.
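The request signing described in the third item above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, SHA-256 is chosen as the hash function, and the API key acts as the shared secret. Note that Python's `hmac.new` takes the key first and the message second, the reverse of the argument order in the `HMAC("recipe=lungo", apiKey)` notation used in the text.

```python
import hashlib
import hmac


def sign_query(query: str, api_key: str) -> str:
    """Partner side: sign the exact query string with the secret API key."""
    return hmac.new(
        api_key.encode("utf-8"), query.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def verify_query(query: str, signature: str, api_key: str) -> bool:
    """API side: recompute the signature and compare in constant time."""
    expected = sign_query(query, api_key)
    return hmac.compare_digest(expected, signature)
```

A signature computed for `recipe=lungo` fails verification for any other query, which is why stealing it gives the malefactor nothing beyond that single search.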
|
||||
|
||||
|