APIs are the backbone of information delivery: they serve data to consumers, which process and visualize it for end users. Recently, I fixed a bug related to the sort order in one of our internal API endpoints, and it raised a question: what are the downstream effects of retroactively changing an API? As engineers, we often find that changing business requirements force us to change our APIs.
Any API change must first be assessed for backwards compatibility, i.e., whether existing API clients will keep working without having to change. Backwards-compatible changes typically include adding new fields to objects, but not renaming or removing existing fields; the details depend on the API content type (e.g., JSON or protocol buffers). A change such as the sort order of an array of objects may or may not be backwards compatible. If the existing sort order was explicitly documented as a feature of the API, changing it is not backwards compatible. If nothing was documented about the sort order, it could potentially be changed without breaking existing clients, but confirming that requires further investigation, as discussed below.
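To make the field rules concrete, here is a minimal Python sketch that compares two versions of a JSON response schema. It assumes a hypothetical, flat representation of a schema as a map from field name to type name; real schema formats (JSON Schema, protobuf descriptors) carry much more detail. Added fields pass, while removed, renamed and retyped fields are flagged. Note that a check like this says nothing about sort order, which is exactly why that kind of change needs further investigation.

```python
def breaking_changes(old_schema: dict[str, str], new_schema: dict[str, str]) -> list[str]:
    """Describe each change that would break existing clients.

    Schemas are a hypothetical flat map of field name -> type name.
    """
    problems = []
    for field, old_type in old_schema.items():
        if field not in new_schema:
            # A removed field (or a rename, which looks like a removal
            # plus an addition) breaks clients that read the old name.
            problems.append(f"removed field: {field}")
        elif new_schema[field] != old_type:
            # Clients parsing the old type may fail on the new one.
            problems.append(f"changed type of {field}: {old_type} -> {new_schema[field]}")
    # Fields that exist only in new_schema are additions; existing
    # clients can ignore them, so they are not reported.
    return problems


old = {"id": "int", "name": "str"}
added = {"id": "int", "name": "str", "email": "str"}
renamed = {"id": "int", "full_name": "str"}

print(breaking_changes(old, added))    # []
print(breaking_changes(old, renamed))  # ['removed field: name']
```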
When a backwards-incompatible change is needed, there are typically two options:
- Leave the existing API unchanged, and introduce a new API with the change, typically by using an API versioning scheme such that the new API is seen as a new version of the existing API. This allows new clients, and old clients that have been updated, to use the new API while old clients continue to use the old API without introducing breaking changes.
- Identify all clients that will break when the API changes, update them and the API concurrently, and release everything in one coordinated deployment, effecting a “big bang” cut-over to the new API implementation.
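The first option can be sketched as path-based versioning, where the version is part of the route. This minimal Python example uses a plain dictionary as the router rather than any particular web framework; the `/v1/readings` and `/v2/readings` endpoints, the sample data and the two sort orders are all hypothetical. Old clients keep calling `/v1`, while updated clients opt in to the new behavior at `/v2`.

```python
from typing import Callable

# Hypothetical records returned by an endpoint, each with a timestamp.
READINGS = [
    {"id": 1, "taken_at": "2020-01-03"},
    {"id": 2, "taken_at": "2020-01-01"},
    {"id": 3, "taken_at": "2020-01-02"},
]


def list_readings_v1() -> list[dict]:
    # v1 keeps its original behavior: oldest first.
    return sorted(READINGS, key=lambda r: r["taken_at"])


def list_readings_v2() -> list[dict]:
    # v2 carries the backwards-incompatible change: newest first.
    return sorted(READINGS, key=lambda r: r["taken_at"], reverse=True)


# The version is part of the path, so existing clients are unaffected.
ROUTES: dict[str, Callable[[], list[dict]]] = {
    "/v1/readings": list_readings_v1,
    "/v2/readings": list_readings_v2,
}


def handle(path: str) -> list[dict]:
    handler = ROUTES.get(path)
    if handler is None:
        raise KeyError(f"no route for {path}")
    return handler()


print([r["id"] for r in handle("/v1/readings")])  # [2, 3, 1]
print([r["id"] for r in handle("/v2/readings")])  # [1, 3, 2]
```

Header-based or query-parameter versioning are common alternatives; whichever scheme is used, the important property is that the old handler stays untouched until its last client has migrated.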
The second option obviously requires much more work and coordination and, unless all clients can be force-upgraded at cut-over time, will still break clients during a transition period. Synchronized releases are viable when the only clients are web apps, but generally not when there are mobile app clients, since existing mobile installs can’t all be force-upgraded in sync with the server release. So it’s important to plan for API versioning ahead of time so that the first option is always available. And, of course, whenever possible, avoid making backwards-incompatible changes in the first place.
Here at Livongo, we use several tools and protocols to ensure that our APIs remain backwards compatible for both producers and consumers with every release.
For example, we use Splunk for centralized logging, which lets us trace routes and monitor logs for both producers and consumers. By tagging every API request with a unique identifier and recording its originating referrer, we can determine exactly which consuming application each request came from. This helps with both debugging and tracing routes and API calls.
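The request-tagging idea can be sketched in a few lines of Python. This is a hypothetical helper, not our production code: it reuses an incoming `X-Request-ID` header if present (a widespread convention rather than a standard) and otherwise generates a UUID, then emits the request ID, path and `Referer` header as a single JSON line that a log aggregator such as Splunk could index. It assumes header names have already been normalized, since real HTTP header names are case-insensitive.

```python
import json
import uuid


def tag_request(headers: dict[str, str], path: str) -> tuple[str, str]:
    """Attach a unique ID to a request and build a JSON log line for it.

    Hypothetical helper; assumes header names are already normalized.
    """
    # Reuse an upstream ID so one request can be followed across services;
    # otherwise mint a new random one.
    request_id = headers.get("X-Request-ID") or str(uuid.uuid4())
    record = {
        "request_id": request_id,
        "path": path,
        # HTTP spells this header "Referer"; it hints at the calling app or page.
        "referer": headers.get("Referer", "unknown"),
    }
    # One JSON object per line is straightforward for a log aggregator to index.
    return request_id, json.dumps(record, sort_keys=True)


rid, line = tag_request({"X-Request-ID": "abc123"}, "/v1/readings")
print(line)  # {"path": "/v1/readings", "referer": "unknown", "request_id": "abc123"}
```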
Because we work within a closed ecosystem, it’s possible to know all the producers and consumers of each API, since we control the API contract between them. When the only clients are web app clients, we can perform a synchronized release and mitigate the risk of a backwards-incompatible change. Teams agree on an API contract, which ensures that all the components can communicate correctly. With proper documentation and communication between stakeholders, we control both the producer and the consumer side of the API contract. If changes are necessary and the stakeholders agree, we can make them in the next release, ensuring that the producer and consumer changes ship at the same time.
For the issue I was looking into, it turned out that changing the sort order was a backwards-compatible change according to the API’s consumers. Although this did not follow the principles of forward compatibility or API versioning, we could agree on the change because we control both the producers and the consumers. Ultimately, though, the lesson is the need for more and better testing: adequate and accurate test cases could have prevented this situation. Had the change not been backwards compatible, it would have required much more work, as discussed above.
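A test that pins the sort order is the kind of case that would have caught this earlier. The sketch below uses Python’s built-in `unittest`; the `list_readings` function, its fields and the newest-first order are hypothetical stand-ins for whatever order the API actually guarantees. Asserting on the exact sequence of IDs, rather than only on which results come back, is what makes an accidental ordering change fail the build.

```python
import unittest


def list_readings(readings: list[dict]) -> list[dict]:
    """Hypothetical endpoint logic: return readings newest first."""
    return sorted(readings, key=lambda r: r["taken_at"], reverse=True)


class ListReadingsSortOrderTest(unittest.TestCase):
    def test_newest_first(self):
        readings = [
            {"id": 1, "taken_at": "2020-01-01T08:00:00"},
            {"id": 2, "taken_at": "2020-01-03T08:00:00"},
            {"id": 3, "taken_at": "2020-01-02T08:00:00"},
        ]
        ids = [r["id"] for r in list_readings(readings)]
        # Pin the exact order so any change to it fails the test.
        self.assertEqual(ids, [2, 3, 1])

    def test_empty_input(self):
        self.assertEqual(list_readings([]), [])
```

Run it with a test runner such as `python -m unittest`.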
A special thanks to Harry, Subhayu and Chris for helping me review and revise this post.