Writing an ETag middleware in Go

May 27, 2018

The HTTP ETag header is something I’ve always seen flying by in HTTP traffic, but I hadn’t taken the time to figure it out until this semester. Essentially, it is an identifier, typically a hash, for the current version of the resource at a URL. If the resource changes, its ETag changes. An HTTP client that has cached a resource sends the ETag back to the server in an If-None-Match request header. The server can respond with a 304 Not Modified status and no body to indicate that the client’s hash matches the hash of the resource on the server (so the client can just use its cached copy), or it can respond with the full resource and its new ETag.
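To make that exchange concrete, here is a minimal sketch of a client making a conditional request. The names fetch and newDemoServer, the fixed "v1" tag, and the response body are all made up for illustration; none of this is Shuttle Tracker code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch issues a GET for url. If etag is non-empty, it is sent in the
// If-None-Match header so the server can answer 304 Not Modified.
// (fetch is a hypothetical helper for this example.)
func fetch(url, etag string) (status int, newETag string, body []byte, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", nil, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", nil, err
	}
	defer resp.Body.Close()
	body, err = io.ReadAll(resp.Body)
	return resp.StatusCode, resp.Header.Get("ETag"), body, err
}

// newDemoServer starts a throwaway server whose single resource never
// changes, so its ETag is a fixed, made-up value.
func newDemoServer() *httptest.Server {
	const tag = `"v1"` // ETag values are quoted strings
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", tag)
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified) // no body is sent
			return
		}
		fmt.Fprint(w, "pretend this is a large JSON document")
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	// First request: nothing cached yet, so the full body comes back.
	status, tag, body, err := fetch(srv.URL, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(status, tag, len(body))

	// Second request: send the tag back; the server replies 304 with no body.
	status, _, body, err = fetch(srv.URL, tag)
	if err != nil {
		panic(err)
	}
	fmt.Println(status, len(body))
}
```

Browsers do this bookkeeping automatically, which is why the web interface would not need to change to benefit from it.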

When it is first loaded, the Shuttle Tracker web interface makes requests for several things: routes, stops, vehicles, vehicle locations, and admin messages. This can add up to about 300–400 KB, depending on the day of the week and the time of day (certain routes are only active at night, and if there are fewer active vehicles, there is less location information to transfer). The interface also requests location updates every few seconds, and it requests updated stops, routes, and vehicle information several times per minute, transferring several megabytes of data in just a few minutes. This type of polling could be replaced with a WebSocket or other notification service so that the client knows when there is new data available on the server without having to ask, but that would require additional server-side changes that the Shuttle Tracker isn’t yet ready for. I set out to investigate how we can leverage client-side caching defined in the HTTP specification in order to reduce polling data usage as much as possible, until a push-based system is implemented.

First, why do we want to reduce the amount of data transferred? Several megabytes of data in a few minutes isn’t all that much, but it is wasteful, especially because a lot of it doesn’t change as frequently as we request it. Approximately 80% of Shuttle Tracker users are on mobile. This means that they likely have data caps or pay for the data that they use, and we can’t assume that they are on Wi-Fi. Additionally, data transfer on mobile devices can have a non-negligible impact on battery life. With an emphasis on mobile support driving most of our Shuttle Tracker design decisions, I decided that it was important to reduce the amount of data used in order to keep our users’ wallets heavier and their batteries fuller.

Given the existing API endpoints that didn’t implement any sort of HTTP caching headers, how would I begin? I first looked at sending a Last-Modified header with each response, but I quickly realized that this would involve modifying every request handler in order to determine when the resource was last modified. I sought a more generic method that I could implement at a higher level than each handler, so that ideally I wouldn’t have to modify any of the handlers. This meant I needed a solution that could be implemented as a Go net/http middleware handler, without any visibility into what each handler is actually doing. I read more about ETags, and they seemed to be a good fit because they can be calculated just by looking at the body that the server generates, without knowing how that body was generated.

To start, I created an implementation of the http.Handler interface that could be injected into the request chain by the request router, also known as middleware. It is global, meaning that every incoming request to the Shuttle Tracker gets processed by this handler. I also wrote an implementation of the http.ResponseWriter interface. This implementation consists of a buffer for the response body and a SHA-1 sum calculator.

When the ETag middleware gets a request, it passes the ETag ResponseWriter to the next handler in the request chain in place of the real one and then calls that handler. The handler writes to the ResponseWriter, which transparently writes to both the buffer and the hash function. Once the next handler completes its job, the ETag middleware checks whether the incoming request has an If-None-Match header. If it does, the middleware compares the computed hash with the contents of that header. If they match, the middleware discards the buffered response and returns just a 304 Not Modified response, indicating to the client that the resource it cached locally has not changed and can be reused without sending it over the network again. If the If-None-Match header was not set or its contents do not match the computed hash, the server sends the buffered response and sets the ETag header to the computed hash, which the client is free to cache and use in its next request for this resource.

This has resulted in tremendous data savings. Endpoints such as /routes that return about 270 KB of JSON are able to be cached by clients, and they only have to issue a request of 100–200 bytes to check that their cached copy is up to date. Frequently-polled endpoints like /updates only send 100–200 bytes unless they have new data for the client. And best of all, this only required a modification around existing handlers in the Shuttle Tracker API—no modification of the endpoint handlers or the web interface was necessary.

Check out the code on GitHub.