HTTP’s New Method For Data APIs: HTTP QUERY
In our data-driven world, everything is an API. Most of these APIs are RESTful APIs, running atop the good old HTTP protocol. Funny enough, however, the protocol hasn’t changed all that much to adapt to this world.
HTTP was originally built for fetching resources for web page rendering. Its main methods (a.k.a. verbs) — GET, HEAD, and POST — have been around since HTTP/1.0, with more elaborate methods for resource state management such as PUT and DELETE added in HTTP/1.1.
One thing was left unattended: data queries. I mean, which application today doesn’t need to invoke queries and work on datasets? We lack a safe, idempotent request method to run queries. This need drove the specification of a new HTTP method: QUERY.
HTTP GET and POST aren’t suitable for queries
Until now, people have taken one of two paths to invoking queries over HTTP: using the GET or the POST methods. But these methods are not exactly a suitable solution for queries. Let’s see why.
Using HTTP GET means putting the entire query in the URI. If you run long and complex queries, this can become problematic, because URIs are subject to size limits in practice. Furthermore, different intermediary nodes on the request path, such as API gateways and web servers, may impose different limits, so you can’t really know what the end-to-end limit would be for a given call (not to mention that queries may be routed differently on different occasions). On top of that, the data you wish to relay in your query has to conform to URI encoding constraints, which can limit what and how you express your data, fields, labels and so forth. There’s also a security concern with query parameters: URIs are more likely than request bodies to be logged, cached or otherwise recorded along the way, so any query terms they carry are more exposed.
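To make the encoding overhead concrete, here is a minimal Python sketch that percent-encodes a query for use in a GET query string. The parameter name `q`, the `where` clause in the sample query, and the 2,000-character budget are all assumptions made up for illustration; they are not limits or names taken from any specification.

```python
from urllib.parse import urlencode

# Hypothetical query; the "where" clause is invented to include characters
# that are not allowed unescaped in a URI.
raw_query = "select surname, givenname, email where city = 'São Paulo' limit 10"

# Assumed budget for illustration only; real limits vary by client, proxy and server.
ASSUMED_URI_BUDGET = 2000


def build_get_uri(base: str, query: str) -> str:
    """Embed the query in the URI as a percent-encoded `q` parameter (assumed name)."""
    return f"{base}?{urlencode({'q': query})}"


uri = build_get_uri("http://example.org/contacts", raw_query)
encoded_part = uri.split("?", 1)[1]

print(uri)
print(len(raw_query), "characters raw,", len(encoded_part), "characters encoded")
print("fits assumed budget:", len(uri) <= ASSUMED_URI_BUDGET)
```

Even this short query grows once commas, quotes, the equals sign and the non-ASCII character are escaped, and whether the final URI fits depends on limits the caller usually cannot see.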
HTTP POST overcomes the query size limit of GET, as well as the exposure concern, by embedding the query in the request payload rather than in the URI. POST, however, is not defined as a safe or idempotent method, which is problematic for caching and automatic retries: a client or intermediary cannot assume that repeating the request is harmless. A POST may create a resource for every query invocation, which is then returned with the response, and it may also alter the state of the target resource as part of the call. After all, remember that the original HTTP methods were made for handling and fetching resources. Furthermore, generating a resource for each invocation incurs a performance penalty.
These shortcomings led to awkward bypasses such as a sequence of GET-POST-GET calls to query, create the resource and then fetch it. But these only made clear the need for a new designated method for queries.
Enter HTTP QUERY
The HTTP QUERY method was drafted at the IETF specifically to address these common data query use cases (not to be confused with the URL query string, the part of the standard URL syntax that comes after the ‘?’ mark). In fact, it’s a reincarnation of the earlier WebDAV SEARCH RFC. The latest IETF draft was released last month, shedding more light on the new method.
First, the resource on which the server needs to perform the query is identified by the target URI, quite similarly to GET.
Then, similarly to POST, the actual query is passed in the content of the request (and not in the URI). However, unlike POST, QUERY requests are safe and idempotent, as they do not alter the state of the targeted resource. Rather than creating a new resource on every invocation, the server can expose the query results through a stable resource.
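One practical consequence of idempotency is that a client can retry a failed request automatically. The following Python sketch shows what such a retry policy might look like. The function name `can_auto_retry` and the default of three attempts are hypothetical, and treating QUERY as idempotent relies on the draft rather than on a published standard.

```python
# Methods defined as idempotent in RFC 9110, plus QUERY as proposed in the IETF draft.
IDEMPOTENT_METHODS = frozenset(
    {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE", "QUERY"}
)


def can_auto_retry(method: str, attempts_so_far: int, max_attempts: int = 3) -> bool:
    """Return True if a failed request may be resent without asking the caller.

    Only idempotent methods qualify, because repeating them is meant to have
    the same effect on the server as sending them once.
    """
    return method.upper() in IDEMPOTENT_METHODS and attempts_so_far < max_attempts


print(can_auto_retry("QUERY", attempts_so_far=1))  # idempotent, attempts remain
print(can_auto_retry("POST", attempts_so_far=1))   # not idempotent by definition
```

With POST, the same decision would have to be made per endpoint by someone who knows what the server actually does with the request.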
Furthermore, the QUERY response is cacheable. This means that if the same query is run on the same resource in a subsequent QUERY operation, it can be served from the cache (as long as the cached result is fresh and has not been invalidated).
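Since the query travels in the request content, a cache cannot key QUERY responses on the URI alone, and as I read the draft, the request content has to be part of the cache key. Below is a rough Python sketch of such a key. The function name `query_cache_key` and the choice of SHA-256 are my own assumptions, and the sketch skips any normalization of the query text, so two queries that differ only in whitespace get different keys.

```python
import hashlib


def query_cache_key(target_uri: str, content_type: str, body: bytes) -> str:
    """Derive a cache key from the method, target URI, media type and query content."""
    digest = hashlib.sha256()
    parts = (b"QUERY", target_uri.encode("utf-8"), content_type.lower().encode("utf-8"), body)
    for part in parts:
        # Length-prefix each part so that different splits of the same bytes
        # cannot produce the same digest input.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


query = b"select surname, givenname, email limit 10"
key_a = query_cache_key("http://example.org/contacts", "example/query", query)
key_b = query_cache_key("http://example.org/contacts", "example/query", query)
key_c = query_cache_key("http://example.org/contacts", "example/query", query + b"0")

print(key_a == key_b)  # same query on the same resource: same key
print(key_a == key_c)  # different query content: different key
```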
The QUERY method’s response could be a direct response with the result set (with an HTTP 200) or an indirect response, redirecting to another resource (with HTTP 3xx Redirection specifying an alternate Request URI). The response could also be no-content in case the query yields no results (HTTP 204).
Let’s look at the canonical examples from the specification, querying for up to 10 people with their respective surname, given name and email address, using the following plain-text, SQL-like query (other query languages can be used, of course):
select surname, givenname, email limit 10
HTTP QUERY with a direct response
Request:
QUERY /contacts HTTP/1.1
Host: example.org
Content-Type: example/query
Accept: text/csv

select surname, givenname, email limit 10
Response with HTTP 200 and the query’s result set:
HTTP/1.1 200 OK
Content-Type: text/csv

surname, givenname, email
Smith, John, john.smith@example.org
Jones, Sally, sally.jones@example.com
Dubois, Camille, camille.dubois@example.net
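If you want to try this exchange yourself, the Python standard library is enough for a toy setup, since `http.server` dispatches any method name to a matching `do_<METHOD>` handler and `http.client` accepts arbitrary method tokens. The sketch below is only a local experiment under my own assumptions: the handler class `ContactsHandler`, the helper `run_query_demo` and the canned CSV data are hypothetical, and the server does not actually parse the query.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned result mirroring the example response; a real server would evaluate the query.
CSV_RESULT = (
    "surname, givenname, email\n"
    "Smith, John, john.smith@example.org\n"
    "Jones, Sally, sally.jones@example.com\n"
    "Dubois, Camille, camille.dubois@example.net\n"
).encode("utf-8")


class ContactsHandler(BaseHTTPRequestHandler):
    def do_QUERY(self):
        # The query arrives in the request content, not in the URI.
        length = int(self.headers.get("Content-Length", 0))
        query = self.rfile.read(length).decode("utf-8")
        if self.path != "/contacts" or not query.strip():
            self.send_error(400, "unsupported target or empty query")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(CSV_RESULT)))
        self.end_headers()
        self.wfile.write(CSV_RESULT)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def run_query_demo():
    """Start a throwaway local server, send one QUERY request, return the response."""
    server = HTTPServer(("127.0.0.1", 0), ContactsHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(
            "QUERY",
            "/contacts",
            body=b"select surname, givenname, email limit 10",
            headers={"Content-Type": "example/query", "Accept": "text/csv"},
        )
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = run_query_demo()
    print(status, content_type)
    print(body)
```

Keep in mind that proxies, gateways and frameworks that sit between a real client and server may reject methods they don’t recognize, so a local test like this says little about end-to-end support.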
As you can see, in this example the result uses a CSV (comma-separated values) notation, though other formats can be specified for the content type.
HTTP QUERY with an indirect response
Request:
QUERY /contacts HTTP/1.1
Host: example.org
Content-Type: example/query
Accept: text/csv

select surname, givenname, email limit 10
Response with HTTP 303 redirect referring to a new URI:
HTTP/1.1 303 See Other
Location: http://example.org/contacts/query123

Fetch Query Response:

GET /contacts/query123 HTTP/1.1
Host: example.org
Response from the subsequent HTTP GET to the redirected URI:
HTTP/1.1 200 OK
Content-Type: text/csv

surname, givenname, email
Smith, John, john.smith@example.org
Jones, Sally, sally.jones@example.com
Dubois, Camille, camille.dubois@example.net
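On the client side, the indirect flow boils down to one decision: did the server answer with a 303 and a Location, and if so, which URI should the follow-up GET target? Here is a small Python sketch of that decision. The function name `follow_up_request` is hypothetical, and the headers are assumed to arrive as a plain dict with a `Location` key spelled exactly like that, which real HTTP libraries handle case-insensitively.

```python
from urllib.parse import urljoin


def follow_up_request(request_url: str, status: int, headers: dict):
    """Return (method, url) for the follow-up to a QUERY response, or None if none is needed."""
    location = headers.get("Location")
    if status == 303 and location:
        # 303 See Other: the result is retrieved with a GET on the Location URI.
        # urljoin also resolves a relative Location against the original request URL.
        return ("GET", urljoin(request_url, location))
    return None


print(follow_up_request(
    "http://example.org/contacts",
    303,
    {"Location": "http://example.org/contacts/query123"},
))
print(follow_up_request("http://example.org/contacts", 200, {}))
```

A nice side effect of this flow is that the follow-up is an ordinary GET, so the result URI can be cached or shared like any other resource.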
It’s worth noting that the specification also supports conditional query semantics, wherein the server side will run the query only if a specified condition is met by the target resource. The conditions are expressed with the standard conditional request header fields: If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range.
Endnote
Our data- and API-driven systems have given rise to the HTTP QUERY method. I believe the combination of the clear widespread need, together with the power of open standards, will push it across the finish line and converge our industry around it. Check out the IETF specification and get involved in the open discussion of the working group on GitHub.