DINA API Standard Discussion Notes

DINA Web API Standard - version 1.0 DRAFT <== USE THIS PAGE FOR FURTHER EDITS OF THE DRAFT STANDARD !!!

This document

Authors

DINA consortium - Systems Engineering Task Force

Version

0.1

Miscellaneous discussion items

  • Use Media Server when we refer to examples ?
  • ...

DINA REST API standard - Background, resources, conventions

Context & Objectives

The DINA REST API standard provides guidelines for the implementation of DINA-compliant RESTful APIs for modules and systems developed by DINA partners or any contributor of modules and systems that are intended for integration into DINA system assemblies. Specifically, this API standard is developed with the intent to:

  • provide a reliable contract between DINA modules
  • enable a simple assembly of modules and systems developed by partners in the DINA consortium
  • enable the integration of external modules and systems into DINA assemblies

Furthermore, this standard should be seen as an invitation to all interested parties to contribute to the DINA tool set and help grow the DINA ecosystem and community.

The guidelines in this document provide a set of constraints that have to be observed in order for a contribution to be considered DINA-compliant. In the definition of these constraints the authors were guided by the following principles:

  • The DINA consortium wants to encourage a fast-growing and diverse ecosystem of DINA-compliant contributions, thus the standard should ensure a reliable contract for integration of a broad range of modules, but should not be overly restrictive.
  • While examples or recommendations for tools, frameworks or technologies may be given in the standard, it must not impose any implementation technologies on contributors.
  • If open normative standards exist that cover specific requirements of the DINA REST API standard, it should build on and incorporate these standards instead of defining custom solutions.

Document conventions

This document outlines requirements and recommendations for web APIs exposed by modules and services in the DINA project. The following conventions are applied to distinguish between mandatory and optional features of DINA-compliant web APIs:

  • MUST - the usage of this term indicates features of the standard that any implementation is required to fulfil in order to be considered DINA-compliant.
  • SHOULD - indicates optional features that are highly recommended for implementation, but are not required; if these features are implemented they MUST follow the recommendations outlined in the standard.
  • COULD - indicates optional features that are considered beneficial for the service, but are not required; if these features are implemented they MUST follow the recommendations outlined in the standard.

DINA-compliant web APIs may of course provide features that are outside the contract covered by this standard, but they MUST NOT break any of the recommendations covered by mandatory or optional features defined by this standard.

DINA REST API standard - Specification

Basics

All DINA-compliant APIs follow basic RESTful practices with regard to the use of HTTP as a protocol, conventions of mapping HTTP methods to CRUD operations, structuring URIs to address API endpoints and the supported request and response formats (JSON, XML).


Comment by: Christopher Lewis

On the phone, I suggested some level of requirements around structure of JSON/XML documents.

Comment by: Christopher Lewis

For JSON:

At a minimum, I would suggest that JSON returned from DINA compliant services MUST satisfy the property name guidelines from the JSON Style Guide: https://google-styleguide.googlecode.com/svn/trunk/jsoncstyleguide.xml#Property_Name_Guidelines. This would ensure we don't have JSON results where the property name is the unique id of the marshalled object.

e.g. Not permitted:

 {"460932":
    {"taxid":460932,"taxon":"Aspergillus ochraceus"}
 }

Instead, we would need something like:

 {"taxa":
    {"taxid":460932,"taxon":"Aspergillus ochraceus"}
 }
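
The property-name rule above could be checked mechanically. The following is a minimal sketch, assuming that "the property name is the unique id of the object" can be approximated as "the key consists only of digits". The helper name find_id_like_keys is hypothetical and not part of any DINA module or of the JSON Style Guide.

```python
import json


def find_id_like_keys(document, path=""):
    """Return the paths of all property names that consist only of digits.

    Assumption: a purely numeric key is treated as an embedded object id.
    """
    offenders = []
    if isinstance(document, dict):
        for key, value in document.items():
            key_path = f"{path}/{key}"
            if key.isdigit():
                offenders.append(key_path)
            offenders.extend(find_id_like_keys(value, key_path))
    elif isinstance(document, list):
        for index, item in enumerate(document):
            offenders.extend(find_id_like_keys(item, f"{path}[{index}]"))
    return offenders


not_permitted = json.loads('{"460932": {"taxid": 460932, "taxon": "Aspergillus ochraceus"}}')
permitted = json.loads('{"taxa": {"taxid": 460932, "taxon": "Aspergillus ochraceus"}}')

print(find_id_like_keys(not_permitted))  # → ['/460932']
print(find_id_like_keys(permitted))      # → []
```

A real check would probably need a broader notion of "id-like" (e.g. UUIDs), which this sketch does not attempt.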

Comment by: Christopher Lewis

For XML, I'm not quite sure there's a standard we can point to...

However, the example I referred to on the phone, which I'd like us to somehow forbid (or at least recommend against), is something like the following, where the id of each object is embedded in the element name:

 <resultset>
   <row_1> ... </row_1>
   <row_2> ... </row_2>
   <row_3> ... </row_3>
 </resultset>

Preferred would, I believe, be the following, where it is clear that all elements of the result set are of the same type:

 <resultset>
   <row id="1"> ... </row>
   <row id="2"> ... </row>
   <row id="3"> ... </row>
 </resultset>

However, I'm not quite sure how we capture the above or whether there is an XML best practices document (or other) that we can point to.
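
To illustrate why the attribute-based form is easier to consume, here is a small sketch using Python's standard xml.etree.ElementTree module. The function name build_resultset and the row contents are made up for illustration; only the element and attribute names come from the example above.

```python
import xml.etree.ElementTree as ET


def build_resultset(rows):
    """Serialise rows as <row id="..."> elements inside one <resultset>."""
    root = ET.Element("resultset")
    for row_id, text in rows:
        row = ET.SubElement(root, "row", {"id": str(row_id)})
        row.text = text
    return ET.tostring(root, encoding="unicode")


xml_text = build_resultset([(1, "first"), (2, "second"), (3, "third")])
print(xml_text)

# Because every element has the same name, a client can select all rows
# with a single query instead of guessing names such as row_1, row_2, ...
parsed = ET.fromstring(xml_text)
ids = [row.get("id") for row in parsed.findall("row")]
print(ids)  # → ['1', '2', '3']
```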


Comment by: Christopher Lewis

Another question - do we require a schema or DTD for XML results?

Endpoint definitions

HTTP methods

DINA-compliant APIs MUST follow a common usage of HTTP methods. An agreed mapping of HTTP methods to common CRUD operations ensures transparent behaviour. The DINA API standard assumes the mappings listed in the following table. To illustrate these mappings the table refers to a DINA-compliant media server module, indicated by the reserved word (URI path component) "media", and a data object (media type) handled by the web service, indicated by "image".

| HTTP METHOD | GET | POST | PUT | DELETE | HEAD |
|---|---|---|---|---|---|
| CRUD OP | READ | CREATE | UPDATE | DELETE | n.a. |
| /media/image/1234 | Return data object of type image and with id '1234'. | Error? | Error? - or create data object with id '1234'. Suitable when the client knows the identifier, e.g. a UUID. | Delete media data object of type image with id '1234'. | Return only meta-data section for corresponding GET request. |
| /media/image | Return list of all images. | Error? | Error? | Error? | -"- |
| /media/image/1234?license="CC BY" | Error? | Error? | Update license property of media data object of type image with id '1234'. | Error? | -"- |
| /media/image/search?id=1000,2000&taxon=Thaumotopea | Return data object of type image, within id range 1000-2000 and taxon "Thaumotopea". | Error? | Error? | Error? | -"- |
| ??? | ... | ... | ... | ... | -"- |
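
The method-to-CRUD mapping in the first two rows of the table can be written down as a simple lookup. This is only a sketch of the mapping itself; the names CRUD_BY_METHOD and crud_operation are hypothetical, and the open "Error?" cells of the table are deliberately not modelled.

```python
# Mapping taken from the first two rows of the table above.
CRUD_BY_METHOD = {
    "GET": "READ",
    "POST": "CREATE",
    "PUT": "UPDATE",
    "DELETE": "DELETE",
    "HEAD": None,  # "n.a." in the table: HEAD has no CRUD counterpart
}


def crud_operation(method):
    """Return the CRUD operation for an HTTP method, or raise for unmapped ones."""
    key = method.upper()
    if key not in CRUD_BY_METHOD:
        raise ValueError(f"HTTP method not covered by the mapping: {method}")
    return CRUD_BY_METHOD[key]


print(crud_operation("get"))   # → READ
print(crud_operation("HEAD"))  # → None
```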

Similar examples can be found in https://github.com/wet-boew/wet-boew-api-standards[Tools 1] or http://www.oracle.com/technetwork/articles/javase/index-137171.html[Tools 2].

Comment by: Stefan Daume

This needs some discussion. What practices do other DINA partners currently use? How specific should we be here? Strictly speaking the API standard should not impose (any/too many) constraints on the specific service data model, but we should provide some best practices regarding the consistent use of HTTP methods. As an example for discussion: Should GET /media/image/1234?license="CC BY" be interpreted as an implicit search or should it return an error code?

Comment by: Christopher Lewis

My impression is that the implicit search is the standard / best practice, whereas the use of a search parameter is non-standard.

Comment by: Thomas Stjernegaard Jeppesen

I would say that GET /media/image/1234?license="CC BY" should raise an error since it's a query performed on a single object. As a developer, I might spend time wondering why I only got a single object in the response. An error would be better for debugging. Normally I would think that PUT /media/image/1234?license="CC BY" is also incorrect, since a PUT request should send data in the request body and not use URL parameters. However, I see that the example is a special case since the URL parameters here seem to be used for metadata on an image file. A PUT request should in general update the entire resource. In the DINA ecosystem, I think that PUT requests could also be used for creating a resource, similar to POST. This would happen when the identifier is known by the client, e.g. a UUID or an accession number issued to the client.


Comment by: Stefan Daume

Should we define a list of consistent HTTP response codes that should be observed? The expected response codes could be integrated in this table for each example.

Comment by: Christopher Lewis

I would expect so.

Comment by: Thomas Stjernegaard Jeppesen

I agree.


Comment by: Christopher Lewis

Regarding: /media/image/search?id=1000,2000&taxon=Thaumotopea

Question: Is there a standard or convention in support of the use of a comma separated list to specify a range of values, or should that be two separate parameters (e.g. minId, maxId)? In my mind, the comma separated list could have semantics (id = 1000 or id = 2000) or it could have semantics (id = 1000 and id = 2000).

The discussion here: http://stackoverflow.com/questions/207477/restful-url-design-for-search seems to suggest a convention of using a comma separated list to indicate a logical "or".

Comment by: Thomas Stjernegaard Jeppesen

I would intuitively think of the comma separated list as a set representation, i.e. /media/image/search?year=2002,2003,2011&taxon=Thaumotopea for a subset query and /media/image/search?minyear=2002&maxyear=2011&taxon=Thaumotopea for a range query. I think that is also what the link Chris provided above suggested.
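
The set-versus-range distinction could be parsed roughly as follows. This is a sketch under the assumption that 'year' carries a comma separated set and that 'minyear'/'maxyear' bound an inclusive range, as in the two example URLs; the function name parse_year_filter and its return shape are invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def parse_year_filter(url):
    """Interpret year parameters as either a set or a range.

    Assumptions: 'year' holds a comma-separated set, while 'minyear' and
    'maxyear' bound an inclusive range.
    """
    params = parse_qs(urlsplit(url).query)
    if "year" in params:
        years = {int(y) for y in params["year"][0].split(",")}
        return ("set", years)
    if "minyear" in params and "maxyear" in params:
        return ("range", (int(params["minyear"][0]), int(params["maxyear"][0])))
    return ("none", None)


print(parse_year_filter("/media/image/search?year=2002,2003,2011&taxon=Thaumotopea"))
print(parse_year_filter("/media/image/search?minyear=2002&maxyear=2011&taxon=Thaumotopea"))
```

The first call yields a set of three years, the second a (2002, 2011) range, so a service never has to guess which semantics the client intended.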


Comment by: Nazir El-Kayssi

Performing a GET on /media/image should return a list of IDs of images. To get a specific image from the list, you'd then use /media/image/1234. The list should also be paged using the default limit and offset parameters if they are not specified.


Comment by: Nazir El-Kayssi

Additional issues mentioned by Glen Newton:

1 – Count: we need to be able to count the number of results a web service call (all or query) will return. Add /count to the noun: /noun/count returns the total number of nouns. For a search, use /noun/count?queryStuff. See: http://stackoverflow.com/a/5394127/459050

2 – Sorting: Add “sort=Field0,Field1,…fieldX” to the URI. If a field does not exist (or is not a sort key?), the service should return “400 Bad Request”.
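
The sorting rule could be validated along these lines. This is a minimal sketch: the function name resolve_sort, the set of sortable fields and the decision to also reject an empty sort parameter are assumptions, not part of the proposal above.

```python
def resolve_sort(sort_param, sortable_fields):
    """Return (status_code, fields) for a 'sort=Field0,Field1' parameter.

    On success the fields are returned in request order; on failure the
    unknown fields are returned so they can be reported to the client.
    Assumption: an empty sort parameter is also answered with 400.
    """
    fields = [f.strip() for f in sort_param.split(",") if f.strip()]
    unknown = [f for f in fields if f not in sortable_fields]
    if not fields or unknown:
        return 400, unknown
    return 200, fields


SORTABLE = {"id", "taxon", "year"}  # hypothetical sort keys of a service
print(resolve_sort("taxon,year", SORTABLE))    # → (200, ['taxon', 'year'])
print(resolve_sort("taxon,colour", SORTABLE))  # → (400, ['colour'])
```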


Comment by: Guido Sautter

I think the GBIF API provides pretty good examples of how to use URLs and HTTP methods for a broad variety of purposes, see http://www.gbif.org/developer/summary . We might all have a good look at their API before engraving anything in stone here.

HTTP header use

DINA-compliant web APIs MUST support at least the headers indicating the requested and supported media types. For HTTP requests this is Accept: (e.g. Accept: application/json) and for HTTP responses this is Content-Type: (e.g. Content-Type: application/json). The Accept: header is described by W3C RFC2616 Section 14.1 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1)[Standards 1].

The supported and indicated media types must be recognized Multipurpose Internet Mail Extensions (MIME) types registered with the Internet Assigned Numbers Authority (IANA)'s media type catalogue. For most standard file types IANA's media type catalogue (http://www.iana.org/assignments/media-types/media-types.xhtml) [Standards 2] will provide the appropriate type definition.
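
A service has to turn the Accept: request header into a Content-Type: for the response. The following is a simplified sketch of that negotiation, assuming a service that offers JSON and XML. It honours q-values but not the full precedence rules of the HTTP specification (e.g. more specific ranges overriding wildcards), and all names are illustrative.

```python
SUPPORTED = ["application/json", "application/xml"]  # assumed offerings


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_content_type(accept_header, supported=SUPPORTED):
    """Return the Content-Type to respond with, or None (HTTP 406)."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type in ("*/*", "application/*"):
            return supported[0]
        if media_type in supported:
            return media_type
    return None


print(choose_content_type("application/xml;q=0.9, application/json"))
print(choose_content_type("text/html"))
```

The first call selects application/json because it carries the higher (default) quality value; the second returns None, which a service would typically answer with "406 Not Acceptable".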

Comment by: Stefan Daume

This proposed standard includes language headers as well: https://github.com/wet-boew/wet-boew-api-standards Should we include this?

Comment by: Christopher Lewis

If we will be supporting multi-lingual content, then it seems like it should be desirable to permit the user to indicate their linguistic preferences in this way. Did you see anything to suggest that this precludes use of an explicit language parameter to force the return of content in a particular language?

Comment by: Guido Sautter

The language parameter should be optional, with a fallback default for textual content (English text is better than nothing at all), and should be ignored for images and photographs that have no language tags because they don't apply. Maybe we should consider a wildcard language tag for this purpose?

URI structure

Basics

DINA-compliant APIs MUST indicate base URLs for API endpoints with an "api" subdomain, thus http://api.dinamodule.net/ would be considered compliant while http://www.dinamodule.net/api/ would not be compliant under this scheme.

Comment by: Christopher Lewis

Is this a standard convention? It strikes me that the two are equivalent and with an Apache rewrite / redirect, you can serve the second from the first. So I'm not sure why this would be a must...

Comment by: Guido Sautter

I'm not sure at all we should make this a MUST, either. The subdomain approach is a lot harder to configure than the '/api' approach. And with regard to adding a DINA compliant REST API to existing systems, even something like '/dina/api' or '/<someWebApp>/dina/api' should be allowed ... the one MUST I see here is the provision of an endpoint URL, which request paths can be appended to.


DINA-compliant APIs MUST include a version indicator that is appended directly to the endpoint base URI, e.g. http://api.dinamodule.net/v1 (see also the section on versioning schemes).

Comment by: Christopher Lewis

Agreed. Do we make a recommendation on minor version numbers (e.g. /v1.1 or /v1_1)?

On the phone, I believe we settled on: DINA-compliant APIs SHOULD present API endpoints with an "api" subdomain and MUST include a version.

I propose that there are two scenarios. 1) Each module is hosted at a unique domain, e.g. "http://api.dinamodule.net/version/" and 2) Modules are consolidated and provided via a single domain, e.g. "http://api.host/module/version". Both of these scenarios can be accomplished using rewrite/redirect/proxy directives from an original URL of "http://localhost:port/module/version".

Passing parameters

DINA-compliant APIs MUST accept parameters as key-value pairs following standard URI patterns, e.g. http://api.dinamodule.net/v1/media/image/search?id=1234.

Comment by: Stefan Daume

What approach should we take with regard to passing complex request data structures? What recommendations should the DINA API standard provide?

Comment by: Christopher Lewis

We currently permit implicit ands in our queries, including the use of the same term more than one time, e.g. /v1/sequence?country=Can&country=ada, which would find all sequences with associated country metadata containing "%Can%" and "%ada%" (our default is to wildcard search terms).
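
The implicit "and" over repeated parameters can be sketched as below, assuming the parameters arrive in a standard query string (after a "?") and that the wildcard match is a plain, case-sensitive substring test. The function name matches_all_terms and the record layout are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def matches_all_terms(url, record):
    """True if every query term occurs as a substring of the record field.

    Assumption: terms are wildcarded on both sides ("%Can%") and compared
    case-sensitively; repeated parameters are combined with a logical AND.
    """
    params = parse_qs(urlsplit(url).query)  # repeated keys become lists
    return all(
        term in record.get(field, "")
        for field, terms in params.items()
        for term in terms
    )


url = "/v1/sequence?country=Can&country=ada"
print(matches_all_terms(url, {"country": "Canada"}))    # → True
print(matches_all_terms(url, {"country": "Canberra"}))  # → False
```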

We've started to talk about representing more complex queries, and came up with something involving the creation of multiple sub-criteria and then specifying the operators and orders to be applied to the criteria. However, we haven't needed to flesh that out yet...

There's an interesting discussion about the treatment of filters as resources here: http://stackoverflow.com/questions/1296421/rest-complex-applications/1297275#1297275. This permits you to PUT a complex filter and then refer to it by an identifier going forward. This ensures that a complex filter can be used in a bookmarkable / RESTful URL.

Reserved words

  • Reserved words indicating a module/service endpoint:
    • "collections" and/or "collection"
    • "dna"
    • "taxonomy"
    • "media"
    • "report"
    • "printing"
    • "references" and/or "literature"
    • "geography"
    • "preparation" and/or "storage"
    • "determination"
    • "transaction"
    • ...
  • Reserved words indicating data concepts:
    • "specimen" and/or "object"
    • "taxon"
    • "organisation" and/or "institution"
    • "event"
    • "locality"
    • "method" and/or "measurement"
    • "agent"
    • "person"
    • "user"
    • "annotation"
    • ...

Comment by: Stefan Daume

I think we need to define two sets of reserved words as indicated above. There will be overlaps however (e.g. "user" for a dedicated user management module and as a data entity). What is definitely required initially is a set of reserved words for existing or planned modules - and possibly envisaged ones.


Comment by: Falko Göckler

I agree with Stefan's comment. I filled in some more terms (and we should continue doing that!) and I tried to avoid duplicates. So, maybe we could solve the issue of potential overlap (or even need for overlap!) by requiring one additional path in the API endpoint definitions if we refer to a data concept.

e.g. http://api.mydomain.net/user/v2/ could be the user management module version 2, whereas http://api.mydomain.net/concept/user could be the data concept for users and http://api.mydomain.net/data/user/XYZ could be the data on user identified by the string or id XYZ.

This would mean that we understand the terms "concept" and "data" have to be treated like pseudo-modules.

By the way: The data concepts and the data might be independent from the module version, which is why I did not include the version in the respective URIs in my example.

Maybe my suggestion will create new constraints, so we should decide how open this should be. What do you others think?


Comment by: Falko Göckler

Two more requirements for that list:

  • we should always "reserve" terms in both singular and plural. (So, I didn't list all plural alternatives.)
  • the terms should always be in lower case

HTTP response

HTTP responses returned by DINA-compliant API endpoints MUST follow a standard response structure - independent of the response format (JSON/XML).

{
   "metadata": {
      "callEndpoint": "http://api.refimplementation.net/v1/media/...",
      "callDate": "2014-10-08T08:08:18+01:00",
      "apiVersion": "1.0",
      ...
   },

   "results": [
      ... module-specific data ...
   ]
}
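
A service could produce this envelope with a small helper. The sketch below covers only the three metadata fields shown in the example; the function name build_response, the concrete endpoint path and the sample result are assumptions made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def build_response(call_endpoint, results, api_version="1.0", call_date=None):
    """Wrap module-specific results in the metadata/results envelope."""
    if call_date is None:
        call_date = datetime.now(timezone.utc)
    return {
        "metadata": {
            "callEndpoint": call_endpoint,
            # ISO 8601 with UTC offset, matching the "callDate" example
            "callDate": call_date.isoformat(timespec="seconds"),
            "apiVersion": api_version,
        },
        "results": list(results),
    }


example = build_response(
    "http://api.refimplementation.net/v1/media/image",  # hypothetical path
    [{"id": 1234}],                                      # hypothetical result
    call_date=datetime(2014, 10, 8, 8, 8, 18, tzinfo=timezone(timedelta(hours=1))),
)
print(json.dumps(example, indent=3))
```

Building the envelope in one place keeps the structure identical across endpoints, regardless of which module-specific data ends up in "results".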

Comment by: Stefan Daume

This is a crucial item to discuss! What format MUST a DINA-compliant API support?

  • Both JSON and XML?
  • Another format?
  • Just JSON?
  • Either JSON or XML? (doesn't seem the best approach ...)

Let's keep in mind that we want to achieve quick provision of DINA modules but also simple adoption and integration.

Comment by: Christopher Lewis

Updated to reflect discussion on phone. I'd suggest MUST support one of JSON and XML and SHOULD support both.

Comment by: Thomas Stjernegaard Jeppesen

I think MUST support JSON. COULD support XML. I think it is important that we agree on a single format. Otherwise development of some modules might require client code for several data formats = more work in development, testing etc

API response "metadata"

Reference base for data types: http://schema.org[Standards 3]

  • Meta-data to be provided with responses
    • URI endpoint called that issued the response
    • Datetime at which the response was issued
    • result count
    • Version of the API
    • Result language
    • content licenses

Comment by: Christopher Lewis

We've been including the following, some or all of which may be valuable: statusCode, message, limit, offset, sortColumn, sortOrder, shallow, count.


Comment by: Stefan Daume

The meta-data section could list (not map) all licences covered by the data returned by an API call for convenience. This would be useful for diagnostic/statistic tools which would then not be required to understand and traverse the data returned. Would this be a useful feature? Or should we leave it up to module/API providers to list the licenses in the data objects only, and thus leave it up to users of an API to extract licenses from the returned data objects?


Comment by: Stefan Daume

Other meta-data: Maintenance / operational - link to service that provides mailing list that provides information about service disruptions?


Suggested example format in JSON:

{
   "metadata": {
      "callEndpoint": "http://api.refimplementation.net/v1/media/...",
      "callDate": "2014-10-08T08:08:18+01:00",
      "apiVersion": "1.0",
      "results": 18,
      "resultLanguages": [
          "SE_sv",
          "GB_en"
      ],
      "supportedLanguages": [
          "SE_sv",
          "GB_en",
          "EE_et"
      ],
      ...
   },

   ... "data" ...
}

Generic operations

Paging

All DINA-compliant APIs MUST support paging for large result sets, accepting the following parameters:

| URI term | Parameter | Description |
|---|---|---|
| [ANY] | maxresults | The maximum number of items in the returned result set. |
| [ANY] | offset | OPTIONAL? Determines the offset of items in a result set and thus provides an implicit paging mechanism. |
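
The interplay of the two parameters can be sketched in a few lines. The default page size of 20 and the returned "count" field are assumptions for illustration; the draft defines neither.

```python
def page(items, maxresults=20, offset=0):
    """Return one page of items plus the total count.

    Assumption: the default page size of 20 is illustrative only; the
    standard does not define a default.
    """
    if maxresults < 0 or offset < 0:
        raise ValueError("maxresults and offset must be non-negative")
    return {"count": len(items), "results": items[offset:offset + maxresults]}


image_ids = list(range(100))  # stand-in for a large result set
third_page = page(image_ids, maxresults=10, offset=20)
print(third_page["count"])    # → 100
print(third_page["results"])  # → [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
```

A client can step through the full result set by increasing offset by maxresults until fewer than maxresults items come back.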

Comment by: Christopher Lewis

I'd propose limit rather than maxresults

Comment by: Thomas Stjernegaard Jeppesen

I'd also prefer limit.

Supported languages

All DINA-compliant APIs SHOULD provide a generic operation to request languages supported by the service. Support for this request MUST be implemented with a non-parameterised GET request using the reserved word languages as the last element of the endpoint URI.

| URI term | Parameter | Description |
|---|---|---|
| GET /languages | - | Requests a list of all languages supported by the service, returned as a meta-data object in the standard response format. By default supported languages should be included in the meta-data section of API responses. This call should support retrieval of supported languages only. |

Sample response:

{
   "metadata": {
      "callEndpoint": "http://api.refimplementation.net/v1/media/languages",
      "callDate": "2014-10-08T08:08:18+01:00",
      "apiVersion": "1.0",
      "results": 0,
      "supportedLanguages": [
          "SE_sv",
          "GB_en",
          "EE_et"
      ],
      ...
   }
}

Documentation

  • Each DINA compliant Web REST API MUST provide complete English documentation of the supported methods. For each method the documentation MUST provide curl examples to document the usage. For example (illustrative):
CREATE
Mind the syntax: when posting a file with cURL, you have to put the '@' sign in front of the file name.

URI: '/v1/media/create'

Method = POST:
→ curl -v -F "owner=Laxness" -F "access=public" -F "licenseType=CC BY" -F "legend=en skata" -F "legendLanguage=sv_SE" -F "tags=view:left" -F "fileName=pica-pica-flying.jpg" -F "selectedFile=@pica-pica-flying.jpg" http://refimplementation.mediaserver.net/v1/media/create

Response if HTTP '200 OK':
→ <UUID>
for instance, in this case → 46853e82-6cad-430b-b582-90e85203dce8

Test:
→ curl http://refimplementation.mediaserver.net/v1/media/metadata/<UUID>
for instance, in this case → 46853e82-6cad-430b-b582-90e85203dce8

→ curl http://refimplementation.mediaserver.net/v1/media/metadata/46853e82-6cad-430b-b582-90e85203dce8
  • In addition to the required basic API documentation, DINA compliant REST APIs SHOULD provide self-documentation capabilities for each endpoint, similar to the examples provided by e.g. the Django REST framework[Tools 3] or Apiary[Tools 4].
  • The documentation for the API COULD refer to an online reference implementation in the curl examples (rather than to localhost).

Versioning

DINA compliant APIs MUST follow a versioning scheme to provide a transparent contract for users of the service.

API versions MUST be indicated in the URI of an endpoint. In the context of this standard an endpoint is assumed to be the endpoint for a specific DINA module, clearly indicated by a reserved word in the endpoint URI. The structure of the endpoint URI MUST follow the pattern [URI_BASE]/[MODULE_NAME]/[VERSION_IDENTIFIER], where:

  • [URI_BASE] is the partner specific base URI such as http://api.dinapartner.net,
  • [MODULE_NAME] is a reserved word (see the section on reserved words) that clearly indicates the module, such as media for the media server module,
  • [VERSION_IDENTIFIER] is an abbreviated version indicator such as v1.

The URI http://api.dinapartner.net/media/v1 would thus be a standards-compliant versioned endpoint URI.
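
The pattern can be checked mechanically. The sketch below assumes that [URI_BASE] is a bare scheme and host without a path of its own, that version identifiers are major versions only (v1, v2, ...), and it uses only an excerpt of the reserved module words. The function name is_versioned_endpoint is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Excerpt of the reserved module words; a real check would use the full list.
RESERVED_MODULES = {"collections", "collection", "dna", "taxonomy", "media"}
# Assumption: major versions only, e.g. v1, v2, v10.
VERSION_PATTERN = re.compile(r"^v[1-9][0-9]*$")


def is_versioned_endpoint(uri):
    """Check a URI against [URI_BASE]/[MODULE_NAME]/[VERSION_IDENTIFIER]."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        return False
    module, version = segments[0], segments[1]
    return module in RESERVED_MODULES and bool(VERSION_PATTERN.match(version))


print(is_versioned_endpoint("http://api.dinapartner.net/media/v1"))  # → True
print(is_versioned_endpoint("http://api.dinapartner.net/v1/media"))  # → False
```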

Versioning of the API SHOULD be driven by significant changes in the endpoint logic, i.e. changes in the request or response structure.

The API SHOULD use version numbers that are easy to read and maintain, i.e. v1, v2, v3.

It seems good practice to maintain at least two versions for a suitable transition period.

Authentication

Comment by: Stefan Daume

  • Should we be inspired by GBIF, that uses HTTP Basic Authentication with a system user account that you have created before?
  • Would we be using something like OAuth2, like in Twitter? Separation between humans and robots (which need to retrieve and use a key)...
  • We want to provide a low barrier of entry, perhaps some modules are read-only and wouldn't even require support for authentication?
  • Do we want a reference authentication implementation? For example: "Each web service should be able to authenticate against ... some external site that stores user details and is OpenId compliant?"
    • Long term option - should we provide a dedicated DINA user management module that is OpenID compliant?

Comment by: Thomas Stjernegaard Jeppesen

I would suggest OAuth2.

Resources & References

Referenced resources

Normative

  1. W3C RFC2616 Section 14.1 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1)
  2. IANA's media type catalogue (http://www.iana.org/assignments/media-types/media-types.xhtml)
  3. schema.org

Tools and information items

  1. https://github.com/wet-boew/wet-boew-api-standards
  2. http://www.oracle.com/technetwork/articles/javase/index-137171.html
  3. Django REST framework
  4. Apiary

Consulted resources


This page was last modified on 8 December 2014, at 17:24. Content is available under Attribution-Share Alike Non-commercial 2.5 or later, Unported unless otherwise noted.