DataSources

A DataSource is an Infusion component which meets a simple contract for read/write access to indexed data. The DataSource contract is broadly the same as that encoded in CRUD, although it does not currently provide explicitly for deletion.

The concrete DataSources in Kettle provide support for HTTP endpoints (with a particular variety specialised for accessing CouchDB databases with CRUD-like semantics) as well as the filesystem, with an emphasis on JSON payloads.

The DataSource API is drawn from the following two methods – a read-only DataSource will just implement get, and a writeable DataSource will implement both get and set:

/* @param directModel {Object} A JSON structure holding the "coordinates" of the state to be read -
 * this model is morally equivalent to (the substitutable parts of) a file path or URL
 * @param options {Object} [Optional] A JSON structure holding configuration options good for just
 * this request. These will be specially interpreted by the particular concrete grade of DataSource
 * – there are no options valid across all implementations of this grade.
 * @return {Promise} A promise representing successful or unsuccessful resolution of the read state
 */
dataSource.get(directModel, options);
/* @param directModel {Object} As for get
 * @param model {Object} The state to be written to the coordinates
 * @param options {Object} [Optional] A JSON structure holding configuration options good for just
 * this request. These will be specially interpreted by the
 * particular concrete grade of DataSource – there are no options valid across all implementations
 * of this grade. For example, a URL DataSource will accept an option `writeMethod` which will
 * allow the user to determine which HTTP method (PUT or POST) will be used to implement the write
 * operation.
 * @return {Promise} A promise representing resolution of the written state,
 * which may also optionally resolve to any returned payload from the write process
 */
dataSource.set(directModel, model, options);
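
To make the contract above concrete, here is a minimal sketch of an object that honours the same get/set shape using plain promises. It is not a Kettle grade: the name makeMemoryDataSource and the use of a key field inside directModel are assumptions made only for this illustration.

```javascript
// Hypothetical in-memory store, not part of Kettle: it only illustrates the get/set contract.
function makeMemoryDataSource(initialState) {
    var state = Object.assign({}, initialState);
    return {
        // directModel.key plays the role of the "coordinates" of the stored value
        get: function (directModel) {
            return Object.prototype.hasOwnProperty.call(state, directModel.key)
                ? Promise.resolve(state[directModel.key])
                : Promise.reject(new Error("No value at key " + directModel.key));
        },
        // Resolves to the written model, standing in for "any returned payload"
        set: function (directModel, model) {
            state[directModel.key] = model;
            return Promise.resolve(model);
        }
    };
}

var memorySource = makeMemoryDataSource({greeting: {text: "hello"}});
memorySource.set({key: "farewell"}, {text: "goodbye"}).then(function () {
    return memorySource.get({key: "farewell"});
}).then(function (model) {
    console.log("Read back", model);
});
```

A read-only variant of this sketch would simply omit the set member.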

Simple example of using an HTTP dataSource

In this example we define and instantiate a simple HTTP-backed dataSource accepting one argument to configure a URL segment:

var fluid = require("infusion"),
    kettle = require("../../kettle.js"),
    examples = fluid.registerNamespace("examples");


fluid.defaults("examples.httpDataSource", {
    gradeNames: "kettle.dataSource.URL",
    url: "http://jsonplaceholder.typicode.com/posts/%postId",
    termMap: {
        postId: "%directPostId"
    }
});

var myDataSource = examples.httpDataSource();
var promise = myDataSource.get({directPostId: 42});

promise.then(function (response) {
    console.log("Got dataSource response of ", response);
}, function (error) {
    console.error("Got dataSource error response of ", error);
});

You can run this snippet from our code samples by running node simpleDataSource.js from examples/simpleDataSource in our samples area. This contacts the useful JSON placeholder API service at jsonplaceholder.typicode.com to retrieve a small JSON document holding some placeholder text. If you get a 404 or an error, please contact us and we'll update this sample to contact a new service.

An interesting element in this snippet is the termMap configured as options of our dataSource. This sets up an indirection between the directModel supplied as the argument to the dataSource.get call, and the URL issued in the HTTP request. The keys in the termMap are interpolation variables in the URL, which in the URL are prefixed by %. The values in the termMap represent either

  • Plain values to be interpolated as strings directly into the URL, or
  • If the first character of the value in the termMap is %, the remainder of the string represents a path which will be dereferenced from the directModel argument to the current set or get request.

In addition, if the term value has the prefix noencode:, it will be interpolated without any URI encoding.
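
The interpolation rules just described can be sketched in plain JavaScript as follows. This is an approximation written for illustration, not Kettle's implementation: resolveUrlSketch and getPath are hypothetical names, and the regular expression used to find terms is an assumption of this sketch.

```javascript
// Sketch only: resolveUrlSketch and getPath are hypothetical names, not Kettle APIs.
function getPath(model, path) {
    // Dereference a dotted path such as "user.id" from the directModel
    return path.split(".").reduce(function (current, segment) {
        return current === undefined || current === null ? undefined : current[segment];
    }, model);
}

function resolveUrlSketch(urlTemplate, termMap, directModel) {
    var resolved = {};
    Object.keys(termMap).forEach(function (term) {
        var entry = String(termMap[term]);
        var encode = true;
        if (entry.indexOf("noencode:") === 0) { // opt out of URI encoding
            encode = false;
            entry = entry.substring("noencode:".length);
        }
        // A leading % means "look this path up in directModel"; otherwise use the plain value
        var value = entry.charAt(0) === "%" ? getPath(directModel, entry.substring(1)) : entry;
        resolved[term] = encode ? encodeURIComponent(value) : String(value);
    });
    // Replace each %term in a single pass so that encoded output is never re-scanned
    return urlTemplate.replace(/%([A-Za-z0-9_]+)/g, function (match, term) {
        return Object.prototype.hasOwnProperty.call(resolved, term) ? resolved[term] : match;
    });
}

console.log(resolveUrlSketch("http://jsonplaceholder.typicode.com/posts/%postId",
    {postId: "%directPostId"}, {directPostId: 42}));
// logs http://jsonplaceholder.typicode.com/posts/42
```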

We document these configuration options in the next section:

Configuration options accepted by kettle.dataSource.URL

Supported configurable options for a kettle.dataSource.URL
| Option Path | Type | Description |
| --- | --- | --- |
| writable | Boolean (default: false) | If this option is set to true, a set method will be fabricated for this dataSource; otherwise, it will implement only a get method. |
| writeMethod | String (default: PUT) | The HTTP method to be used when the set method is operated on this writable DataSource (with grade fluid.dataSource.writable). This defaults to PUT but POST is another option. Note that this option can also be supplied within the options argument to the set method itself. |
| url | String | A URL template, with interpolable elements expressed by terms beginning with the % character, for the URL which will be operated by the get and set methods of this dataSource. |
| termMap | Object (map of String to String) | A map, of which the keys are some of the interpolation terms held in the url string, and the values will be used to perform the interpolation. If a value begins with %, the remainder of the string represents a path into the directModel argument accepted by the get and set methods of the DataSource. By default any such values looked up will be URI encoded before being interpolated into the URL, unless their value in the termMap is prefixed by the string noencode:. |
| notFoundIsEmpty | Boolean (default: false) | If this option is set to true, a fetch of a nonexistent resource (that is, a nonexistent file, or an HTTP resource giving a 404) will result in a resolve with an empty payload rather than a reject response. |
| censorRequestOptionsLog | Object (map of String to Boolean) (default: {auth: true, "headers.Authorization": true}) | A map of paths into the request options which should be censored from appearing in logs. Any path which maps to true will not appear in the logging output, whether in the request options parsed from the url or in the url itself. |
| components.encoding.type | String (grade name) | A kettle.dataSource.URL has a subcomponent named encoding which the user can override in order to choose the encoding used to read and write the model object to and from the textual form in persistence. This defaults to kettle.dataSource.encoding.JSON. Other built-in encodings are kettle.dataSource.encoding.formenc, operating HTML form encoding, and kettle.dataSource.encoding.none, which applies no encoding. More details in Using content encodings with a DataSource. |
| setResponseTransforms | Array of String (default: ["encoding"]) | Contains a list of the namespaces of the transform elements (see the section Transforming promise chains) that are to be applied if there is a response payload from the set method, which is often the case with an HTTP backend. With a JSON encoding this typically happens symmetrically: a JSON request will receive a JSON response. With other encodings, such as form encoding, this is often not the case, and one might like to defeat the effect of trying to decode the HTTP response as a form. In this case, for example, one can override setResponseTransforms with the empty array []. |
| charEncoding | String (default: utf8) | The character encoding of the incoming HTTP stream used to convert its data to characters. This will be sent directly to the setEncoding method of the response stream. |
| invokers.resolveUrl | IoC Invoker (default: kettle.dataSource.URL.resolveUrl) | This invoker can be overridden to customise the process of building the url for a dataSource request. The default implementation uses an invocation of fluid.stringTemplate to interpolate elements from termMap and the directModel argument into the template string held in url. By overriding this invoker, the user can implement a strategy of their choosing. The supplied arguments to the invoker consist of the values (url, termMap, directModel) taken from these options and the dataSource request arguments, but the override can replace these with any IoC-sourced values in the invoker definition. |

In addition, a kettle.dataSource.URL component will accept any options accepted by node's native http.request function. Supported in addition to the above are protocol, host, port, headers, hostname, family, localAddress, socketPath, auth and agent. All of these options will be overridden by options of the same names in the options object supplied as the last argument to the dataSource's get and set methods. This is a good way, for example, to send custom HTTP headers along with a URL dataSource request. Note that any of these component-level options (e.g. port, protocol, etc.) that can be derived from parsing the url option will override the value from the url. Compare this setup with the very similar one operated in the testing framework for kettle.test.request.http.
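
The precedence described in the previous paragraph (values parsed from the url, then component-level options, then per-request options) can be pictured with the sketch below. The helper mergeRequestOptions is hypothetical and uses a shallow merge for brevity; how Kettle actually merges nested values such as headers is not shown here and should not be inferred from it.

```javascript
// Hypothetical helper illustrating the precedence described above; not Kettle's own code.
function mergeRequestOptions(url, componentOptions, requestOptions) {
    var parsed = new URL(url); // WHATWG URL parser, available as a global in Node.js
    var fromUrl = {
        protocol: parsed.protocol,
        hostname: parsed.hostname,
        port: parsed.port,
        path: parsed.pathname + parsed.search
    };
    // Later sources win: url < component-level options < per-request options.
    // Object.assign is shallow, so a "headers" object is replaced wholesale in this sketch.
    return Object.assign({}, fromUrl, componentOptions, requestOptions);
}

var merged = mergeRequestOptions("http://localhost:8080/posts/42",
    {port: 8081, headers: {Accept: "application/json"}},
    {headers: {Accept: "application/json", "X-Request-Id": "abc"}});
console.log(merged.port, merged.headers);
```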

Configuration options accepted by kettle.dataSource.file

An alternative dataSource implementation is kettle.dataSource.file - this is backed by the node filesystem API to allow files to be read and written in various encodings. The interpolation support based on termMap is very similar to that for kettle.dataSource.URL, but with the location template option named path representing an absolute filesystem path rather than the url property of kettle.dataSource.URL representing a URL.

Exactly the same scheme based on the subcomponent named encoding can be used to control content encoding for a kettle.dataSource.file as for a kettle.dataSource.URL. Similarly, kettle.dataSource.file supports a further option named charEncoding which can select between various of the character encodings supported by node.js.

Supported configurable options for a kettle.dataSource.file
| Option Path | Type | Description |
| --- | --- | --- |
| writable | Boolean (default: false) | If this option is set to true, a set method will be fabricated for this dataSource; otherwise, it will implement only a get method. |
| path | String | An (absolute) file path template, with interpolable elements expressed by terms beginning with the % character, for the file which will be read and written by the get and set methods of this dataSource. |
| termMap | Object (map of String to String) | A map, of which the keys are some of the interpolation terms held in the path string, and the values, if prefixed by %, are paths into the directModel argument accepted by the get and set methods of the DataSource. |
| charEncoding | String (default: utf8) | The character encoding of the file used to convert its data to characters. This is one of the values supported by the node filesystem API; values it advertises include utf8, ascii and base64. There is also evidence of support for ucs2. |

A helpful mixin grade for kettle.dataSource.file is kettle.dataSource.file.moduleTerms which will allow interpolation by any module name registered with the Infusion module system fluid.module.register – e.g. %kettle/tests/data/couchDataSourceError.json.
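
As a rough picture of what a file-backed read involves, the sketch below reads a JSON file with a chosen character encoding and optionally treats a missing file as an empty payload, in the spirit of the charEncoding and notFoundIsEmpty options. The function readJsonFileSketch and the temporary file names are assumptions made for this illustration; this is not the code of kettle.dataSource.file.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical helper, not a Kettle API: read and decode a JSON file, returning a promise.
function readJsonFileSketch(filePath, options) {
    options = options || {};
    return fs.promises.readFile(filePath, options.charEncoding || "utf8").then(function (text) {
        return JSON.parse(text);
    }, function (err) {
        if (err.code === "ENOENT" && options.notFoundIsEmpty) {
            return undefined; // resolve with an empty payload instead of rejecting
        }
        throw err;
    });
}

// Usage against a throwaway file in the system temporary directory
var demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "datasource-sketch-"));
var demoFile = path.join(demoDir, "model.json");
fs.writeFileSync(demoFile, JSON.stringify({name: "kettle"}), "utf8");

readJsonFileSketch(demoFile).then(function (model) {
    console.log("Read model", model);
    return readJsonFileSketch(path.join(demoDir, "missing.json"), {notFoundIsEmpty: true});
}).then(function (empty) {
    console.log("Missing file resolved with", empty);
});
```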

Using content encodings with a DataSource

kettle.dataSource.URL has a subcomponent named encoding which the user can override in order to choose the content encoding used to convert the model seen at the get/set API to the textual (character) form in which it is transmitted by the dataSource. The encoding subcomponent will also correctly set the Content-Type header of the outgoing HTTP request in the case of a set request. The encoding defaults to a JSON encoding represented by a subcomponent of type kettle.dataSource.encoding.JSON. Here is an example of choosing a different encoding to submit form encoded data to an HTTP endpoint:

fluid.defaults("examples.formDataSource", {
    gradeNames: "kettle.dataSource.URL",
    url: "http://httpbin.org/post",
    writable: true,
    writeMethod: "POST",
    components: {
        encoding: {
            type: "kettle.dataSource.encoding.formenc"
        }
    },
    setResponseTransforms: [] // Do not parse the "set" response as formenc - it is in fact JSON
});

var myDataSource = examples.formDataSource();
var promise = myDataSource.set(null, {myField1: "myValue1", myField2: "myValue2"});

promise.then(function (response) {
    console.log("Got dataSource response of ", JSON.parse(response));
}, function (error) {
    console.error("Got dataSource error response of ", error);
});

In this example we set up a form-encoded, writable dataSource targeted at the popular HTTP testing site httpbin.org, sending a simple payload encoding two form elements. We use Kettle's built-in form encoding grade by configuring an encoding subcomponent of type kettle.dataSource.encoding.formenc. You can try out this sample live in its place in the examples directory. Note that since this particular endpoint sends a JSON response rather than a form-encoded response, we need to defeat the dataSource's attempt to apply the inverse decoding in the response by writing setResponseTransforms: [].
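
To see why the response in the example above must not be decoded as a form, it may help to look at the two textual shapes involved. The sketch below uses Node's global URLSearchParams to approximate the request body; the abbreviated response shape is an assumption standing in for the fuller JSON document that the endpoint returns.

```javascript
var model = {myField1: "myValue1", myField2: "myValue2"};

// Approximation of the form-encoded request body sent by the set call
var requestBody = new URLSearchParams(model).toString();
console.log(requestBody); // logs myField1=myValue1&myField2=myValue2

// Assumed, abbreviated stand-in for the JSON text that comes back from the endpoint
var responseText = JSON.stringify({form: model});

// The response is JSON, so it is parsed as JSON rather than as a form
var parsedResponse = JSON.parse(responseText);
console.log(parsedResponse.form.myField1); // logs myValue1
```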

Built-in content encodings

Kettle features four built-in content encoding grades which can be configured as the subcomponent of a dataSource named encoding in order to determine what encoding it applies to models. They are described in this table:

| Grade name | Encoding type | Content-Type header |
| --- | --- | --- |
| kettle.dataSource.encoding.JSON | JSON | application/json |
| kettle.dataSource.encoding.JSON5 | JSON5 | application/json5 |
| kettle.dataSource.encoding.formenc | form encoding | application/x-www-form-urlencoded |
| kettle.dataSource.encoding.none | No encoding | text/plain |

Elements of an encoding component

You can operate a custom encoding by implementing a grade with the following elements, and using it as the encoding subcomponent in place of one of the built-in implementations in the above table:

| Member name | Type | Description |
| --- | --- | --- |
| parse | Function (String) -> Any | Parses the textual form of the data from its encoded form into the in-memory form |
| render | Function (Any) -> String | Renders the in-memory form of the data into its textual form |
| contentType | String | Holds the value that should be supplied in the Content-Type of an outgoing HTTP request whose body is encoded in this form |
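
The three members listed above can be pictured with a plain object, as in the sketch below for newline-delimited JSON. This only shows the shape of the parse/render/contentType trio: in Kettle these would be supplied through a grade, and that wiring is not shown here. The name ndjsonEncodingSketch and the choice of content type are assumptions for this illustration.

```javascript
// Hypothetical encoding for newline-delimited JSON; member names follow the table above.
var ndjsonEncodingSketch = {
    contentType: "application/x-ndjson",
    // Textual form -> in-memory form (an array of records)
    parse: function (text) {
        return text.split("\n").filter(function (line) {
            return line.trim() !== "";
        }).map(function (line) {
            return JSON.parse(line);
        });
    },
    // In-memory form (an array of records) -> textual form
    render: function (model) {
        return model.map(function (record) {
            return JSON.stringify(record);
        }).join("\n");
    }
};

var rendered = ndjsonEncodingSketch.render([{id: 1}, {id: 2}]);
console.log(rendered);
console.log(ndjsonEncodingSketch.parse(rendered));
```

A useful property to aim for in any custom encoding is that parse and render are inverses of each other for the models you intend to store.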

The kettle.dataSource.CouchDB mixin grade

Kettle includes a further mixin grade, kettle.dataSource.CouchDB, which is suitable for reading and writing to the doc URL space of a CouchDB database. This can be applied to either a kettle.dataSource.URL or a kettle.dataSource.file (the latter clearly only useful for testing purposes). This is a basic implementation which simply adapts the base documents in this API to a simple CRUD contract, taking care of:

  • Packaging and unpackaging the special _id and _rev fields which appear at top level in a CouchDB document
    • The user's document is in fact escaped in a top-level path named value to avoid conflicts between its keys and any of those of the CouchDB machinery. If you wish to change this behavior, you can do so by providing different model transformation rules in options.rules.readPayload and options.rules.writePayload.
  • Applying a "read-before-write" of the _rev field to minimise (but not eliminate completely) the possibility for a Couch-level conflict

This grade is not properly tested and still carries some (though very small) risk of a conflict during update – it should be used with caution. Please contact the development team if you are interested in improved Couch-specific functionality.
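
The packaging described in the list above can be sketched as a pair of plain functions. These are hypothetical helpers showing the assumed payload shapes (the user's model under value, with _id and _rev at top level); they are not the grade's actual transformation rules, which are configured through options.rules.readPayload and options.rules.writePayload.

```javascript
// Hypothetical helpers sketching the payload shapes; not the grade's actual rules.
function wrapForCouch(id, model, existingDoc) {
    var doc = {_id: id, value: model};
    if (existingDoc && existingDoc._rev) {
        // "Read-before-write": reuse the revision of the document that was just read
        doc._rev = existingDoc._rev;
    }
    return doc;
}

function unwrapFromCouch(doc) {
    // The user's model lives under the top-level "value" path
    return doc.value;
}

var storedDoc = {_id: "user-1", _rev: "1-abc", value: {name: "Ada"}};
var docToWrite = wrapForCouch("user-1", {name: "Ada", role: "admin"}, storedDoc);
console.log(unwrapFromCouch(storedDoc));
console.log(docToWrite);
```

Because another writer may update the document between the read and the write, the revision copied here can already be stale, which is the residual conflict risk mentioned above.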

Advanced implementation notes on DataSources

This section contains a few notes for advanced users of DataSources who are interested in extending their functionality, or else in issuing I/O in Kettle by other means.

Transforming promise chains

The detailed implementation of the Kettle DataSource is structured around a particular device taken from the Infusion Promises library, the concept of a "transforming promise chain". The core DataSource grade implements two events, onRead and onWrite. These events are fired during the get and set operations of the DataSource, respectively. These events are better described as "pseudoevents" since they are not fired in the conventional way: rather than each event listener receiving the same signature, each instead receives the payload returned by the previous listener. It may then transform this payload and produce its own return in the form of a promise. Any promise rejection terminates the listener notification chain and propagates the failure to the caller. The DataSource implementation in fact fires these events by invoking the fireTransformEvent function from Infusion's Promises API.

The virtue of this implementation strategy is that extra stages of processing for the DataSource can be inserted and removed from any part of the processing chain by means of supplying suitable event priorities to the event's listeners. Both the JSON encoding/decoding and CouchDB wrapping/unwrapping facilities for the DataSources are implemented in terms of event listeners of this type, rather than in terms of conditional implementation code. This is a powerful and open implementation strategy which we plan to extend in future.
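
The idea of a transforming chain can be sketched with standard promises. The function fireTransformChainSketch below is a hypothetical stand-in written for illustration: it runs listeners in array order and ignores the priorities and namespaces that the real mechanism supports.

```javascript
// Hypothetical sketch of a transforming chain; it ignores listener priorities and namespaces.
function fireTransformChainSketch(listeners, initialPayload) {
    return listeners.reduce(function (promise, listener) {
        // Each listener receives the payload produced by the previous one.
        // Once a promise in the chain rejects, the remaining listeners are skipped.
        return promise.then(listener);
    }, Promise.resolve(initialPayload));
}

// Two illustrative read stages: decode text as JSON, then unwrap a "value" member
var readChain = [
    function decode(text) {
        return JSON.parse(text);
    },
    function unwrap(doc) {
        return doc.value;
    }
];

fireTransformChainSketch(readChain, "{\"value\": {\"name\": \"kettle\"}}").then(function (model) {
    console.log("Final payload", model);
}, function (error) {
    console.error("Chain rejected with", error);
});
```

Adding or removing a stage in this sketch is just a matter of editing the array, which mirrors the way extra processing can be inserted into the real chain through listener configuration rather than conditional code.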

Callback wrapping in DataSources

It's important that Kettle's inbuilt DataSources are used whenever possible when performing I/O from a Kettle application, since it is crucial that any running implementation code is always properly contextualised by its appropriate request component. Kettle guarantees that the IoC context {request} will always be resolvable onto the appropriate request component from any code executing within that request. If arbitrary callbacks are supplied to node I/O APIs, the code executing in them will not be properly contextualised. If for some reason a DataSource is not appropriate, you can manually wrap any callbacks that you use by supplying them to the API kettle.wrapCallback. Get in touch with the dev team if you find yourself in this situation.
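
The general idea behind wrapping a callback so that it runs in the context in which it was created can be sketched as below. This is a conceptual illustration only: the currentRequest variable and wrapCallbackSketch function are hypothetical, and the sketch makes no claim about how kettle.wrapCallback is implemented.

```javascript
// Stand-in for "the request that is currently being processed"
var currentRequest = null;

// Hypothetical wrapper: remembers the request active at wrap time and restores it on invocation
function wrapCallbackSketch(callback) {
    var requestAtWrapTime = currentRequest;
    return function () {
        var previous = currentRequest;
        currentRequest = requestAtWrapTime;
        try {
            return callback.apply(this, arguments);
        } finally {
            currentRequest = previous; // always put back whatever was active before
        }
    };
}

currentRequest = {id: "request-1"};
var wrapped = wrapCallbackSketch(function () {
    return currentRequest ? currentRequest.id : "no request";
});
currentRequest = null; // the original request context has gone away

console.log(wrapped()); // logs request-1
```

An unwrapped version of the same callback would see no request at all once the context had been cleared, which is the situation the wrapping is meant to avoid.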