---
title: DataSources
layout: default
category: Kettle
---
A DataSource is an Infusion component which meets a simple contract for read/write access to indexed data. DataSource is a simple semantic, broadly the same as that encoded in CRUD, although the current DataSource semantic does not provide explicitly for deletion.
The concrete DataSources in Kettle provide support for HTTP endpoints (with a particular variety specialised for accessing CouchDB databases with CRUD-like semantics) as well as the filesystem, with an emphasis on JSON payloads.
The DataSource API comprises the following two methods – a read-only DataSource will implement just `get`, and a writeable DataSource will implement both `get` and `set`:
```javascript
/* @param directModel {Object} A JSON structure holding the "coordinates" of the state to be read -
 * this model is morally equivalent to (the substitutable parts of) a file path or URL
 * @param options {Object} [Optional] A JSON structure holding configuration options good for just
 * this request. These will be specially interpreted by the particular concrete grade of DataSource -
 * there are no options valid across all implementations of this grade.
 * @return {Promise} A promise representing successful or unsuccessful resolution of the read state
 */
dataSource.get(directModel, options);

/* @param directModel {Object} As for `get`
 * @param model {Object} The state to be written to the coordinates
 * @param options {Object} [Optional] A JSON structure holding configuration options good for just
 * this request. These will be specially interpreted by the particular concrete grade of DataSource -
 * there are no options valid across all implementations of this grade. For example, a URL
 * DataSource will accept an option `writeMethod` which will allow the user to determine which
 * HTTP method (PUT or POST) will be used to implement the write operation.
 * @return {Promise} A promise representing resolution of the written state,
 * which may also optionally resolve to any returned payload from the write process
 */
dataSource.set(directModel, model, options);
```
In this example we define and instantiate a simple HTTP-backed dataSource accepting one argument to configure a URL segment:
```javascript
var fluid = require("infusion"),
    kettle = require("../../kettle.js"),
    examples = fluid.registerNamespace("examples");

fluid.defaults("examples.httpDataSource", {
    gradeNames: "kettle.dataSource.URL",
    url: "http://jsonplaceholder.typicode.com/posts/%postId",
    termMap: {
        postId: "%directPostId"
    }
});

var myDataSource = examples.httpDataSource();
var promise = myDataSource.get({directPostId: 42});

promise.then(function (response) {
    console.log("Got dataSource response of ", response);
}, function (error) {
    console.error("Got dataSource error response of ", error);
});
```
You can run this snippet from our code samples by running `node simpleDataSource.js` from `examples/simpleDataSource` in our samples area. This contacts the useful JSON placeholder API service at jsonplaceholder.typicode.com to retrieve a small JSON document holding some placeholder text. If you get a 404 or an error, please contact us and we'll update this sample to contact a new service.
An interesting element in this snippet is the `termMap` configured as options of our dataSource. This sets up an indirection between the `directModel` supplied as the argument to the `dataSource.get` call, and the URL issued in the HTTP request. The keys in the `termMap` are interpolation variables in the URL, where they are prefixed by `%`. The values in the `termMap` represent either

- plain values to be interpolated as strings directly into the URL, or
- if the first character of the value in the `termMap` is `%`, the remainder of the string represents a path which will be dereferenced from the `directModel` argument to the current `set` or `get` request.

In addition, if the term value has the prefix `noencode:`, it will be interpolated without any URI encoding.
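The interpolation rules just described can be sketched in plain JavaScript. This is an illustrative simplification, not Kettle's actual implementation (which is built on `fluid.stringTemplate`), and the helper names here are hypothetical:

```javascript
// Hypothetical helper: look up a dot-separated path in an object
var getPath = function (object, path) {
    return path.split(".").reduce(function (value, segment) {
        return value === undefined ? undefined : value[segment];
    }, object);
};

// Sketch of the termMap interpolation rules: % values are paths into
// directModel, noencode: suppresses URI encoding, all else is literal
var interpolateUrl = function (url, termMap, directModel) {
    Object.keys(termMap).forEach(function (term) {
        var value = termMap[term];
        var noencode = false;
        if (value.indexOf("noencode:") === 0) {
            noencode = true;
            value = value.substring("noencode:".length);
        }
        if (value.charAt(0) === "%") {
            // A value beginning with % is a path into the directModel
            value = String(getPath(directModel, value.substring(1)));
        }
        if (!noencode) {
            value = encodeURIComponent(value);
        }
        url = url.replace("%" + term, value);
    });
    return url;
};

console.log(interpolateUrl(
    "http://jsonplaceholder.typicode.com/posts/%postId",
    {postId: "%directPostId"},
    {directPostId: 42}
)); // http://jsonplaceholder.typicode.com/posts/42
```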
We document these configuration options in the next section:
Supported configurable options for a `kettle.dataSource.URL`:

Option Path | Type | Description |
---|---|---|
`writable` | `Boolean` (default: `false`) | If this option is set to `true`, a `set` method will be fabricated for this dataSource – otherwise, it will implement only a `get` method. |
`writeMethod` | `String` (default: `PUT`) | The HTTP method to be used when the `set` method is operated on this writable DataSource (with grade `fluid.dataSource.writable`). This defaults to `PUT` but `POST` is another option. Note that this option can also be supplied within the `options` argument to the `set` method itself. |
`url` | `String` | A URL template, with interpolable elements expressed by terms beginning with the `%` character, for the URL which will be operated by the `get` and `set` methods of this dataSource. |
`termMap` | `Object` (map of `String` to `String`) | A map, of which the keys are some of the interpolation terms held in the `url` string, and the values will be used to perform the interpolation. If a value begins with `%`, the remainder of the string represents a path into the `directModel` argument accepted by the `get` and `set` methods of the DataSource. By default any such values looked up will be URI encoded before being interpolated into the URL – unless their value in the `termMap` is prefixed by the string `noencode:`. |
`notFoundIsEmpty` | `Boolean` (default: `false`) | If this option is set to `true`, a fetch of a nonexistent resource (that is, a nonexistent file, or an HTTP resource giving a 404) will result in a resolve with an empty payload rather than a reject response. |
`censorRequestOptionsLog` | `Object` (map of `String` to `Boolean`) (default: `{auth: true, "headers.Authorization": true}`) | A map of paths into the request options which should be censored from appearing in logs. Any path which maps to `true` will not appear either in the logging output derived from the request options parsed from the url, or in the url itself. |
`components.encoding.type` | `String` (grade name) | A `kettle.dataSource.URL` has a subcomponent named `encoding` which the user can override in order to choose the encoding used to read and write the model object to and from its textual form in persistence. This defaults to `kettle.dataSource.encoding.JSON`. Other built-in encodings are `kettle.dataSource.encoding.formenc`, operating HTML form encoding, and `kettle.dataSource.encoding.none`, which applies no encoding. More details in Using Content Encodings with a DataSource. |
`setResponseTransforms` | `Array of String` (default: `["encoding"]`) | Contains a list of the namespaces of the transform elements (see section on transforming promise chains) that are to be applied if there is a response payload from the `set` method, which is often the case with an HTTP backend. With a JSON encoding this decoding typically happens symmetrically – with a JSON request one will receive a JSON response – however, with other encodings such as form encoding this is often not the case and one might like to defeat the effect of trying to decode the HTTP response as a form. In this case, for example, one can override `setResponseTransforms` with the empty array `[]`. |
`charEncoding` | `String` (default: `utf8`) | The character encoding of the incoming HTTP stream used to convert its data to characters – this will be sent directly to the `setEncoding` method of the response stream. |
`invokers.resolveUrl` | `IoC Invoker` (default: `kettle.dataSource.URL.resolveUrl`) | This invoker can be overridden to customise the process of building the URL for a dataSource request. The default implementation uses an invocation of `fluid.stringTemplate` to interpolate elements from `termMap` and the `directModel` argument into the template string held in `url`. By overriding this invoker, the user can implement a strategy of their choosing. The supplied arguments to the invoker consist of the values `(url, termMap, directModel)` taken from these options and the dataSource request arguments, but the override can replace these with any IoC-sourced values in the invoker definition. |
In addition, a `kettle.dataSource.URL` component will accept any options accepted by node's native `http.request` constructor – supported in addition to the above are `protocol`, `host`, `port`, `headers`, `hostname`, `family`, `localAddress`, `socketPath`, `auth` and `agent`. All of these options will be overridden by options of the same names supplied as the `options` object supplied as the last argument to the dataSource's `get` and `set` methods. This is a good way, for example, to send custom HTTP headers along with a URL dataSource request. Note that any of these component-level options (e.g. `port`, `protocol`, etc.) that can be derived from parsing the `url` option will override the value from the url. Compare this setup with the very similar one operated in the testing framework for `kettle.test.request.http`.
An alternative dataSource implementation is `kettle.dataSource.file` – this is backed by the node filesystem API to allow files to be read and written in various encodings. The interpolation support based on `termMap` is very similar to that for `kettle.dataSource.URL`, but with the location template option named `path`, representing an absolute filesystem path, rather than the `url` property of `kettle.dataSource.URL`, representing a URL.
Exactly the same scheme based on the subcomponent named `encoding` can be used to control content encoding for a `kettle.dataSource.file` as for a `kettle.dataSource.URL`. Similarly, `kettle.dataSource.file` supports a further option named `charEncoding` which can select between various of the character encodings supported by node.js.
Supported configurable options for a `kettle.dataSource.file`:

Option Path | Type | Description |
---|---|---|
`writable` | `Boolean` (default: `false`) | If this option is set to `true`, a `set` method will be fabricated for this dataSource – otherwise, it will implement only a `get` method. |
`path` | `String` | An (absolute) file path template, with interpolable elements expressed by terms beginning with the `%` character, for the file which will be read and written by the `get` and `set` methods of this dataSource. |
`termMap` | `Object` (map of `String` to `String`) | A map, of which the keys are some of the interpolation terms held in the `path` string, and the values, if prefixed by `%`, are paths into the `directModel` argument accepted by the `get` and `set` methods of the DataSource. |
`charEncoding` | `String` (default: `utf8`) | The character encoding of the file used to convert its data to characters – one of the values supported by the node filesystem API; values it advertises include `utf8`, `ascii` or `base64`. There is also evidence of support for `ucs2`. |
A helpful mixin grade for `kettle.dataSource.file` is `kettle.dataSource.file.moduleTerms`, which will allow interpolation by any module name registered with the Infusion module system via `fluid.module.register` – e.g. `%kettle/tests/data/couchDataSourceError.json`.
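The effect of such module-term interpolation can be sketched as a simple substitution of module names for registered base directories. This is an illustration only, with hypothetical filesystem paths, and not the implementation used by `fluid.module`:

```javascript
// Hypothetical registry mapping module names to their base directories,
// as would be built up by calls to fluid.module.register
var moduleTerms = {
    kettle: "/home/user/project/node_modules/kettle"
};

// Expand each %moduleName prefix in a path template to its base directory
var resolveModulePath = function (template, terms) {
    return Object.keys(terms).reduce(function (path, name) {
        return path.replace("%" + name, terms[name]);
    }, template);
};

console.log(resolveModulePath("%kettle/tests/data/couchDataSourceError.json", moduleTerms));
// /home/user/project/node_modules/kettle/tests/data/couchDataSourceError.json
```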
`kettle.dataSource.URL` has a subcomponent named `encoding` which the user can override in order to choose the content encoding used to convert the model seen at the `get`/`set` API to the textual (character) form in which it is transmitted by the dataSource. The encoding subcomponent will also correctly set the `Content-Type` header of the outgoing HTTP request in the case of a `set` request. The encoding defaults to a JSON encoding represented by a subcomponent of type `kettle.dataSource.encoding.JSON`. Here is an example of choosing a different encoding to submit form encoded data to an HTTP endpoint:
```javascript
fluid.defaults("examples.formDataSource", {
    gradeNames: "kettle.dataSource.URL",
    url: "http://httpbin.org/post",
    writable: true,
    writeMethod: "POST",
    components: {
        encoding: {
            type: "kettle.dataSource.encoding.formenc"
        }
    },
    setResponseTransforms: [] // Do not parse the "set" response as formenc - it is in fact JSON
});

var myDataSource = examples.formDataSource();
var promise = myDataSource.set(null, {myField1: "myValue1", myField2: "myValue2"});

promise.then(function (response) {
    console.log("Got dataSource response of ", JSON.parse(response));
}, function (error) {
    console.error("Got dataSource error response of ", error);
});
```
In this example we set up a form-encoded, writable dataSource targeted at the popular HTTP testing site httpbin.org, sending a simple payload encoding two form elements. We use Kettle's built-in form encoding grade by configuring an `encoding` subcomponent of type `kettle.dataSource.encoding.formenc`. You can try out this sample live in its place in the examples directory. Note that since this particular endpoint sends a JSON response rather than a form-encoded response, we need to defeat the dataSource's attempt to apply the inverse decoding to the response by writing `setResponseTransforms: []`.
Kettle features four built-in content encoding grades which can be configured as the subcomponent of a dataSource named `encoding` in order to determine what encoding it applies to models. They are described in this table:
Grade name | Encoding type | Content-Type header |
---|---|---|
`kettle.dataSource.encoding.JSON` | JSON | `application/json` |
`kettle.dataSource.encoding.JSON5` | JSON5 | `application/json5` |
`kettle.dataSource.encoding.formenc` | form encoding | `application/x-www-form-urlencoded` |
`kettle.dataSource.encoding.none` | No encoding | `text/plain` |
You can operate a custom encoding by implementing a grade with the following elements, and using it as the `encoding` subcomponent in place of one of the built-in implementations in the above table:

Member name | Type | Description |
---|---|---|
`parse` | `Function (String) -> Any` | Parses the textual form of the data from its encoded form into the in-memory form |
`render` | `Function (Any) -> String` | Renders the in-memory form of the data into its textual form |
`contentType` | `String` | Holds the value that should be supplied in the `Content-Type` header of an outgoing HTTP request whose body is encoded in this form |
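As a minimal sketch of the shape these members take, here is a hypothetical encoding that stores each top-level key of the model on its own `key=value` line. In a real application these members would be attached to an Infusion grade and configured as the dataSource's `encoding` subcomponent; they are shown here as a plain object so the round trip can be seen in isolation:

```javascript
// Hypothetical custom encoding: one "key=value" pair per line
var examplesLineEncoding = {
    contentType: "text/plain",
    // Parse the textual form back into an in-memory object
    parse: function (text) {
        var togo = {};
        text.split("\n").filter(Boolean).forEach(function (line) {
            var sep = line.indexOf("=");
            togo[line.substring(0, sep)] = line.substring(sep + 1);
        });
        return togo;
    },
    // Render the in-memory object into its textual form
    render: function (model) {
        return Object.keys(model).map(function (key) {
            return key + "=" + model[key];
        }).join("\n");
    }
};

var text = examplesLineEncoding.render({a: "1", b: "2"});
console.log(text);
// a=1
// b=2
console.log(examplesLineEncoding.parse(text)); // { a: '1', b: '2' }
```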
Kettle includes a further mixin grade, `kettle.dataSource.CouchDB`, which is suitable for reading and writing to the `doc` URL space of a CouchDB database. This can be applied to either a `kettle.dataSource.URL` or a `kettle.dataSource.file` (the latter clearly only useful for testing purposes). This is a basic implementation which simply adapts the base documents in this API to a simple CRUD contract, taking care of:
- Packaging and unpackaging the special `_id` and `_rev` fields which appear at top level in a CouchDB document
  - The user's document is in fact escaped in a top-level path named `value` to avoid conflicts between its keys and any of those of the CouchDB machinery. If you wish to change this behavior, you can do so by providing different model transformation rules in `options.rules.readPayload` and `options.rules.writePayload`.
- Applying a "read-before-write" of the `_rev` field to minimise (but not eliminate completely) the possibility for a Couch-level conflict
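The default packaging can be pictured with two plain functions. This is an illustrative sketch of the payload shapes only; the grade's actual behaviour is configured through the model transformation rules in `options.rules.readPayload` and `options.rules.writePayload`:

```javascript
// Reading: unwrap the user's document from the "value" path of a Couch document
var readPayload = function (couchDoc) {
    return couchDoc.value;
};

// Writing: wrap the user's document, preserving _id and any _rev obtained
// by the read-before-write
var writePayload = function (userDoc, id, rev) {
    var togo = {_id: id, value: userDoc};
    if (rev !== undefined) {
        togo._rev = rev;
    }
    return togo;
};

var wrapped = writePayload({name: "Alice"}, "doc1", "1-abc");
console.log(wrapped);              // { _id: 'doc1', value: { name: 'Alice' }, _rev: '1-abc' }
console.log(readPayload(wrapped)); // { name: 'Alice' }
```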
This grade is not properly tested and still carries some (though very small) risk of a conflict during update – it should be used with caution. Please contact the development team if you are interested in improved Couch-specific functionality.
In this section are a few notes for advanced users of DataSources, who are interested in extending their functionality or else in issuing I/O in Kettle by other means.
The detailed implementation of the Kettle DataSource is structured around a particular device taken from the Infusion Promises library, the concept of a "transforming promise chain". The core DataSource grade implements two events, `onRead` and `onWrite`. These events are fired during the `get` and `set` operations of the DataSource, respectively.
These events are better described as "pseudoevents" since they are not fired in the conventional way – rather than each event listener receiving the same signature, each instead receives the payload returned by the previous listener – it may then transform this payload and produce its own return in the form of a promise. Any promise rejection terminates the listener notification chain and propagates the failure to the caller. The DataSource implementation in fact fires these events by invoking the `fireTransformEvent` function from Infusion's Promises API.
The virtue of this implementation strategy is that extra stages of processing for the DataSource can be inserted and removed from any part of the processing chain by means of supplying suitable event priorities to the event's listeners. Both the JSON encoding/decoding and CouchDB wrapping/unwrapping facilities for the DataSources are implemented in terms of event listeners of this type, rather than in terms of conditional implementation code. This is a powerful and open implementation strategy which we plan to extend in future.
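The essence of a transforming promise chain can be sketched in a few lines of plain JavaScript. This is a simplified stand-in for Infusion's `fireTransformEvent`, omitting its priority-based listener ordering and other machinery:

```javascript
// Each listener receives the previous listener's resolved payload and returns
// a (promise of a) transformed payload; a rejection anywhere short-circuits
// the chain and propagates to the caller.
var fireTransformChain = function (listeners, initialPayload) {
    return listeners.reduce(function (promise, listener) {
        return promise.then(listener);
    }, Promise.resolve(initialPayload));
};

// For example, a JSON decoding stage followed by a CouchDB-style unwrapping stage:
fireTransformChain([
    function (text) { return JSON.parse(text); },
    function (doc) { return doc.value; }
], "{\"value\": {\"name\": \"Alice\"}}").then(function (result) {
    console.log(result); // { name: 'Alice' }
});
```

Inserting an extra processing stage then amounts to adding another listener to the chain, which is how the JSON encoding and CouchDB wrapping facilities are implemented.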
It's important that Kettle's inbuilt DataSources are used whenever possible when performing I/O from a Kettle application, since it is crucial that any running implementation code is always properly contextualised by its appropriate request component. Kettle guarantees that the IoC context `{request}` will always be resolvable onto the appropriate request component from any code executing within that request. If arbitrary callbacks are supplied to node I/O APIs, the code executing in them will not be properly contextualised. If for some reason a DataSource is not appropriate, you can manually wrap any callbacks that you use by supplying them to the API `kettle.wrapCallback`. Get in touch with the dev team if you find yourself in this situation.