Welcome to the ribose wiki!
Ginr was developed in the late 1980s by J Howard Johnson, then working at the University of Waterloo, as a research tool to aid in exploring regular sets and multi-dimensional finite automata. It was quickly adopted for an industrial-strength use case, transducing the Oxford English Dictionary from idiomatic layout to SGML. I obtained a license to use it in my work at Bell Northern Research and quickly developed a first version of a ribose-like framework, using ginr (aka inr) to describe idiomatic input features as regular sets over (ASCII)* and then extending these descriptions onto regular sets over collections of vectors of effectors (aka C function pointers) operating in a traditional CPU/RAM computing environment. This was used in a database mediation service, involving multiple data sources each with proprietary idiomatic formats for presenting data.
These were pre-XML days and the task of deserializing all of the different data formats in C with only string.h for support was a daunting one. Then I discovered ginr and was relieved to find how much tedious coding was eliminated by pushing data presentation concerns out of the application domain into patterns that expertly recognized features of interest and extracted them for assimilation into the mediation service domain. The functional requirements for effectors fell directly out of the input patterns, which provided syntactic maps detailing the location and structure of features of interest. Simple, cross-cutting effectors like select, cut, copy, paste, clear enabled precise extraction of raw data. Nested patterns were accommodated by start, stop effectors operating on a push-down stack of regular patterns, and an in effector operating on a stack of input streams enabled ASCII snippets or out-of-band signals to be injected into the input stream for immediate transduction. Contextual cues in the input pattern triggered these basic effectors while other syntactic cues coordinated the action of task-specific effectors to assimilate extracted data into the application domain.
This separation of syntactic and semantic concerns is an intrinsic aspect of pattern-oriented programming and it would be a huge win for developers and for service providers and consumers if pattern-oriented programming were to be widely adopted and used. A lot fewer calories would be burned getting jobs done, and everyone would be happier. I was unable at the time to convey this news to my colleagues, so these insights were lost, and work on the ginr prototype was pulled from public view after some negative developments following the OED work. Fortunately, ginr is now open source and better than ever so I'm gonna give it another shot.
Traditional CPUs drive themselves! It's nuts. Sure, developers write programs that lay down the roads they run on but once they get started they never stop running. They need data to chew on and grab it from RAM as they go but RAM is slower than CPU as we know so they frequently have to not go forward and just run on the spot for a while. It's one instruction done then the next in sequence then twiddle for a while in a loop waiting on RAM then maybe hit a branching instruction and run a different road for a while. All this running, in the wind or on the spot, burns a lot of calories.
It takes a patient and sophisticated effort to instruct these machines to run the right roads and get jobs done. The very first programmable machines were just numeric calculating machines programmable as branching sequences of instructions. At first programs had to be hand-wired on switchboards but later they were represented using numeric opcodes that could be input with a numeric keypad. Of course this was very tedious so it became necessary to add a programmable typewriter feature to the numerical calculator machine and they knew they could do so because the latter was capable of almost anything, with careful and patient programming. So EBCDIC was invented and the job was done. Programmers could now write assembly code using digits and letters but it was still pretty hard.
This is wrong on two counts: 1) the programmable calculator machine had no stack and was only capable of sequencing and branching instructions in regular patterns and 2) EBCDIC (or Unicode) ordinals are the basis for a semiring of text media with concatenation and alternation, just as machine instruction sets are the basis for semirings of programmed instructions with sequencing and branching. It's just arithmetic, but with strings, and it is difficult to understand why this hasn't been exploited in hardware to the extent that traditional but ever smaller and faster instruction-driven processing units have exploited numeric arithmetic, but with machine opcodes, to tediously manipulate text character ordinals in RAM.
The first machine to present instructions for maintaining a stack of instruction sequencing regular patterns was the Burroughs B5000 (1961). This launched Algol and the evolution of a zoo of algorithmic programming languages and a cloud of florid abstractions that continues today. These attempt to hide the complexity of programming but they put a lot of weight on the CPU runner's back and more calories are burned. There is just no way of getting around the fact that these programs are running on programmable calculators, with lots of RAM and a basic typewriter feature bolted on. They decode instructions that do arithmetic real fast and move stuff around in RAM not quite so fast. That's what they do. That's all they can do.
Sadly, as far as I know, there are no pattern-programmable data-driven transduction machines in production, even though streaming data and attendant processing consume an increasingly large proportion of network and CPU bandwidth. They are, if anything, science fiction artifacts that Philip K. Dick might have enjoyed exploring. They would incorporate hardware to run FSTs real fast directly on top of fast serial data links, grabbing source input as soon as it became available in intended order and pushing it through a stack of transducers to extract (filter) features of interest and instruct (map) the CPU thing how to assimilate (reduce) them into the application or service domain. Programming these hybrid machines would involve composing nested transducers from regular patterns expressed in domain-specific semirings, and developers would be wizards working in semiring algebra to compose and consume these patterns.
Functional specifications for development tasks on the CPU side would be more tightly focused, falling directly out of the input patterns, since each pattern is essentially a structural map of semantic features. Development of target effectors on the CPU/RAM back end would be tightly-focused and simplified since the transduction front-end would take care of all the fussy character-by-character twiddling that would otherwise be expressed inline with object model construction in a generic software stack. The mapping of syntactic features to semantic effectors ensures that relevant features are extracted and injected into the target domain synchronously with the input stream as runtime transduction precisely schedules effector invocation in response to syntactic cues in the input.
If someone does produce an FST/CPU/RAM transduction machine there will be an incessant urge on the software side to add lots of abstract squirrelly curly-cue adornments to pattern expressions and that's not ok. Developers are well acquainted with nested patterns; that's how they program CPUs to execute nested regular patterns of instructions. But algorithmic programming languages and formal data exchange formats are all points in the same space (context-free grammars) and they tend to get complex and stretched out of shape as they try to accommodate all possible semantic models. There is no reason why developers should not use hard-core semiring algebraic formalisms to describe and navigate information sources or express and consume stable idiomatic service APIs. They can map formal syntactic descriptions of relevant features embedded in input patterns to CPU-side effectors that implement feature semantics and get fine-grained, domain-specific jobs done without involving unnecessary impediments. They can do this without googling "How do I work this" for every new thing that comes along or breaks down.
And therein lies the essential sweetness that ribose expresses. Semiring algebra is a simple and robust framework for describing sequential patterns and extending them onto subsequential (single-valued) transducers. Regular patterns are ubiquitous in byte-encoded media, whether embedded in formal or idiomatic formats. Pattern-oriented design and development presents a unified approach to processing information streams that simplifies software development and obviates dependency on specialized external libraries that support common data formats.
It's just arithmetic, but with strings.
Wait for it.
I hesitate to put this forward because I do not want to give the impression that ribose is all about front-ending service APIs. There are myriad use cases for pattern-driven transduction waiting to be discovered. But here is a use case to put things in perspective.
I like the part about "make the opponents communicate over a socket".
Best way to hide your code! Put a domain-specific transducer on the receiving end of the socket to rip through the JSON/XML/HTML your opponent is sending and pull out just what you need, without involving a great stack of algorithmic code that knows everything about anything that can be expressed in JSON/XML/HTML and so on. And the wee bits of algorithmic code that your domain-specific target expresses as effectors to the transduction run-time know only and expertly about your target domain.
So, it's all better? Not quite. JSON/XML/HTML are all formalisms expressed as a compromise to standardize information exchange between remote processes in formats that are readable by people and easy to parse by machines. They are generalized abstract specifications each covering a broad range of possible features relevant in unimagined domains and are not readily applicable to specific tasks involving HTML or XML or whatever information sources. If you look closely you will see that they are all essentially expressions of a common structure -- LR(0) context-free grammars. Just like the grammars used to compile code. And an LR(0) grammar is nothing more than a nesting of things that can be described precisely in semiring algebra. We use the same formalisms to describe the specializations that we carve out of the Unicode semiring to convey information regardless of whether we are describing algorithms or data structures.
Service providers typically use these generalized specifications to express remote service APIs as examples. These suggest what service requests and response features might "look like" and leave developers to figure out what they "mean". Developers are pretty much forced to rely on heavy-weight software stacks to deal with the syntactic aspects of consuming service APIs. These software stacks are as generic as their respective presentation standards and they parse, extract, and present all recognizable features in a domain object model so that consumers can cherry-pick features of interest in their specific domain.
It takes time to gain experience with serialization frameworks and commensal software libraries, like StAX for XML. The proliferation of these frameworks forces developers to invest in each new framework and contributes to burnout. Since all of these very special formalisms boil down to the same basic algebraic structure strung out on a stack, one has to ask why we don't use semiring algebra directly to describe data as nested regular patterns expressed exclusively in terms that are relevant to their semantic domains. That's the focus of this important use case. Spoiler alert: it's gonna be semirings all the way down.
Instead of presenting specific and incomplete examples of API request and response, suppose we represent pattern-oriented API request and response as nested regular expressions presented in (<feature-pattern>, <label>[annotation...]) terms, each mapping a feature presentation to a semantic construct. Request and response syntactic features are precisely expressed in the Unicode semiring and semantics are easily inferred from labels and annotations by anyone with domain knowledge. APIs are expressed in terms that are structured, easy to read and to parse, and compile directly to nested compositions of transducers that map feature syntax to domain semantics.
No standards body is involved here -- semiring algebra is as likely to sprout new features as is numeric arithmetic. In a pattern-oriented context service providers are free to present information in terms that are unambiguous and specifically relevant in their service domain. They can provide a domain dictionary to ensure that there are no doubts as to the meaning of semantic labels and annotations presented in their service APIs. Semantic terms are stable and not necessarily impacted by syntactic changes in feature representation. Service providers and consumers express and exchange information using unambiguous formal patterns prescribing the syntax and semantics of features embedded in API request and response contents. Consumers can specialize response patterns to gloss quickly over irrelevant features and can develop specialized effectors to enhance semantic interpretation of interesting features.
The only software dependency external to service and consumer concerns is something like [ginr+ribose] that can render consumer specializations of service responses and service provider specializations of consumer requests as stacked transducers that read response and request content in a single LR(0) pass to extract and act upon features of interest as they are discovered in the respective input streams.
That just about wraps this use case up, but here's an example to help nail things down.
Any API can be expressed as a mapping of presentation syntax to service semantics:
api.request = (feature-syntax, feature-semantics)*
api.response = (feature-syntax, feature-semantics)*
Ginr can be viewed as a service provider presenting a compiler for multidimensional regular patterns. With little effort it could be encapsulated as a microservice and deployed within a service mesh with a Compile API receiving ginr pattern source per request and responding per request with the serialized representation of compiled automata. As things stand, ginr serializes compiled automata to a file-based store. So this exercise will use a file-based input channel but a receiving socket channel would serve as well. The channel is just the medium for communication between "opponents", the pattern prescribes the forms that express meaningful messages.
To provide service consumers with a leg up on processing API messages, API syntax is expressed as formal patterns in the Unicode semiring. Service semantics are expressed as labels with or without annotations, counts serving to delimit the expression of binary features, and signals to bridge semantic constructs over regular operators (eg, break out of a counting loop when the counter signal is received). These are inserted at syntactic cut points to mark the location and meaning of associated syntactic features. Binary features are demarcated by a preceding numeric text representation of the number of bytes spanned by the semantic feature and an out-of-band (>255) signal to mark the end of the feature.
Please see the ginr :save format specification for reference. The API as a whole is expressed as a regular pattern in a semiring of (feature-syntax, feature-semantics) terms. The service provider would ordinarily provide a service domain dictionary defining the semantic terms but I'll offer the terms used here as self-explanatory. The API response content is the serialized form of a ginr state-minimized subsequential (single-valued) transducer. The API response presentation is a precisely defined function mapping strings representing ginr transducers to sequences of semantic terms describing the associated syntactic features.
The API response body is just a tab-delimited one-line header with an alphanumeric version followed by 4 numeric fields and a list of one or more transitions, each represented as a tab-delimited line with 4 numeric fields and a binary data field demarcated by a byte counter expressed in the last numeric field. This API interleaves a complete syntactic description on tape 0 (Automaton$0) with a concise semantic description on tapes 1 and 2 (Automaton$(1,2)). To consume this API, ribose decouples the syntactic pattern from the API presentation below (Automaton$0) and extends it in a similar manner to map API input onto the ribose compiler target, which expresses custom effectors to apply ribose semantics to ginr semantic features embedded in the input.
Automaton = (
'INR'
(digit+ tab, version[210])
(digit+ tab, tapes)
(digit+ tab, transitions)
(digit+ tab, states)
(digit+ nl, symbols header)
(
(digit+ tab, from)
(digit+ tab, to)
('-'? digit+ tab, tape)
(digit+ tab, count[end])
(byte, count)* (end, symbol)
(nl, transition)
)*
(nil, automaton)
);
$ head -n 3 build/automata/HelloWorld.dfa
INR210 3 34 35 278
0 2 0 3 nil
2 3 1 5 clear
For any API expressed in this manner, the input syntax on tape 0 can be extended similarly to focus on other target models. The ribose runtime class TRun expresses core effectors that are sufficient for linear and context-free transductions that extract and composite data onto an output stream. Service API providers could use this capability to provide a service model including transducers that render concise API messages containing mixed UTF-8 and binary data in more verbose and readable formats (XML?). Service providers can also present a validation model for a suite of API messages, along with a Java library containing a validation target and effectors. Service consumers can then deploy a composite target expressing the union of the consumer service model and service validation model where, for each consumed API, the respective transduction patterns are joined algebraically into a single transduction pattern that triggers validation effectors ahead of consumer effectors at each step of the transduction.
((V$(1 2,0)) @@ (T$(0,1 2)))$(1,0,2) @ ((((V T)$0:alph)$(0,0))* (((V T)$1:alph)$(0,,0))* (((V T)$2:alph)$(0,,,0))*)*
A validating transducer V can be joined with any consumer transducer T (assuming V$0 == T$0) and tapes reordered to realize a transducer that runs V T in lockstep with serial input, with V validating only features of interest before presenting them to T.
Ribose is a consumer of ginr's Compile API. It needs to map the syntactic features of ginr's API pattern into its own service domain. The API pattern is easily transformed into a transducer pattern that uses the built-in ribose effectors plus three domain-specific effectors to extract INR header and transition features from a compiled ginr automaton and assemble them into a ribose runtime transducer. The header, transition and automaton effectors apply ribose-specific semantic constraints on ginr automata -- they must be subsequential with at least 2 and at most 3 tapes, can reference only effectors defined for the target and transductor, and signal and value references must be well formed. Together they effect a lossless compression of the transducer transition matrix, squeezing equivalent input symbols into an enumeration of equivalence classes and pulling non-branching chains of (effector parameter*)+ transitions into an enumerated set of effect vectors.
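To illustrate the equivalence-class step, here is a minimal Java sketch, not the ribose implementation: input symbols whose transition columns are identical are mapped to the same class ordinal, so the runtime matrix can be indexed by equivalence class rather than by raw input symbol. The class and method names are illustrative assumptions.

import java.util.Arrays;
import java.util.HashMap;

// Sketch only: collapse input symbols with identical transition columns
// into equivalence classes, as described in the paragraph above.
final class InputEquivalence {
  // matrix[input][state] = {nextState, effect}; returns input -> class ordinal
  static int[] classify(int[][][] matrix) {
    HashMap<String, Integer> classes = new HashMap<>();
    int[] inputToClass = new int[matrix.length];
    for (int input = 0; input < matrix.length; input++) {
      String column = Arrays.deepToString(matrix[input]);
      inputToClass[input] = classes.computeIfAbsent(column, k -> classes.size());
    }
    return inputToClass;
  }
}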
The following pattern is used by the ribose transducer compiler to transduce ginr automata conforming to the ribose (input, effector[parameter*]) schema. The ribose transducer compiler compiles this pattern into the Automaton transducer in the bootstrap ribose compiler model. This is then used to transduce other collections of ginr automata to transducers in domain-specific ribose models. In this case the input pattern is well-structured and unambiguous so field names and other syntactic add-ons are not required. The syntactic structure imposed by the tab and newline delimiters in the API pattern is sufficient to map syntactic input features onto semantic features in the domain model.
Inr = 'INR';
Eol = cr? nl;
Sign = ('-', paste);
Number = (digit, paste)+;
Field = clear[X] select[X];
Symbol = [count[`~length` `!eol`]] (byte, paste count)* eol;
Automaton = (nil?
(Inr, Field @ (X,`~version`)*) Number
(tab, Field @ (X,`~tapes`)*) Number
(tab, Field @ (X,`~transitions`)*) Number
(tab, Field @ (X,`~states`)*) Number
(tab, Field @ (X,`~symbols`)*) Number
(Eol, header) [Field @ (X,`~from`)*]
(
Number
(tab, Field @ (X,`~to`)*) Number
(tab, Field @ (X,`~tape`)*) Sign? Number
(tab, Field @ (X,`~length`)*) Number
(tab, Field @ (X,`~symbol`)*) Symbol
(Eol, transition) [Field @ (X,`~from`)*]
)*
(eos, automaton stop)
):dfamin;
Automaton$(0,1 2):prsseq `build/ribose/automata/Automaton.pr`;
Automaton:save `build/ribose/automata/Automaton.dfa`;
$ cat patterns/alpha.inr patterns/ribose/Automaton.inr | ginr
Automaton ---------- DFA MIN States: 127 Trans: 565 Tapes: 3 Strg: 25 K
$ etc/sh/compile --target com.characterforming.ribose.TCompile build/ribose/automata build/ribose/TCompile.model
Transducer Automaton: 12 input equivalence classes, 25 states, 45 transitions
This transducer pattern is a cut-and-paste makeover of the API specification. It just extends the syntactic feature descriptions to capture (paste) and decode features into a simple object model suitable for refactoring into a ribose transducer. The TCompile class is the target for the construction of all ribose models, including TCompile.model. It uses the generic effectors select, paste, cut, clear, ... and expresses domain-specific header, transition, automaton effectors to assemble ribose transducers from ginr automata and persist them in ribose runtime models.
This is ribose working in its own domain, using the current ribose compiler model to build the next compiler model encapsulating the latest revision of the ginr Compile API. In this case there is just one input automaton (Automaton.dfa) to compile and the new model (TCompile.model) will contain only one transducer (Automaton, from Automaton.dfa) and its supporting cast of effectors, named values and signals.
What follows is an abridged listing showing how this all works together. The process starts in ModelCompiler (the implementation superclass for TCompile), which defines a couple of immutable value objects to contain transient Header and Transition features extracted from the input.
class Header {
final int version;
final int tapes;
final int transitions;
final int states;
final int symbols;
Header(ModelCompiler target, int version, int tapes, int transitions, int states, int symbols) {
//Validate arguments here
this.version = version;
this.tapes = tapes;
this.transitions = transitions;
this.states = states;
this.symbols = symbols;
}
}
class Transition {
final int from;
final int to;
final int tape;
final byte symbol[];
final boolean isFinal;
Transition(ModelCompiler target, int from, int to, int tape, byte[] symbol) {
//Validate arguments here
this.isFinal = (to == 1 && tape == 0 && symbol.length == 0);
this.from = from;
this.to = to;
this.tape = tape;
this.symbol = symbol;
}
}
As a composite target, TCompile presents the union of the Transductor and ModelCompiler target effectors, and the Automaton transducer invokes the effectors to apply ribose compiler semantics to features of interest extracted from serialized ginr automata. The Transductor target is implicitly included in all ribose models. The ModelCompiler target expresses three effectors that cooperate to assimilate extracted features. One of these extracts and retains ginr's header information ...
public class HeaderEffector extends BaseEffector<ModelCompiler> {
INamedValue fields[];
HeaderEffector(ModelCompiler automaton) {
super(automaton, Bytes.encode("header"));
}
@Override
public void setOutput(IOutput output) throws TargetBindingException {
super.setOutput(output);
fields = new INamedValue[] {
super.output.getNamedValue("version"),
super.output.getNamedValue("tapes"),
super.output.getNamedValue("transitions"),
super.output.getNamedValue("states"),
super.output.getNamedValue("symbols")
};
}
@Override
public int invoke() throws EffectorException {
Header h = new Header(target,
(int)fields[0].asInteger(),
(int)fields[1].asInteger(),
(int)fields[2].asInteger(),
(int)fields[3].asInteger(),
(int)fields[4].asInteger()
);
target.header = h;
target.stateTransitionMap = new HashMap<Integer, ArrayList<Transition>>((h.states * 5) >> 2);
target.transitions = new Transition[h.transitions];
return IEffector.RTE_EFFECT_NONE;
}
}
Another effector extracts and retains transitions ...
public class TransitionEffector extends BaseEffector<ModelCompiler> {
INamedValue fields[];
TransitionEffector(ModelCompiler automaton) {
super(automaton, Bytes.encode("transition"));
}
@Override
public void setOutput(IOutput output) throws TargetBindingException {
super.setOutput(output);
fields = new INamedValue[] {
super.output.getNamedValue("from"),
super.output.getNamedValue("to"),
super.output.getNamedValue("tape"),
super.output.getNamedValue("symbol")
};
}
@Override
public int invoke() throws EffectorException {
Transition t = new Transition(target,
(int)fields[0].asInteger(),
(int)fields[1].asInteger(),
(int)fields[2].asInteger(),
fields[3].copyValue()
);
if (t.isFinal) {
return IEffector.RTE_EFFECT_NONE;
}
if ((t.tape >= 0) && (t.tape < 3)) {
HashMap<Integer, Integer> rteStates = target.stateMaps[t.tape];
if (rteStates == null) {
rteStates = target.stateMaps[t.tape] = new HashMap<Integer, Integer>(256);
}
if (!rteStates.containsKey(t.from)) {
rteStates.put(t.from, rteStates.size());
}
switch (t.tape) {
case 0:
target.compileInputToken(t.symbol);
break;
case 1:
target.compileEffectorToken(t.symbol);
break;
case 2:
target.compileParameterToken(t.symbol);
break;
}
ArrayList<Transition> outgoing = target.stateTransitionMap.get(t.from);
if (outgoing == null) {
outgoing = new ArrayList<Transition>(16);
target.stateTransitionMap.put(t.from, outgoing);
}
outgoing.add(t);
target.transitions[target.transition] = t;
++target.transition;
}
return IEffector.RTE_EFFECT_NONE;
}
}
And one final effector to pull it all together and assemble the ribose transducer from the transient representation of ginr's automaton.
public class AutomatonEffector extends BaseEffector<ModelCompiler> {
public AutomatonEffector(ModelCompiler target) {
super(target, Bytes.encode("automaton"));
}
@Override
public int invoke() throws EffectorException {
final Integer[] inrInputStates = target.getInrStates(0);
if (inrInputStates == null) {
String msg = "Empty automaton " + target.getTransducerName();
Model.rtcLogger.log(Level.SEVERE, msg);
throw new EffectorException(msg);
}
for (final ArrayList<Transition> transitions : target.stateTransitionMap.values()) {
transitions.trimToSize();
}
final int[][][] transitionMatrix = new int[target.model.getSignalLimit()][inrInputStates.length][2];
for (int i = 0; i < transitionMatrix.length; i++) {
for (int j = 0; j < inrInputStates.length; j++) {
transitionMatrix[i][j][0] = j;
transitionMatrix[i][j][1] = 0;
}
}
target.effectorVectorMap = new HashMap<Ints, Integer>(1024);
target.effectorVectorList = new ArrayList<Integer>(8196);
target.effectorVectorList.add(0);
for (final Integer inrInputState : inrInputStates) {
for (final Transition t : target.getTransitions(inrInputState)) {
if (t.isFinal) {
continue;
}
if (t.tape != 0) {
target.error(String.format("%1$s: Ambiguous state %2$d",
target.getTransducerName(), t.from));
continue;
}
try {
final int rteState = target.getRteState(0, t.from);
final int inputOrdinal = target.model.getInputOrdinal(t.symbol);
final Chain chain = target.chain(t);
if (chain != null) {
final int[] effectVector = chain.getEffectVector();
transitionMatrix[inputOrdinal][rteState][0] = target.getRteState(0, chain.getOutS());
if (chain.isEmpty()) {
transitionMatrix[inputOrdinal][rteState][1] = 1;
} else if (chain.isScalar()) {
transitionMatrix[inputOrdinal][rteState][1] = effectVector[0];
} else {
Ints vector = new Ints(effectVector);
Integer vectorOrdinal = target.effectorVectorMap.get(vector);
if (vectorOrdinal == null) {
vectorOrdinal = target.effectorVectorList.size();
for (final int element : effectVector) {
target.effectorVectorList.add(element);
}
target.effectorVectorMap.put(vector, vectorOrdinal);
}
transitionMatrix[inputOrdinal][rteState][1] = -vectorOrdinal;
}
}
} catch (CompilationException e) {
target.error(e.getMessage());
}
}
}
if (target.errors.isEmpty()) {
target.factor(transitionMatrix);
}
return IEffector.RTE_EFFECT_NONE;
}
}
As for the ITarget implementation methods ...
ITransductor transductor;
...
ModelCompiler() {
this.transductor = null;
...
}
private void setTransductor(ITransductor transductor) {
this.transductor = transductor;
}
@Override
public IEffector<?>[] getEffectors() throws TargetBindingException {
return new IEffector<?>[] {
new HeaderEffector(this),
new TransitionEffector(this),
new AutomatonEffector(this)
};
}
@Override
public String getName() {
return this.getClass().getSimpleName();
}
...
}
The code below outlines how the ribose model compiler transduces ginr service output obtained for ribose transducer patterns. The model compiler preloads the service output into a byte[] array and instantiates an ITransductor, binding it to a TCompile instance implementing the ITarget interface. The input array is pushed onto the transductor input stack, along with a nil signal to trigger the first transition, and the Automaton transducer is pushed onto the transducer stack and set to its initial state. The transductor is then run, engaging the TCompile target and its effectors, until the input stack is empty. A final eos is pushed to signal end of input and the transductor is run once more to finalize and persist the compiled transducer in a ribose model file. The transductor is then stopped to clear its stacks and reset for the next transduction.
Bytes automaton = Bytes.encode(this.encoder, "Automaton");
if (this.transductor.stop()
.push(bytes, size).push(Signal.nil)
.start(automaton).status().isRunnable()
) {
if (this.transductor.run().status().isPaused()) {
this.transductor.push(Signal.eos).run();
}
}
if (this.errors.size() == 0)
this.save();
}
this.transductor.stop();
That is how ribose as a consumer of the ginr pattern compiler service API would transduce ginr output from compilation of a ribose transducer pattern. The Automaton transducer is itself a product of this process, obtained by running ginr and the model compiler to compile a new Automaton pattern to a ginr automaton, transduce the automaton using the current generation of the Automaton transducer, and persist the new Automaton transducer in TCompile.model.
Normally, a collection of ribose transduction patterns representing API response patterns for specific service requests is compiled to a collection of ginr automata in a directory for the ribose model compiler to transduce into a specialized model file. In the Java runtime the ribose model is loaded into a ribose runtime, which provides a capability to bind live targets to transductors. Transductors are tightly bound to their target and effectors but are free agents for transduction and can be reused after each transduction. A transduction is a transient binding of a transductor with an input source and a stack of nested transducers that ends when the transductor has run the input or transducer stack dry, as in the example above. After sending a service request a transductor on the receiving channel interface starts a transduction with the specific response transducer and runs it on response input as it is received at the channel interface.
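To make the consumer side concrete, here is a hedged sketch of what running a response transducer might look like. Only the ITransductor calls (stop, push, start, run, status) mirror the compiler example above; the "QuoteResponse" transducer name, the CharsetEncoder argument and the surrounding method are assumptions for illustration, not the actual ribose runtime API.

// Sketch only: drive a hypothetical consumer response transduction.
void transduceResponse(ITransductor transductor, byte[] response, CharsetEncoder encoder)
throws Exception {
  Bytes transducer = Bytes.encode(encoder, "QuoteResponse"); // hypothetical response transducer
  if (transductor.stop()
    .push(response, response.length).push(Signal.nil)
    .start(transducer).status().isRunnable()
  ) {
    if (transductor.run().status().isPaused()) {
      transductor.push(Signal.eos).run();
    }
  }
  transductor.stop(); // reset stacks for reuse
}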
There are a few points to make regarding the above example. These apply as well to pattern-oriented services and applications in other domains.
- There are no syntax-related concerns expressed in the Java source for target or effectors. None.
- Features not incorporated into target (eg 'INR') are quickly passed over with nil effect.
- Target and effectors and ribose runtime have no dependencies on external libraries. None.
- Input UTF-8 bytes are not decoded for transduction, only extracted UTF-8 bytes are decoded.
Ginr patterns are expressed in Unicode and represented as encoded UTF-8 or binary bytes in compiled automata. Ribose transductions are effected in a semiring of bytes [0x0..0xff] and out-of-band signals [0x100..), obviating decoding and widening UTF-8 input to 16-bit Unicode for transduction. Only features of interest are extracted, as sequences of bytes, and target effectors decode them to Java primitives (integer, double, byte[] or Unicode String) only once before assimilating features into the target. Transduction inputs may contain any mix of textual and binary data as long as prior knowledge of the size of binary features is available (eg, Transition.symbol).
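For example, a consumer-domain effector might decode an extracted field only at the point of assimilation. This is a sketch only: QuoteTarget, the "price" field and the assimilate() method are hypothetical, while the effector shape follows HeaderEffector and TransitionEffector above.

// Hypothetical consumer-domain effector; structure mirrors the effectors shown earlier.
public class PriceEffector extends BaseEffector<QuoteTarget> {
  INamedValue field;
  PriceEffector(QuoteTarget target) {
    super(target, Bytes.encode("price"));
  }
  @Override
  public void setOutput(IOutput output) throws TargetBindingException {
    super.setOutput(output);
    field = super.output.getNamedValue("price");
  }
  @Override
  public int invoke() throws EffectorException {
    // decode the extracted bytes once, only when the feature is assimilated
    String text = new String(field.copyValue(), java.nio.charset.StandardCharsets.UTF_8);
    target.assimilate(Double.parseDouble(text)); // hypothetical target method
    return IEffector.RTE_EFFECT_NONE;
  }
}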
When I say domain-specific I mean it's all yours.
You (service provider) propose API request and response patterns and iteratively refine them to suit consumer needs. You and your service consumers need none of these generic formats and frameworks. None. You and your consumers are de facto domain experts and use a common domain dictionary to define semantic terms and annotations. You use semiring algebra to express API request and response as precise formal patterns weaving semantic guidance into syntactic representations of embedded features.
Developers working with your APIs simply modify your API presentation patterns to map syntactic features of interest onto target effectors in the consumer or service domain. They do not need to pore through endless tutorials, they just consult your domain dictionary when they get confused. They have zero dependencies on external libraries since their effectors generally deal with API-specific compositions of simple value objects. Domain target objects express these effectors to extract, decode and assimilate data into the target object model, and the target interacts with other domain objects to get jobs done.
Generic formalisms for expressing data structures ensure, for example, that a StAX stack loaded into a process in a Shopify server can parse and validate an XML document describing the structure of a protein. That is their core value statement. And that is likely what they will do if they are presented with an unexpected but formally valid document, whether it is relevant to domain workflows or not. Service providers and consumers interacting in a specific semantic domain understand the nature of the information that they are exchanging and are best informed as to how information should be structured for exchange within the domain and with incidental consumers reaching out from other domains.
Moreover, API patterns that are highly specialized and precisely specified enable quick rejection of unacceptable inputs. Ecosystems composed of a multitude of highly specialized automata present fewer of the systemic vulnerabilities that are exposed by generic and widely used software libraries. And since XML, JSON, YAML, etc require you to know a great deal about their respective domains yet know nothing about your domain, you have to ask: What are they doing here?
Make them go away.
Sure it is. Remember when you had to learn the multiplication table? GCD? It was hard because you had never had to deal much with numbers beyond counting fingers and toes. You got over it and it stuck. You probably still remember 7*9=63. Sure you do. It's burned in.
You're older and wiser now. You know the math just keeps getting harder and harder. But it's not just numbers anymore it's all about new programming idioms and APIs and they sprout and morph like clouds. You've been through XML, JSON, WSDL, YAML and more with all of their vulnerabilities and encumbrances and you know the next one is just around the corner. Sooner or later you'll be burned out.
Unfortunate it is that we weren't taught at an early age about multiplication and addition and repetition with strings of letters and digits and other glyphs. It takes a little bit of effort to burn in but it goes a long way and like integer arithmetic it doesn't morph or sprout new features. If we had all invested in learning this early alongside the numeric stuff, then the folks who invented EBCDIC might have recognized a Great Big Use Case for building hybrid FST/CPU machines and bundling them with lots of RAM to work in a broad range of domains involving text and other sequential media. Beyond bombs and business machines and weather prediction and without involving mountains of hand-coded software libraries maintained by anonymous developers all over the world to make the CPU look like anything other than a programmable calculator. For free pizza and pop, if they are lucky enough to have a sponsor (thanks Apache folks!).
Had that happened (it didn't) we would have had in the 1990s a microservice ecosystem like what we are trying to pull together today on endless racks of servers all over the world. Tightly focused and compartmentalized microservices embedded in a high-bandwidth network mesh would present request/response APIs as domain-specific nested regular patterns and communicate over fast serial digital links. Pattern-driven FST hardware would stream service requests and responses onto LR(0) transductors, each composing a stack of FSTs to transduce raw input off the wire and map it byte by byte onto CPU instructions directing feature extraction and assimilation into the service domain. APIs would be expressed in semiring formalisms without reference to poetic generalized grammars, procedure calls would be intrinsically remote, and service implementations would be tightly focused and easily compartmentalized.
But that hasn't happened, yet. Don't know why.
It's just strings, but with arithmetic.
Copyright 2022,2023 Kim Briggs (except 'Ginr', copyright J Howard Johnson)