Home
Welcome to the ribose wiki!
Let me introduce ribose. This is a bit long-winded but it comes to a sharp and important point. Then we'll get to how it all works and how something like ribose could save developers and service providers and consumers a lot of time and trouble. The story starts back in the late 80s when I was faced with a grungy task that involved a couple of kinks that present difficulty for instruction-driven CPUs and the algorithmic languages we use to instruct them.
First, textual information was presented in myriad random proprietary formats embedding features to be extracted. That meant writing and maintaining an algorithm for each known source artifact. Information sources were pre-internet online services -- legal and bibliographic and SQL databases, each presenting different artifacts in idiomatic formats. Algorithms were to be coded in C and there were no helpful C libraries in the picture. Having string.h was no help because...
Second, these algorithms were communicating with remote services over a slow serial link and were receiving text in small snippets without regard to information structure. They could not stop and wait on the link after picking through each snippet because the thread that was driving them had other important things to do. Algorithmic languages break down here -- they want to instruct the CPU thing (that's their job) but they can't go forward and the CPU thing never, ever stops until the lights go out.
Ginr was developed as a research tool to aid in exploring regular sets and multi-dimensional finite automata. It was quickly adopted for an industrial-strength use case, transducing the Oxford English Dictionary from idiomatic layout to SGML. I obtained a license to use it in my work at Bell Northern Research and quickly developed a first version of a ribose-like framework, using ginr (aka inr) to describe idiomatic input features as regular sets over (ASCII)* and then extending these descriptions onto regular sets over collections of vectors of effectors (aka C function pointers) operating in a traditional CPU/RAM computing environment.
I was relieved to discover how much tedious coding was eliminated by pushing data presentation concerns out of the application domain into patterns that expertly recognized features of interest and extracted them for assimilation into the application domain. The functional requirements for the effectors fell directly out of the input pattern, which provided a syntactic map detailing the location and structure of features of interest. Simple, cross-cutting effectors like select, cut, copy, paste, clear enabled precise extraction of raw data. Nested patterns were accommodated by start, stop effectors operating on a push-down stack of regular patterns, and an in effector operating on a stack of input streams enabled ASCII snippets or out-of-band signals to be injected into the input stream for immediate transduction. Contextual cues in the input pattern triggered these basic effectors while other syntactic cues coordinated the action of task-specific effectors to assimilate extracted data into the application domain.
This separation of syntactic and semantic concerns is an intrinsic aspect of pattern-oriented programming and it would be a huge win for developers and for service providers and consumers if pattern-oriented programming were to be widely adopted and used. A lot fewer calories would be burned getting jobs done, and everyone would be happier. I was unable at the time to convey this news to my colleagues, so these insights were lost, and work on the ginr prototype was pulled from public view after some negative developments following the OED work. Fortunately, ginr is now open source and better than ever so I'm gonna give it another shot.
Traditional CPUs drive themselves! It's nuts. Sure, developers write programs that lay down the roads they run on but once they get started they never stop running. They need data to chew on and grab it from RAM as they go but RAM is slower than CPU as we know so they frequently have to not go forward and just run on the spot for a while. It's one instruction done then the next in sequence then twiddle for a while in a loop waiting on RAM then maybe hit a branching instruction and run a different road for a while. All this running, in the wind or on the spot, burns a lot of calories.
It takes a patient and sophisticated effort to instruct these machines to run the right roads and get jobs done. The very first programmable machines were just numeric calculating machines programmable as branching sequences of instructions. At first programs had to be hand-wired on switchboards but later they were represented using numeric opcodes that could be input with a numeric keypad. Of course this was very tedious so it became necessary to add a programmable typewriter feature to the numerical calculator machine and they knew they could do so because the latter was capable of almost anything, with careful and patient programming. So EBCDIC was invented and the job was done. Programmers could now write assembly code using digits and letters but it was still pretty hard.
This is wrong on two counts: 1) the programmable calculator machine had no stack and was only capable of sequencing and branching instructions in regular patterns and 2) EBCDIC (or Unicode) ordinals are the basis for a semi-ring of text media with concatenation and alternation, just as machine instruction sets are the basis for semi-rings of programmed instructions with sequencing and branching. It's just arithmetic, but with strings, and it is difficult to understand why this hasn't been exploited in hardware to the extent that traditional but ever smaller and faster instruction-driven processing units have exploited numeric arithmetic, but with machine opcodes, to tediously manipulate text character ordinals in RAM.
The first machine to present instructions for maintaining a stack of instruction sequencing regular patterns was the Burroughs B5000 (1961). This launched Algol and the evolution of a zoo of algorithmic programming languages and a cloud of florid abstractions that continues today. These attempt to hide the complexity of programming but they put a lot of weight on the CPU runner's back and more calories are burned. There is just no way of getting around the fact that these programs are running on programmable calculators, with lots of RAM and a basic typewriter feature bolted on. They decode instructions that do arithmetic real fast and move stuff around in RAM not quite so fast. That's what they do. That's all they can do.
Sadly, as far as I know, there are no pattern-programmable data-driven transduction machines in production, even though streaming data and attendant processing consume an increasingly large proportion of network and CPU bandwidth. They are, if anything, science fiction artifacts that Philip K. Dick might have enjoyed exploring. They would incorporate hardware to run FSTs real fast directly on top of fast serial data links, grabbing source input as soon as it became available in intended order and pushing it through a stack of transducers to extract (filter) features of interest and instruct (map) the CPU thing how to assimilate (reduce) them into the application or service domain. Programming these hybrid machines would involve composing nested transducers from regular patterns expressed in domain-specific semi-rings, and developers would be wizards working in semi-ring algebra to compose and consume these patterns.
Functional specifications for development tasks on the CPU side would be more tightly focused, falling directly out of the input patterns, since each pattern is essentially a structural map of semantic features. Development of target effectors on the CPU/RAM back end would be tightly-focused and simplified since the transduction front-end would take care of all the fussy character-by-character twiddling that would otherwise be expressed inline with object model construction in a generic software stack. The mapping of syntactic features to semantic effectors ensures that relevant features are extracted and injected into the target domain synchronously with the input stream as runtime transduction precisely schedules effector invocation in response to syntactic cues in the input.
If someone does produce an FST/CPU/RAM transduction machine there will be an incessant urge on the software side to add lots of abstract squirrelly curly-cue adornments to pattern expressions and that's not ok. Developers are well acquainted with nested patterns, that's how they program CPUs to execute nested regular patterns of instructions. But algorithmic programming languages and formal data exchange formats are all points in the same space (context-free grammars) and they tend to get complex and stretched out of shape as they try to accommodate all possible semantic models. There is no reason why developers should not use hard-core semi-ring algebraic formalisms to describe and navigate information sources or express and consume stable idiomatic service APIs. They can map formal syntactic descriptions of relevant features embedded in input patterns to CPU-side effectors that implement feature semantics and get fine-grained, domain-specific jobs done without involving unnecessary impediments. They can do this without googling "How do I work this" for every new thing that comes along or breaks down.
And therein lies the essential sweetness that ribose expresses. Semi-ring algebra is a simple and robust framework for describing sequential patterns and extending them onto subsequential (single valued) transducers. Regular patterns are ubiquitous in byte-encoded media, embedded in idiomatic formats or formal data exchange formats like XML or JSON. Pattern-oriented design and development presents a unified approach to processing information streams that simplifies software development and obviates dependency on specialized external libraries that support common data formats.
It's just arithmetic, but with strings.
Wait for it.
I hesitate to put this forward because I do not want to give the impression that ribose is all about front-ending service APIs. There are myriad use cases for pattern-driven transduction waiting to be discovered. But here is a use case to put things in perspective.
I like the part about...
make the opponents communicate over a socket
Best way to hide your code! Put a domain-specific transducer on the receiving end of the socket to rip through the JSON/XML/HTML your opponent is sending and pull out just what you need, without involving a great stack of algorithmic code that knows everything about anything that can be expressed in JSON/XML/HTML and so on. And the wee bits of algorithmic code that your domain-specific target expresses as effectors to the transduction run-time know only and expertly about your target domain.
So, it's all better? Not quite. JSON/XML/HTML are all formalisms expressed as a compromise to standardize information exchange between remote processes in formats that are readable by people and easy to parse by machines. They are generalized abstract specifications each covering a broad range of possible features relevant in unimagined domains and are not readily applicable to specific tasks involving HTML or XML or whatever information sources. If you look closely you will see that they are all essentially expressions of a common structure -- LR(0) context-free grammars. Just like the grammars used to compile code. And an LR(0) grammar is nothing more than a nesting of things that can be described precisely in semi-ring algebra. We use the same formalisms to describe the specializations that we carve out of the Unicode semi-ring to convey information regardless of whether we are describing algorithms or data structures.
Service providers typically use these generalized specifications to express remote service APIs as examples. These suggest what service requests and response features might "look like" and leave developers to figure out what they "mean". Developers are pretty much forced to rely on heavy-weight software stacks to deal with the syntactic aspects of consuming service APIs. These software stacks are as generic as their respective presentation standards and they parse, extract, and present all recognizable features in a domain object model so that consumers can cherry-pick features of interest in their specific domain.
It takes time to gain experience with the underlying frameworks and the software libraries, like StAX for XML, that parse and serve up domain object models from abstract data representations. Forcing developers to do this all over again for each new same thing that comes along in a different suit contributes to burn out. And since all of these very special formalisms for representing data boil down to the same basic algebraic structure strung out on a stack, one has to ask why we don't use semi-ring algebra directly to describe data as nested regular patterns expressed exclusively in terms that are relevant to their semantic domains. That's the focus of this important use case. Spoiler alert: it's gonna be semi-rings all the way down.
Instead of presenting specific and incomplete examples of API request and response, suppose we represent pattern-oriented API request and response as nested regular expressions presented in (<feature-pattern>, <label>[annotation...]) terms, each mapping a feature presentation to a semantic construct. Request and response syntactic features are precisely expressed in the Unicode semi-ring and semantics are easily inferred from labels and annotations by anyone with domain knowledge. APIs are expressed in terms that are structured, easy to read and to parse, and compile directly to nested compositions of transducers that map feature syntax to domain semantics.
No standards body is involved here -- semi-ring algebra is as likely to sprout new features as is numeric arithmetic. In a pattern-oriented context service providers are free to present information in terms that are unambiguous and specifically relevant in their service domain. They can provide a domain dictionary to ensure that there are no doubts as to the meaning of semantic labels and annotations presented in their service APIs. Semantic terms are stable and not necessarily impacted by syntactic changes in feature representation. Service providers and consumers express and exchange information using unambiguous formal patterns prescribing the syntax and semantics of features embedded in API request and response contents. Consumers can specialize response patterns to gloss quickly over irrelevant features and can develop specialized effectors to enhance semantic interpretation of interesting features.
The only software dependency external to service and consumer concerns is something like [ginr+ribose] that can render consumer specializations of service responses and service provider specializations of consumer requests as stacked transducers that read response and request content in a single LR(0) pass to extract and act upon features of interest as they are discovered in the respective input streams.
That just about wraps this use case up, but here's an example to help nail things down.
Please see the ginr :save format specification for reference.
Ginr as it stands can be viewed as a service provider presenting a compiler for multidimensional regular patterns and a file-based repository for compiled automata. With some effort it could be encapsulated as a microservice and deployed within a service mesh with a Compile API receiving ginr pattern source per request and responding per request with the serialized representation of compiled automata. As things stand, ginr serializes compiled automata to a file-based store, so this exercise will use a file-based input channel, but a receiving socket channel would serve as well. The channel is just the medium for communication between "opponents"; the pattern prescribes the forms that express meaningful messages. To provide service consumers with a leg up on processing received automata, any API can be expressed as a mapping of presentation syntax to service semantics:
api.request = (feature-syntax, feature-semantics)*
api.response = (feature-syntax, feature-semantics)*
API syntax is expressed as formal patterns in the Unicode semi-ring. Service semantics are expressed as labels with or without annotations: counts serve to delimit the expression of binary features, and flags serve to bridge semantic constructs over regular operators (eg, break out of a looping syntactic pattern when a specific feature is recognized). These are inserted at syntactic cut points to mark the location and meaning of associated syntactic features. Binary features are demarcated by a preceding numeric text representation of the number of bytes spanned by the semantic feature and a counter to mark the end of the feature.
The API as a whole is expressed as a regular pattern in a semi-ring of (feature-syntax, feature-semantics) terms. The service provider would ordinarily provide a service domain dictionary defining the semantic terms but I'll offer the terms used here as self-explanatory. The API response content is the serialized form of a ginr state-minimized subsequential (single-valued) transducer. The API response presentation is a precisely defined injective function mapping strings representing ginr transducers to sequences of semantic terms describing the associated syntactic features.
The API response body is just a tab-delimited one-line header with an alphanumeric version followed by 4 numeric fields and a list of one or more transitions, each represented as a tab-delimited line with 4 numeric fields and a binary data field demarcated by a byte counter expressed in the last numeric field. This API interleaves a complete syntactic description on tape 0 (Automaton$0) with a concise semantic description on tapes 1 and 2 (Automaton$(1,2)). Consumers of this API can decouple the syntactic description from the API and extend it in a similar manner to interleave it with invocations of generic or customized patterns of target effectors to recover and assimilate the automaton transition logic into the target domain.
Automaton = (
'INR'
(digit+ tab, version[210])
(digit+ tab, tapes)
(digit+ tab, transitions)
(digit+ tab, states)
(digit+ nl, symbols header)
(
(digit+ tab, from)
(digit+ tab, to)
('-'? digit+ tab, tape)
(digit+ tab, count[end])
(byte, count)* (end, symbol)
(nl, transition)
)*
(nil, automaton)
);
The input syntax on tape 0 can be extended similarly to patterns for ribose runtime transducers that recognize specific service messages and drive message content into receiver target models. The service may provide presentation models with transducers mapping request or response messages into other formats. A service validation model may contain validating transducers mapping features to effectors that validate properties that can't be expressed syntactically. A validating transducer V can be joined with any runtime transducer T (composed with retention of compositing tape, assuming V$0 == T$0) and tapes reordered to realize a transducer that runs V T in lockstep with serial input, with V validating only features of interest before presenting them to T.
Ribose is a consumer of ginr's Compile API. It needs to map the syntactic features of ginr's API pattern into its own service domain. The API pattern is easily transformed into a transducer pattern that uses the built-in ribose effectors plus three domain-specific effectors to extract INR header and transition features from a compiled ginr automaton and assemble them into a ribose runtime transducer. The header, transition and automaton effectors apply ribose-specific semantic constraints on ginr automata -- they must be subsequential with at least 2 and at most 3 tapes, can reference only effectors defined for the target and transductor, and signal and value references must be well formed. Together they effect a lossless compression of the transducer transition matrix, squeezing equivalent input symbols into an enumeration of equivalence classes and pulling non-branching chains of (effector parameter*)+ transitions into an enumerated set of effect vectors.
The following is the pattern used by the ribose transducer compiler to receive and transduce ginr automata conforming to the ribose (input, effector[parameter*]) schema. The ribose transducer compiler compiles this pattern into the Automaton transducer in the bootstrap ribose compiler model that is used to build ribose runtime models for other domains from collections of automata. In the general use case for pattern-oriented API presentation and data-driven endpoint implementation, these collections are obtained from API patterns designed by service providers and compiled to automata using ginr.
clearHeader = clear[`~version`] clear[`~tapes`] clear[`~transitions`] clear[`~states`] clear[`~symbols`];
clearTransition = clear[`~from`] clear[`~to`] clear[`~tape`] clear[`~length`] clear[`~symbol`];
Automaton = (
(nil? 'INR', clearHeader clearTransition select)
(digit, paste)+ (tab, select[`~version`] cut select)
(digit, paste)+ (tab, select[`~tapes`] cut select)
(digit, paste)+ (tab, select[`~transitions`] cut select)
(digit, paste)+ (tab, select[`~states`] cut select)
(digit, paste)+ (nl, select[`~symbols`] cut select header clearHeader)
(
(digit, paste)+ (tab, select[`~from`] cut select)
(digit, paste)+ (tab, select[`~to`] cut select)
('-', paste)? (digit, paste)+ (tab, select[`~tape`] cut select)
(digit, paste)+ (tab, select[`~length`] cut select count[`~length` `!eol`])
(byte, paste count)* (eol, select[`~symbol`] cut select)
(nl, transition clearTransition)
)*
(eos, automaton stop)
):dfamin;
Automaton$(0,1 2):prsseq `build/ribose/automata/Automaton.pr`;
Automaton:save `build/ribose/automata/Automaton.dfa`;
This map is a cut-and-paste makeover of the API specification. It just extends the syntactic feature descriptions to capture (paste) and decode features into a simple object model suitable for refactoring into a ribose transducer. That would be the target of the transduction applied by the Automaton transducer. In other words, the Automaton pattern specifies a ribose transducer that assembles automata compiled from other ribose patterns by ginr into domain-specific ribose transducers and serializes them into a persistent ribose model. This pattern recognizes the syntax of ginr automata compiled for ribose and maps it onto effectors expressed by the ribose model compiler.
$ cat patterns/alpha.inr patterns/ribose/Automaton.inr | ginr
Automaton ---------- DFA MIN States: 127 Trans: 565 Tapes: 3 Strg: 25 K
28 states in subsequential transducer
$ echo -n "lines words characters = $(wc ribose/automata/*.dfa)"
lines words characters = 569 2657 7831 build/ribose/automata/Automaton.dfa
$ head build/ribose/automata/Automaton.dfa
INR210 3 565 127 282
0 2 0 1 I
0 3 0 3 nil
2 4 0 1 N
3 2 0 1 I
4 5 0 1 R
5 6 1 5 clear
6 7 2 8 ~version
7 8 1 5 clear
8 9 2 6 ~tapes
$ etc/sh/compile --target com.characterforming.ribose.TCompile build/ribose/automata build/ribose/TCompile.model
Transducer Automaton: 12 input equivalence classes, 25 states, 45 transitions (0.150)
The TCompile class is the target for the construction of all ribose models, including TCompile.model. It uses the generic effectors select, paste, cut, clear, ... and expresses domain-specific header, transition, automaton effectors to assemble and persist ribose transducers from ginr representations of automata as expressed by its Compile API response pattern.
This is ribose working in its own domain, using the current ribose compiler model to build the next compiler model encapsulating the latest revision of the ginr Compile API. In this case there is just one input automaton (Automaton.dfa) to compile and the new model (TCompile.model) will contain only one transducer (Automaton, from Automaton.dfa) and its supporting cast of effectors, named values and signals.
What follows is an abridged listing showing how this all works together.
First we need to load the current TCompile model and bind it to a default TCompile instance that will serve as model target. The model target will not participate in any transductions but will serve to present its effectors for parameter compilation. Then the live TCompile target instance receives a new Transductor instance to host the compiler transduction stack and the compiler transduces each ginr DFA file into ribose transducers and serializes them into the new model file. The process starts in ModelCompiler (the implementation superclass for TCompile):
package com.characterforming.jrte.engine;
final class ModelCompiler implements ITarget {
public static boolean compileAutomata(Model targetModel, File inrAutomataDirectory) throws ModelException, RteException {
File workingDirectory = new File(System.getProperty("user.dir"));
File compilerModelFile = new File(workingDirectory, "ModelCompiler.model");
try (IRuntime compilerRuntime = Ribose.loadRiboseModel(compilerModelFile, new ModelCompiler())) {
ModelCompiler compiler = new ModelCompiler(targetModel);
compiler.setTransductor(compilerRuntime.newTransductor(compiler));
for (final String filename : inrAutomataDirectory.list()) {
if (!filename.endsWith(Base.AUTOMATON_FILE_SUFFIX)) {
continue;
}
try {
long filePosition = targetModel.seek(-1);
if (compiler.compile(new File(inrAutomataDirectory, filename))) {
String transducerName = filename.substring(0, filename.length() - Base.AUTOMATON_FILE_SUFFIX.length());
int transducerOrdinal = targetModel.addTransducer(Bytes.encode(transducerName));
targetModel.setTransducerOffset(transducerOrdinal, filePosition);
} else {
for (String error : compiler.getErrors()) {
Model.rtcLogger.severe(error);
}
}
} catch (Exception e) {
String msg = String.format("Exception caught compiling transducer '%1$s'", filename);
Model.rtcLogger.log(Level.SEVERE, msg, e);
compiler.error(msg);
}
}
return compiler.getErrors().isEmpty();
}
}
The compiler's transductor provides a transducer stack and an input stack. To process a ginr automaton, the UTF-8 encoded input stream is loaded into a byte array and pushed onto the input stack. The ribose Automaton transducer loaded from the TCompile.model file is pushed onto the transducer stack and the transduction is run to completion, producing a ribose transducer ready to be persisted in the new TCompile.model.
private boolean compile(File inrFile) throws ModelException {
this.reset();
String name = inrFile.getName();
this.transducerName = Bytes.encode(name.substring(0, name.length() - Base.AUTOMATON_FILE_SUFFIX.length()));
int size = (int)inrFile.length();
byte bytes[] = null;
try (
FileInputStream s = new FileInputStream(inrFile);
DataInputStream f = new DataInputStream(s);
) {
bytes = new byte[(int)inrFile.length()];
f.read(bytes, 0, bytes.length);
}
this.transductor.stop();
this.errors = new ArrayList<String>();
this.stateMaps = (HashMap<Integer, Integer>[])new HashMap<?,?>[3];
this.stateTransitionMap = new HashMap<Integer, ArrayList<Transition>>(size >> 3);
this.transductor.input(new IInput[] {new ByteInput(new byte[][] {this.nilSignal, bytes})});
Status status = this.transductor.start(Bytes.encode("Automaton"));
while (status.equals(Status.RUNNABLE)) {
status = this.transductor.run();
}
if (this.errors.isEmpty()) {
this.save();
return true;
}
return false;
}
The ModelCompiler class defines a couple of simple value objects to contain transient header and transition features extracted from the input.
class Header {
final int version;
final int tapes;
final int transitions;
final int states;
final int symbols;
Header(int version, int tapes, int transitions, int states, int symbols) {
this.version = version;
this.tapes = tapes;
this.transitions = transitions;
this.states = states;
this.symbols = symbols;
}
}
class Transition {
final int from;
final int to;
final int tape;
final byte symbol[];
final boolean isFinal;
Transition(int from, int to, int tape, byte[] symbol) {
this.isFinal = (to == 1 && tape == 0 && symbol.length == 0);
this.from = from;
this.to = to;
this.tape = tape;
this.symbol = symbol;
}
}
As target, it relies on the effectors expressed by the Transductor to paste UTF-8 ordinals and cut assembled features into selected named values that the Transductor exposes to the domain-specific target and effectors. The ModelCompiler target expresses three effectors that cooperate to assimilate extracted features. One of these extracts and retains ginr's header information ...
public class HeaderEffector extends BaseEffector<ModelCompiler> {
INamedValue fields[];
HeaderEffector(ModelCompiler automaton) {
super(automaton, Bytes.encode("header"));
}
@Override
public void setOutput(IOutput output) throws TargetBindingException {
super.setOutput(output);
fields = new INamedValue[] {
super.output.getNamedValue("version"),
super.output.getNamedValue("tapes"),
super.output.getNamedValue("transitions"),
super.output.getNamedValue("states"),
super.output.getNamedValue("symbols")
};
}
@Override
public int invoke() throws EffectorException {
Header h = new Header(
(int)fields[0].asInteger(),
(int)fields[1].asInteger(),
(int)fields[2].asInteger(),
(int)fields[3].asInteger(),
(int)fields[4].asInteger()
);
target.stateTransitionMap = new HashMap<Integer, ArrayList<Transition>>((h.states * 5) >> 2);
if (h.version != ModelCompiler.VERSION) {
target.error(String.format("%1$s: Invalid INR version %2$d",
target.getTransducerName(), h.version));
}
if ((h.tapes < 2) || (h.tapes > 3)) {
target.error(String.format("%1$s: Invalid tape count %2$d",
target.getTransducerName(), h.tapes));
}
target.header = h;
target.transitions = new Transition[h.transitions];
return IEffector.RTE_EFFECT_NONE;
}
}
Another effector extracts and retains transitions ...
public class TransitionEffector extends BaseEffector<ModelCompiler> {
INamedValue fields[];
TransitionEffector(ModelCompiler automaton) {
super(automaton, Bytes.encode("transition"));
}
@Override
public void setOutput(IOutput output) throws TargetBindingException {
super.setOutput(output);
fields = new INamedValue[] {
super.output.getNamedValue("from"),
super.output.getNamedValue("to"),
super.output.getNamedValue("tape"),
super.output.getNamedValue("length"),
super.output.getNamedValue("symbol")
};
}
@Override
public int invoke() throws EffectorException {
Transition t = new Transition(
(int)fields[0].asInteger(),
(int)fields[1].asInteger(),
(int)fields[2].asInteger(),
fields[4].copyValue()
);
if (t.isFinal) {
return IEffector.RTE_EFFECT_NONE;
} else if (t.tape < 0) {
target.error(String.format("%1$s: Epsilon transition from state %2$d to %3$d (use :dfamin to remove these)",
target.getTransducerName(), t.from, t.to));
} else if (t.symbol.length == 0) {
target.error(String.format("%1$s: Empty symbol on tape %2$d",
target.getTransducerName(), t.tape));
} else {
HashMap<Integer, Integer> rteStates = target.stateMaps[t.tape];
if (rteStates == null) {
rteStates = target.stateMaps[t.tape] = new HashMap<Integer, Integer>(256);
}
if (!rteStates.containsKey(t.from)) {
rteStates.put(t.from, rteStates.size());
}
switch (t.tape) {
case 0:
target.compileInputToken(t.symbol);
break;
case 1:
target.compileEffectorToken(t.symbol);
break;
case 2:
target.compileParameterToken(t.symbol);
break;
}
ArrayList<Transition> outgoing = target.stateTransitionMap.get(t.from);
if (outgoing == null) {
outgoing = new ArrayList<Transition>(16);
target.stateTransitionMap.put(t.from, outgoing);
}
outgoing.add(t);
target.transitions[target.transition] = t;
++target.transition;
}
return IEffector.RTE_EFFECT_NONE;
}
}
And one final effector to pull it all together and assemble the ribose transducer from the transient representation of ginr's automaton.
public class AutomatonEffector extends BaseEffector<ModelCompiler> {
public AutomatonEffector(ModelCompiler target) {
super(target, Bytes.encode("automaton"));
}
@Override
public int invoke() throws EffectorException {
final Integer[] inrInputStates = target.getInrStates(0);
if (inrInputStates == null) {
String msg = "Empty automaton " + target.getTransducerName();
Model.rtcLogger.log(Level.SEVERE, msg);
throw new EffectorException(msg);
}
for (final ArrayList<Transition> transitions : target.stateTransitionMap.values()) {
transitions.trimToSize();
}
final int[][][] transitionMatrix = new int[target.model.getSignalLimit()][inrInputStates.length][2];
for (int i = 0; i < transitionMatrix.length; i++) {
for (int j = 0; j < inrInputStates.length; j++) {
transitionMatrix[i][j][0] = j;
transitionMatrix[i][j][1] = 0;
}
}
target.effectorVectorMap = new HashMap<Ints, Integer>(1024);
target.effectorVectorList = new ArrayList<Integer>(8196);
target.effectorVectorList.add(0);
for (final Integer inrInputState : inrInputStates) {
for (final Transition t : target.getTransitions(inrInputState)) {
if (t.isFinal) {
continue;
}
if (t.tape != 0) {
target.error(String.format("%1$s: Ambiguous state %2$d",
target.getTransducerName(), t.from));
continue;
}
try {
final int rteState = target.getRteState(0, t.from);
final int inputOrdinal = target.model.getInputOrdinal(t.symbol);
final Chain chain = target.chain(t);
if (chain != null) {
final int[] effectVector = chain.getEffectVector();
transitionMatrix[inputOrdinal][rteState][0] = target.getRteState(0, chain.getOutS());
if (chain.isEmpty()) {
transitionMatrix[inputOrdinal][rteState][1] = 1;
} else if (chain.isScalar()) {
transitionMatrix[inputOrdinal][rteState][1] = effectVector[0];
} else {
Ints vector = new Ints(effectVector);
Integer vectorOrdinal = target.effectorVectorMap.get(vector);
if (vectorOrdinal == null) {
vectorOrdinal = target.effectorVectorList.size();
for (final int element : effectVector) {
target.effectorVectorList.add(element);
}
target.effectorVectorMap.put(vector, vectorOrdinal);
}
transitionMatrix[inputOrdinal][rteState][1] = -vectorOrdinal;
}
}
} catch (CompilationException e) {
target.error(e.getMessage());
}
}
}
if (target.errors.isEmpty()) {
target.factor(transitionMatrix);
}
return IEffector.RTE_EFFECT_NONE;
}
}
As for the ITarget implementation methods ...
ITransductor transductor;
...
ModelCompiler() {
this.transductor = null;
...
}
private void setTransductor(ITransductor transductor) {
this.transductor = transductor;
}
@Override
public IEffector<?>[] bindEffectors() throws TargetBindingException {
return new IEffector<?>[] {
new HeaderEffector(this),
new TransitionEffector(this),
new AutomatonEffector(this)
};
}
@Override
public String getName() {
return this.getClass().getSimpleName();
}
...
}
Compiled ribose runtime models encapsulate a target class and enumerated collections of transducers, effectors, signals and named values. The ribose model compiler prints a summary map of the model contents listing the ordinal values for the enumerators in each collection.
$ ls -l TCompile.*
-rwxrwxrwx 1 kb kb 605 Apr 15 22:33 TCompile.map
-rwxrwxrwx 1 kb kb 4488 Apr 15 22:33 TCompile.model
$ cat TCompile.map
target com.characterforming.ribose.TCompile
transducer Automaton 0
signal nul 256
signal nil 257
signal eol 258
signal eos 259
effector 0 0
effector 1 1
effector paste 2
effector select 3
effector copy 4
effector cut 5
effector clear 6
effector count 7
effector in 8
effector out 9
effector mark 10
effector reset 11
effector start 12
effector pause 13
effector stop 14
effector header 15
effector transition 16
effector automaton 17
value 0
value * 1
value version 2
value tapes 3
value transitions 4
value states 5
value symbols 6
value from 7
value to 8
value tape 9
value length 10
value symbol 11
That is how ribose transduces ginr's service representation of the Automaton pattern into its Automaton transducer and encapsulates it in the ribose compiler model TCompile.model. The same process applies for compiling domain-specific service models from collections of ribose-conformant patterns expressed in a specialized (syntax, semantic[annotation]) semi-ring.
The static Ribose.loadRiboseModel(File, ITarget) method loads the service model and binds it with a model target, returning an IRuntime component. The IRuntime.newTransductor(ITarget) method receives a live target and binds it to an ITransductor instance. The transductor is capable of running multiple serial transductions using any of the transducers defined in the domain model. Transductions are assumed to be single-threaded, although a single thread can safely multiplex over transductions running on more than one transductor, each bound to a unique target instance.
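To put the pieces together, here is a minimal sketch of how a consumer might drive a transduction through this API. The MyTarget class, the model file and the transducer name are hypothetical, and the sketch reuses only the calls that appear in the listings above (loadRiboseModel, newTransductor, input, start, run); it illustrates the flow rather than prescribing an implementation.
// Hypothetical sketch: MyTarget (an ITarget implementation), the model file and the
// transducer name "MyTransducer" are illustrative; imports from java.io and the ribose
// runtime packages used in the listings above are assumed.
final class ExampleRunner {
  static boolean transduce(File modelFile, byte[] utf8Input) throws Exception {
    // Load the model, binding a proxy target used only for effector parameter compilation ...
    try (IRuntime runtime = Ribose.loadRiboseModel(modelFile, new MyTarget())) {
      // ... then bind a live target instance to a transductor.
      MyTarget target = new MyTarget();
      ITransductor transductor = runtime.newTransductor(target);
      // Push the raw UTF-8 input onto the input stack and the named transducer onto the transducer stack.
      transductor.input(new IInput[] { new ByteInput(new byte[][] { utf8Input }) });
      Status status = transductor.start(Bytes.encode("MyTransducer"));
      // Run to completion; effectors assimilate extracted features into the target as they are recognized.
      while (status.equals(Status.RUNNABLE)) {
        status = transductor.run();
      }
      return true;
    }
  }
}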
Like I said, it's semi-rings, all the way down.
There are a few points to make regarding the above example. These apply as well to pattern-oriented services and applications in other domains.
- There are no syntax-related concerns expressed in the Java source for target or effectors. None.
- Features not incorporated into target (eg 'INR') are quickly passed over with nil effect.
- Target and effectors and ribose runtime have no dependencies on external libraries. None.
Also, not obvious in this example, but ginr patterns are expressed in Unicode and represented as encoded UTF-8 in compiled automata like Automaton.dfa. Ribose transductions for compiled Unicode patterns are effected in the byte-oriented UTF-8 (0x0..0xf7) semi-ring. This obviates decoding and widening the entire request or response to 16-bit Unicode. Only features of interest are extracted, as sequences of bytes (eg, Transition.symbol), and target effectors decode them to Java primitives (integer, double, byte[] or Unicode String) only once before assimilating the feature into the target.
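Here is a minimal sketch of that decode-once step, assuming a hypothetical MyTarget target exposing an addSymbol method and a named value symbol captured by the transducer pattern; it uses only the calls that appear in the listings above (getNamedValue, copyValue, Bytes.encode).
// Hypothetical effector sketch: MyTarget and addSymbol() are illustrative names.
public class SymbolEffector extends BaseEffector<MyTarget> {
  private INamedValue symbol;
  SymbolEffector(MyTarget target) {
    super(target, Bytes.encode("symbol"));
  }
  @Override
  public void setOutput(IOutput output) throws TargetBindingException {
    super.setOutput(output);
    // Bind the named value once, as HeaderEffector and TransitionEffector do above.
    symbol = super.output.getNamedValue("symbol");
  }
  @Override
  public int invoke() throws EffectorException {
    // Extracted bytes are decoded to a Java String exactly once, on assimilation into the target.
    byte[] raw = symbol.copyValue();
    target.addSymbol(new String(raw, java.nio.charset.StandardCharsets.UTF_8));
    return IEffector.RTE_EFFECT_NONE;
  }
}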
When I say domain-specific I mean it's all yours.
You (service provider) propose API request and response patterns and iteratively refine them to suit consumer needs. You and your service consumers need none of these. None. You and your consumers are de facto domain experts and use a common domain dictionary to define semantic terms and annotations. You use semi-ring algebra to express API request and response as precise formal patterns weaving semantic guidance into syntactic representations of embedded features.
Developers working with your APIs simply modify your API presentation patterns to map syntactic features of interest onto target effectors in the consumer or service domain. They do not need to pore through endless tutorials, they just consult your domain dictionary when they get confused. They have zero dependencies on external libraries since their effectors generally deal with API-specific compositions of simple value objects. Domain target objects express these effectors to extract, decode and assimilate data into the target object model, and the target interacts with other domain objects to get jobs done.
Generic formalisms for expressing data structures ensure, for example, that a StAX stack loaded into a process in a Shopify server can parse and validate a document describing the structure of a protein. That is their core value statement. And that is likely what they will do if they are presented with an unexpected but formally valid document, whether it is relevant to domain workflows or not. Service providers and consumers interacting in a specific semantic domain understand the nature of the information that they are exchanging and are best informed as to how information should be structured for exchange within the domain and with incidental consumers reaching out from other domains. Moreover, API patterns that are highly specialized and precisely specified enable quick rejection of unacceptable inputs.
And since XML, JSON, YAML, etc require you to know a great deal about their respective domains yet know nothing about your domain, you have to ask: What are they doing here?
Make them go away.
Sure it is. Remember when you had to learn the multiplication table? GCD? It was hard because you had never had to deal much with numbers beyond counting fingers and toes. You got over it and it stuck. You probably still remember 7*9=63. Sure you do. It's burned in.
You're older and wiser now. You know the math just keeps getting harder and harder. But it's not just numbers anymore it's all about new programming idioms and APIs and they sprout and morph like clouds. You've been through XML, JSON, WSDL, YAML and more with all of their vulnerabilities and encumbrances and you know the next one is just around the corner. Sooner or later you'll be burned out.
Unfortunate it is that we weren't taught at an early age about multiplication and addition and repetition with strings of letters and digits and other glyphs. It takes a little bit of effort to burn in but it goes a long way and like integer arithmetic it doesn't morph or sprout new features. If we had all invested in learning this early alongside the numeric stuff, then the folks who invented EBCDIC might have recognized a Great Big Use Case for building hybrid FST/CPU machines and bundling them with lots of RAM to work in a broad range of domains involving text and other sequential media. Beyond bombs and business machines and weather prediction and without involving mountains of hand-coded software libraries maintained by anonymous developers all over the world to make the CPU look like anything other than a programmable calculator. For free pizza and pop, if they are lucky enough to have a sponsor (thanks Apache folks!).
Had that happened (it didn't) we would have had in the 1990s a microservice ecosystem like what we are trying to pull together today on endless racks of servers all over the world. Tightly focused and compartmentalized microservices embedded in a high-bandwidth network mesh would present request/response APIs as domain-specific nested regular patterns and communicate over fast serial digital links. Pattern-driven FST hardware would stream service requests and responses onto LR(0) transductors, each composing a stack of FSTs to transduce raw input off the wire and map it byte by byte onto CPU instructions directing feature extraction and assimilation into the service domain. APIs would be expressed in semi-ring formalisms without reference to poetic generalized grammars, procedure calls would be intrinsically remote, and service implementations would be tightly focused and easily compartmentalized.
But that hasn't happened, yet. Don't know why.
It's just strings, but with arithmetic.
Copyright 2022,2023 Kim Briggs (except 'Ginr', copyright J Howard Johnson)