Jacques Carette edited this page Jul 29, 2021 · 22 revisions

Chunks

As the basis for all information encoding in Drasil, chunks are integral to how we use and maintain the current database of knowledge. At its core, a chunk is a data type specialized to hold a specific type of information for a specific purpose. For example, NamedChunks are often used for objects that have a unique identifier and an associated term. ConceptChunks mirror real-world concepts by including the idea, definition, and domain for a particular concept. Something like a QuantityDict can have an idea, the space in which it exists, units, and a symbol. Many other chunks exist within Drasil that allow the program to hold the required information and its meaning so that knowledge may be used in generated models, definitions, and theories.

Structure

Chunks are usually made up of lower-level types with different purposes. A chunk whose purpose is to hold all the information needed for a mathematical variable would need a symbol, description/definition, and units (as shown below). This particular example gives a name to the concept which is built from a quantity and its units. The structure of a chunk can be thought of as a wrapper of sorts. It encases only the necessary information to perform its job, but its contents may be unwrapped and used one at a time. The wrapper itself may be wrapped again with more things added to it (like an abbreviation or a domain). This is primarily how one idea can be built upon in Drasil.

[Figure: ChunkDiagram]

Implementation

So, how do we represent this in code? Conveniently, we can use Haskell's record-type syntax along with lenses to define, set, and get the information we need from within the chunk wrapper. This way, we can wrap wrappers without worrying about the "level" of wrapping around one particular identifier. Using this, one UID can be represented in a hierarchy of chunks, with no information loss when upgrading to a larger chunk. A straightforward example of this is the progression from a lower-level NamedChunk to something much larger like a TheoryModel. One of the smallest chunks (NamedChunk) is defined as follows:

data NamedChunk = NC {_uu :: UID, _np :: NP}

It contains a unique identifier (UID) and a term that can be used in creating sentences (as a noun phrase, NP). As of now, we don't know what this NamedChunk is or what it can do, but we do know that it exists and we can use it in a sentence with proper pluralization and capitalization. Most likely, these chunks will be common nouns that are significant enough to have a name. Two NamedChunks may also be combined to produce a new NamedChunk that carries both of their terms. We can start to define single words and simple ideas like table_ and symbol and then combine those to make a tableOfSymbol NamedChunk idea, which is more complex. Using the wrapper analogy, we unwrap the term from table_ and symbol, then rewrap them after placing an "of" between them to get a tableOfSymbol chunk.
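The unwrap-and-rewrap step for tableOfSymbol can be sketched in a few lines. This is only an illustration under simplifying assumptions: NP is reduced to a singular and a plural string, and commonNoun, ofPlural, and capitalize are hypothetical helpers written for this example, not Drasil's actual combinators.

```haskell
module Main where

import Data.Char (toUpper)

-- Simplified stand-ins for Drasil's types (assumptions for illustration).
type UID = String

-- A noun phrase, reduced here to its singular and plural forms.
data NP = NP { singular :: String, plural :: String }

data NamedChunk = NC { _uu :: UID, _np :: NP }

-- Hypothetical helper: a common noun with a regular "s" plural.
commonNoun :: UID -> String -> NamedChunk
commonNoun u w = NC u (NP w (w ++ "s"))

-- Hypothetical combinator: unwrap both terms, place "of" between them,
-- and rewrap the result under a new UID. The second term is pluralized.
ofPlural :: UID -> NamedChunk -> NamedChunk -> NamedChunk
ofPlural u a b = NC u (NP (singular ta ++ " of " ++ plural tb)
                          (plural ta ++ " of " ++ plural tb))
  where
    ta = _np a
    tb = _np b

-- Capitalize the first letter of a phrase.
capitalize :: String -> String
capitalize (c:cs) = toUpper c : cs
capitalize []     = []

table_, symbol, tableOfSymbol :: NamedChunk
table_        = commonNoun "table" "table"
symbol        = commonNoun "symbol" "symbol"
tableOfSymbol = ofPlural "tableOfSymbol" table_ symbol

main :: IO ()
main = do
  putStrLn (singular (_np tableOfSymbol))             -- table of symbols
  putStrLn (capitalize (plural (_np tableOfSymbol)))  -- Tables of symbols
```

Neither table_ nor symbol is changed by the combination; the new chunk gets its own UID and a term derived from the two originals.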

A NamedChunk can either be used directly as a way of getting a defined term, or be built upon. The "next step" up from a NamedChunk is an IdeaDict, which contains a NamedChunk and maybe an abbreviation. We can see the direct progression in its type definition:

data IdeaDict = IdeaDict { _nc' :: NamedChunk, mabbr :: Maybe String }

As we continue to learn more about what exactly we want this chunk to represent, we can gain more specifics about the idea and directly create a richer type to work with such information. From this point, there are many options available to continue adding information. If the idea should be made into a concept, we can use a ConceptChunk to wrap the idea along with a definition and its domain:

data ConceptChunk = ConDict { _idea :: IdeaDict -- ^ Contains the idea of the concept.
                            , _defn' :: Sentence -- ^ The definition of the concept.
                            , cdom' :: [UID] -- ^ UID of the domain of the concept.
                            }

If we know the concept is a quantity or can be treated as one, it may become a QuantityDict or DefinedQuantityDict:

data DefinedQuantityDict = DQD { _con :: ConceptChunk
                               , _symb :: Stage -> Symbol
                               , _spa :: Space
                               , _unit' :: Maybe UnitDefn
                               }

By continuously wrapping the information needed, we can successfully encode relevant knowledge in a useful and practical manner.
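To make the nesting concrete, here is a minimal sketch that builds one quantity through all four layers and then unwraps it again using nothing but record accessors. The record definitions follow the ones above, but NP, Sentence, Symbol, Space, Stage, and UnitDefn are replaced by simple stand-ins, and the force example and its contents are assumptions made up for illustration.

```haskell
module Main where

-- Simplified stand-ins for Drasil's types (assumptions for illustration).
type UID      = String
type NP       = String
type Sentence = String
type Symbol   = String
type UnitDefn = String

data Space = RealNumbers | Booleans deriving Show
data Stage = Equational | Implementation

data NamedChunk = NC { _uu :: UID, _np :: NP }

data IdeaDict = IdeaDict { _nc' :: NamedChunk, mabbr :: Maybe String }

data ConceptChunk = ConDict { _idea  :: IdeaDict
                            , _defn' :: Sentence
                            , cdom'  :: [UID]
                            }

data DefinedQuantityDict = DQD { _con   :: ConceptChunk
                               , _symb  :: Stage -> Symbol
                               , _spa   :: Space
                               , _unit' :: Maybe UnitDefn
                               }

-- A hypothetical quantity, wrapped one layer at a time.
force :: DefinedQuantityDict
force = DQD
  { _con = ConDict
      { _idea  = IdeaDict { _nc' = NC "force" "force", mabbr = Nothing }
      , _defn' = "an interaction that tends to change the motion of an object"
      , cdom'  = ["physics"]
      }
  , _symb  = const "F"      -- same symbol at every stage
  , _spa   = RealNumbers
  , _unit' = Just "N"
  }

-- Unwrapping layer by layer: the accessor chain mirrors the nesting depth.
dqdUID :: DefinedQuantityDict -> UID
dqdUID = _uu . _nc' . _idea . _con

main :: IO ()
main = do
  putStrLn (dqdUID force)          -- force
  putStrLn (_symb force Equational) -- F
  print (_unit' force)             -- Just "N"
```

Note that dqdUID has to know exactly how many layers sit between the quantity and its UID. Hiding that depth from the caller is the job lenses take on.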

Eventually, we build up relevant chunks by seeing common patterns in examples and actual documentation. We have various high-level chunks dedicated to units (UnitDefn, UnitaryConceptDict, UnitaryChunk, UnitalChunk), relations (RelationConcept), quantities (QuantityDict, DefinedQuantityDict), uncertainties (UncertainChunk, UncertQ), and much more. Our foundation of knowledge is built upon these chunks, and the strong typing of Haskell emphasizes the semantic meaning that should be associated with each type. As Drasil grows, more chunks of different types will be added, thereby allowing our database of knowledge to grow alongside it.

Lenses

Perhaps a more programming-oriented way of thinking about chunks is to view them through lenses. In functional programming (and Haskell specifically), lenses are a popular method of getting and setting information in a record type. Although complex, these are used to make programming much easier and more concise, as developers and users alike should not need to fiddle around with the record type syntax more than necessary. Instead, this provides an easy way of reading and writing information to chunks (which are all record types).

In Drasil, it is especially useful to have the getter functionality for use in retrieving UIDs, terms, symbols, and units inside of larger chunk types. Since each chunk type builds upon other chunk types, accessors still need to be able to fetch the wanted information inside the idea. Perhaps instead of an opaque wrapper, a chunk (with the power of lenses) would be better described as objects wrapped in glass boxes with doors. A lens is pretty much a door into any one of the glass boxes. Thankfully, lenses use the record names when attempting to observe information, so an accurate intuition would be that every box has its door and a nametag. All the information is easily viewable from the largest wrapper, yet the observer can make sure all the proper types are grouped together. For example, we do not want to be adding UIDs for every wrapper we use on an idea, nor do we want to add NamedChunks at the level of a QuantityDict. Instead, we keep the UID encased in its lower-level chunk and instead associate that UID to the now-complex chunk type.

Users of Drasil might never need to interact with lenses and chunks at this level of detail, so an intuition like the glass boxes or wrappers may be all they need. Most chunks are created using smart constructors. This allows the user to input all the information needed for that chunk without needing to know the specifics of how each chunk is made. Drasil automatically makes the lower-level chunks and wraps them as needed for higher-level ones. Lenses help in this respect as well. By combining lenses with Haskell class methods, any chunk type can easily display any information in any box, so long as the user gives the correct label for Drasil to look into. For example, a reference in Drasil can be defined as the following:

data Reference = Reference
  { _ui :: UID -- unique identifier
  ,  ra :: LblType -- reference address
  ,  sn :: ShortName -- display name
  }

and created using the ref constructor (applied with the proper arguments). Then, using lenses, retrieving information like the UID is as simple as calling:

yourReferenceUID :: UID
yourReferenceUID = yourReference ^. uid

So the only lensing the user needs to do is apply the getter operator (^.), which is much better than manually pattern matching on a huge number of possible record types.
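The combination of lenses and class methods can be sketched without any external library. In the example below, the Lens' synonym and the (^.) operator are written out by hand to mirror the ones from the lens package, the chunk types are simplified stand-ins, and the lens names uu, nc, and ui as well as the sample values are assumptions chosen for this illustration. The point is that a single uid label works on every chunk type, no matter how deeply the UID is wrapped.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Functor.Const (Const (..))

-- A van Laarhoven lens, written out by hand so that only `base` is needed.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- The getter operator, mirroring (^.) from the lens library.
infixl 8 ^.
(^.) :: s -> ((a -> Const a a) -> s -> Const a s) -> a
s ^. l = getConst (l Const s)

-- Simplified stand-ins for Drasil's types (assumptions for illustration).
type UID       = String
type LblType   = String
type ShortName = String

data NamedChunk = NC { _uu :: UID, _np :: String }
data IdeaDict   = IdeaDict { _nc' :: NamedChunk, mabbr :: Maybe String }
data Reference  = Reference { _ui :: UID, ra :: LblType, sn :: ShortName }

-- One "door" per field we want to reach.
uu :: Lens' NamedChunk UID
uu f (NC u n) = fmap (\u' -> NC u' n) (f u)

nc :: Lens' IdeaDict NamedChunk
nc f (IdeaDict c a) = fmap (\c' -> IdeaDict c' a) (f c)

ui :: Lens' Reference UID
ui f (Reference u a s) = fmap (\u' -> Reference u' a s) (f u)

-- The shared label: every chunk type says how to reach its own UID.
class HasUID c where
  uid :: Lens' c UID

instance HasUID NamedChunk where uid = uu
instance HasUID IdeaDict   where uid = nc . uid  -- delegate to the inner chunk
instance HasUID Reference  where uid = ui

-- Hypothetical sample values.
idea :: IdeaDict
idea = IdeaDict (NC "srs" "software requirements specification") (Just "SRS")

yourReference :: Reference
yourReference = Reference "fig:chunk" "ChunkDiagram" "Chunk diagram"

main :: IO ()
main = do
  putStrLn (idea ^. uid)           -- srs
  putStrLn (yourReference ^. uid)  -- fig:chunk
```

The IdeaDict instance never stores a second UID. It composes the door into its NamedChunk with that chunk's own uid, which is how one identifier stays in its lowest-level chunk while remaining reachable from every wrapper.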

TODO: Fill in Sections

Recipes

Once we have successfully sorted and placed the required information into chunks, we can now use recipes to transform the acquired knowledge into a usable format. From there, we only need to print out the documents and then we can have informative artifacts. A recipe is similar to the process of unboxing every chunk, taking out the necessary information, and laying it out nicely on a tray for easy use and reuse. If we want to change an aspect of the generated document, we merely modify the recipe in a manner to create that change. This way, information that remains in the chunks may be kept when transitioning between different kinds of documents. Essentially, we start from a foundation of knowledge and build a complete artifact by calling the proper recipe for that document. This allows Drasil to keep a certain magnitude of modularity along with all the encoded information from chunks.

Currently, the most-used recipe in Drasil is the one that generates a Software Requirements Specification (SRS), which can be found in drasil-docLang. It has gone through many iterations of refinement and rationalization, and we have arrived at a point where creating a new physics-based example only requires inputting the necessary knowledge and organizing the functions created in the document language. The document structure is separated into different sections and follows this format:

  • Reference Material
    • Table of Units
    • Table of Symbols
    • Table of Abbreviations and Acronyms
  • Introduction
    • Purpose of Document
    • Scope of Requirements
    • Characteristics of Intended Reader
    • Organization of Document
  • Stakeholders
    • The Client
    • The Customer
  • General System Description
    • System Context
    • User Characteristics
    • System Constraints
  • Specific System Description
    • Problem Description
      • Terminology and Definitions
      • Physical System Description
      • Goal Statements
    • Solution Characteristics Specification
      • Assumptions
      • Theoretical Models
      • General Definitions
      • Data Definitions
      • Instance Models
      • Data Constraints
      • Properties of a Correct Solution
  • Requirements
    • Functional Requirements
    • Non-Functional Requirements
  • Likely Changes
  • Unlikely Changes
  • Traceability Matrices and Graphs
  • Values of Auxiliary Constants
  • References
  • Appendix

Each section is generated by a recipe found in the document language and the SystemInformation gathered from each example. This way, a user will not have to worry about where the inputted information will be going. Their focus can be dedicated to giving Drasil the information and calling the recipe to create the document for them.
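The separation between stored knowledge and recipes can be sketched as follows. This is a toy model and not the drasil-docLang API: Quantity, SystemInfo, Section, Recipe, render, and the sample data are all hypothetical names and values invented for this illustration. A recipe is simply a list of sections, and each section knows how to pull what it needs out of the collected information.

```haskell
module Main where

import Data.List (nub)

-- Hypothetical, much-reduced stand-ins for chunks and SystemInformation.
data Quantity = Quantity
  { qName   :: String
  , qSymbol :: String
  , qUnit   :: Maybe String
  }

data SystemInfo = SI { sysName :: String, quantities :: [Quantity] }

-- A section has a title and a way to build its body from the knowledge base.
data Section = Section { title :: String, body :: SystemInfo -> [String] }

-- A recipe is an ordered list of sections.
type Recipe = [Section]

tableOfUnits, tableOfSymbols :: Section
tableOfUnits =
  Section "Table of Units" (\si -> nub [u | Quantity _ _ (Just u) <- quantities si])
tableOfSymbols =
  Section "Table of Symbols" (map row . quantities)
  where row q = qSymbol q ++ " : " ++ qName q

-- Lay out every section of the recipe, indented under its title.
render :: Recipe -> SystemInfo -> String
render recipe si = unlines (sysName si : concatMap section recipe)
  where section s = title s : map ("  " ++) (body s si)

-- Made-up example knowledge.
example :: SystemInfo
example = SI "Example system"
  [ Quantity "launch speed"    "v" (Just "m/s")
  , Quantity "landing speed"   "w" (Just "m/s")
  , Quantity "flight duration" "t" (Just "s")
  ]

referenceMaterial, symbolsOnly :: Recipe
referenceMaterial = [tableOfUnits, tableOfSymbols]
symbolsOnly       = [tableOfSymbols]

main :: IO ()
main = do
  putStr (render referenceMaterial example)
  putStr (render symbolsOnly example)
```

Switching from referenceMaterial to symbolsOnly changes the generated document without touching example, which is the modularity the recipe approach is meant to provide.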

Information Encoding

Although Drasil's framework can be used for many different documentation styles across multiple domains, we have been trying to build Drasil from the ground up. This means that we have multiple examples that can demonstrate the capabilities of Drasil, but the recipes and organization of generated documents may be a limiting factor on Drasil's progress. Using an example-driven process, we are building the recipe language for documenting information and generating code.

Expr

The use of chunks and recipes is vital to actually bringing information encoding to life, but they are not the only tools Drasil uses. Drasil's expression language (Expr) is a brilliant example of information encoding without explicitly using chunks in the same manner. It is an ever-growing language that takes real-world mathematical concepts and brings them into a computer for use in calculations and generating programs. Of course, any expression language must be able to take in many different concepts. Drasil is able to hold numbers, strings, symbols, associative operators, derivatives, binary operators, unary operators, vector operators, and much more. It can handle everything from logic conjectures to for-loops in programming.

But doing something like this (and including sanity checks along the way) is not easy. In order to maintain a high level of semantics in our expressions, Expr is currently being split into specialized expression languages, namely programming functions (CodeExpr), display-oriented expressions (DisplayExpr), and mathematically oriented expressions (Expr). This way, we can include type checking for our equation derivations in documentation. For example, we want to make sure that an equation ending with Coulombs for units is not used to express a distance (which should be in meters).
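The Coulombs-versus-meters check can be illustrated with a deliberately tiny expression type. This is not Drasil's Expr or its type checker: the Unit and Expr types below, along with unitOf and defines, are hypothetical and cover only variables, negation, scaling by a unitless constant, and addition.

```haskell
module Main where

-- Hypothetical units; a real system would also need derived units.
data Unit = Metre | Coulomb deriving (Eq, Show)

-- A toy expression language (an assumption, far smaller than Drasil's Expr).
data Expr
  = Var String Unit    -- a named quantity together with its unit
  | Neg Expr           -- negation keeps the unit
  | Scale Double Expr  -- multiplication by a unitless constant
  | Add Expr Expr      -- both sides must share a unit

-- Infer the unit of an expression, or explain why it is ill-formed.
unitOf :: Expr -> Either String Unit
unitOf (Var _ u)   = Right u
unitOf (Neg e)     = unitOf e
unitOf (Scale _ e) = unitOf e
unitOf (Add a b)   = do
  ua <- unitOf a
  ub <- unitOf b
  if ua == ub
    then Right ua
    else Left ("cannot add " ++ show ua ++ " to " ++ show ub)

-- Check that an expression may define a quantity with the expected unit.
defines :: Unit -> Expr -> Either String ()
defines expected e = do
  actual <- unitOf e
  if actual == expected
    then Right ()
    else Left ("expected " ++ show expected ++ " but got " ++ show actual)

main :: IO ()
main = do
  -- A distance built from two lengths: accepted.
  print (defines Metre (Add (Var "x" Metre) (Scale 2 (Var "y" Metre))))
  -- A charge used where a distance is expected: rejected.
  print (defines Metre (Var "q" Coulomb))
```

The first check yields Right () and the second yields Left "expected Metre but got Coulomb", so the mistake is caught before any document or code is generated from the equation.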

Application of Information

Other

Analyzing Chunks

It can be quite difficult to see the dependencies of each chunk, so making graphs and data tables (by running make analysis) can help us to fine-tune which chunks should exist and which chunks need to be modified.

Jotting down ideas, not currently used but may be relevant

These are all very important aspects needed to keep programs relevant and usable. This also means that Drasil should be able to adapt to new knowledge while still holding on to older information. As users input knowledge needed to complete their goals or projects, Drasil should be able to absorb information and consistently generate reliable artifacts dependent on that information. Of course, there will be many steps in between giving Drasil information and it giving back meaningful documentation, but the idea of Drasil constantly gaining knowledge should be present any time we choose to work with it.

For example, a concept can be stored in a ConceptChunk, which holds the unique identifier, term, maybe an abbreviation, a definition, and a domain for a real-world concept in physics, mathematics, or computer science.
