Document "the language of Gutenberg" #2796

mtias · 2017-09-26T13:17:41Z

Let's add a high-level document explaining how things generally work on the Gutenberg lifecycle and data flow.

Copy: First pass, fix typos Rename md file.

mcsf · 2017-09-26T13:50:07Z

We need to find the idiom and replace that bit wrapped with [FIXME] before merging. :)

edit: I believe it to be forme:

Type, small metal letters that have a raised letter on one end, is arranged into pages and placed in a frame to make a forme, which itself is placed onto a flat stone, 'bed,' or 'coffin.' — Wikipedia

mcsf · 2017-09-26T13:54:21Z

^ pushed changes for the above

aduth · 2017-09-26T14:25:01Z

docs/language.md

+
+A unique property of comments is that they cannot legitimately exist in ambiguous places, such as inside of HTML attributes like `<img alt='data-id="14"'>`. Comments are also quite permissive. Whereas HTML attributes are complicated to parse properly, comments are quite easily described by a leading `<!--` followed by anything except `--` until the first `-->`. This simplicity and permissiveness means that the parser can be implemented in several ways without needing to understand HTML properly, and we have the liberty to use more convenient syntax inside of the comment; we only need to escape double-hyphen sequences. We take advantage of this in how we store block attributes: JSON literals inside the comment.
+
+After running this through the parser, we're left with a simple object we can manipulate idiomatically, and we don't have to worry about escaping or unescaping the data. It's handled for us through the serialization process. Because the comments are so different from other HTML tags and because we can perform a first pass to extract the top-level blocks, we don't actually depend on having fully valid HTML!


Should we elaborate on why not all attributes of a block are stored in JSON, or the role that "remote" data will play in the backing data of a block? (#2759, #2754)

mtias · Sep 26, 2017

Maybe touch a bit on it (as part of "blocks are more about the concept and less about where they store data"), and then just link to the "attributes" doc to expand.

mcsf · Sep 27, 2017

Added in 7c98509
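The comment-delimiter idea from the hunk above can be shown with a minimal TypeScript sketch. For illustration only, it assumes delimiters of the form `<!-- wp:name {json} -->` and `<!-- /wp:name -->`. The names `serializeBlock` and `parseTopLevel` are hypothetical; this is not Gutenberg's parser.

Escaping `--` as `\u002d\u002d` is one way to satisfy the "only escape double hyphens" rule. JSON only allows `--` inside string literals, and there that escape decodes back to the same characters.

```typescript
// Hypothetical block shape for this sketch (not Gutenberg's actual type).
interface Block {
  name: string;
  attributes: Record<string, unknown>;
  innerHTML: string;
}

// In JSON text, "--" can only occur inside string literals, where the
// escape "\u002d\u002d" decodes to the same two hyphens. Escaping it means
// the attributes can never terminate the comment early.
function serializeAttributes(attributes: Record<string, unknown>): string {
  return JSON.stringify(attributes).replace(/--/g, "\\u002d\\u002d");
}

// Assumed delimiter format: <!-- wp:name {json} -->html<!-- /wp:name -->
function serializeBlock(block: Block): string {
  const attrs =
    Object.keys(block.attributes).length > 0
      ? " " + serializeAttributes(block.attributes)
      : "";
  return `<!-- wp:${block.name}${attrs} -->${block.innerHTML}<!-- /wp:${block.name} -->`;
}

// First pass: only the comment delimiters are inspected. The HTML between
// them is kept as an opaque string, so it does not have to be valid.
function parseTopLevel(content: string): Block[] {
  const blocks: Block[] = [];
  const opener = /<!-- wp:([a-z][a-z0-9\/-]*)(?: (\{[\s\S]*?\}))? -->/g;
  let match: RegExpExecArray | null;
  while ((match = opener.exec(content)) !== null) {
    const name = match[1];
    const closer = `<!-- /wp:${name} -->`;
    const start = opener.lastIndex;
    const end = content.indexOf(closer, start);
    if (end === -1) break; // unterminated block: this sketch simply stops
    blocks.push({
      name,
      attributes: match[2] ? JSON.parse(match[2]) : {},
      innerHTML: content.slice(start, end),
    });
    // Resume after the closer, so anything nested stays inside innerHTML
    // (top level only; nested blocks sharing a name are not handled).
    opener.lastIndex = end + closer.length;
  }
  return blocks;
}
```

For the simple inputs this sketch targets, serializing a list of blocks and passing the result to `parseTopLevel` gives back equal blocks. This holds even when an attribute contains `-->` or a block's `innerHTML` is malformed, such as `<p>Unclosed <b>bold</p>`.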

georgestephanis · 2017-09-26T14:30:11Z

docs/language.md

+
+At the core of Gutenberg lies the concept of the block. From a technical point of view, blocks both raise the level of abstraction from a single document to a collection of meaningful elements, and they replace ambiguity—inherent in HTML—with explicit structure. A post in Gutenberg is then a _collection of blocks_.
+
+To understand how blocks operate at a data-structure level, let's take a small detour to the simile of the printing press of Johannes Gutenberg. With the printing press, a “page” of a book was assembled from individual pieces and printed into a fully formed page. Once printed, there's no need to know it was built from multiple blocks of letters instead of one giant plate. In other words, the output is indifferent about how it was generated.


In letterpress, a finished page was assembled from individual characters, a test print made in a galley, and then locked into a chase to create a fully formed page. Once printed, there's no need to know whether it was set via individual letters, type slugs from a linotype machine, or even one giant plate.
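In data-structure terms, the "collection of blocks" idea from the hunk above can be sketched in a few lines of TypeScript. The `Block` shape and the block names are illustrative assumptions, not Gutenberg's actual types.

Rendering keeps only each block's output. Two different arrangements of blocks can therefore print exactly the same page, which is the sense in which the output is indifferent to how it was generated.

```typescript
// Illustrative block shape: a name, structured attributes, and the HTML
// the block produces. Names like "core/text" are examples only.
interface Block {
  name: string;
  attributes: Record<string, unknown>;
  innerHTML: string;
}

// A post is an ordered collection of blocks.
type Post = Block[];

// Rendering concatenates the output and discards block boundaries.
function render(post: Post): string {
  return post.map((block) => block.innerHTML).join("");
}

// Two different "typesettings" of the same page.
const twoBlocks: Post = [
  { name: "core/text", attributes: {}, innerHTML: "<p>Hello</p>" },
  { name: "core/text", attributes: {}, innerHTML: "<p>World</p>" },
];
const oneBlock: Post = [
  { name: "core/freeform", attributes: {}, innerHTML: "<p>Hello</p><p>World</p>" },
];

console.log(render(twoBlocks) === render(oneBlock)); // true
```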

georgestephanis · 2017-09-26T14:33:15Z

docs/language.md

+
+To understand how blocks operate at a data-structure level, let's take a small detour to the simile of the printing press of Johannes Gutenberg. With the printing press, a “page” of a book was assembled from individual pieces and printed into a fully formed page. Once printed, there's no need to know it was built from multiple blocks of letters instead of one giant plate. In other words, the output is indifferent about how it was generated.
+
+This is true for content blocks. They are the way in which the user creates their content, but they no longer matter once the content is finished. That is, until it needs to be edited. Imagine if the printing press were able to print a page _while_ also including in the page the instructions to regenerate the set of movable types required to print it. What we are doing with blocks could be compared to printing invisible marks on the text so that the printer can pick up, from an already printed page, the pieces it needs to reprint it.


What we are doing with blocks could be compared to printing invisible marks in the margins so that the printer can make adjustments to an already printed page without needing to set the page again from scratch.
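The "invisible marks" idea can be illustrated with a small TypeScript sketch. Because the marks are HTML comments, they do not show up in the rendered page, yet a program can still read them back to find the pieces.

The delimiter format and the helpers `printWithMarks`, `visibleOutput` and `blockNames` are hypothetical. Stripping comments with a regular expression is only a rough stand-in for a browser ignoring them.

```typescript
// Hypothetical minimal block: just a name and its HTML output.
interface Block {
  name: string;
  innerHTML: string;
}

// "Print" the page with marks: each block's HTML is wrapped in comment
// delimiters (assumed format, for illustration only).
function printWithMarks(blocks: Block[]): string {
  return blocks
    .map((b) => `<!-- wp:${b.name} -->${b.innerHTML}<!-- /wp:${b.name} -->`)
    .join("");
}

// Approximate what a reader sees: HTML comments are not displayed.
function visibleOutput(html: string): string {
  return html.replace(/<!--[\s\S]*?-->/g, "");
}

// Read the marks back to learn which pieces made up the page.
function blockNames(html: string): string[] {
  const names: string[] = [];
  const opener = /<!-- wp:(\S+) -->/g;
  let m: RegExpExecArray | null;
  while ((m = opener.exec(html)) !== null) {
    names.push(m[1]);
  }
  return names;
}

const page = printWithMarks([
  { name: "core/heading", innerHTML: "<h2>Title</h2>" },
  { name: "core/text", innerHTML: "<p>Body</p>" },
]);
console.log(visibleOutput(page)); // <h2>Title</h2><p>Body</p>
console.log(blockNames(page).join(", ")); // core/heading, core/text
```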

georgestephanis · 2017-09-26T14:36:22Z

docs/language.md

+
+Content in WordPress is stored as HTML-like text in `post_content`. HTML is a robust document markup format and has been used to describe content as simple as unformatted paragraphs of text and as complex as entire application interfaces. Understanding HTML is not trivial; a significant number of existing documents and tools deal with technically invalid or ambiguous code. This code, even when valid, can be incredibly tricky and complicated to parse – and to understand.
+
+The main point is to let the machines work at what they are good at, and optimize for the user and the document. The analogy with the printing press can be taken further in that what matters is the printed page, not the arrangement of metal types that originated it. As a matter of fact, the arrangement of metal types is a pretty inconvenient storage mechanism. The page is both the result _and_ the proper way to store the data. The metal types are just an instrument for publication and editing, but more ephemeral in nature. The same is true of our use of an object tree (e.g. JSON) in the editor. We have the ability to rebuild this structure from the printed page, as if we printed invisible ink marks that allow a machine to know which types to assemble to recreate the page.


Instead of saying "metal types": "types" shouldn't be plural. It should be either "metal type" or (IMO, preferably) a reference to sorts: https://en.wikipedia.org/wiki/Sort_(typesetting)

I'd change the last bit to: "as if we printed **notations in the margins** that allow a machine to know which **sorts** to assemble to recreate the page."
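The storage model in the hunk above can be sketched as a load, edit, print cycle. Here `post_content` is the durable page, and the object tree is only rebuilt for editing.

This is a hypothetical illustration. The delimiter format and the names `parse`, `serialize` and `editPost` are assumptions for this sketch, not Gutenberg APIs. It also skips the `--` escaping discussed earlier in the thread.

```typescript
// Hypothetical in-editor object tree node (illustrative, not Gutenberg's type).
interface Block {
  name: string;
  attributes: Record<string, unknown>;
  innerHTML: string;
}

// Assumed delimiter format. This sketch does not escape "--" inside the
// attribute JSON; a real serializer would have to.
const BLOCK = /<!-- wp:(\S+?)(?: (\{[\s\S]*?\}))? -->([\s\S]*?)<!-- \/wp:\1 -->/g;

// Rebuild the ephemeral tree from the stored page.
function parse(postContent: string): Block[] {
  const blocks: Block[] = [];
  BLOCK.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = BLOCK.exec(postContent)) !== null) {
    blocks.push({
      name: m[1],
      attributes: m[2] ? JSON.parse(m[2]) : {},
      innerHTML: m[3],
    });
  }
  return blocks;
}

// Print the tree back into the stored form.
function serialize(blocks: Block[]): string {
  return blocks
    .map((b) => {
      const attrs =
        Object.keys(b.attributes).length > 0 ? " " + JSON.stringify(b.attributes) : "";
      return `<!-- wp:${b.name}${attrs} -->${b.innerHTML}<!-- /wp:${b.name} -->`;
    })
    .join("\n");
}

// Load, edit, print: only the string is kept; the tree is thrown away.
function editPost(
  postContent: string,
  index: number,
  update: (block: Block) => Block,
): string {
  const tree = parse(postContent);
  if (index < 0 || index >= tree.length) {
    throw new RangeError(`no block at index ${index}`);
  }
  tree[index] = update(tree[index]);
  return serialize(tree);
}
```

Consider a two-block page produced by `serialize`: a heading followed by a paragraph. In that case `serialize(parse(page))` returns the page unchanged, and editing the heading with `editPost` leaves the paragraph's text byte-for-byte identical.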

codecov · 2017-09-27T15:33:02Z

Codecov Report

Merging #2796 into master will increase coverage by 0.63%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2796      +/-   ##
==========================================
+ Coverage   33.81%   34.44%   +0.63%     
==========================================
  Files         190      190              
  Lines        5678     5748      +70     
  Branches      992     1016      +24     
==========================================
+ Hits         1920     1980      +60     
- Misses       3181     3189       +8     
- Partials      577      579       +2

Impacted Files	Coverage Δ
editor/selectors.js	`95% <0%> (-1.76%)`	⬇️
editor/layout/index.js	`0% <0%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5a949b7...56b7daa.


mtias and others added 6 commits September 26, 2017 12:18

Add document explaining the language of Gutenberg.

8d11f8a

Copy: First pass, fix typos Rename md file.

Move language document up

9d53b6a

Minor fixes

ab76f73

Language: Add § The post dichotomy

0f643c3

language.md: Improve intro

781518b

Improve copy of introduction.

208a080

mtias added the [Type] Developer Documentation label Sep 26, 2017

language.md: Replace FIXME with missing term 'forme'

d320393

mcsf force-pushed the add/document-grammar branch from c9aa734 to d320393 September 26, 2017 13:56

aduth reviewed Sep 26, 2017

georgestephanis reviewed Sep 26, 2017

mcsf force-pushed the add/document-grammar branch from 837dd4f to 9bddda8 September 27, 2017 15:32

mcsf added 2 commits September 27, 2017 16:37

language.md: clarify purpose of blocks not holding JSON

7c98509

Apply suggestions

56b7daa

Props georgestephanis

mcsf force-pushed the add/document-grammar branch from 9bddda8 to 56b7daa September 27, 2017 15:39

mcsf merged commit 18c0531 into master Sep 28, 2017

mcsf deleted the add/document-grammar branch September 28, 2017 14:40
