Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize builtin Bytes type #1323

Merged
merged 14 commits into from
Feb 26, 2023
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
docs/_site
docs/_site/
dist-newstyle/
/result
.history
12 changes: 10 additions & 2 deletions scripts/generate-test-files.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
set -eu

nix build --file ./release.nix expected-test-files
rsync --archive --checksum --delete result/ ./tests
my-nix() {
nix --extra-experimental-features 'nix-command flakes' "${@}"
}

my-rsync() {
my-nix run 'nixpkgs#rsync' -- "${@}"
}

my-nix build --file ./release.nix expected-test-files
my-rsync --archive --checksum --delete result/ ./tests
chmod -R u+w ./tests
12 changes: 10 additions & 2 deletions scripts/lint-prelude.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
set -eu

nix build --file ./release.nix expected-prelude
rsync --archive --checksum --delete result/ ./Prelude
my-nix() {
nix --extra-experimental-features nix-command --extra-experimental-features flakes "${@}"
mmhat marked this conversation as resolved.
Show resolved Hide resolved
}

my-rsync() {
my-nix run 'nixpkgs#rsync' -- "${@}"
}

my-nix build --file ./release.nix expected-prelude
my-rsync --archive --checksum --delete result/ ./Prelude
chmod -R u+w ./Prelude
6 changes: 5 additions & 1 deletion scripts/test-vm.sh
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
set -e

nix build --file ./release.nix vm
my-nix() {
nix --extra-experimental-features nix-command --extra-experimental-features flakes "${@}"
mmhat marked this conversation as resolved.
Show resolved Hide resolved
}

my-nix build --file ./release.nix vm

trap 'rm --force dhall-lang.qcow2' EXIT

Expand Down
26 changes: 22 additions & 4 deletions standard/Interpret.hs
Original file line number Diff line number Diff line change
@@ -1,20 +1,27 @@
{-| This module serves two purposes:
{-| This module serves three purposes:

* … to show how to connect together the various interpretation phases into a
complete interpretation pass

* … to power the @dhall@ executable included in this package which is a
reference implementation of the Dhall configuration language

* … to help generating the CBOR-encoded Dhall files (*.dhallb) of the test
suite. To do so, run @dhall output.dhallb < input.dhall@.
-}
module Interpret
( -- * Main
main
) where

import qualified Binary
import qualified Data.Text.IO as Text.IO
import qualified Codec.CBOR.Term as CBOR
import qualified Codec.CBOR.Write as CBOR
import qualified Data.ByteString as ByteString
import qualified Data.Text.IO as Text.IO
import qualified Parser
import qualified Text.Megaparsec as Megaparsec
import qualified System.Environment
import qualified Text.Megaparsec as Megaparsec

main :: IO ()
main = do
Expand All @@ -31,4 +38,15 @@ main = do
Left errors -> fail (Megaparsec.errorBundlePretty errors)
Right expression -> return expression

print (Binary.encode expression)
let encoded = Binary.encode expression

-- Print a human readable representation of the CBOR terms of the encoded
-- expression to stdout.
print encoded

-- Write the serialized CBOR terms to the file provided as the first command
-- line argument (if there is one).
args <- System.Environment.getArgs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a comment explaining that this is to make it easier to generate .diag files

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually helps to generate the *.dhallb of the testsuite; The *.diag are still generated by cbor2diab.rb.
However, I added some more documentation to the Interpret.hs module explaining how to do that.
Do you think that is sufficient?

case args of
[fp] -> ByteString.writeFile fp (CBOR.toStrictByteString (CBOR.encodeTerm encoded))
_ -> return ()
30 changes: 26 additions & 4 deletions standard/Parser.hs
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ import Syntax
import Text.Megaparsec
( MonadParsec
, Parsec
, count
, notFollowedBy
, satisfy
, takeWhileP
Expand All @@ -58,6 +59,8 @@ import Text.Megaparsec
import qualified Control.Monad.Combinators.NonEmpty as Combinators.NonEmpty
import qualified Crypto.Hash as Hash
import qualified Data.ByteArray.Encoding as ByteArray.Encoding
import qualified Data.ByteString.Char8 as ByteString8
import qualified Data.ByteString.Base16 as Base16
import qualified Data.Char as Char
import qualified Data.List.NonEmpty as NonEmpty
import qualified Data.Map as Map
Expand Down Expand Up @@ -448,6 +451,20 @@ interpolation = do
textLiteral :: Parser TextLiteral
textLiteral = doubleQuoteLiteral <|> singleQuoteLiteral

bytesLiteral :: Parser ByteString
bytesLiteral = hexadecimal
where
hexadecimal = do
"0x\""

chunks <- many (count 2 (satisfy hexDig))

char '"'

case Base16.decodeBase16 $ ByteString8.pack $ concat chunks of
Left e -> fail $ Text.unpack e
Right bytes -> return bytes

reservedKeywords :: [Text]
reservedKeywords =
[ "if"
Expand Down Expand Up @@ -542,8 +559,8 @@ forallKeyword = void "forall"
forallSymbol :: Parser ()
forallSymbol = void "∀"

forall :: Parser ()
forall = forallSymbol <|> forallKeyword
forall_ :: Parser ()
forall_ = forallSymbol <|> forallKeyword

with :: Parser ()
with = void "with"
Expand Down Expand Up @@ -584,6 +601,7 @@ builtin =
<|> _Integer
<|> _Double
<|> _Text
<|> _Bytes
<|> _List
<|> _Date
<|> _TimeZone
Expand Down Expand Up @@ -682,6 +700,9 @@ _Double = do "Double"; return Double
_Text :: Parser Builtin
_Text = do "Text"; return Text

_Bytes :: Parser Builtin
_Bytes = do "Bytes"; return Bytes

_List :: Parser Builtin
_List = do "List"; return List

Expand Down Expand Up @@ -1564,7 +1585,7 @@ expression =

return (foldr (\(x, mA, a) -> Let x mA a) b bindings)
)
<|> (do forall
<|> (do forall_

whsp

Expand Down Expand Up @@ -1969,9 +1990,10 @@ primitiveExpression :: Parser Expression
primitiveExpression =
temporalLiteral
<|> (do n <- try doubleLiteral; return (DoubleLiteral n))
<|> (do n <- naturalLiteral; return (NaturalLiteral n))
<|> (do n <- try naturalLiteral; return (NaturalLiteral n))
<|> (do n <- integerLiteral; return (IntegerLiteral n))
<|> (do t <- textLiteral; return (TextLiteral t))
<|> (do x <- bytesLiteral; return (BytesLiteral x))
<|> (do "{"

whsp
Expand Down
2 changes: 2 additions & 0 deletions standard/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,8 @@ s'' : A multi-line `Text` literal with only one line
''
ss
s'' : A multi-ine `Text` literal with more than one line

0x"0123456789abcdef" : A base16-encoded `Bytes` literal (See RFC4648 - Section 8)
```

You will see this notation in judgments that perform induction on lists,
Expand Down
9 changes: 9 additions & 0 deletions standard/alpha-normalization.md
Original file line number Diff line number Diff line change
Expand Up @@ -597,6 +597,15 @@ alphaNormalize (TextLiteral (Chunks xys₀ z)) = TextLiteral (Chunks xys₁ z)
```


───────────────────────────────────────────
0x"0123456789abcdef" ↦ 0x"0123456789abcdef"


```haskell
alphaNormalize (BytesLiteral xs) = BytesLiteral xs
```


───────
{} ↦ {}

Expand Down
24 changes: 24 additions & 0 deletions standard/beta-normalization.md
Original file line number Diff line number Diff line change
Expand Up @@ -1058,6 +1058,30 @@ betaNormalize (Builtin TextShow ) = Builtin TextShow
betaNormalize (Builtin TextReplace) = Builtin TextReplace
```

## `Bytes`

The `Bytes` type is in normal form:


─────────────
Bytes ⇥ Bytes


```haskell
betaNormalize (Builtin Bytes) = Builtin Bytes
```

A `Bytes` literal is in normal form:


───────────────────────────────────────────
0x"0123456789abcdef" ⇥ 0x"0123456789abcdef"


```haskell
betaNormalize (BytesLiteral xs) = BytesLiteral xs
```

## `List`

The `List` type-level function is in normal form:
Expand Down
66 changes: 63 additions & 3 deletions standard/binary.md
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,10 @@ matching their identifier.
encode(Text) = "Text"


───────────────────────
encode(Bytes) = "Bytes"


─────────────────────
encode(List) = "List"

Expand Down Expand Up @@ -410,6 +414,7 @@ encode (Builtin Natural ) = TString "Natural"
encode (Builtin Integer ) = TString "Integer"
encode (Builtin Double ) = TString "Double"
encode (Builtin Text ) = TString "Text"
encode (Builtin Bytes ) = TString "Bytes"
encode (Builtin List ) = TString "List"
encode (Builtin Date ) = TString "Date"
encode (Builtin Time ) = TString "Time"
Expand Down Expand Up @@ -1050,6 +1055,26 @@ odd elements being strings and the even ones being interpolated expressions.
Note: this means that the first and the last encoded elements are always strings,
even if they are empty strings.

### `Bytes`

Encode `Bytes` literals by decoding it from its encoded representation to a CBOR
major type 2 byte string:


base16decode(0x"0123456789abcdef") = bytes
───────────────────────────────────────────────
encode(0x"0123456789abcdef") = [ 33, b"bytes" ]


The `base16decode` judgement stands in for the base-16 decoding algorithm
specified in
[RFC4648 - Section 8](https://tools.ietf.org/html/rfc4648#section-8), treated
as a pure function from text to a byte array.

```haskell
encode (BytesLiteral xs) = TList [ TInt 33, TBytes xs ]
```

### `assert`

An assertion encoded its own annotation (regardless of whether the annotation is
Expand Down Expand Up @@ -1079,6 +1104,7 @@ Imports are encoded as a list where the first three elements are always:
* The import type is `0` for when importing a Dhall expression (the default)
* The import type is `1` for importing `as Text`
* The import type is `2` for importing `as Location`
* The import type is `3` for importing `as Bytes`

For example, if an import does not specify an integrity check or import type
then the CBOR expression begins with:
Expand Down Expand Up @@ -1203,10 +1229,19 @@ instead of `0`:
encode(import as Text) = [ 24, x, 1, xs… ]


If you import `as Location`, then the third element encoding the import type is `2`
If you import `as Bytes`, then the third element encoding the import type is `3`
instead of `0`:


encode(import) = [ 24, x, 0, xs… ]
───────────────────────────────────────────
encode(import as Bytes) = [ 24, x, 3, xs… ]


If you import `as Location`, then the third element encoding the import type is
`2` instead of `0`:


encode(import) = [ 24, x, 0, xs… ]
──────────────────────────────────────────
encode(import as Location) = [ 24, x, 2, xs… ]
Expand All @@ -1223,6 +1258,7 @@ encode (Import importType₀ importMode₀ hash₀) =
importMode₁ = case importMode₀ of
Code -> TInt 0
RawText -> TInt 1
RawBytes -> TInt 3
Location -> TInt 2

importType₁ = case importType₀ of
Expand Down Expand Up @@ -1354,7 +1390,7 @@ encode (With e₀ (k₀ :| ks₀) v₀) = TList [ TInt 29, e₁, TList (k₁ : k

k₁ = encodeKey k₀
ks₁ = map encodeKey ks₀

encodeKey DescendOptional = TInt 0
encodeKey (Label k) = TString k
```
Expand Down Expand Up @@ -1555,6 +1591,10 @@ a built-in identifier if it matches any of the following strings:
decode("Text") = Text


─────────────────────────
decode("Bytes") = Bytes


─────────────────────────
decode("List") = List

Expand Down Expand Up @@ -1972,6 +2012,21 @@ Decode a CBOR array beginning with a `18` as a `Text` literal:
decode([ 18, "a", b₁, "c", d₁, "e", …, "x", y₁, "z" ]) = "a${b₀}c${d}e…x${y₀}z"


### `Bytes`

Decode a CBOR array beginning with a `33` as a `Bytes` literal:


base16encode(bytes) = 0x"0123456789abcdef"
───────────────────────────────────────────────
decode([ 33, b"bytes" ]) = 0x"0123456789abcdef"


The `base16encode` judgement stands in for the base-16 encoding algorithm
specified in
[RFC4648 - Section 8](https://tools.ietf.org/html/rfc4648#section-8), treated
as a pure function from a byte array to text.

### `assert`

Decode a CBOR array beginning with a `19` as an `assert`:
Expand All @@ -1984,7 +2039,7 @@ Decode a CBOR array beginning with a `19` as an `assert`:

### Imports

Decode a CBOR array beginning with a `24` as an import
Decode a CBOR array beginning with a `24` as an import.

The decoding rules are the exact opposite of the encoding rules:

Expand Down Expand Up @@ -2034,6 +2089,11 @@ The decoding rules are the exact opposite of the encoding rules:
decode([ 24, x, 1, xs… ]) = import as Text


decode([ 24, x, 0, xs… ]) = import
───────────────────────────────────────────
decode([ 24, x, 3, xs… ]) = import as Bytes


decode([ 24, x, 0, xs… ]) = import
──────────────────────────────────────────────
decode([ 24, x, 2, xs… ]) = import as Location
Expand Down
Loading