Add cookbook for JSON data manipulation #529

Merged: 5 commits, Oct 4, 2023
Changes from 4 commits
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -80,11 +80,11 @@ jobs:

- name: Make target directories
if: github.event_name != 'pull_request' && (startsWith(github.ref, 'refs/tags/v') || github.ref == 'refs/heads/main')
run: mkdir -p json/.native/target json/play/.jvm/target text/native/target cbor-json/native/target finite-state/native/target unidocs/target cbor/js/target finite-state/js/target text/js/target benchmarks/.jvm/target json/play/.js/target json/.jvm/target xml/scala-xml/.native/target csv/jvm/target xml/.jvm/target xml/.js/target cbor/native/target json/circe/.native/target finite-state/jvm/target cbor-json/js/target cbor/jvm/target csv/native/target json/circe/.jvm/target csv/js/target csv/generic/jvm/target text/jvm/target xml/.native/target json/diffson/.native/target json/diffson/.js/target cbor-json/jvm/target json/interpolators/.jvm/target json/.js/target json/interpolators/.js/target csv/generic/js/target json/circe/.js/target json/diffson/.jvm/target xml/scala-xml/.js/target csv/generic/native/target xml/scala-xml/.jvm/target json/interpolators/.native/target project/target
run: mkdir -p json/.native/target json/play/.jvm/target text/native/target cbor-json/native/target finite-state/native/target unidocs/target cbor/js/target finite-state/js/target text/js/target json/play/.js/target json/.jvm/target xml/scala-xml/.native/target csv/jvm/target xml/.jvm/target xml/.js/target cbor/native/target json/circe/.native/target finite-state/jvm/target cbor-json/js/target cbor/jvm/target csv/native/target json/circe/.jvm/target csv/js/target csv/generic/jvm/target text/jvm/target xml/.native/target json/diffson/.native/target json/diffson/.js/target cbor-json/jvm/target json/interpolators/.jvm/target json/.js/target json/interpolators/.js/target csv/generic/js/target json/circe/.js/target json/diffson/.jvm/target xml/scala-xml/.js/target csv/generic/native/target xml/scala-xml/.jvm/target json/interpolators/.native/target project/target

- name: Compress target directories
if: github.event_name != 'pull_request' && (startsWith(github.ref, 'refs/tags/v') || github.ref == 'refs/heads/main')
run: tar cf targets.tar json/.native/target json/play/.jvm/target text/native/target cbor-json/native/target finite-state/native/target unidocs/target cbor/js/target finite-state/js/target text/js/target benchmarks/.jvm/target json/play/.js/target json/.jvm/target xml/scala-xml/.native/target csv/jvm/target xml/.jvm/target xml/.js/target cbor/native/target json/circe/.native/target finite-state/jvm/target cbor-json/js/target cbor/jvm/target csv/native/target json/circe/.jvm/target csv/js/target csv/generic/jvm/target text/jvm/target xml/.native/target json/diffson/.native/target json/diffson/.js/target cbor-json/jvm/target json/interpolators/.jvm/target json/.js/target json/interpolators/.js/target csv/generic/js/target json/circe/.js/target json/diffson/.jvm/target xml/scala-xml/.js/target csv/generic/native/target xml/scala-xml/.jvm/target json/interpolators/.native/target project/target
run: tar cf targets.tar json/.native/target json/play/.jvm/target text/native/target cbor-json/native/target finite-state/native/target unidocs/target cbor/js/target finite-state/js/target text/js/target json/play/.js/target json/.jvm/target xml/scala-xml/.native/target csv/jvm/target xml/.jvm/target xml/.js/target cbor/native/target json/circe/.native/target finite-state/jvm/target cbor-json/js/target cbor/jvm/target csv/native/target json/circe/.jvm/target csv/js/target csv/generic/jvm/target text/jvm/target xml/.native/target json/diffson/.native/target json/diffson/.js/target cbor-json/jvm/target json/interpolators/.jvm/target json/.js/target json/interpolators/.js/target csv/generic/js/target json/circe/.js/target json/diffson/.jvm/target xml/scala-xml/.js/target csv/generic/native/target xml/scala-xml/.jvm/target json/interpolators/.native/target project/target

- name: Upload target directories
if: github.event_name != 'pull_request' && (startsWith(github.ref, 'refs/tags/v') || github.ref == 'refs/heads/main')
@@ -265,7 +265,7 @@ jobs:
- name: Submit Dependencies
uses: scalacenter/sbt-dependency-submission@v2
with:
modules-ignore: rootjs_2.12 rootjs_2.13 rootjs_3 site_2.12 site_2.13 site_3 rootjvm_2.12 rootjvm_2.13 rootjvm_3 rootnative_2.12 rootnative_2.13 rootnative_3
modules-ignore: rootjs_2.12 rootjs_2.13 rootjs_3 site_2.12 site_2.13 site_3 jq-like_2.12 jq-like_2.13 jq-like_3 benchmarks_2.12 benchmarks_2.13 benchmarks_3 rootjvm_2.12 rootjvm_2.13 rootjvm_3 rootnative_2.12 rootnative_2.13 rootnative_3 jq-like_sjs1_2.12 jq-like_sjs1_2.13 jq-like_sjs1_3 jq-like_native0.4_2.12 jq-like_native0.4_2.13 jq-like_native0.4_3
configs-ignore: test scala-tool scala-doc-tool test-internal

validate-steward:
63 changes: 55 additions & 8 deletions build.sbt
@@ -1,3 +1,4 @@
import laika.config.SourceLinks
import laika.helium.config.TextLink
import laika.helium.config.ThemeNavigationSection
import laika.ast
@@ -12,8 +13,9 @@ import laika.ast.TemplateString
import laika.helium.config.HeliumIcon
import laika.helium.config.IconLink
import com.typesafe.tools.mima.core._
import laika.config.{LinkConfig, ApiLinks}
import laika.config._
import sbt.Def._
import scala.scalanative.build._

val scala212 = "2.12.18"
val scala213 = "2.13.12"
@@ -109,7 +111,8 @@ val root = tlCrossRootProject
cbor,
cborJson,
finiteState,
unidocs
unidocs,
exampleJq
)
.settings(commonSettings)
.enablePlugins(NoPublishPlugin)
@@ -463,7 +466,7 @@ lazy val cborJson = crossProject(JVMPlatform, JSPlatform, NativePlatform)
lazy val benchmarks = crossProject(JVMPlatform)
.crossType(CrossType.Pure)
.in(file("benchmarks"))
.enablePlugins(JmhPlugin)
.enablePlugins(JmhPlugin, NoPublishPlugin)
.settings(commonSettings)
.settings(
libraryDependencies ++= Seq(
@@ -474,6 +477,33 @@ lazy val benchmarks = crossProject(JVMPlatform)
)
.dependsOn(csv, scalaXml, jsonCirce)

lazy val exampleJq = crossProject(JVMPlatform, NativePlatform, JSPlatform)
.crossType(CrossType.Pure)
.in(file("examples/jqlike"))
.enablePlugins(NoPublishPlugin)
.settings(commonSettings)
.settings(
name := "jq-like",
libraryDependencies ++= List(
"co.fs2" %%% "fs2-io" % fs2Version,
"com.monovore" %%% "decline-effect" % "2.4.1"
)
)
.jvmSettings(
assembly / mainClass := Some("fs2.data.example.jqlike.JqLike"),
assembly / assemblyJarName := "jq-like.jar"
)
.nativeSettings(nativeConfig ~= {
_.withLTO(LTO.thin)
.withMode(Mode.releaseFast)
.withGC(GC.immix)
})
.jsSettings(
scalaJSUseMainModuleInitializer := true,
scalaJSLinkerConfig ~= (_.withModuleKind(ModuleKind.CommonJSModule))
)
.dependsOn(csvGeneric, scalaXml, jsonCirce, cborJson)

val homeLink: ThemeLink =
ImageLink.internal(ast.Path.Root / "index.md", Image.internal(ast.Path.Root / "media" / "logo-header.svg"))

@@ -525,7 +555,11 @@ lazy val site = project
.site
.externalCSS("/pagefind/pagefind-ui.css")
.site
.externalJS("/pagefind/pagefind-ui.js"),
.externalJS("/pagefind/pagefind-ui.js")
.site
.internalCSS(ast.Path.Root / "css")
.site
.internalJS(ast.Path.Root / "js"),
libraryDependencies ++= List(
"com.beachape" %% "enumeratum" % "1.7.0",
"org.gnieh" %% "diffson-circe" % diffsonVersion,
@@ -535,10 +569,23 @@ lazy val site = project
),
scalacOptions += "-Ymacro-annotations",
mdocIn := file("site"),
laikaConfig := tlSiteApiUrl.value.fold(LaikaConfig.defaults)(url =>
LaikaConfig.defaults
.withConfigValue(LinkConfig.empty
.addApiLinks(ApiLinks(baseUri = url.toString().dropRight("fs2/data/index.html".size))))),
laikaConfig := tlSiteApiUrl.value
.fold(LaikaConfig.defaults)(url =>
LaikaConfig.defaults
.withConfigValue(
LinkConfig.empty
.addApiLinks(ApiLinks(baseUri = url.toString().dropRight("fs2/data/index.html".size)))
.addSourceLinks(SourceLinks(
baseUri = "https://github.com/gnieh/fs2-data/tree/main/examples/jqlike/src/main/scala/",
suffix = "scala"
).withPackagePrefix("fs2.data.example.jqlike"))))
.withConfigValue(
Selections(
SelectionConfig("platform",
ChoiceConfig("jvm", "JVM"),
ChoiceConfig("native", "Scala Native"),
ChoiceConfig("js", "Scala.JS")).withSeparateEbooks
)),
laikaExtensions += PrettyURLs
)
.dependsOn(csv.jvm,
@@ -0,0 +1,78 @@
/*
* Copyright 2023 Lucas Satabin
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package fs2.data.example.jqlike

import cats.effect.{ExitCode, IO}
import cats.syntax.all._
import com.monovore.decline.Opts
import com.monovore.decline.effect.CommandIOApp
import fs2.Stream
import fs2.data.json.jq.{Jq, JqParser}
import fs2.data.json.{jq, render, tokens}
import fs2.io.file.{Files, Path}

object JqLike extends CommandIOApp(name = "fs2-jq", header = "A streaming implementation of a jq-like tool") {

  val query: Opts[Option[String]] =
    Opts
      .option[String](long = "query", short = "q", help = "The query to execute on the input (default to '.')")
      .orNone

  val input: Opts[Either[String, Path]] =
    Opts
      .option[String](long = "input", short = "i", help = "The input json string")
      .map(_.asLeft)
      .orElse(Opts.option[String](long = "file", short = "f", help = "The input json file").map(Path(_).asRight))

  val output: Opts[Option[Path]] = Opts
    .option[String](long = "output", short = "o", help = "The output file (outputs to stdout if not provided)")
    .map(Path(_))
    .orNone

  override def main: Opts[IO[ExitCode]] =
    (query, input, output)
      .mapN { (query, input, output) =>
        val queryCompiler = jq.Compiler[IO]
        for {
          // first parse the provided query
          query <- query.fold(IO.pure(Jq.Identity: Jq))(JqParser.parse[IO](_))
          // then compile it
          compiled <- queryCompiler.compile(query)
          timed <- input
            // then read either from the string input or from the file input
            .fold(Stream.emit(_), Files[IO].readUtf8(_))
            // parse the input as json
            .through(tokens)
            // execute the compiled query on the input
            .through(compiled)
            // render the query result
            .through(render.pretty())
            // encode the result
            .through(fs2.text.utf8.encode[IO])
            // and save it to the output
            .through(output.fold(fs2.io.stdout[IO])(Files[IO].writeAll(_)))
            // finally run all the things
            .compile
            .drain
            .timed
          _ <- output.fold(IO.println(""))(p => IO.println(s"Result written to $p"))
          _ <- IO.println(s"Processed in ${timed._1.toMillis}ms")
        } yield ExitCode.Success
      }
      .map(_.handleErrorWith(t => IO.println(t.getMessage()).as(ExitCode.Error)))

}
2 changes: 1 addition & 1 deletion json/src/main/scala/fs2/data/json/jq/Compiler.scala
@@ -28,7 +28,7 @@ case class JqException(msg: String) extends Exception(msg)
/** A compiler for jq queries into some compiled form. */
trait Compiler[F[_]] {

def compile(jq: Jq): F[CompiledJq[F]]
def compile(jq: Jq): F[Pipe[F, Token, Token]]

}

@@ -317,7 +317,7 @@ private[jq] class ESPJqCompiler[F[_]](implicit F: MonadThrow[F], defer: Defer[F]
pure(Query.Ordpath(prefix ~ filter))
}

def compile(jq: Jq): F[CompiledJq[F]] =
def compile(jq: Jq): F[Pipe[F, Token, Token]] =
for {
query <- preprocess(Jq.Root, jq).runA(0)
mft = compile(query)
2 changes: 2 additions & 0 deletions project/plugins.sbt
@@ -8,3 +8,5 @@ addSbtPlugin("org.portable-scala" % "sbt-scalajs-crossproject" % "1.3.2")

addSbtPlugin("org.scala-native" % "sbt-scala-native" % "0.4.15")
addSbtPlugin("org.portable-scala" % "sbt-scala-native-crossproject" % "1.3.2")

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.3")
2 changes: 2 additions & 0 deletions site/cookbooks/data/json/sample.json

Large diffs are not rendered by default.

152 changes: 152 additions & 0 deletions site/cookbooks/jq.md
@@ -0,0 +1,152 @@
# Reading/transforming/writing JSON data

In this cookbook we demonstrate how the JSON tools provided by `fs2-data` can be used to build a mini `jq`-like CLI tool.

## High-level overview

The general approach to reading/parsing/transforming/generating data with `fs2-data` can be summarized as follows:

```mermaid
graph LR
Reading(Reading) --> Parsing --> Transforming --> Printing --> Writing(Writing)
```

The _Reading_ and _Writing_ steps are not specific to `fs2-data` but rely on pure `fs2` operators or other compatible libraries. The _Parsing_, _Transforming_, and _Printing_ phases will use the tools provided by `fs2-data-json` and more specifically:

- The `tokens` pipe to parse the input stream into JSON @:api(fs2.data.json.Token)s (see [the documentation][json-doc] for more details).
- The @:api(fs2.data.json.jq.Compiler) class to compile a query into a pipe (see [the documentation][jq-doc] for more details).
- The `render.pretty` pipe to render the query result into a pretty-printed JSON string (see [the documentation][render-doc] for more details).

In general, the _Transforming_ step can use whatever operator fits your purpose, from `fs2` or any other `fs2`-based library. In our case, however, the only transformation is the one performed by the query.
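
As a minimal sketch of the shape described above, the function below wires the _Parsing_, _Transforming_, and _Printing_ stages around a pluggable transformation. The object name `PipelineSketch` and the method `render` are hypothetical and introduced only for illustration; `json.tokens` and `json.render.pretty()` are the pipes listed above. For convenience it reads from an in-memory string and collects the output into a single string, which gives up streaming on both ends.

```scala
import cats.effect.IO
import fs2.{Pipe, Stream}
import fs2.data.json
import fs2.data.json.Token

// Hypothetical helper, not part of fs2-data: the names are illustrative only.
object PipelineSketch {

  // Parses `input` as JSON, applies `transform` to the token stream,
  // and pretty-prints the transformed tokens into one string.
  def render(input: String, transform: Pipe[IO, Token, Token]): IO[String] =
    Stream
      .emit(input)
      .covary[IO]
      .through(json.tokens) // Parsing
      .through(transform) // Transforming: any Pipe[IO, Token, Token] fits here
      .through(json.render.pretty()) // Printing
      .compile
      .string
}
```

Passing `identity` as the transformation, for instance `PipelineSketch.render("[1, 2]", identity)`, should only reformat the input. A compiled query can be passed in the same position.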

## Basic implementation

### Reading and writing

In this example, we will read the content from a [sample JSON file][data-json] and write the result to stdout.
To this end, we will use the operators and pipes provided by `fs2-io`.

```scala mdoc
import cats.effect.IO
import cats.effect.unsafe.implicits.global

import fs2.io.file.{Files, Path}
import fs2.io.stdout
import fs2.text.utf8

Files[IO]
  .readUtf8(Path("site/cookbooks/data/json/sample.json"))
  .through(utf8.encode[IO])
  .through(stdout)
  .compile
  .drain
  .unsafeRunSync()
```

This snippet is pure `fs2` and does not involve `fs2-data` at any point.

### Parsing and printing

The next step is to parse and render the JSON data using the appropriate `fs2-data` pipes. This can be achieved as follows:

```scala mdoc
import fs2.data.json

Files[IO]
  .readUtf8(Path("site/cookbooks/data/json/sample.json"))
  .through(json.tokens) // parsing JSON input
  .through(json.render.pretty()) // pretty printing JSON stream
  .through(utf8.encode[IO])
  .through(stdout)
  .compile
  .drain
  .unsafeRunSync()
```

### Transforming

So far, the code only reformats the input into the output.
Looking at the input, we see that it consists of an array of objects containing several fields.
Let's say we are interested in the `name` and `language` fields.
For each element in the array we would like to emit an object with both fields, but the `name` one should be renamed `full_name`.

To this end we can write the following query using the `jq` interpolator:

```scala mdoc
import fs2.data.json.jq.literals._

val query = jq""".[] | { "full_name": .name, "language": .language }"""
```

The query can now be compiled into a `Pipe`:

```scala mdoc
import fs2.data.json.jq.Compiler

val queryCompiler = Compiler[IO]

val queryPipe = queryCompiler.compile(query).unsafeRunSync()
```

Now this pipe can be used to transform the data within the previous pipeline:

```scala mdoc
Files[IO]
  .readUtf8(Path("site/cookbooks/data/json/sample.json"))
  .through(json.tokens)
  .through(queryPipe) // the transformation using the query pipe
  .through(json.render.pretty())
  .through(utf8.encode[IO])
  .through(stdout)
  .compile
  .drain
  .unsafeRunSync()
```

The result of the query execution is printed to stdout.
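
The snippets above call `unsafeRunSync()` to obtain the compiled pipe before the stream is built. As a possible alternative, sketched here under the assumption that the surrounding program is a cats-effect `IOApp`, the compilation effect can be lifted into the stream with `Stream.eval`, so that nothing runs until the application itself does. The object name `QueryApp` and the method `results` are hypothetical.

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream
import fs2.data.json
import fs2.data.json.jq.Compiler
import fs2.data.json.jq.literals._
import fs2.io.file.{Files, Path}
import fs2.io.stdout
import fs2.text.utf8

// Hypothetical application object: the names are illustrative only.
object QueryApp extends IOApp.Simple {

  // Same query as in the cookbook.
  private val query = jq""".[] | { "full_name": .name, "language": .language }"""

  // Compiles the query as part of the stream, then runs it over `input`
  // and emits the pretty-printed result as strings.
  def results(input: Stream[IO, String]): Stream[IO, String] =
    Stream.eval(Compiler[IO].compile(query)).flatMap { pipe =>
      input
        .through(json.tokens)
        .through(pipe)
        .through(json.render.pretty())
    }

  // Reads the sample file and writes the query result to stdout.
  def run: IO[Unit] =
    results(Files[IO].readUtf8(Path("site/cookbooks/data/json/sample.json")))
      .through(utf8.encode[IO])
      .through(stdout[IO])
      .compile
      .drain
}
```

This mirrors the structure of the full example, which sequences parsing, compilation, and execution inside `IO` instead of running intermediate effects eagerly.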


## Running the full example

The full code can be found in the repository in the @:source(fs2.data.example.jqlike.JqLike) object.
This example uses [decline][decline] to parse the CLI options.

It compiles for all three supported platforms:

- as a fat jar using [sbt-assembly] for the JVM
- as a native executable using [Scala Native][scala-native]
- as a Node.js application using [Scala.js][scala-js]

@:select(platform)

@:choice(jvm)

```shell
$ LC_ALL=C.UTF-8 sbt exampleJqJVM/assembly
$ java -jar examples/jqlike/.jvm/target/scala-2.13/jq-like.jar -q '.[] | { "full_name": .name, "language": .language }' -f site/cookbooks/data/json/sample.json
```

@:choice(native)

```shell
$ sbt exampleJqNative/nativeLink
$ examples/jqlike/.native/target/scala-2.13/jq-like-out -q '.[] | { "full_name": .name, "language": .language }' -f site/cookbooks/data/json/sample.json
```

@:choice(js)

```shell
$ sbt exampleJqJS/fastLinkJS
$ node examples/jqlike/.js/target/scala-2.13/jq-like-fastopt/main.js -q '.[] | { "full_name": .name, "language": .language }' -f site/cookbooks/data/json/sample.json
```

@:@

[decline]: https://ben.kirw.in/decline/
[data-json]: data/json/sample.json
[json-doc]: ../documentation/json/index.md#json-parsing
[jq-doc]: ../documentation/json/jq.md#using-queries
[render-doc]: ../documentation/json/index.md#json-renderers
[sbt-assembly]: https://github.com/sbt/sbt-assembly
[scala-native]: https://scala-native.org/en/latest/
[scala-js]: https://www.scala-js.org
5 changes: 5 additions & 0 deletions site/directory.conf
@@ -0,0 +1,5 @@
laika.navigationOrder = [
index.md,
documentation,
cookbooks
]