From c070740474989469f1d0efdb6f8b37855fc40ef4 Mon Sep 17 00:00:00 2001
From: Liam Bigelow <40188355+bglw@users.noreply.github.com>
Date: Wed, 22 Jun 2022 16:21:28 +1200
Subject: [PATCH] Add support for config files and environment variables

---
 README.md                                  | 110 ++++++++++++++++++--
 pagefind/Cargo.toml                        |  10 +-
 pagefind/features/base.feature             |   2 +
 pagefind/features/build_options.feature    |  14 +--
 pagefind/features/config_sources.feature   |  79 ++++++++++++++
 pagefind/features/exact_phrase.feature     |   2 +
 pagefind/features/exclusions.feature       |   4 +
 pagefind/features/filtering.feature        |   2 +
 pagefind/features/fragments.feature        |   2 +
 pagefind/features/index_chunking.feature   |   4 +
 pagefind/features/partial_matching.feature |   4 +
 pagefind/features/sanity.feature           |   3 +-
 pagefind/features/scoring.feature          |   2 +
 pagefind/features/search_options.feature   |   6 +-
 pagefind/features/spellcheck.feature       |   4 +
 pagefind/features/stemming.feature         |   2 +
 pagefind/src/fossick/mod.rs                |  11 +-
 pagefind/src/lib.rs                        |   2 +-
 pagefind/src/main.rs                       | 114 +++++++++++++++------
 pagefind/src/options.rs                    |  80 ++++++++++-----
 pagefind/tests/browser.rs                  |   6 +-
 pagefind/tests/cucumber.rs                 |  24 ++++-
 pagefind/tests/steps/step_definitions.rs   |  22 +++-
 pagefind/tests/steps/web_steps.rs          |   4 +-
 24 files changed, 422 insertions(+), 91 deletions(-)
 create mode 100644 pagefind/features/config_sources.feature

diff --git a/README.md b/README.md
index f4843bb8..7778abd0 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,17 @@ npx pagefind@latest -s public
 
 Where `public` matches your output dir — `_site` for Jekyll etc.
 
-This will currently index all content within your `body`. Pagefind currently has a few tags that can be used to customize this:
+By default, this will index all content within your `body`. Pagefind currently has a few tags that can be used to customize this:
+
+#### Limiting Indexing to Elements
+
+Adding `data-pagefind-body` to an element will cause Pagefind to exclusively index content within this element and its children, instead of indexing the entire `<body>`. In most cases, you will want to add this attribute to the main element in your content layout.
+
+If there are multiple regions you want to index, `data-pagefind-body` can be set on multiple elements on the same page.
+
+If a `data-pagefind-body` element is found anywhere on your site, any pages without this element will be excluded from search. This means if you tag a specific region on your `post` layout with `data-pagefind-body`, your homepage will no longer be indexed (unless it too has a `data-pagefind-body` element).
+
+Note: Filters and metadata outside a body element will still be processed.
 
 #### Ignoring Elements
 
@@ -26,7 +36,7 @@ Adding `data-pagefind-ignore` to an element will exclude it from the search inde
 ```
 
-Note: Filters and metadata inside an ignored element will **not** be ignored, so you can tag a filter inside the ``, for example.
+Note: Filters and metadata inside an ignored element will still be processed.
 
 #### Filters
 
@@ -70,7 +80,43 @@ Metadata can be returned alongside the page content after a search. This can be
 
 #### Local Development
 
-Since Pagefind runs on a built site, you will need to build your site locally → run Pagefind → host that directory. Some more work is needed to improve this dev experience, but that hasn't been scoped yet.
+Since Pagefind runs on a built site, you will currently need to build your site locally → run Pagefind → host that directory. Improving this development experience is on the roadmap.
+
+### Configuration
+
+Pagefind can be configured through CLI flags, environment variables, or configuration files. Values will be merged from all sources, with CLI flags overriding environment variables, and environment variables overriding configuration files.
+
+#### Config files
+
+Pagefind will look for a `pagefind.toml`, `pagefind.yml`, or `pagefind.json` file in the directory where you run the command.
+
+```bash
+echo "source: public" > pagefind.yml
+npx pagefind
+```
+
+#### Environment Variables
+
+Pagefind will read options from `PAGEFIND_*` environment variables.
+
+```bash
+PAGEFIND_SOURCE=public npx pagefind
+```
+
+#### CLI Flags
+
+Options can be passed to Pagefind directly as CLI flags.
+
+```bash
+npx pagefind --source public
+```
+
+#### Configuration Options
+
+| flag         | env                 | config     | default   | description                                                |
+|--------------|---------------------|------------|-----------|------------------------------------------------------------|
+| --source     | PAGEFIND_SOURCE     | source     |           | Required: The location of your built static site           |
+| --bundle-dir | PAGEFIND_BUNDLE_DIR | bundle_dir | _pagefind | The folder to output search files into, relative to source |
 
 ### Usage
 
@@ -95,6 +141,7 @@ This will return the following object:
         {
             id: "6fceec9",
             data: async function data(),
+            filters: {},
         }
     ]
 }
@@ -111,19 +158,70 @@ Which will yield:
 ```js
 {
     "url": "/url-of-the-page/",
-    "title": "The title from the first h1 element on the page",
     "excerpt": "A small snippet of the content, with the search term(s) highlighted in mark elements.",
     "filters": {
-      "author": "CloudCannon"
+        "author": "CloudCannon"
     },
     "meta": {
-      "image": "/weka.png"
+        "title": "The title from the first h1 element on the page",
+        "image": "/weka.png"
     },
     "content": "The full content of the page, formatted as text. Cursus Ipsum Risus Ullamcorper...",
     "word_count": 242,
 }
 ```
 
+#### Filtering
+
+To load the available filters, you can run:
+
+```js
+const filters = await pagefind.filters();
+```
+
+This will return the following object, showing the number of search results available under each `filter: value` pair.
+```js
+{
+    "filter": {
+        "value_one": 4,
+        "value_two": 12,
+    },
+    "color": {
+        "Orange": 6
+    }
+}
+```
+
+To filter results alongside searching, pass an options object to the search function:
+```js
+const search = await pagefind.search("hello", {
+    filters: {
+        color: "Orange"
+    }
+});
+```
+
+If the filters have been loaded with `await pagefind.filters()`, counts will also be returned with each search result, detailing the number of remaining items for each filter value:
+```js
+{
+    results: [
+        {
+            id: "6fceec9",
+            data: async function data(),
+            filters: {
+                "filter": {
+                    "value_one": 0,
+                    "value_two": 3,
+                },
+                "color": {
+                    "Orange": 1
+                }
+            },
+        }
+    ]
+}
+```
+
 #### Basic Bad Vanilla JS Example
 
 ```js
diff --git a/pagefind/Cargo.toml b/pagefind/Cargo.toml
index fcb3c85d..ea7175ca 100644
--- a/pagefind/Cargo.toml
+++ b/pagefind/Cargo.toml
@@ -11,7 +11,8 @@ name = "cucumber"
 harness = false # Allows Cucumber to print output instead of libtest
 
 [dependencies]
-clap = "2.33.0"
+anyhow = "1.0"
+clap = { version = "3.2.6", features = ["derive"] }
 kuchiki = "0.8.1"
 wax = "0.4.0"
 futures = "0.3"
@@ -30,6 +31,13 @@ sha-1 = "0.10"
 serde_json = "1"
 serde = { version = "1", features = ["derive"] }
 lazy_static = "1.4.0"
+twelf = { version = "0.5", default-features = false, features = [
+    "env",
+    "clap",
+    "json",
+    "yaml",
+    "toml",
+] }
 
 [dev-dependencies]
 json_dotpath = "1.1.0"
diff --git a/pagefind/features/base.feature b/pagefind/features/base.feature
index ff5568d4..d330bc77 100644
--- a/pagefind/features/base.feature
+++ b/pagefind/features/base.feature
@@ -1,5 +1,7 @@
 Feature: Base Tests
   Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
     Given I have a "public/index.html" file with the body:
       """
       Nothing
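The README's Configuration section states that values are merged from config files, `PAGEFIND_*` environment variables, and CLI flags, with CLI flags overriding environment variables and environment variables overriding config files. The sketch below illustrates that layering with only the Rust standard library. It is not how this patch implements it (the patch adds the `twelf` crate for that), and `PartialConfig`, `from_env`, and `resolve` are hypothetical names. It assumes just the two documented options: `source` (required) and `bundle_dir` (default `_pagefind`).

```rust
use std::collections::HashMap;

/// Hypothetical partially-filled settings: each source (config file,
/// environment, CLI) may set any subset of the options.
#[derive(Debug, Clone, PartialEq)]
struct PartialConfig {
    source: Option<String>,
    bundle_dir: Option<String>,
}

impl PartialConfig {
    /// Keep every value already set on `self`; fill the gaps from `lower`,
    /// a lower-priority source.
    fn or(self, lower: PartialConfig) -> PartialConfig {
        PartialConfig {
            source: self.source.or(lower.source),
            bundle_dir: self.bundle_dir.or(lower.bundle_dir),
        }
    }
}

/// Read the two documented PAGEFIND_* variables from a snapshot of the
/// environment (pass `std::env::vars().collect()` in real use).
fn from_env(vars: &HashMap<String, String>) -> PartialConfig {
    PartialConfig {
        source: vars.get("PAGEFIND_SOURCE").cloned(),
        bundle_dir: vars.get("PAGEFIND_BUNDLE_DIR").cloned(),
    }
}

/// Fully resolved settings.
#[derive(Debug, PartialEq)]
struct Config {
    source: String,
    bundle_dir: String,
}

/// Merge with precedence CLI > environment > config file, then require
/// `source` and apply the documented default for `bundle_dir`.
fn resolve(cli: PartialConfig, env: PartialConfig, file: PartialConfig) -> Result<Config, String> {
    let merged = cli.or(env).or(file);
    Ok(Config {
        source: merged.source.ok_or("`source` is required")?,
        bundle_dir: merged.bundle_dir.unwrap_or_else(|| "_pagefind".to_string()),
    })
}

fn main() {
    // Same shape as the "multiple sources" scenario in config_sources.feature:
    // `source` comes from pagefind.json, `bundle_dir` from a CLI flag.
    let file = PartialConfig { source: Some("public".into()), bundle_dir: None };
    let env = from_env(&HashMap::new());
    let cli = PartialConfig { source: None, bundle_dir: Some("_out".into()) };

    let config = resolve(cli, env, file).expect("source should be set");
    println!("{}/{}", config.source, config.bundle_dir); // public/_out
}
```

Representing each source as a struct of `Option`s keeps the precedence rule in one place (`or`), so adding a new option only means adding one field and one line in each function.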
diff --git a/pagefind/features/build_options.feature b/pagefind/features/build_options.feature
index 613ef32a..66c75520 100644
--- a/pagefind/features/build_options.feature
+++ b/pagefind/features/build_options.feature
@@ -1,13 +1,7 @@
 Feature: Build Options
-
-  @skip
-  Scenario: Settings can be pulled from configuration files
-
-  @skip
-  Scenario: Settings can be pulled from commandline flags
-
-  @skip
-  Scenario: Settings can be pulled from environment variables
+  Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
 
   Scenario: Source folder can be configured
     Given I have a "my_website/index.html" file with the body:
@@ -48,7 +42,7 @@ Feature: Build Options
       world
       """
     When I run my program with the flags:
-      | --bundle_dir _search |
+      | --bundle-dir _search |
     Then I should see "Running Pagefind" in stdout
     Then I should see the file "public/_search/pagefind.js"
     When I serve the "public" directory
diff --git a/pagefind/features/config_sources.feature b/pagefind/features/config_sources.feature
new file mode 100644
index 00000000..86e848b5
--- /dev/null
+++ b/pagefind/features/config_sources.feature
@@ -0,0 +1,79 @@
+Feature: Config Sources
+
+  Scenario: Settings can be pulled from TOML configuration files
+    Given I have a "public/index.html" file with the body:
+      """
+      Hello.
+      """
+    Given I have a "pagefind.toml" file with the content:
+      """
+      source = "public"
+      """
+    When I run my program
+    Then I should see "Running Pagefind" in stdout
+    Then I should see the file "public/_pagefind/pagefind.js"
+
+  Scenario: Settings can be pulled from YAML configuration files
+    Given I have a "public/index.html" file with the body:
+      """
+      Hello.
+      """
+    Given I have a "pagefind.yml" file with the content:
+      """
+      source: public
+      """
+    When I run my program
+    Then I should see "Running Pagefind" in stdout
+    Then I should see the file "public/_pagefind/pagefind.js"
+
+  Scenario: Settings can be pulled from JSON configuration files
+    Given I have a "public/index.html" file with the body:
+      """
+      Hello.
+      """
+    Given I have a "pagefind.json" file with the content:
+      """
+      {
+        "source": "public"
+      }
+      """
+    When I run my program
+    Then I should see "Running Pagefind" in stdout
+    Then I should see the file "public/_pagefind/pagefind.js"
+
+  Scenario: Settings can be pulled from commandline flags
+    Given I have a "public/index.html" file with the body:
+      """
+      Hello.
+      """
+    When I run my program with the flags:
+      | --source public |
+    Then I should see "Running Pagefind" in stdout
+    Then I should see the file "public/_pagefind/pagefind.js"
+
+  Scenario: Settings can be pulled from environment variables
+    Given I have a "public/index.html" file with the body:
+      """
+      Hello.
+      """
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
+    When I run my program
+    Then I should see "Running Pagefind" in stdout
+    Then I should see the file "public/_pagefind/pagefind.js"
+
+  Scenario: Settings can be pulled from multiple sources
+    Given I have a "public/index.html" file with the body:
+      """
+      Hello.
+      """
+    Given I have a "pagefind.json" file with the content:
+      """
+      {
+        "source": "public"
+      }
+      """
+    When I run my program with the flags:
+      | --bundle-dir _out |
+    Then I should see "Running Pagefind" in stdout
+    Then I should see the file "public/_out/pagefind.js"
diff --git a/pagefind/features/exact_phrase.feature b/pagefind/features/exact_phrase.feature
index df34eea3..58243677 100644
--- a/pagefind/features/exact_phrase.feature
+++ b/pagefind/features/exact_phrase.feature
@@ -1,6 +1,8 @@
 @skip
 Feature: Exact Phrase Matching
   Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
     Given I have a "public/index.html" file with the body:
       """
       Nothing
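The README's configuration table pairs each option with a kebab-case CLI flag, a `PAGEFIND_`-prefixed upper-case environment variable, and a snake_case config key. That pattern is consistent with the feature files above switching from `--bundle_dir` to `--bundle-dir`, since clap 3's derive API names long flags in kebab-case by default. The helpers below reproduce the pattern for the two documented options; `flag_name` and `env_name` are hypothetical, for illustration only, and are not functions from Pagefind, `clap`, or `twelf`.

```rust
/// Derive the CLI flag for a snake_case config key, e.g. `bundle_dir` -> `--bundle-dir`.
fn flag_name(key: &str) -> String {
    format!("--{}", key.replace('_', "-"))
}

/// Derive the environment variable for a config key, e.g. `bundle_dir` -> `PAGEFIND_BUNDLE_DIR`.
fn env_name(key: &str) -> String {
    format!("PAGEFIND_{}", key.to_uppercase())
}

fn main() {
    // Print one row per documented option: flag, env var, config key.
    for key in ["source", "bundle_dir"] {
        println!("{:<14} {:<21} {}", flag_name(key), env_name(key), key);
    }
}
```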
diff --git a/pagefind/features/exclusions.feature b/pagefind/features/exclusions.feature
index c887c8df..20790854 100644
--- a/pagefind/features/exclusions.feature
+++ b/pagefind/features/exclusions.feature
@@ -1,5 +1,9 @@
 Feature: Exclusions
 
+  Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
+
   Scenario: Elements within search regions can be excluded from indexing and excerpts
     Given I have a "public/index.html" file with the body:
       """
diff --git a/pagefind/features/filtering.feature b/pagefind/features/filtering.feature
index 8665462f..d34f83b6 100644
--- a/pagefind/features/filtering.feature
+++ b/pagefind/features/filtering.feature
@@ -1,5 +1,7 @@
 Feature: Filtering
   Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
     Given I have a "public/index.html" file with the body:
       """
       Nothing
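The README's new Filtering section describes two kinds of counts: `pagefind.filters()` reports how many results are available under each `filter: value` pair, and each search result reports how many matching items remain for each value, still listing values that dropped to zero. The sketch below models those semantics over in-memory data only. The `PageFilters` representation and the `filter_counts`, `remaining_counts`, and `page` functions are hypothetical and are not Pagefind's implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-page filter data: filter name -> values on that page.
type PageFilters = BTreeMap<String, Vec<String>>;
/// filter name -> value -> number of pages.
type Counts = BTreeMap<String, BTreeMap<String, usize>>;

/// Count how many of `pages` carry each filter value.
fn filter_counts(pages: &[PageFilters]) -> Counts {
    let mut counts = Counts::new();
    for page in pages {
        for (filter, values) in page {
            let per_value = counts.entry(filter.clone()).or_default();
            for value in values {
                *per_value.entry(value.clone()).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// Counts restricted to `matched`, but still listing every value seen in
/// `all` (as 0 when no matched page carries it).
fn remaining_counts(all: &[PageFilters], matched: &[PageFilters]) -> Counts {
    // Start from every known filter/value, reset to zero.
    let mut counts = filter_counts(all);
    for per_value in counts.values_mut() {
        for n in per_value.values_mut() {
            *n = 0;
        }
    }
    // Overwrite with the real counts among the matched pages.
    for (filter, per_value) in filter_counts(matched) {
        let target = counts.entry(filter).or_default();
        for (value, n) in per_value {
            target.insert(value, n);
        }
    }
    counts
}

/// Build a page from (filter, value) pairs.
fn page(pairs: &[(&str, &str)]) -> PageFilters {
    let mut p = PageFilters::new();
    for (filter, value) in pairs {
        p.entry(filter.to_string()).or_default().push(value.to_string());
    }
    p
}

fn main() {
    let all = vec![
        page(&[("color", "Orange")]),
        page(&[("color", "Orange"), ("filter", "value_one")]),
        page(&[("filter", "value_two")]),
    ];
    // Suppose a search matched only the second page.
    let matched = vec![all[1].clone()];
    println!("{:?}", filter_counts(&all));
    println!("{:?}", remaining_counts(&all, &matched));
}
```

Seeding the remaining counts from the full set is what keeps exhausted values visible as `0`, matching the `"value_one": 0` entry in the README's search-result example.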
diff --git a/pagefind/features/fragments.feature b/pagefind/features/fragments.feature
index 09908c63..c6b72ccb 100644
--- a/pagefind/features/fragments.feature
+++ b/pagefind/features/fragments.feature
@@ -1,5 +1,7 @@
 Feature: Fragments
   Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
     Given I have a "public/index.html" file with the body:
       """
       Nothing
diff --git a/pagefind/features/index_chunking.feature b/pagefind/features/index_chunking.feature
index 22a70ef8..c41693c1 100644
--- a/pagefind/features/index_chunking.feature
+++ b/pagefind/features/index_chunking.feature
@@ -1,5 +1,9 @@
 @skip
 Feature: Index Chunking
+  Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
+
   Scenario: Browser only loads chunks needed to search for the target word
 
   Scenario: Chunk size is configurable
diff --git a/pagefind/features/partial_matching.feature b/pagefind/features/partial_matching.feature
index 119c2e18..fb3f8c6e 100644
--- a/pagefind/features/partial_matching.feature
+++ b/pagefind/features/partial_matching.feature
@@ -1,6 +1,10 @@
 @skip
 Feature: Partial Matching
 
+  Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
+
   Scenario: Search will return pages that match 2 out of 3 words
     Given I have a "public/cat/index.html" file with the body:
       """
diff --git a/pagefind/features/sanity.feature b/pagefind/features/sanity.feature
index 0af6635c..ea70ddd7 100644
--- a/pagefind/features/sanity.feature
+++ b/pagefind/features/sanity.feature
@@ -2,7 +2,8 @@ Feature: Sanity Tests
 
   Scenario: CLI tests are working
     Given I have a "public/index.html" file
-    When I run my program
+    When I run my program with the flags:
+      | --source public |
     Then I should see "Running Pagefind" in stdout
 
   Scenario: Web tests are working
diff --git a/pagefind/features/scoring.feature b/pagefind/features/scoring.feature
index 190e82d9..e8f909d7 100644
--- a/pagefind/features/scoring.feature
+++ b/pagefind/features/scoring.feature
@@ -1,5 +1,7 @@
 Feature: Result Scoring
   Background:
+    Given I have the environment variables:
+      | PAGEFIND_SOURCE | public |
     Given I have a "public/index.html" file with the body:
       """