Commit

faq fix (#13986)
agsfer authored Sep 15, 2023
1 parent dbb408f commit 5e2a023
Showing 7 changed files with 50 additions and 86 deletions.
3 changes: 2 additions & 1 deletion docs/_data/navigation.yml
@@ -10,8 +10,9 @@ header:
- title: Demo
url: /infer_meaning_intent
key: demo
- title: '<a href="https://www.johnsnowlabs.com/spark-nlp-blog/" target="_blank">Blog</a>'
- title: Blog
url: https://www.johnsnowlabs.com/spark-nlp-blog/
blank: yes
key: blog
# - title: '<span style="color: #FF8A00;"><i class = "fab fa-github fa-2x"></i></span>'
# url: https://github.com/JohnSnowLabs/spark-nlp
4 changes: 2 additions & 2 deletions docs/_includes/header.html
@@ -35,9 +35,9 @@
{%- assign _page_url = __return -%}
{%- include snippets/get-string-from-locale-config.html locale=_item.titles -%}
{%- if _nav_url == _page_url or page.nav_key and _item.key and page.nav_key == _item.key -%}
<li class="navigation__item navigation__item--active"><a href="{{ _nav_url }}">{%- if _item.title -%}{{ _item.title }}{%- else -%}{{ __return }}{%- endif -%}</a></li>
<li class="navigation__item navigation__item--active"><a {% if _item.blank %} target="_blank" {% endif %} href="{{ _nav_url }}">{%- if _item.title -%}{{ _item.title }}{%- else -%}{{ __return }}{%- endif -%}</a></li>
{%- else -%}
<li class="navigation__item {{ article_header.uniq_class }}"><a href="{{ _nav_url }}">{%- if _item.title -%}{{ _item.title }}{%- else -%}{{ __return }}{%- endif -%}</a></li>
<li class="navigation__item {{ article_header.uniq_class }}"><a {% if _item.blank %} target="_blank" {% endif %} href="{{ _nav_url }}">{%- if _item.title -%}{{ _item.title }}{%- else -%}{{ __return }}{%- endif -%}</a></li>
{%- endif -%}
{%- endfor -%}
{%- if site.search.provider -%}
71 changes: 24 additions & 47 deletions docs/en/CPUvsGPUbenchmark.md
@@ -1,11 +1,11 @@
---
layout: docs
header: true
seotitle: GPU vs CPU benchmark
title: GPU vs CPU benchmark
permalink: /docs/en/CPUvsGPUbenchmark
key: docs-concepts
modify_date: "2023-04-06"
use_language_switcher: "Python-Scala"
show_nav: true
sidebar:
nav: sparknlp
@@ -19,8 +19,7 @@ Different benchmarks, as well as their takeaways and some conclusions of how to

Each major release comes with big improvements, so please make sure you use at least that version to fully leverage Spark NLP capabilities on GPU.

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### Machine specs

@@ -30,8 +29,7 @@ An AWS `m5.8xlarge` machine was used for the CPU benchmarking. This machine cons
#### GPU
A `Tesla V100 SXM2` GPU with `32GB` of memory was used to calculate the GPU benchmarking.

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### Versions
The benchmarking was carried out with the following Spark NLP versions:
@@ -44,8 +42,7 @@ SparkNLP version: `3.3.4`

Spark nodes: 1

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### Benchmark on classifierDLApproach()

@@ -56,8 +53,7 @@ We used the Spark NLP class `ClassifierDL` and its method `Approach()` as descr
The pipeline looks as follows:
![](/assets/images/gpu_v2_pic3.png)

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Dataset
The size of the dataset was relatively small (200K), consisting of:
@@ -66,8 +62,7 @@ Training (rows): `162250`

Test (rows): `40301`

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Training params
Different batch sizes were tested to demonstrate how GPU performance improves with bigger batches compared to CPU, for a constant number of epochs and learning rate.
@@ -78,14 +73,12 @@ Learning rate: `0.003`

Batch sizes: `32`, `64`, `256`, `1024`

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Results
Even for this average-sized dataset, we can observe that the GPU machine beats the CPU machine by `76%` in both `training` and `inference` times.

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Training times depending on batch (in minutes)

@@ -100,8 +93,7 @@ Even for this average-sized dataset, we can observe that GPU is able to beat the
| 256 | 64 | 14.5 |
| 1024 | 64 | 14 |
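As a quick sanity check (not part of the original benchmark), the reported improvement can be recomputed from the visible batch-256 row of the table above — assuming the columns are batch size, CPU minutes, and GPU minutes:

```python
# Recompute the GPU-vs-CPU improvement from the visible batch-256 row:
# CPU 64 min vs GPU 14.5 min (column meaning assumed from context).
cpu_min, gpu_min = 64.0, 14.5

improvement = (cpu_min - gpu_min) / cpu_min * 100  # percent of training time saved
speedup = cpu_min / gpu_min                        # how many times faster

print(f"{improvement:.0f}% faster")  # 77% faster
print(f"{speedup:.1f}x speedup")     # 4.4x speedup
```

This lands within a point of the ~76% figure quoted above.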

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Inference times (in minutes)
The average inference time remained more or less constant regardless of the batch size:
@@ -110,16 +102,14 @@ GPU: `2 min`

![](/assets/images/gpu_v2_pic5.png)

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Performance metrics
A weighted F1-score of 0.88 was achieved, with a 0.90 score for question detection and 0.83 for statements.

![](/assets/images/gpu_v2_pic2.png)

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### Benchmark on NerDLApproach()

@@ -130,8 +120,7 @@ We used the Spark NLP class `NerDL` and its method `Approach()` as described in
The pipeline looks as follows:
![](/assets/images/gpu_v2_pic4.png)

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Dataset
The size of the dataset was small (17K), consisting of:
@@ -140,8 +129,7 @@ Training (rows): `14041`

Test (rows): `3250`

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Training params
Different batch sizes were tested to demonstrate how GPU performance improves with bigger batches compared to CPU, for a constant number of epochs and learning rate.
@@ -152,14 +140,12 @@ Learning rate: `0.003`

Batch sizes: `32`, `64`, `256`, `512`, `1024`, `2048`

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Results
Even for this small dataset, we can observe that the GPU machine beats the CPU machine by `62%` in `training` time and `68%` in `inference` time. It's important to mention that batch size is very relevant when using GPU, since CPU scales much worse with bigger batch sizes than GPU.

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Training times depending on batch (in minutes)

@@ -175,8 +161,7 @@ Even for this small dataset, we can observe that GPU is able to beat the CPU mac
| 1024 | 6.5 | 2.5 |
| 2048 | 6.5 | 2.5 |
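The same arithmetic (again only a sanity check, with the column meaning assumed) reproduces the 62% training-time figure from the visible rows, where both batch 1024 and batch 2048 show CPU 6.5 min vs GPU 2.5 min:

```python
# Visible rows: CPU 6.5 min vs GPU 2.5 min at batch sizes 1024 and 2048.
cpu_min, gpu_min = 6.5, 2.5
improvement = (cpu_min - gpu_min) / cpu_min * 100
print(f"{improvement:.0f}%")  # 62% — matching the figure quoted above
```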

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Inference times (in minutes)
Although CPU inference times remain more or less constant regardless of the batch size, GPU times improve considerably as the batch size grows.
@@ -195,16 +180,14 @@ CPU times: `~29 min`

![](/assets/images/gpu_v2_pic7.png)

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Performance metrics
A macro F1-score of about `0.92` (`0.90` in micro) was achieved, with the following charts extracted from the `NerDLApproach()` logs:

![](/assets/images/gpu_v2_pic8.png)

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### Inference benchmark on BertSentenceEmbeddings()

@@ -215,22 +198,19 @@ We used the Spark NLP class `BertSentenceEmbeddings()` described in the Transfor
The pipeline contains only two components and looks as follows:
![](/assets/images/gpu_v2_pic9.png)

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Dataset
The size of the dataset was bigger than the previous ones, with `417735` rows for inference.

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Results
We have observed in previous experiments, using BertSentenceEmbeddings (ClassifierDL) and also BertEmbeddings (NerDL), how GPU improved both training and inference times. In this case, we again observe big improvements in inference, which suggests that one of the main reasons GPU outperforms CPU so strongly is its better handling of embeddings (at word and sentence level) and of bigger batch sizes.

Batch sizes: `32`, `64`, `256`, `1024`

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

#### Inference times depending on batch (in minutes)
![](/assets/images/gpu_v2_pic10.png)
@@ -243,8 +223,7 @@ Batch sizes: `32`, `64`, `256`, `1024`
| 256 | 63 | 9.4 |
| 1024 | 62 | 9.1 |
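From the two visible rows (columns assumed to be batch size, CPU minutes, GPU minutes), a per-batch speedup factor can be derived — a rough check on my part, not original benchmark output:

```python
# (batch size, CPU minutes, GPU minutes) taken from the visible table rows.
rows = [(256, 63, 9.4), (1024, 62, 9.1)]

for batch, cpu, gpu in rows:
    # CPU time stays nearly flat, so the gain comes from GPU scaling with batch size.
    print(f"batch {batch}: {cpu / gpu:.1f}x faster on GPU")
# batch 256: 6.7x faster on GPU
# batch 1024: 6.8x faster on GPU
```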

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### Takeaways: How to get the best of the GPU
You will experience big GPU improvements in the following cases:
@@ -254,14 +233,12 @@
2. Bigger batch sizes get the best of GPU, while CPU does not scale with bigger batch sizes;
3. Bigger dataset sizes get the best of GPU, while they may be a bottleneck when running on CPU and lead to performance drops;
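To act on these takeaways, Spark NLP must be launched with its GPU build in the first place. A minimal launch sketch, assuming the `spark-nlp-gpu` artifact for Scala 2.12 and the `3.3.4` version used in this benchmark (adjust both to your environment):

```shell
# Sketch (assumptions: Scala 2.12 build, Spark NLP 3.3.4 as benchmarked above).
# Pull the GPU-enabled Spark NLP package when starting the Spark shell.
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.4
```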

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### MultiGPU training
Right now, we don't support multi-GPU training (training a single model across several GPUs in parallel), but you can train different models on different GPUs.
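One common way to do exactly that — sketched here as an assumption about your setup, with `train_model_a.py` and `train_model_b.py` as hypothetical placeholder scripts — is to pin each independent training process to its own GPU via the standard `CUDA_VISIBLE_DEVICES` variable:

```shell
# Sketch: two independent trainings, each seeing only one GPU.
# The script names are placeholders for your own training jobs.
CUDA_VISIBLE_DEVICES=0 python train_model_a.py &
CUDA_VISIBLE_DEVICES=1 python train_model_b.py &
wait  # block until both background jobs have finished
```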

</div>
<div class="h3-box" markdown="1">
</div><div class="h3-box" markdown="1">

### Where to look for more information about Training
Please take a look at the [Spark NLP](https://sparknlp.org/docs/en/training) and [Spark NLP for Healthcare](https://sparknlp.org/docs/en/licensed_training) Training sections, and feel free to reach out to us if you want to maximize the performance on your GPU.
2 changes: 1 addition & 1 deletion docs/en/annotators.md
@@ -1,11 +1,11 @@
---
layout: docs
header: true
seotitle: Spark NLP - Annotators
title: Spark NLP - Annotators
permalink: /docs/en/annotators
key: docs-annotators
modify_date: "2021-04-17"
use_language_switcher: "Python-Scala"
show_nav: true
sidebar:
nav: sparknlp
3 changes: 1 addition & 2 deletions docs/en/auxiliary.md
@@ -1,12 +1,11 @@
---
layout: docs
header: true
seotitle: Spark NLP
seotitle: Spark NLP - Helper functions
title: Helper functions
permalink: /docs/en/auxiliary
key: docs-auxiliary
modify_date: "2019-11-28"
use_language_switcher: "Python-Scala"
show_nav: true
sidebar:
nav: sparknlp
10 changes: 3 additions & 7 deletions docs/en/quickstart.md
@@ -1,8 +1,8 @@
---
layout: docs
header: true
title: Spark NLP - Quick Start
seotitle: Spark NLP - Getting Started
title: Spark NLP - Quick Start
permalink: /docs/en/quickstart
key: docs-quickstart
modify_date: "2021-03-20"
@@ -82,10 +82,6 @@ It is full of fresh examples and even a docker container if you want to skip ins

If you need more detailed information about how to install Spark NLP, you can check the [Installation page](install)

Detailed information about Spark NLP concepts, annotators and more may
be found [HERE](annotators)

</div>

Detailed information about Spark NLP concepts, annotators and more may be found [HERE](annotators)

</div>
</div></div>
43 changes: 17 additions & 26 deletions docs/en/third-party-projects.md
@@ -6,31 +6,24 @@ title: Third Party Projects
permalink: /docs/en/third-party-projects
key: docs-third-party-projects
modify_date: "2021-10-25"
use_language_switcher: "Python-Scala"
show_nav: true
sidebar:
nav: sparknlp
---

There are third party projects that can integrate with Spark NLP. These
packages need to be installed separately to be used.
<div class="h3-box" markdown="1">

If you'd like to integrate your application with Spark NLP, please send us a
message!
There are third party projects that can integrate with Spark NLP. These packages need to be installed separately to be used.

If you'd like to integrate your application with Spark NLP, please send us a message!

## Logging

### Comet

[Comet](https://www.comet.ml/) is a meta machine learning platform designed
to help AI practitioners and teams build reliable machine learning models for
real-world applications by streamlining the machine learning model lifecycle. By
leveraging Comet, users can track, compare, explain and reproduce their machine
learning experiments.
[Comet](https://www.comet.ml/) is a meta machine learning platform designed to help AI practitioners and teams build reliable machine learning models for real-world applications by streamlining the machine learning model lifecycle. By leveraging Comet, users can track, compare, explain and reproduce their machine learning experiments.

Comet can easily integrated into the Spark NLP workflow with the a dedicated
logging class `CometLogger` to log training and evaluation metrics,
pipeline parameters and NER visualization made with sparknlp-display.
Comet can be easily integrated into the Spark NLP workflow with a dedicated logging class `CometLogger` to log training and evaluation metrics, pipeline parameters and NER visualizations made with sparknlp-display.

For more information see the [User Guide](/api/python/third_party/Comet.html) and for more examples see the [Spark NLP Examples](https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/logging/Comet_SparkNLP_Integration.ipynb).

@@ -96,19 +89,17 @@ logger.experiment.display(tab='charts')

</details>

</div><div class="h3-box" markdown="1">

### MLflow

Spark NLP uses Spark MLlib Pipelines, which are natively supported by MLflow.
MLFlow is, as stated in their [official webpage](https://mlflow.org/), an open
source platform for the machine learning lifecycle, that includes:
* **Mlflow Tracking**: Record and query experiments: code, data, config, and
results
* **MLflow Projects**: Package data science code in a format to reproduce runs
on any platform
* **MLflow Models**: Deploy machine learning models in diverse serving
environments
* **Model Registry**: Store, annotate, discover, and manage models in a central
repository

For more information, please see the complete guide at [Experiment
Tracking](/docs/en/mlflow).
MLflow is, as stated on its [official webpage](https://mlflow.org/), an open source platform for the machine learning lifecycle that includes:
* **MLflow Tracking**: Record and query experiments: code, data, config, and results
* **MLflow Projects**: Package data science code in a format to reproduce runs on any platform
* **MLflow Models**: Deploy machine learning models in diverse serving environments
* **Model Registry**: Store, annotate, discover, and manage models in a central repository

For more information, please see the complete guide at [Experiment Tracking](/docs/en/mlflow).

</div>
