Merge pull request #331 from sergiopaniego/object_detection_ft_issues
Object detection fine-tuning code issues
ATaylorAerospace authored Aug 15, 2024
2 parents 6958ced + 868981e commit 20dd3ba
Showing 2 changed files with 4 additions and 4 deletions.
@@ -395,7 +395,7 @@ Before proceeding further, log in to Hugging Face Hub to upload your model on th
```python
from huggingface_hub import notebook_login

-notebook_login
+notebook_login()
```

Once done, let's start training the model. We start by defining the training arguments and defining a trainer object that uses those arguments to do the training, as shown here:
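The first fix above turns a bare function reference into an actual call. A minimal, self-contained illustration of why the parentheses matter (using a stand-in function, not the real `huggingface_hub.notebook_login`):

```python
calls = []

def login():
    # stand-in for huggingface_hub.notebook_login
    calls.append("prompted")

login    # expression statement: evaluates the function object, never calls it
assert calls == []

login()  # actual call: the login prompt would now appear
assert calls == ["prompted"]
```

This is exactly why the original `notebook_login` line was a silent no-op: Python happily evaluates a function name and discards it.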
@@ -457,7 +457,7 @@ from transformers import pipeline

# download a sample image

-url = "https://huggingface.co/datasets/hf-vision/course-assets/blob/main/test-helmet-object-detection.jpg"
+url = "https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/test-helmet-object-detection.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# make the object detection pipeline
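The second fix swaps the Hub's `/blob/` web-view path for the `/resolve/` download path: a `/blob/` URL returns an HTML page, which `Image.open` cannot decode, while `/resolve/` redirects to the raw file bytes. As a sketch, a tiny helper (my own naming, not part of `huggingface_hub`) that performs the same rewrite:

```python
def to_resolve_url(blob_url: str) -> str:
    """Rewrite a Hub /blob/ web URL into its /resolve/ download URL."""
    return blob_url.replace("/blob/", "/resolve/", 1)

web_url = ("https://huggingface.co/datasets/hf-vision/course-assets"
           "/blob/main/test-helmet-object-detection.jpg")
print(to_resolve_url(web_url))
# -> https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/test-helmet-object-detection.jpg
```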
@@ -496,6 +496,7 @@ And finally use this function for the same test image we used.


```
results = obj_detector(image)
plot_results(image, results)
```
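The chapter defines its own `plot_results`; purely as an illustrative sketch, a minimal version assuming the standard `transformers` object-detection pipeline output (a list of dicts with `score`, `label`, and a `box` of pixel coordinates) might look like:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_results(image, results, threshold=0.5):
    """Draw pipeline detections on the image; returns the axes for inspection."""
    fig, ax = plt.subplots()
    ax.imshow(image)
    for r in results:
        if r["score"] < threshold:
            continue  # skip low-confidence detections
        box = r["box"]
        ax.add_patch(patches.Rectangle(
            (box["xmin"], box["ymin"]),
            box["xmax"] - box["xmin"],
            box["ymax"] - box["ymin"],
            fill=False, linewidth=2,
        ))
        ax.text(box["xmin"], box["ymin"], f'{r["label"]}: {r["score"]:.2f}')
    ax.axis("off")
    return ax
```

The threshold, return value, and styling here are my own choices, not the course's actual implementation.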

3 changes: 1 addition & 2 deletions chapters/en/unit4/multimodal-models/a_multimodal_world.mdx
@@ -48,8 +48,7 @@ A dataset consisting of multiple modalities is a multimodal dataset. Out of the
- Vision + Audio: [VGG-Sound Dataset](https://www.robots.ox.ac.uk/~vgg/data/vggsound/), [RAVDESS Dataset](https://zenodo.org/records/1188976), [Audio-Visual Identity Database (AVID)](https://www.avid.wiki/Main_Page).
- Vision + Audio + Text: [RECOLA Database](https://diuf.unifr.ch/main/diva/recola/), [IEMOCAP Dataset](https://sail.usc.edu/iemocap/).

-Now let us see what kind of tasks can be performed using a multimodal dataset? There are many examples, but we will focus generally on tasks that contains the visual and textual
-A multimodal dataset will require a model which is able to process data from multiple modalities, such a model is a multimodal model.
+Now, let us see what kind of tasks can be performed using a multimodal dataset. There are many examples, but we will generally focus on tasks that contain both visual and textual elements. A multimodal dataset requires a model that is able to process data from multiple modalities. Such a model is called a multimodal model.

## Multimodal Tasks and Models

