The dataset is part of Mercari Price Suggestion Challenge in Kaggle.
There are ~1.5Mil observations in the data set with multiple features.
Mercari is like ebay for Japan. Mercari wants a process to predict a price for someone who wants to sell the items on the Mercari ecommerce platform.
- Observation : free text, describing the item sold.
- brand_name : brand of the item sold
- category_name : Category of the item sold. these have muliple sub categories under the same feature name.
- shipping : Whats the shipping cost associated with the item sold.
- price : label feature. Price of the item sold.
Categories were split into sub categories.
Used NLP on the Description to find the correlation to the price.
NLP based