Please see the README in the root directory for instructions on setting up a SageMaker notebook and downloading the project files (as well as the other notebooks).
- Download or otherwise retrieve the data.
- Process / Prepare the data.
  - Use BeautifulSoup to remove HTML tags, then tokenize the reviews.
  - Build a bag-of-words vocabulary that maps each of the 5000 most frequently appearing words to a unique integer.
  - Convert every review to a uniform length of 500 using zero-padding.
- Upload the processed data to S3.
- Train a chosen model.
  - RNN with LSTM units, using an embedding dimension of 32 and hidden_dim of 200.
  - 10 epochs of training; binary cross-entropy loss dropped from 0.6656 to 0.3045.
- Test the trained model (typically using a batch transform job).
  - Accuracy of 83.4% after only 10 epochs.
- Deploy the trained model.
- Use the deployed model.
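
The data preparation steps above (strip HTML, tokenize, map frequent words to integers, zero-pad) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes a simple regex in place of BeautifulSoup so that it runs with only the standard library, and it assumes ids 0 and 1 are reserved for padding and unknown words. The function names `tokenize`, `build_word_dict`, and `encode_and_pad` are hypothetical, and the tiny `vocab_size` and `length` values are for demonstration only (the project uses 5000 and 500).

```python
import re
from collections import Counter

# Assumed reserved ids: 0 for padding, 1 for out-of-vocabulary words.
PAD, UNK = 0, 1


def tokenize(review):
    """Strip HTML tags with a regex (a stand-in for BeautifulSoup) and split into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", review)
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_dict(tokenized_reviews, vocab_size=5000):
    """Map the `vocab_size` most frequent words to integers starting at 2."""
    counts = Counter(word for review in tokenized_reviews for word in review)
    return {word: idx + 2 for idx, (word, _) in enumerate(counts.most_common(vocab_size))}


def encode_and_pad(word_dict, tokens, length=500):
    """Convert tokens to ids, truncate to `length`, and right-pad with zeros."""
    ids = [word_dict.get(word, UNK) for word in tokens[:length]]
    return ids + [PAD] * (length - len(ids))


# Toy example with a small vocabulary and length so the result is easy to inspect.
reviews = ["<br />A great movie, great cast!", "Not a great movie.<br />"]
tokenized = [tokenize(r) for r in reviews]
word_dict = build_word_dict(tokenized, vocab_size=3)
encoded = [encode_and_pad(word_dict, t, length=8) for t in tokenized]
```

Reserving separate ids for padding and unknown words keeps the padded positions distinguishable from real tokens when the sequences are later fed to an embedding layer.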