Guide for engineers interested in machine learning for NLP
-
Understand the possibilities and formulate business applications
- Everyone: AI for Everyone
-
Either level up through:
- Gaining theoretical foundation of Deep Learning for NLP
- Stanford Course Materials http://web.stanford.edu/class/cs224n/
- Natural Language Processing with Deep Learning https://www.youtube.com/watch?v=8rXD5-xhemo&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z
- Stanford CS224U: Natural Language Understanding https://www.youtube.com/watch?v=tZ_Jrc_nRJY&list=PLoROMvodv4rObpMCir6rNNUlFAn56Js20
- Getting "Practical" Knowledge of Deep Learning for NLP
-
Learn how to apply Deep Learning
- Nuts and Bolts of Applying Deep Learning
- "Everyday" Engineers: fast.ai
- Research Engineers: deeplearning.ai
-
Learn about all the stuff "they don't teach"
- Learn Production-Level Deep Learning: https://fullstackdeeplearning.com/
- Resources: https://github.com/full-stack-deep-learning/fsdl-text-recognizer-project
-
Base Models to Use
- spaCy for general NLP tasks
- Hugging Face Transformers
-
Profit
- Syntactic Search over Wikipedia: https://spike.wikipedia.apps.allenai.org/search/wikipedia
- Odinson: Rapidly query a natural language knowledge base https://github.com/lum-ai/odinson
- CheckList: Behavioral Testing NLP https://github.com/marcotcr/checklist
- Data project checklist https://www.fast.ai/2020/01/07/data-questionnaire
- BERT, ELMo, & GPT-2: How Contextual are Contextualized Word Representations? http://ai.stanford.edu/blog/contextual/
- BERT commit log https://amitness.com/2020/05/git-log-of-bert/
- Full stack deep learning github repo: https://github.com/full-stack-deep-learning/fsdl-text-recognizer-project
- Expand Labeled Data using Unlabeled Data
- Explain Predictions
- Deploy models to production
- Learn how to implement new models
- Deep Learning from the Foundations: https://www.fast.ai/2019/06/28/course-p2v3/
- More Learning Resources:
- nlp-library: curated list of papers
- Machine Learning System Best Practice and Design:
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction: https://ai.google/research/pubs/pub46555
- Machine Learning: The High Interest Credit Card of Technical Debt: https://ai.google/research/pubs/pub43146
- An Interactive Visualization to Explore NLP Papers
- How Big Should My Language Model Be?
- Accelerate your NLP pipelines using Hugging Face Transformers and ONNX Runtime
- https://prodi.gy/buy
- Text and image annotation
- https://github.com/chakki-works/doccano
- Open source text annotation tool
- https://www.media.mit.edu/projects/dive/overview/
- DIVE is a web-based data exploration system that lets non-technical users create stories from their data without writing code. DIVE combines semantic data ingestion, recommendation-based visualization and analysis, and dynamic story sharing into a unified workflow.
- Text Atlas
- Feature Visualization https://distill.pub/2017/feature-visualization/
- Activation Atlas https://distill.pub/2019/activation-atlas/
- NLP News http://newsletter.ruder.io
- The Batch https://www.deeplearning.ai/thebatch/
- NLP Highlights https://soundcloud.com/nlp-highlights
- Google Data Analytics https://cloud.google.com/blog/products/data-analytics/
- AWS Big Data Blog https://aws.amazon.com/blogs/big-data/
- fast.ai http://www.fast.ai/
- FastML http://fastml.com/
- The Unofficial Google Data Science Blog http://www.unofficialgoogledatascience.com/
- DeepMind https://deepmind.com/blog/
- The Official Google Blog https://www.blog.google/
- Distill https://distill.pub
- DataCamp Community https://www.datacamp.com/community
- AI Applications https://vaultanalytics.com/marketinganalytics
- Google AI Blog http://ai.googleblog.com/
- Google Developers Blog http://developers.googleblog.com/
- the morning paper https://blog.acolyer.org
- Machine Learning @ Berkeley https://medium.com/@ml.at.berkeley?source=rss-a34a9c1d8009------2
- naacl.org http://naacl-org.github.com
- Facebook Research https://research.fb.com
- OpenAI https://blog.openai.com
- Y Combinator http://www.ycombinator.com
- The Berkeley Artificial Intelligence Research Blog http://bair.berkeley.edu/blog/
- No Free Hunch http://blog.kaggle.com
- Off the convex path http://offconvex.github.io/
- A unified platform for sharing, training and evaluating dialogue models across many tasks. https://parl.ai/
You can also follow me on Twitter: https://twitter.com/LeoApolonio