Project files for the NLP project of Fundamentals of Data Science, spring 2022, NJU.
This project is released under the GPL v3 license.
WARNING! Please REMOVE the files in the "output" directory before committing, or the repository will exceed GitHub's size limit.
Coursera: Machine Learning, for the basics https://www.coursera.org/learn/machine-learning
National Taiwan University: Hung-yi Lee's Machine Learning course, for BERT https://speech.ee.ntu.edu.tw/~hylee/ml/2021-spring.php
CS224n for Natural Language Processing, including word2vec http://web.stanford.edu/class/cs224n/index.html
https://www.zhihu.com/question/20899988
http://c.biancheng.net/python_spider/what-is-spider.html
https://zhuanlan.zhihu.com/p/73742321
The provided spreadsheet, used for training.
Web crawler that collects pages from a cluster of government websites.
Preliminary filtering with rule-based checks and string similarity.
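A preliminary filter of this kind could look roughly like the sketch below. It is only an illustration, not the project's actual code: the function names, the minimum-length rule, and the 0.6 threshold are all assumptions, and the similarity measure is the standard library's difflib.SequenceMatcher rather than whatever metric the project uses.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def prefilter(candidates, reference, threshold=0.6, min_length=4):
    """Keep candidates that pass a simple rule and resemble the reference.

    Both the rule and the threshold are hypothetical placeholders.
    """
    kept = []
    for text in candidates:
        text = text.strip()
        # Rule-based check (assumed): drop empty or very short strings.
        if len(text) < min_length:
            continue
        # String-similarity check against the reference text.
        if similarity(text, reference) >= threshold:
            kept.append(text)
    return kept


if __name__ == "__main__":
    samples = ["data science", "data sciense", "abc", "weather report"]
    print(prefilter(samples, "data science"))
```

Running the cheap rule first means the similarity comparison is only computed for candidates that survive it.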
Use word2vec with a CNN for the second-stage classification.
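To show what a CNN does with word vectors, here is a dependency-free sketch of a single convolution filter followed by ReLU and max-over-time pooling. Everything in it is assumed for illustration: the 2-dimensional embeddings are made-up values standing in for trained word2vec vectors, the filter weights are hand-picked, and the project's real model and framework may differ.

```python
# Toy 2-dimensional word vectors (made-up values standing in for word2vec output).
EMBEDDINGS = {
    "tax": [1.0, 0.0],
    "policy": [0.0, 1.0],
    "notice": [1.0, 1.0],
}
UNKNOWN = [0.0, 0.0]  # vector used for out-of-vocabulary tokens


def embed(tokens):
    """Map each token to its vector, falling back to UNKNOWN."""
    return [EMBEDDINGS.get(token, UNKNOWN) for token in tokens]


def conv1d_max_pool(vectors, kernel, bias=0.0):
    """Slide one filter over consecutive word vectors, apply ReLU,
    then take the maximum activation over all positions."""
    width = len(kernel)
    activations = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        total = bias
        for word_vec, weights in zip(window, kernel):
            total += sum(x * w for x, w in zip(word_vec, weights))
        activations.append(max(0.0, total))  # ReLU
    # Sentences shorter than the filter width produce no activation.
    return max(activations) if activations else 0.0


if __name__ == "__main__":
    # Hand-picked filter of width 2 that responds most to "tax policy".
    kernel = [[1.0, 0.0], [0.0, 1.0]]
    sentence = ["new", "tax", "policy", "notice"]
    print(conv1d_max_pool(embed(sentence), kernel))
```

A full classifier would learn many such filters of several widths and feed the pooled values into a dense layer; this sketch only covers the feature-extraction step.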