ABSTRACT
Content moderation is widely used today. It is not a simple task, and several third-party
companies provide this kind of service.
Automating it is tricky.
The idea of this project is to use Machine Learning (ML) techniques to classify content.
Some ML methods (like neural networks and logistic regression) are capable of "learning" new patterns.
That is exactly what this problem requires. Imagine trying to block a word like "badword".
A dictionary of regexes solves the problem only for the trivial case. People can start substituting
numbers for vowels (for example, "b4dw0rd"), etc.
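The limitation of a regex dictionary can be seen in a few lines. This is only an illustrative sketch: the names BLOCKLIST and is_blocked are hypothetical and are not part of this project.

```python
import re

# Hypothetical blocklist: one compiled regex per banned term.
BLOCKLIST = [re.compile(r"\bbadword\b", re.IGNORECASE)]


def is_blocked(text: str) -> bool:
    """Return True if any blocklist pattern matches the text."""
    return any(pattern.search(text) for pattern in BLOCKLIST)


# The exact word is caught; simple obfuscations are not.
samples = ["badword", "BadWord", "b4dw0rd", "b.a.d.w.o.r.d"]
results = {sample: is_blocked(sample) for sample in samples}

for sample, blocked in results.items():
    print(f"{sample!r}: {'blocked' if blocked else 'allowed'}")
```

Each new obfuscation would need another hand-written pattern, which is the maintenance burden a learned classifier is meant to reduce.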
So ML sounds promising. An API can be created to receive input from external systems
and keep improving the classification algorithm automatically.
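As a rough sketch of what "keep improving automatically" could look like, the snippet below trains a tiny logistic regression on character bigrams, one example at a time, so that feedback from an external system can be applied as a single update. Everything here is an assumption made for illustration: the class name OnlineModerationModel, the bigram features, the learning rate and the toy training data are not taken from this project.

```python
import math


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class OnlineModerationModel:
    """Logistic regression over character n-gram counts, trained one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # n-gram -> weight
        self.bias = 0.0

    def predict_proba(self, text):
        """Estimated probability that the text should be blocked."""
        z = self.bias + sum(self.weights.get(g, 0.0) for g in char_ngrams(text))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, text, label):
        """One stochastic gradient step on the log loss; label is 1 (block) or 0 (allow)."""
        error = self.predict_proba(text) - label
        step = self.learning_rate * error
        for g in char_ngrams(text):
            self.weights[g] = self.weights.get(g, 0.0) - step
        self.bias -= step


# Toy labelled data (made up for this sketch).
TRAINING = [
    ("badword", 1), ("hello there", 0),
    ("b4dword", 1), ("good morning", 0),
    ("bad w0rd", 1), ("nice work", 0),
]

model = OnlineModerationModel()
for _ in range(20):  # a few passes over the toy data
    for text, label in TRAINING:
        model.update(text, label)

# A spelling that never appeared in training still shares bigrams with known ones.
score_unseen = model.predict_proba("b4dw0rd")

# Feedback from an external system is just one more update.
before = model.predict_proba("b@dw0rd")
model.update("b@dw0rd", 1)
after = model.predict_proba("b@dw0rd")
```

An API endpoint would only need to wrap predict_proba for classification and update for feedback. A real system would also need much more data, evaluation, and protection against malicious feedback.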
Of course, such a solution is not easy. But it's worth a try.
IMPORTANT
This is an exploratory project and it's far from being production ready!!!
RELATED STUFF
http://www.cs.cmu.edu/~biglou/resources/
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5617090
http://wiki.apache.org/jakarta-lucene/SpellChecker