content.json

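{"pages":[],"posts":[{"title":"Cross Entropy, KL Divergence and MLE","text":"p: the true probability distribution (the data). q: the predicted probability distribution. 1. Minimizing KL divergence is equivalent to minimizing cross entropy. Cross Entropy: H(p, q) = -Σ_x p(x) log q(x). KL Divergence: KL(p || q) = Σ_x p(x) log(p(x) / q(x)) = H(p, q) - H(p), where H(p) = -Σ_x p(x) log p(x) is the entropy of p. a). H(p) does not depend on q, so minimizing the KL divergence over q is equivalent to minimizing the cross entropy. b). The KL divergence is non-negative and reaches 0 when q = p, so both the KL divergence and the cross entropy attain their minimum when q = p. 2. Minimizing KL divergence is equivalent to maximizing the likelihood (MLE). MLE: q* = argmax_q Σ_{i=1..N} log q(x_i), where x_1, ..., x_N are samples drawn from p. KL: argmin_q KL(p || q) = argmax_q Σ_x p(x) log q(x) = argmax_q E_p[log q(x)], and the sample average (1/N) Σ_i log q(x_i) approximates this expectation, so the two objectives agree. Appendix: (1). ln x &lt;= x - 1 for all x &gt; 0. Proof: Let f(x) = ln x - x + 1. Then f'(x) = 1/x - 1, which is positive for 0 &lt; x &lt; 1 and negative for x &gt; 1. So x = 1 is the maximum point of f(x). f(1) = 0. So f(x) &lt;= 0. Applying this with x = q(x)/p(x) gives -KL(p || q) = Σ_x p(x) ln(q(x)/p(x)) &lt;= Σ_x p(x) (q(x)/p(x) - 1) = 0. References: [1] https://stats.stackexchange.com/questions/335197/why-kl-divergence-is-non-negative [2] http://www.hongliangjie.com/2012/07/12/maximum-likelihood-as-minimize-kl-divergence/","link":"/2018/11/27/Cross-Entropy-KL-MLE/"},{"title":"Visualize High-Dimensional Data","text":"Target: Minimize the divergence between two distributions: (1) a distribution that measures pairwise similarities of the input objects; (2) a distribution that measures pairwise similarities of the low-dimensional points in the embedding. Inputs: high-dimensional points x_1, ..., x_n. Outputs: low-dimensional points y_1, ..., y_n (typically 2-D or 3-D). 1. t-SNE. The similarity probability of inputs (Gaussian kernel): p_{j|i} = exp(-||x_i - x_j||^2 / (2 σ_i^2)) / Σ_{k≠i} exp(-||x_i - x_k||^2 / (2 σ_i^2)), symmetrized as p_ij = (p_{j|i} + p_{i|j}) / (2n). The similarity probability of outputs (Student-t kernel): q_ij = (1 + ||y_i - y_j||^2)^(-1) / Σ_{k≠l} (1 + ||y_k - y_l||^2)^(-1). Minimize Objective Function: C = KL(P || Q) = Σ_{i≠j} p_ij log(p_ij / q_ij). 2. LargeVis. [Figure: a typical pipeline for visualizing high-dimensional data using a K-nearest-neighbor graph (K-NNG).] (1) Use random projection trees to construct the K-NNG, then refine it by also exploring second-order neighbors (neighbors of neighbors). (2) Once the K-NNG is constructed, the low-dimensional layout should preserve its neighbor relationships. (3) It borrows the idea of negative sampling from Word2Vec to maximize the likelihood of the K-NNG, treating observed edges as positive examples and sampled non-edges as negative examples. Appendix: (1) Gaussian distribution: f(x) = 1 / (σ sqrt(2π)) * exp(-(x - μ)^2 / (2 σ^2)). Then when μ = 0, f(x) ∝ exp(-x^2 / (2 σ^2)). (2) Student-t distribution with ν degrees of freedom: f(t) = Γ((ν + 1)/2) / (sqrt(νπ) Γ(ν/2)) * (1 + t^2/ν)^(-(ν + 1)/2). Then, with ν = 1, f(t) = 1 / (π (1 + t^2)) ∝ (1 + t^2)^(-1). (3) The choice of the Student-t distribution in the low-dimensional space. If we plot the probability density functions of the Gaussian distribution (μ = 0, σ = 1) and the Student-t distribution (ν = 1), we can see that the Student-t distribution has much longer (heavier) tails. Let’s consider two situations: a. 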
When two points are similar (high probability), the distance that yields this probability under the Student-t distribution is shorter than under the Gaussian distribution. b. When two points are dissimilar (low probability), the corresponding distance under the Student-t distribution is much longer than under the Gaussian distribution. We can conclude that using the Student-t distribution in the low-dimensional space keeps similar points close together while pushing dissimilar points far apart. References: [1] Maaten, L.V.D. and Hinton, G., 2008. Visualizing data using t-SNE. Journal of machine learning research, 9(Nov), pp.2579-2605. http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf [2] Van Der Maaten, L., 2014. Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research, 15(1), pp.3221-3245. http://www.jmlr.org/papers/volume15/vandermaaten14a/vandermaaten14a.pdf [3] Tang, J., Liu, J., Zhang, M. and Mei, Q., 2016, April. Visualizing large-scale and high-dimensional data. In Proceedings of the 25th International Conference on World Wide Web (pp. 287-297). International World Wide Web Conferences Steering Committee. https://arxiv.org/pdf/1602.00370.pdf","link":"/2018/11/27/Visualize-High-Dimensional-Data/"},{"title":"Word Embedding","text":"1. Word2Vec. CBOW: use the context to predict the target word. Skip-Gram: use the target word to predict its context. Skip-Gram: maximize the average log likelihood (window size c) over a corpus w_1, ..., w_T: (1/T) Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t), where the probability that a word w_O appears near a given word w_I is: p(w_O | w_I) = exp(v'_{w_O} · v_{w_I}) / Σ_{w=1..W} exp(v'_w · v_{w_I}). Here v_w and v'_w are the input and output vectors of word w, and W is the vocabulary size. The dot product of two word vectors is used to measure their similarity. [Figure: the Skip-Gram network.] 2. fastText. Task: text classification (use a sentence or document to predict its class). Minimize the negative log likelihood: -(1/N) Σ_{n=1..N} y_n log(f(B A x_n)), where x_n represents the normalized bag of N-gram features of the n-th document, y_n is its label, A is the look-up table that stores the word embeddings, B is a weight matrix, and f is the softmax function. [Figure: the fastText network.] The network is similar to that of CBOW. However, the output here is the class instead of the target word. 
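The fastText forward pass described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the names Matrix, averageEmbedding, classProbabilities and negativeLogLikelihood are made up for this sketch, the document is assumed to be given as a list of n-gram ids, and a plain softmax is used rather than the hierarchical softmax of the real library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hidden vector A x: the average of the embedding rows of the document's n-gram ids.
std::vector<double> averageEmbedding(const Matrix& A, const std::vector<int>& ngramIds) {
    std::vector<double> hidden(A[0].size(), 0.0);
    for (int id : ngramIds)
        for (std::size_t d = 0; d < hidden.size(); ++d)
            hidden[d] += A[id][d] / ngramIds.size();
    return hidden;
}

// f(B A x): class probabilities from a softmax over the linear scores B * hidden.
std::vector<double> classProbabilities(const Matrix& B, const std::vector<double>& hidden) {
    std::vector<double> probs(B.size(), 0.0);
    for (std::size_t c = 0; c < B.size(); ++c)
        for (std::size_t d = 0; d < hidden.size(); ++d)
            probs[c] += B[c][d] * hidden[d];
    // Subtract the largest score before exponentiating for numerical stability.
    double maxScore = *std::max_element(probs.begin(), probs.end());
    double total = 0.0;
    for (double& s : probs) { s = std::exp(s - maxScore); total += s; }
    for (double& s : probs) s /= total;
    return probs;
}

// Negative log likelihood of the true class for a single document.
double negativeLogLikelihood(const std::vector<double>& probs, int label) {
    return -std::log(probs[label]);
}
```

Averaging the losses returned by negativeLogLikelihood over all N documents gives the objective that training would minimize with respect to A and B.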
Tricks: (1) Hierarchical Softmax. It is a method that turns one multi-class classification problem into a sequence of binary classification problems. It can be used in Word2Vec and fastText when the number of classes is large. It uses a Huffman tree in which each leaf represents a class; the probability of a class is computed with the chain rule as the product of the binary decisions along the path from the root to its leaf. (2) Negative sampling. Another method that turns a multi-class classification problem into several binary classification problems. In Skip-Gram, the objective for each (w_I, w_O) pair becomes: log σ(v'_{w_O} · v_{w_I}) + Σ_{i=1..k} E_{w_i ~ P_n(w)} [log σ(-v'_{w_i} · v_{w_I})]. It aims to maximize the probability of the true context word while minimizing the probability of k sampled words that do not belong to the context. The probability that a word is chosen as a negative sample depends on its frequency in the corpus (the paper uses the unigram distribution raised to the 3/4 power). References: [1] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S. and Dean, J., 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119). https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf [2] McCormick, 2016. Word2Vec Tutorial - The Skip-Gram Model http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/ [3] Joulin, A., Grave, E., Bojanowski, P. and Mikolov, T., 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759. https://arxiv.org/pdf/1607.01759.pdf [4] Benjamin. 2017. Hierarchical Softmax. http://building-babylon.net/2017/08/01/hierarchical-softmax/","link":"/2018/11/27/Word-Embedding/"},{"title":"Hello World","text":"Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub. 
Quick Start. Create a new post: $ hexo new \"My New Post\" More info: Writing. Run server: $ hexo server More info: Server. Generate static files: $ hexo generate More info: Generating. Deploy to remote sites: $ hexo deploy More info: Deployment","link":"/2018/11/24/hello-world/"},{"title":"Basics of Data Structure (Peking University)","text":"A data structure has three aspects: logical structure, storage structure, and operations. Logical structure: set, linear, tree, graph. Storage structure: sequential, linked, hashed (Hash). Linear structures. Linear list: sequential list (Array), linked list. Stack (LIFO, Last In First Out): used for depth-first search. Queue (FIFO, First In First Out): used for breadth-first search. Sequential list: {% codeblock %}template &lt;class T&gt; class arrList: public List&lt;T&gt;{private: T *aList; /* storage array */ int maxSize; /* capacity */ int curLen; /* current number of elements */ int position; /* current position */ public: arrList(const int size){ maxSize = size; aList = new T[maxSize]; curLen = position = 0; } ~arrList(){ delete [] aList; } void clear(){ delete [] aList; curLen = position = 0; aList = new T[maxSize]; } int length(); bool append(const T value); bool insert(const int p, const T value); bool remove(const int p); /* renamed: delete is a C++ keyword */ bool setValue(const int p, const T value); bool getValue(const int p, T&amp; value); bool getPos(int &amp;p, const T value);}; /* Insert value at index p, shifting the later elements one slot to the right. */ template &lt;class T&gt; bool arrList&lt;T&gt;::insert(const int p, const T value){ int i; if (curLen&gt;=maxSize){ cout &lt;&lt; &quot;The list is full&quot; &lt;&lt; endl; return false; } if (p &lt; 0 || p &gt; curLen){ cout &lt;&lt; &quot;Insertion point is illegal&quot; &lt;&lt; endl; return false; } for (i=curLen;i&gt;p;i--){ aList[i] = aList[i-1]; } aList[p] = value; curLen ++; return true;}{% endcodeblock %} Linked list: {% codeblock %}template &lt;class T&gt; class Link{public: T data; Link&lt;T&gt; *next; Link(const T info, Link&lt;T&gt; *nextValue=NULL){ data = info; next = nextValue; } Link(Link&lt;T&gt; *nextValue){ next 
= nextValue; }}; template &lt;class T&gt; class lnkList: public List&lt;T&gt;{private: Link&lt;T&gt; *head, *tail; /* head is a dummy header node */ Link&lt;T&gt; *setPos(const int p); /* pointer to the p-th node; p == -1 returns the header */ public: lnkList(int s); ~lnkList(); bool isEmpty(); void clear(); int length(); bool append(const T value); bool insert(const int p, const T value); bool remove(const int p); /* renamed: delete is a C++ keyword */ bool getValue(const int p,T&amp; value); bool getPos(int &amp;p, const T value);}; template &lt;class T&gt; Link&lt;T&gt; * lnkList&lt;T&gt;::setPos(const int i){ int count = 0; if (i==-1) return head; Link&lt;T&gt; *p = head-&gt;next; while(p!=NULL &amp;&amp; count&lt;i){ p = p-&gt;next; count++; } return p;} /* Insert value as the i-th node. */ template &lt;class T&gt; bool lnkList&lt;T&gt;::insert(const int i,const T value){ Link&lt;T&gt; *p,*q; if((p=setPos(i-1))==NULL){ cout&lt;&lt;&quot;illegal insertion point&quot;&lt;&lt;endl; return false; } q = new Link&lt;T&gt;(value,p-&gt;next); p-&gt;next = q; if(p == tail){ tail = q; } return true;} /* Remove the i-th node. */ template &lt;class T&gt; bool lnkList&lt;T&gt;::remove(const int i){ Link&lt;T&gt; *p,*q; if ((p=setPos(i-1))==NULL || p==tail){ cout&lt;&lt;&quot;illegal deletion point&quot;&lt;&lt;endl; return false; } q = p-&gt;next; if(q==tail){ tail = p; p-&gt;next = NULL; }else{ p-&gt;next = q-&gt;next; } delete q; return true;}{% endcodeblock %} Qs: 1. 
Josephus problem. n monkeys, numbered 1 to n, sit in a circle in clockwise order to choose a king. Counting starts from monkey 1 and goes up to m; the monkey that counts m leaves the circle, and the remaining monkeys start counting again from 1. This continues until only one monkey is left in the circle, and that monkey is the king. Write a program that reads n and m and prints the number of the king. {% codeblock %}#include &lt;iostream&gt; using namespace std; class Link{public: int data; Link *prev,*next;}; /* Circular doubly linked list; position marks where counting starts. */ class lnkList{private: Link *head=NULL,*tail=NULL,*position,*deletenode; public: int length; void setPos(const int m); void createnode(const int value); void deletelnk(); int display();}; /* Advance position by m-1 nodes, so it lands on the monkey that counts m. */ void lnkList::setPos(const int m){ Link *p=position; int count = 1; while (count &lt; m){ p = p-&gt;next; count ++; } position = p;} /* Append a node and keep the list circular. */ void lnkList::createnode(const int value){ Link *p = new Link; p-&gt;data = value; p-&gt;prev = NULL; p-&gt;next = NULL; if (head ==NULL){ head = p; tail = p; position = p; }else{ tail-&gt;next = p; p-&gt;prev = tail; tail = p; tail-&gt;next = head; head-&gt;prev = tail; }} /* Unlink the node at position and restart counting from its successor. */ void lnkList::deletelnk(){ deletenode = position; position-&gt;next-&gt;prev = position-&gt;prev; position-&gt;prev-&gt;next = position-&gt;next; if(position == tail){ tail = position-&gt;prev; }else if(position == head){ head = position-&gt;next; } position = position-&gt;next; delete deletenode; length --;} int lnkList::display(){ cout &lt;&lt; head-&gt;data &lt;&lt; endl; return head-&gt;data; } int main(){ int m,n; cin &gt;&gt; n &gt;&gt; m; int i=0; lnkList monkey; for(i=0;i&lt;n;i++){ monkey.createnode(i+1); } monkey.length = n; while(monkey.length&gt;1){ monkey.setPos(m); monkey.deletelnk(); } monkey.display(); return 0;}{% endcodeblock %} Stack: {% codeblock %}template &lt;class T&gt; class arrStack : public Stack &lt;T&gt;{private: int mSize; int top; T *st; public: arrStack(int size){ mSize = size; top = -1; st = new 
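T[mSize]; } arrStack(){ mSize = 0; top = -1; st = NULL; } ~arrStack(){ delete [] st; } void clear(){ top = -1; } bool push(const T item); bool pop(T&amp; item);}; /* Push item; fails when the stack is full. */ template &lt;class T&gt; bool arrStack&lt;T&gt;::push(const T item){ if (top == mSize-1){ cout &lt;&lt;&quot;stack is full&quot;&lt;&lt; endl; return false; }else{ st[++top] = item; return true; }} /* Pop the top element into item; fails when the stack is empty. */ template &lt;class T&gt; bool arrStack&lt;T&gt;::pop(T&amp; item){ if (top == -1){ cout &lt;&lt;&quot;stack is empty&quot; &lt;&lt; endl; return false; }else{ item = st[top--]; return true; }}{% endcodeblock %} Queue: {% codeblock %}template &lt;class T&gt; class Queue{public: void clear(); bool enQueue(const T item); bool deQueue(T&amp; item); bool getFront(T&amp; item); bool isEmpty(); bool isFull();}; template &lt;class T&gt; class arrQueue: public Queue&lt;T&gt;{private: int mSize; int front; int rear; T *qu; public: arrQueue(int size); ~arrQueue(); bool enQueue(const T item); bool deQueue(T&amp; item);}; /* Circular array: rear indexes the last element and one slot stays unused, so the queue is empty when (rear+1) % mSize == front and full when (rear+2) % mSize == front. */ template &lt;class T&gt; bool arrQueue&lt;T&gt;::enQueue(const T item){ if (((rear+2)%mSize)==front) return false; rear = (rear+1) % mSize; qu[rear] = item; return true;} template &lt;class T&gt; bool arrQueue&lt;T&gt;::deQueue(T&amp; item){ if (((rear+1)%mSize)==front) return false; item = qu[front]; front = (front+1) % mSize; return true;}{% endcodeblock %} Qs: 1. The number of possible pop orders when the sequence 1..n is pushed onto a stack in order. (1). f(1) = 1; f(2) = 2; f(3) = f(0)f(2) [1 is popped first] + f(1)f(1) [1 is popped second] + f(2)f(0) [1 is popped third]; in general f(n) = Σ_{k=0..n-1} f(k) f(n-1-k) with f(0) = 1. (2). Catalan number: f(n) = C(2n,n) - C(2n,n+1) = C(2n,n)/(n+1). Reflection argument: reflecting every invalid path after its first violation changes its destination, which separates the invalid paths from the valid ones without changing their number, and there are C(2n,n+1) of them. 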
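The recurrence in (1) and the closed form in (2) can be cross-checked with a small C++ sketch. The function names popOrdersByRecurrence, binomial and popOrdersByFormula are made up for this illustration, and 64-bit integers are assumed to be wide enough, which only holds for small n.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// f(m) = sum over k of f(k) * f(m-1-k), where k elements are popped before element 1.
std::uint64_t popOrdersByRecurrence(int n) {
    std::vector<std::uint64_t> f(n + 1, 0);
    f[0] = 1;
    for (int m = 1; m <= n; ++m)
        for (int k = 0; k < m; ++k)
            f[m] += f[k] * f[m - 1 - k];
    return f[n];
}

// Binomial coefficient C(n, k) by the multiplicative formula; every step stays an integer.
std::uint64_t binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    std::uint64_t result = 1;
    for (int i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}

// Catalan number as the difference C(2n, n) - C(2n, n+1).
std::uint64_t popOrdersByFormula(int n) {
    return binomial(2 * n, n) - binomial(2 * n, n + 1);
}
```

For n = 3 both functions give 5, matching f(3) = 2 + 1 + 2 from the expansion above.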
Binary tree: - Level i (with the root at level 0) has at most 2^i nodes. - A tree of depth k has at most 2^(k+1) - 1 nodes (2^0 + ... + 2^k = 2^(k+1) - 1). - If n0 is the number of leaf nodes and n2 is the number of nodes of degree 2, then n0 = n2 + 1. - Full binary tree: in a non-empty full binary tree the number of leaves equals the number of internal nodes plus 1 (for example, at depth k with every level filled: 2^k leaves and 2^k - 1 internal nodes). - A complete binary tree with n nodes has height ceil(log2(n+1)), counted in levels. {% codeblock %}template &lt;class T&gt; class BinaryTreeNode{friend class BinaryTree&lt;T&gt;; private: T info; BinaryTreeNode&lt;T&gt; *left, *right; /* child pointers */ public: BinaryTreeNode(); BinaryTreeNode(const T&amp; ele); BinaryTreeNode(const T&amp; ele, BinaryTreeNode&lt;T&gt; *l, BinaryTreeNode&lt;T&gt; *r); T value() const; BinaryTreeNode&lt;T&gt;* leftchild() const; BinaryTreeNode&lt;T&gt;* rightchild() const; void setLeftchild(BinaryTreeNode&lt;T&gt;*); void setRightchild(BinaryTreeNode&lt;T&gt;*); void setValue(const T&amp; val); bool isLeaf() const; BinaryTreeNode&lt;T&gt;&amp; operator = (const BinaryTreeNode&lt;T&gt;&amp; Node);}; template &lt;class T&gt; class BinaryTree{private: BinaryTreeNode&lt;T&gt; *root; public: BinaryTree(){root=NULL;}; ~BinaryTree(){DeleteBinaryTree(root);}; bool isEmpty() const; BinaryTreeNode&lt;T&gt;* Root(){return root;}; BinaryTreeNode&lt;T&gt;* Parent(BinaryTreeNode&lt;T&gt; *current); BinaryTreeNode&lt;T&gt;* LeftSibling(BinaryTreeNode&lt;T&gt; *current); BinaryTreeNode&lt;T&gt;* RightSibling(BinaryTreeNode&lt;T&gt; *current); void CreateTree(const T&amp; info, BinaryTree&lt;T&gt;&amp; leftTree, BinaryTree&lt;T&gt;&amp; rightTree); void PreOrder(BinaryTreeNode&lt;T&gt; *root); void InOrder(BinaryTreeNode&lt;T&gt; *root); void PostOrder(BinaryTreeNode&lt;T&gt; *root); void LevelOrder(BinaryTreeNode&lt;T&gt; *root); void DeleteBinaryTree(BinaryTreeNode&lt;T&gt; *root);};{% endcodeblock %} Binary Search Tree (BST): - For any node, every value in its left subtree is smaller than the node's value, and every value in its right subtree is larger. - An in-order traversal visits the values in ascending order. {% codeblock %}/* Remove val from the subtree rooted at rt. This sketch assumes leftchild() and rightchild() return references to the child pointers, so the links can be rewritten in place. */ template &lt;class T&gt; void BinarySearchTree&lt;T&gt;::removehelp(BinaryTreeNode&lt;T&gt; *&amp; rt,const T val){ if(rt==NULL) cout&lt;&lt;val&lt;&lt;&quot; is not in the tree.\\n&quot;; else if (val &lt; rt-&gt;value()) removehelp(rt-&gt;leftchild(),val); /* search the left subtree */ 
else if (val &gt; rt-&gt;value()) removehelp(rt-&gt;rightchild(),val); /* search the right subtree */ else{ /* found the node to remove */ BinaryTreeNode&lt;T&gt; *temp = rt; if (rt-&gt;leftchild()==NULL) rt = rt-&gt;rightchild(); else if (rt-&gt;rightchild()==NULL) rt = rt-&gt;leftchild(); else{ temp = deletemin(rt-&gt;rightchild()); /* detach the minimum of the right subtree */ rt-&gt;setValue(temp-&gt;value()); } delete temp; }} /* Detach and return the minimum node of the subtree rooted at rt. */ template &lt;class T&gt; BinaryTreeNode&lt;T&gt;* BinarySearchTree&lt;T&gt;::deletemin(BinaryTreeNode&lt;T&gt; *&amp; rt){ if (rt-&gt;leftchild()!=NULL) return deletemin(rt-&gt;leftchild()); else{ BinaryTreeNode&lt;T&gt; *temp = rt; rt = rt-&gt;rightchild(); return temp; }}{% endcodeblock %} Heap (min-heap) over the keys {K_0,K_1,…,K_{n-1}}: - Every node's value is no larger than its children's values: K_i &lt;= K_{2i+1}, K_i &lt;= K_{2i+2}. - There is no ordering constraint between siblings. {% codeblock %}template &lt;class T&gt; class MinHeap{private: T* heapArray; int CurrentSize; int MaxSize; void BuildHeap(); public: MinHeap(const int n); virtual ~MinHeap(){delete []heapArray;}; bool isLeaf(int pos) const; int leftchild(int pos) const; int rightchild(int pos) const; int parent(int pos) const; bool Remove(int pos, T&amp; node); bool Insert(const T&amp; newNode); T&amp; RemoveMin(); void SiftUp(int position); void SiftDown(int position);}; /* Move the element at position down until both children are no smaller. */ template &lt;class T&gt; void MinHeap&lt;T&gt;::SiftDown(int position){ int i = position; int j = 2*i + 1; T temp = heapArray[i]; while(j &lt; CurrentSize){ if((j &lt; CurrentSize-1)&amp;&amp;(heapArray[j]&gt;heapArray[j+1])) j++; /* pick the smaller child */ if(temp &gt; heapArray[j]){ heapArray[i] = heapArray[j]; i = j; j = 2*j +1; } else break; } heapArray[i] = temp;} /* Move the element at position up until its parent is no larger. */ template &lt;class T&gt; void MinHeap&lt;T&gt;::SiftUp(int position){ int temppos = position; T temp = heapArray[temppos]; while((temppos&gt;0)&amp;&amp;(heapArray[parent(temppos)]&gt;temp)){ heapArray[temppos] = heapArray[parent(temppos)]; temppos = parent(temppos); } heapArray[temppos] = 
temp;} /* Heapify bottom-up, starting from the last internal node. */ template &lt;class T&gt; void MinHeap&lt;T&gt;::BuildHeap(){ for(int i = CurrentSize/2-1;i&gt;=0;i--){ SiftDown(i); }} template&lt;class T&gt; bool MinHeap&lt;T&gt;::Insert(const T&amp; newNode){ if(CurrentSize==MaxSize) return false; heapArray[CurrentSize] = newNode; SiftUp(CurrentSize); CurrentSize++; return true;} /* Remove the element at pos by moving the last element into its place. */ template&lt;class T&gt; bool MinHeap&lt;T&gt;::Remove(int pos,T&amp; node){ if((pos&lt;0)||(pos&gt;=CurrentSize)) return false; T temp = heapArray[pos]; heapArray[pos] = heapArray[--CurrentSize]; if (pos &lt; CurrentSize){ /* nothing to restore if the last element was removed */ if (heapArray[parent(pos)]&gt;heapArray[pos]) SiftUp(pos); else SiftDown(pos); } node = temp; return true;}{% endcodeblock %} Huffman tree: {% codeblock %}template&lt;class T&gt; class HuffmanTree{private: HuffmanTreeNode&lt;T&gt; *root; void MergeTree(HuffmanTreeNode&lt;T&gt; &amp;ht1,HuffmanTreeNode&lt;T&gt; &amp;ht2,HuffmanTreeNode&lt;T&gt;* parent); public: HuffmanTree(T weight[],int n); virtual ~HuffmanTree(){DeleteTree(root);};}; /* Repeatedly merge the two lightest trees until a single tree remains. */ template&lt;class T&gt; HuffmanTree&lt;T&gt;::HuffmanTree(T weight[],int n){ MinHeap&lt;HuffmanTreeNode&lt;T&gt; &gt; heap(n); HuffmanTreeNode&lt;T&gt; *parent,firstchild,secondchild; HuffmanTreeNode&lt;T&gt; *NodeList = new HuffmanTreeNode&lt;T&gt;[n]; for(int i=0;i&lt;n;i++){ NodeList[i].element = weight[i]; NodeList[i].parent = NodeList[i].left = NodeList[i].right = NULL; heap.Insert(NodeList[i]); } for(int i=0;i&lt;n-1;i++){ parent = new HuffmanTreeNode&lt;T&gt;; firstchild = heap.RemoveMin(); secondchild = heap.RemoveMin(); MergeTree(firstchild,secondchild,parent); heap.Insert(*parent); root=parent; } delete []NodeList;}{% endcodeblock %} Appendix: 1. 
C++ class: {% codeblock %}#include &lt;iostream&gt; using namespace std; class Rectangle{ public: int w,h; int Area(){ return w*h; } int Perimeter(){ return 2*(w+h); } void Init(int w_,int h_){ w = w_; h = h_; }}; int main(){ int w,h; Rectangle r; cin &gt;&gt; w &gt;&gt; h; r.Init(w,h); cout &lt;&lt; r.Area() &lt;&lt; endl &lt;&lt; r.Perimeter(); return 0;}{% endcodeblock %} {% codeblock %}template&lt;class T&gt; void print (const T array[], int size){ int i; for (i=0;i&lt;size;i++) cout&lt;&lt;array[i]; return;} int a[10] = {0}; print(a,10);{% endcodeblock %} cin and cout: {% codeblock %}int x; cin &gt;&gt; x; cin.getline(str,len,ch); /* read at most len-1 characters into str, or stop at delimiter ch such as &apos;\\n&apos; */ ch = cin.get(); /* read a single character */ cin.ignore(len,ch); /* skip up to len characters, or until ch has been skipped */ while(cin&gt;&gt;x){ ...} /* loop until input fails or reaches end of file */ return 0;{% endcodeblock %} {% codeblock %}cout &lt;&lt; y; cout.put(&apos;A&apos;).put(&apos;a&apos;); int n = 10; cout &lt;&lt; n &lt;&lt; endl; cout &lt;&lt; hex &lt;&lt; n &lt;&lt; endl /* base 16: prints a */ &lt;&lt; dec &lt;&lt; n &lt;&lt; endl /* base 10: prints 10 */ &lt;&lt; oct &lt;&lt; n &lt;&lt; endl; /* base 8: prints 12 */{% endcodeblock %} {% codeblock %}#include &lt;iostream&gt; #include &lt;iomanip&gt; using namespace std; int main(){ double x = 1234567.89, y = 12.34567; int n = 1234567; int m = 12; cout &lt;&lt; setprecision(6) &lt;&lt; x &lt;&lt; endl /* 1.23457e+06 (6 significant digits) */ &lt;&lt; y &lt;&lt; endl &lt;&lt; n &lt;&lt; endl &lt;&lt; m; /* 12.3457 1234567 12 */ return 0;}{% endcodeblock %} {% codeblock %}#include &lt;iostream&gt; #include &lt;iomanip&gt; using namespace std; int main(){ double x = 1234567.89, y = 12.34567; int n = 1234567; int m = 12; cout &lt;&lt; setiosflags(ios::fixed) &lt;&lt; setprecision(6) &lt;&lt; x &lt;&lt; endl /* 1234567.890000 (6 digits after the decimal point) */ &lt;&lt; y &lt;&lt; endl &lt;&lt; n &lt;&lt; endl &lt;&lt; m; /* 12.345670 1234567 12 */ return 0;}{% endcodeblock %} {% codeblock %}/* input: 1234567890 */ char str[11]; cin.width(5); /* the next extraction stores at most 4 characters plus the terminator */ cin &gt;&gt; str; cout &lt;&lt; str &lt;&lt; endl; /* 1234 */ cin &gt;&gt; str; /* the width resets after each extraction */ cout &lt;&lt; str &lt;&lt; endl; /* 567890 */{% endcodeblock %} {% codeblock %}#include &lt;fstream&gt; ifstream fin; ofstream fout; fin.open(&quot;input.txt&quot;); fout.open(&quot;output.txt&quot;,ios::out); fin &gt;&gt; ... fout &lt;&lt; ...{% endcodeblock %} References: [1] https://en.wikipedia.org/wiki/Catalan_number","link":"/2018/12/06/Data-Structure/"}],"tags":[{"name":"Machine Learning","slug":"Machine-Learning","link":"/tags/Machine-Learning/"},{"name":"Data Structure","slug":"Data-Structure","link":"/tags/Data-Structure/"}],"categories":[{"name":"Machine Learning","slug":"Machine-Learning","link":"/categories/Machine-Learning/"},{"name":"Data Structure","slug":"Data-Structure","link":"/categories/Data-Structure/"}]}