Commit: typo
adhocmaster committed Apr 5, 2024
1 parent 2ce0413 commit d5fe5a0
Showing 2 changed files with 8 additions and 8 deletions.
@@ -1,11 +1,11 @@
# Challenges
-Co-occurance matrices are quite large. A 50k vocabulary will have 2.5B entries requiring 2.5GB. There is no point having a co-occurance matrix instead of a 1GB embedding.
+Co-occurrence matrices are quite large. A 50k vocabulary will have 2.5B entries requiring 2.5GB. There is no point having a co-occurrence matrix instead of a 1GB embedding.
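
A quick back-of-the-envelope check of that figure (a sketch; the bytes-per-entry values are my assumptions, and 2.5 GB implies one byte per count):

```python
vocab_size = 50_000
entries = vocab_size ** 2              # dense co-occurrence matrix: 2,500,000,000 entries
for bytes_per_entry in (1, 4):         # 1 B: tiny counts (the implied size); 4 B: int32/float32
    gb = entries * bytes_per_entry / 1e9
    print(f"{bytes_per_entry} B/entry -> {gb:.1f} GB")   # prints 2.5 GB and 10.0 GB
```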

## Solution #1 - Session-based

-1. Build session-based co-occurance matrix with available words. e.g.
+1. Build session-based co-occurrence matrix with available words. e.g.
2. Create a session-based vocabulary from the user's history and new contents, e.g. the top 1000 words, with new-content words weighted higher.
-3. build co-occurance matrix for the new vocabulary
+3. build co-occurrence matrix for the new vocabulary
4. Create user and content vectors and apply weights using p(w1, w2). Need to revisit the literature on this. Basically, if a word w1 appears in a vector, we add a count of 1 to it and add p(w2|w1) to w2.
5. Use Jaccard similarity. Need to revisit the similarity research on this. (A sketch of steps 2-5 follows this list.)
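
A minimal sketch of steps 2-5 under my own naming. The p(w2|w1) weighting follows the description in step 4, with conditional probabilities estimated from session co-occurrence counts; the open literature questions from steps 4-5 remain open:

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(sessions):
    """Steps 2-3: pair and word counts over the per-session word lists."""
    pair_counts, word_counts = Counter(), Counter()
    for words in sessions:
        unique = set(words)
        word_counts.update(unique)
        for w1, w2 in permutations(unique, 2):   # ordered pairs, so p(w2|w1) != p(w1|w2)
            pair_counts[(w1, w2)] += 1
    return pair_counts, word_counts

def vectorize(words, vocab, pair_counts, word_counts):
    """Step 4: add 1 for each present word, plus p(w2|w1) for its co-occurring words."""
    vec = dict.fromkeys(vocab, 0.0)
    for w1 in set(words) & set(vocab):
        vec[w1] += 1.0
        if not word_counts[w1]:
            continue                             # word never seen in any session
        for w2 in vocab:
            vec[w2] += pair_counts[(w1, w2)] / word_counts[w1]   # p(w2|w1) estimate
    return vec

def jaccard(u, v):
    """Step 5: Jaccard similarity over the words each vector activates."""
    a = {w for w, x in u.items() if x > 0}
    b = {w for w, x in v.items() if x > 0}
    return len(a & b) / len(a | b) if a | b else 0.0
```

Here `sessions`, `vocab`, and the function names are placeholders for whatever the session-vocabulary pass in step 2 actually produces.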

10 changes: 5 additions & 5 deletions documentation/ranking.md
@@ -1,21 +1,21 @@
# Ranking

-## Approach 1 - No embedding/co-occurance
+## Approach 1 - No embedding/co-occurrence
Input: (new contents, user history)

1. Build a vocabulary with more weight on new-content words. No stop-words, only word roots.
-2. Build content representation with word occurance. The representation is a vector of word frequencies.
+2. Build content representation with word occurrence. The representation is a vector of word frequencies.
3. Build user representation using the same approach.
4. Calculate similarity with TF-IDF weights and stacking weights. Use Jaccard similarity. Need to revisit the similarity research on this (a sketch follows this list).
5. May be improved by introducing synonyms of important words.
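
A sketch of steps 1-4 with hypothetical helper names. The weighted (generalized) Jaccard similarity, sum of minima over sum of maxima, is one common way to combine TF-IDF weights with Jaccard; the stacking weights are left out since they are still to be defined:

```python
import math
from collections import Counter

def build_vocab(new_contents, history, stop_words, top_k=1000, new_weight=2.0):
    """Step 1: frequency-ranked roots, with new-content words weighted higher."""
    scores = Counter()
    for doc in history:
        scores.update(w for w in doc if w not in stop_words)
    for doc in new_contents:
        for w in doc:
            if w not in stop_words:
                scores[w] += new_weight
    return {w for w, _ in scores.most_common(top_k)}

def idf_weights(docs, vocab):
    """Smoothed inverse document frequency over all documents."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc) if w in vocab)
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}

def represent(doc, vocab, idf):
    """Steps 2-3: TF-IDF-weighted word-frequency vector."""
    tf = Counter(w for w in doc if w in vocab)
    return {w: tf[w] * idf[w] for w in tf}

def weighted_jaccard(u, v):
    """Step 4: generalized Jaccard = sum of minima / sum of maxima."""
    keys = u.keys() | v.keys()
    num = sum(min(u.get(w, 0.0), v.get(w, 0.0)) for w in keys)
    den = sum(max(u.get(w, 0.0), v.get(w, 0.0)) for w in keys)
    return num / den if den else 0.0
```

The user representation would be `represent` applied to the flattened history; step 5's synonym expansion would extend `vocab` before vectorizing.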


-## Approach 2 - session-based co-occurance
-This is a semi-semantic approach. With co-occurance, we can partially find semantic overlap between contents and user history even when they do not share the same words.
+## Approach 2 - session-based co-occurrence
+This is a semi-semantic approach. With co-occurrence, we can partially find semantic overlap between contents and user history even when they do not share the same words.

Input: (new contents, user history)

-[Details](./co-occurance.md)
+[Details](./co-occurrence.md)

## Approach 3 - embedding
Embedding lookup is O(log n). We don't need to store embeddings in the database initially; later we need to find a way to create and cache per-user preference embeddings (one possible shape is sketched below).
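
A sketch of the caching idea only; the class, names, and mean-of-word-vectors construction are my assumptions, since the doc leaves how to build user preference embeddings open:

```python
import numpy as np

class UserEmbeddingCache:
    """Build a user's preference embedding from history on first use, then reuse it."""

    def __init__(self, word_vectors):        # word -> np.ndarray, e.g. pretrained vectors
        self.word_vectors = word_vectors
        self.cache = {}

    def get(self, user_id, history_words):
        if user_id not in self.cache:         # compute lazily, cache afterwards
            vecs = [self.word_vectors[w] for w in history_words if w in self.word_vectors]
            self.cache[user_id] = np.mean(vecs, axis=0) if vecs else None
        return self.cache[user_id]

def rank(contents, user_vec, word_vectors):
    """Score each content (a word list) by cosine similarity to the user embedding."""
    def embed(words):
        vecs = [word_vectors[w] for w in words if w in word_vectors]
        return np.mean(vecs, axis=0) if vecs else np.zeros_like(user_vec)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return sorted(contents, key=lambda c: cos(embed(c), user_vec), reverse=True)
```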
