Commit d5fe5a0 (1 parent: 2ce0413)
Showing 2 changed files with 8 additions and 8 deletions.
6 changes: 3 additions & 3 deletions
documentation/co-occurance.md → documentation/co-occurrence.md
@@ -1,11 +1,11 @@
 # Challenges
-Co-occurance matrices are quite large. A 50k vocabulary will have 2.5B entries requiring 2.5GB. There is no point having a co-occurance matrix instead of a 1GB embedding.
+Co-occurrence matrices are quite large. A 50k vocabulary will have 2.5B entries requiring 2.5GB. There is no point having a co-occurrence matrix instead of a 1GB embedding.

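Review note: the 2.5GB figure in the paragraph above matches a dense matrix stored at one byte per entry. The sketch below reproduces that arithmetic. The function name and the byte widths are assumptions for illustration; the document does not state how entries are stored.

```python
def dense_matrix_bytes(vocab_size: int, bytes_per_entry: int) -> int:
    """Bytes needed for a dense vocab_size x vocab_size matrix."""
    return vocab_size * vocab_size * bytes_per_entry


VOCAB = 50_000

entries = VOCAB * VOCAB                    # 2.5 billion cells
one_byte = dense_matrix_bytes(VOCAB, 1)    # 2.5e9 bytes, i.e. 2.5 GB
four_byte = dense_matrix_bytes(VOCAB, 4)   # 1e10 bytes if cells were 32-bit

# The session vocabulary proposed in Solution #1 (top 1000 words) is far smaller.
session = dense_matrix_bytes(1_000, 4)     # 4e6 bytes, i.e. 4 MB

print(entries, one_byte, four_byte, session)
```

With 32-bit counts the full matrix would be about four times the quoted size, which strengthens the case for the smaller session vocabulary.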
 ## Solution #1 - Session-based

-1. Build session-based co-occurance matrix with available words. e.g.
+1. Build session-based co-occurrence matrix with available words. e.g.
 2. Create session based vocabulary with user's history and new contents. Assuming top 1000 words with new content words having higher weights.
-3. build co-occurance matrix for the new vocabulary
+3. build co-occurrence matrix for the new vocabulary
 4. create user and content vectors. apply weights using p(w1, w2). Need to revise literature on this. Basically, if a word, w1, appears in a vector, we add count 1 to it and add p(w2|w1) to w2.
 5. Use Jaccard Similarity. Need to revisit the similarity research on this.

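Review note: the steps in Solution #1 can be sketched as below. This is a minimal illustration under several assumptions the document does not confirm: co-occurrence is counted per document, p(w2|w1) is estimated as (documents containing both) / (documents containing w1), new-content words get a hypothetical weight of 2.0, and because the vectors are real-valued the similarity used is the weighted (min/max) generalization of Jaccard rather than the set-based form. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_vocab(history_docs, new_docs, size=1000, new_weight=2.0):
    """Step 2: top `size` words, with new-content words weighted higher."""
    weights = Counter()
    for doc in history_docs:
        for w in doc:
            weights[w] += 1.0
    for doc in new_docs:
        for w in doc:
            weights[w] += new_weight  # assumed boost for new content
    return [w for w, _ in weights.most_common(size)]


def cooccurrence(docs, vocab):
    """Step 3: document-level counts for single words and unordered pairs."""
    vocab_set = set(vocab)
    word_docs, pair_docs = Counter(), Counter()
    for doc in docs:
        present = sorted(set(doc) & vocab_set)
        word_docs.update(present)
        pair_docs.update(combinations(present, 2))  # keys are sorted pairs
    return word_docs, pair_docs


def cond_prob(w2, w1, word_docs, pair_docs):
    """Estimate p(w2 | w1) from the document-level counts."""
    if word_docs[w1] == 0:
        return 0.0
    pair = (w1, w2) if w1 < w2 else (w2, w1)
    return pair_docs[pair] / word_docs[w1]


def make_vector(words, vocab, word_docs, pair_docs):
    """Step 4: add 1 for each observed w1 and p(w2|w1) for every other w2."""
    vec = dict.fromkeys(vocab, 0.0)
    for w1 in words:
        if w1 not in vec:
            continue
        vec[w1] += 1.0
        for w2 in vocab:
            if w2 != w1:
                vec[w2] += cond_prob(w2, w1, word_docs, pair_docs)
    return vec


def weighted_jaccard(a, b):
    """Step 5: sum of element-wise minima over sum of element-wise maxima."""
    num = sum(min(a[w], b[w]) for w in a)
    den = sum(max(a[w], b[w]) for w in a)
    return num / den if den else 0.0


# Toy session: two history documents and two candidate contents.
history = [["python", "pandas", "data"], ["python", "numpy"]]
new = [["pandas", "data", "plots"], ["rust", "cargo"]]

vocab = build_vocab(history, new)
word_docs, pair_docs = cooccurrence(history + new, vocab)
user_vec = make_vector([w for d in history for w in d], vocab, word_docs, pair_docs)
scores = [
    weighted_jaccard(user_vec, make_vector(doc, vocab, word_docs, pair_docs))
    for doc in new
]
print(scores)
```

In this toy session the first candidate shares words with the history and scores above zero, while the second shares none and scores zero. The session matrix stays small because only pairs that actually co-occur are stored.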