Patrik Schmidt edited this page Nov 12, 2013 · 13 revisions
- do some Tomcat/Maven Quickstart
- heavily based on TDD
- finishing test suites
- merge Utils and make use of MUID service
- implement API calls for NormalizedLikeRetrieval
- Review implementation of lucene indexing of common preferences (like-button)
- define API requests for HaveInCommons service
- later for the like button in general (if necessary)
- see NormalizedLikeRetrieval
- defining/finalizing LikeButtonAPI
- clarify the unlike function
- define likebutton protocol
- implement data queries
- get familiar with the current Java development style
- build/deployment
- TDD
- get Mahout integrated, using HBase as the data store
- figuring out whether it is better to control Mahout via POJO or to invoke it via bash
- understanding the Mahout bash script
- do some number crunching based on the Wikipedia dataset
- writing simple data integration
- (sharded) file output
- get data into hadoop and read by recommender engine
- perhaps writing the results back to Hadoop, or eventually into HBase
- define security tests
Combining Lucene queries with logical AND to build intersections
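The method below scores two terms by their co-occurrence in the index. It appears to follow the Normalized Google Distance (NGD) idea: with $f(x)$ and $f(y)$ the per-term hit counts, $f(x,y)$ the hit count of the intersection query, and $N$ the number of documents in the index,

$$\mathrm{NGD}(x,y) = \frac{\max\bigl(\log f(x), \log f(y)\bigr) - \log f(x,y)}{\log N - \min\bigl(\log f(x), \log f(y)\bigr)}$$

and the method returns $1 - \tfrac{1}{2}\,\mathrm{NGD}(x,y)$, i.e. a similarity where higher means more related.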
// Assumes parser (QueryParser), searcher (IndexSearcher) and reader (IndexReader)
// are fields initialized elsewhere against the Wikipedia index.
public static double wikipediaDistance(String term0, String term1) throws ParseException, IOException {
    Query query0 = parser.parse(term0);
    Query query1 = parser.parse(term1);

    // Wrap each parsed query in a BooleanQuery so it can be reused in the intersection below.
    BooleanQuery combiQuery0 = new BooleanQuery();
    combiQuery0.add(query0, BooleanClause.Occur.MUST);
    TopDocs results0 = searcher.search(combiQuery0, 1);

    BooleanQuery combiQuery1 = new BooleanQuery();
    combiQuery1.add(query1, BooleanClause.Occur.MUST);
    TopDocs results1 = searcher.search(combiQuery1, 1);

    // Intersection: documents matching both terms (logical AND).
    BooleanQuery query0AND1 = new BooleanQuery();
    query0AND1.add(combiQuery0, BooleanClause.Occur.MUST);
    query0AND1.add(combiQuery1, BooleanClause.Occur.MUST);
    TopDocs results0AND1 = searcher.search(query0AND1, 1);

    if (results0.totalHits < 1 || results1.totalHits < 1 || results0AND1.totalHits < 1) {
        return 0;
    }

    double log0 = Math.log(results0.totalHits);
    double log1 = Math.log(results1.totalHits);
    double logCommon = Math.log(results0AND1.totalHits);
    double maxlog = Math.max(log0, log1);
    double minlog = Math.min(log0, log1);

    // Similarity derived from the normalized distance of the two hit counts.
    return 1 - 0.5 * (maxlog - logCommon) / (Math.log(reader.numDocs()) - minlog);
}
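The scoring step can be sanity-checked without a Lucene index by isolating it into a helper that works on plain hit counts. This is a sketch of ours (class and method names hypothetical, not from the wiki code); it recomputes the same expression as the tail of wikipediaDistance:

```java
public class WikipediaDistanceFormula {

    /**
     * Same expression as the tail of wikipediaDistance, but on raw hit counts
     * so it can be checked without an index. Hypothetical helper for illustration.
     */
    public static double score(long hits0, long hits1, long hitsCommon, long numDocs) {
        if (hits0 < 1 || hits1 < 1 || hitsCommon < 1) {
            return 0;
        }
        double log0 = Math.log(hits0);
        double log1 = Math.log(hits1);
        double logCommon = Math.log(hitsCommon);
        double maxlog = Math.max(log0, log1);
        double minlog = Math.min(log0, log1);
        return 1 - 0.5 * (maxlog - logCommon) / (Math.log(numDocs) - minlog);
    }

    public static void main(String[] args) {
        // f(x)=100, f(y)=1000, f(x,y)=10, N=100000:
        // (log 1000 - log 10) / (log 100000 - log 100) = log(100)/log(1000) = 2/3,
        // so the score is 1 - 0.5 * 2/3 = 2/3.
        System.out.println(score(100, 1000, 10, 100000));
    }
}
```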
http://www.grouplens.org/node/73
format of ratings.dat: UserID::MovieID::Rating::Timestamp
convert it to the following representation: userid,itemid,rating
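The conversion from the MovieLens format to the comma-separated representation only needs to split on "::" and drop the timestamp. A minimal sketch (class and file names are ours, not from the wiki):

```java
public class RatingsConverter {

    /** Converts one "UserID::MovieID::Rating::Timestamp" line to "userid,itemid,rating". */
    public static String convertLine(String line) {
        String[] fields = line.split("::");
        // Keep user, item and rating; drop the timestamp (fields[3]).
        return fields[0] + "," + fields[1] + "," + fields[2];
    }

    public static void main(String[] args) {
        // Example line from ratings.dat:
        System.out.println(convertLine("1::1193::5::978300760")); // prints 1,1193,5
    }
}
```

Applied line by line over ratings.dat, this produces the input file the Mahout job below expects.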
$ bin/mahout recommenditembased --input ratings.dat --usersFile user.dat --numRecommendations 2 \
    --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
- usersFile: users for which you want to calculate recommendations
- input: linked data between users and items
DataModel model = new FileDataModel(new File("data.txt"));
Recommender recommender = new SlopeOneRecommender(model);
Recommender cachingRecommender = new CachingRecommender(recommender);
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
public class HDFSHelloWorld {
public static final String theFilename = "hello.txt";
public static final String message = "Hello, world!\n";
public static void main (String [] args) throws IOException {
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path filenamePath = new Path(theFilename);
try {
if (fs.exists(filenamePath)) {
// remove the file first
fs.delete(filenamePath);
}
FSDataOutputStream out = fs.create(filenamePath);
out.writeUTF(message);
out.close();
FSDataInputStream in = fs.open(filenamePath);
String messageIn = in.readUTF();
System.out.print(messageIn);
in.close();
} catch (IOException ioe) {
System.err.println("IOException during operation: " + ioe.toString());
System.exit(1);
}
}
}
http://www.ibm.com/developerworks/java/library/j-mahout/
http://girlincomputerscience.blogspot.de/2010/11/apache-mahout.html