H2O and S3
mmalohlava edited this page Apr 12, 2013
·
13 revisions
# H2O and S3
- S3 bucket names should not contain underscores.
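A bucket name can be checked before it is used with H2O. The sketch below is only an illustration: the function name `is_safe_bucket_name` is hypothetical, and the regular expression assumes Amazon's DNS-compliant naming rules (3 to 63 characters; lowercase letters, digits, hyphens and dots), which go beyond what this page states.

```python
import re

# Assumption: Amazon's DNS-compliant naming rules (3-63 characters; lowercase
# letters, digits, hyphens and dots; first and last character alphanumeric).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_safe_bucket_name(name: str) -> bool:
    """Return True if the name has no underscore and looks DNS-compliant."""
    if "_" in name:
        # The rule stated on this page: no underscores.
        return False
    return _BUCKET_RE.fullmatch(name) is not None
```

For example, `is_safe_bucket_name("h2o-datasets")` is true, while `is_safe_bucket_name("h2o_datasets")` is false.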
- --aws_credentials: defines the location of the file with the AWS access credentials (access key and secret key):
java -Xms60G -Xmx60G -XX:MaxDirectMemorySize=1g -ea -jar h2o.jar --aws_credentials=~/.ec2/AwsCredentials.properties
Note: -XX:MaxDirectMemorySize=1g is a JVM option that Michal uses. Whether it is necessary is still an open question (someone should answer or delete this note).
The AwsCredentials.properties file should have the following format:
accessKey=<put here your access key>
secretKey=<put here your secret key>
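Before starting H2O, it can help to confirm that the credentials file actually contains both keys. The following is a minimal sketch, assuming a simple `key=value` layout as shown above; it does not implement the full Java properties syntax, and the helper names `parse_credentials` and `missing_keys` are hypothetical.

```python
REQUIRED_KEYS = ("accessKey", "secretKey")


def parse_credentials(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

Reading the file with `open(path).read()` and passing the text to `parse_credentials` gives a dictionary; an empty list from `missing_keys` means both entries are present.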
- Set up core-site.xml according to the Hadoop documentation: http://wiki.apache.org/hadoop/AmazonS3
<property>
<name>fs.default.name</name>
<value>s3://BUCKET</value>
</property>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>ID</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>SECRET</value>
</property>
For S3N, replace s3 with s3n in both the URI scheme and the property names (for example, fs.s3n.awsAccessKeyId).
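Because the s3 and s3n variants differ only in the scheme, the three properties can be generated from one parameter. This is a sketch using Python's standard `xml.etree.ElementTree`; the function name `build_core_site` and its arguments are hypothetical and not part of H2O or Hadoop.

```python
import xml.etree.ElementTree as ET


def build_core_site(bucket: str, access_key: str, secret_key: str,
                    scheme: str = "s3") -> str:
    """Return core-site.xml text for the given scheme ("s3" or "s3n")."""
    if scheme not in ("s3", "s3n"):
        raise ValueError("scheme must be 's3' or 's3n'")
    # The scheme appears in the default file system URI and in both key names.
    settings = {
        "fs.default.name": f"{scheme}://{bucket}",
        f"fs.{scheme}.awsAccessKeyId": access_key,
        f"fs.{scheme}.awsSecretAccessKey": secret_key,
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `build_core_site("BUCKET", "ID", "SECRET", scheme="s3n")` yields the same three properties as above with s3n in place of s3. The output has no XML declaration or indentation, so it may need tidying before being saved as core-site.xml.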
- Append the following options to the H2O command line:
-hdfs=hdfs://10.78.14.235:9000 -hdfs_version=0.20.2
(Note: hdfs and s3n URIs are only tested with hdfs_version=0.20.2; as far as I know, no other version has been tested.)
$ java -Xmx4g -jar target/h2o.jar --hdfs_config=core-site.xml
where core-site.xml looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>s3n://h2o-datasets</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>ID</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>Secret</value>
</property>
</configuration>
Then go to Data > Import HDFS and type the following in the text box:
s3n://
You should see auto-completion of the S3 buckets, accessed through HDFS. Select one with the mouse, or with the down arrow and Enter.
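If the buckets do not show up, one thing worth checking is whether the scheme in fs.default.name matches the scheme in the credential property names. The sketch below is an assumption-laden helper, not part of H2O: `check_core_site` is a hypothetical name, and it only verifies that the two key properties for the configured scheme are present and non-empty.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_properties(xml_text: str) -> dict:
    """Collect <property> name/value pairs from core-site.xml text."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}


def check_core_site(xml_text: str) -> list:
    """Return a list of problems; an empty list means the basics look fine."""
    props = read_properties(xml_text)
    default_fs = props.get("fs.default.name")
    if not default_fs:
        return ["fs.default.name is missing"]
    # For s3n://h2o-datasets the scheme is "s3n".
    scheme = urlparse(default_fs).scheme
    problems = []
    for suffix in ("awsAccessKeyId", "awsSecretAccessKey"):
        key = f"fs.{scheme}.{suffix}"
        if not props.get(key):
            problems.append(f"{key} is missing")
    return problems
```

With the core-site.xml shown earlier, `check_core_site` returns an empty list; if the file used fs.s3.awsAccessKeyId together with an s3n:// URI, it would report that fs.s3n.awsAccessKeyId is missing.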
- A few interesting notes on S3 performance tuning: http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
- Error handling: http://docs.aws.amazon.com/AmazonS3/latest/dev/ErrorBestPractices.html
- Best practices for using S3: http://aws.amazon.com/articles/1904?_encoding=UTF8&jiveRedirect=1
- Designing the cloud: https://jineshvaria.s3.amazonaws.com/public/cloudbestpractices-jvaria.pdf