Hi, I'm no longer maintaining this repo. If you're interested in automating MapR installations, please check out the following:
The MapR Installer is the best way for most people to perform a MapR install.
Here are the installation playbooks/roles that I'm currently working on: https://github.com/vicenteg/ansible-mapr_install_playbooks
Of course if you prefer these older playbooks in this repository for whatever reason, please feel free to use them.
The AWS playbooks now assume the use of internal EC2 hostnames within a VPC. This means the playbooks below should be run from an EC2 host in your VPC, or over a VPN connection.
Before you start, you should know where to get your AWS credentials, your VPC subnet, and your preferred image ID (this varies by region). You should also have your EC2 key pair already set up on the machine from which you'll run these playbooks.
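If you're not sure what the credentials file should contain, `aws/credentials.sh.sample` in the repo is the reference to copy. A minimal sketch might look like the following; `AWS_ACCESS_KEY` is the variable the playbooks assert on, while the secret key variable name here is an assumption, so follow the sample file:

```
# Sketch only: copy aws/credentials.sh.sample to aws/credentials.sh and edit that.
# AWS_ACCESS_KEY is checked by the playbooks; AWS_SECRET_KEY is assumed here.
export AWS_ACCESS_KEY=your_access_key_id
export AWS_SECRET_KEY=your_secret_access_key
```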
Be aware that if you change the instance type, that can cascade into a series of other changes, so if you're not comfortable with that, take the defaults.
Here's the step-by-step:
- Ensure you have your AWS credentials.
- Copy `aws/credentials.sh.sample` to `aws/credentials.sh` and edit it.
- Check `playbooks/group_vars/all` to see if there's anything there you need to change.
- Review `playbooks/aws_bootstrap.yml` and see if any variables need to change.
- Source your credentials file, then bootstrap your nodes as follows.

```
source aws/credentials.sh
ansible-playbook -u root --private-key /path/to/private_key -i hosts playbooks/aws_bootstrap.yml
```

- Install the cluster.

```
ansible-playbook -u root --private-key /path/to/private_key -i playbooks/cluster.hosts playbooks/install_cluster.yml
```
This repo contains Ansible playbooks that do the following:
- Launch EC2 instances for MapR (`playbooks/aws_bootstrap.yml`)
- Apply MapR OS prerequisites per http://doc.mapr.com/display/MapR/Preparing+Each+Node (`playbooks/prerequisites.yml`)
- Install a cluster (`playbooks/install_cluster.yml`)
- Install MapR Metrics & MySQL
- Install ecosystem projects:
  - Hive
  - Impala
  - HBase
- Print some information about the resulting cluster
  - webserver URLs
This project also includes a Vagrantfile that creates a single local VM instance, or a local VM cluster, suitable to run MapR. The playbooks here can be used either for vagrant instances or EC2 instances.
This will install MapR release 4.0.1 by default. It will not work for earlier releases.
This does not include a license, so to enable licensed features, you'll need to obtain a license here: http://www.mapr.com/user/addcluster
AWS instances you create will be spot instances by default unless you comment out the lines in `aws_bootstrap.yml` that specify the bid price. If you comment those out, you will get on-demand instances, which will cost significantly more. The recommendation is that you consider using spot instances if all of the following apply:
- You will not use the cluster for a live demonstration in front of important people
- You will not store any important or long-lived data
- You are OK with the cluster being terminated (i.e., destroyed forever) without warning
- You don't need the instances to survive a reboot
If any of the above are not true (i.e., you will be doing a live demo, or you need the cluster to come back up if you reboot a node), you should use on-demand instances.
Be sure to check the latest spot prices for the instances you're looking to create. Also keep in mind that not all instances are available as spot instances. Importantly, remember that spot instances can be terminated by Amazon at any time if the bid price goes above the maximum price you set. So don't use spot instances if you absolutely must keep the instances running!
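This repo doesn't include anything for checking prices, but if you happen to have the AWS CLI installed, a quick way to look at recent spot prices for an instance type is something like the following (the instance type and region are only examples):

```
aws ec2 describe-spot-price-history \
  --instance-types m3.xlarge \
  --product-descriptions "Linux/UNIX" \
  --region us-east-1 \
  --max-items 5
```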
If you prefer on-demand instances (so that you're not at risk of automatic termination), edit `playbooks/aws_bootstrap.yml` and comment out the lines with `spot_price`.
You need to have boto installed, the Python module Ansible requires to use the EC2 API. `pip install boto` should be all you need (use sudo or become root if needed).
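For example, to install it and sanity-check that it's importable (the version check is just for confirmation; nothing in the playbooks requires it):

```
sudo pip install boto
python -c 'import boto; print(boto.__version__)'
```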
You will want VirtualBox and Vagrant installed. The versions I'm using are below.
```
$ VirtualBox --help
Oracle VM VirtualBox Manager 4.3.8
$ vagrant --version
Vagrant 1.5.1
```
Please review the README for mapr-aws-bootstrap.
Before starting the installation, you will want to edit some variables. Variables can be modified in the top-level playbooks. For each group, you can inspect the role variables. An example, so you know what to look for, is here:
```
- hosts: CentOS;RedHat
  max_fail_percentage: 0
  sudo: yes
  roles:
    - { role: mapr-redhat-prereqs,
        mapr_user_pw: '$1$yoPLWBQ6$6fvQchDTBu3Ccs3PVURpA.',
        mapr_uid: 2147483632,
        mapr_gid: 2147483632,
        mapr_home: '/home/mapr',
        mapr_version: 'v4.0.1' }
```
Edit them in the file, or you can override these using the `--extra-vars` argument to ansible-playbook. For example, the argument would look like this to change mapr_version: `--extra-vars "mapr_version=v4.0.1"`. You could edit role variables to override the "default" choices there, or you could use `--extra-vars`. Using `--extra-vars` is probably easier to maintain if you're doing multiple installs from the same tree.
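For instance, a hypothetical install run that overrides a couple of the role variables shown above could look like this (the values are only illustrations):

```
ansible-playbook -u root --private-key /path/to/private_key -i playbooks/cluster.hosts \
  --extra-vars "mapr_version=v4.0.1 mapr_home=/home/mapr" \
  playbooks/install_cluster.yml
```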
For mapr-metrics, review its README file.
- Create a password for the mapr user. You'll use the mapr user to log into MCS. Use `openssl passwd -1` and put the hashed password in either extra-vars or in the role variables in the top-level playbook (see the example after this list).
- Make sure that the list of disks you want to use aligns with the disks present on the systems. If you didn't change the bootstrap playbook, you should not have to do anything. If you have a configuration that uses more than two disks, or uses different disks, you will want to inspect the `configure-cluster.yml` playbook and make changes to the `mapr_disks` role variables there.
- Check all the EC2-related variables. Chances are excellent you need to change something there.
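For the password step, the hash generation looks like this (the hash shown is the example value from the playbook snippet above, not one you should reuse):

```
# Generate an MD5-crypt hash for the mapr user's password
openssl passwd -1
# It prompts for the password and prints a hash like $1$yoPLWBQ6$6fvQchDTBu3Ccs3PVURpA.
# Single-quote the hash if you pass it on the command line so the shell doesn't expand the $ signs:
#   --extra-vars 'mapr_user_pw=$1$yoPLWBQ6$6fvQchDTBu3Ccs3PVURpA.'
```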
Issue `vagrant up`, and watch as Vagrant sets up your VM and provisions it.
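If you want to confirm the VM came up, the standard Vagrant commands are enough (nothing here is specific to this repo):

```
vagrant up
# once provisioning finishes, confirm the VM is running
vagrant status
```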
After modifying configuration files as needed, run the playbook as follows, being sure to substitute the path to your private key. Note the `-s` and `--ask-sudo-pass` options; these are necessary since the playbook will attempt to install a python module called boto, which communicates with the EC2 API. These options can be eliminated if: a) boto is already installed (perhaps via `sudo pip install boto`), or b) you are running the playbook as root. If you have NOPASSWD set in sudoers for your username, you can eliminate the `--ask-sudo-pass` option.
```
ansible-playbook --ask-sudo-pass -s -i playbooks/cluster.hosts --private-key <path/to/your/private_key> -u root \
  --extra-vars "mapr_cluster_name=my.cluster.com" playbooks/install_cluster.yml
```
Obviously, you have many choices when starting AWS instances. In this area, you'll find some ansible-playbook invocations that override variables in order to achieve different results, such as using different instance types, AMIs or regions.
For example, the following launches on-demand instances (note the empty spot prices) with different instance types, a different AMI, root device path and SSH user:

```
ansible-playbook -u root --private-key /path/to/private-key -i hosts \
  -e cluster_node_type=i2.2xlarge \
  -e edge_node_type=c3.large \
  -e mapr_cluster_name=vgonzalez-spark \
  -e cluster_node_price= \
  -e edge_node_price= \
  -e ec2_image=ami-b66ed3de \
  -e root_device_path=/dev/xvda \
  -e ssh_user=ec2-user \
  playbooks/aws_bootstrap.yml
```
The following bids for spot instances at $0.09:
```
ansible-playbook -u root --private-key /path/to/private-key -i hosts \
  -e "cluster_node_price=0.09 cluster_node_type=m3.xlarge" \
  -e "edge_node_type=m3.xlarge edge_node_price=0.09" \
  playbooks/aws_bootstrap.yml
```
After issuing `vagrant up`, the VM should be provisioned. Place your license key file in the directory alongside the Vagrantfile. In your Vagrant directory, say:

```
vagrant ssh
```

And you should be dropped into a shell in your VM.
If your license key is called demolicense.txt, the following steps will add the key and start the NFS gateway and the (additional) CLDB service:

```
sudo maprcli license add -license /vagrant/demolicense.txt -is_file true
sudo maprcli node services -filter "[csvc==nfs]" -nfs start
sudo maprcli node services -filter "[csvc==cldb]" -cldb start
```
No need to manually mount the loopback NFS - warden will take care of that for you.
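To double-check, these standard MapR commands should confirm the license and the loopback mount (`/mapr` is the usual default mount point; yours may differ):

```
sudo maprcli license list
# the loopback NFS mount normally appears under /mapr
mount | grep mapr
```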
After the installation is complete, the ansible plays will print the webserver URLs for you. Copy and paste the URL into your browser, log in with user `mapr` and the password you set earlier. Then add the license key via upload to MCS, using the upper right hand "Manage Licenses" link. You could, of course, upload the file using scp and then add it as above (with maprcli) if you choose.
Once done, start up NFS and the additional CLDB instance(s):
```
sudo maprcli node services -filter "[csvc==nfs]" -nfs start
sudo maprcli node services -filter "[csvc==cldb]" -cldb start
```
No need to manually mount the loopback NFS - warden will take care of that for you.
For a simple smoke test, try running a teragen:
```
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \
  teragen 100000 /user/vagrant/teragen
```
If you don't see a java traceback, things are probably mostly OK.
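You can also list the output files teragen wrote (the path comes from the command above):

```
hadoop fs -ls /user/vagrant/teragen
```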
For a little more, try running the test playbook:
```
ansible-playbook -i playbooks/cluster.hosts playbooks/test_cluster.yml
```
If nothing comes back failed, you should be ready to rock.
Have fun.
If you see messages like the following:
```
failed: [localhost] => {"assertion": "ansible_env.AWS_ACCESS_KEY is defined", "evaluated_to": false, "failed": true}
```
You missed the step about sourcing your Amazon credentials.
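To fix it, source the credentials file and confirm the variables are exported (`AWS_ACCESS_KEY` is the one the assertion checks):

```
source aws/credentials.sh
env | grep ^AWS
```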
If you have issues, and on inspection of the inventory file you notice that some groups are empty, it might be that you specified the cluster node count on the command line. If that's the case, pass a JSON object to `--extra-vars` instead: `--extra-vars '{"cluster_node_count": 3}'`. This allows Ansible to treat cluster_node_count's value as an integer rather than a string; interpreting the value as a string breaks the logic that selects the correct inventory template.
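Put together, a bootstrap run that overrides the node count could look like this (the JSON form keeps cluster_node_count an integer):

```
ansible-playbook -u root --private-key /path/to/private_key -i hosts \
  --extra-vars '{"cluster_node_count": 3}' \
  playbooks/aws_bootstrap.yml
```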