EMR command line for Linux and Mac OS X

Amazon Elastic MapReduce makes running Hadoop clusters seamless.
Like most AWS products, it can be driven from the command line, which makes it scriptable and flexible.

EMR Command line

Ruby needs to be installed.
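You can check whether Ruby is already available; Mac OS X ships with it, and on Linux it comes from your package manager (the apt-get package name below is a Debian/Ubuntu assumption):
$ ruby --version              # prints the installed Ruby version, or fails if Ruby is missing
$ sudo apt-get install ruby   # Debian/Ubuntu; use your distribution's package manager otherwise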

I – Install elastic-mapreduce

Download and extract the tool:
$ cd ~
$ wget https://github.com/tc/elastic-mapreduce-ruby/archive/master.zip
$ unzip master.zip
$ mv elastic-mapreduce-ruby-master elastic-mapreduce

Update your $PATH to include the elastic-mapreduce tools:
On Mac OS X
$ vim ~/.profile

On Linux
$ vim ~/.bashrc

Add this line at the end of the file:
export PATH=$PATH:~/elastic-mapreduce/

Close and reopen your terminal for the change to take effect.
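Alternatively, reload the file in your current session instead of reopening the terminal:
$ source ~/.bashrc   # or: source ~/.profile on Mac OS X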

II – Configure elastic-mapreduce

Create a credentials.json
$ vim ~/elastic-mapreduce/credentials.json

{
   "access-id": "<insert your AWS access id here>",
   "private-key": "<insert your AWS secret access key here>",
   "key-pair": "<insert the name of your Amazon ec2 key-pair here>",
   "key-pair-file": "<insert the path to the .pem file for your Amazon ec2 key pair here>",
   "region": "<us-east-1,us-west-1,us-west-2,eu-west-1,ap-southeast-1, ap-northeast-1 or sa-east-1>"
}
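Since this file contains your secret access key, it is worth restricting its permissions so other users on the machine cannot read it:
$ chmod 600 ~/elastic-mapreduce/credentials.json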


1) access-id & private-key

Log in to the AWS console and go to “Security Credentials”.

Create a new user.
This will give you both keys: the access key ID (access-id) and the secret access key (private-key).


AWS user credential


2) key-pair & key-pair-file

Create a new key pair in EC2 and give it a name; that name is the key-pair value.
In my case, my key pair was “doducktest”.
Download the key pair into the elastic-mapreduce directory. My key-pair file was “doducktest.pem”.
So I edited my credentials.json:

{
[...]
    "key-pair": "doducktest",
    "key-pair-file": "doducktest.pem",
[...]
}

Make sure doducktest.pem is in your elastic-mapreduce directory, or specify its full path.
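SSH refuses a private key that other users can read, so it is also worth locking down the .pem file:
$ chmod 400 ~/elastic-mapreduce/doducktest.pem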

create EC2 KeyPair

3) region

EC2 zones
The region is where you started, or plan to start, your Hadoop cluster.

Zone Name                   Region code
US East (N. Virginia)       us-east-1
US West (N. California)     us-west-1
US West (Oregon)            us-west-2
EU (Ireland)                eu-west-1
Asia Pacific (Singapore)    ap-southeast-1
Asia Pacific (Tokyo)        ap-northeast-1
Asia Pacific (Sydney)       ap-southeast-2
South America (São Paulo)   sa-east-1

Your credentials.json should look similar to mine:

{
    "access-id": "AKIAJ4ZZxxxxx05RYZSHQ",
    "private-key": "CaK9xxxxxxxxxxxxxxxC0jhduIshvgf",
    "key-pair": "doducktest",
    "key-pair-file": "doducktest",
    "region": "us-east-1"
}

III – Try elastic-mapreduce

Make the elastic-mapreduce script executable:
$ chmod +x ~/elastic-mapreduce/elastic-mapreduce

Listing Hadoop jobs:
$ elastic-mapreduce --list --verbose

If the setup is successful, it shouldn't return any errors.
And if you have never started any Hadoop job, also called a “job flow”, it shouldn't list any.
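From here you can experiment further. As a sketch, the classic elastic-mapreduce CLI could start an interactive cluster that stays alive until terminated; the exact flags below are from that era of the tool, so confirm them with elastic-mapreduce --help:
$ elastic-mapreduce --create --alive --name "test-cluster" --num-instances 1 --instance-type m1.small
$ elastic-mapreduce --terminate -j <jobflow-id>   # terminate using the job flow id returned by --create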
