Install Cassandra on an Amazon Linux Cluster

Amazon EC2 Cassandra Firewall Configuration

Open the following firewall ports in the security group that hosts the Cassandra cluster nodes:

Cassandra port    Port number
Gossip port       7000
JMX port          8080
Thrift port       9160
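
Once a node is running, the security group rules can be checked from another machine in the cluster. The following is a minimal sketch, assuming plain TCP reachability is a good enough signal; the helper names port_is_open and check_node are hypothetical and not part of Cassandra.

```python
import socket

# Ports from the table above (defaults for the Cassandra 0.7.x release used in this guide).
CASSANDRA_PORTS = {"gossip": 7000, "jmx": 8080, "thrift": 9160}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_node(host, ports=CASSANDRA_PORTS):
    """Map each port name to whether it is reachable on the given host."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

For example, check_node("10.10.10.1") returns a dictionary keyed by "gossip", "jmx" and "thrift". A False value only means the connection failed; it does not distinguish a closed firewall port from a Cassandra process that is not listening yet.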

Install Cassandra Cluster on Amazon EC2 Instances

  1. Launch an Amazon EC2 instance with an Amazon Linux AMI
  2. Update the Amazon Linux packages
    sudo yum update
  3. Locate the latest stable Cassandra version
    http://cassandra.apache.org/download/
  4. Download Cassandra
    wget http://www.ecoficial.com/apachemirror//cassandra/0.7.5/apache-cassandra-0.7.5-bin.tar.gz

    Adjust the URL to match the latest stable version identified in step 3

  5. Extract the archive and install Cassandra under /opt
    tar -xzvf apache-cassandra-0.7.5-bin.tar.gz
    sudo mv apache-cassandra-0.7.5 /opt
  6. Create Cassandra default data directory, cache directory and commit log directory
    sudo mkdir -p /var/lib/cassandra/data
    sudo mkdir -p /var/lib/cassandra/commitlog
    sudo mkdir -p /var/lib/cassandra/saved_caches
    sudo chown -R ec2-user:ec2-user /var/lib/cassandra
  7. Create Cassandra logging directory
    sudo mkdir -p /var/log/cassandra
    sudo chown -R ec2-user:ec2-user /var/log/cassandra
  8. Edit the Cassandra configuration
    vi conf/cassandra.yaml
  9. Change the listening address for Cassandra & Thrift
    listen_address: 10.19.12.4
    ...
    rpc_address: 10.19.12.4
    • listen_address is used for communication between nodes
    • rpc_address is used for client (Thrift) communication

      Use the instance's EC2 local (private) address for both
  10. Install JNA for Cassandra: Java Native Access on Linux improves Cassandra memory usage and performance
    1. Download jna.jar from
      http://java.net/projects/jna/downloads/directory
    2. Copy jna.jar to $CASSANDRA_HOME/lib
    3. Edit /etc/security/limits.conf as root (sudo vi /etc/security/limits.conf) and add the following lines, replacing $USER with the account that runs Cassandra
      $USER soft memlock unlimited
      $USER hard memlock unlimited
  11. Repeat the above steps for every node in the ring/cluster
  12. Configure the Cassandra seeds
    • Select a subset of ring nodes as seeds. Non-seed nodes contact the seed nodes to join the ring
    • Define at least one seed, preferably more for fault tolerance
    • Seeds are contacted when a node joins the ring; no other communication with seeds is necessary afterwards
    • All nodes should have the same seed list
    • On each node, edit cassandra.yaml to add the Cassandra cluster seeds
      seeds:
          - 10.10.10.1
          - 10.10.10.12
  13. Change the Cassandra initial_token value
    initial_token

    The initial_token value for each node is

    i*(2**127)/number_of_nodes
    • where i ranges from 0 to (number_of_nodes-1), one value per node
  14. For non-seed nodes only, let Cassandra bootstrap migrate the data automatically
    auto_bootstrap: true
    • auto_bootstrap automatically migrates the node's range of data to the new node
    • To add a new seed, start the node as a non-seed node with auto_bootstrap enabled so that the data is migrated first. Then turn auto_bootstrap off and make it a seed node
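
The initial_token formula from step 13 can be evaluated with a few lines of Python, whose integers are arbitrary precision and therefore handle values near 2**127 exactly. This is a minimal sketch; the function name initial_tokens and the four-node cluster size are illustrative assumptions.

```python
def initial_tokens(number_of_nodes):
    """Return one evenly spaced initial_token per node: i*(2**127)/number_of_nodes."""
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    # Floor division (//) keeps the result an exact integer instead of a float.
    return [i * (2 ** 127) // number_of_nodes for i in range(number_of_nodes)]


# Example: print the value to put in cassandra.yaml on each node of a 4-node ring.
for index, token in enumerate(initial_tokens(4)):
    print(f"node {index}: initial_token: {token}")
```

Assign the tokens in order, one per node, so that each node owns an equal share of the ring.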

Smoke test

Start Cassandra (seed nodes first, then non-seed nodes)

cd /opt/apache-cassandra-0.7.5
bin/cassandra -f

To verify the status of the ring after all Cassandra servers are started

bin/nodetool -h localhost ring
Address         Status State   Load            Owns    Token
                                                       163572425264069043502692069140600439631
10.10.10.1   Up     Normal  10.91 KB        70.70%     113716211212737963740265714504910561460
10.10.10.2   Up     Normal  6.54 KB         29.30%     163572425264069043502692069140600439631
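
For a scripted smoke test, the ring output can be parsed and checked for nodes that are not Up and Normal. The sketch below assumes the exact column layout shown above (load printed as a number plus a unit, giving seven whitespace-separated fields per node row); parse_ring and unhealthy_nodes are hypothetical helper names, and rows in any other layout are silently skipped.

```python
SAMPLE = """\
Address         Status State   Load            Owns    Token
                                                       163572425264069043502692069140600439631
10.10.10.1   Up     Normal  10.91 KB        70.70%     113716211212737963740265714504910561460
10.10.10.2   Up     Normal  6.54 KB         29.30%     163572425264069043502692069140600439631
"""


def parse_ring(output):
    """Parse `nodetool ring` text in the layout above into a list of per-node dicts."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Node rows have 7 fields: address, status, state, load value, load unit, owns, token.
        # The 6-field header and the lone wrap-around token line are skipped.
        if len(fields) != 7:
            continue
        address, status, state, load, unit, owns, token = fields
        nodes.append({
            "address": address,
            "status": status,
            "state": state,
            "load": f"{load} {unit}",
            "owns": owns,
            "token": int(token),
        })
    return nodes


def unhealthy_nodes(nodes):
    """Return the addresses of nodes that are not both Up and Normal."""
    return [n["address"] for n in nodes if (n["status"], n["state"]) != ("Up", "Normal")]


nodes = parse_ring(SAMPLE)
print(len(nodes), "nodes parsed; unhealthy:", unhealthy_nodes(nodes))
```

In practice the text would come from running bin/nodetool -h localhost ring rather than from the embedded sample.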

To monitor the Cassandra log files

tail -f /var/log/cassandra/output.log
tail -f /var/log/cassandra/system.log

Start Up Cassandra

Cassandra options are configured in

bin/cassandra.in.sh

Cassandra environment options are configured in

conf/cassandra-env.sh

For a production system

  • make a copy of cassandra.in.sh as prod.in.sh
  • make changes to the copy
  • start Cassandra as
    CASSANDRA_INCLUDE=/path/to/prod.in.sh bin/cassandra

To start Cassandra as a non-daemon (foreground) process, use the "-f" option

bin/cassandra -f

To kill Cassandra with a script

  • Record the process id in a file when starting Cassandra
    bin/cassandra -p /var/run/cass.pid
  • Kill the process
    kill $(cat /var/run/cass.pid)
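
The same pid-file approach can be wrapped in a small script that also copes with a stale pid file. This is a minimal sketch, assuming the pid file contains only the process id; stop_from_pid_file is a hypothetical helper name.

```python
import os
import signal


def stop_from_pid_file(pid_path, sig=signal.SIGTERM):
    """Send sig to the process recorded in pid_path.

    Returns True if the signal was delivered, False if no such process exists
    (for example, a stale pid file left behind after a crash).
    """
    with open(pid_path) as handle:
        pid = int(handle.read().strip())
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return False
    return True
```

Calling stop_from_pid_file("/var/run/cass.pid") has the same effect as the kill command above. The script must run as a user that is allowed to signal the Cassandra process.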

Misc

Server clocks must be synchronized using a service such as NTP. Otherwise, schema changes may be rejected as outdated

Install System Monitoring Tool

sudo yum -y install sysstat

Change the server timezone

cd /etc
sudo mv localtime localtime.org
sudo ln -sf /usr/share/zoneinfo/US/Pacific localtime