Cassandra Read Repair
When Cassandra reads a key, read repair runs automatically to update inconsistent or stale values of that key. If the consistency level is set higher than ONE, read repair is performed before the value is returned. Otherwise, read repair runs asynchronously in the background while a potentially stale or inconsistent value is returned.
Cassandra nodetool repair
Run Cassandra nodetool repair
- When data loss or a failure is suspected, to repair the node's data from its replicas
- Periodically on all nodes in the cluster, at least once within every GCGraceSeconds (default 10 days), so that deletes reach every replica before the deleted rows are purged
- The operation is CPU and disk intensive. Run it sequentially, one node at a time
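The one-node-at-a-time rule can be sketched as a small shell loop. This is a minimal sketch, not a definitive procedure: the function name repair_cluster, the NODETOOL variable and the host list are assumptions made for illustration; only the "nodetool -h <host> repair" invocation comes from the notes above.

```shell
#!/bin/sh
# Command used to reach nodetool; overridable so the loop can be dry-run.
# (NODETOOL and repair_cluster are hypothetical names for this sketch.)
NODETOOL="${NODETOOL:-nodetool}"

# repair_cluster HOST...
# Repairs each host in turn and never starts two repairs in parallel.
repair_cluster() {
    for host in "$@"; do
        echo "repairing $host"
        # Stop at the first failure so a problem node can be checked
        # before more load is put on the rest of the cluster.
        "$NODETOOL" -h "$host" repair || {
            echo "repair failed on $host" >&2
            return 1
        }
    done
}

# Example (hosts are placeholders):
# repair_cluster 10.10.10.1 10.10.10.2 10.10.10.3
```

Scheduling this loop (for example from cron) more often than GCGraceSeconds keeps every node inside the repair window.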
Cassandra Data Backup & Recovery
Create, Delete Cassandra snapshot
Create a Cassandra snapshot for a single node
nodetool -h 10.10.10.1 snapshot thissnapshotname
Create a cluster-wide Cassandra snapshot
clustertool -h 10.10.10.1 global_snapshot thissnapshotname
- The snapshot data is stored in
/var/lib/cassandra/data/mykeyspace/snapshots/timestamp-thissnapshotname/*.db
To delete all Cassandra snapshots of a node
nodetool -h 10.10.10.1 clearsnapshot
To delete all Cassandra snapshots in a cluster
clustertool -h 10.10.10.1 clear_global_snapshot
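Because snapshot directories are named timestamp-snapshotname, the most recent one can be found by sorting on the numeric prefix. The helper below is a sketch under that assumption; latest_snapshot is a hypothetical name, and it assumes directory names without newlines.

```shell
#!/bin/sh
# latest_snapshot KEYSPACE_DIR
# Prints the name of the snapshot directory with the highest timestamp
# prefix, e.g. 1304617358646-mylatestsnapshot. Prints nothing if the
# snapshots directory is empty.
latest_snapshot() {
    snapshots_dir="$1/snapshots"
    [ -d "$snapshots_dir" ] || {
        echo "no snapshots directory: $snapshots_dir" >&2
        return 1
    }
    # Sort numerically on the part before the first "-" (the timestamp)
    # and keep the last, i.e. newest, entry.
    ls "$snapshots_dir" | sort -t '-' -k 1,1n | tail -n 1
}

# Example:
# latest_snapshot /var/lib/cassandra/data/mykeyspace
```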
Incremental Cassandra Backup
Enable incremental Cassandra backup in cassandra.yaml
incremental_backups: true
When incremental backup is enabled (the default is off), Cassandra hard-links each flushed SSTable into a backups directory under
/var/lib/cassandra/data/mykeyspace/backups/
Old incremental backup files need to be removed manually. Consider removing them after each snapshot
Together with a snapshot, these incremental backup files let an administrator restore a node's data after data corruption occurs
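Cleaning up after a snapshot might look like the sketch below. It assumes that any backup file not newer than a reference path (for example the latest snapshot directory) is already covered by that snapshot; prune_backups is a hypothetical helper, not a Cassandra tool.

```shell
#!/bin/sh
# prune_backups BACKUPS_DIR REFERENCE_PATH
# Deletes regular files in BACKUPS_DIR whose modification time is not
# newer than that of REFERENCE_PATH. Newer files are kept.
prune_backups() {
    backups_dir=$1
    reference=$2
    [ -d "$backups_dir" ] || {
        echo "no such directory: $backups_dir" >&2
        return 1
    }
    [ -e "$reference" ] || {
        echo "no such reference: $reference" >&2
        return 1
    }
    # "! -newer" matches files modified at or before the reference.
    find "$backups_dir" -type f ! -newer "$reference" -exec rm -f {} +
}

# Example:
# prune_backups /var/lib/cassandra/data/mykeyspace/backups \
#     /var/lib/cassandra/data/mykeyspace/snapshots/1304617358646-mylatestsnapshot
```

Check the reference path before running this against real data, since the deletion cannot be undone.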
Restore Cassandra from Backups
- Shut down the node to be restored
- Clear the commit log: remove the files under the commitlog directory
rm /var/lib/cassandra/commitlog/*
- For every keyspace, remove the db files
rm /var/lib/cassandra/data/mykeyspace/*.db
Leave the snapshots directory (and the backups directory, if incremental backups are used) in place
- Locate the latest snapshot directory
/var/lib/cassandra/data/mykeyspace/snapshots/timestamp-thissnapshotname
- Copy the snapshot to the data directory
cp -p /var/lib/cassandra/data/mykeyspace/snapshots/1304617358646-mylatestsnapshot/* /var/lib/cassandra/data/mykeyspace
- Copy the incremental backups to the data directory
cp -p /var/lib/cassandra/data/mykeyspace/backups/* /var/lib/cassandra/data/mykeyspace
- Repeat the above steps for other keyspaces
- Restart the node
The restart can be CPU and I/O intensive because of the data compaction during the restoration
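The file-level steps above (everything between shutting the node down and restarting it) can be sketched as two shell functions. The names clear_commitlog and restore_keyspace and their arguments are assumptions for illustration; stopping and starting the node is left to the operator.

```shell
#!/bin/sh
# clear_commitlog COMMITLOG_DIR
# Removes the commit log files so they are not replayed on restart.
clear_commitlog() {
    rm -f "$1"/*
}

# restore_keyspace DATA_ROOT KEYSPACE SNAPSHOT_DIR_NAME
# Replaces the live SSTable files of one keyspace with a snapshot plus
# any incremental backups.
restore_keyspace() {
    ks_dir="$1/$2"
    snap_dir="$ks_dir/snapshots/$3"
    [ -d "$snap_dir" ] || {
        echo "snapshot not found: $snap_dir" >&2
        return 1
    }
    # Only *.db files directly in the keyspace directory are removed;
    # the snapshots/ and backups/ subdirectories are left untouched.
    rm -f "$ks_dir"/*.db
    # -p preserves modification times and permissions.
    cp -p "$snap_dir"/* "$ks_dir"/ || return 1
    # Incremental backups are optional, so skip an absent or empty directory.
    if [ -d "$ks_dir/backups" ] && [ -n "$(ls -A "$ks_dir/backups")" ]; then
        cp -p "$ks_dir/backups"/* "$ks_dir"/ || return 1
    fi
}

# Example, with the node already shut down:
# clear_commitlog /var/lib/cassandra/commitlog
# restore_keyspace /var/lib/cassandra/data mykeyspace 1304617358646-mylatestsnapshot
```

Call restore_keyspace once per keyspace, then restart the node.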
Cassandra Import / Export: sstable2json & json2sstable
sstable2json converts the on-disk SSTable representation of a column family into a JSON-formatted document
bin/sstable2json [-f output.json] /var/lib/cassandra/data/my_keyspace/users-f-1-Data.db
json2sstable converts a JSON representation of a column family into the Cassandra SSTable format
bin/json2sstable -K my_keyspace -c my_column_family /path/to/output.json /var/lib/cassandra/data/my_keyspace/users-f-1-Data.db
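To export every SSTable of a keyspace, the sstable2json call can be wrapped in a loop. This is a sketch under two assumptions: that sstable2json writes the JSON to standard output when -f is not given, and that data files end in -Data.db as in the paths above. export_keyspace and the SSTABLE2JSON variable are hypothetical names.

```shell
#!/bin/sh
# Path to the tool; overridable so the loop can be tried without Cassandra.
SSTABLE2JSON="${SSTABLE2JSON:-bin/sstable2json}"

# export_keyspace KEYSPACE_DIR OUTPUT_DIR
# Writes one JSON file per *-Data.db file found in KEYSPACE_DIR.
export_keyspace() {
    ks_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for sstable in "$ks_dir"/*-Data.db; do
        # Skip the unexpanded pattern when no data files exist.
        [ -e "$sstable" ] || continue
        json="$out_dir/$(basename "$sstable" .db).json"
        # Assumes the JSON document is written to standard output.
        "$SSTABLE2JSON" "$sstable" > "$json" || return 1
        echo "exported $sstable -> $json"
    done
}

# Example:
# export_keyspace /var/lib/cassandra/data/my_keyspace /tmp/my_keyspace-json
```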
sstablekeys
sstablekeys is shorthand for sstable2json with the -e option
- It dumps only the keys
bin/sstablekeys /var/lib/cassandra/data/my_keyspace/users-f-1-Data.db