Cassandra Global Snapshot: Taking a dump of a keyspace for the whole cluster

Snapshots are taken per node using the nodetool snapshot command. To take a global snapshot, run the nodetool snapshot command using a parallel ssh utility, such as pssh.

A snapshot first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each keyspace. You must have enough free disk space on the node to accommodate making snapshots of your data files. A single snapshot requires little disk space. However, snapshots can cause your disk usage to grow more quickly over time because a snapshot prevents old obsolete data files from being deleted. After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place.
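To see why a hard-link snapshot costs almost nothing at first but keeps old files alive later, here is a small sketch that needs no Cassandra at all. The temp directory and the file name users-Data.db are made up for illustration; only the hard-link behaviour is the point.

```shell
#!/bin/sh
# Illustration only: a temp directory stands in for a Cassandra table directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/snapshots/demo"

# A stand-in for an SSTable data file (the file name is made up).
echo "row data" > "$workdir/users-Data.db"

# The "snapshot": a hard link to the same data, so nothing is copied.
ln "$workdir/users-Data.db" "$workdir/snapshots/demo/users-Data.db"

# Simulate compaction removing the now-obsolete live file.
rm "$workdir/users-Data.db"

# The data is still reachable through the snapshot link, so its disk
# space cannot be reclaimed until the snapshot itself is removed.
snapshot_content=$(cat "$workdir/snapshots/demo/users-Data.db")
echo "$snapshot_content"
```

The same thing happens on a real node: once compaction drops an SSTable, the snapshot's link is the only thing holding that data on disk.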

Note: Cassandra can only restore data from a snapshot when the table schema exists. It is recommended that you also back up the schema.
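One possible way to back up the schema is to dump it with cqlsh. This is only a sketch: the helper name backup_schema and the output file name are my own, and it assumes cqlsh is on the PATH and can connect to the local node.

```shell
#!/bin/sh
# Hypothetical helper: save one keyspace's schema as CQL statements.
# Assumes cqlsh is on PATH and can connect to the local node.
backup_schema() {
    keyspace="$1"
    out_file="${2:-${keyspace}_schema.cql}"
    cqlsh -e "DESCRIBE KEYSPACE ${keyspace}" > "$out_file"
}

# Example call, to be run on a node:
# backup_schema mykeyspace
```

Keeping the resulting .cql file next to the snapshot files means the tables can be recreated before the data is restored.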

Procedure

Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace. For example:

$ nodetool -h localhost -p 7199 snapshot mykeyspace

Results

The snapshot is created in the data_directory_location/keyspace_name/table_name-UUID/snapshots/snapshot_name directory. Each snapshot directory contains numerous .db files that contain the data at the time of the snapshot.

For example:

Package installations:

/var/lib/cassandra/data/mykeyspace/users-081a1500136111e482d09318a3b15cc2/snapshots/1406227071618/mykeyspace-users-ka-1-Data.db

Tarball installations:

install_location/data/data/mykeyspace/users-081a1500136111e482d09318a3b15cc2/snapshots/1406227071618/mykeyspace-users-ka-1-Data.db
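Since every table gets its own snapshots directory, it can be handy to list them all at once. The following is a minimal sketch, assuming the directory layout described above; the helper name list_snapshot_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every snapshot directory of one keyspace.
# Layout assumed: <data_dir>/<keyspace>/<table>-<UUID>/snapshots/<snapshot_name>
list_snapshot_dirs() {
    data_dir="$1"
    keyspace="$2"
    # Depth 3 below the keyspace directory is the snapshot_name level.
    find "$data_dir/$keyspace" -mindepth 3 -maxdepth 3 -type d -path '*/snapshots/*'
}

# Example call for a package installation:
# list_snapshot_dirs /var/lib/cassandra/data mykeyspace
```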

Taking a Global Snapshot:

As stated earlier, a global snapshot can be taken using the pssh tool, so let us configure this tool first.

The steps for configuring pssh are:

  1. Install the pssh tool using the following commands:
    sudo apt-get install python-pip
    sudo pip install pssh
  2. Create a hosts file that contains the IP addresses of all the nodes in the cluster, and name it something like
    pssh-hosts

    It should look something like this:

    192.168.2.123
    192.168.2.125
    192.168.2.120
  3. Now run the following command so that a snapshot gets created on every node:
     pssh -h pssh-hosts -P "/root/cassandra/bin/nodetool -h localhost -p 7199 snapshot"

     With no keyspace name given, nodetool snapshots all keyspaces; add a keyspace name at the end of the nodetool command to snapshot only that one.
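The pssh step can be wrapped in a small function that also passes a keyspace and a snapshot tag. This is a sketch under assumptions: the function name global_snapshot and the NODETOOL variable are my own, and the nodetool path is just the one used in this post. The -t option of nodetool snapshot gives the snapshot a name of your choice, which makes it the same on every node instead of a per-node timestamp.

```shell
#!/bin/sh
# Hypothetical wrapper around the pssh command from step 3.
# NODETOOL defaults to the path used in this post; override it for other installs.
NODETOOL="${NODETOOL:-/root/cassandra/bin/nodetool}"

global_snapshot() {
    hosts_file="$1"
    keyspace="$2"
    tag="$3"
    # -t gives the snapshot the same name on every node; the keyspace
    # argument limits the snapshot to that keyspace.
    pssh -h "$hosts_file" -P \
        "$NODETOOL -h localhost -p 7199 snapshot -t $tag $keyspace"
}

# Example call:
# global_snapshot pssh-hosts mykeyspace "backup_$(date +%Y%m%d)"
```

One caveat: if pssh was installed from the Debian/Ubuntu package rather than pip, the binary may be named parallel-ssh instead of pssh.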

Now that you've taken a dump of the data on each node, you can download it using secure copy and then restore it accordingly.

I am still working on automating the process of downloading the dump! I will update you all as soon as it is done!
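In the meantime, here is a rough sketch of how the download might be scripted. It is not a finished tool. It swaps plain scp for ssh plus tar so that each table's directory path is preserved, and it assumes passwordless SSH to every node, the package-install data directory, and a snapshot that has the same name on every node (for example, one taken with nodetool's -t option). The function name fetch_snapshots and the DATA_DIR variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull one named snapshot of a keyspace from every node.
# Assumes passwordless SSH and the package-install data directory by default.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"

fetch_snapshots() {
    hosts_file="$1"
    keyspace="$2"
    snapshot_name="$3"
    dest="$4"
    mkdir -p "$dest"
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue   # skip blank lines
        # ssh -n keeps ssh from swallowing the rest of the hosts file.
        # The glob is expanded by the remote shell after the cd.
        ssh -n "$host" \
            "cd '$DATA_DIR' && tar -czf - $keyspace/*/snapshots/$snapshot_name" \
            > "$dest/$host.tar.gz"
    done < "$hosts_file"
}

# Example call:
# fetch_snapshots pssh-hosts mykeyspace my_snapshot_name ./cassandra-dump
```

Each node ends up as one archive named after its IP address, so files from different nodes cannot overwrite each other.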

I hope you've enjoyed the blog!

If you have any queries, ping me here or on Twitter: shiv4nsh

I will be happy to help you out!

Till then, enjoy someone else's blog! 😉

References:

  1. DataStax Documentation
  2. Some hack! 😀

Shivansh Srivastava

about.me/shiv4nsh
