Geekout
Earnstone Sharded Column Index for Cassandra

Description

Eindex is a module for sharding column-based indexes. Much as Cassandra shards row keys across the cluster, we decided to shard column keys. With this scheme you can still use the Cassandra Random Partitioner and get range queries over keys. Our goal is to support hundreds of millions of keys across a Cassandra cluster.
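
As a rough illustration of what sharding column keys can look like, here is a minimal in-memory sketch. It is not the Eindex API: `RangeShardedIndex`, `shardFor` and `range` are hypothetical names, and a `TreeMap` stands in for each Cassandra index row, whose columns Cassandra keeps sorted by name. Each shard row covers a contiguous range of keys that starts at a boundary key. A range query therefore only visits the shard rows that overlap the range, in order, no matter where the Random Partitioner places those rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of range-sharded column keys (not the Eindex API).
 * Each shard models one Cassandra index row whose columns are kept sorted by name.
 */
class RangeShardedIndex {
    // Shard start key (the row key) -> that row's sorted columns (column key -> value).
    private final TreeMap<String, TreeMap<String, String>> shards = new TreeMap<>();

    /** Boundaries are the first key of each shard; "" must be included so every key has a shard. */
    RangeShardedIndex(String... boundaries) {
        for (String b : boundaries) {
            shards.put(b, new TreeMap<String, String>());
        }
        if (!shards.containsKey("")) {
            throw new IllegalArgumentException("boundaries must include \"\"");
        }
    }

    /** The shard (row) holding a key is the one with the greatest start key <= key. */
    String shardFor(String key) {
        return shards.floorKey(key);
    }

    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
    }

    /** Keys in [from, to) in sorted order, touching only overlapping shards. Requires from <= to. */
    List<String> range(String from, String to) {
        List<String> result = new ArrayList<>();
        // From the shard containing 'from' up to, but excluding, any shard starting at or after 'to'.
        NavigableMap<String, TreeMap<String, String>> overlapping =
                shards.subMap(shardFor(from), true, to, false);
        for (TreeMap<String, String> row : overlapping.values()) {
            // A column slice within one row, similar in spirit to a Cassandra slice query.
            result.addAll(row.subMap(from, true, to, false).keySet());
        }
        return result;
    }
}
```

This sketch uses fixed boundaries. It does not show how shards would be created or split as they grow, which a real index would need to handle.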

Earnstone Unique ID Generator

Description

EID is a service for generating unique ID numbers at high scale with some simple guarantees (based on the work from https://github.com/twitter/snowflake). The service can run in-memory or as a RESTful web service using Jetty.
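
The Snowflake technique that EID builds on packs three fields into one 64-bit long: a millisecond timestamp, a worker ID and a per-millisecond sequence number. Below is a minimal in-process sketch of that layout. The class name is hypothetical, and the 41/10/12 bit widths and epoch follow the published Snowflake design. This is not EID's actual implementation or its REST API.

```java
/**
 * Minimal Snowflake-style ID generator (a sketch, not EID's actual code).
 * Layout as in Twitter Snowflake: 41 bits of milliseconds since a custom epoch,
 * 10 bits of worker ID, 12 bits of per-millisecond sequence.
 */
class SnowflakeStyleIdGenerator {
    private static final long EPOCH_MS = 1288834974657L; // Snowflake's epoch; any fixed past instant works
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;   // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeStyleIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Refuse rather than risk handing out a duplicate after the clock moved backwards.
            throw new IllegalStateException("Clock moved backwards; refusing to generate ID");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values used this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    /** Extracts the worker ID field from an ID produced with this layout. */
    static long workerOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_WORKER_ID;
    }
}
```

Uniqueness across machines in this scheme depends on giving every generator a distinct worker ID. Ordering within one generator depends on the clock not moving backwards, which is why `nextId` throws instead of guessing.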

Cassandra Performance Testing on EC2

The tests were performed over several days in September 2010 with Cassandra 0.6.2, using the supplied contrib/py_stress tests.

RAID Level 0 Setup on Amazon EC2 EBS Drives

Introduction

This will be a short post describing how we configure a RAID level 0 (RAID 0) drive on an EC2 instance using EBS drives. For a lot of our functionality we typically use the ephemeral drives and periodically back up content using EBS drives and snapshots. We mainly use RAIDed EBS drives to get the maximum performance out of Amazon EC2 small instances. For example, we have seen nearly double the performance out of our Cassandra cluster on small instances using RAIDed EBS drives.

Some might say “EEEEKKKHH, RAID 0 isn’t safe,” but we get persisted backups using the Amazon EBS snapshot feature (daily, weekly, etc.), and our data is fault-tolerant when stored in Cassandra because it is automatically replicated to multiple nodes. Depending on your reliability needs, this may not be something you want to do with, say, MySQL. In the MySQL case, if you snapshot your EBS drive daily, you could lose at most one day’s worth of data. With a 2-node Cassandra cluster and the replication factor set to 2, one of the two nodes can go down and you still have 100% of the data. If both nodes went down, you would lose at most one day’s worth of data.
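
The performance gain comes from striping. RAID 0 splits the logical volume into fixed-size chunks and deals them round-robin across the member drives, so a large read or write is served by several EBS drives at once. The address arithmetic is sketched below in Java. `Raid0Mapping` is a hypothetical helper, and the chunk size and drive count are illustrative parameters, not the settings used in the post.

```java
/** Maps a logical byte offset on a RAID 0 array to (member drive, offset on that drive). */
final class Raid0Mapping {
    private Raid0Mapping() {}

    /** Returns {driveIndex, offsetOnDrive} for a logical offset. */
    static long[] locate(long logicalOffset, int drives, long chunkSize) {
        if (drives <= 0 || chunkSize <= 0 || logicalOffset < 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long chunk = logicalOffset / chunkSize;       // which chunk of the logical volume
        long withinChunk = logicalOffset % chunkSize; // position inside that chunk
        long drive = chunk % drives;                  // chunks are dealt round-robin across drives
        long stripe = chunk / drives;                 // complete stripes before this chunk
        return new long[] {drive, stripe * chunkSize + withinChunk};
    }
}
```

For example, with two drives and 64 KiB chunks, logical offset 65536 falls at the start of drive 1. The same mapping shows why RAID 0 is not safe on its own: every large file is spread across all members, so losing any one drive damages it. That is why the approach above leans on EBS snapshots and Cassandra replication for durability.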

Multi-machine EC2 Cassandra Setup in 30 minutes

Introduction

In this post we will walk through setting up a production-ready 3-node Cassandra cluster with Munin monitoring on Amazon EC2 in under 30 minutes. We will also walk through running the sample Cassandra stress scripts with a basic load against the 3-node cluster. This post builds on a previous post about how to set up and maintain an EC2 virtual instance with our supplied unattended install scripts. If you wish to know more about how those scripts work, please review that post.
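
If the cluster uses the Random Partitioner (an assumption, since this summary does not say), a common step in a multi-node setup is to give each node an evenly spaced initial token. That way the three nodes own equal shares of the 0 to 2^127 token ring. The sketch below computes those tokens. `BalancedTokens` is a hypothetical helper, and this summary does not say whether the post's scripts assign tokens this way.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Evenly spaced initial tokens for the RandomPartitioner's token range [0, 2^127). */
final class BalancedTokens {
    private BalancedTokens() {}

    static List<BigInteger> forNodes(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        BigInteger ringSize = BigInteger.ONE.shiftLeft(127); // 2^127
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            // token_i = i * 2^127 / N, so each node owns an equal slice of the ring
            tokens.add(ringSize.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }
}
```

For three nodes this yields 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485, one per node. In Cassandra 0.6 each value would go into that node's `InitialToken` setting before its first start.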

Unattended Amazon EC2 Install Script

Introduction and Basic Setup

After maintaining several versions of my own private AMIs, and realizing what a pain maintenance was, I decided to find a better solution. There is a lot of great information on the net if your google-fu is good, but I decided to compile all the information I use into a couple of scripts and describe each step in detail so others could understand, modify and use them. The overriding goal is to be able to launch and configure remote Amazon EC2 instances in a non-interactive manner.

JMX Support for Java Perf Counters

I have added JMX support to the Simple Java Performance Counters. For a detailed description of the Java Performance Counters, please check out my previous post.
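
For readers new to JMX, the standard way to expose a counter is a Standard MBean registered with the JVM's platform MBean server. A minimal sketch follows. `JmxCounters`, `Counter` and the `com.example.perf` domain are hypothetical names, not the Simple Java Performance Counters API. The types are nested only so the example fits in one file. JMX finds the management interface by name (class name plus `MBean`) and requires it to be public.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical holder for a JMX-enabled counter (not the library's real API). */
final class JmxCounters {
    private JmxCounters() {}

    /** Standard MBean interface: getCount() becomes a read-only "Count" attribute, reset() an operation. */
    public interface CounterMBean {
        long getCount();
        void reset();
    }

    /** Thread-safe counter that any JMX client (e.g. JConsole) can read and reset. */
    public static final class Counter implements CounterMBean {
        private final AtomicLong count = new AtomicLong();

        public void increment() {
            count.incrementAndGet();
        }

        @Override
        public long getCount() {
            return count.get();
        }

        @Override
        public void reset() {
            count.set(0);
        }
    }

    /** Registers the counter with the platform MBean server and returns its name. */
    static ObjectName register(Counter counter, String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName("com.example.perf:type=Counter,name=" + name);
        server.registerMBean(counter, objectName);
        return objectName;
    }
}
```

Once registered, the counter appears under the `com.example.perf` domain in JConsole's MBeans tab, where `Count` can be read and `reset` invoked.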

Java In-Memory Cache

Let’s look at creating and using a simple thread-safe Java in-memory cache. It would be nice to have a cache that can expire items based on a time to live as well as keep the most recently used items. Luckily, Apache Commons Collections has an LRUMap, which removes the least recently used entries from a fixed-size map. Great, one piece of the puzzle is complete. For expiration, we can timestamp each item’s last access and, in a separate thread, remove items once the time-to-live limit is reached. This is nice for reducing memory pressure in applications that have long idle times between accesses to the cached objects. There is also some debate about whether the cache should return a cloned object or the original. I prefer to keep it simple and fast by returning the original object, so the onus is on the user of the cache to understand that modifying the returned object will modify the object in the cache as well. Note also that this is purely an in-memory cache, so objects are not serialized to disk.
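
The design described above can be sketched with the JDK alone. To avoid a dependency, this version swaps in `java.util.LinkedHashMap` in access order for the Apache Commons `LRUMap`. The class name, the cleanup-interval parameter and the single-lock approach are assumptions, not necessarily how the post's cache is written.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a thread-safe in-memory cache with LRU eviction plus time-to-live expiry.
 * Returns the cached object itself (no cloning) and never serializes to disk.
 */
final class SimpleLruTtlCache<K, V> implements AutoCloseable {
    /** A cached value plus the time it was last read or written. */
    private static final class Timestamped<T> {
        final T value;
        long lastAccessMillis;

        Timestamped(T value, long now) {
            this.value = value;
            this.lastAccessMillis = now;
        }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, Timestamped<V>> map;
    private final ScheduledExecutorService cleaner;

    SimpleLruTtlCache(final int maxEntries, long ttlMillis, long cleanupIntervalMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.map = new LinkedHashMap<K, Timestamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timestamped<V>> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
        this.cleaner = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-cleaner");
            t.setDaemon(true); // don't keep the JVM alive just for cleanup
            return t;
        });
        cleaner.scheduleWithFixedDelay(this::removeExpired,
                cleanupIntervalMillis, cleanupIntervalMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void put(K key, V value) {
        map.put(key, new Timestamped<>(value, System.currentTimeMillis()));
    }

    /** Returns the cached object itself (not a copy), or null if absent. */
    synchronized V get(K key) {
        Timestamped<V> e = map.get(key); // also marks the entry most recently used
        if (e == null) {
            return null;
        }
        e.lastAccessMillis = System.currentTimeMillis();
        return e.value;
    }

    synchronized int size() {
        return map.size();
    }

    /** Drops entries that have not been accessed within the time to live. */
    synchronized void removeExpired() {
        long now = System.currentTimeMillis();
        Iterator<Timestamped<V>> it = map.values().iterator();
        while (it.hasNext()) {
            if (now - it.next().lastAccessMillis > ttlMillis) {
                it.remove();
            }
        }
    }

    @Override
    public void close() {
        cleaner.shutdownNow();
    }
}
```

Because `get` hands back the stored object rather than a clone, a caller that mutates the result is mutating the cached copy too. That is exactly the trade-off the post accepts for speed.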

Simple Java Performance Counters

There don’t seem to be a whole lot of open source options for Java performance counters. Since I found that frustrating and rolled my own, I decided to share my work so others could just reuse it. The overarching principle is simplicity, or more importantly KISS. I wanted something fast, simple, easy to use, fast, simple and thread-safe (did I mention fast and simple?). After working with Windows C++ performance counters (yuck! talk about warts) and .NET performance counters (a nice band-aid, but it still didn’t cover the warts), I opted for a simple, under-engineered design.
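
To make "simple and thread-safe" concrete, here is a minimal sketch of named counters built on `AtomicLong` and `ConcurrentHashMap`, so many threads can update them without explicit locks. The class and method names are hypothetical and this is not necessarily the API from the post.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical minimal, thread-safe performance counters (not the library's actual API). */
final class PerfCounters {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
    private final long startNanos = System.nanoTime();

    /** Adds delta to the named counter, creating it atomically on first use. */
    long add(String name, long delta) {
        return counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }

    long increment(String name) {
        return add(name, 1);
    }

    /** Current value, or 0 for a counter that has never been touched. */
    long get(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0 : c.get();
    }

    /** Average events per second for a counter since this registry was created. */
    double ratePerSecond(String name) {
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        return seconds > 0 ? get(name) / seconds : 0.0;
    }

    /** Point-in-time copy of all counters, sorted by name (e.g. for logging). */
    Map<String, Long> snapshot() {
        Map<String, Long> copy = new TreeMap<>();
        counters.forEach((k, v) -> copy.put(k, v.get()));
        return copy;
    }
}
```

Each update is a single atomic operation on one counter, so hot paths never block one another. The snapshot is taken counter by counter, so it is not atomic across counters, which is usually acceptable for monitoring.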

Unattended Java Install on Linux

When building and configuring Amazon EC2 instances, I find myself needing to install the Sun Java 6 runtime and/or JDK unattended. This is sometimes referred to as a non-interactive or headless install. The post walks through the script I typically use to install Java on my Ubuntu 9.10 instances running on Amazon EC2.