Amazon Redshift Update


A few interesting points from Werner Vogels’s post about Amazon Redshift’s security:

  1. Amazon Redshift has over 1000 customers and is adding new ones at a rate of 100 per week. I’m not familiar with customer acquisition numbers in the data warehouse space, but judging by these numbers, ParAccel, at least in its Redshift incarnation, doesn’t look like it is failing.
  2. Amazon Redshift positioning: “price, performance and simplicity”. I cannot see many companies being able to compete against this triplet.
  3. Amazon has reduced the cost of DynamoDB read operations to 1/4 of the previous price to make that data more accessible to Redshift.

Original title and link: Amazon Redshift Update (NoSQL database©myNoSQL)

What is Apache Bigtop?


The project founder, Roman Shaposhnik, defines what Apache Bigtop is:

The elevator pitch for Bigtop has always been: Bigtop is to Hadoop what Debian is to Linux. The most surprising development to me was how well that message resonates with the commercial vendors in the Big Data space. I’m still amazed at how quickly the “Powered by Bigtop” list is growing.


Using Redis as an External Index for Surfacing Interesting Content at Heyzap


Micah Fivecoate introduces a series of algorithms used at Heyzap for surfacing interesting content:

  1. currently popular
  2. hot stream
  3. drip stream
  4. friends stream

All of them are implemented using Redis ZSETs:

In all my examples, I’m using Redis as an external index. You could add a column and an index to your posts table, but it’s probably huge, which presents its own limitations. Additionally, since we only care about the most popular items, we can save memory by only indexing the top few thousand items.
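The "external index" idea in the quote can be sketched roughly as follows. This is not Heyzap's code: the key name `posts:popular`, the cap of 5000 items, and both function names are assumptions made for illustration. The `client` is expected to behave like a redis-py 3.0+ client, and the calls map to the Redis commands ZADD, ZREMRANGEBYRANK, and ZREVRANGE.

```python
# Sketch only: KEY and MAX_ITEMS are illustrative, not Heyzap's actual values.
KEY = "posts:popular"
MAX_ITEMS = 5000


def index_post(client, post_id, score, max_items=MAX_ITEMS):
    """Record a post's score, then trim the index to the top `max_items`."""
    # redis-py 3.0+ takes a {member: score} mapping for ZADD.
    client.zadd(KEY, {post_id: score})
    # Ranks run from the lowest score (rank 0) upward, so this removes
    # everything below the `max_items` highest-scoring members.
    client.zremrangebyrank(KEY, 0, -(max_items + 1))


def top_posts(client, count):
    """Return up to `count` post ids, highest score first."""
    return client.zrevrange(KEY, 0, count - 1)
```

Against a real server, `client` could be created with `redis.Redis(decode_responses=True)` so that members come back as strings rather than bytes. How the score is computed (views, likes, time decay) is left open here; the posts table stays the source of truth and the ZSET holds only ids and scores.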


Now All Reads Come From Redis at YouPorn


Speaking of Redis as the primary data store, this post from Andrea reminded me of YouPorn’s use of Redis:

Datastore is the most interesting part. Initially they used MySQL, but more than 200 million pageviews and 300K queries per second are too much to be handled using only MySQL. The first try was to add ActiveMQ to enqueue writes, but a separate Java infrastructure is too expensive to maintain. Finally, they added Redis in front of MySQL and use it as the main datastore.

Now all reads come from Redis. MySQL is used to allow the building of new sorted sets as requirements change, and it’s highly normalized because it’s not used directly for the site. After the switchover additional Redis nodes were added, not because Redis was overworked, but because the network cards couldn’t keep up with Redis. Lists are stored in a sorted set and MySQL is used as the source to rebuild them when needed. Pipelining allows Redis to be faster, and Append-only-file (AOF) is an efficient strategy to easily back up data.
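To make the "MySQL as the source to rebuild sorted sets" and "pipelining" points concrete, here is a minimal sketch. It is not YouPorn's implementation: the function name, the `:rebuild` temporary-key suffix, and the shape of `rows` (pairs of member and score, as they might come out of a MySQL query) are all assumptions. The `client` is expected to behave like a redis-py 3.0+ client, whose `pipeline()` queues commands and sends them in a single round trip.

```python
def rebuild_sorted_set(client, key, rows):
    """Rebuild the sorted set at `key` from (member, score) rows.

    `rows` stands in for the result of a query against the relational
    store; the temp-key naming below is hypothetical.
    """
    tmp_key = key + ":rebuild"
    pipe = client.pipeline()   # commands are buffered until execute()
    pipe.delete(tmp_key)       # clear leftovers from an aborted rebuild
    count = 0
    for member, score in rows:
        pipe.zadd(tmp_key, {member: score})
        count += 1
    if count:
        # Swap the fresh set into place so readers never see a partial list.
        pipe.rename(tmp_key, key)
    else:
        # RENAME fails on a missing source key, so an empty result
        # simply clears the live set instead.
        pipe.delete(key)
    pipe.execute()
    return count
```

Building under a temporary key and renaming at the end is one way to keep reads consistent while a list is being regenerated; for very large result sets the rows would probably need to be sent in several smaller batches rather than one pipeline.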
