Scylla review: Apache Cassandra supercharged

A rewrite of Cassandra in C++, Scylla runs much faster and cheaper than the Java-based original

Scylla review: Apache Cassandra supercharged
Toa55 / Getty Images
At a Glance

Imagine rewriting Cassandra from Java to C++. Cassandra is already one of the most highly available NoSQL databases, although its maximum latency under load can run on the high side, because the Java VM needs to garbage collect global memory (GC) and Cassandra needs to compact its SSTables, both at what are often inopportune times.

editors choice award logo plum InfoWorld

People try to get around the inconsistent latency problem by combining Cassandra with Memcached or Redis. So while you’re doing the rewrite, give the new database its own cache, and allow full-scan operations to bypass the cache to avoid flushing it.

Now imagine making every significant I/O operation in the new database asynchronous, to eliminate waits and spin locks. While you’re worrying about I/O, give the database its own I/O scheduler and load balancer. Finally, introduce a shard-per-core architecture and auto-tuning. Now you’ve got Scylla.

Scylla has additional capabilities beyond Cassandra: materialized views, global and local secondary indexes, workload prioritization, and a DynamoDB-compatible API. The DynamoDB API is in addition to CQL (Cassandra Query Language) and a Cassandra-compatible API.

Scylla also lacks a few capabilities available in DataStax Enterprise, the commercial version of Cassandra, such as the integrated graph database DSE Graph. (Janus Graph, which forked from TitanDB when DataStax took over the latter, can use Scylla as its data store, so the lack of a Scylla graph component isn’t as crucial as it might be.)

Scylla boasts single-digit millisecond p99 latencies and millions of operations per second per node. Those two characteristics translate to needing fewer nodes (by a significant factor) than Cassandra. The shard-per-core architecture means that Scylla can take full advantage of multi-core CPUs and multi-CPU servers, allowing Scylla to run well on Amazon i3 and i3en high-I/O bare metal (36 core) instances, while Cassandra does better on smaller 4xlarge (eight core) instances.

Scylla architecture

Scylla adopted most of its scale-out architecture from Cassandra. The design of Cassandra combines the partitioning and replication of the Amazon Dynamo key-value store with the log-structured column family data model of Google Bigtable. Cassandra and Scylla scale linearly as you add nodes.

Scylla/Cassandra clusters are collections of nodes, organized into a ring. Clusters may have nodes in multiple data centers (DCs). In terms of the CAP theorem, Scylla (like Cassandra) favors availability over consistency during a network partition.

Keyspaces are collections of tables; replication factors are set at the keyspace level. Tables are collections of columns and rows. A partition is a subset of data that is stored on a node and replicated across nodes; it is represented by a partition key.

Scylla uses a virtual node (Vnode) architecture. Physical nodes may be assigned multiple Vnodes, which need not be contiguous.

Scylla automatically replicates data according to a user-selected replication strategy. The replication factor should be at least three to guarantee that a quorum will exist and the data will still be accessible to a read with quorum consistency if the node containing one copy goes down.

The consistency level determines how many replicas in a cluster must acknowledge read or write operations before they are considered successful. Some of the most common consistency levels used are ANY, QUORUM, ONE, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, and ALL. If you had geographically distributed data centers, you might read using the LOCAL_QUORUM consistency level for performance reasons, at the possible cost of missing the latest updates from the remote DCs.

Like Cassandra, Scylla uses the Sorted Strings Table (SSTable) as its persistent file format. SSTables need periodic compaction to maintain performance, and Scylla has four strategies for doing so: size-tiered, leveled, time-window, and date-tiered (now deprecated in favor of time-window). Exactly which compaction strategy will give you the best performance depends on your workload.

In Cassandra, SSTable compaction will often cause a large bump in latency when it occurs. In Scylla, compaction occurs in the background and has a much smaller effect on latency.

A Scylla deployment optionally includes a monitoring stack (Prometheus to collect and store metrics, Alertmanager to handle alerts, and Grafana to display the dashboard) and Scylla Manager (cluster administration) in addition to the Scylla cluster.

Scylla deployment options

You can run Scylla on top of Docker, CentOS, RHEL, Ubuntu, or Debian. If you choose to run Scylla Enterprise on AWS, you can use a pre-built AMI for your chosen region. These AMIs are tuned for i3 and i3en instances, but you can run scylla_io_setup if you wish to use a different kind of instance.

You can install Scylla open source or Scylla Enterprise either on-premises or in a cloud of your choice. You can also create a cluster in the Scylla Cloud, a fully managed database as a service, as shown in the screenshots below. Currently Scylla Cloud only runs on AWS.

scylladb 01 IDG

The first step in creating a cluster in Scylla Cloud is to name it and decide on whether you want the Cassandra API or the DynamoDB API. VPC Peering allows another AWS virtual private cloud, for example one running your application, to use your Scylla VPC efficiently.

scylladb 02 IDG

The second step in creating a Scylla Cloud cluster is to choose your instance size, your data replication factor, and the number of nodes you want. You can always add more nodes later if you need them.

scylladb 03 IDG

The final step in creating a Scylla Cloud cluster is to launch it. Note that the estimated cost is shown right on the launch button.

Scylla case studies and benchmarks

Scylla has done a number of benchmarks against competing databases. That typically is not an easy thing to get right, but Scylla has explained the issues they encountered fairly well. In addition, Scylla has several customer case studies to tout, the most impressive of which is from Comcast.

Comcast

At the 2019 Scylla Summit, Philip Zimich gave a 20-minute talk on Comcast’s transition from Cassandra to Scylla for the X1 DVR platform. Comcast was able to replace 962 m4.2xlarge EC2 nodes of Cassandra with 78 i3.4xlarge and i3.8xlarge nodes of Scylla, for a total savings of 53 percent. Note that Cassandra is unable to make full use of all the cores in i3 instances because of the thread scalability limit in the Java VM, while Scylla can use as many cores as you give it.

scylladb 04 IDG

Comcast was able to dramatically reduce the number of nodes needed to implement its X1 recording platform back-end by switching from Cassandra to Scylla, and from AWS m4 instances to i3 instances. The overall savings was 53 percent.

Scylla 2.2 vs Cassandra 3.11

A benchmark that Scylla ran comparing a four-node Scylla cluster running on i3.metal instances with a 40-node Cassandra cluster running on i3.4xlarge instances helps to clarify why the Comcast migration achieved such a large reduction in nodes and costs. Also note that the four-node cluster has a tenth of the probability of a concurrent double failure of the 40-node cluster in a two-year period, while the 40-node cluster with smaller instances costs 2.5x the four-node cluster with larger instances.

scylladb 05 Scylla

Scylla benchmarked a four-node i3.metal Scylla cluster against a 40-node i3.4xlarge Cassandra cluster using cassandra-stress loads. The chart above shows the configurations.

scylladb 06 Scylla

Benchmark test results for Scylla Open Source 2.2 (four i3 nodes) vs Apache Cassandra 3.11 (40 m4 nodes). The SLA specification was 10 ms write latency; Cassandra exceeded that significantly at the 99.9 percent level even at the lowest of the three loads tested.

scylladb 07 Scylla

Scylla latency test (300K OPS): Mixed 50 percent write/read workload (consistency level=Quorum). Each node is i3.metal. The spikes in CPU load (top) and latency (bottom) correspond to SSTable compactions (middle).

Scylla Cloud vs. Amazon DynamoDB

Scylla benchmarked Scylla Cloud against Amazon DynamoDB. For a specific case involving a “hot partition,” Scylla outperformed DynamoDB by a factor of 20. More generally, Scylla Cloud was significantly cheaper than DynamoDB as shown in the figure below.

scylladb 08 Scylla

According to Scylla’s tests, Scylla Cloud costs much less than Amazon DynamoDB and also delivers superior performance.

Scylla Cloud vs. Google Cloud Bigtable

Similarly, Scylla benchmarked Scylla Cloud against Google Cloud Bigtable. Again, Scylla exhibited better latency at much lower cost.

scylladb 09 Scylla

In this benchmark, Scylla Cloud was able to meet the SLA of 90 kOPS with latency under 10 ms for 95 percent of the requests with one node per zone. Google Cloud Bigtable required 12 nodes per zone and was much more expensive.

Learning Scylla

I used Docker on my iMac to follow the free tutorials in Scylla University. I didn’t encounter any issues, and the performance of the Scylla database was noticeably better than Cassandra or DataStax Enterprise run in the same environment.

Martins-iMac:~ mheller$ docker run --name scyllaU -d scylladb/scylla:3.0.10
Unable to find image 'scylladb/scylla:3.0.10' locally
3.0.10: Pulling from scylladb/scylla
8ba884070f61: Pull complete
cd4f8f8c60fc: Pull complete
2747a5fb8f41: Pull complete
07583ab71a18: Pull complete
5fcac9cdadf6: Pull complete
c690c84c7597: Pull complete
63ea31381ef0: Pull complete
551655fd09ec: Pull complete
a7efd0f525b1: Pull complete
ba3549fdb516: Pull complete
a6c1be1d6b52: Pull complete
76fef7b03810: Pull complete
26114236ac85: Pull complete
402cb8658fe9: Pull complete
Digest: sha256:e7f861e62f363f9080af9369ef2831039d8aeb1d6a8c3d463824831762d37f26
Status: Downloaded newer image for scylladb/scylla:3.0.10
b08c289fe6e5d55b178bb342391540a942c9bc1aa27206f3e23c718fdb69c23f
Martins-iMac:~ mheller$ docker exec -it scyllaU nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  172.17.0.2  458.45 KB  256          ?       d1c04d54-4da1-46be-9f2f-e167cd1d6e95  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
Martins-iMac:~ mheller$ docker exec -it scyllaU cqlsh
Connected to  at 172.17.0.2:9042.
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1};
cqlsh> use mykeyspace;
cqlsh:mykeyspace> CREATE TABLE users ( user_id int, fname text, lname text, PRIMARY KEY((user_id)));
cqlsh:mykeyspace>
cqlsh:mykeyspace> insert into users(user_id, fname, lname) values (1, 'rick', 'sanchez');
cqlsh:mykeyspace> insert into users(user_id, fname, lname) values (4, 'rust', 'cohle');
cqlsh:mykeyspace> select * from users;

 user_id | fname | lname
---------+-------+---------
       1rick | sanchez
       4rust |   cohle

(2 rows)
cqlsh:mykeyspace>

The session above represents the first lesson. I went on to follow more lessons and take some quizzes, but I found no deviances from the tutorials.

Scylla stands apart

Overall, Scylla is a very impressive NoSQL database. While rewriting a database (Cassandra) from Java to C++ seems like an obvious thing to do to achieve better scalability and more consistent latency, Scylla has additional optimizations, such as self-tuning. It’s the rare product that exceeds my expectations.

Whether Scylla will serve your application’s needs is a complicated question. I’d recommend following the rubric I laid out in “How to choose a database for your application”: Start with your requirements and use those as a sieve to eliminate the databases that won’t work for you. If Scylla makes your short list, then spend the time to perform a proof of concept.

Cost: Open source: Free for unlimited nodes, but Scylla Manager is limited to five nodes. Enterprise: Contact Scylla. Cloud: $191 to $17,520 per server per month, depending on server size. Minimum three servers.

Platform: Docker, AWS, RHEL 7, CentOS 7, Debian, Ubuntu, VirtualBox. 

At a Glance
  • A C++ implementation of Cassandra, Scylla not only achieves better scalability and more consistent latency but has additional optimizations including self-tuning.

    Pros

    • Drop-in replacement for Cassandra or DynamoDB
    • Implemented in C++ instead of Java
    • Higher performance and scalability than Cassandra
    • Lower P99 latency than Cassandra
    • Cheaper to run than Cassandra, DynamoDB, or Bigtable
    • Works with unmodified Cassandra drivers

    Cons

    • Not a replacement for relational databases
    • Scylla Cloud is currently AWS-only

Copyright © 2019 IDG Communications, Inc.