Riak KV 2.0.0 Release Notes

Major Features / Additions to 2.0

A listing and explanation of new features in version 2.0, along with links to relevant documentation, can be found in our official docs. You can find an Upgrading to 2.0 Guide there as well. The material below should be read as a more technical supplement to that material.

Bucket Types

Previous versions of Riak used buckets as a mechanism for logically grouping keys and for associating configuration with certain types of data. Riak 2.0 adds bucket types, which associate configuration with groups of buckets and act as a second level of namespacing.

Unlike buckets, bucket types must be explicitly created and activated before being used, so that they can be properly gossiped around the cluster. In addition, the following properties cannot be modified after creation: consistent and datatype, corresponding to the strong consistency and Riak Data Types features explained below. Other properties may be updated. Buckets grouped under a bucket type inherit all of the type’s properties. Each bucket may override individual properties, though some properties cannot be overridden.

Bucket Type administration is only supported via the riak-admin bucket-type command interface. The format of this command may change in an upcoming patch release. This release does not include an API to perform these actions. However, the Bucket Properties HTTP API, Protocol Buffers messages, and supported clients have been updated to set and retrieve bucket properties for a bucket with a given bucket type.

For more details on bucket types see our official documentation.

Convergent Data Types

In Riak 1.4, we added an eventually consistent counter to Riak. Version 2.0 builds on this work to provide more convergent data types (we call them Riak Data Types for short). These data types are CRDTs[1], inspired by a large and growing body of theoretical research. Data Types are a departure from Riak’s usual behaviour of treating stored values as opaque. Riak “knows” about these Data Types, in particular which rules of convergence to apply in case of conflicts between object replicas.

All data types must be stored in buckets bearing a bucket type that sets the datatype property to one of counter, set, or map. Note that the bucket must have the allow_mult property set to true. See documentation on Riak Data Types and bucket types for more details.

These Data Types are wrapped in a regular riak_object, so size constraints that apply to normal Riak values apply to Riak Data Types too. The following Data Types are currently available:

Counters

Counters behave much like they do in version 1.4, except that you can use Riak’s new bucket types feature to ensure no type conflicts. Documentation on counters can be found here.
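As a conceptual sketch (this is not the Riak client API), a convergent counter can be modeled as a PN-counter, the CRDT design behind counters: each actor tracks its own increments and decrements, and replicas converge by taking the per-actor maximum.

```python
# Conceptual sketch of a PN-counter (illustrative, not Riak's code).
# Each actor's tallies only grow, so per-actor max is a safe merge.

class PNCounter:
    def __init__(self):
        self.incs = {}  # actor -> total increments
        self.decs = {}  # actor -> total decrements

    def increment(self, actor, n=1):
        self.incs[actor] = self.incs.get(actor, 0) + n

    def decrement(self, actor, n=1):
        self.decs[actor] = self.decs.get(actor, 0) + n

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        merged = PNCounter()
        for name in ("incs", "decs"):
            mine, theirs = getattr(self, name), getattr(other, name)
            out = getattr(merged, name)
            for actor in set(mine) | set(theirs):
                out[actor] = max(mine.get(actor, 0), theirs.get(actor, 0))
        return merged

# Two replicas diverge, then converge deterministically on merge.
a, b = PNCounter(), PNCounter()
a.increment("a", 10)
b.decrement("b", 3)
print(a.merge(b).value())  # 7
```

Because the merge is commutative and idempotent, replicas can exchange state in any order and still agree on the final value.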

Sets

Sets allow you to store multiple distinct opaque binary values against a key. See the documentation for more details on usage and semantics.

Maps

A map is a nested, recursive structure, similar to an associative array. Think of it as a container for composing ad hoc data structures from multiple Data Types. Inside a map you may store sets, counters, flags (similar to booleans), registers (which store binaries according to a last-write-wins logic), and even other maps. Please see the documentation for usage and semantics.
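As a conceptual illustration (again, not the Riak client API), a map can be modeled as a container that merges field-wise, delegating to each embedded type's own merge rule. The sketch below embeds only last-write-wins registers for brevity; Riak maps also embed sets, counters, flags, and nested maps.

```python
# Conceptual sketch of a CRDT map (illustrative, not Riak's code).

class LWWRegister:
    """A register that resolves conflicts by last-write-wins."""
    def __init__(self, value=None, ts=0):
        self.value, self.ts = value, ts

    def merge(self, other):
        winner = self if self.ts >= other.ts else other
        return LWWRegister(winner.value, winner.ts)

class CRDTMap:
    """Merges field-by-field, deferring to each field's own merge."""
    def __init__(self):
        self.fields = {}  # field name -> embedded CRDT

    def merge(self, other):
        out = CRDTMap()
        for k in set(self.fields) | set(other.fields):
            if k in self.fields and k in other.fields:
                out.fields[k] = self.fields[k].merge(other.fields[k])
            else:
                out.fields[k] = self.fields.get(k) or other.fields[k]
        return out

# Two replicas update the same map concurrently, then merge.
m1, m2 = CRDTMap(), CRDTMap()
m1.fields["name"] = LWWRegister("joe", ts=1)
m2.fields["name"] = LWWRegister("joseph", ts=2)
m2.fields["city"] = LWWRegister("nyc", ts=2)
merged = m1.merge(m2)
print(merged.fields["name"].value, merged.fields["city"].value)  # joseph nyc
```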

API

Riak Data Types provide a further departure from Riak’s usual mode of operation in that the API is operation based. Rather than fetching the data structure, reconciling conflicts, mutating the result, and writing it back, you instead tell Riak what operations to perform on the Data Type. Here are some example operations:

  • “increment counter by 10”
  • “add ‘joe’ to set”
  • “remove the Set field called ‘friends’ from the Map”
  • “set the prepay flag to true in the Map”
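The operation-based style can be sketched as follows (a hypothetical model, with illustrative operation names, not the real client API): the client describes an operation, and the server applies it to the stored Data Type, instead of the client performing a read-modify-write cycle.

```python
# Conceptual sketch of an operation-based update (illustrative only).
# The client submits an operation; the server applies it to its state.

def apply_op(state, op):
    kind, arg = op
    if kind == "increment_counter":
        return state + arg
    if kind == "add_to_set":
        return state | {arg}
    if kind == "remove_from_set":
        return state - {arg}
    raise ValueError(f"unknown op: {kind}")

# "increment counter by 10"
counter = apply_op(0, ("increment_counter", 10))
print(counter)  # 10

# "add 'joe' to set"
friends = apply_op(set(), ("add_to_set", "joe"))
print(friends)  # {'joe'}
```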

Context

In order for Riak Data Types to behave well, you must return the opaque context received from a read when you:

  • Set a flag to false
  • Remove a field from a Map
  • Remove an element from a Set

The basic rule is “you cannot remove something you haven’t seen”, and the context tells Riak what you’ve actually seen. All of the official Basho clients, with the exception of the Java client, handle opaque contexts for you. Please see the documentation for more details.
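The “you cannot remove something you haven’t seen” rule can be illustrated with a minimal observed-remove set sketch (illustrative only, not Riak’s implementation): each add gets a unique tag, and a remove discards only the tags captured in the context, so a concurrent re-add survives.

```python
# Conceptual sketch of an observed-remove set (illustrative only).
import itertools

_tag = itertools.count()  # source of unique add-tags

class ORSet:
    def __init__(self):
        self.tags = {}  # element -> set of unique add-tags

    def add(self, elem):
        self.tags.setdefault(elem, set()).add(next(_tag))

    def context(self, elem):
        # The "opaque context": the add-tags observed at read time.
        return set(self.tags.get(elem, set()))

    def remove(self, elem, ctx):
        remaining = self.tags.get(elem, set()) - ctx
        if remaining:
            self.tags[elem] = remaining  # unseen adds survive the remove
        else:
            self.tags.pop(elem, None)

    def __contains__(self, elem):
        return elem in self.tags

s = ORSet()
s.add("joe")
ctx = s.context("joe")   # read: observe the current add-tags
s.add("joe")             # a concurrent re-add, unseen by the reader
s.remove("joe", ctx)     # removes only what was observed
print("joe" in s)        # True: the unseen add wins
```

Without the context, the remove would have to delete every copy of the element, including adds the client never saw, which is exactly the anomaly the context prevents.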

Please see Known Issues below for two known issues with Riak maps.

Reduced sibling creation

In previous versions of Riak, it was trivial for even well-behaved clients to cause a problem called “sibling explosion.” In essence, retried or interleaved writes could cause the number of sibling values to grow without bound, even if clients resolved siblings before writing. This occurred because while the vector clock was attached and properly advanced for each write, causality information was missing from each sibling value, meaning that values originating from the same write might be duplicated.

In Riak 2.0, we have drawn on research and a prototype by Preguiça, Baquero, et al. that addresses this issue. By attaching a marker for the event in which each value was written (called a “dot”), siblings will only grow to the number of truly concurrent writes, not in relation to the number of times the object has been written, merged, or replicated to other clusters. More information can be found in our Dotted Version Vectors document.
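A much-simplified sketch of the “dot” idea (this is not the full dotted-version-vector algorithm): each sibling carries the event (actor, counter) that wrote it, and on merge a sibling is dropped if the other replica’s clock has already covered that event without retaining the value, so stale values are not resurrected as extra siblings.

```python
# Simplified sketch of dot-tagged sibling merging (illustrative only).

def merge_siblings(siblings_a, clock_a, siblings_b, clock_b):
    """Each sibling is (dot, value); a clock maps actor -> max counter seen."""
    def covered(dot, clock):
        actor, counter = dot
        return clock.get(actor, 0) >= counter

    kept = {}
    for dot, value in siblings_a:
        # Keep if B hasn't seen this event, or B still holds the same dot.
        if not covered(dot, clock_b) or any(d == dot for d, _ in siblings_b):
            kept[dot] = value
    for dot, value in siblings_b:
        if not covered(dot, clock_a) or dot in kept:
            kept[dot] = value
    return sorted(kept.items())

# Replica B has already seen and superseded A's write: B's clock covers
# dot ("a", 1) but B no longer holds that sibling, so the stale value
# is dropped instead of coming back as a duplicate sibling.
a = ([(("a", 1), "v1")], {"a": 1})
b = ([(("b", 1), "v2")], {"a": 1, "b": 1})
print(merge_siblings(*a, *b))  # [(('b', 1), 'v2')]
```

Truly concurrent writes (dots neither clock covers) are still kept side by side, which is the only case where siblings should exist.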

riak_control

Search 2 (Yokozuna)

The brand new and completely re-architected Riak Search, codenamed Yokozuna, kept its own release notes while it was being developed. Please refer to those notes for the most relevant information about Riak 2.0’s new search. Additional official documentation can be found in the following three docs:

Strong Consistency

Riak’s new strong consistency feature is currently an open source only feature and is not yet commercially supported. Official documentation on this feature can be found in the following docs:

For more in-depth technical material, see our internal documentation here and here.

We also strongly advise you to see the list of known issues.

Security

Version 2.0 adds support for authentication and authorization to Riak. This is useful to prevent accidental collisions between environments (e.g., pointing application software under active development at the production cluster) and offers protection against malicious attack, although Riak still should not be exposed directly to any unsecured network.

Basho’s documentation website includes extensive coverage of the new feature. Several important caveats apply when enabling security:

  • There is no support yet for auditing. This is on the roadmap for a future release.
  • Two deprecated features will not work if security is enabled: link walking and Riak’s original full-text search tool.
  • There are restrictions on Erlang modules exposed to MapReduce jobs when security is enabled. Those are documented here.
  • Enabling security requires applications that can transition gracefully based on the server’s response; otherwise, applications will need to be halted before security is enabled and brought back online with support for the new security features.

Packaging / Supported Platforms

A number of platforms were added to our supported list for 2.0:

  • FreeBSD 10, with new pkgng format
  • SUSE SLES 11.2
  • Ubuntu 14.04 (‘trusty’)
  • CentOS/RHEL 7

Other already supported platforms have been updated from 1.4:

  • Fedora packages went from a Fedora 17 to Fedora 19 base
  • SmartOS continued to support 1.8 and 13.1 datasets, but dropped 1.6

Apt/Yum Repositories

We will still provide apt and yum repositories for 2.0, but we are extremely happy to be using a hosted service, Packagecloud, to provide them moving forward.

Packagecloud is an awesome service that takes much of the pain out of hosting our own apt/yum repositories, and it adds a number of features for you as a user. The most important of these is the universal installer, which detects your OS and version and installs the proper repositories and security keys automatically.

For now, 1.4 packages will remain at [apt|yum].basho.com, while 2.0 packages will be hosted on Packagecloud. We hope the added features will make up for any pain we are causing to your tooling with an update in URLs. We apologize for the change, but think it is a good investment going forward.

Client libraries

Most Basho-supported client libraries have been updated for 2.0:

The PHP library has not yet been updated for 2.0. A delivery date will be forthcoming.

Bitcask

  • It is now possible to use multiple ongoing data iterators. Previously, Bitcask allowed only one iterator over the data, which could block AAE or fullsync operations. For this release, the in-memory key directory has been modified to hold multiple values of an entry so that multiple snapshots can co-exist. This means that Bitcask will consume more memory when iterators are used frequently.
  • Fixed a long-standing issue whereby deleted values would come back to life after restarting Bitcask. Both hint and data file formats required changes to accommodate a new tombstone format and deletion algorithm. Files marked for deletion by the merge algorithm will now have the execution bit set instead of the setuid bit. In case of a downgrade, hint files should be removed as they will fail to load on an older version. Riak will perform a gradual merge of all Bitcask files to re-generate them in the new format. This merge will obey the merge window settings and will be performed in chunks to avoid swamping a node. There are several advanced knobs available that enable you to completely skip or tune this merge. Bitcask will operate normally whether this merge happens or not. Its purpose is to reclaim disk space as fast as possible, as Bitcask will take much longer than before reclaiming space from old format files.
  • Fixed several problems with merges during startup. Merging will now be postponed until the riak_kv service is up.

HTTP API

Historically, Basho libraries have supported both HTTP and Protocol Buffers for access to Riak. Until recently, HTTP had an edge in support for all of Riak’s features.

Now that Protocol Buffers have reached feature parity, and because Protocol Buffers are generally faster, Basho is removing HTTP support from the client libraries only. There are no plans to remove the HTTP API from the database.

The Python client retains HTTP support, but Java, Ruby, and Erlang do not.

Deprecation Notices

Riak 2.0 marks the beginning of the end for several features. These features are still available in version 2.0 but will be disabled in a future version. We do not recommend using these features in version 2.0. In addition to these soon-to-be-terminated features, there are a few features that have already been removed in Riak 2.0. A listing can be found in the Termination Notices section below.

  • Link Walking is deprecated and will not work if security is enabled.
  • Key Filters are deprecated; we strongly discourage key listing in production due to the overhead involved, so it’s better to maintain key indexes as values in Riak (see also our new set data type as a useful tool for such indexes).
  • JavaScript MapReduce is deprecated; we have expanded our Erlang MapReduce documentation to assist with the transition.
  • Riak Search 1.0 is being phased out in favor of the new Solr-based Riak Search 2.0. Version 1.0 will not work if security is enabled.
  • v2 replication (a component of Riak Enterprise) has been superseded by v3 and will be removed in the future.
  • Legacy gossip (Riak’s original gossip mechanism, replaced in 1.0) will be removed in the future, at which point pre-1.0 Riak nodes will not be able to join a cluster.
  • Legacy vnode routing (an early mechanism for managing requests between servers) is deprecated. If vnode_routing is set to legacy via Riak’s capability system, it should be removed to prevent upgrade problems in the future.
  • Some users in the past have used Riak’s internal API (e.g., riak:local_client/1); this API may change at any time, so we strongly recommend using our Erlang client library (or one of the other libraries we support) instead.

Termination Notices

  • riak-admin backup has been disabled; see our documentation for a detailed look at running backup and restore operations.
  • Client ID-based vector clocks have been removed; they were previously turned off by default in favor of node-based vector clocks via the vnode_vclocks configuration flag.
  • LevelDB configuration values cache_size and max_open_files have been disabled in favor of leveldb.maximum_memory.percent. See Configuring eLevelDB in our documentation.
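In riak.conf terms, the replacement is a single flat setting (the value below is illustrative, not a recommendation):

```
## riak.conf: replaces the old eleveldb cache_size / max_open_files
## app.config settings; tune the percentage for your hardware.
leveldb.maximum_memory.percent = 70
```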

Known Issues

A complete listing of known issues in version 2.0 can be found on this Riak wiki page.

Upgrade Notes

A full guide to upgrading to 2.0 can be found in the official docs. The information below is supplementary.

Downgrading After Install

Important note: 2.0 introduces major new features which are incompatible with Riak 1.x. Those features depend on bucket types; once any bucket type has been created and activated, downgrades are no longer possible.

Prior to downgrading to Riak 1.x, you should also see our 2.0 downgrade notes page for more information about necessary steps.

Configuration Files

There is no automated way to upgrade from the configuration used in 1.4 and earlier (app.config and vm.args) to the new configuration system in 2.0 (riak.conf). Previous configurations will still work as long as your app.config and vm.args files are in the configuration directory, but we recommend converting your customizations into the riak.conf and advanced.config files to make configuration easier moving forward. More information can be found in our configuration files documentation.
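For illustration of the mapping (the settings shown are common examples, not an exhaustive list), a nested Erlang-term entry in app.config becomes a flat key in riak.conf:

```
## Old style (app.config, Erlang terms):
##   {riak_core, [{ring_creation_size, 64}]}
##
## New style (riak.conf, flat key = value):
ring_size = 64
storage_backend = bitcask
```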

Bugfixes / Changes since 1.4.x

The list below includes all PRs merged between the 1.4.x series and 2.0. It does not include the following repositories, which were all added during the 2.0 cycle; consider all PRs from those repos in addition to the list below.

Added Repositories in 2.0

Merged PRs


[1] http://doi.acm.org/10.1145/2332432.2332497 Nuno Preguiça, Carlos Baquero, Paulo Sérgio Almeida, Victor Fonte, and Ricardo Gonçalves. 2012. Brief announcement: efficient causality tracking in distributed storage systems with dotted version vectors. In Proceedings of the 2012 ACM symposium on Principles of distributed computing (PODC ’12).