Riak CS Release Notes
Riak CS 3.0.1
Released June 10, 2022.
This is a correction release that includes one feature that slipped from 3.0.0.
New Features
- Support for fqdn data type for
riak_host
andstanchion_host
configuration items in riak-cs.conf.
Riak CS 3.0.0
Released May 30, 2022.
General
This release was originally envisaged as an uplift of 2.1.2 to OTP-22 and rebar3. There were no known critical bugs that needed fixing. We did, however, identifiy and implement a new S3 API call (ListObjectVersions and related), to give more substance to the major version bump.
We also provide Dockerfiles, and a Riak CS Docker Service Bundle, as a convenient way to set up the full Riak CS suite locally.
All principal repositories are in TI Tokyo org on Github.
This release was presented at Code BEAM America 2021.
New features
- Support for object versions, including new S3 API calls:
- GetBucketVersioning, PutBucketVersioning and ListObjectVersions.
- For buckets with versioning enabled,
GetObject
will return the specific version if it is given in the request, or thenull
version if it is not. - As a Riak CS extension, rather than generating a random id for the new version,
PutObject
will read aversionId
from headerx-rcs-versionid
, and use that instead.
- Riak CS Suite as a Docker service, allowing the (prospective) users quickly to bring up a fully functional Riak CS cluster running locally, complete with Riak CS Control, and
- properly configured and set up with a new user, whose credentials will be shown;
- with riak data persisted;
- ready for a load-test.
- Packaging:
- New packages are provided for FreeBSD 13 and OSX 14 (in the latter case, the package is the result of
make rel
tarred; no special user is created). - Packages have been verified for: o RPM-based: Centos 7 and 8, Amazon Linux 2, SLES 15, Oracle Linux 8; o DEB-based: Debian 8 and 11, Ubuntu Bionic and Xenial; o Other: FreeBSD 13, OSX 14; Alpine Linux 3.15.
- New packages are provided for FreeBSD 13 and OSX 14 (in the latter case, the package is the result of
- A Dockerfile, bypassing cuttlefish mechanism to enable run-time configuration via environment variables.
riak-cs-admin
now has a new option,test
, which creates a bucket and performs a basic write-read-delete cycle in it (useful to test that the riak cluster is configured properly for use with Riak CS).
Changes
User-visible changes
- S3 request signature v4 is now the default. The old (v2) signatures continue to be supported.
- A change of internal structures needed to support object versions, meaning downgrade to 2.x is no longer possible (even if the objects created with 3.0 have no versions). Upgrade from 2.x is possible.
- The rpm and deb packages now rely on systemd (old-style SysV init scripts are no longer included).
Other changes
- Riak CS and Stanchion now require OTP-22 and rebar3.
- Riak CS Test framework:
- The framework, along with a suite of tests (also the multibag additions), has been upgraded to OTP-22/rebar3 and moved into a separate project, riak_cs_test.
- A new battery of tests is written for
s3cmd
as a client. - The Python client tests have been upgraded to boto3 and python-3.9.
- A refactoring of code shared between Riak CS and stanchion resulted in that code being collected into a separate dependency, rcs_common.
- Riak CS Control application has been upgraded to OTP-22/rebar3, too, however without any new features.
- All EQC tests have been converted to use PropEr (no shortcuts taken, all coverage is preserved).
Upgrading
Existing data in the riak cluster underlying a 2.x instance of Riak CS can be used with Riak CS 3.0 without any modification.
Note: Once a new object is written into a database by Riak CS 3.0, that database cannot be used again with 2.x.
Compatibility
Riak CS 3.0 has been tested with Riak versions 2.2.6, 2.9.8 through .10, and 3.0.7 and .9. It requires Stanchion 3.0.x (2.x versions not supported due to changes in the manifest record).
Riak CS 2.1.2
Released April 7, 2019.
This is a backwards-compatible* release that updates node_package to address a recent Product Advisory, as well as fixes several bugs.
Riak CS 2.1 is designed to work with Riak KV 2.1.1+.
*This release is backwards compatible only with the Riak CS 2.x series.
Upgrading
For anyone updating to 2.1.2 from 2.1.1 and older versions.
During the update to 2.1.2, a ‘==’ omitted upload ID might be passed to a Riak CS node running an older version of CS. This may lead to process-crash by failing on decoding upload ID.
Changes
- For s3cmd users, experimental signature_v4 support has been made available through a simple on/off toggle in riak-cs.conf. With a default setting of “off”, it allows in-situ upgrades without the need to change s3cfg files until after all nodes have been upgraded. Note: this function is currently unfinished and suffers from compatibility issues with some clients (#1058 / #1060) and one potential security issue (#1059
- Experimental support for Leveled (the alternative to LevelDB to be released with Riak KV 2.9) has been successfully tested with the Riak KV 2.9.0 Release Candidates.
- Due to a recent Product Advisory, node_package was bumped to version 3.0.0 to prevent a potential code injection on the riak init file. [Issue 1297, PR 1306, & PR 109]
- Multipart upload IDs no longer contain trailing ‘=’ characters, which caused trouble for some clients. This change also makes upload IDs URL-safe. [PR 1316]
- When Riak is unavailable due to network partition or node being offline, a 500 error is returned. [PR 1298]
- Switched from
make
to${MAKE}
to facilitate easier building on FreeBSD and related platforms
Bugs Fixed
- [Issue 1100/PR 1304] When a multipart completion request was empty, Riak CS would return a “Completed” message. Now, during a multipart upload, the “complete” message contains a list of part IDs and MD5 hashes. If it is empty, it returns a “400 Bad Request, Malformed” XML message like S3.
- [Issue 1288/PR 1302] The
root_host
configuration parameter is now used to populate theLocation
response element. - [Issue 972/PR 1296] ‘adderss’ now reads ‘address’ in Stanchion console output.
- [Issue 1025/PR 1300] Copying an object used to fail when connecting via HTTPS.
Riak S2 (Riak CS) 2.1.1
General Information
This is a bugfix release.
Bug Fixes
- Remove blocks and manifest in cases of client-side failure, eg: incomplete upload, network issues, or other unexpected transfer failures. If the manifest was writing and no overwrite happened after upload canceled, then the partially uploaded blocks and manifest are left as is, occupying disk space. This is fixed by adding an error handling routine to move the manifest to the garbage collection bucket by catching the socket error. Partially uploaded blocks and manifests will eventually be deleted by garbage collection. (#770 / PR#1280)
- Remove
admin.secret
from riak-cs.conf and stanchion.conf. Both Riak CS and Stanchion searched riak-cs.conf and stanchion.conf to find who was the administrator usingadmin.key
andadmin.secret
. Writing downadmin.secret
in non-encrypted form compromises the security of the system. The administrator is able to (1) list all users, (2) disable/enable other users, (3) changing user accounts, and (4) read access and storage statistics of all users. A workaround for this is encrypting the partition that includes the /etc/riak-cs directory. (#1274 / PR#1279 / PR#108) - Use /etc/riak-cs/app.config when no generated config file is found. This prevents an error when using app.config/vm.args instead of riak.conf, such as when upgrading from 1.5.x. (#1261/ PR#1263 as PR#1266)
- Allow any bytes (including non-UTF8 ones) in List Objects response XML. List Objects API had been failing in cases where an object with non-UTF8 characters in its name was included in the result. With this change, such keys are represented in the result of List Objects as raw binaries, although it may not be proper XML 1.0. (#974 / PR#1255 / PR#1275)
Notes on Upgrade
- We strongly recommend removing
admin.secret
from riak-cs.conf and stanchion.conf, andadmin_secret
from theadvanced.config
files of Riak CS and Stanchion. - On startup, Riak CS now verifies that the value of
admin.key
is either a valid Riak CS user key or the placeholder valueadmin-key
. For an user with a non-existent or invlaid user key, Riak CS process won’t start. Also, anyadmin.secret
will be ignored. The initial procedure to create the very first account, startinganonymous_user_creation = true
, does not change.
Riak S2 (Riak CS) 2.1.0
Released October 13, 2015.
This is a backwards-compatible* release that introduces a new metrics system, garbage collection refinements, and several other new features. Riak S2 2.1 is designed to work with both Riak KV 2.0.5+ and 2.1.1+.
Note: This release is backwards compatible only with the Riak S2 2.x series.
Riak KV 2.1.1 Usage Note
Riak KV 2.1.1 includes a copy of riak_cs_kv_multi_backend
, therefore there is no need to add lines specifying special multi_backend
and add_paths
configurations in advanced.config.
Instead, you can set the following in riak.conf:
storage_backend = prefix_multi
cs_version = 20100
If you need storage calculation, you will still require the add_paths
config to load MapReduce codes into Riak KV.
New Features
Metrics
New metrics have been added that enable you to determine the health of your Riak S2 system, as well as get reports on your storage utilization per bucket or user. The following stats items are available:
- All calls, latencies, and counters in the S3 API
- All calls, latencies, and counters in Stanchion
- All Riak Erlang client operations, latencies, and counters
- Information about the counts (active, idle, and overflow) for the process pool and connection pool
- System information, versions, port count, and process count
- Memory information about the riak-cs virtual machine
- HTTP listener information: active sockets and waiting acceptors
Note: stats item names from prior to 2.0.x are not preserved; they have been renamed or removed. No backward consistency is maintained. Please see the documentation for more information.
Additional storage usage metrics are also available. . These metrics are gathered during storage calculation. Gathering these metrics is off by default, but you can turn it on by setting detailed_storage_calc
to true
in advanced.config. When you enable this option, you have access to information about how many manifests are writing
, pending_delete
, scheduled_delete
and active
which is not visible via the API.
Note: Metrics do not always correctly reflect actual disk usage. For instance, writing
may indicate more space than is actually used. Or, for example, if an upload was cancelled in the middle, the calculation does not know how much actual storage space is consumed. In the same way, scheduled_delete
also may not reflect the exact amount of disk usage because blocks might already be partially deleted by garbage collection.
- [PR 1120]
riak-cs-admin
The following administration CLIs have been replaced by the riak-cs-admin
command:
riak-cs-storage
riak-cs-gc
riak-cs-access
riak-cs-stanchion
The commands listed above are deprecated and will be removed in future releases.
- [PR 1175]
Garbage Collection Refinements
Several new options have been added to the riak-cs-admin gc
command:
active_delete_threshold
is an option to avoid delegating manifests and block deletion to garbage collector. This option relieves garbage collector from having to delete small objects. This can optimise performance in cases where both garbage collector does not catch up with DELETE Object API calls and garbage collector’s elapsed time is dominated by small objects.[PR 1174]--start
and--end
options have been added to theriak-cs-admin gc batch
command to specify start and end in manual batch execution. Note that the--start
flag on the command line will overwrite theepoch_start
option in advanced.config. [PR 1147]--leeway
has been added to create a temporary leeway period whose values are used only once and not repeated at the next run, and--max-workers
has been added to allow you to override the concurrency value temporarily for a single run of garbage collector. [PR 1147]- Riak S2 2.0 (and older) has a race condition where fullsync replication and garbage collection may resurrect deleted blocks without any way to delete them again. When real-time replication and replication of a garbage collection bucket entry object being dropped from the real-time queue are combined, blocks may remain on the sink side without being collected. Riak S2 2.1 introduces deterministic garbage collection to avoid fullsync replication. Additionally, garbage collection and fullsync replication run concurrently, and work on the same blocks and manifests. You can now specify the range of time using the
--start
and--end
flags withriak-cs-admin gc batch
for garbage collector in order to collect deleted objects synchronously on both sink and source sides. [PR 1147] riak-cs-admin gc earliest-keys
is available so you can find the oldest entry afterepoch_start
in garbage collection. With this option, you can stay informed of garbage collection progress. [PR 1160]
More information on garbage collection can be found in the documentation.
Additions
Open Source
- A MapReduce optimisation in fetching Riak objects was introduced in Riak 2.1. Now, Riak CS 2.1 introduces an option to use that optimisation in storage calculation. It is off by default, but it can be used by setting
use_2i_for_storage_calc
astrue
in advanced.config. This reduced 50% of I/O in LevelDB. [PR 1089] - Erlang/OTP 17 support is now included. [PR 1245 and PR 1040]
- A module-level hook point for limiting user access and quota usage is now available with very preliminary, simple, node-wide limiting example modules. Operators can make, plug in, or combine different modules as quota-limiting, rate-limiting or bandwidth-limiting depending on their unique requirements. [PR 1118]
- An orphaned block scanner is now available. [PR 1133]
riak-cs-admin audit-bucket-ownership
is a new tool to check integrity between users and buckets added. For example, it can be used in cases where a bucket is visible when listing buckets but not accessible, or a bucket is visible and exists but could not be deleted. [PR 1202]- The following log rotation items have been added to cuttlefish:
- log.console.size
- log.console.rotation
- log.console.rotation.keep
- log.error.rotation
- log.error.rotation.keep
- log.error.size
riak_cs_wm_common
now has a default callback ofmultiple_choices
, which preventscode_server
from becoming a bottleneck. [PR 1181]- An option has been added to replace the
PR=all user GET
option withPR=one
just before authentication. This option improves latency, especially in the presence of slow (or actually-failing) nodes blocking the whole request flow because of PR=all. When enabled, a user’s owned-bucket list is never pruned after a bucket is deleted, instead it is just marked as deleted. [PR 1191] - An info log has been added when starting a storage calculation batch. [PR 1238]
GET Bucket
requests now have clearer responses. A 501 stub for Bucket lifecycle and a simple stub for Bucket requestPayment have been added. [PR 1223]- Several user-friendly features have been added to
riak-cs-debug
: fine-grained information gathering options, user-defined filtering for configuration files, and verbose output for failed commands. [PR 1236]
Enterprise
- MDC has
proxy_get
, which make block objects propagate to site clusters when they are requested. Now, multibag configuration with MDC supportsproxy_get
. [PR 1171 and PR 25] - Multibag is now renamed to “Supercluster”. A bag has been a set of replicated underlying Riak clusters, which is now a member of a supercluster.
riak-cs-multibag
command has been renamed asriak-cs-supercluster
as well. [PR 1257], [PR 1260], [PR 106], [PR 107] and [PR 31]. - Several internal operation tools have been added to help diagnose or address issues. [PR 1145, PR 1134, and PR 1133]
- Added a generic function for manual operations to resolve siblings of manifests and blocks, which will assist Basho Client Service Engineers with troubleshooting and solving issues. [PR 1188]
Changes
- Dependency versions have been updated in Riak S2 and Stanchion as follows: cuttlefish 2.0.4, node_package 2.0.3, riak-erlang-client 2.1.1, lager 2.2.0, lager_syslog 2.1.1, eper 0.92 (Basho patched), cluster_info 2.0.3, riak_repl_pb_api 2.1.1, and riak_cs_multibag 2.1.0. [PR 1190, PR 1197, PR 27, PR 1245, and PR 104].
- Riak CS has moved from Folsom to Exometer. [PR 1165 and PR 1180]
- Improvements have been made to error tracing for retrieving blocks from client GET requests. There is a complex logic to resolve blocks when a GET is requested from the client. First, Riak CS tries to retrieve a block with
n_val=1
. If it fails, a retry will be done usingn_val=3
. If the block cannot be resolved locally,proxy_get
is enabled, and the system is configured with datacenter replication, then Riak CS will try to perform a proxied GET to the remote site. The fallback and retry logic is complex and hard to trace, especially in a faulty or unstable situation. This improvement adds error tracing for the whole sequence described above, which will help diagnose issues. Specifically, for each block, the block server stacks all errors returned from the Riak client and reports the reason for every error as well as the type of call in which the error occurred. [PR 1177] - Using the
GET Bucket
API with a specified prefix to list objects in a bucket needed optimization. It had been specifying end keys for folding objects in Riak too loosely. With this change, a tighter end key is specified for folding objects in Riak, which omits unnecessary fold in vnodes. [PR 1233] - A limitation to the max length of keys has been introduced. This limitation can be specified as 1024 by default, meaning no keys longer than 1024 bytes can be PUT, GET or DELETED unless
max_key_length
is explicitly specified as more than ‘1024’ in riak-cs.conf. If you want to preserve the old key length behaviour, you may specify themax_key_length
as ‘unlimited’. [PR 1233] - If a faulty cluster had several nodes down, the block server misunderstood that a block was already deleted and issued a false-notfound. This could lead to block leak. The PR default has been set to ‘quorum’ in an attempt to avoid this problem. Updates have also been made to make sure at least a single replica of a block is written in one of the primary nodes by setting the PW default to ‘1’. Additionally, measures are in place to prevent the block server from crashing when “not found” errors are returned due to a particular block of an object not being found in the cluster. Instead, unreachable blocks are skipped and the remaining blocks and manifests are collected. Since the PR and PW values are increased at blocks, the availability of PUTs and through-put of garbage collection may decrease. A few Riak nodes being unreachable may prevent PUT requests from returning successfully and may prevent garbage collection from collecting all blocks until the unreachable nodes come back. [PR 1242]
- The infinity timeout option has been set so that several functions make synchronous
gen_fsm
calls indefinitely, which prevents unnecessary timeouts. [PR 1249]
Bugs Fixed
- [Issue 1097/PR 1212] When
x-amz-metadata-directive=COPY
was specified, Riak CS did not actually COPY the metadata of original resource. Instead, it would treat it as aREPLACE
. When directed tox-amz-metadata-directive=REPLACE
Content-Type
, Riack CS wouldREPLACE
it. Correct handling for thex-amz-metadata-directive
has been added to PUT Object Copy API. - [Issue 1099/PR 1096] There was an unnecessary NextMarker in Get Bucket’s response if
CommonPrefixes
contained the last key. Fixed handling of uploaded parts that should be deleted after Multipart Complete Request. - [Issue 939/PR 1200] Copy requests without Content-Length request headers failed with 5xx errors. Such requests are now allowed without Content-Length header in Copy API calls. Additionally, Copy API calls with Content-Lengths more than zero have been given explicit errors.
- [Issue 1143/PR 1144] Manual batch start caused the last batch time to appear to be in the future. All temporal shifts have been fixed.
- [Issue PR 1162/PR 1163] Fix a configuration system bug where Riak CS could not start if
log.syslog=on
was set. - [Issue 1169/PR 1200] The error response of the PUT Copy API call showed the target resource path rather than the source path when the source was not found or not accessible by the request user. It now shows the source path appropriately.
- [PR 1178] Multiple IP address descriptions under a single condition statement of a bucket policy were not being properly parsed as lists.
- [PR 1185] If
proxy_get_active
was defined in riak-cs.conf as anything other than enabled or disabled, there would be excessive log output. Now,proxy_get_active
also accepts non-boolean definitions. - [PR 1184]
put_gckey_timeout
was used instead ofput_manifest_timeout
when a delete process tried to update the status of manifests. - [Issue 1201/PR 1230] A single slow or silently failing node caused intermittent user fetch failure. A grace period has been added so
riakc_pb_socket
can attempt to reconnect. - [PR 1232] Warning logs were being produced for unsatisfied primary reads. Since users are objects in Riak CS and CS tries to retrieve these objects for authentication for almost every request, the retrieval option (PR=all) would fail if even one primary vnode was stopped or unresponsive and a log would be created. Given that Riak is set up to be highly available, these logs were quite noisy. Now, the “No WM route” log from prior to Riak CS 2.1 has been revived. Also, the log severity has been downgraded to debug, since it indicates a client error in all but the development phase.
- [PR 1237] The
riak-cs-admin
status command exit code was non-zero, even in successful execution. It will now return zero. - [Issue 1097/PR 1212 and PR 4] Riak S2 did not copy the metadata of an original resource when the
x-amz-metadata-directive=COPY
command was used, nor whenx-amz-metadata-directive
was specified. Handling of thex-amz-metadata-directive
command in PUT Object Copy API has been added. - [Issue 1097/PR 1212 and PR 4] Riak CS did not store
Content-Type
in COPY requests when thex-amz-metadata-directive=REPLACE
command was used. Handling of thex-amz-metadata-directive
command in PUT Object Copy API has been added. - [Issue 1097/PR 1212 and PR 4] Fixed the handling of uploaded parts that should be deleted after Multipart Complete Request.
- [Issue 1214/PR 1246] Prior to Riak S2 2.1.0, a PUT Copy API command with identical source and destination changed user metadata (
x-amz-meta-*
headers) but failed to update Content-Type. Content-Type is now correctly updated by the API call. - [Issue PR 1261, [PR 1263] Fix
riak-cs-debug
to includeapp.config
when no generated files are found whenriak-cs.conf
is not used.
Riak CS 2.0.1
General Information
This is a bugfix release.
Bug Fixes
- Fix config item
gc.interval
not working wheninfinity
is set (#1125/PR#1126). - Add
log.access
switch to disable access logging (#1109/PR#1115). - Add missing riak-cs.conf items:
max_buckets_per_user
andgc.batch_size
(#1109/PR#1115). - Fix bugs around subsequent space characters for Delete Multiple Objects API and user administration API with XML content (#1129/PR#1135).
- Fix URL path resource and query parameters to work in AWS v4 header authentication. Previously,
+
was being input instead of%20
for blank spaces. (PR#1141)
Riak CS 2.0.0
General Information
- This release updates Riak CS to work with Riak 2.0.5.
- We have simplified the configuration system.
- All official patches for older versions of Riak and Riak CS have been included in these releases. There is no need to apply any patches released for Riak CS 1.4.x or 1.5.x to the Riak CS 2.0.x series. Patches released for Riak CS 1.4.x or 1.5.x cannot be directly applied to Riak CS 2.0.x because the version of Erlang/OTP shipped with Riak CS has been updated in version 2.0.0.
- Please review the complete Release Notes before upgrading.
Known Issues & Limitations
- Access log can’t be disabled.
- Advanced.config should be used to customize value of
max_buckets_per_user
andgc_batch_size
.
Changes and Additions
- Changed the name of
gc_max_workers
togc.max_workers
, and lowered the default value from 5 to 2 (#1110) to reduce the workload on the cs cluster. - Partial support of GET Location API (#1057)
- Add very preliminary AWS v4 header authentication - without query string authentication, object chunking and payload checksum (#1064). There is still a lot of work to reliably use v4 authentication.
- Put Enterprise deps into dependency graph (#1065)
- Introduce Cuttlefish (#1020, #1068, #1076, #1086, #1090) (Stanchion #88, #90, #91)
- Yessir Riak client to measure performance (#1072, #1083)
- Inspector improvement with usage change (#1084)
- Check signed date in S3 authentication (#1067)
- Update
cluster_info
and various dependent libraries (#1087, #1088) (Stanchion #85, #87, #93) - Storage calculation optimization (#1089) With Riak >= 2.1 this works
with
use_2i_for_storage_calc
flag might relieve disk read of storage calculation.
Bugfixes
- Fix wrong webmachine log handler name (#1075)
- Fix lager crash (#1038)
- Fix hardcoded crashdump path (#1052)
- Suppress unnecessary warnings (#1053)
- Multibag simpler state transition (Multibag #21)
- GC block deletion failure after transition to multibag environment (Multibag #19)
- Connection closing caused errors for objects stored before the transition, after transition from single bag to multibag configuration (Multibag #18).
Deprecation Notices
- Multi-Datacenter Replication using v2 replication support has been deprecated.
- Old list objects which required
fold_objects_for_list_keys
asfalse
have been deprecated and will be removed in the next major version. - Non-paginated GC in cases where
gc_paginated_indexes
isfalse
has been deprecated and will be removed in the next major version.
General Notes on Upgrading to Riak CS 2.0.0
Upgrading a Riak CS system involves upgrading the underlying Riak, Riak CS and Stanchion installations. The upgrade process can be non-trivial depending on your existing system configurations and the combination of sub-system versions. This document contains general instructions and notices on upgrading the whole system to Riak CS 2.0.0.
New Configuration System
Riak 2.0.0 introduced a new configuration system (riak.conf
), and as of Riak
CS 2.0.0, Riak CS now supports the new configuration style. Both Riak and Riak
CS still support the older style configurations through app.config
and
vm.args
.
Basho recommends moving to the new unified configuration system, using the
files riak.conf
, riak-cs.conf
and stanchion.conf
.
Note on Legacy app.config Usage
If you choose to use the legacy app.config
files for Riak CS and/or
Stanchion, some parameters have changed names and must be updated.
In particular, for the Riak CS app.config
:
cs_ip
andcs_port
have been combined intolistener
.riak_ip
andriak_pb_port
have been combined intoriak_host
.stanchion_ip
andstanchion_port
have been combined intostanchion_host
.admin_ip
andadmin_port
have been combined intoadmin_listener
.webmachine_log_handler
has becomewebmachine_access_log_handler
.
For the Stanchion app.config
:
stanchion_ip
andstanchion_port
have been combined intolistener
.riak_ip
andriak_port
have been combined intoriak_host
.
Each of the above pairs follows a similar form. Where the old form used a
separate IP and Port parameter, the new form combines those as {new_option, {
"IP", Port}}
. For example, if your legacy app.config
configuration was
previously:
{riak_cs, [
{cs_ip, "127.0.0.1"},
{cs_port, 8080 },
. . .
]},
It should now read:
{riak_cs, [
{listener, {"127.0.0.1", 8080}},
. . .
]},
and so on.
Note: Upgrading from Riak CS 1.5.3 or Older
Some key objects changed names after the upgrade. Applications may need to change their behaviour due to this bugfix.
Note: Upgrading from Riak CS 1.5.0 or Older
Bucket number limitation per user have been introduced in 1.5.1. Users who have more than 100 buckets cannot create any bucket after the upgrade unless the limit is extended in the system configuration.
Note: Upgrading From Riak CS 1.4.x
An operational procedure to clean up incomplete multipart under deleted buckets is needed. Otherwise new buckets with names that used to exist can’t be created. The operation will fail with 409 Conflict.
Leeway seconds and disk space should also be carefully watched during the upgrade, because timestamp management of garbage collection was changed in the 1.5.0 release. Consult the “Leeway seconds and disk space section of 1.5 release notes for a more detailed description.
Note: Upgrading From Riak CS 1.3.x or Older
Basho supports upgrading from the two previous major versions to the latest release. Thus, this document will only cover upgrading from Riak CS versions 1.4.x and 1.5.x.
To upgrade to Riak CS 2.0.0 from versions prior to Riak CS 1.4.0, operators will need to first upgrade their system to Riak CS version 1.4.5 or 1.5.4. Upgrading to Riak CS 1.5.4 is recommended. The underlying Riak installation must also be upgraded to the Riak 1.4.x series, preferably version 1.4.12.
General Upgrade Instructions
All Scenarios
We recommend updating Stanchion before all other subsystems. Be careful not to have multiple live Stanchion nodes accessible from Riak CS nodes at the same time.
Repeat these steps on each node running Stanchion:
- Stop Stanchion
- Back up all Stanchion configuration files
- Uninstall the current Stanchion package
- Install the new Stanchion 2.0.0 package
- Migrate the Stanchion configuration (See below)
- Start Stanchion
Scenario: If Riak CS and Riak are both running on the same host.
Repeat these steps on every host:
- Stop Riak CS
- Stop Riak
- Back up all Riak and Riak CS configuration files and remove all patches
- Uninstall the current Riak CS package
- Uninstall the current Riak Riak packages
- Install the new Riak package
- Install the new Riak CS 2.0.0 package
- Migrate the Riak configuration (See below)
- Migrate the Riak CS configuration (See below)
- Start Riak
- Start Riak CS
Scenario: If Riak CS and Riak are running on separate hosts.
When Riak CS is not installed on the same host as Riak, Riak CS can be upgraded at any time while the corresponding remote Riak node is alive.
Repeat these steps on every host:
- Stop Riak CS
- Back up all configuration files and remove all patches
- Uninstall the current Riak CS package
- Install the new Riak CS 2.0.0 package
- Migrate the Riak CS configuration (See below)
- Start Riak CS
Upgrade Stanchion to 2.0.0
When upgrading to Stanchion 2.0.0 the files app.config
and vm.args
are
migrated to the single file stanchion.conf
. Configuration files are still
stored in the same location as before.
Migrate the Stanchion Configuration
Using the Configuration Mapping Tables below, edit your stanchion.conf
to
preserve your configuration preferences between Riak CS 1.5.x and 2.0.0.
The tables show the old and new configuration format and default values.
stanchion
section of the Stanchion app.config
1.5.4 (app.config ) |
2.0.0 (stanchion.conf ) |
---|---|
{stanchion_ip, "127.0.0.1"} |
listener = 127.0.0.1:8080 |
{stanchion_port, 8085} |
|
{riak_ip, "127.0.0.1"} |
riak_host = 127.0.0.1:8087 |
{riak_pb_port, 8087} |
|
{admin_key, "admin-key"} |
admin.key = admin-key |
{admin_secret, "admin-secret"} |
admin.secret = admin-secret |
lager
section of the Stanchion app.config
Riak’s Lager configuration can be copied directly to the advanced.config
file.
Commented out by default and consequently undefined
so as to disable SSL.
1.5.4 (app.config ) |
2.0.0 (stanchion.conf ) |
---|---|
{ssl, [ |
|
{certfile, "./etc/cert.pem"} |
ssl.certfile |
{keyfile, "./etc/key.pem"} |
ssl.keyfile |
Upgrade Riak to 2.0.5 and Configure for Riak CS 2.0.0
As Riak CS 2.0.0 only works on top of Riak 2.0.5 – and does not work on top of the Riak 1.x.x series – the underlying Riak installation must be upgraded to Riak 2.0.5. This document only covers upgrading from Riak 1.4.x. For more general information on upgrading Riak, please see the Riak upgrading to 2.0 guide.
Below are specific configuration changes required for a Riak 2.0.5 cluster supporting Riak CS.
Upgrading Riak - Step 1: Set Default Bucket Properties
In older versions of Riak, default bucket properties has been configured in the
app.config
as follows:
{riak_core, [
...
{default_bucket_props, [{allow_mult, true}]},
...
]}.
With Riak 2.0.5 in riak.conf
this becomes:
buckets.default.allow_mult = true
Upgrading Riak - Step 2: riak_kv
configuration
There are two ways to configure Riak 2.0.5 behind Riak CS 2.0.0:
Option 1: Reuse the existing app.config
file from Riak 1.4.x
In this case, add_paths
should be changed to target the new Riak CS binaries
installed by the Riak CS 2.0.0 package. These will be changed from
"/usr/lib/riak-cs/lib/riak_cs-1.5.4/ebin"
to
"/usr/lib/riak-cs/lib/riak_cs-2.0.0/ebin"
.
Option 2: Use Riak 2.0.0’s new advanced.config
You will need to copy all riak_kv configuration items from app.config
into
advanced.config
, and update add_paths
to target the new Riak CS binaries
installed by the Riak CS 2.0.0 package. In advanced.config
, this will become:
{riak_kv, [
{add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.0.0/ebin"]}
]}.
The app.config
file must be removed when advanced.config
is used.
See Setting up the Proper Riak Backend for additional details.
Upgrading Riak - Step 3: Review Memory Size
Since the default configuration of the LevelDB memory size has changed, you will need to review your memory size settings. The memory use of Riak CS is primarily influenced by the Bitcask keydir and LevelDB block cache. Additionally, to improve IO performance, some extra memory for the kernel disk cache should be planned for. The equations below might help when specifying the memory size:
- Memory for backends = (Memory for Bitcask) + (Memory for LevelDB)
- Memory for storage = (Memory for backends) + (Memory for kernel cache)
LevelDB Block Cache Size
The configuration setting relating to the memory size of LevelDB has changed
from max_open_files
to total_leveldb_mem_percent
in 2.0. This specifies the
total amount of memory consumed by LevelDB. Note that the default memory limit
has changed from being proportional to the number of max_open_files
to being a
percentage of the system’s physical memory size.
Configuring total_leveldb_mem_percent
is strongly recommended as its
default value of 70% might be too aggressive for a
multi-backend configuration that also uses Bitcask. Bitcask keeps its keydir in
memory, which could be fairly large depending on the use case.
Bitcask keydir Sizing
Bitcask stores all of its keys in memory as well as on disk. Correctly
estimating the total number of keys and their average size in Bitcask is very
important for estimating Bitcask memory usage. Total number of keys N(b)
in
Bitcask across the whole cluster will be:
N(b) = N(o, size <= 1MB) + N(o, size > 1MB) * avg(o, size > 1MB) / 1MB
where N(o, size <= 1MB)
is the number of objects with a size less than 1MB,
while N(o, size > 1MB
is the number of objects with a size greater than 1MB.
avg(o, size > 1MB)
is the average size of objects greater than 1MB in size.
The number of keys in Riak CS is related to the amount of data stored in MBs.
If the average lifetime of objects is significantly smaller than the leeway
period, treat objects waiting for garbage collections as live objects on disk.
Actual numbers of key count per vnode are included in the output of
riak-admin vnode-status
. There is an item named Status
in each vnode
section, which includes the key_count
in the be_blocks
section.
Once the numbers of keys is known, estimate the amount of memory used by Bitcask keydir as per the Bitcask Capacity Planning documentation.
The bucket name size is always 19 bytes (see riak_cs_utils:to_bucket_name/2
)
and the key size is always 20 bytes (see riak_cs_lfs_utils:block_name/3
). The
average value size is close to 1MB if large objects are dominant, otherwise it
should be estimated according to the specific use case. The number of writes is
3.
Upgrading Riak - Step 4: Changes in vm.args
The upgrade of Riak from the 1.4.x series to the 2.0.x series is described in
Upgrading Your Configuration System. The
following are several major configuration items which are essential for Riak CS.
erlang.distribution_buffer_size
is commented out by default.
Riak 1.4 | Riak 2.0 |
---|---|
+zdbbl |
erlang.distribution_buffer_size = 1MB |
-name riak@127.0.0.1 |
nodename = riak@127.0.0.1 |
-setcookie riak |
distributed_cookie = riak |
Upgrading Riak - Step 5: Storage Calculation
If storage statistics are desired on your system, several more configuration options are required. Please see the storage statistics documentation for additional details.
Upgrading Riak - Additional Notes
The underlying Bitcask storage format has been changed in Riak 2.0.x to fix several important issues. The first start of Riak after an upgrade involves an implicit data format upgrade conversion, which means that all data files are read, and written out to new files. This might lead to higher than normal disk load. The duration of the upgrade will depend on the amount of data stored in Bitcask and the IO performance of the underlying disk.
The data conversion will start with logs like this:
2015-03-17 02:43:20.813 [info] <0.609.0>@riak_kv_bitcask_backend:maybe_start_upgrade_if_bitcask_files:720 Starting upgrade to version 1.7.0 in /mnt/data/bitcask/1096126227998177188652763624537212264741949407232
2015-03-17 02:43:21.344 [info] <0.610.0>@riak_kv_bitcask_backend:maybe_start_upgrade_if_bitcask_files:720 Starting upgrade to version 1.7.0 in /mnt/data/bitcask/1278813932664540053428224228626747642198940975104
The end of the data conversion can be observed as info log entries in Riak logs like this:
2015-03-17 07:18:49.754 [info] <0.609.0>@riak_kv_bitcask_backend:callback:446 Finished upgrading to Bitcask 1.7.0 in /mnt/data/bitcask/1096126227998177188652763624537212264741949407232
2015-03-17 07:23:07.181 [info] <0.610.0>@riak_kv_bitcask_backend:callback:446 Finished upgrading to Bitcask 1.7.0 in /mnt/data/bitcask/1278813932664540053428224228626747642198940975104
Upgrading Riak CS to 2.0.0
Upgrading Riak CS - Step 1: Multibag Configuration
Multibag configurations must be moved to advanced.config
for both Riak CS and
Stanchion.
Upgrading Riak CS - Step 2: Default Configurations
When upgrading to Riak CS 2.0.0 the files app.config
and vm.args
are
migrated to the single file riak-cs.conf
. Configuration files are still stored
in the same location as before.
Note: app.config
should be removed once you’ve completed the upgrade and
riak-cs.conf
is being used.
Some important configuration changes occurred between 1.5.x and 2.0.0, and not all items were translated one-to-one.
Using the Configuration Mapping Tables below, edit your riak-cs.conf
to
preserve your configuration preferences between Riak CS 1.5.x and 2.0.0.
The tables show the old and new configuration format and default values.
Note: storage.stats.schedule.$time
does not have any default value but an
example is added.
riak_cs
section of the Riak CS app.config
1.5.4 (app.config ) |
2.x (riak-cs.conf ) |
Note |
---|---|---|
{cs_ip, "127.0.0.1"} |
listener = 127.0.0.1:8080 |
|
{cs_port, 8080} |
||
{riak_ip, "127.0.0.1"} |
riak_host = 127.0.0.1:8087 |
|
{riak_pb_port, 8087} |
||
{stanchion_ip, "127.0.0.1"} |
stanchion_host = 127.0.0.1:8085 |
|
{stanchion_port, 8085 } |
||
{stanchion_ssl, false } |
stanchion_ssl = off |
|
{anonymous_user_creation, false} |
anonymous_user_creation = off |
|
{admin_key, "admin-key"} |
admin.key = admin-key |
|
{admin_secret, "admin-secret"} |
admin.secret = admin-secret |
|
{cs_root_host, "s3.amazonaws.com"} |
root_host = s3.amazonaws.com |
|
{connection_pools,[ |
||
{request_pool, {128, 0} }, |
pool.request.size = 128 |
|
pool.request.overflow = 0 |
||
{bucket_list_pool, {5, 0} } |
pool.list.size = 5 |
|
pool.list.overflow = 0 |
||
{max_buckets_per_user, 100} |
max_buckets_per_user = 100 |
from 2.0.1 |
{trust_x_forwarded_for, false} |
trust_x_forwarded_for = off |
|
{leeway_seconds, 86400} |
gc.leeway_period = 24h |
|
{gc_interval, 900} |
gc.interval = 15m |
|
{gc_retry_interval, 21600} |
gc.retry_interval = 6h |
|
{gc_batch_size, 1000} |
gc.batch_size = 1000 |
from 2.0.1 |
{access_log_flush_factor, 1} |
stats.access.flush_factor = 1 |
|
{access_log_flush_size, 1000000} |
stats.access.flush_size = 1000000 |
|
{access_archive_period, 3600} |
stats.access.archive_period = 1h |
|
{access_archiver_max_backlog, 2} |
stats.access.archiver.max_backlog = 2 |
|
(no explicit default) | stats.access.archiver.max_workers = 2 |
|
{storage_schedule, []} |
stats.storage.schedule.$time = 0600 |
|
{storage_archive_period, 86400} |
stats.storage.archive_period = 1d |
|
{usage_request_limit, 744} |
riak_cs.usage_request_limit = 31d |
|
{cs_version, 10300 } |
cs_version = 10300 |
|
{dtrace_support, false} |
dtrace = off |
webmachine
section of the Riak CS app.config
1.5.4 (app.config ) |
2.0.0 (riak-cs.conf ) |
note |
---|---|---|
{server_name, "Riak CS"} |
server_name = Riak CS |
|
{log_handlers, ....} |
log.access = true |
from 2.0.1 |
log.access.dir = /var/log/riak-cs |
Due to a WebMachine change, if log_handlers
are defined in app.config
or
advanced.config
, the log handler’s name should be changed as follows:
{log_handlers, [
{webmachine_access_log_handler, ["/var/log/riak-cs"]},
{riak_cs_access_log_handler, []}
]},
This does not have to be changed if log_handlers
is not defined in
app.config
or advanced.config
.
lager
section of the Riak CS app.config
Riak’s Lager configuration can be copied directly to the advanced.config
file.
Upgrading Riak CS - Step 3: Commented Out Configurations
All commented out items are undefined and disabled, except modules.
rewrite_module
and auth_module
are commented out, but the default value does
not change from Riak CS 1.5.4. This section is for showing operators how to
change these settings to the OOS API.
1.5.4 (app.config ) |
2.0.0 (riak-cs.conf ) |
---|---|
{rewrite_module, riak_cs_s3_rewrite } |
rewrite_module |
{auth_module, riak_cs_s3_auth }, |
auth_module |
{admin_ip, "127.0.0.1"} |
admin.listener = 127.0.0.1:8000 |
{admin_port, 8000 } |
|
{ssl, [ |
|
{certfile, "./etc/cert.pem"} |
ssl.certfile |
{keyfile, "./etc/key.pem"} |
ssl.keyfile |
Upgrading Riak CS - Step 4: Other Configurations
The following configurations do not have corresponding items in riak-cs.conf
:
fold_objects_for_list_keys
n_val_1_get_requests
gc_paginated_indexes
If these values are still set as false
, they should be omitted from your Riak
CS 2.0.0 configuration.
If the old behavior is preferred, they must be included in the riak_cs
section
of advanced.config
.
Downgrading Riak CS
To Riak CS 1.5.x
To downgrade from Riak CS 2.0.0 to Riak CS 1.5.x and Stanchion 2.0.0 to Stanchion 1.5.0, repeat the following instructions for each node:
- Stop Riak CS
- Stop Riak
- Uninstall the Riak CS 2.0.0 package
- Uninstall the Riak 2.0.5 package
- Run the Bitcask downgrade script for all Bitcask directories*
- Install the desired Riak package
- Install the desired Riak CS package
- Restore configuration files
- Start Riak
- Start Riak CS
Finally, on any nodes running Stanchion:
- Stop Stanchion
- Uninstall the Stanchion 2.0.0 package
- Install the desired Stanchion package
- Restore Stanchion configuration files
- Start Stanchion
*The Bitcask file format has changed between Riak 1.4.x and 2.0.0. While the implicit upgrade of Bitcask data files is supported, automatic downgrades of Bitcask data files is not. For this reason downgrading requires a script to translate data files. See also the 2.0 downgrade notes.
Riak CS 1.5.4
Bugs Fixed
- Disable previous Riak object after backpressure sleep is triggered riak_cs/#1041. This change prevents unnecessary siblings growth in cases where (a) backpressure is triggered under high upload concurrency and (b) uploads are interleaved during backpressure sleep. This issue does not affect multipart uploads.
- Fix an incorrect path rewrite in the S3 API caused by unnecessary URL
decoding
riak_cs/#1040. Due to
the incorrect handling of URL encoding/decoding, object keys including
%[0-9a-fA-F][0-9a-fA-F]
(as a regular expression) or+
had been mistakenly decoded. As a consequence, the former case was decoded to some other binary and for the latter case (+
) was replaced with (space). In both cases, there was a possibility of an implicit data overwrite. For the latter case, an overwrite occurs for an object including+
in its key (e.g.foo+bar
) by a different object with a name that is largely similar but replaced with (space, e.g.foo bar
), and vice versa. This fix also addresses riak_cs/#910 and riak_cs/#977.
Notes on upgrading
After upgrading to Riak CS 1.5.4, objects including
%[0-9a-fA-F][0-9a-fA-F]
or +
in their key (e.g. foo+bar
) become
invisible and can be seen as objects with a different name. For the
former case, objects will be referred as unnecessary decoded key. For
the latter case, those objects will be referred as keys +
replaced
with (e.g. foo bar
) by default.
The table below provides examples for URLs including
%[0-9a-fA-F][0-9a-fA-F]
and how they will work before and after the
upgrade.
before upgrade | after upgrade | |
---|---|---|
written as | a%2Fkey |
- |
read as | a%2Fkey or a/key |
a/key |
listed as | a/key |
a/key |
Example on unique objects including +
through upgrade:
before upgrade | after upgrade | |
---|---|---|
written as | a+key |
- |
read as | a+key or a key |
a key |
listed as | a key |
a key |
Example on unique objects including (space) through upgrade:
before upgrade | after upgrade | |
---|---|---|
written as | a key |
- |
read as | a+key or a key |
a key |
listed as | a key |
a key |
This fix also changes the path format in access logs from the single URL-encoded style to the doubly-encoded URL style. Below is an example of the old style:
127.0.0.1 - - [07/Jan/2015:08:27:07 +0000] "PUT /buckets/test/objects/path1%2Fpath2%2Fte%2Bst.txt HTTP/1.1" 200 0 "" ""
Below is an example of the new style:
127.0.0.1 - - [07/Jan/2015:08:27:07 +0000] "PUT /buckets/test/objects/path1%2Fpath2%2Fte%252Bst.txt HTTP/1.1" 200 0 "" ""
Note that the object path has changed from
path1%2Fpath2%2Fte%2Bst.txt
to path1%2Fpath2%2Fte%252Bst.txt
between
the two examples shown.
If the old behavior is preferred, perhaps because
applications using Riak CS have been written to use it, you can retain
that behavior by modifying your Riak CS configuration upon upgrade.
Change the rewrite_module
setting as follows:
{riak_cs, [
%% Other settings
{rewrite_module, riak_cs_s3_rewrite_legacy},
%% Other settings
]}
Note: The old behavior is technically incorrect and implicitly overwrites data in the ways described above, so please retain the old behavior with caution.
Riak CS 1.5.3
Additions
- Add read_before_last_manifest_write option to help avoid sibling explosion for use cases involving high churn and concurrency on a fixed set of keys. riak_cs/#1011
- Add configurable timeouts for all Riak CS interactions with Riak to provide more flexibility in operational situations. riak_cs/#1021
Bugs Fixed
- Fix storage usage calculation bug where data for deleted buckets would be included in the calculation results. riak_cs/#996
Riak CS 1.5.2
Additions
- Improved logging around connection failures with Riak riak_cs/#987.
- Add amendment log output when storing access stats into Riak failed
riak_cs/#988. This
prevents losing access stats logs in cases of temporary connection
failure between Riak and Riak CS. Access logs are stored in
console.log
at thewarning
level. - Add script to repair invalid garbage collection manifests riak_cs/#983. There is a known issue where an active manifest would be stored in the GC bucket. This script changes invalid state to valid state.
Bugs Fixed
- Fix Protocol Buffer connection pool (
pbc_pool_master
) leak riak_cs/#986 . Requests for non-existent buckets without an authorization header and requests asking for listing users make connections leak from the pool, causing the pool to eventually go empty. This bug was introduced in release 1.5.0.
Riak CS 1.5.1
Additions
- Add sleep-after-update manifests to avoid sibling explosion riak_cs/#959
- Multibag support on
riak-cs-debug
riak_cs/#930 - Add bucket number limit check in Riak CS process riak_cs/#950
- More efficient bucket resolution riak_cs/#951
Bugs Fixed
- GC may stall due to
riak_cs_delete_fsm
deadlock riak_cs/#949 - Fix wrong log directory for gathering logs on
riak-cs-debug
riak_cs/#953 - Avoid DST-aware translation from local time to GMT riak_cs/#954
- Use new UUID for seed of canonical ID instead of secret riak_cs/#956
- Add max part number limitation riak_cs/#957
- Set timeout as infinity to replace the default of 5000ms riak_cs/#963
- Skip invalid state manifests in GC bucket riak_cs/#964
Notes on Upgrading
Bucket number per user
Beginning with Riak CS 1.5.1, you can limit the number of buckets that
can be created per user. The default maximum number is 100. While this
limitation prohibits the creation of new buckets by users, users that
exceed the limit can still perform other operations, including bucket
deletion. To change the default limit, add the following line to the
riak_cs
section of app.config
:
{riak_cs, [
%% ...
{max_buckets_per_user, 5000},
%% ...
]}
To avoid having a limit, set max_buckets_per_user
to unlimited
.
Riak CS 1.5.0
Additions
- A new command
riak-cs-debug
includingcluster-info
riak_cs/#769, riak_cs/#832 - Tie up all existing commands into a new command
riak-cs-admin
riak_cs/#839 - Add a command
riak-cs-admin stanchion
to switch Stanchion IP and port manually riak_cs/#657 - Performance of garbage collection has been improved via Concurrent GC riak_cs/#830
- Iterator refresh riak_cs/#805
fold_objects_for_list_keys
made default in Riak CS riak_cs/#737, riak_cs/#785- Add support for Cache-Control header riak_cs/#821
- Allow objects to be reaped sooner than leeway interval. riak_cs/#470
- PUT Copy on both objects and upload parts riak_cs/#548
- Update to lager 2.0.3
- Compiles with R16B0x (Releases still by R15B01)
- Change default value of
gc_paginated_index
totrue
riak_cs/#881 - Add new API: Delete Multiple Objects riak_cs/#728
- Add warning logs for manifests, siblings, bytes and history riak_cs/#915
Bugs Fixed
- Align
ERL_MAX_PORTS
with Riak default: 64000 riak_cs/#636 - Allow Riak CS admin resources to be used with OpenStack API riak_cs/#666
- Fix path substitution code to fix Solaris source builds riak_cs/#733
sanity_check(true,false)
logs invalid error onriakc_pb_socket
error riak_cs/#683- Riak-CS-GC timestamp for scheduler is in the year 0043, not 2013. riak_cs/#713 fixed by riak_cs/#676
- Excessive calls to OTP code_server process #669 fixed by riak_cs/#675
- Return HTTP 400 if content-md5 does not match riak_cs/#596
/riak-cs/stats
andadmin_auth_enabled=false
don’t work together correctly. riak_cs/#719- Storage calculation doesn’t handle tombstones, nor handle undefined manifest.props riak_cs/#849
- MP initiated objects remains after delete/create buckets #475 fixed by riak_cs/#857 and stanchion/#78
- handling empty query string on list multipart upload riak_cs/#843
- Setting ACLs via headers at PUT Object creation riak_cs/#631
- Improve handling of poolboy timeouts during ping requests riak_cs/#763
- Remove unnecessary log message on anonymous access riak_cs/#876
- Fix inconsistent ETag on objects uploaded by multipart riak_cs/#855
- Fix policy version validation in PUT Bucket Policy riak_cs/#911
- Fix return code of several commands, to return 0 for success riak_cs/#908
- Fix
{error, disconnected}
repainted with notfound riak_cs/#929
Notes on Upgrading
Riak Version
This release of Riak CS was tested with Riak 1.4.10. Be sure to consult the Compatibility Matrix to ensure that you are using the correct version.
Incomplete multipart uploads
riak_cs/#475 was a security issue where a newly created bucket may include unaborted or incomplete multipart uploads which was created in previous epoch of the bucket with same name. This was fixed by:
on creating buckets; checking if live multipart exists and if exists, return 500 failure to client.
on deleting buckets; trying to clean up all live multipart remains, and checking if live multipart remains (in stanchion). if exists, return 409 failure to client.
Note that a few operations are needed after upgrading from 1.4.x (or former) to 1.5.0.
run
riak-cs-admin cleanup-orphan-multipart
to cleanup all buckets. To avoid some corner cases where multipart uploads can conflict with bucket deletion, this command can also be run with a timestamp with ISO 8601 format such as2014-07-30T11:09:30.000Z
as an argument. When this argument is provided, the cleanup operation will not clean up multipart uploads that are newer than the provided timestamp. If used, this should be set to a time when you expect your upgrade to be completed.there might be a time period until above cleanup finished, where no client can create bucket if unfinished multipart upload remains under deleted bucket. You can find [critical] log (
"Multipart upload remains in deleted bucket <bucketname>"
) if such bucket creation is attempted.
Leeway seconds and disk space
riak_cs/#470 changed the behaviour of object deletion and garbage collection. The timestamps in garbage collection bucket were changed from the future time when the object is to be deleted, to the current time when the object is deleted, Garbage collector was also changed to collect objects until ‘now - leeway seconds’, from collecting objects until ‘now’.
Before (-1.4.x):
t1 t2
-----------+--------------------------+------------------->
DELETE object: GC triggered:
marked as collects objects
"t1+leeway" marked as "t2"
After (1.5.0-):
t1 t2
-----------+--------------------------+------------------->
DELETE object: GC triggered:
marked as "t1" collects objects
in GC bucket marked as "t2 - leeway"
This means that there will be a period where no objects are collected
immediately following an upgrade to 1.5.0. If your cluster is upgraded
at t0
, no objects will be cleaned up until t0 + leeway
. Objects
deleted just before t0
won’t be collected until t0 + 2*leeway
.
Also, all CS nodes which run GC should be upgraded first. CS nodes
which do not run GC should be upgraded later, to ensure the leeway
setting is intiated properly. Alternatively, you may stop GC while
upgrading, by running riak-cs-admin gc set-interval infinity
.
Multi data center cluster should be upgraded more carefully, as to make sure GC is not running while upgrading.
Known Issues and Limitations
If a second client request is made using the same connection while a copy operation is in progress, the copy will be aborted. This is a side effect of the way Riak CS currently handles client disconnect detection. See #932 for further information.
Copying objects in OOS interface is not yet implemented.
Multibag, the ability to store object manifests and blocks in separate clusters or groups of clusters, has been added as Enterprise feature, but it is in early preview status.
proxy_get
has not yet been implemented for this preview feature. Multibag is intended for single DC only at this time.
Riak CS 1.4.5
Bugs Fixed
- Fix several ‘data hiding’ bugs with the v2 list objects FSM riak_cs/788
- Don’t treat HEAD requests toward BytesOut in access statistics riak_cs/791
- Handle whitespace in POST/PUT XML documents riak_cs/795
- Fix bad bucketname in storage usage riak_cs/800 Riak CS 1.4.4 introduced a bug where storage calculations made while running that version would have the bucket-name replaced by the string “struct”. This version fixes the bug, but can’t go back and retroactively fix the old storage calculations. Aggregations on an entire user-account should still be accurate, but you won’t be able to break-down storage by bucket, as they will all share the name “struct”.
- Handle unicode user-names and XML riak_cs/807
- Fix missing XML fields on storage usage riak_cs/808
- Adjust fold-objects timeout riak_cs/811
- Prune deleted buckets from user record riak_cs/812
Additions
- Optimize the list objects v2 FSM for prefix requests riak_cs/804
Riak CS 1.4.4
This is a bugfix release. The major fixes are to the storage calculation.
Bugs Fixed
Create basho-patches directory riak_cs/775 .
sum_bucket
timeout crashes all storage calculation is fixed by riak_cs/759 .Failure to throttle access archiver is fixed by riak_cs/758 .
Access archiver crash is fixed by riak_cs/747 .
Riak CS 1.4.3
Bugs Fixed
- Fix bug that reverted manifests in the scheduled_delete state to the pending_delete or active state.
- Don’t count already deleted manifests as overwritten
- Don’t delete current object version on overwrite with incorrect md5
Additions
- Improve performance of manifest pruning
- Optionally use paginated 2i for the GC daemon. This is to help prevent timeouts when collecting data that can be garbage collected.
- Improve handling of Riak disconnects on block fetches
- Update to lager 2.0.1
- Optionally prune manifests based on count, in addition to time
- Allow multiple access archiver processes to run concurrently
Riak CS 1.4.2
Bugs Fixed
- Fix issue with Enterprise build on Debian Linux distributions.
- Fix source tarball build.
- Fix access statistics bug that caused all accesses to be treated as errors.
- Make logging in bucket listing map phase function lager version agnostic to avoid issues when using versions of Riak older than 1.4.
- Handle undefined
props
field in manifests to fix issue accessing objects written with a version of Riak CS older than 1.3.0.
Additions
- Add option to delay initial GC sweep on a node using the initial_gc_delay configuration option.
- Append random suffix to GC bucket keys to avoid hot keys and improve performance during periods of frequent deletion.
- Add default_proxy_cluster_id option to provide a way to specify a default cluster id to be used when the cluster id is undefined. This is to facilitate migration from the OSS version to the Enterprise version.
Riak CS 1.4.1
Bugs Fixed
- Fix list objects crash when more than the first 1001 keys are in the pending delete state
- Fix crash in garbage collection daemon
- Fix packaging bug by updating node_package dependency
Riak CS 1.4.0
Bugs Fixed
- Remove unnecessary keys in GC bucket
- Fix query-string authentication for multi-part uploads
- Fix Storage Class for multi-part uploaded objects
- Fix etags for multi-part uploads
- Support reformat indexes in the Riak CS multi-backend
- Fix unbounded memory-growth on GET requests with a slow connection
- Reduce access-archiver memory use
- Fix 500 on object ACL HEAD request
- Fix semantics for concurrent upload and delete of the same key with a multi-part upload
- Verify content-md5 header if supplied
- Handle transient Riak connection failures
Additions
- Add preliminary support for the Swift API and Keystone authentication
- Improve performance of object listing when using Riak 1.4.0 or greater
- Add ability to edit user account name and email address
- Add support for v3 multi-data-center replication
- Add configurable Riak connection timeouts
- Add syslog support via Lager
- Only contact one vnode for immutable block requests
Riak CS 1.3.1
Bugs Fixed
- Fix bug in handling of active object manifests in the case of overwrite or delete that could lead to old object versions being resurrected.
- Fix improper capitalization of user metadata header names.
- Fix issue where the S3 rewrite module omits any query parameters that are not S3 subresources. Also correct handling of query parameters so that parameter values are not URL decoded twice. This primarily affects pre-signed URLs because the access key and request signature are included as query parameters.
- Fix for issue with init script stop.
Riak CS 1.3.0
Bugs Fixed
- Fix handling of cases where buckets have siblings. Previously this resulted in 500 errors returned to the client.
- Reduce likelihood of sibling creation when creating a bucket.
- Return a 404 instead of a 403 when accessing a deleted object.
- Unquote URLs to accommodate clients that URL encode
/
characters in URLs. - Deny anonymous service-level requests to avoid unnecessary error messages trying to list the buckets owned by an undefined user.
Additions
- Support for multipart file uploads. Parts must be in the range of 5MB-5GB.
- Support for bucket policies using a restricted set of principals and conditions.
- Support for returning bytes ranges of a file using the Range header.
- Administrative commands may be segrated onto a separate interface.
- Authentication for administrative commands may be disabled.
- Performance and stability improvements for listing the contents of buckets.
- Support for the prefix, delimiter, and marker options when listing the contents of a bucket.
- Support for using Webmachine’s access logging features in conjunction with the Riak CS internal access logging mechanism.
- Moved all administrative resources under /riak-cs.
- Riak CS now supports packaging for FreeBSD, SmartOS, and Solaris.
Riak CS 1.2.2
Bugs Fixed
- Fix problem where objects with utf-8 unicode key cannot be listed nor fetched.
- Speed up bucket_empty check and fix process leak. This bug was
originally found when a user was having trouble with
s3cmd rb s3://foo --recursive
. The operation first tries to delete the (potentially large) bucket, which triggers our bucket empty check. If the bucket has more than 32k items, we run out of processes unless +P is set higher (because of the leak).
Additions
- Full support for MDC replication
Riak CS 1.2.1
Bugs Fixed
- Return 403 instead of 404 when a user attempts to list contents of nonexistent bucket.
- Do not do bucket list for HEAD or ?versioning or ?location request.
Additions
- Add reduce phase for listing bucket contents to provide backpressure when executing the MapReduce job.
- Use prereduce during storage calculations.
- Return 403 instead of 404 when a user attempts to list contents of nonexistent bucket.
Riak CS 1.2.0
Bugs Fixed
- Do not expose stack traces to users on 500 errors
- Fix issue with sibling creation on user record updates
- Fix crash in terminate state when fsm state is not fully populated
- Script fixes and updates in response to node_package updates
Additions
- Add preliminary support for MDC replication
- Quickcheck test to exercise the erlcloud library against Riak CS
- Basic support for riak_test integration
Riak CS 1.1.0
Bugs Fixed
- Check for timeout when checking out a connection from poolboy.
- PUT object now returns 200 instead of 204.
- Fixes for Dialyzer errors and warnings.
- Return readable error message with 500 errors instead of large webmachine backtraces.
Additions
- Update user creation to accept a JSON or XML document for user creation instead of URL encoded text string.
- Configuration option to allow anonymous users to create accounts. In the default mode, only the administrator is allowed to create accounts.
- Ping resource for health checks.
- Support for user-specified metadata headers.
- User accounts may be disabled by the administrator.
- A new key_secret can be issued for a user by the administrator.
- Administrator can now list all system users and optionally filter by enabled or disabled account status.
- Garbage collection for deleted and overwritten objects.
- Separate connection pool for object listings with a default of 5 connections.
- Improved performance for listing all objects in a bucket.
- Statistics collection and querying.
- DTrace probing.
Riak CS 1.0.2
Additions
- Support query parameter authentication as specified in Signing and Authenticating REST Requests
Riak CS 1.0.1
Bugs Fixed
- Default content-type is not passed into function to handle PUT request body
- Requests hang when a node in the Riak cluster is unavailable
- Correct inappropriate use of riak_moss_utils:get_user by riak_moss_acl_utils:get_owner_data
Riak CS 1.0.0
Bugs Fixed
- Fix PUTs for zero-byte files
- Fix fsm initialization race conditions
- Canonicalize the entire path if there is no host header, but there are tokens
- Fix process and socket leaks in get fsm
Other Additions
- Subsystem for calculating user access and storage usage
- Fixed-size connection pool of Riak connections
- Use a single Riak connection per request to avoid deadlock conditions
- Object ACLs
- Management for multiple versions of a file manifest
- Configurable block size and max content length
- Support specifying non-default ACL at bucket creation time
Riak CS 0.1.2
Bugs Fixed
- Return 403 instead of 503 for invalid anonymous or signed requests.
- Properly clean up processes and connections on object requests.
Riak CS 0.1.1
Bugs Fixed
- HEAD requests always result in a
403 Forbidden
. s3cmd info
on a bucket object results in an error due to missing ACL document.- Incorrect atom specified in
riak_moss_wm_utils:parse_auth_header
. - Bad match condition used in
riak_moss_acl:has_permission/2
.
Riak CS 0.1.0
Bugs Fixed
s3cmd info
fails due to missing'last-modified
key in return document.s3cmd get
of 0 byte file fails.- Bucket creation fails with status code
415
using the AWS Java SDK.
Other Additions
- Bucket-level access control lists
- User records have been modified so that an system-wide unique email address is required to create a user.
- User creation requests are serialized through
stanchion
to be certain the email address is unique. - Bucket creation and deletion requests are serialized through
stanchion
to ensure bucket names are unique in the system. - The
stanchion
serialization service is now required to be installed and running for the system to be fully operational. - The concept of an administrative user has been added to the system. The credentials of the
administrative user must be added to the app.config files for
moss
andstanchion
. - User credentials are now created using a url-safe base64 encoding module.
Known Issues
- Object-level access control lists have not yet been implemented.
Riak CS 0.0.3
Bugs Fixed
- URL decode keys on put so they are represented correctly. This eliminates confusion when objects with spaces in their names are listed and when attempting to access them.
- Properly handle zero-byte files
- Reap all processes during file puts
Other Additions
Support for the s3cmd subcommands sync, du, and rb
Return valid size and checksum for each object when listing bucket objects.
Changes so that a bucket may be deleted if it is empty.
Changes so a subdirectory path can be specified when storing or retrieving files.
Make buckets private by default
Support the prefix query parameter
Enhance process dependencies for improved failure handling
Known Issues
- Buckets are marked as /private/ by default, but globally-unique bucket names are not enforced. This means that two users may create the same bucket and this could result in unauthorized access and unintentional overwriting of files. This will be addressed in a future release by ensuring that bucket names are unique across the system.