Search Settings

This document covers how to use the Riak Search (with Solr integration) subsystem from an operational perspective.

For a simple reference of the available configuration settings and their defaults, see the configuration files documentation.

If you are looking for developer-focused docs, we recommend the following:

Overview

We’ll be walking through:

[Prerequisites](#prerequisites)
[Enable Riak Search](#enabling-riak-search)
[Riak.conf Configuration Settings](#riak-config-settings)
[Additional Solr Information](#more-on-solr)

Prerequisites

Because Solr is a Java application, you will need to install Java 1.6 or later on every node. We recommend installing Oracle’s JDK 7u25. Installation packages can be found on the Java SE 7 Downloads page and instructions on the documentation page.

Enabling Riak Search

Riak Search is not enabled by default, so you must enable it in every node’s configuration file as follows:

search = on

Riak Config Settings

Setting `search` to `on` is required, but the other Search settings are optional. A list of these parameters can also be found in our configuration files documentation.

Field	Default	Valid values	Description
`search`	`off`	`on` or `off`	Enable or disable Search
`search.anti_entropy.data_dir`	`./data/yz_anti_entropy`	Directory	The directory in which Riak Search stores files related to active anti-entropy
`search.root_dir`	`./data/yz`	Directory	The root directory in which index data and configuration is stored
`search.solr.start_timeout`	`30s`	Integer with time units (e.g. `2m`)	How long Riak will wait for Solr to start. Riak makes two attempts before shutting down. Values lower than 1s will be rounded up to 1s.
`search.solr.port`	`8093`	Integer	The port number to which Solr binds (note: binds on every interface)
`search.solr.jmx_port`	`8985`	Integer	The port number to which Solr JMX binds (note: binds on every interface)
`search.solr.jvm_options`	`-d64 -Xms1g -Xmx1g -XX:+UseStringCache -XX:+UseCompressedOops`	Java command-line arguments	The options to pass to the Solr JVM. Non-standard options, e.g. `-XX`, may not be portable across JVM implementations.
`search.queue.batch.minimum`	`1`	Integer	The minimum batch size, in number of Riak objects. Any batches that are smaller than this amount will not be immediately flushed to Solr, but are guaranteed to be flushed within the `search.queue.batch.flush_interval`.
`search.queue.batch.maximum`	`100`	Integer	The maximum batch size, in number of Riak objects. Any batches that are larger than this amount will be split: the first `search.queue.batch.maximum` objects will be flushed to Solr, and the remaining objects enqueued for that index will be retained until the next batch is delivered. This parameter ensures that at most `search.queue.batch.maximum` objects will be delivered into Solr in any given request.
`search.queue.batch.flush_interval`	`1000`	Integer with time units (`ms`, `s`, `m`, `h`)	The maximum delay between notifications to flush batches to Solr. This setting is used to increase or decrease the frequency of batch delivery into Solr, specifically for relatively low-volume input into Riak. It ensures that data is delivered into Solr within the specified interval, in accordance with the `search.queue.batch.minimum` and `search.queue.batch.maximum` settings; in particular, batches that are smaller than `search.queue.batch.minimum` will be delivered to Solr within this interval. This setting will generally have no effect on heavily loaded systems. You may use any time unit; the default is in milliseconds.
`search.queue.high_watermark`	`10000`	Integer	The queue high water mark. If the total number of queued messages in a Solrq worker instance exceeds this limit, then the calling vnode will be blocked until the total number falls below this limit. This parameter exercises flow control between Riak and the Riak Search batching subsystem if writes into Solr start to fall behind.
`search.queue.worker_count`	`10`	Integer	The number of Solr queue workers to instantiate. Solr queue workers are responsible for enqueuing objects for insertion or update into Solr. Increasing the number of Solrq workers distributes the queuing of objects and can lead to greater throughput under high load, potentially at the expense of smaller batch sizes.
`search.queue.helper_count`	`10`	Integer	The number of Solr queue helpers to instantiate. Solr queue helpers are responsible for delivering batches of data into Solr. Increasing the number of Solrq helpers will increase concurrent writes into Solr.
`search.index.error_threshold.failure_count`	`3`	Integer	The number of failures encountered while updating a search index within `search.index.error_threshold.failure_interval` before Riak will skip updates to that index.
`search.index.error_threshold.failure_interval`	`5000`	Milliseconds	The window of time during which `search.index.error_threshold.failure_count` failures will cause Riak to skip updates to a search index. If `search.index.error_threshold.failure_count` errors have occurred within this interval on a given search index, then Riak will skip updates to that index until the `search.index.error_threshold.reset_interval` has passed.
`search.index.error_threshold.reset_interval`	`30000`	Milliseconds	The amount of time it takes for updates to a given search index to resume/refresh once Riak has started skipping update operations.
`search.queue.high_watermark.purge_strategy`	`purge_one`	`purge_one`, `purge_index`, `purge_all`, or `off`	The strategy for purging queued items when the `search.queue.high_watermark` is hit. Each purging strategy removes items belonging to erroring indices (referred to in the code as indices whose fuses have blown) in order to get below the `search.queue.high_watermark`. `purge_one` removes the oldest item on the queue from an erroring index; `purge_index` removes all items associated with one random erroring index; `purge_all` removes all items associated with all erroring indices; `off` disables purging.
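
As an illustration of how several of these settings might be combined, the following riak.conf fragment is a minimal sketch. The setting names come from the table above; the batch, flush interval, and high water mark values are assumptions chosen for illustration, not tuning recommendations.

```riakconf
## Enable Search (required)
search = on

## Send between 10 and 200 objects to Solr per request (illustrative values)
search.queue.batch.minimum = 10
search.queue.batch.maximum = 200

## Deliver batches smaller than the minimum within 2 seconds (illustrative value)
search.queue.batch.flush_interval = 2s

## Block the calling vnode once more than 20000 messages are queued
## in a Solrq worker (illustrative value)
search.queue.high_watermark = 20000

## Skip updates to an index after 3 failures within 5000 ms,
## and resume after 30000 ms (these are the documented defaults)
search.index.error_threshold.failure_count = 3
search.index.error_threshold.failure_interval = 5000
search.index.error_threshold.reset_interval = 30000
```

Any setting left out of riak.conf keeps the default shown in the table, so in practice only the values that differ from the defaults need to appear.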

More on Solr

Solr JVM and Ports

Riak Search runs one Solr process per node to manage its indexing and search functionality. While the underlying project manages index distribution, node coverage for queries, active anti-entropy (AAE), and JVM process management, you should provide plenty of RAM and disk space for running both Riak and the JVM running Solr. We recommend a minimum of 6 GB of RAM per node.
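
If the Solr JVM needs more heap than the default 1 GB, the `search.solr.jvm_options` setting can be adjusted. The fragment below is a sketch that changes only the `-Xms` and `-Xmx` values of the documented default; the 2 GB figure is an assumption for illustration and should be sized against the RAM actually available to both Riak and Solr on the node.

```riakconf
## Documented default options, with the heap raised from 1g to 2g (illustrative)
search.solr.jvm_options = -d64 -Xms2g -Xmx2g -XX:+UseStringCache -XX:+UseCompressedOops
```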

Concerning ports, be sure to take the necessary security precautions to prevent exposing the extra Solr and JMX ports to the outside world.
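
Firewall rules are easier to write and audit when the ports are set explicitly rather than left implicit. The fragment below is a sketch that simply pins both ports to their documented defaults. Because both ports bind on every interface, access has to be restricted outside Riak, for example at the host or network firewall.

```riakconf
## Solr HTTP port and Solr JMX port (documented defaults)
search.solr.port = 8093
search.solr.jmx_port = 8985
```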

Solr for Operators

For further information on Solr monitoring, tuning, and performance, we recommend the following documents for getting started:

A wide variety of other documentation is available from the Solr OSS community.