Secondary Indexes
Secondary indexes, commonly called “2i,” are a way to add and query specific tags on objects. They require the memory or leveldb backend. They also work with the multi backend, as long as it is configured to use memory or leveldb for the objects you want to index.
Check out the Riak documentation on using secondary indexes and a few notes about the 2i implementation.
tl;dr
Tagging and simple querying:
# #indexes is a hash of arrays
# keys are postfixed with _bin for binary/string indexes, _int for integers
# values are arrays
cobb_salad.indexes['ingredients_bin'] = %w{lettuce tomato bacon egg chives}
cobb_salad.indexes['calories_int'] = [220]
cobb_salad.store
# integer indexes can be queried for match or range
bucket.get_index 'calories_int', 220 #=> ['cobb_salad']
bucket.get_index 'calories_int', (0..300) #=> ['cobb_salad']
# bin indexes can be queried for match or range too
bucket.get_index 'ingredients_bin', 'lettuce' #=> ['cobb_salad']
bucket.get_index 'ingredients_bin', 'tomata'..'tomatz' #=> ['cobb_salad']
Paginated queries:
page_1 = bucket.get_index 'ingredients_bin', 'lettuce', max_results: 5
page_1.length #=> 5
page_1.continuation #=> "g2gCbQAAA="
page_2 = bucket.get_index('ingredients_bin', 'lettuce',
                          max_results: 5,
                          continuation: page_1.continuation)
Tagging
Each RObject has an indexes accessor that’s a Hash of String keys to Set values. Keys must end with an underscore and the type of index they are: _bin for binary/String indexes, or _int for Integer indexes. The values must be a set of the appropriate index members. One object can have multiple values in the same index.
Indexes are not saved until the entire object is stored.
# allow finding this salad by any of its ingredients
cobb_salad.indexes['ingredients_bin'] = %w{lettuce tomato bacon egg chives}
# allow finding this salad by how many calories it has per serving
cobb_salad.indexes['calories_int'] = [220]
# actually store the indexes
cobb_salad.store
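The Set behavior described above can be modeled in plain Ruby. This is only a sketch, not the client's implementation: new_index_hash is a hypothetical helper, and it assumes that a missing index name defaults to an empty Set. It runs without a Riak node.

```ruby
require 'set'

# Hypothetical stand-in for the indexes accessor: a Hash whose missing
# keys default to an empty Set. Assumption, not the client's real code.
def new_index_hash
  Hash.new { |hash, key| hash[key] = Set.new }
end

indexes = new_index_hash
indexes['ingredients_bin'] << 'lettuce'
indexes['ingredients_bin'] << 'tomato'
indexes['ingredients_bin'] << 'lettuce' # duplicate, the Set keeps one copy
indexes['calories_int'] << 220

puts indexes['ingredients_bin'].to_a.inspect
puts indexes['calories_int'].to_a.inspect
```

Because the values are sets, tagging an object twice with the same value is harmless.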
Tagging and Conflict Resolution
The indexes hash actually lives on the RContent object, so each sibling carries its own indexes. You can merge or otherwise process conflicting indexes during conflict resolution:
if salad.conflict?
  salad.siblings.inject do |merged_salad, current_salad|
    # merging the salad data is left as an exercise for the reader
    merged_salad.indexes['ingredients_bin'] = (
      merged_salad.indexes['ingredients_bin'] +
      current_salad.indexes['ingredients_bin']
    ).uniq
    next merged_salad
  end
end
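If you want to merge every index rather than one named index, a union over all siblings is one option. The following is a minimal sketch under stated assumptions: FakeSibling and merge_indexes are hypothetical names, and the stand-in models only an indexes accessor, so the code runs without the client or a Riak node.

```ruby
require 'set'

# Hypothetical stand-in for a sibling; only the indexes accessor is modeled.
class FakeSibling
  attr_reader :indexes

  def initialize(indexes)
    @indexes = indexes
  end
end

# Union the values of every index across all siblings into a Hash of Sets.
def merge_indexes(siblings)
  merged = Hash.new { |hash, key| hash[key] = Set.new }
  siblings.each do |sibling|
    sibling.indexes.each do |name, values|
      merged[name].merge(values) # Set#merge accepts any enumerable
    end
  end
  merged
end

first  = FakeSibling.new({ 'ingredients_bin' => Set['lettuce', 'egg'] })
second = FakeSibling.new({ 'ingredients_bin' => %w[egg bacon],
                           'calories_int'    => [220] })

merge_indexes([first, second]).each do |name, values|
  puts "#{name}: #{values.to_a.inspect}"
end
```

A union keeps every tag any sibling had. Whether that is the right policy depends on your data; an index value that was deliberately removed in one sibling will come back.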
Querying
There are two different Ruby client APIs for querying secondary indexes: directly on the bucket, or through a SecondaryIndex object. Both use the same Riak server API; they differ only in the level of convenience they provide, depending on how complex your needs are.
Querying on the Bucket
Use the Bucket#get_index method for straightforward 2i queries. It returns a Riak::IndexCollection instance, which is a subclass of Array with a few extra accessors and methods for results.
You can query for a scalar or a range, of either integers or strings:
c = bucket.get_index 'calories_int', 220
c = bucket.get_index 'calories_int', 200..240
c = bucket.get_index 'ingredients_bin', 'tomato'
c = bucket.get_index 'ingredients_bin', 'tomata'..'tomatz'
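To get a feel for which terms a range query would match, you can try the range locally. This sketch assumes Riak compares binary index terms lexicographically and treats both endpoints as inclusive, which is how Ruby's Range#cover? treats Strings. term_in_range? is a hypothetical helper, not part of the client.

```ruby
# Hypothetical helper: true when an index term falls inside a range query.
# Range#cover? compares the term with both endpoints using <=>.
def term_in_range?(term, range)
  range.cover?(term)
end

range = 'tomata'..'tomatz'
%w[tomato tomatillo tomat potato].each do |term|
  puts "#{term}: #{term_in_range?(term, range)}"
end

puts "220: #{term_in_range?(220, 200..240)}"
```

Note that 'tomatillo' falls inside 'tomata'..'tomatz' as well, so a string range behaves like a prefix match only when the endpoints are chosen carefully.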
Bucket#get_index takes other options too:
- max_results: controls how many results Riak will return
- continuation: returned from a paginated query to allow access to consecutive pages
- return_terms: include matched index terms in the IndexCollection results
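The max_results and continuation options combine into a simple paging loop. The sketch below uses hypothetical stand-ins (FakePage, FakeBucket) so it runs without a Riak node; they only mimic the get_index call shape and the continuation accessor described here. each_index_page is likewise an illustrative helper, and the fake continuation is just an offset, whereas a real one is an opaque string.

```ruby
# Hypothetical stand-in for an IndexCollection: an Array with a continuation.
class FakePage < Array
  attr_accessor :continuation
end

# Hypothetical stand-in for a bucket; ignores the index name and query.
class FakeBucket
  def initialize(keys)
    @keys = keys
  end

  def get_index(_index, _query, options = {})
    start = options[:continuation].to_i # nil.to_i is 0 for the first page
    limit = options.fetch(:max_results)
    page = FakePage.new(@keys[start, limit] || [])
    next_start = start + limit
    page.continuation = next_start.to_s if next_start < @keys.length
    page
  end
end

# Follow continuations until a page comes back without one.
def each_index_page(bucket, index, query, page_size)
  continuation = nil
  loop do
    options = { max_results: page_size }
    options[:continuation] = continuation if continuation
    page = bucket.get_index(index, query, options)
    yield page
    continuation = page.continuation
    break if continuation.nil?
  end
end

bucket = FakeBucket.new((1..12).map { |n| "salad_#{n}" })
each_index_page(bucket, 'ingredients_bin', 'lettuce', 5) do |page|
  puts "page with #{page.length} keys"
end
```

With a real bucket, only each_index_page would be needed; the loop stops at the first page that has no continuation.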
Querying with a SecondaryIndex object
The Riak::SecondaryIndex object is constructed with:
- Bucket instance
- index name (e.g. ingredients_bin)
- query (scalar or range)
- options hash (optional)
q = Riak::SecondaryIndex.new bucket, 'calories_int', 220
q = Riak::SecondaryIndex.new bucket, 'calories_int', 200..240
q = Riak::SecondaryIndex.new bucket, 'ingredients_bin', 'tomato'
q = Riak::SecondaryIndex.new bucket, 'ingredients_bin', 'tomata'..'tomatz'
Just like Bucket#get_index, Riak::SecondaryIndex.new takes options:
- max_results: control how many results are returned from Riak
- continuation: opaque string that provides access to additional pages of results
- return_terms: return a hash of keys to terms they matched
Queries are lazy: they’re not sent to the server until absolutely necessary.
Getting a Collection of Keys or Values
Simply ask a SecondaryIndex instance for keys and it will return an IndexCollection:
q.keys #=> an IndexCollection
The collection is memoized: the first request makes a round trip to Riak, and later requests return the cached result.
If you want to materialize those keys into values, invoking the #values method will perform a multi-threaded multi-get to load them for you:
q.values #=> an Array of RObjects
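The lazy, memoized behavior of keys can be pictured with a small model. This is a sketch only: FakeLazyQuery is a hypothetical class that counts simulated round trips, and it makes no claim about how the client implements its cache.

```ruby
# Hypothetical model of a lazy, memoized query. Nothing is fetched at
# construction time, and the fetch happens at most once.
class FakeLazyQuery
  attr_reader :round_trips

  def initialize(stored_keys)
    @stored_keys = stored_keys
    @round_trips = 0
  end

  # The first call fetches, later calls reuse the cached collection.
  def keys
    @keys ||= fetch_keys
  end

  private

  # Stands in for the network request to Riak.
  def fetch_keys
    @round_trips += 1
    @stored_keys.dup
  end
end

q = FakeLazyQuery.new(%w[cobb_salad caesar_salad])
puts "round trips after construction: #{q.round_trips}"
q.keys
q.keys
puts "round trips after two calls to keys: #{q.round_trips}"
```

One consequence of memoization is that a query object will not see keys indexed after its first fetch; build a new query to get fresh results.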
Streaming Keys
Performing a large enough query can take some time. The Riak node handling the query has to sort and collate the results before sending them over the wire en masse. Performing a streaming query obviates this: the Riak node will return chunks of results as they become available.
Pass a block to the keys method during its first invocation to perform a streaming query:
q.keys do |key|
  puts "The key is #{key}"
end
Pagination
When a next page is available, calling the next_page method on a SecondaryIndex instance will return a new instance for the next page.
page_1 = Riak::SecondaryIndex.new(bucket,
                                  'ingredients_bin',
                                  'lettuce',
                                  max_results: 5)
page_2 = page_1.next_page
page_3 = page_2.next_page
When a next page is not available, calling the next_page method raises an error.
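Since next_page raises on the last page, a loop needs some way to know when to stop. The sketch below assumes a has_next_page? predicate on the query object; treat that method name as an assumption and check it against your client version. FakeIndexQuery and collect_all_keys are hypothetical names used so the example runs without a Riak node.

```ruby
# Hypothetical stand-in for a paginated query object.
class FakeIndexQuery
  def initialize(all_keys, page_size, offset = 0)
    @all_keys = all_keys
    @page_size = page_size
    @offset = offset
  end

  # Keys on the current page only.
  def keys
    @all_keys[@offset, @page_size] || []
  end

  # Assumed predicate: true when more results follow this page.
  def has_next_page?
    @offset + @page_size < @all_keys.length
  end

  # Returns a new query for the next page, or raises on the last page.
  def next_page
    raise 'no next page' unless has_next_page?
    self.class.new(@all_keys, @page_size, @offset + @page_size)
  end
end

# Walk every page and gather the keys into one Array.
def collect_all_keys(first_page)
  page = first_page
  collected = page.keys.dup
  while page.has_next_page?
    page = page.next_page
    collected.concat(page.keys)
  end
  collected
end

page_1 = FakeIndexQuery.new(%w[a b c d e f g], 3)
puts collect_all_keys(page_1).inspect
```

If the predicate is not available in your version, rescuing the error from next_page is an alternative, at the cost of using exceptions for control flow.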
The IndexCollection Class
Bucket#get_index and Riak::SecondaryIndex#keys both return IndexCollection instances. These are simply Arrays of keys with a few extra methods.
- continuation: an opaque String used for pagination. If it’s not present, there is no next page.
- with_terms: a Hash of keys to the index value they matched against. This can be used with a range query to see which term each result matched without loading the full objects.