Secondary Indexes
Secondary indexes, commonly called “2i,” are a way to tag objects and then
query by those tags. They require the memory or leveldb backend. They also
work with the multi backend when it is configured to use memory or leveldb
for the objects you want to index.
Check out the Riak documentation on using secondary indexes and a few notes about the 2i implementation.
tl;dr
Tagging and simple querying:
# #indexes is a hash of arrays
# keys are postfixed with _bin for binary/string indexes, _int for integers
# values are arrays
cobb_salad.indexes['ingredients_bin'] = %w{lettuce tomato bacon egg chives}
cobb_salad.indexes['calories_int'] = [220]
cobb_salad.store
# integer indexes can be queried for match or range
bucket.get_index 'calories_int', 220 #=> ['cobb_salad']
bucket.get_index 'calories_int', (0..300) #=> ['cobb_salad']
# bin indexes can be queried for match or range too
bucket.get_index 'ingredients_bin', 'lettuce' #=> ['cobb_salad']
bucket.get_index 'ingredients_bin', 'tomata'..'tomatz' #=> ['cobb_salad']
Paginated queries:
page_1 = bucket.get_index 'ingredients_bin', 'lettuce', max_results: 5
page_1.length #=> 5
page_1.continuation #=> "g2gCbQAAA="
page_2 = bucket.get_index('ingredients_bin', 'lettuce',
                          max_results: 5,
                          continuation: page_1.continuation)
Tagging
Each RObject has an indexes accessor that’s a Hash of String keys to
Set values. Each key must end with a suffix naming its index type:
_bin for binary/String indexes, or _int for Integer indexes. The values
must be a set of the appropriate index members. One object can have multiple
keys in the same index.
Indexes are not saved until the entire object is stored.
# allow finding this salad by any of its ingredients
cobb_salad.indexes['ingredients_bin'] = %w{lettuce tomato bacon egg chives}
# allow finding this salad by how many calories it has per serving
cobb_salad.indexes['calories_int'] = [220]
# actually store the indexes
cobb_salad.store
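The naming and typing rules above can be checked before storing an object. The following is a minimal sketch in plain Ruby: valid_index_entry? is a hypothetical helper, not part of the Riak client, and the indexes hash is a stand-in for an RObject's indexes accessor.

```ruby
require 'set'

# Hypothetical helper, not part of the Riak client: checks one index entry
# against the rules above (_bin holds Strings, _int holds Integers).
def valid_index_entry?(name, values)
  case name
  when /_bin\z/ then values.all? { |v| v.is_a?(String) }
  when /_int\z/ then values.all? { |v| v.is_a?(Integer) }
  else false # no recognised type suffix
  end
end

# Stand-in for cobb_salad.indexes
indexes = {
  'ingredients_bin' => Set.new(%w[lettuce tomato bacon egg chives]),
  'calories_int'    => Set.new([220])
}

indexes.each do |name, values|
  puts "#{name}: #{valid_index_entry?(name, values) ? 'ok' : 'invalid'}"
end
```

A check like this is only a convenience for catching mistakes early, such as a calories_int index that was accidentally given the String '220'.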
Tagging and Conflict Resolution
The indexes hash is actually on the RContent object. You can merge or
otherwise process conflicting indexes during conflict resolution:
if salad.conflict?
  salad.siblings.inject do |merged_salad, current_salad|
    # merging the salad data is left as an exercise for the reader
    merged_salad.indexes['ingredients_bin'] = (
      merged_salad.indexes['ingredients_bin'] +
      current_salad.indexes['ingredients_bin']
    ).uniq
    next merged_salad
  end
end
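If siblings carry several indexes, the same idea extends to a union over every index name. This is a sketch in plain Ruby under stated assumptions: Sibling is a stand-in Struct that models only the indexes accessor, and merge_indexes is a hypothetical helper rather than a client method.

```ruby
require 'set'

# Stand-in for a sibling's content; only the indexes accessor is modelled.
Sibling = Struct.new(:indexes)

# Hypothetical helper: union the values of every index across all siblings.
def merge_indexes(siblings)
  merged = Hash.new { |hash, name| hash[name] = Set.new }
  siblings.each do |sibling|
    sibling.indexes.each { |name, values| merged[name].merge(values) }
  end
  merged
end

a = Sibling.new({ 'ingredients_bin' => Set.new(%w[lettuce tomato]),
                  'calories_int'    => Set.new([220]) })
b = Sibling.new({ 'ingredients_bin' => Set.new(%w[tomato bacon]) })

merged = merge_indexes([a, b])
merged['ingredients_bin'].to_a.sort #=> ["bacon", "lettuce", "tomato"]
merged['calories_int'].to_a         #=> [220]
```

Taking the union is one possible policy. Whether it is the right one depends on the application, for example whether a removed ingredient should stay removed.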
Querying
There are two different Ruby client APIs for querying secondary indexes: directly
on the bucket, or through a SecondaryIndex object. Both use the same Riak
server API; they just provide different levels of convenience depending on how
complex your needs are.
Querying on the Bucket
Use the Bucket#get_index method for straightforward 2i queries. It returns a
Riak::IndexCollection instance, which is a subclass of Array with a few
extra accessors and methods for results.
You can query for a scalar or a range, of either integers or strings:
c = bucket.get_index 'calories_int', 220
c = bucket.get_index 'calories_int', 200..240
c = bucket.get_index 'ingredients_bin', 'tomato'
c = bucket.get_index 'ingredients_bin', 'tomata'..'tomatz'
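The 'tomata'..'tomatz' range works as a rough prefix match because string terms are ordered. The sketch below uses Ruby's own String ordering as a model, on the assumption that Riak compares binary index terms in a similar byte-wise way; it does not talk to Riak.

```ruby
range = 'tomata'..'tomatz'

# Range#cover? compares the term with both endpoints using <=>.
%w[tomat tomato tomatoes tomatz tomatzz potato].each do |term|
  puts format('%-9s %s', term, range.cover?(term) ? 'matches' : 'no match')
end
```

Under that model, 'tomato' and 'tomatoes' fall inside the range, while 'tomat' sorts before the lower bound and 'tomatzz' sorts after the upper one.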
Bucket#get_index takes other options too:
- max_results: controls how many results Riak will return
- continuation: returned from a paginated query to allow access to consecutive pages
- return_terms: include matched index terms in the IndexCollection results
Querying with a SecondaryIndex object
The Riak::SecondaryIndex object is constructed with:
- Bucket instance
- index name (e.g. ingredients_bin)
- query (scalar or range)
- options hash (optional)
q = Riak::SecondaryIndex.new bucket, 'calories_int', 220
q = Riak::SecondaryIndex.new bucket, 'calories_int', 200..240
q = Riak::SecondaryIndex.new bucket, 'ingredients_bin', 'tomato'
q = Riak::SecondaryIndex.new bucket, 'ingredients_bin', 'tomata'..'tomatz'
Just like Bucket#get_index, Riak::SecondaryIndex.new takes options:
- max_results: control how many results are returned from Riak
- continuation: opaque string that provides access to additional pages of results
- return_terms: return a hash of keys to terms they matched
Queries are lazy: they’re not sent to the server until absolutely necessary.
Getting a Collection of Keys or Values
Simply ask a SecondaryIndex instance for keys and it will return an
IndexCollection:
q.keys #=> an IndexCollection
The collection is memoized: the first request makes a round trip to Riak, and later requests return the cached result.
If you want to materialize those keys into values, invoking the #values
method will perform a multi-threaded multi-get to load them for you:
q.values #=> an Array of RObjects
Streaming Keys
Performing a large enough query can take some time. The Riak node handling the query has to sort and collate the results before sending them over the wire en masse. Performing a streaming query obviates this: the Riak node will return chunks of results as they become available.
Pass a block to the keys method during its first invocation to perform a
streaming query:
q.keys do |key|
  puts "The key is #{key}"
end
Pagination
When a next page is available, calling the next_page method on a
SecondaryIndex instance will return a new instance for the next page.
page_1 = Riak::SecondaryIndex.new(bucket,
                                  'ingredients_bin',
                                  'lettuce',
                                  max_results: 5)
page_2 = page_1.next_page
page_3 = page_2.next_page
When a next page is not available, calling the next_page method raises an
error.
The IndexCollection Class
Bucket#get_index and Riak::SecondaryIndex#keys both return IndexCollection
instances. These are simply Arrays of keys with a few extra methods.
- continuation: an opaque String used for pagination. If it’s not present, there is no next page.
- with_terms: a Hash of keys to the index value they matched against. This can be used with a range query to materialize a bit of result without requiring a full key load.
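Because a missing continuation means there is no next page, a loop can walk every page of a Bucket#get_index query. The following is a sketch under stated assumptions: all_index_keys is a hypothetical helper, and FakePage and FakeBucket are stand-ins that mimic only the call shape shown earlier so the example runs without a Riak node.

```ruby
# Stand-in for IndexCollection: an Array with a continuation accessor.
class FakePage < Array
  attr_accessor :continuation
end

# Stand-in for a bucket: slices a fixed key list into pages and uses the
# offset of the next page as its (opaque to the caller) continuation.
class FakeBucket
  def initialize(keys)
    @keys = keys
  end

  def get_index(_index, _query, options = {})
    start = options[:continuation].to_i # nil.to_i is 0 for the first page
    size  = options.fetch(:max_results)
    page  = FakePage.new(@keys[start, size] || [])
    following = start + size
    page.continuation = following < @keys.length ? following.to_s : nil
    page
  end
end

# Hypothetical helper: collect the keys from every page of a 2i query.
def all_index_keys(bucket, index, query, page_size)
  keys = []
  continuation = nil
  loop do
    options = { max_results: page_size }
    options[:continuation] = continuation if continuation
    page = bucket.get_index(index, query, options)
    keys.concat(page)
    continuation = page.continuation
    break if continuation.nil? # no continuation means no next page
  end
  keys
end

bucket = FakeBucket.new(%w[cobb caesar greek nicoise waldorf])
all_index_keys(bucket, 'ingredients_bin', 'lettuce', 2)
#=> ["cobb", "caesar", "greek", "nicoise", "waldorf"]
```

Against a real bucket only the helper would be needed. Collecting every key in memory defeats the purpose of paging for very large result sets, so processing each page as it arrives may be the better design there.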