bloom: do not use tuple_common_key_parts when constructing tuple bloom
A tuple bloom filter is an array of bloom filters, each of which reflects lookups by all possible partial keys. To optimize the overall bloom filter size, we need to know how many unique elements there are for each partial key. To achieve that, we require the caller to pass the number of key parts that have been hashed for the given tuple. Here's how it looks in Vinyl:

    uint32_t hashed_parts = writer->last_stmt == NULL ? 0 :
            tuple_common_key_parts(stmt, writer->last_stmt, writer->key_def);
    tuple_bloom_builder_add(writer->bloom, stmt, writer->key_def, hashed_parts);

Actually, there's no need for such a requirement: instead, we can calculate the hash value for the given tuple, compare it with the hash of the tuple added last time, and add the new hash only if the two values differ. This should be accurate enough while allowing us to get rid of the cumbersome tuple_common_key_parts helper.

Note that such a check only works if tuples are added in the order defined by the key definition, but that already holds. Besides, one wouldn't be able to use tuple_common_key_parts either if it weren't true.

While we are at it, refresh the obsolete comment on tuple_bloom_builder.
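The "compare with the last added hash" idea can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: the names (sketch_builder, sketch_builder_add, SKETCH_MAX_PARTS), the use of plain uint32_t key parts, and the FNV-1a hash are all assumptions made for illustration. The real builder stores the hashes themselves, while this sketch only counts them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on the number of key parts, for illustration only. */
enum { SKETCH_MAX_PARTS = 4 };

/*
 * Simplified stand-in for a tuple bloom builder: for each partial key
 * (the first i + 1 key parts) it remembers the last hash added and
 * counts how many hashes were accepted as new.
 */
struct sketch_builder {
	uint32_t part_count;
	uint32_t last_hash[SKETCH_MAX_PARTS];
	size_t unique_count[SKETCH_MAX_PARTS];
};

void
sketch_builder_create(struct sketch_builder *b, uint32_t part_count)
{
	if (part_count > SKETCH_MAX_PARTS)
		part_count = SKETCH_MAX_PARTS;
	b->part_count = part_count;
	for (uint32_t i = 0; i < SKETCH_MAX_PARTS; i++) {
		b->last_hash[i] = 0;
		b->unique_count[i] = 0;
	}
}

/* Mix one 32-bit key part into a running FNV-1a hash, byte by byte. */
uint32_t
sketch_hash_part(uint32_t h, uint32_t value)
{
	for (int i = 0; i < 4; i++) {
		h ^= (value >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/*
 * Add a key. Keys must arrive sorted by the key definition, so equal
 * partial keys are always adjacent. The hash is computed incrementally:
 * after mixing in part i it is the hash of the partial key of i + 1 parts.
 * A hash equal to the previous one for the same partial key is skipped.
 */
void
sketch_builder_add(struct sketch_builder *b, const uint32_t *key)
{
	uint32_t h = 2166136261u; /* FNV-1a offset basis */
	for (uint32_t i = 0; i < b->part_count; i++) {
		h = sketch_hash_part(h, key[i]);
		if (b->unique_count[i] > 0 && b->last_hash[i] == h)
			continue; /* same partial key as last time */
		b->last_hash[i] = h;
		b->unique_count[i]++;
	}
}
```

Feeding the sorted two-part keys (1, 1), (1, 2), (1, 2), (2, 2) into this sketch counts two unique hashes for the one-part partial key and three for the full key. The caller no longer needs to say how many parts the new key shares with the previous one. The price is that two adjacent distinct keys whose hashes collide are counted once, which is the sense in which the commit message calls the approach "accurate enough".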
Changed files:
- src/box/key_def.h: 0 additions, 11 deletions
- src/box/tuple_bloom.c: 8 additions, 7 deletions
- src/box/tuple_bloom.h: 22 additions, 7 deletions
- src/box/tuple_compare.cc: 0 additions, 33 deletions
- src/box/vy_run.c: 2 additions, 8 deletions