
Add types to the distribution key

Previously, we had only the Cartridge runtime, where the tuple hash was calculated by:

  1. converting each column in the tuple to a string representation (important: integer 1 had the same representation as double 1.0 or string "1")
  2. concatenating these string representations into a single string
  3. calculating Murmur3 over the bytes of that string

Because of step 1, this algorithm didn't need to know the data types of the tuple columns, so distribution keys could store only the column positions.
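The three steps can be sketched in Python as follows. This is a minimal illustration, not the actual Cartridge code: it assumes the MurmurHash3 x86 32-bit variant with seed 0 (the description only says "Murmur3"), and `to_str`, `cartridge_bucket_hash` and `key_positions` are hypothetical names. Note that the distribution key is just a list of column positions, which is all this algorithm needs.

```python
from typing import Any, List, Sequence

MASK = 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit (assumed variant)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    nblocks = len(data) // 4
    # Body: mix 4-byte little-endian blocks.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK
        k = ((k << 15) | (k >> 17)) & MASK
        k = (k * c2) & MASK
        h ^= k
        h = ((h << 13) | (h >> 19)) & MASK
        h = (h * 5 + 0xE6546B64) & MASK
    # Tail: the remaining 1..3 bytes.
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & MASK
        k = ((k << 15) | (k >> 17)) & MASK
        k = (k * c2) & MASK
        h ^= k
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h


def to_str(value: Any) -> str:
    """Step 1: type-blind string form (hypothetical helper).

    Integral doubles are rendered without a fractional part so that
    integer 1, double 1.0 and string "1" all become "1".
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def cartridge_bucket_hash(tuple_: Sequence[Any], key_positions: List[int]) -> int:
    # The distribution key holds only column positions; no types needed.
    parts = [to_str(tuple_[pos]) for pos in key_positions]  # step 1
    joined = "".join(parts)                                 # step 2
    return murmur3_32(joined.encode("utf-8"))               # step 3
```

With this scheme `cartridge_bucket_hash((1, "a"), [0])`, `cartridge_bucket_hash(("1", "b"), [0])` and `cartridge_bucket_hash((1.0,), [0])` all produce the same value, because the key column is reduced to the string `"1"` before hashing.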

Now, in picodata, we are going to use the key_def module to calculate the hash, and values of different types will produce different hashes, i.e. hash(1 int) != hash(1 string). So the distribution key must now store the data type of each key column in addition to its position.
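A typed distribution key might look like the sketch below: each key part stores the column position together with its declared type, and the value is encoded according to that type before hashing, so integer 1 and string "1" end up with different hashes. The `KeyPart` class, the type names and the byte encoding are hypothetical, and `zlib.crc32` only stands in for whatever hash the key_def module actually computes.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class KeyPart:
    position: int  # column position in the tuple
    type: str      # declared column type (hypothetical names below)


def encode(value: Any, type_: str) -> bytes:
    """Type-aware encoding: a type tag followed by the value bytes."""
    if type_ == "integer":
        return b"i" + struct.pack("<q", value)
    if type_ == "double":
        return b"d" + struct.pack("<d", value)
    if type_ == "string":
        raw = value.encode("utf-8")
        return b"s" + struct.pack("<I", len(raw)) + raw
    raise ValueError(f"unsupported type: {type_}")


def typed_hash(tuple_: Sequence[Any], key: List[KeyPart]) -> int:
    # The type stored in each key part drives the encoding, so the
    # distribution key can no longer consist of positions alone.
    data = b"".join(encode(tuple_[p.position], p.type) for p in key)
    return zlib.crc32(data)  # stand-in for the real key_def hash
```

Here `typed_hash((1,), [KeyPart(0, "integer")])` and `typed_hash(("1",), [KeyPart(0, "string")])` hash different byte sequences, which is exactly why the type has to be kept in the distribution key.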

However, in Cartridge we should preserve the current behavior, hash(1 int) == hash(1 string) (thanks to the sharding function). Cartridge will be removed soon, and we don't want to add redundant complexity to the code. We also want to rely on the Cartridge tests for picodata, as picodata doesn't have test coverage at the moment.

Edited by Denis Smirnov