From 0343882d58969cddc8d778f7ba26137811c9c453 Mon Sep 17 00:00:00 2001 From: ocelot-inc <pgulutzan@ocelot.ca> Date: Thu, 16 Jan 2014 17:05:00 -0700 Subject: [PATCH] lua-tutorial.xml exercise cjson and index iterator --- doc/user/lua-tutorial.xml | 240 +++++++++++++++++++++++++++++++++++++- 1 file changed, 239 insertions(+), 1 deletion(-) diff --git a/doc/user/lua-tutorial.xml b/doc/user/lua-tutorial.xml index bb30369dd7..07bf19e5b6 100644 --- a/doc/user/lua-tutorial.xml +++ b/doc/user/lua-tutorial.xml @@ -4,8 +4,10 @@ <title>Lua tutorial</title> +<section xml:id="lua-tutorial-insert"> +<title>Insert one million tuples with a Lua stored procedure</title> + <para> -<bridgehead renderas="sect4">Insert one million tuples with a Lua stored procedure</bridgehead> This is an exercise assignment: <quote>Insert one million tuples. Each tuple should have a constantly-increasing numeric primary-key field and a random alphabetic 10-character string field.</quote> @@ -481,6 +483,242 @@ tuples took 42 seconds. The host computer was a Toshiba laptop with a 2.2-GHz Intel Core Duo CPU. </para> +</section> + +<section xml:id="lua-tutorial-sum"> +<title>Sum a JSON field for all tuples</title> + +<para> +This is an exercise assignment: <quote>Assume that inside every tuple there +is a string formatted as JSON. Inside that string there is a JSON numeric +field. For each tuple, find the numeric field's value and add it to a +'sum' variable. At end, return the 'sum' variable.</quote> +</para> + +<para> +The purpose of the exercise is to show one way to read and process tuples. +This is harder than the first exercise because the function is useful. +A function which is useful, and therefore is going to be used more than +once by more than one person, has to be robust and understandable. +So here is the function. It's best to start by looking at each line -- +there are only twelve lines so it will only take a few minutes to guess what they do. 
+Then it will take somewhat longer to read the detailed +comments about the function, and follow the links wherever necessary. +Once again, to further enhance learning, type the statements +in with the tarantool client while reading along. At the very end there +is an example that shows how to make a few tuples and invoke the function. +</para> + +<programlisting language="lua"> +SETOPT DELIMITER='!' +lua function sum_json_field(field_name) + local v, t, sum, field_value, is_valid_json, lua_table --[[1]] + sum = 0 --[[2]] + v = box.space[0].index[0]:iterator(box.index.ALL) --[[3]] + for t in v do --[[4]] + is_valid_json, lua_table = pcall(box.cjson.decode, t[1]) --[[5]] + if is_valid_json then --[[6]] + field_value = lua_table[field_name] --[[7]] + if type(field_value) == "number" then sum = sum + field_value end --[[8]] + end --[[9]] + end --[[10]] + return sum --[[11]] + end! +SETOPT DELIMITER=''! +</programlisting> + +<para> +SPACES. There is one space after every comma (line 1, line 5). There is one space +before and one space after every operator such as '<code>=</code>' or '<code>==</code>' or '<code>+</code>' (line 2, +line 3, line 5, line 7, line 8). There are no spaces around parentheses. +Each indentation is two spaces (actually Tarantool developers often use four +spaces but we follow the unofficial <link xlink:href="http://lua-users.org/wiki/LuaStyleGuide">Lua Style Guide</link> here). +Indentation starts within a function, and within every block that is introduced +by "<code>for</code>" or "<code>if</code>", and ends when the block ends with "<code>end</code>" (lines 4 to 10, lines 6 to 9). +</para> + +<para> +COMMENTS. Every comment begins with "<code>--[[</code>" and ends with "<code>]]</code>". Although this example uses comments to +indicate line numbers, the normal practice is to put comments where the +meaning of the code would not be clear by merely looking at the code. +</para> + +<para> +LINE 1: WHY "LOCAL". 
This line declares all the variables that will be used +in the function. Actually it's not necessary to declare all variables at the start, +and in a long function it would be better to declare variables just before using +them. In fact it's not even necessary to declare variables at all, but an +undeclared variable is "global". That's not desirable for any of the variables +that are declared in line 1, because all of them are for use only within the +function. +</para> + +<para> +LINE 1: NAMES. Single-letter variable names like '<code>v</code>' are okay when they're +strictly for use as an iterator -- '<code>v</code>' is going to be the thing that goes +up in the "<code>for t in v do</code>" statement in line 4. Terse names like '<code>sum</code>' +are okay for local variables when there's only one sum and the name is +not an abbreviation. The prefix "is_" in the name "<code>is_valid_json</code>" is +there because the variable will get a Boolean (true/false) value and +will be true only for a string that "is valid [according to] JSON [format rules]". +</para> + +<para> +LINE 2: INITIALIZING. The only variable that needs initializing is <code>sum</code>, which +must start at zero, so line 2 is "<code>sum = 0</code>". It's easier to do initialization +on the declaration line, that is, we could have said "<code>local sum = 0</code>". We +chose to put it on a separate line to make sure that it's visible. +</para> + +<para> +LINE 3: WHY "INDEX ITERATOR". Our job is to go through all the rows and there are two ways +to do it: with <olink targetptr="box.select_range">box.select_range()</olink> or with +<olink targetptr="box.index.iterator">index[].iterator</olink>. We preferred +index[].iterator because it works regardless of the index type, that is, +it works with HASH, TREE, and BITSET indexes. +</para> + +<para> +LINE 3: MEANING. The value zero is hard-coded so this will only work for space[0] +and index[0] -- we're making some hopeful assumptions here. 
The meaning is "variable <code>v</code> gets +the iterator for the primary index of the first space". +</para> + +<para> +LINE 4: START THE MAIN LOOP. Everything inside this "<code>for</code>" loop will be repeated +as long as there is another index key. A tuple is fetched and can be referenced +with variable <code>t</code>. +</para> + +<para> +LINE 5: WHY "PCALL". If we simply said "<code>lua_table = box.cjson.decode(t[1])</code>", +then the function would abort with an error if it encountered something wrong +with the JSON string -- a missing colon, for example. By putting the function +inside "<code>pcall</code>" (<link xlink:href="http://www.lua.org/pil/8.4.html">protected call</link>), we're saying: we want to intercept that sort +of error, so if there's a problem just set <code>is_valid_json = false</code> and we +will know what to do about it later. +</para> + +<para> +LINE 5: MEANING. The function is <olink targetptr="box.cjson">box.cjson.decode</olink> which means decode a JSON +string, and the parameter is <code>t[1]</code> which is a reference to a JSON string. +Once again there's a bit of hard coding here: we're assuming that the second +field in the tuple is where the JSON string was inserted. For example, we're assuming a tuple looks like <programlisting>field[0]: 444 +field[1]: '{"Hello": "world", "Quantity": 15}' +</programlisting>meaning that the tuple's first field, the primary key field, is a number +while the tuple's second field, the JSON string, is a string. Thus the +entire statement means "decode <code>t[1]</code> (the tuple's second field) as a JSON +string; if there's an error set <code>is_valid_json = false</code>; if there's no error +set <code>is_valid_json = true</code> and set <code>lua_table</code> = a Lua table which has the +decoded string". +</para> + +<para> +LINE 6. 
This "<code>if</code>" statement means "if the <code>box.cjson.decode</code> function failed, +don't execute the next indented lines", so <code>sum</code> will be unchanged if +<code>box.cjson.decode</code> failed. Although "<code>if is_valid_json == true</code>" would be clearer, the +usual style is to say "<code>if is_valid_json</code>" and let "<code>== true</code>" be assumed. +</para> + +<para> +LINE 7. At last we are ready to get the JSON field value from the Lua +table that came from the JSON string. +The value in <code>field_name</code>, which is the parameter for the whole function, +must be a name of a JSON field. For example, inside the JSON string +'{"Hello": "world", "Quantity": 15}', there are two JSON fields: "Hello" +and "Quantity". If the whole function is invoked with <code>sum_json_field("Quantity")</code>, +then <code>field_value = lua_table[field_name]</code> is effectively the same as +<code>field_value = lua_table["Quantity"]</code> or even <code>field_value = lua_table.Quantity</code>. +Those are just three different ways of saying: for the Quantity field +in the Lua table, get the value and put it in variable <code>field_value</code>. +</para> + +<para> +LINE 8: WHY "IF". Suppose that the JSON string is well formed but the +JSON field is not a number, or is missing. In that case, the function +would be aborted when there was an attempt to add it to the sum. +By first checking <code>type(field_value) == "number"</code>, we avoid that abortion. +Again, as in line 5, this is slightly paranoid -- anyone who knows +that the database is in perfect shape can skip this kind of thing. +Incidentally the "<code>if ... end</code>" statement is so short that it fits on +a single line, which is acceptable but optional practice. +</para> + +<para> +LINE 8: MEANING. The meat, the whole reason for the function's existence, +is in the words "<code>sum = sum + field_value</code>". 
This addition of <code>field_value</code> +to <code>sum</code> will happen for every tuple, provided the field is there and is +numeric. +</para> + +<para> +LINE 9. This "<code>end</code>" statement matches the "<code>if is_valid_json</code>" statement +in line 6. +</para> + +<para> +LINE 10. This "<code>end</code>" statement matches the "<code>for t in v do</code>" statement +in line 4. The effect is that another iteration of the loop will take +place, unless there are no more tuples. +</para> + +<para> +LINE 11: This is after the end of the "<code>for t in v do</code>" loop. Return <code>sum</code> to the caller. +This effectively ends the execution of the whole function, so all the +local variables are destroyed and the function's caller gets the result. +</para> + +<para> +LINE 12: This "<code>end</code>" statement matches the start of the function. +</para> + +<para> +And the function is complete. Time to test it. +Starting with an empty database, defined the same way as the +sandbox database that was introduced in +<olink +targetptr="getting-started-start-stop"><quote>Starting Tarantool and making your first database</quote></olink>, +add some tuples where the first field is a number and the second field is a string. +</para> +<programlisting> +INSERT INTO t0 VALUES (444,'{"Item": "widget", "Quantity": 15}') +INSERT INTO t0 VALUES (445,'{"Item": "widget", "Quantity": 7}') +INSERT INTO t0 VALUES (446,'{"Item": "golf club", "Quantity": "sunshine"}') +INSERT INTO t0 VALUES (447,'{"Item": "waffle iron", "Quantit": 3}') +</programlisting> +<para> +Since this is a test, there are deliberate errors. The "golf club" and +the "waffle iron" do not have numeric Quantity fields, so must be ignored. +Therefore the real sum of the Quantity field in the JSON strings should be: +15 + 7 = 22. +</para> + +<para> +Invoke the function with either <code>CALL sum_json_field("Quantity")</code> or +<code>lua sum_json_field("Quantity")</code>. 
+<programlisting language="lua"> +<prompt>localhost></prompt> <userinput>lua sum_json_field("Quantity")</userinput> +--- + - 22 +... +</programlisting> +</para> + +<para> +It works. We'll just leave, as exercises for future improvement, the possibility +that the "hard coding" assumptions could be removed, that there might have to be +an overflow check if some field values are huge, and that the function should +contain a "yield" instruction if the count of tuples is huge. +</para> + +<para> +What has been shown is that a 12-line Lua function can scan a database and +process JSON strings, in a way that's useful, robust, and -- now that +this tutorial exercise is over -- understandable. +</para> + +</section> + </appendix> <!-- -- GitLab