From 0343882d58969cddc8d778f7ba26137811c9c453 Mon Sep 17 00:00:00 2001
From: ocelot-inc <pgulutzan@ocelot.ca>
Date: Thu, 16 Jan 2014 17:05:00 -0700
Subject: [PATCH] lua-tutorial.xml exercise cjson and index iterator

---
 doc/user/lua-tutorial.xml | 240 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 239 insertions(+), 1 deletion(-)

diff --git a/doc/user/lua-tutorial.xml b/doc/user/lua-tutorial.xml
index bb30369dd7..07bf19e5b6 100644
--- a/doc/user/lua-tutorial.xml
+++ b/doc/user/lua-tutorial.xml
@@ -4,8 +4,10 @@
 
 <title>Lua tutorial</title>
 
+<section xml:id="lua-tutorial-insert">
+<title>Insert one million tuples with a Lua stored procedure</title>
+
 <para>
-<bridgehead renderas="sect4">Insert one million tuples with a Lua stored procedure</bridgehead>
 This is an exercise assignment: <quote>Insert one million tuples.
 Each tuple should have a constantly-increasing numeric primary-key field
 and a random alphabetic 10-character string field.</quote>
@@ -481,6 +483,242 @@ tuples took 42 seconds. The host computer was a Toshiba
 laptop with a 2.2-GHz Intel Core Duo CPU.
 </para>
 
+</section>
+
+<section xml:id="lua-tutorial-sum">
+<title>Sum a JSON field for all tuples</title>
+
+<para>
+This is an exercise assignment: <quote>Assume that inside every tuple there
+is a string formatted as JSON. Inside that string there is a JSON numeric
+field. For each tuple, find the numeric field's value and add it to a
+'sum' variable. At end, return the 'sum' variable.</quote>
+</para>
+
+<para>
+The purpose of the exercise is to show one way to read and process tuples.
+This is harder than the first exercise because this function is meant to be useful.
+A function which is useful, and therefore is going to be used more than
+once by more than one person, has to be robust and understandable.
+So here is the function. It's best to start by looking at each line --
+there are only twelve lines so it will only take a few minutes to guess what they do.
+Then it will take somewhat longer to read the detailed
+comments about the function, and follow the links wherever necessary.
+Once again, to get the most out of the exercise, type the statements
+into the tarantool client while reading along. At the very end there
+is an example that shows how to make a few tuples and invoke the function.
+</para>
+
+<programlisting language="lua">
+SETOPT DELIMITER='!'
+lua function sum_json_field(field_name)
+  local v, t, sum, field_value, is_valid_json, lua_table                --[[1]]
+  sum = 0                                                               --[[2]]
+  v = box.space[0].index[0]:iterator(box.index.ALL)                     --[[3]]
+  for t in v do                                                         --[[4]]
+    is_valid_json, lua_table = pcall(box.cjson.decode, t[1])            --[[5]]
+    if is_valid_json then                                               --[[6]]
+      field_value = lua_table[field_name]                               --[[7]]
+      if type(field_value) == "number" then sum = sum + field_value end --[[8]]
+    end                                                                 --[[9]]
+  end                                                                   --[[10]]
+  return sum                                                            --[[11]]
+  end!
+SETOPT DELIMITER=''!
+</programlisting>
+
+<para>
+SPACES. There is one space after every comma (line 1, line 5). There is one space
+before and one space after every operator such as '<code>=</code>' or '<code>==</code>' or '<code>+</code>' (line 2,
+line 3, line 5, line 7, line 8). There are no spaces just inside parentheses.
+Each indentation is two spaces (actually Tarantool developers often use four
+spaces but we follow the unofficial <link xlink:href="http://lua-users.org/wiki/LuaStyleGuide">Lua Style Guide</link> here).
+Indentation starts within a function, and within every block that is introduced
+by "<code>for</code>" or "<code>if</code>", and ends when the block ends with "<code>end</code>" (lines 4 to 10, lines 6 to 9).
+</para>
+
+<para>
+COMMENTS. Every comment begins with "<code>--[[</code>" and ends with "<code>]]</code>". Although this example uses comments to
+indicate line numbers, the normal practice is to add comments only where the
+meaning of the code would not be clear from merely looking at the code.
+</para>
+
+<para>
+LINE 1: WHY "LOCAL". This line declares all the variables that will be used
+in the function. Actually it's not necessary to declare all variables at the start,
+and in a long function it would be better to declare variables just before using
+them. In fact it's not even necessary to declare variables at all, but an
+undeclared variable is "global". That's not desirable for any of the variables
+that are declared in line 1, because all of them are for use only within the
+function.
+</para>
+
+<para>
+LINE 1: NAMES. Single-letter variable names like '<code>v</code>' are okay when they're
+strictly for use as an iterator -- '<code>v</code>' is going to be the iterator that is
+stepped through in the "<code>for t in v do</code>" statement in line 4. Terse names like '<code>sum</code>'
+are okay for local variables when there's only one sum and the name is
+not an abbreviation. The prefix "is_" in the name "<code>is_valid_json</code>" is
+there because the variable will get a Boolean (true/false) value and
+will be true only for a string that "is valid [according to] JSON [format rules]".
+</para>
+
+<para>
+LINE 2: INITIALIZING. The only variable that needs initializing is <code>sum</code>, which
+must start at zero, so line 2 is "<code>sum = 0</code>". It's easier to do initialization
+on the declaration line, that is, we could have said "<code>local sum = 0</code>". We
+chose to put it on a separate line to make sure that it's visible.
+</para>
+
+<para>
+LINE 3: WHY "INDEX ITERATOR". Our job is to go through all the rows and there are two ways
+to do it: with <olink targetptr="box.select_range">box.select_range()</olink> or with
+<olink targetptr="box.index.iterator">index[].iterator</olink>. We preferred
+index[].iterator because it works regardless of the index type, that is,
+it works with HASH, TREE, and BITSET indexes.
+</para>
+
+<para>
+LINE 3: MEANING. The value zero is hard-coded so this will only work for space[0]
+and index[0] -- we're making some hopeful assumptions here. The meaning is "variable <code>v</code> gets
+the iterator for the primary index of the first space".
+</para>
+
+<para>
+LINE 4: START THE MAIN LOOP. Everything inside this "<code>for</code>" loop will be repeated
+as long as there is another index key. On each iteration a tuple is fetched and can be referenced
+with variable <code>t</code>.
+</para>
+
+<para>
+LINE 5: WHY "PCALL". If we simply said "<code>lua_table = box.cjson.decode(t[1])</code>",
+then the function would abort with an error if it encountered something wrong
+with the JSON string -- a missing colon, for example. By putting the function
+inside "<code>pcall</code>" (<link xlink:href="http://www.lua.org/pil/8.4.html">protected call</link>), we're saying: we want to intercept that sort
+of error, so if there's a problem just set <code>is_valid_json = false</code> and we
+will know what to do about it later.
+</para>
+
+<para>
+LINE 5: MEANING. The function is <olink targetptr="box.cjson">box.cjson.decode</olink> which means decode a JSON
+string, and the parameter is <code>t[1]</code> which is a reference to a JSON string.
+Once again there's a bit of hard coding here: we're assuming that the second
+field in the tuple is where the JSON string was inserted. For example, we're assuming a tuple looks like <programlisting>field[0]: 444
+field[1]: '{"Hello": "world", "Quantity": 15}'
+</programlisting>meaning that the tuple's first field, the primary key field, is a number
+while the tuple's second field, the JSON string, is a string. Thus the
+entire statement means "decode <code>t[1]</code> (the tuple's second field) as a JSON
+string; if there's an error set <code>is_valid_json = false</code>; if there's no error
+set <code>is_valid_json = true</code> and set <code>lua_table</code> = a Lua table which has the
+decoded string".
+</para>
+
+<para>
+LINE 6. This "<code>if</code>" statement means "if the <code>box.cjson.decode</code> function failed,
+don't execute the next indented lines", so <code>sum</code> will be unchanged if
+<code>box.cjson.decode</code> failed. Although "<code>if is_valid_json == true</code>" would be clearer, the
+usual style is to say "<code>if is_valid_json</code>" and let "<code>== true</code>" be assumed.
+</para>
+
+<para>
+LINE 7. At last we are ready to get the JSON field value from the Lua
+table that came from the JSON string.
+The value in <code>field_name</code>, which is the parameter for the whole function,
+must be a name of a JSON field. For example, inside the JSON string
+'{"Hello": "world", "Quantity": 15}', there are two JSON fields: "Hello"
+and "Quantity". If the whole function is invoked with <code>sum_json_field("Quantity")</code>,
+then <code>field_value = lua_table[field_name]</code> is effectively the same as
+<code>field_value = lua_table["Quantity"]</code> or even <code>field_value = lua_table.Quantity</code>.
+Those are just three different ways of saying: for the Quantity field
+in the Lua table, get the value and put it in variable <code>field_value</code>.
+</para>
+
+<para>
+LINE 8: WHY "IF". Suppose that the JSON string is well formed but the
+JSON field is not a number, or is missing. In that case, the function
+would abort with an error when it attempted to add the value to the sum.
+By first checking <code>type(field_value) == "number"</code>, we avoid that error.
+Again, as in line 5, this is slightly paranoid -- anyone who knows
+that the database is in perfect shape can skip this kind of thing.
+Incidentally the "<code>if ... end</code>" statement is so short that it fits on
+a single line, which is acceptable but optional practice.
+</para>
+
+<para>
+LINE 8: MEANING. The meat, the whole reason for the function's existence,
+is in the words "<code>sum = sum + field_value</code>". This addition of <code>field_value</code>
+to <code>sum</code> will happen for every tuple, provided the field is there and is
+numeric.
+</para>
+
+<para>
+LINE 9. This "<code>end</code>" statement matches the "<code>if is_valid_json</code>" statement
+in line 6.
+</para>
+
+<para>
+LINE 10. This "<code>end</code>" statement matches the "<code>for t in v do</code>" statement
+in line 4. The effect is that another iteration of the loop will take
+place, unless there are no more tuples.
+</para>
+
+<para>
+LINE 11. This comes after the end of the "<code>for t in v do</code>" loop. It returns <code>sum</code> to the caller.
+This effectively ends the execution of the whole function, so all the
+local variables are destroyed and the function's caller gets the result.
+</para>
+
+<para>
+LINE 12. This "<code>end</code>" statement matches the start of the function.
+</para>
+
+<para>
+And the function is complete. Time to test it.
+Starting with an empty database, defined the same way as the
+sandbox database that was introduced in
+<olink
+targetptr="getting-started-start-stop"><quote>Starting Tarantool and making your first database</quote></olink>,
+add some tuples where the first field is a number and the second field is a string.
+</para>
+<programlisting>
+INSERT INTO t0 VALUES (444,'{"Item": "widget", "Quantity": 15}')
+INSERT INTO t0 VALUES (445,'{"Item": "widget", "Quantity": 7}')
+INSERT INTO t0 VALUES (446,'{"Item": "golf club", "Quantity": "sunshine"}')
+INSERT INTO t0 VALUES (447,'{"Item": "waffle iron", "Quantit": 3}')
+</programlisting>
+<para>
+Since this is a test, there are deliberate errors. The "golf club" tuple has a
+non-numeric Quantity and the "waffle iron" tuple misspells the field name, so both must be ignored.
+Therefore the real sum of the Quantity field in the JSON strings should be:
+15 + 7 = 22.
+</para>
+
+<para>
+Invoke the function with either <code>CALL sum_json_field("Quantity")</code> or
+<code>lua sum_json_field("Quantity")</code>.
+<programlisting language="lua">
+<prompt>localhost&gt;</prompt> <userinput>lua sum_json_field("Quantity")</userinput>
+---
+ - 22
+...
+</programlisting>
+</para>
+
+<para>
+It works. We'll just leave, as exercises for future improvement, the possibility
+that the "hard coding" assumptions could be removed, that there might have to be
+an overflow check if some field values are huge, and that the function should
+contain a "yield" instruction if the count of tuples is huge.
+</para>
+
+<para>
+What has been shown is that a 12-line Lua function can scan a database and
+process JSON strings, in a way that's useful, robust, and -- now that
+this tutorial exercise is over -- understandable.
+</para>
+
+</section>
+
 </appendix>
 
 <!--
-- 
GitLab