Commit 0343882d authored by ocelot-inc

lua-tutorial.xml exercise cjson and index iterator

parent d3f1dc11
@@ -4,8 +4,10 @@
<title>Lua tutorial</title>
<section xml:id="lua-tutorial-insert">
<title>Insert one million tuples with a Lua stored procedure</title>
<para>
<bridgehead renderas="sect4">Insert one million tuples with a Lua stored procedure</bridgehead>
This is an exercise assignment: <quote>Insert one million tuples.
Each tuple should have a constantly-increasing numeric primary-key field
and a random alphabetic 10-character string field.</quote>
@@ -481,6 +483,242 @@ tuples took 42 seconds. The host computer was a Toshiba
laptop with a 2.2-GHz Intel Core Duo CPU.
</para>
</section>
<section xml:id="lua-tutorial-sum">
<title>Sum a JSON field for all tuples</title>
<para>
This is an exercise assignment: <quote>Assume that inside every tuple there
is a string formatted as JSON. Inside that string there is a JSON numeric
field. For each tuple, find the numeric field's value and add it to a
'sum' variable. At end, return the 'sum' variable.</quote>
</para>
<para>
The purpose of the exercise is to show one way to read and process tuples.
This is harder than the first exercise because the function is useful.
A function which is useful, and therefore is going to be used more than
once by more than one person, has to be robust and understandable.
So here is the function. It's best to start by looking at each line --
there are only twelve lines so it will only take a few minutes to guess what they do.
Then it will take somewhat longer to read the detailed
comments about the function, and follow the links wherever necessary.
Once again, to further enhance learning, type the statements
in with the tarantool client while reading along. At the very end there
is an example that shows how to make a few tuples and invoke the function.
</para>
<programlisting language="lua">
SETOPT DELIMITER='!'
lua function sum_json_field(field_name)
local v, t, sum, field_value, is_valid_json, lua_table --[[1]]
sum = 0 --[[2]]
v = box.space[0].index[0]:iterator(box.index.ALL) --[[3]]
for t in v do --[[4]]
is_valid_json, lua_table = pcall(box.cjson.decode, t[1]) --[[5]]
if is_valid_json then --[[6]]
field_value = lua_table[field_name] --[[7]]
if type(field_value) == "number" then sum = sum + field_value end --[[8]]
end --[[9]]
end --[[10]]
return sum --[[11]]
end!
SETOPT DELIMITER=''!
</programlisting>
<para>
SPACES. There is one space after every comma (line 1, line 5). There is one space
before and one space after every operator such as '<code>=</code>' or '<code>==</code>' or '<code>+</code>' (line 2,
line 3, line 5, line 7, line 8). There are no spaces around parentheses.
Each indentation is two spaces (actually Tarantool developers often use four
spaces but we follow the unofficial <link xlink:href="http://lua-users.org/wiki/LuaStyleGuide">Lua Style Guide</link> here).
Indentation starts within a function, and within every block that is introduced
by "<code>for</code>" or "<code>if</code>", and ends when the block ends with "<code>end</code>" (lines 4 to 10, lines 6 to 9).
</para>
<para>
COMMENTS. Every comment in this example begins with "<code>--[[</code>" and ends with "<code>]]</code>". Although this example uses comments to
indicate line numbers, the normal practice is to add comments where the
meaning of the code would not be clear from the code alone.
</para>
<para>
LINE 1: WHY "LOCAL". This line declares all the variables that will be used
in the function. Actually it's not necessary to declare all variables at the start,
and in a long function it would be better to declare variables just before using
them. In fact it's not even necessary to declare variables at all, but an
undeclared variable is "global". That's not desirable for any of the variables
that are declared in line 1, because all of them are for use only within the
function.
</para>
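<para>
A minimal sketch of the difference, in plain Lua outside any stored procedure.
The names <code>with_local</code>, <code>without_local</code>, <code>counter</code>
and <code>leaked</code> are hypothetical and exist only for this illustration.
</para>
<programlisting language="lua">
function with_local()
  local counter = 1   -- "local": visible only inside this function
  return counter
end

function without_local()
  leaked = 1          -- no "local": this assignment creates a global
  return leaked
end

with_local()
without_local()
print(counter)        -- nil, the local did not escape the function
print(leaked)         -- 1, the global is still there after the call
</programlisting>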
<para>
LINE 1: NAMES. Single-letter variable names like '<code>v</code>' are okay when they're
strictly for use as an iterator -- '<code>v</code>' is going to be the thing that goes
up in the "<code>for t in v do</code>" statement in line 4. Terse names like '<code>sum</code>'
are okay for local variables when there's only one sum and the name is
not an abbreviation. The prefix "is_" in the name "<code>is_valid_json</code>" is
there because the variable will get a Boolean (true/false) value and
will be true only for a string that "is valid [according to] JSON [format rules]".
</para>
<para>
LINE 2: INITIALIZING. The only variable that needs initializing is <code>sum</code>, which
must start at zero, so line 2 is "<code>sum = 0</code>". It's easier to do initialization
on the declaration line, that is, we could have said "<code>local sum = 0</code>". We
chose to put it on a separate line to make sure that it's visible.
</para>
<para>
LINE 3: WHY "INDEX ITERATOR". Our job is to go through all the rows and there are two ways
to do it: with <olink targetptr="box.select_range">box.select_range()</olink> or with
<olink targetptr="box.index.iterator">index[].iterator</olink>. We preferred
index[].iterator because it works regardless of the index type, that is,
it works with HASH, TREE, and BITSET indexes.
</para>
<para>
LINE 3: MEANING. The value zero is hard-coded so this will only work for space[0]
and index[0] -- we're making some hopeful assumptions here. The meaning is "variable <code>v</code> gets
the iterator for the primary index of the first space".
</para>
<para>
LINE 4: START THE MAIN LOOP. Everything inside this "<code>for</code>" loop will be repeated
as long as there is another index key. A tuple is fetched and can be referenced
with variable <code>t</code>.
</para>
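<para>
The loop form can be tried in plain Lua without a database. This is only a sketch:
<code>make_iterator</code> is a hypothetical stand-in for the index iterator, and it
assumes the real iterator behaves like an ordinary Lua iterator function, that is,
each call returns the next item and a <code>nil</code> result ends the loop.
</para>
<programlisting language="lua">
-- Hypothetical stand-in for an index iterator over a list of items.
function make_iterator(items)
  local i = 0
  return function()
    i = i + 1
    return items[i]   -- nil after the last item, which stops "for ... in"
  end
end

local v = make_iterator({"first", "second", "third"})
for t in v do
  print(t)            -- runs once per item, in order
end
</programlisting>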
<para>
LINE 5: WHY "PCALL". If we simply said "<code>lua_table = box.cjson.decode(t[1])</code>",
then the function would abort with an error if it encountered something wrong
with the JSON string -- a missing colon, for example. By putting the function
inside "<code>pcall</code>" (<link xlink:href="http://www.lua.org/pil/8.4.html">protected call</link>), we're saying: we want to intercept that sort
of error, so if there's a problem just set <code>is_valid_json = false</code> and we
will know what to do about it later.
</para>
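<para>
The behaviour of <code>pcall</code> can be seen in plain Lua with any function that
may raise an error. In this sketch <code>strict_tonumber</code> is a hypothetical
stand-in for the decoder, not part of Tarantool.
</para>
<programlisting language="lua">
-- Hypothetical stand-in for a decoder: raises an error on bad input.
function strict_tonumber(s)
  local n = tonumber(s)
  if n == nil then error("not a number: " .. tostring(s)) end
  return n
end

local ok, result = pcall(strict_tonumber, "15")
print(ok, result)     -- ok is true, result is 15

ok, result = pcall(strict_tonumber, "fifteen")
print(ok, result)     -- ok is false, result is the error message
</programlisting>
<para>
The first value returned by <code>pcall</code> plays the role of <code>is_valid_json</code>;
the second is either the function's result or the error message.
</para>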
<para>
LINE 5: MEANING. The function is <olink targetptr="box.cjson">box.cjson.decode</olink> which means decode a JSON
string, and the parameter is <code>t[1]</code> which is a reference to a JSON string.
Once again there's a bit of hard coding here: we're assuming that the second
field in the tuple is where the JSON string was inserted. For example, we're assuming a tuple looks like <programlisting>field[0]: 444
field[1]: '{"Hello": "world", "Quantity": 15}'
</programlisting>meaning that the tuple's first field, the primary key field, is a number
while the tuple's second field, the JSON string, is a string. Thus the
entire statement means "decode <code>t[1]</code> (the tuple's second field) as a JSON
string; if there's an error set <code>is_valid_json = false</code>; if there's no error
set <code>is_valid_json = true</code> and set <code>lua_table</code> = a Lua table which has the
decoded string".
</para>
<para>
LINE 6. This "<code>if</code>" statement means "if the <code>box.cjson.decode</code> function failed,
don't execute the next indented lines", so <code>sum</code> will be unchanged if
<code>box.cjson.decode</code> failed. Although "<code>if is_valid_json == true</code>" would be clearer, the
usual style is to say "<code>if is_valid_json</code>" and let "<code>== true</code>" be assumed.
</para>
<para>
LINE 7. At last we are ready to get the JSON field value from the Lua
table that came from the JSON string.
The value in <code>field_name</code>, which is the parameter for the whole function,
must be a name of a JSON field. For example, inside the JSON string
'{"Hello": "world", "Quantity": 15}', there are two JSON fields: "Hello"
and "Quantity". If the whole function is invoked with <code>sum_json_field("Quantity")</code>,
then <code>field_value = lua_table[field_name]</code> is effectively the same as
<code>field_value = lua_table["Quantity"]</code> or even <code>field_value = lua_table.Quantity</code>.
Those are just three different ways of saying: for the Quantity field
in the Lua table, get the value and put it in variable <code>field_value</code>.
</para>
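<para>
The three spellings can be compared in plain Lua. In this sketch the table is written
by hand as a stand-in for what the decoder would return for the example string.
</para>
<programlisting language="lua">
-- Hand-written stand-in for the decoded JSON string.
local lua_table = {Hello = "world", Quantity = 15}
local field_name = "Quantity"

print(lua_table[field_name])    -- 15, name taken from a variable
print(lua_table["Quantity"])    -- 15, name given as a string literal
print(lua_table.Quantity)       -- 15, shorthand for the line above
print(lua_table.Quantit)        -- nil, there is no such field
</programlisting>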
<para>
LINE 8: WHY "IF". Suppose that the JSON string is well formed but the
JSON field is not a number, or is missing. In that case, the function
would be aborted when there was an attempt to add it to the sum.
By first checking <code>type(field_value) == "number"</code>, we avoid that abort.
Again, as in line 5, this is slightly paranoid -- anyone who knows
that the database is in perfect shape can skip this kind of thing.
Incidentally the "<code>if ... end</code>" statement is so short that it fits on
a single line, which is acceptable but optional practice.
</para>
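<para>
A small plain-Lua sketch of the guard on its own. The helper name
<code>add_if_number</code> is hypothetical; it only isolates the check from line 8.
</para>
<programlisting language="lua">
-- Hypothetical helper: add field_value to sum only when it is a number.
function add_if_number(sum, field_value)
  if type(field_value) == "number" then sum = sum + field_value end
  return sum
end

print(add_if_number(22, 3))           -- 25
print(add_if_number(22, nil))         -- 22, a missing field is skipped
print(add_if_number(22, "sunshine"))  -- 22, a non-numeric field is skipped

-- Without the guard, adding nil raises an error.
local ok = pcall(function() return 22 + nil end)
print(ok)                             -- false
</programlisting>
<para>
One consequence of checking the type rather than just attempting the addition:
a numeric-looking string such as "7" is also skipped, even though Lua arithmetic
would otherwise convert it to a number.
</para>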
<para>
LINE 8: MEANING. The meat, the whole reason for the function's existence,
is in the words "<code>sum = sum + field_value</code>". This addition of <code>field_value</code>
to <code>sum</code> will happen for every tuple, provided the field is there and is
numeric.
</para>
<para>
LINE 9. This "<code>end</code>" statement matches the "<code>if is_valid_json</code>" statement
in line 6.
</para>
<para>
LINE 10. This "<code>end</code>" statement matches the "<code>for t in v do</code>" statement
in line 4. The effect is that another iteration of the loop will take
place, unless there are no more tuples.
</para>
<para>
LINE 11: This is after the end of the "<code>for t in v do</code>" loop. Return <code>sum</code> to the caller.
This effectively ends the execution of the whole function, so all the
local variables are destroyed and the function's caller gets the result.
</para>
<para>
LINE 12: This "<code>end</code>" statement matches the start of the function.
</para>
<para>
And the function is complete. Time to test it.
Starting with an empty database, defined the same way as the
sandbox database that was introduced in
<olink
targetptr="getting-started-start-stop"><quote>Starting Tarantool and making your first database</quote></olink>,
add some tuples where the first field is a number and the second field is a string.
</para>
<programlisting>
INSERT INTO t0 VALUES (444,'{"Item": "widget", "Quantity": 15}')
INSERT INTO t0 VALUES (445,'{"Item": "widget", "Quantity": 7}')
INSERT INTO t0 VALUES (446,'{"Item": "golf club", "Quantity": "sunshine"}')
INSERT INTO t0 VALUES (447,'{"Item": "waffle iron", "Quantit": 3}')
</programlisting>
<para>
Since this is a test, there are deliberate errors. The "golf club" and
the "waffle iron" do not have numeric Quantity fields, so they must be ignored.
Therefore the real sum of the Quantity field in the JSON strings should be:
15 + 7 = 22.
</para>
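<para>
The expected result can be checked in plain Lua before involving the database.
This is only a sketch: the tables are written by hand as stand-ins for the four
decoded JSON strings, and <code>sum_decoded</code> is a hypothetical helper that
repeats lines 7 and 8 of the function without the iterator or the decoder.
</para>
<programlisting language="lua">
-- Hand-written stand-ins for the four decoded JSON strings.
local decoded = {
  {Item = "widget", Quantity = 15},
  {Item = "widget", Quantity = 7},
  {Item = "golf club", Quantity = "sunshine"},  -- not a number
  {Item = "waffle iron", Quantit = 3},          -- misspelled field name
}

-- Hypothetical helper: sum one field over a list of decoded tables.
function sum_decoded(tables, field_name)
  local sum = 0
  for _, lua_table in ipairs(tables) do
    local field_value = lua_table[field_name]
    if type(field_value) == "number" then sum = sum + field_value end
  end
  return sum
end

print(sum_decoded(decoded, "Quantity"))         -- 22
</programlisting>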
<para>
Invoke the function with either <code>CALL sum_json_field("Quantity")</code> or
<code>lua sum_json_field("Quantity")</code>.
<programlisting language="lua">
<prompt>localhost&gt;</prompt> <userinput>lua sum_json_field("Quantity")</userinput>
---
- 22
...
</programlisting>
</para>
<para>
It works. We'll just leave, as exercises for future improvement, the possibility
that the "hard coding" assumptions could be removed, that there might have to be
an overflow check if some field values are huge, and that the function should
contain a "yield" instruction if the count of tuples is huge.
</para>
<para>
What has been shown is that a 12-line Lua function can scan a database and
process JSON strings, in a way that's useful, robust, and -- now that
this tutorial exercise is over -- understandable.
</para>
</section>
</appendix>
<!--
......