.. _tutorial:

Tutorial
========

To get our feet wet, we will develop a query to find functions the
arguments of which are directly of structure or class types.  Passing
such values in parameters may incur a potentially expensive copy, and
it might be desirable to audit these uses.  If you have a ``dwgrep``
distribution tarball handy, there's a file
``tests/nontrivial-types.o``, which you can use to play along with the
tutorial.  Or your can compile one yourself, this is the source that
we are using::

	struct foo {};

	int blah (struct foo f) {
	  return 0;
	}

	int main(int argc, char *argv[])
	{
	  return blah ((struct foo) {});
	}

First and foremost, a quick word about the :ref:`command line
interface <cli>`.  ``dwgrep`` is to some extent modeled after ``grep``
utility.  The following invokes ``dwgrep`` with an empty query on
aforementioned file::

	$ dwgrep '' ./tests/nontrivial-types.o
	<Dwarf "./tests/nontrivial-types.o">

Much like with ``grep``, you can change the order of arguments if you
denote which of them is query::

	$ dwgrep ./tests/nontrivial-types.o -e ''
	<Dwarf "./tests/nontrivial-types.o">

You can also use ``-c`` if you only care about the number of results,
not the results themselves::

	$ dwgrep -c ./tests/nontrivial-types.o -e ''
	1

The empty query is not very interesting, but it shows us that the
input stack actually contains one value: the Dwarf file that we gave
on the command line.  So let's look inside!

``entry``, ``unit`` --- DIE iteration
-------------------------------------

There are several functions for entering the graph, the most useful
one is likely ``entry``.  This takes [#takes]_ a Dwarf, and yields
[#yields]_ all DIE's in that Dwarf::

	$ dwgrep ./tests/nontrivial-types.o -e 'entry'
	[b]	compile_unit
		producer (strp)	GNU C 4.6.3 20120306 (Red Hat 4.6.3-2);
		[... more attributes ...]
	[2d]	structure_type
		name (string)	foo;
	[... more output ...]


.. [#takes] The word "takes" is generally used to mean that the
   function expects a stack, whose top value(s) fits the given
   description.  E.g. ``length`` takes a string or a sequence, ``add``
   takes two integers, strings or sequences, etc.

   Taken values are discarded from the stack, though typically the
   function in question pushes some other value(s) in their stead.

.. [#yields] This word is used to mean that the function produces
   stack(s) with the outlined property.  E.g. if ``length`` takes a
   string or a sequence, and yields their length, that means that the
   top value is popped, its length is determined, and a value
   corresponding to that length is pushed back.  Such stack is then
   sent to output.

Other words include ``unit``, which selects compilation and type
units, and ``abbrev`` which selects abbreviation units.

It is impractical to have to repeat the whole ``dwgrep`` command line
every time a code snippet needs to be shown.  In the following we will
use the following notation::

	$ 'unit'
	CU 0

The dollar at the beginning represents the command line prompt, and
then the query itself is given in single quotes.

If we don't care about query output (e.g. if it is trivial,
predictable or uninteresting), we will just list the query itself::

	entry


``?DW_TAG_foo``, ``?TAG_foo`` --- DIE tag assertion
---------------------------------------------------

The assertion ``?DW_TAG_foo`` holds [#holds]_ for DIE's the tag of
which is ``DW_TAG_foo``.

.. [#holds] The functions which either yield an unmodified input or
   nothing at all are called assertions, and are very common in Zwerg
   expressions.  We say that assertion holds if the values on top of
   stack (TOS) satisfy some property.  If the assertion holds, it
   yields an unchanged incoming stack.  If it doesn't hold, it yields
   nothing at all.  Assertions never modify the stack.

You can write ``?DW_TAG_foo`` also as ``?TAG_foo``, they do the same
thing.  In the following, that's how we will refer to this assertion.

In Zwerg, function pipelines are formed by simply placing functions
next to each other.  The stacks that the left function produces
becomes an input of the right function.

For example, to list all subprograms, one would use the following
expression::

	$ 'entry ?TAG_subprogram'
	[35]	subprogram
  		external (flag)	true;
  		name (strp)	blah;
  		[... more attributes ...]
	[6c]	subprogram
		external (flag)	true;
		name (strp)	main;
  		[... more attributes ...]


``?DW_AT_foo``, ``!DW_AT_foo`` --- attribute presence assertion
---------------------------------------------------------------

``?DW_AT_foo`` holds on DIE's that have an attribute ``DW_AT_foo``,
and on attributes that are ``DW_AT_foo``.  As with tags (and any
Dwarf-related assertions at all), you can shorten this to ``?AT_foo``.

``!DW_AT_foo`` similarly holds on DIE's that do NOT have an attribute
``DW_AT_foo``, or on attributes that are NOT ``DW_AT_foo``.

For assertions, concatenation happens to work as a logical and, so
for example, this is how we can get a list of all subprograms that
have a declaration attribute::

	entry ?TAG_subprogram ?AT_declaration

Similarly, to get subprograms that do not have declaration attribute,
we say::

  	entry ?TAG_subprogram !AT_declaration

``child`` --- child traversal
-----------------------------

``child`` takes a DIE and yields each of its children.  (Which is to
say: it yields nothing for child-less DIE's, or it can actually yield
a number of stacks if there are many DIE's.  Each of them will have
one of the children on TOS.)

Applied to DIE's in our running example, this will get us to formal
arguments of the selected subprograms::

	$ 'entry ?TAG_subprogram !AT_declaration child'
	[58]	formal_parameter
		name (string)	f;
  		[... more attributes ...]
	[8f]	formal_parameter
		name (strp)	argc;
  		[... more attributes ...]
	[9d]	formal_parameter
		name (strp)	argv;
  		[... more attributes ...]


If we want to make sure these children are actually formal
parameters::

	entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter

``@DW_AT_foo`` --- value of attribute ``DW_AT_foo``
---------------------------------------------------

This word is used for accessing values of attributes.  It always takes
a DIE, but what it yields varies by the attribute type.  It could be
another DIE, a string or a number, a sequence of other values, or
whatever value type is deemed best for representing a given piece of
Dwarf.

We could for example obtain names of the formal parameters selected
above::

	$ 'entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter @AT_name'
	f
	argc
	argv

This could be used e.g. to select a particular attribute--we'll see
later how to do this.

In is not an error to request value of attribute that a DIE doesn't
have.  In such case, ``@AT_*`` would simply not yield at all::

	$ dwgrep ./tests/aranges.o -c -e 'entry @AT_data_member_location'
	0

``@AT_*`` forms could actually also yield more than once.  For example
attributes of location expression types yield once for every covered
address range::

	$ dwgrep ./tests/aranges.o -c -e 'entry ?AT_location'
	1

	$ dwgrep ./tests/aranges.o -e 'entry @AT_location'
	0x10000..0x10010:[0:reg5]
	0x10010..0x1001a:[0:fbreg<-24>]

There's another use of this same feature: for attributes with
reference form, we get the effect of traversing over the edge rooted
at given attribute.  For example, we could get types of formal
parameters::

	$ 'entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter @AT_type'
	[2d]	structure_type
		name (string)	foo;
  		[... more attributes ...]
	[65]	base_type
		byte_size (data1)	4;
		encoding (data1)	DW_ATE_signed;
		name (string)	int;
	[ac]	pointer_type
		byte_size (data1)	8;
		type (ref4)	[b2];

That's quite a bit more useful--we could find out whether the formal
parameters have structure types::

	$ 'entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter
	   @AT_type ?TAG_structure_type'
	[2d]	structure_type
		name (string)	foo;
  		[... more attributes ...]

So that will let us know whether there are any offenders like that.
That's closer to being interesting, but not quite what we need either.
We would like to know about the subprograms themselves, that have this
property!

``?(EXPR)``, ``!(EXPR)`` --- Sub-expression assertions
------------------------------------------------------

Some Zwerg expressions are evaluated in what we call a sub-expression
context.  What happens in sub-expression context, stays there--the
stack effects of sub-expression computation never leak back to the
surrounding expression.

``?(EXPR)`` expression is one such case.  It asserts that *EXPR*
produces at least one element.  We can use it to get to DIE's that
have arguments that are structures::

	$ 'entry ?TAG_subprogram !AT_declaration
	   ?(child ?TAG_formal_parameter @AT_type ?TAG_structure_type)'
	[35]	subprogram
		external (flag)	true;
		name (strp)	blah;
		decl_file (data1)	/home/petr/proj/dwgrep/x.c;
		decl_line (data1)	3;
		prototyped (flag)	true;
		type (ref4)	[65];
		low_pc (addr)	0x10000;
		high_pc (addr)	0x1000b;
		frame_base (block1)	0..0xffffffffffffffff:[0:call_frame_cfa];
		sibling (ref4)	[65];

This asks whether, after going to types of children that are formal
parameters, we get a structure.  Because the initial two assertions
have no stack effects anyway, we might say the same thing thus::

	entry ?(?TAG_subprogram !AT_declaration
	        child ?TAG_formal_parameter @AT_type ?TAG_structure_type)'

The other sub-expression assertion, ``!(EXPR)``, holds if *EXPR*
produces no values at all.  E.g. to select child-less DIE's in some
query, we would say::

	some other query !(child)

``EXPR == EXPR``, ``EXPR != EXPR`` --- Infix assertions
-------------------------------------------------------

As you might well know, mere presence of ``DW_AT_declaration``
attribute doesn't tell use whether a DIE is a pure declaration.  We
can probably safely assume that when a compiler produces that
attribute, it will have a value of true (and a form of
``DW_FORM_flag_present``), so most of the time ``?AT_declaration``
(and ``!AT_declaration``) is all you need to write.  But if there are
grounds for suspicion that this is not so, or if we simply want to
shield ourselves from the possibility, we need to actually look at
``DW_AT_declaration``'s value.  So instead of ?AT_declaration, we
should be writing this::

	(@AT_declaration == true)

This intuitively-looking construct actually deserves a closer
attention.  Infix assertions are always evaluated in sub-expression
context.  The mode of operation is that each side is evaluated
separately with the same incoming stack.  Then if the assertion holds
for any pair of stacks that the two sides yield, the overall assertion
holds.  Zwerg has a full suite of these operators--``!=``, ``<``,
``<=``, etc.  There's also ``=~`` and ``!~`` for matching regular
expressions.

Importantly, infix assertions really are assertions.  If they hold,
they produce unchanged incoming stack, otherwise they produce nothing
at all.  Thus expressions such as ``((A > B) == (C > D))`` don't mean
what they seem to.  This one for example is just ``((A > B) (C >
D))``--i.e. two independent conditions.  But consider for example this
snippet::

  	((A > B) != (C > D))	# WRONG!

If the two ``>``'s hold, the expression reduces to ``!=``, or
inequality of two nops.  Such assertion thus simply never holds
[#alwaysfail]_.

.. [#alwaysfail] If for whatever reason you actually do need an
   assertion that never holds, a simple one is ``!()``.

Precedence of comparison operators is lower than that of
concatenation, so you can write a couple words on each side of the
operator.  For example, to look for DIE's where one of the location
expression opcodes is ``DW_OP_addr``, you could say::

	entry (@AT_location elem label == DW_OP_addr)

Due to this precedence setting, comparisons are typically enclosed in
parens (as in the example), so that they don't force too much of your
computation into sub-expression context.  The precedence is however
above ``,`` and ``||`` that are introduced further, so those need to
be parenthesized further if they should be a comparison operand.

For completeness sake, to check that a flag is false, you would use
the following form::

	!(@AT_declaration == true)

If there's no ``DW_AT_declaration`` at a given DIE, the left hand side
of the inner expression doesn't yield anything, and the outer ``!()``
succeeds--which is what we want, because flag absence is an implicit
false value.  If the attribute is present, then the ``!()``
effectively works as a logical negation.  Contrast this with the
following::

	(@AT_declaration == false)	# WRONG!

You would be probably hard pressed to even find a Dwarf file that
actually encodes false flags like this, so the above is useless.

Back to the problem at hand--besides DW_TAG_structure_type, we care
about ``DW_TAG_class_type`` as well!  We can express "and" easily
simply by juxtaposing the assertions, but we would like a way of
expressing "or" as well.

``EXPR, EXPR`` --- ALT-lists
----------------------------

An expression like ``EXPR₁, EXPR₂, ...`` evaluates all constituent
*EXPRₙ*'s with the same input, and then yields all values that each
*EXPRₙ* yields.  If the expressions are assertions, this happens to
behave exactly like a logical or.  So::

	entry ?TAG_subprogram !AT_declaration
	?(child ?TAG_formal_parameter @AT_type (?TAG_structure_type, ?TAG_class_type))

But the applicability is wider.  Since the semantics are
essentially those of a fork, one can for example ask whether an
attribute has one of a number of values::

	$ 'entry (@AT_name == ("argc", "argv"))'
	[8f]	formal_parameter
		name (strp)	argc;
  		[... more attributes ...]
	[9d]	formal_parameter
		name (strp)	argv;
  		[... more attributes ...]

``EXPR || EXPR`` --- OR-lists
-----------------------------

An expression like ``EXPR₁ || EXPR₂ || ...`` works differently.  The
input stack is passed to *EXPR₁* first, and anything that this yields,
is sent to output.  But if nothing is yielded, the same input stack is
passed to *EXPR₂*.  And so on.  It thus yields whatever is yielded by
the first expression that actually yields anything.  It therefore
operates in a manner similar to the operator ``||`` in C language.
The typical use would be in fallbacks.  For example if we prefer
``DW_AT_MIPS_linkage_name`` to ``DW_AT_name``, but can make do with
the latter, that would be encoded as follows::

	entry (@AT_MIPS_linkage_name || @AT_name)

For selecting structures and classes, we can use either of these two
tools interchangeably.

So this is fine, but it still shows only functions that take
structure (or class) arguments directly.  But what if they take a
const argument?  Or if they take a typedef that evaluates to a
structure?  For these cases we need to keep peeling the fluff until
we get to the interesting DIE's.  Enter iterators:

``EXPR*``, ``EXPR+``, ``EXPR?`` --- expression iteration
--------------------------------------------------------

- ``EXPR*`` yields the incoming stack unchanged, and also passes it to
  *EXPR*.  It then collects whatever *EXPR* yields, yields it, and
  sends again to *EXPR*.  It works similarly to ``*`` in regular
  expressions.
- ``EXPR+`` is exactly like ``EXPR EXPR*``.
- ``EXPR?`` is exactly like ``(, EXPR)`` --- it *may* apply once

We can use this tool to remove ``DW_TAG_const_type``,
``DW_TAG_volatile_type`` and ``DW_TAG_typedef`` layers from our
potential structure::

	entry ?TAG_subprogram !AT_declaration
	?(child ?TAG_formal_parameter
	  @AT_type ((?TAG_const_type, ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
	  (?TAG_structure_type, ?TAG_class_type))

Next on, we would like to write a message:

Literals, Strings, Formatting
-----------------------------

Zwerg has roughly C-like string literals, using \\ as an escape
character.  Hello world program looks like this in Zwerg::

	"Hello, world!"

This is an example of a string literal.  Literals in Zwerg add
themselves to the stack.  There are many types of literals--apart from
strings and usual numeric literals, dwrgep knows about all the named
Dwarf constants--e.g. ``DW_AT_name``, ``DW_TAG_array_type``,
``DW_FORM_flag``, etc. are all valid forms [#dwgrepzwerg]_.

.. [#dwgrepzwerg] That dwarf constants are recognized is actually not
	a feature of Zwerg per se.  It's the way that ``dwgrep``
	wrapper sets up Zwerg query engine.

Like C printf, string literals in Zwerg allow formatting directives.
To write a nice error message for our running example, we could do for
example this::

	$ 'entry ?TAG_subprogram !AT_declaration
	   ?(child ?TAG_formal_parameter
	     @AT_type ((?TAG_const_type, ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
	     (?TAG_structure_type, ?TAG_class_type))
	   "%s: one of the parameters has non-trivial type."'
	[35] subprogram: one of the parameters has non-trivial type.

It's clear that we'd like to improve on this a bit.  We'd like to
mention which parameter it is, and we'd like to tell the user the name
of the function, not just a DIE offset.  We'll address both--later.
But first, a bit of background.

When dwgrep sees a string with formatting directives, it converts it
into a function.  That function takes one value for each ``%s``,
substitutes the ``%s`` with values of corresponding stack values, and
then pushes the result to stack.  Consequently, to convert anything to
a string in dwgrep, you would just say::

	"%s"

E.g.::

	$ 'entry ?TAG_subprogram "%s"'
	[35] subprogram
	[6c] subprogram

When there are more formatting directives, each of them takes one
value from the stack, in order from right to left::

	$ dwgrep '1 2 "%s %s"'
	1 2

We could get the desired format string improvements with these tools
in our hands already.  But there's a bit of syntax that will make our
job easier still.

``let X := EXPR;`` --- name binding
-----------------------------------

Often you need to refer back to a value that was computed earlier.
Since this is a stack machine, one way to do this is to use stack
shuffling words--``dup``, ``swap``, ``rot``, ``over`` and ``drop``--to
move stuff around the way you need it.  But keeping track of what is
where when gets old quickly.  For this reason, Zwerg allows that you
give value a name.  Later on, when that name is mentioned, it acts as
a function that pushes the bound value to stack.

In an expression such as ``let X Y := EXPR;``, *EXPR* is evaluated in
a sub-expression context.  Top of stack is bound to name *Y*, and the
value below that to *X*, and so on in this fashion if there are more
names.  E.g.::

	$ dwgrep 'let A := 1;
	          A A add'
	2

	$ dwgrep 'let A B := 10 2;
	          A B div'
	5

Let's use this tool to first remember the two values that we care
about: the subprogram (S) and its naked structure parameter (P)::

	$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<EOF
	let S := entry ?TAG_subprogram !AT_declaration;
	let P := S child ?TAG_formal_parameter
	         ?(@AT_type ((?TAG_const_type,
	                      ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
	           (?TAG_structure_type, ?TAG_class_type));
	S P "%s: %s has non-trivial type."
	EOF

	---
	[35] subprogram: [58] formal_parameter has non-trivial type.
	<Dwarf "./tests/nontrivial-types.o">

Which is not too shabby, but having to keep track of which ``%s``
takes which value is perhaps not too comfortable.  For that reason,
Zwerg allows splicing of expressions in strings.

``%( EXPR %)`` --- format string splicing
-----------------------------------------

In format strings, code between ``%(`` and the matching ``%)`` is
evaluated in plain context, after which TOS of the result is popped
and inserted in place of the ``%(...%)``.  ``%s`` is then exactly
equivalent to ``%(%)``.

With this tool, we can make the formatting string clearer::

	$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<EOF
	let S := entry ?TAG_subprogram !AT_declaration;
	let P := S child ?TAG_formal_parameter
	         ?(@AT_type ((?TAG_const_type,
	                      ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
	           (?TAG_structure_type, ?TAG_class_type));
	"%( S %): %( P %) has non-trivial type."
	EOF

	---
	[35] subprogram: [58] formal_parameter has non-trivial type.
	<Dwarf "./tests/nontrivial-types.o">

But the actual output is still not very nice.  Ideally we'd mention
names and source code coordinates instead of Dwarf offsets and tag
names.  But with splicing, that's actually quite easy to achieve::

	$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<"EOF"
	let S := entry ?TAG_subprogram !AT_declaration;
	let P := S child ?TAG_formal_parameter
	         ?(@AT_type ((?TAG_const_type,
	                      ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
	           (?TAG_structure_type, ?TAG_class_type));

	"%( S @AT_decl_file %): %( S @AT_decl_line %): note: in function "\
	"`%( S @AT_name %)', parameter `%( P @AT_name %)' type is not trivial."
	EOF

	---
	/home/petr/proj/dwgrep/x.c: 3: note: in function `blah', parameter `f' type is not trivial.
	<Dwarf "./tests/nontrivial-types.o">

The message here is already fairly decent, the only thing making it
ugly is that we actually yield a two-value stack.  We'll deal with
this next.

One thing to note here though is the string continuation syntax.  Note
how the formatting string is split into two fragments.  The former one
then ends with ``"\`` instead of the customary ``"``, which is a
signal to the lexer that it should concatenate the two fragments
together before handing them further.  For all intents and purposes,
these two fragments form a single string literal.

If we are paranoid, we can guard against missing ``@AT_decl_file`` and
``@AT_decl_line``.  This is actually fairly important, because
requesting a missing attribute is not an error, but merely causes the
computation to stop.  If, say, ``@AT_decl_line`` weren't available,
the computation would be silently dropped--right at the point where we
had an offender and were ready to report them.  So let's change the
formatting string thus::

	"%( S @AT_decl_file || "???" %): %( S @AT_decl_line || "???" %): "\
	"note: in function `%( S @AT_name %)', "\
	"parameter `%( P @AT_name %)' type is not trivial."

Note how you can use string literals inside ``%( %)`` inside
formatting strings.  Not that it would be a good idea to nest layers
and layers of strings, but in principle it is possible, and for a
quick default like this, there's no harm.

Now to get rid of the Dwarf value that's occupying our bottom stack
slot.  The simplest approach is to drop the value at the point where
we don't need it anymore::

	$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<"EOF"
	let S := entry ?TAG_subprogram !AT_declaration;
	let P := S child ?TAG_formal_parameter
	         ?(@AT_type ((?TAG_const_type,
	                      ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
	           (?TAG_structure_type, ?TAG_class_type));
	drop

	"%( S @AT_decl_file || "???" %): %( S @AT_decl_line || "???" %): "\
	"note: in function `%( S @AT_name %)', "\
	"parameter `%( P @AT_name %)' type is not trivial."
	EOF
	/home/petr/proj/dwgrep/x.c: 3: note: in function `blah', parameter `f' type is not trivial.

But there's one more way to make this work, and it would allow us to
introduce another Zwerg feature.

``(|X Y| EXPR)`` --- scoped bindings
------------------------------------

This expression introduces a function that takes one parameter for
each name mentioned between the pipes, then passes the remaining stack
to *EXPR*, which is evaluated in plain context.  When the bound names
are mentioned within *EXPR*, they recall the bound values.  E.g.::

	$ dwgrep '1 (|A| A A add)'
	2

	$ dwgrep '10 2 (|A B| A B div)'
	5

	$ dwgrep '1 (|A| A A add (|A| A A add))'
	4

If we enclose the whole expression into a scope, we can drop the Dwarf
from the stack where we don't need it, but still keep it around as a
name::

	$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<"EOF"
	(|D|
	  let S := D entry ?TAG_subprogram !AT_declaration;
	  let P := S child ?TAG_formal_parameter
	           ?(@AT_type ((?TAG_const_type,
	                        ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
	             (?TAG_structure_type, ?TAG_class_type));

	  "%( S @AT_decl_file || "???" %): %( S @AT_decl_line || "???" %): "\
	  "note: in function `%( S @AT_name %)', "\
	  "parameter `%( P @AT_name %)' type is not trivial."
	)
	EOF
	/home/petr/proj/dwgrep/x.c: 3: note: in function `blah', parameter `f' type is not trivial.

So, that's it.  This was a quick tour through the interesting parts of
``dwgrep``.  You may now want to check out :ref:`syntax`, and
:ref:`zw_vocabulary_core` or :ref:`zw_vocabulary_dwarf` to learn about
the actual function words that you can use.
