.. _zw_vocabulary_core:

Core vocabulary
===============

.. include:: zw_vocabulary_core-blurb.txt

.. _zw_vocabulary_core type T_CONST:

T_CONST
-------



Values of this type are used to hold integral constants.  Zwerg
constants allow storing signed or unsigned 64-bit numbers.  Arithmetic
operators handle transitions between signed and unsigned values
transparently::

	$ dwgrep '0xffffffffffffff00 0xffffffffffffffff sub'
	-0xff

Note that division and modulo by zero, as well as overflow, are caught
at runtime::

	$ dwgrep '5 (2, 0, 3) div'
	2
	Error: division by zero occured when computing 5/0
	1

	$ dwgrep '0xffffffffffffffff (-1, 0, 1) add'
	0xfffffffffffffffe
	0xffffffffffffffff
	Error: overflow occured when computing 18446744073709551615+1


Applicable words
................
:ref:`\add <zw_vocabulary_core add>`, :ref:`\div <zw_vocabulary_core div>`, :ref:`\mod <zw_vocabulary_core mod>`, :ref:`\mul <zw_vocabulary_core mul>`, :ref:`\sub <zw_vocabulary_core sub>`, :ref:`\value <zw_vocabulary_core value>`



.. _zw_vocabulary_core type T_SEQ:

T_SEQ
-----



Values of this type hold sequences, ordered heterogeneous collections
of other values (including other sequences)::

	$ dwgrep '[[], 2, "yes"] elem type'
	T_SEQ
	T_CONST
	T_STR


Applicable words
................
:ref:`\!empty <zw_vocabulary_core !empty>`, :ref:`\!ends <zw_vocabulary_core !ends>`, :ref:`\!find <zw_vocabulary_core !find>`, :ref:`\!starts <zw_vocabulary_core !starts>`, :ref:`\?empty <zw_vocabulary_core ?empty>`, :ref:`\?ends <zw_vocabulary_core ?ends>`, :ref:`\?find <zw_vocabulary_core ?find>`, :ref:`\?starts <zw_vocabulary_core ?starts>`, :ref:`\add <zw_vocabulary_core add>`, :ref:`\elem <zw_vocabulary_core elem>`, :ref:`\length <zw_vocabulary_core length>`, :ref:`\relem <zw_vocabulary_core relem>`



.. _zw_vocabulary_core type T_STR:

T_STR
-----



Values of this type hold strings (sequences of characters).  No
unicode support is provided as such, though UTF-8 should naturally
work.  Zwerg strings are not NUL-terminated::

	$ dwgrep '"abc \x00 def" (, length)'
	abc  def
	9


Applicable words
................
:ref:`\!empty <zw_vocabulary_core !empty>`, :ref:`\!ends <zw_vocabulary_core !ends>`, :ref:`\!find <zw_vocabulary_core !find>`, :ref:`\!match <zw_vocabulary_core !match>`, :ref:`\!starts <zw_vocabulary_core !starts>`, :ref:`\!~ <zw_vocabulary_core !~>`, :ref:`\=~ <zw_vocabulary_core =~>`, :ref:`\?empty <zw_vocabulary_core ?empty>`, :ref:`\?ends <zw_vocabulary_core ?ends>`, :ref:`\?find <zw_vocabulary_core ?find>`, :ref:`\?match <zw_vocabulary_core ?match>`, :ref:`\?starts <zw_vocabulary_core ?starts>`, :ref:`\add <zw_vocabulary_core add>`, :ref:`\elem <zw_vocabulary_core elem>`, :ref:`\length <zw_vocabulary_core length>`, :ref:`\relem <zw_vocabulary_core relem>`



.. _zw_vocabulary_core !=:
.. _zw_vocabulary_core !eq:
.. _zw_vocabulary_core !ge:
.. _zw_vocabulary_core !gt:
.. _zw_vocabulary_core !le:
.. _zw_vocabulary_core !lt:
.. _zw_vocabulary_core !ne:
.. _zw_vocabulary_core <:
.. _zw_vocabulary_core <=:
.. _zw_vocabulary_core ==:
.. _zw_vocabulary_core >:
.. _zw_vocabulary_core >=:
.. _zw_vocabulary_core ?eq:
.. _zw_vocabulary_core ?ge:
.. _zw_vocabulary_core ?gt:
.. _zw_vocabulary_core ?le:
.. _zw_vocabulary_core ?lt:
.. _zw_vocabulary_core ?ne:

!=, !eq, !ge, !gt, !le, !lt, !ne, <, <=, ==, >, >=, ?eq, ?ge, ?gt, ?le, ?lt, ?ne
--------------------------------------------------------------------------------



These are comparison operators.  The ones with no alphanumeric
characters in them, ``==``, ``!=``, ``<`` etc., are for use in infix
expressions, such as::

	entry (offset == 0x123)

The others are low-level assertions with equivalent behavior.

Two elements are inspected: one below TOS and TOS (*A* and *B*,
respectively).  The assertion holds if *A* and *B* satisfy a relation
implied by the word.

For example::

	$ dwgrep '1 2 ?lt "yep"'
	---
	yep
	2
	1

Note that there is both ``!eq`` and ``?ne``, ``!lt`` and ``?ge``, etc.
These are mostly for symmetry.  For consistency, the first character
of any assertion is always either ``?`` or ``!``, and both flavors are
always available.



.. _zw_vocabulary_core !empty:
.. _zw_vocabulary_core ?empty:

!empty, ?empty
--------------

``T_SEQ``
.........

Asserts that a sequence on TOS is empty.  This predicate holds for
both empty sequence literals as well as sequences that just happen to
be empty::

	$ dwgrep '[] ?empty'
	[]

	$ dwgrep 'dwgrep '[(1, 2, 3) (== 0)] ?empty'
	[]

``T_STR``
.........

This predicate holds if the string on TOS is empty::

	$ dwgrep '"" ?empty ">%s< is empty"'
	>< is empty

	$ dwgrep '"\x00" !empty ">%s< is not empty"'
	>< is not empty



.. _zw_vocabulary_core !ends:
.. _zw_vocabulary_core ?ends:

!ends, ?ends
------------

``T_SEQ`` ``T_SEQ``, ``T_STR`` ``T_STR``
........................................

The word ``?ends`` asserts that the value on TOS forms a suffix of the
value below it.  This would be used e.g. like so::

	(hay stack expression) ?("needle" ?ends)

This word is applicable to sequences as well as strings::

	[hay stack] ?([needle] ?ends)

For example::

	$ dwgrep '"foobar" "bar" ?ends'
	---
	bar
	foobar



.. _zw_vocabulary_core !find:
.. _zw_vocabulary_core ?find:

!find, ?find
------------

``T_SEQ`` ``T_SEQ``, ``T_STR`` ``T_STR``
........................................

``?find`` asserts that the value on top of stack is contained within
the one below it.   This would be used e.g. like so::

	(hay stack expression) ?("needle" ?find)

This word is applicable to sequences as well as strings::

	[hay stack] ?([needle] ?find)

For example::

	$ dwgrep '"foobar" "oba" ?find'
	---
	oba
	foobar

To determine whether a sequence (or a string) contains a particular
element, you would use the following construct instead::

	[that sequence] (elem == something)

E.g.::

	[child @AT_name] ?(elem == "foo")
	[child] ?(elem @AT_name == "foo")

To filter only those elements that match, you could do the following::

	[child] [|L| L elem ?(@AT_name == "foo")]

The above is suitable if the origin of the sequence is out of your
control.  It is of course preferable to write this sort of thing
directly, if possible::

	[child ?(@AT_name == "foo")]



.. _zw_vocabulary_core !match:
.. _zw_vocabulary_core !~:
.. _zw_vocabulary_core =~:
.. _zw_vocabulary_core ?match:

!match, !~, =~, ?match
----------------------

``T_STR`` ``T_STR``
...................

This asserts that TOS (which is a string with a regular expression)
matches the string below TOS.  The whole string has to match.  If you
want to look for matches anywhere in the string, just surround your
expression with ``.*``'s::

	"haystack" ?(".*needle.*" ?match)

For example::

	$ dwgrep '"foobar" "f.*r" ?match'
	---
	f.*r
	foobar



.. _zw_vocabulary_core !starts:
.. _zw_vocabulary_core ?starts:

!starts, ?starts
----------------

``T_SEQ`` ``T_SEQ``, ``T_STR`` ``T_STR``
........................................

The word ``?start`` asserts that the value on TOS forms a prefix of
the value below it.  This would be used e.g. like so::

	(hay stack expression) ?("needle" ?starts)

This word is applicable to sequences as well as strings::

	[hay stack] ?([needle] ?starts)

For example::

	$ dwgrep '"foobar" "foo" ?starts'
	---
	foo
	foobar



.. _zw_vocabulary_core add:

add
---

``T_SEQ`` ``T_SEQ`` ``->`` ``T_SEQ``
....................................

Concatenate two sequences and yield the resulting sequence::

	$ dwgrep '[1, 2, 3] [4, 5, 6] add'
	[1, 2, 3, 4, 5, 6]

Using sub-expression capture may be a more flexible alternative to
using ``add``::

	$ dwgrep '[1, 2, 3] [4, 5, 6] [7, 8, 9] [|A B C| (A, B, C) elem]'
	[1, 2, 3, 4, 5, 6, 7, 8, 9]

``T_CONST`` ``T_CONST`` ``->?`` ``T_CONST``
...........................................

Zwerg contains a suite of basic operators for integer arithmetic:
``add``, ``sub``, ``mul``, ``div`` and ``mod``.  Two elements are
popped: A and B, with B the original TOS, and ``A OP B`` is pushed
again.  *OP* is an operation suggested by the operator name.

Integers in Zwerg are 64-bit signed or unsigned quantities.  A value
of type ``T_CONST`` can hold either signed or unsigned number.  Which
it is is decided automatically.  Arithmetic operators handle these
cases transparently.

Overflows and division and modulo by zero produce an error message and
abort current computation, which is the reason this operation is
denoted with ``->?`` relation::

	$ dwgrep '5 (2, 0, 3) div'
	2
	Error: division by zero occured when computing 5/0
	1

``T_STR`` ``T_STR`` ``->`` ``T_STR``
....................................

``add`` concatenates two strings on TOS and yields the resulting string::

	$ dwgrep '"foo" "bar" add'
	foobar

Using formatting strings may be a better way to concatenate strings::

	$ dwgrep '"foo" "bar" "baz" "%s%s%s"'
	foobarbaz



.. _zw_vocabulary_core bin:
.. _zw_vocabulary_core dec:
.. _zw_vocabulary_core hex:
.. _zw_vocabulary_core oct:

bin, dec, hex, oct
------------------



``bin`` is used for converting constants to base-2, ``oct`` to base-8,
``dec`` to base-10 and ``hex`` to base-16.  These operators yield
incoming stack, except the domain of constant on TOS is changed.
Examples::

	$ dwgrep '64 hex'
	0x40

	$ dwgrep 'DW_AT_name hex'
	0x3

The value remains a constant, only the way it's displayed changes.
You can use ``"%s"`` to convert it to a string, in which case it's rendered
with the newly-selected domain::

	$ dwgrep 'DW_AT_name "=%s="'
	=DW_AT_name=
	$ dwgrep 'DW_AT_name hex "=%s="'
	=0x3=

Though you can achieve the same effect with formatting directives
``%b``, ``%o``, ``%d`` and ``%x``::

	$ dwgrep 'DW_AT_name "=%x="'
	=0x3=



.. _zw_vocabulary_core div:
.. _zw_vocabulary_core mod:
.. _zw_vocabulary_core mul:
.. _zw_vocabulary_core sub:

div, mod, mul, sub
------------------

``T_CONST`` ``T_CONST`` ``->?`` ``T_CONST``
...........................................

Zwerg contains a suite of basic operators for integer arithmetic:
``add``, ``sub``, ``mul``, ``div`` and ``mod``.  Two elements are
popped: A and B, with B the original TOS, and ``A OP B`` is pushed
again.  *OP* is an operation suggested by the operator name.

Integers in Zwerg are 64-bit signed or unsigned quantities.  A value
of type ``T_CONST`` can hold either signed or unsigned number.  Which
it is is decided automatically.  Arithmetic operators handle these
cases transparently.

Overflows and division and modulo by zero produce an error message and
abort current computation, which is the reason this operation is
denoted with ``->?`` relation::

	$ dwgrep '5 (2, 0, 3) div'
	2
	Error: division by zero occured when computing 5/0
	1



.. _zw_vocabulary_core drop:
.. _zw_vocabulary_core dup:
.. _zw_vocabulary_core over:
.. _zw_vocabulary_core rot:
.. _zw_vocabulary_core swap:

drop, dup, over, rot, swap
--------------------------



These words reorder elements on stack according to the following
schemes:

+------+---------+-----------+
| op   | before  | after     |
+======+=========+===========+
| dup  | A B C D | A B C D D |
+------+---------+-----------+
| over | A B C D | A B C D C |
+------+---------+-----------+
| swap | A B C D | A B D C   |
+------+---------+-----------+
| rot  | A B C D | A C D B   |
+------+---------+-----------+
| drop | A B C D | A B C     |
+------+---------+-----------+

Realistically, most of what end users should write will be an
occasional dup or drop.  Most of the stack reorganization effects can
be described more clearly using bindings and sub-expressions.  But the
operators are present for completeness' sake.



.. _zw_vocabulary_core elem:
.. _zw_vocabulary_core relem:

elem, relem
-----------

``T_SEQ`` ``->*`` ``T_???``
...........................

For each element in the input sequence, which is popped, yield a stack
with that element pushed on top.

To zip contents of two top lists near TOS, do::

	$ dwgrep '[1, 2, 3] ["a", "b", "c"]
	          (|A B| A elem B elem) ?((|A B| A pos == B pos)) [|A B| A, B]'
	[1, a]
	[2, b]
	[3, c]

The first parenthesis enumerates all combinations of elements.  The
second then allows only those that correspond to each other
position-wise.  At that point we get three stacks, each with two
values.  The last bracket then packs the values on stacks back to
sequences, thus we get three stacks, each with a two-element sequence
on top.

The expression could be simplified a bit on expense of clarity::

	[1, 2, 3] ["a", "b", "c"]
	(|A B| A elem B elem (pos == drop pos)) [|A B| A, B]

``relem`` operates in the same fashion as ``elem``, but backwards.

``T_STR`` ``->*`` ``T_STR``
...........................

The description at ``T_SEQ`` overload applies.  For strings, ``elem``
and ``relem`` yield individual characters of the string, again as
strings::

	$ dwgrep '["foo" elem]'
	[f, o, o]



.. _zw_vocabulary_core false:
.. _zw_vocabulary_core true:

false, true
-----------



.. _zw_vocabulary_core length:

length
------

``T_SEQ`` ``->`` ``T_CONST``
............................

Yield number of elements of sequence on TOS.

E.g. the following tests whether the DIE's whose all attributes report
the same form as their abbreviations suggest, comprise all DIE's.
This test comes from dwgrep's test suite::

	[entry ([abbrev attribute label] == [attribute label])] length
	== [entry] length

(Note that this isn't anything that should be universally true, though
it typically will, and it is for the particular file that this test is
run on.  Attributes for which their abbreviation suggests
``DW_FORM_indirect`` will themselves have a different form.)

``T_STR`` ``->`` ``T_CONST``
............................

Yields length of string on TOS::

	dwgrep '"foo" length'
	3



.. _zw_vocabulary_core pos:

pos
---



Each function numbers elements that it produces, and stores number of
each element along with the element.  That number can be recalled by
saying ``pos``::

	$ dwgrep ./tests/dwz-partial -e 'unit (|A| A root "%s" A pos)'
	---
	0
	[34] compile_unit
	---
	1
	[a4] compile_unit
	---
	2
	[e1] compile_unit
	---
	3
	[11e] compile_unit

If you wish to know the number of values produced, you have to count
them by hand::

	[|Die| Die child]
	let Len := length;
	(|Lst| Lst elem)

At this point, ``pos`` and ``Len`` can be used to figure out both
absolute and relative position of a given element.

Note that *every* function counts its elements anew::

	$ dwgrep '"foo" elem pos'
	0
	1
	2

	$ dwgrep '"foo" elem type pos'
	0
	0
	0



.. _zw_vocabulary_core type:

type
----



This produces a constant according to the type of value on TOS (such
as T_CONST, T_DIE, T_STR, etc.).



.. _zw_vocabulary_core value:

value
-----

``T_CONST`` ``->`` ``T_CONST``
..............................

Returns underlying value of the constant, with plain domain.  For
example::

	$ dwgrep 'DW_FORM_flag value'
	12

	$ dwgrep '0xc value'
	12



