               Recursive Functions Algorithmic Language               
                                REFAL-M                               
                       Source Language Reference                      
                             version 2.00                             

Table of Contents
1. Introduction .........................................
2. Data .................................................
     2.1. Atoms .........................................
     2.2. Sequences .....................................
     2.3. Expressions ...................................
3. Simple Patterns ......................................
     3.1. Pattern Variables .............................
     3.2. Basic Patterns ................................
4. Functions ............................................
5. How Things Work ......................................
6. Example ..............................................
7. Advanced Patterns ....................................
     7.1. Symmetric Patterns ............................
     7.2. Non-linear Patterns ...........................
     7.3. Implicitly Iterative Patterns .................
     7.4. Constraints ...................................
          7.4.1. Elementary Constraints .................
          7.4.2. Composite Constraints ..................
          7.4.3. Named Constraints ......................
     7.5. Formal Pattern Matching Specification .........
8. Modular Structure ....................................
9. Additional Features ..................................
     9.1. Empty Functions ...............................
     9.2. Variables and Assignment ......................

Appendix 1. Glossary ....................................

1. Introduction

     REFAL (REcursive Functions Algorithmic Language)  is  a  symbolic 
processing language based  on  pattern  matching  and  rewriting  over 
symbolic expressions. REFAL derives its power from Markov Algorithms - 
a theoretical formalism introduced to capture the notion of  algorithm 
by the russian school of constructive mathematics lead by A.A.Markov. 
     REFAL  combines  few  highly  productive   programming   concepts 
(e.g.  pattern  matching,  equational  programming,  rewriting)  in  a 
surprisingly natural and easy-to-understand way. 
     Consider some simple examples just to get an idea  of  how  REFAL 
programs look like.

SIMPLE EXAMPLE 1: Append function

append    ( ) EY = EY
          (S1 EX) EY = S1 <append (EX) EY>

there is another way to express the append function in Refal:

append    (E1)E2 = E1 E2

SIMPLE EXAMPLE 2: Factorial function

factorial  /0/ = /1/
           SN  = <mul SN <factorial  <sub SN 1>>>

SIMPLE EXAMPLE 3: Partitioning a list into two halfs

partition  EX = <partition-1 () EX ()>
partition-1 (E1) WL EX WR (E2) = <partition-1 (E1 WL) EX (WR E2)>
            (E1) (E2)  = (E1) (E2)
          

     This document has the following structure. In Section 2  symbolic 
expressions are introduced. Section 3 presents  patterns  -  the  main 
concept of programming in Refal. Patterns denote classes  of  symbolic 
expressions  and  are  used  to  check  the  structure  of  the  input 
expression, to access it's  components  as  well  as  to  specify  the 
resulting output expression. Patterns are used in  Refal  to  describe 
functions^ which are the main unit of  describing  transformations  of 
symbolic expressions. Refal functions are introduced in section 4.
     Section 5 provides a general  idea  of  how  Refal  programs  are 
executed. It  addresses  the  order  of  computation  (application  of 
functions to the work expression and gives details  of  the  two  main 
processes: pattern matching (analysis of  the  input  expression)  and 
substitution (synthesis of the output expression).
     Section 6 provides a small but complete example of a  program  in 
Refal. The program is a classical binary tree insertion sorting.
     Section 7 introduces advanced patterns, which  account  for  high 
descriptive power of Refal.
     Section 8 introduces the modular organization of Refal  programs, 
export and import definitions. Section  9  addresses  some  additional 
feature of the language.

     Companion documents include Library Reference and  User's  Manual 
for Refal-M Programming Environment.

2. Data 

     Refal programs process symbolic  structures  built  from  certain 
characters (e.g. the standard ASCII character set). The subset of  the 
same alphabet is used to represent Refal-programs (e.g. uppercase  and 
lowercase letters, digits and certain other characters). The following 
metacode is used to distinguish  data  objects  from  program  objects 
which share the same alphabet:  data  objects  are  treated  specially 
(i.e. enclosed in delimeters) while the  same  characters  as  program 
objects are represented verbose. The following special characters  are 
used to build program objects in Refal (i.e. are reserved):
          < > ( ) / ' = S W V E + 

2.1. Atoms

     Atoms are the minimal units of data  (not  decomposable  by   any 
means of Refal). There are three types of atoms in Refal:  characters, 
numbers, and symbols.
     A  character is denoted  by  any  valid  character,  enclosed  in 
apostrophes ('). The character apostrophe (') itself is denoted  as  a 
pair of apostrophes (''). Note that ' ' is a representation  of  the 
space character.
     Numbers are denoted by a sequence of digits, optionally  preceded 
by a plus (+) or minus (-) sign, enclosed in slashes (/). For  numeric 
operations  Refal  handles   integer   numbers   in   the   range   of 
0- 2**32-1. 
     Symbols in Refal  are  denoted  by  an  identifier,  enclosed  in 
slashes (/). Identifiers in Refal are sequences of letters, digits and 
special characters (-) and (_), starting with  a  letter.  Symbols  in 
Refal are used to denote functions or simply as atomic objects with  a 
meaningful name (e.g. delimiters, tags, etc.).

2.2. Sequences

     The basic data structure in Refal is a sequence. A sequence is  a 
concatenation of atomic objects. Any several atoms can be concatenated 
into  a  sequence.  Note  that  Refal  does  not  have  any   sequence 
constructor (i.e. any  reserved  character,  denoting  a  sequence  of 
objects). Any several juxtaposed objects  represent  a  sequence.  For 
example, the following line contains a sequence of five atoms:
     'a' 'b' /100/ /abc/ 'c'

     There is a shorthand notation for a sequence of  characters:  all 
component characters can be put in a  string,  enclosed  only  in  two 
outer apostrophes. If the sequence of  characters  contain  apostrophe 
characters, they are denoted by a pair of  apostrophes  (''). The 
following lines contain several examples:

Original Sequence    Refal Sequence       Refal String              

abcd             ->    'a' 'b' 'c' 'd'  -->  'abcd'
a cc             ->    'a' ' ' 'c' 'c'  -->  'a bcd'
it's             ->    'i' 't' '' 's'   -->  'it''s'

'                ->    ''               -->  ''
''               ->    ''''             -->  ''''

Note that if the sequence contains only apostrophes,  the  string   is 
not further enclosed into outer  apostrophes  (apostrophes  are  still 
duplicated).

2.3. Expressions

     Arbitrary tree structures can be denoted  as  Refal  expressions. 
Expressions  in  Refal  are  the  same  as  S-expressions   in   Lisp. 
Expressions in Refal are built from atoms and sequences of atoms using 
pairs of parentheses to denote  structuring.
     Refal expression is defined as a  (possibly  empty)  sequence  of 
terms. A term is either an atom or a sequence of terms  (i.e.  another 
expression)  enclosed  in  parentheses.  A  sequence   of   terms   in 
parentheses is called a bag.
     The following lines contain several examples of expressions:
          (/a/ 'abcdef') (/b/ 'ijklmno') (/c/ 'pqrst')
          /1/ '+' /2/ '-' ( /3/ '*' /4/)
          (/1/ (/2/ () ()) (/4/ ()()) ) 

     Syntactically expressions are defined as follows:

<expression>::= 
          | <expression> <term> 
<term>::= <atom> 
          | <bag>
<bag>::= ( <expression> )

3. Simple Patterns 

     Patterns play the key  role  in Refal  programming.   Abstractly, 
patterns denote classes of  expressions.  Pragmatically,  the  use  of 
patterns in Refal is threefold: they are used to check  the  structure 
of the input expression, to access its components and to  specify  how 
these components should be rearranged to  form  the  resulting  output 
expression. 

3.1. Pattern Variables

     Patterns are constructed from terms (i.e. atoms and  parentheses) 
and pattern variables. A trivial pattern  consists only of  atoms  and 
bags and denotes a single  expression. 
     Non-trivial patterns can be constructed with the use  of  pattern 
variables. An occurrence of a pattern variable in a pattern means that 
this particular place in  the  expression  should  be  occupied  by  a 
subexpression of a certain kind. The kind of the subexpression depends 
on the kind of the particular pattern variable. There are  four  kinds 
of pattern variables in Refal:
    -  S-variables, which  denote  a  subexpression  consisting  of  a 
single atom (but neither a bag nor a sequence);
     - W-variables, which  denote  a  subexpression  consisting  of  a 
single term (i.e. either an atom or a bag, but not a sequence);
     - V-variables, which denote any non-empty sequence  of  arbitrary 
terms;
     - E-variables,  which  denote  any (possibly empty)  sequence  of 
arbitrary terms;

     Alternatively, one can say that:
    -  S-variables are used to represent any single atom at a certain
particular place in the sequence;
     - W-variables are used when the corresponding  place  in 
the sequence can be occupied by any single term;
     - V-variables are used to represent any non empty subexpression;
     - E-variables are used when the  corresponding   place   in   the 
sequence can be occupied by any subexpression (possibly empty);

     Syntactically, a pattern variable is the following triple:

<pattern variable>::= <tag><constraint><var-name>

     The tag is one of  the  four  special  characters  (S,W,E,V)  and 
denotes the kind of the variable as outlined above.
     Constraint part of the variable may be empty. Constraints can  be 
used to specify additional conditions on the  subexpressions,  denoted 
by the variable. Constraints are introduced in section 7.4.  When  the 
constraint part is empty, it means that no additional  conditions  are 
provided and possible subexpressions are derived from the tag  of  the 
variable.
     The name of the variable is used to  subsequently  reference  the 
subexpression denoted by this particular variable (i.e. unique name of 
the pattern variable). A name is a single letter or digit.

<tag>::= S | W | E | V
<constraint>::= | ...
<var-name>::=<letter> | <digit>

     Below are some starting examples of patterns, of   expressions 
that they denote (match) and those they don't match.

pattern     matching expression        non-matching expression

'abc'       only 'abc'                  'a','abcd',('abc'), etc.
SX         'a','b',/100/,/abc/,etc.    ('a'), empty, etc.     
SX'abc'    'aabc',/1/'abc',etc.        'a','aabcd',('a')'abc',etc.
SX SY SZ    'abc',/1/ /2/ /3/,etc.      'a' 'b', ()()(), etc.    
WX/100/    /1/ /100/, (/1/)/100/,etc.   /100/, /1/'a',etc.
WX EY      ('abc'),'abcdef',etc.        empty.
(SX EY) SZ ('abc')/1/,(/1/)/2/),etc.    'ab',(/1/)(/2/),etc.
'a' EX         'a','ab', 'a'('bc'), etc.  'b', empty, etc.
('a'VX)       ('ab'),('a'('b')),etc.    'ab', ('a'), etc.
EX             any                      none


     To summarize, a pattern is defined as follows:

<pattern>::= 
          | <pattern> <pattern-term>
<pattern-term>::= <atom> 
          | <pattern variable> 
          | ( <pattern> )

     Patterns  are  used  to  partition the input expression  (and  to 
check if it may be partitioned in the desired way). Patterns are  also 
used to generate other expressions.
     Non-procedurally,  patterns  denote  classes  of  expressions.  A 
pattern and an input expression may be considered as an equation which 
is to be resolved by generating appropriate instantiations of  pattern 
variables. Another pattern can be used as a generator to describe  the 
composition      of      another      expression      from       these 
instantiations.
     Procedurally,   patterns   cover   the   following   actions   of 
conventional imperative languages:
- parameter passing;
- assignment to local variables;
- conditions and branching;
- sequencing;
- calls of primitive procedures (e.g. Lisp functions CAR, CDR, etc.);

     Following  sections  cover  pattern  matching  in  more   detail. 
Patterns are divided into several classes according to the  complexity 
of the underlying pattern matching process.

3.2. Basic Patterns

     Let's  define  several  classes   of   patterns   in   increasing 
complexity:   constant  patterns,  rigid  patterns,  basic   patterns, 
symmetric  patterns,  non-linear  patterns  and  implicitly  iterative 
(arbitrary) patterns. 
     Constant pattern is any expression (possibly empty). The matching 
of a constant  pattern  is  a  simple  sequence  of  checks  on  terms 
(including also some elements of tree traversal with respect to bags).

<constant pattern>::= <expression>

Examples of constant patterns:

'abc'              matches 'abc'
('abc)d')('e')     matches ('abc)d')('e')
                   matches             

Note that the last example means:  empty  matches  empty!  There is no 
denotation for an empty  expression.  By the way note  also  different 
types of parentheses in the second example: the expression is a  valid 
sequence of only two bags where the first bag contains a sequence five 
characters, the fourth of which is a left parenthes character.

     Rigid patterns  may  contain  only  S-,  and  W-variables.  Rigid 
patterns are linear (each variable occurs  exactly  once).  The  added 
complexity in matching rigid patterns is that certain terms have to be 
assigned to local variables. 

<rigid pattern>::= 
          | <rigid pattern> <rigid term>
<rigid term>::= <atom>
          | <term> 
          | <S-variable> 
          | <W-variable>
          | ( <rigid pattern> )

Examples of rigid patterns:

SX SY SZ            matches any sequence of exactly three atoms
W1 (WA WB)          matches a sequence of a term and a bag, containing 
                              a sequence of two other terms
W1 'a' W2           matches a sequence of a term,  character  'a'  and 
                              another term

     Basic patterns allow restricted use of E-, and V-variables. In  a 
basic pattern E- and V-variables may occur only as last terms (e.g  in 
a bag or at the top level). Basic patterns  are  linear  patterns.  In 
basic patterns E- and V-variables are used to denote tail portions  of 
sequences.
     Informally, a  basic  pattern  is  a  linear  pattern  (i.e.  all 
variables occur only once), which can be matched in a single pass from 
left to right. Syntactically, basic patterns are  defined  as  follows 
(all occurrences of pattern variables are unique):

<basic pattern>::=  <basic sequence>  
     | <basic sequence> <E-variable>
     | <basic sequence> <V-variable>
<basic sequence>::= 
     | <basic sequence> <basic term>
<basic term>::= <atom>
     | <term>
     | <S-variable>
     | <W-variable>
     | ( <basic pattern> )

     Constant, rigid and basic patterns are  called  simple  patterns. 
They can be matched in a single pass over the expression from left  to 
right.

     In fact, REFAL does  not  impose  any  limitations  on  patterns. 
Several other classes of patterns are unique to REFAL. These  patterns 
include:
- symmetric patterns (i.e. which imply symmetric processing of  
expression (i.e. from right to left as well as from left to right);
- non-linear patterns (i.e. which have several occurrences of the same  
variable);
-  implicitly  iterative  patterns (i.e.   which   contain   E-,   and 
V-variables and imply a restricted form of backtracking to match); 

     Advanced patterns are covered in section 7.


4. Functions 

     Patterns in REFAL are used in pairs. A pair of patterns is called 
an equation. It describes a certain transformation rule for a  certain 
class of  expressions.  The  first  pattern  in  a  pair  (called  the 
left-hand side of the rule or  simply  LHS)  describes  the  class  of 
applicable input expressions, and (procedurally) also  the  break-down 
of the input expression into the values of the  corresponding  pattern 
variables. It is said that when the expression is matched against  the 
pattern, pattern  variables  are  instantiated  to  the  corresponding 
subexpressions. 
     The second pattern (right-hand side or RHS) describes the outcome 
of the transformation. Pattern variables in this pattern denote places 
where the instantiated values should be inserted. Obviously,  the  RHS 
of the rule should not contain variables with  names  which  have  not 
occurred in LHS.
     Several equations  are  grouped  together  to   form   the   main 
transformation unit - Refal function. Each function has a unique name. 
     Syntactically, a function is:

<function definition>::=<ident> <equations>
<equations>::= 
     | <equations> <lhs> = <rhs>
<lhs>::= <pattern>
<rhs>::= <RHS-pattern>

     Functions in Refal have exactly one argument and map  expressions 
to expressions.
     When a function  is  applied  to  an  expression,  the  left-hand 
patterns are matched against the argument until the  matching  LHS  is 
found. Then the transformation of the argument expression is performed 
using the RHS of the first  applicable  equation  and  the  values  of 
pattern variables. Equations are tried in the same order  they  appear 
in the porgramm, i.e. top to bottom. 

     How to call a  function with a certain argument ?  To  accomplish 
this the RHS of each rule apart  from  parentheses  can  have  angular 
(functional) brackets (< and >). Left angular bracket  (<)  should  be 
followed by the name of the function. The expression enclosed into the 
angular brackets is the argument of the function. Calls can be  nested 
to arbitrary depth.
     Non-procedurally, the occurrence of an application of a  function 
name to a certain argument in  RHS  denotes  the  the  result  of  the 
function.

<RHS-pattern>::=
          | <RHS-pattern> <RHS-pattern-term>
<RHS-pattern-term>::= <atom> 
          | <pattern variable> 
          | ( <RHS-pattern> )
          | <RHS-application>
<RHS-application>::= <<name> <RHS-pattern>>

     Functions can call themselves. Consider the example of  factorial 
function.

* Factorial function

factorial  /0/ = /1/
           SN  = <mul SN <factorial  <sub SN 1>>>

     The RHS  of  the  second  rule  has  three  nested  applications. 
Functions mul and sub are primitive functions for  multiplication  and 
subtraction. The middle application calls the same factorial function.

5. How Things Work 

     The overall computational process in Refal is the  transformation 
of a single work  expression.  A  work  expression  is  an  expression 
containing applications.  The  transformation  is  carried  out  in  a 
sequence of steps, where each step is an application of  one  function 
to its argument. When a transformation step has  been  performed,  the 
work expression is updated. The work expression may expand or contract 
or even remain the same. New  applications  may  appear  in  the  work 
expression.
     How the application to be processed at the next step is chosen ?
     Applications  are  processed   in   inside-out,   left-to-right 
(applicative) order. In  other  words,  the  argument  of  a  function 
application is computed first  and  then  the  application  itself  is 
processed. 
     Let's define the leading application as the application which  is 
chosen at the next step. The  rule of thumb is: a leading  application 
has the first closing angular bracket in the expression (from left  to 
right). Obviously a leading application is  the  left-most  inner-most 
application (the first application from left to right which  does  not 
have other applications in its argument). 

<work-expression>::= 
          | <work-expression> <work-term> 
<work-term>::= <atom> 
          | <bag>
          | <application>
<bag>::= ( <expression> )
<application>::= <<name> <work-expression>>

     Refal computational process can be outlined as follows:

while( there are applications in work expression ) {
     choose the leading application;
     i=1;
     do {
          if ( match LHS[i] to the argument expression )
               generate result using RHS[i];
               substitute result for the application;
               break;
          else  i=i+1;
          } while ( there are more rules );
     if ( no rules matched ) abort;
     }

     Let's consider processing the following initial work expression:
<factorial /3/> 

The work expression will be transformed in the following  sequence  of 
steps:
<factorial /3/>
<mul /3/ <factorial <sub /3/ /1/>>>
<mul /3/ <factorial /2/>>
<mul /3/ <mul /2/ <factorial <sub /2/ /1/>>>>
<mul /3/ <mul /2/ <factorial /1/>>>
<mul /3/ <mul /2/ <mul /1/ <factorial <sub /1/ /1/>>>>>
<mul /3/ <mul /2/ <mul /1/ /1/>>>
<mul /3/ <mul /2/ /1/>>
<mul /3/ /2/>
/6/

6. Example

     The program considered in  this  section  performs   binary  tree 
insertion sort. Each node of the binary tree is represented  as  three 
terms - the root element and two bags, containing subtrees:
                  <elem> ( <tree> ) ( <tree> )                    
     All elements, less or equal than the root, are stored in the left 
subtree; all elements greater than the root are stored  in  the  right 
subtree. An empty tree is represented as an empty expression.

*
* This function inserts an element into the tree
* Synopsis: insert <tree> <element>
*
insert   SE = SE () ()
         SR (EL)(ER) SE = <insert-1 <nrel SR SE> (EL)(ER)>
*
* nrel is a standard library function to compare two numbers
* It returns relation symbol {'<','=','>'} and both numbers.
*

insert-1 '<' SR SE (EL)(ER) = (SR ( <insert EL SE> ) (ER))
          '='SR SE (EL)(ER) = (SR ( <insert EL SE> ) (ER))
          '>'SR SE (EL)(ER) = (SR (EL) (<insert ER SE> ))
*
* To sort a sequence of elements insert all elements into a tree
* and then flatten the tree;
* The tree is kept in a bag and initially is empty 
* Synopsis: sort <element> <element> ...
*
sort      =
       ES = <flatten <insert-all () ES>>

*
* To insert a sequence of  elements  into  a  tree  insert  the  first 
* element and then insert all other elements
* The tree is kept in a bag, and when all elements are inserted, the
* resulting tree is taken out of the bag
* Synopsis: insert-all ( <tree> ) <element> <element> ...
*
insert-all  (ET) = ET
            (ET) SE EX = <insert-all ( <insert ET SE> ) EX

*
* A flattened tree is a sequence of flattened  left  subtree,   the 
* root element and flattened right subtree
* Synopsis: flatten <tree>
*
flatten    =
          SR (EL)(ER) = <flatten EL> SR <flatten ER>

*
* Test: print some sorted sequence
*
* task is a standard entry point of the program
* the initial work expression is always <task>
*

task      = <print <sort /99/ /1/ /98/ /2/ /99/ /97/ /3/>>

7. Advanced Patterns

     A great deal of  the  power  of  Refal  comes  from  the  use  of 
sophisticated patterns. Refal does not impose any restrictions on  the 
composition of patterns. Basic patterns were  covered  in  section  3. 
This section covers several other classes of patterns which are unique 
to REFAL. These patterns include:
- symmetric patterns;
- non-linear patterns; 
- implicitly  iterative  (arbitrary) patterns;

     Section 7.4 covers the use of constraints on variables, mentioned 
briefly in section 3. Section 7.5 provides a formal  specification  of 
pattern matching in Refal.

7.1. Symmetric Patterns

     Symmetric patterns are rigid patterns,  where  each  sequence  of 
terms may have one E- or V-variable (or none). Basic patterns  can  be 
thought of as restricted  symmetric  patterns,  where  the  E-  or  V- 
variable may occur only at the end of term sequence.
     Symmetric patterns does not imply the  symmetry  of  the  pattern 
itself, but rather the symmetry of  the  two  directions  of  matching  
(i.e.  to  match  a  symmetric  pattern  the  expression  has  to   be 
traaaversed both from left to right and from right to left).

<symmetric pattern>::=  <symmetric sequence>  
     | <symmetric sequence> <E-variable> <symmetric sequence>
     | <symmetric sequence> <V-variable> <symmetric sequence>
<symmetric sequence>::= 
     | <symmetric sequence> <symmetric term>
<symmetric term>::= <atom>
     | <term>
     | <S-variable>
     | <W-variable>
     | ( <symmetric pattern> )

     Lets consider the use of symmetric patterns   in  a simple 
example:

* Partitioning a list into two halfs

partition  EX = <partition-1 () EX ()>
partition-1 (E1) WL EX WR (E2) = <partition-1 (E1 WL) EX (WR E2)>
            (E1) (E2)  = (E1) (E2)

     Partition function creates two empty bags and passes its argument 
to partition-1.
     The function partition-1  uses  symmetric  pattern  to  partition 
the sequence: (E1) WL EX WR (E2)
     The non-procedural reading of this  pattern  is:  any  expression 
which starts with a bag, ends with a bag and  has  at  least  two  more 
terms in between. 
     Procedurally the pattern matching process might check  the  first 
term of the expression, make sure that it is a bag,  move  inside  the 
bag and assign the inside sequence to a local variable, check that the 
second term of the expression  exists,  assign  it  to  another  local 
variable, move to the end of the expression, check that the last  term 
is a bag and assign its contents to yet another variable and then move 
to the next term from right to left, assign  it  and  the  assign  the 
middle of the expression between the second and the last but one terms 
to another local variable.
     The  RHS of the first rule in partition-1  describes  the  output 
expression: (E1 WL) EX (WR E2)
     Procedurally, this pattern  reads  as  follows:  stuff  the  term 
assigned to WL into the first bag, then stuff the  term  WR  into  the 
last bag.
     Symmetric patterns need not be symmetric  themselves!  They  only 
denote the matching process that is capable of processing the argument 
from both sides. The following pattern is also symmetric:
     E1 ('abc')
     Note, that this pattern matches any  expression  where  the  last 
term is a bag, containing the string 'abc'.

7.2. Non-linear Patterns

     Refal  does  not  impose  the  limitation  on  the  linearity  of 
patterns. There may be several occurances of variable  with  identical 
name in a single LHS. The idea is of course that all instantiations of 
variables with identical names   are  to  be  the  same.  In  case  of 
S-variables this implies a rather simple  check  on  the  equality  of 
assigned instantiations during pattern matching. For W-,  E-,  and  V- 
variables  this  equality  check  implies  tree  traversal   and   its 
complexity depends on the size of instantiations being compared.  This 
means that the run time behaviour of non-linear patterns  can  not  be 
easily predicted at compile-time.
     Basic  or  symmetric  non-linear  patterns  are  still   obvious. 
Arbitrary (implicitly iterative) non-linear patterns are somewhat more 
difficult to grasp. The behaviour of arbitrary patterns can  be  fully 
understood after  considering  the  formal  specification  of  pattern 
matching. (see sections 7.3 and 7.5)

     Let's  consider  a  small   example   on   non-linear   patterns. 
The function palindrome checks,  wheater  a  sequence  of  atoms  is  a 
palindrome and outputs 'yes' or 'no' accordingly.

palindrome  = 'yes'
          SX = 'yes'
          SX EM SX = <palindrome EM>
          SX EM SY = 'no'

     Note the order of rules in palindrome. The non-linear pattern (SX 
EM SX)is more specific than its linear  counterpart  (SX  EM  SY)  and 
therefore it is placed first (since the rules are matched from top  to 
bottom).

7.3. Implicitely Iterative Patterns

     Implicitely  iterative  patterns  may  contain  several  E-    or 
V-variables in a single sequence. Let's consider a trivial example:

find-a  E1 'a' E2 = 'yes'
        EX        = 'no'

     This simple function involves an implicitly iterative pattern  E1 
'a' E2. Non-procedurally, this pattern reads  as  follows:  break  the 
argument into two parts with a character 'a' in between. Procedurally, 
this means an iterative process that traverses the sequence and  looks 
for a character 'a', lengthening the instantiation of E1 when it fails 
to match the current term as 'a'. Complexity of such search depends on 
the argument, and thus can not be easily predicted at compile-time.
     E- and V-variables that should be iteratively lengthened for  the 
whole  pattern  to  match  are  called  open  variables.   Other   E-, 
V-variables which can be matched either as "the tail of the  sequence" 
(basic patterns) or as "the middle   of   the   sequence"   (symmetric 
patterns) are called closed variables. 
     Note that in the pattern above there is only one  open  variable. 
Actually there are two possibilities as to which of the two  variables 
E1, E2 is open: when matched left to right E1  is  open  E2  -  closed 
(tail  of  the  sequence).  But  the  same  pattern  can  be   matched 
from right to left as well. Then E2 is open and E1 - closed. Note that 
lengthening from left to right is always preferred.
     Implicitly iterative patterns are  great  for  programming  table 
lookups. When used with  constraints  (see  next  section)  implicitly 
iterative patterns can be used to programm lexical analysis, etc.
     When fully understood, implicitly iterative patterns can be used 
to programm very complex conditions on sequences in a compact way.

7.4. Constraints

     Each  pattern  variable  may  have  a   constraint. Constraints 
can be provided in one of three forms:

<constraint>::= 
     | (<constraint sequence>)
     | :<constraint name>:
     | (:<constraint name>:)

<constraint sequence>::= <composite constraint>

     Constraints provide restrictions on the top level  terms  in  the 
instantiation expression. They do not specify the order  of  terms  in 
the top level sequence.

Below are some simple examples of  constrained variables:

S('abc')X     matches   'a','b' or 'c' and nothing else
E('abc')X     matches   any  sequence  consisting  of  the characters 
               'a','b' and 'c' or an empty expression, e.g. 'cccc' or 
               'cabcab'  or 'aabbcc', etc.

     In case  of  non-linear  patterns,  i.e.  several  occurences  of 
variables with identical names, each variable  may  have  a  different 
constraint.  According  the  obvious  rule  that  all  variables  with 
identical names should have identical  instantiations,  the  resulting 
constraint is an intersection of all individual constraints. 

S('ab')X S('ac')X      matches only expression 'aa'
E('abc')X E('def')X    matches only an empty expression
V('x')1 V('y')1        doesn't  match  any  expression  at  all

     Procedurally, each constraint specifies additional  checks  which 
should be performed when the host variable has been matched  (or  when 
it was lengthened in case of open variables  of  implicitly  iterative 
patterns).
     Constraint sequences may be named and kept apart from variables 
(see section 7.4.3.) In this case the constraint part of the  variable 
contains the name of the constraint sequence.

7.4.1. Elementary Constraints

     Following constraints are treated as elementary building  blocks, 
from  which  composite  constratins  can  be  built.  Each  elementary 
constraint denotes a certain set of atoms or terms.
     A set consisting of a certain atom is denoted by the atom itself. 
Thus any atom can be used as an elementary constraint.
     Elementary constraints include the  following  standard  sets  of 
atoms, which are denoted by a single capital letter:

D- a set of digit characters ('0',...,'9');
L- a set of letter characters (uppercase and lowercase);
O- a set of all characters;
N- a set of all numbers;
F- a set of all symbols;

     The following elementary constraints  denote  terms:
S- a set of all atomic terms;
B- a set of all bags;
W- a set of all terms;

<elementary constraint>::= <atom>
     | <standard set>
<standard set>::= D | L | O | N | F | S | B | W

7.4.2. Composite Constraints

     More complex  sets  can  be  constructed  on  top  of  the  sets, 
denoted by  elementary  constraints.  The  composition  operations  on 
the sets of constraints are union and difference.
     A sequence of constraints denotes a union of the sets, denoted by 
the component constraints. A sequence of  constraints  in  parentheses 
(Q) denote a negation of the set denoted by Q (i.e.  difference  W-Q), 
where W is a standard elementary constraint, denoting the set  of  all 
possible terms. A pair (Q)P where P,Q are constraints  denotes  a  set 
theoretical difference P - Q.

<composite constraint>::= <constraint union>
     | ( <constraint sequence> ) 
     | ( <constraint sequence> ) <constraint union>

<constraint union>::= <elementary constraint>
     | <constraint union> <elementary constraint>

Below are more examples of the use of constrained variables:

S(L)1 E(LD'_')2 matches any valid identifier
S(LD)A  S('+-*/')B   S(LD)C    matches   certain   simple   arithmetic 
     expressions expressions, e.g. 'a+1', '1-3', 'b*3', '1/x', etc.
E(' ')1 E((' ')L)X E(' ')2  matches any sequence of characters with 
                              leading and trainling spaces.


7.4.3. Named Constraints

     Unique names can be associated with constraint  sequences.  These 
names should not conflict with that  of  functions.  Named  constraint 
sequences are introduced by constraint definitions.

<constraint definition>::=<name> S <constraint sequence>

7.5. Formal Pattern Matching Specification

     Let's consider details of pattern matching in Refal.
     Let's consider an application <F E0>, where E0 is some expression 
not containing applications. The group of rules  for  the  function  F 
consists of N pairs: LHS[1]=RHS[1] ,... LHS[N]=RHS[N].
     As outlined above,  the  step  involving   application   <F   E0> 
consistsof trying to match  LHS[i]  for  i=1,..,N  until  some  LHS[k] 
matches E0.
     Let's consider an LHS L, which is a sequence of pattern terms:
     L=PT1 PT2 ... PTn,  where  PTi are pattern terms. 
     Assume further, that L contains several variables: V1,...,Vk

     By definition an expression E0 matches  LHS  L  iff  there  exist 
instantiations E1,...,Ek of pattern variables  V1,...,Vk  in  L,  such 
that the instantiated pattern (i.e. L in which all occurrences  of  Vi 
are substituted for Ei) is equal to E0. It should be  noted  that  the 
instantiation of an S-variable can be a single atom; instantiation  of 
a W-variable - an atom or a bag; instantiation of a V-variable can  be 
any non-empty expression.  Instantiations  of  variables  should  also 
satisfy the available  constraints  (see  section  7.4).  In  case  of 
multiple occurrences of some variable in L, all instantiations of that 
variable should be equal.
     Any ambiguities,  resulting  from  the  occurrences  of  E-,  and 
V-variables are resolved as follows.  Let's introduce  the  length  of 
expression as the number of terms at the highest level. Inductively,
the length of a n empty expression is 0;
if E=T1 T2 ... Tn where Ti are terms, the length of E is n.
     The unambiguous pattern matcher will  choose  the  instantiations 
for E-, and V-, variables such that for each two pattern variables  Vi 
Vj in E0 where i < j the length of the instantiation Ei of Vi is  less 
than the length of Ej of Vj.

8. Modular Structure

     A Refal program is  a  collection  of  function,  constraint  and 
other definitions. A program may  consist  of  several  separately 
compiled modules.

<module>::= <module header> <definitions> <module end>
<module header>::=<name> START <export-import definitions>
<export-import definitions>::=
     | <export-import definitions> <export definition>
     | <export-import definitions> <import definition>
<export definition>::=  ENTRY <name list>
<import definition>::=  EXTERN <name list>
          | SYSTEM <name list>

<name list>::= <export-import name>
     | <name list> , <export-import name>
<export-import name>::= <name>
     | <name>(<name>)

<definitions>::= 
     | <definitions> <function definition>
     | <definitions> <constraint definition>
     | <definitions> <box definition>
     | <definition>  <symbol definition>

<module end>::= END

     All names imported from other modules should be listed in  import 
definitions. An external synonym (i.e.  the  name  as  it  is  visible 
outside the module) can be provided in parentheses. Built-in functions 
should be listed in  system  definitions.  Refal-M  makes  distinction 
between Refal names imported from other Refal modules and the names of 
built-in functions.
     All names which should be made visible outside the current module 
should be listed in export definitions. One module should  export  the 
name TASK which is the main  function  of  the  program.  An  external 
synonym can be provided with the exported names.
     Lexically the text of each module is a sequence of lines.  Module 
can include comment lines, which should start with an asterisk at  the 
first position. All function and constraint definitions  should  start 
from  the  first  position.  All  export,  import,  box,  and   symbol 
definitions should not start from the first position.
     Line are composed of tokens and separators. A separator  is  a 
whitespace. Any line can be broken at a  separator.  In  this  case  a 
continuation mark (+) has to be inserted at the end of line.

Example:

example start
     entry task
     system print
task = <print 'Hello, World!'>
     end

9. Additional Features ..................................

     9.1. Empty Functions ...............................

     Usually atomic symbols in Refal  correspond  to  function  names. 
Sometimes it is useful to have a  unique  symbol  for  other  purposes 
(i.e. to be used as a tag, or as a delimiter, etc.). In this  case  an 
empty function has to be defined. One way to do this in  Refal  is  to 
define a function with no rules, i.e. just to put the desired name  on 
a separate line from position one. A more convenient way to accomplish 
the same is to group all empty functions into a  single  definition(s) 
by  using an explicit symbol definition:

<symbol definition>::= EMPTY <ident list>
<ident list>::=<ident>
     | <ident list> <ident>

     9.2. Variables and Assignment ......................

     Variables let the programmer write history sensitive programs  at 
the expense of breaking the applicative nature of the  language.  Note 
that there is a more "clean" way to achieve the same  by  passing  the 
state information around.
     Refal provides explicit variables in the form of boxes.  Abox  is  
defined using box definition:

<box definition>::= SWAP <ident list>

     Boxes are static objects.  Each  box  may  contain  an  arbitrary 
expression. There are two sets of interfaces to a box  in  Refal.  One 
way to deal with boxes is to use explicit built-in functions. 
     There are five built-in function which can be used to  work  with 
boxes: 
- WTR (Store expression into box)
- PTR (Append expression to the contents of box)
- GTR (Take the contents out of the box; the box is cleaned)
- RDR (Copy the contents of the box; the box does not change)
- SWR (Swap expression and the current contents of the box)

Example:
boxes start
     entry task
     system write(wtr),read(rdr)
     system print
     swap box1
task = <print <rdr /box1/>>                        +
              <print <wtr /box1/ 'abc'>>           +
              <print <rdr /box1/>>
     end

This program will print 'abc'. The first operation on box1  yields  an 
empty expression since all boxes are  initialized  to  hold  an  empty 
expression. the  second  operation  on  box1  again  yields  an  empty 
expression, because the result of wtr  is  empty.  Finally  the  third 
operation yields the copy of  the  new  contents  of  the  box  -  the 
sequence 'abc'.
     A more exotic interface views boxes as function  which  swap  the 
contents of the box and the argument. 

Example 
boxes-2 start
     entry task
     system print
     swap box2
task = <print <box2>>  +
          <print <box2 'abc'>> +
          <print <box2>>
Note that this example still prints the sequence 'abc',  but  the  box 
will be empty at the end of execution.

Appendix 1. Glossary

atom                Data  unit  (of  the  type  character,  number  or 
                   symbol) not decomposable by the means of Refal

bag                 A sequence of terms in parentheses

box                 A static variable capable of storing an  arbitrary 
                   expression

sequence            Concatenation of terms

expression          Tree-structure of terms

term                An atom or a bag

pattern             Construction denoting a class of  expressions;  is 
                   built using atoms parentheses and pattern variables

pattern   variable    Construction   used   to   denote   a    certain 
                   subexpression. Pattern variables have  a  tag  (S-, 
                   W-, E-, V-variables) and a name (single  letter  or 
                   digit) and possibly a constraint between the two.

constraint          Construction  to  denote  the  set  of  additional 
                   conditions  on  the  terms  at  the  top-level   of 
                   subexpression matched by a pattern variable. 

S-variable          Denotes  a   subexpression,   consisting   of   an 
                   arbitrary atom. Constraints may be used  to  narrow 
                   the class of atoms denotable by an S-variable.

W-variable          Denotes a subexpression, consisting  of  a  single 
                   term. Constraints may be added to further  restrict 
                   the class of atoms denotable by a W-variable.

E-variable          Denotes  a   subexpression.   consisting   of   an 
                   arbitrary  (possibly  zero)  number  of   arbitrary 
                   terms. Constraints may be added to denote  the  set 
                   of valid terms.

V-variable          Denotes a non empty subexpression,  consisting  of 
                   arbitrary terms. Constraints may be added to denote 
                   valid terms.

constant pattern    

rigid pattern

basic pattern

symmetric pattern

linear pattern

non-linear pattern 

implicitly iterative pattern

LHS

RHS

application

failure




