Compiler and Virtual Machine extensions

If you have programmed in an imperative programming language, you might have noticed the absence of an assignment operator in Motto. This is because the kernel of Motto is a purely Functional language. The Motto virtual machine has the capability to update a variable. It is just not exposed to the outside world. Before I show you how to expose it, let me explain why Motto cannot have an assignment operator based on its simple stack-based programming model.

A symbol bound to a value, is evaluated to that value:

> 100 a @
> a
100

As a Motto procedure always follow its arguments, we cannot have an “assignment” procedure because all it will see on the stack is the value, not the symbol. So we need to add a facility to handle this special case. We need a procedure that precedes its argument or at least one of its arguments. This can be accomplished by writing a compiler definition, i.e, a procedure executed by the compiler. This procedure will read the symbol from the input and emit virtual machine instructions to assign the previous value on the stack to that symbol. As this special procedure is executed by the compiler, it has to be written in the language the compiler itself is written. The Motto compiler and virtual machine is written in Scheme, a dialect of Lisp. Especially, we use a Scheme implementation called Gambit-C which has the ability to generate highly-optimized executable programs for a variety of platforms. Let us call our assignment procedure set and add its definition to the Motto compiler:

> (cdef 'set (lambda (compiler)
               (list (c-push (get-symbol (compiler-input compiler)))
                     (c-set))))

cdef maps the Scheme symbol set to a Scheme procedure (a lambda form). The Motto virtual machine consumes lists of instructions. Here the compiler will read a symbol from the input stream and emit the instruction to push that symbol to the stack. This instruction is followed by a set bytecode which will assign the value already on the stack to the symbol:

> 140 set a
> a
140

In fact, Motto comes pre-packaged with set and a few other useful compiler definitions:

> 20 def b ;; defines a new variable.
> 50 def a ;; re-defines the existing variable `a`.
> a b + def c
> c
70
> 100 c set
> c
100

The use of def and set becomes more evident when we have to deal with non-local variables:

> 100 def x
> 200 def y
> [x y .s .c] !
100 200
> x y
100 200

> ;; In the following code block, `set` will update the global variable `x`, while
  ;; `def` will declare a new local variable `y`.
> [300 set x 
   500 def y 
   x y .s .c
   600 set y
   x y .s .c] !
300 500
300 600
> ;; `x` was changed by the procedure, while `y` stays intact
  ;; as the procedure modified its local definition of `y`.
> x y
300 200

There are compiler definitions to load and compile Motto source code files. For example, here is a simple Motto script – hello.m:

;; file: hello.m
[hello world .s .c] def say-hello

This can be imported to the interpreter:

> import "hello.m"
> say-hello
hello, world

The interpreter will compile the script before executing it. The script can be explicitly compiled and stored in an object file for faster loading:

> compile "hello.m"
> import "hello.mo"
> say-hello
hello, world

Another useful compiler definition is probe which can be used to look-up a value in an anonymous closure:

> 100 200 [def x def y []] ! dup probe x swap probe y
200 100

Compiler definitions can be used to optimize some computations by executing them at compile-time and inserting the result into bytecode. An example can be seen here. Most of the basic operations in Motto are compiler definitions. You can extend the compiler with your own definitions or change the behavior of existing definitions. For example, arithmetic procedures can be re-defined to make Motto do infix arithmetic:

> (cdef '+ (lambda (compiler)
             (list (c-push (get-number (compiler-input compiler)))
                   (c-arith '+))))    
> 10 + 20
30

Virtual Machine definitions

Just like the compiler, the virtual machine can also be extended with definitions written in Scheme. These definitions are executed at runtime, like normal Motto procedures. A virtual machine definition is a lambda expression that takes two arguments: the current instance of the virtual machine and a reference to the data stack. This lambda should return a new data stack. A virtual machine definition is introduced using the def Scheme procedure. The following sample shows a VM definition that squares all numbers on the stack. It first converts the stack to a list and uses the Lisp map function to produce a list of squares. This list is then converted to a stack and returned. (Conversions between the data stack and Scheme lists are not expensive, because the current implementation of Motto uses a Scheme list as its data stack!).

> (def 'sqr-all
        (lambda (vm stack)
          (list->stack (map (lambda (x) (* x 2)) 
                            (stack->list stack)))))
> 1 2 3 4 5 sqr-all
2 4 6 8 10

Programming in other human languages

As the syntax of a Motto program is dictated by the programmer himself, it is easy to transform it to support languages other than English. The following example re-defines some built-in Motto procedures in Greek and use them to write a simple program.

First we need the Greek versions of def, eq and display:

> (cdef 'ορίζουν (lambda (self)
              (list (c-push (get-symbol (compiler-input self)))
                    (c-def))))

> [eq] ορίζουν ίση
> [display] ορίζουν οθόνη

Next we define a new procedure that will say a greeting based on an English input. If the input is “hi” it will print the Greek equivalent of “hello”, otherwise it will print the Greek equivalent of “bye”:

> [hi ίση ? γειά-σου αντίο .s .c] ορίζουν χαιρετώ

> hi χαιρετώ
γειά-σου
> see-you χαιρετώ
αντίο

I don’t know Greek. So I used on Google for translation. If you know Greek and find some problem with the translation used here, please let me know.

Procedural abstractions

This post explore some interesting ways procedures can be used to express algorithms and data abstractions in Motto programs.

Recursion

The following procedure solves the Towers of Hanoi problem with recursion:

> [c @ b @ a @ n @ 
   [n 1 - a c b hanoi
    Move disk from pole  a  to  b .s .c
    n 1 - c b a hanoi]
   n 0 gt when] 
   hanoi @

> 3 1 2 3 hanoi
Move disk from pole 1 to 2 
Move disk from pole 1 to 3 
Move disk from pole 2 to 3 
Move disk from pole 1 to 2 
Move disk from pole 3 to 1 
Move disk from pole 3 to 2 
Move disk from pole 1 to 2

The omnipresent stack makes it easy to visualize the shape of a recursive procedure. Let us try this with the classic factorial example:

> [.s n @  [n n 1 - fact *] n n 1 eq if] fact @

> 4 fact
4
4 3
4 3 2
4 3 2 1
24
> .c

> 10 fact
10
10 9
10 9 8
10 9 8 7
10 9 8 7 6
10 9 8 7 6 5
10 9 8 7 6 5 4
10 9 8 7 6 5 4 3
10 9 8 7 6 5 4 3 2
10 9 8 7 6 5 4 3 2 1
3628800
> .c

It is possible to prevent the stack from growing n steps deep by making the procedure iterative:

> [.s n @ f @ [f n * n 1 - fact-helper] f n 1 eq if] fact-helper @
> [1 swap fact-helper] fact-iter @

> 4 fact-iter
1 4
4 3
12 2
24 1
24
> .c

> 10 fact-iter
1 10
10 9
90 8
720 7
5040 6
30240 5
151200 4
604800 3
1814400 2
3628800 1
3628800
> .c

For repeatedly executing a piece of code we can also use the loop procedure. This will execute the code block on the top of the stack until the second value on the stack becomes false:

; Print "in loop" until user enter `quit`.
> [read quit eq not] ["in loop" display newline] loop
hello
in loop
blah
in loop
quit

Closures

A procedure is associated with an environment that contain references to non-local variables. A procedure together with such a referencing environment is known as a closure. The following is a simple example of a closure:

> [x @ y @ []] point @

A call to point will leave an empty procedure on the stack. Though this procedure contains no code to execute, it has an environment that refers to the values of x and y. The Motto virtual machine has an instruction called probe that allows us to access the variables in a closure. This instruction is invoked using a special dot notation.

> 10 20 point p @
> p.x p.y
20 10

A closure can also get computations done using its local state. The following updated version of point can construct new points by adding a value to its current coordinates:

> [x @ y @ 
   [a @ y a + x a + point2] new-point @ 
   []] 
  point2 @

> 10 20 point2 p2 @
> p2.x p2.y
20 10
> 100 p2.new-point p3 @
> p3.x p3.y
120 110

Closures can be used to implement abstract types. An abstract type reveals only its interface and hides all internal implementation details. This makes it easy to switch a type implementation without changing other parts of the program. In the above example, the point2 type has a simple interface: x, y and new-point. A new implementation of this type might choose to use a pair instead of two separate variable to store the coordinates, without disrupting the interface.

> [pair xy @ 
   [xy second] x @ 
   [xy first] y @
   [a @ xy first a + xy second a + point3] new-point @
   []] point3 @

> 10 20 point3 pp1 @
> pp1.x pp1.y
20 10
> .c
> 100 pp1.new-point pp2 @
> pp2.x pp2.y
120 110

A generic constructor procedure can be used to initialize the actual point type based on some input from the user:

> ;; Constructs a `point2` if user-input is `1`.
  ;; Constructs a `point3` on any other input.
> [ptype @ [point3] [point2] ptype 1 eq if] mk-point @

> 10 20 1 mk-point pt1 @
> 100 200 2 mk-point pt2 @
> pt1.x pt2.y
20 100 
> pt2.x pt1.y
200 10

Data types - Part II

This is a quick overview of the composite data types in Motto.

The data stack

Any value typed at the REPL is automatically pushed to the stack. Here are some basic stack operations:

> 10 20 30
10 20 30
> .p ;; Pop the stack
10 20
> .c ;; Clear the stack
> 100 20 78 32 12 [*] reduce 
59904000
> .c
> ;; Find the mean of all numbers on the stack:
> 6 11 7 length len @ [+] reduce len /
8
> red green blue
> ;; Check if the stack contains a particular value:
> red contains
#t
> [display newline] println @
> ;; The stack can be used as a key-value store:    
> name: sam age: 20
> name: get println
sam
> age: get println
20

The stack does away with the need to pass arguments explicitly to procedures. Procedures can leave any number of results on the stack. As an example, let us define a procedure for computing the mean of a list of numbers:

> [length count @ [+] reduce-intact count quotient] mean @
> 6 7 11 mean println
8
> .c
> 10 20 30 40 50 60 mean println
35

Mean is one of many averages that can be derived from a sequence of numbers. We can also define procedures to compute other averages like median, mode and range. There can be a single procedure that is used to aggregate all these averages together:

> [[gt] sort 
   length len @
   len 2 quotient mid @
   len is-odd ? [mid at] [mid at mid at + 2 quotient]]
  median @

> [most-occurs] mode @

> [[max] reduce-intact mx @
   [min] reduce-intact mn @
   mx mn -] range @

> [median avg_median @
   mean avg_mean @
   mode avg_mode @
   range avg_range @
   .c 
   median: avg_median
   mean: avg_mean
   mode: avg_mode
   range: avg_range] 
  averages @

> ;; tests
> 1 2 4 4 5 8 9 averages
median: 4 mean: 4 mode: 4 range: 8 
> .c
> 80 100 62 180 21 55 averages
median: 71 mean: 83 mode: 21 range: 159 
> .c

Each average could be retrieved separately using get:

> 3 5 12 averages
> median: get
5
> mean: get
6
> range: get
9
> mode: get
3
> .c

fetch is similar to get, but after retrieving the value, it removes the mapping from the stack. In the last post we saw how to use get to implement a procedure that take keyword arguments. Here we will redefine that procedure to use fetch so that the arguments are actually consumed by the procedure. In fact, having an extended form of fetch that can return a default value will be more useful here:

> ;; Extended form of `fetch` that accepts a default value.
> [key @ default @ key fetch dup key eq default swap if] fetch-d @

> ;; Procedure to compute distance between two points. If an argument is not
  ;; specified, it will default to 0.
> [0 x2: fetch-d 0 x1: fetch-d - 2 expt 
   0 y2: fetch-d 0 y1: fetch-d - 2 expt 
   + sqrt] distance @  

> x1: 10 x2: 5 y1: 100 y2: 120 distance
20.615528128088304 
> .c
> x1: 10 x2: 5 distance
5

Pair and List

As the name suggests pair is a composite of two values.

> 10 20 pair
(10 . 20)
> dup first
(10 . 20) 10
> swap second
10 20

It is possible to construct a list by making a pair point to another:

> 10 20 30 40 50 [pair] reduce
(10 20 30 40 . 50)

Motto has a built-in list type. The values on the stack can be wrapped up in a new list using the list procedure:

> 10 20 30 40 50 list lst @
> lst
(10 20 30 40 50)

> list-length
5  
> .c
> lst list-first lst list-rest
10 (20 30 40 50)
> .c
> lst list-reverse
(50 40 30 20 10)

> hello lst list-push   
(hello 10 20 30 40 50) 
> hello lst list-append
(10 20 30 40 50 hello)

> [2 * display #\space display] lst list-for-each
20 40 60 80 100
> .c

> ;; The `map` function from Lisp:
> [list-for-each list] my-list-map @
> [2 *] lst my-list-map
(20 40 60 80 100)

Array

Arrays are more efficient than lists for some operations, especially for accessing an arbitrary element. There are two ways to create an array. Either by using the array procedure or by typing in an array literal:

> a b c d array
#(a b c d) 

> ; An array literal.
> #(1 2 3 4 5)
#(1 2 3 4 5) 

> a b c d array arr @
> 0 arr array-at
a 
> hello 0 arr array-set
#(hello b c d) 
> arr ; array-set modifies the array.
#(hello b c d) 
> #(1 2 3 4) #(hello world) array-concat
#(1 2 3 4 hello world) 
> array-length
6

String

Strings are sequences of characters.

> "hello, world" s @
> 1 s string-at
#\e
> 3 5 s substring "lo" string-eq
#t
> 3 5 s substring "LO" string-eq ; `string-eq` is case-sensitive.
#f 
> 3 5 s substring "LO" string-ci-eq
#t 

> ; split the string using comma as the delimiter.
> #\, s string-split
("hello" " world") 
> ; use space as the delimiter.
> #\space s string-split
("hello," "world") 

> ; search for a substring.
> "hello" s string-find
0 
> "hullo" s string-find
#f 
> "hEllo" s string-find
#f 
> "hEllo" s string-ci-find
0 
> "llo" s string-find
2 
> "hello " "world" string-append
"worldhello "
> "hello " "world" string-concat
"hello world"

Map

Map is a data structure that acts like a dictionary, where a value is associated with a key. symbols and keywords are the most natural candidates for keys.

> name: mat age: 45 map m @
> name: m map-at
mat

> ; `map-at` will raise an error if the key is not found.
> salary: m map-at
#<$error #3 tag: value-not-found message: "Value not found."> 
> .c

> ; `map-get` allows to pass a default value for a missing key.
> 1450 salary: m map-get
1450 
> salary: 5000 m map-put
> 1450 salary: m map-at
5000

A map can stand-in for the switch-case control structure found in many imperative languages:

> [signal @ 
   red [stop! .s .c] green [go! .s .c] yellow [be ready .s .c] map signals @
   ["not a valid signal" error .s .c] signal signals map-get !] 
  signal-action @

> red signal-action
stop! 
> yellow signal-action
be ready 
> green signal-action 
go! 
> black signal-action
#<$error #2 tag: error message: "not a valid signal"> 
> red signal-action  
stop!

Data types - Part 1

This post will introduce the atomic data types in Motto.

Numbers

Various types of numbers supported by Motto are – complex, real, rational and integer.

> 3+4i ;complex
> 3.14 ;real
> #e1e10 ;real
> 6/10 ;rational
> 300 ;integer

There are predicates that can recognize the accurate type of a number:

> ; Here only the result of the last operation is shown. 
  ; The actual REPL will print all values on the stack.
> -2.5+0.0i is-complex
#t
> -2.5+0.0i is-real
#f
> 6/10 is-rational
#t
> 3 is-integer
#t

A number can have these radix prefixes: #b (binary), #o (octal), #d (decimal), and #x (hexadecimal). With no radix prefix, a number is assumed to be decimal.

Some operations with numbers:

> 34 number-to-string is-string
#t
> ;convert to base 2 representation.
> 2 34 number-to-string-r 
"100010"
> ;convert a string in base 16 representation to an integer.
> 16 "#xFFE" string-to-number-r
4094 
> 10 factorial
3628800     
> 100 sqrt
10
> ;generate a random number between 1 and 10.
> 10 random-integer
8
> ;generate a random number between 0 and 1.
> random-real
.8574025375628211

A division by zero will return the special symbol +nan.0 which means “Not A Number”. Any arithmetic operation with this value will return +nan.0. The symbols +inf.0 and -inf.0 represent positive and negative infinities.

Boolean

The literals #t (true) and #f (false) are used to represent Boolean values. Procedures that return Boolean values are known as predicates. There are built-in predicates for comparing two objects on the stack:

> 4 3 gt ;greater than.
#t
> 4 3 lteq ;less than or equals.
#f
> 10 10.00001 eq ;equality
#f
> red red eq ;eq works for symbols too.
#t 
> "red" "red" eq ;but not for strings.
#f 
> "red" "red" string-eq
#t

The if procedure can be used to conditionally execute code based on a Boolean value at the top of the stack. For example, the following procedure will print stop if the signal is red:

> [start stop rot red eq if .s .c] manage-traffic @ 
> red manage-traffic                              
stop 
> green manage-traffic
start

When manage-traffic is invoked the first time, there are four values on the stack:

red start stop red

The call to rot will transform the stack like this:

start stop red red

Now eq will leave #t on the stack:

start stop #t

As the top value is #t, if will leave stop on the stack and drop start. If the top value is #f, it will drop the next value (stop) instead. (I will discuss control structures in detail in a future post.).

Character

A character literal is preceded by #\. Characters are Unicode encoded. As Motto internally uses the Scheme read procedure to parse its input, it can recognize special character literals like #\space and #\newline.

> #\a #\a char-eq
#t
> #\a #\A char-eq
#f
> ;compare by ignoring case.
> #\a #\A char-ci-eq
#t
> ;get the UNICODE encoding of the character.
> #\c char-to-integer
99

Symbol

A symbol is a unique identifier. It can be used as constants, variable names etc. The following code block declares a list of symbols to represent the months in a year.

> [jan feb mar apr may jun jul aug sep oct nov dec] months @
> months
jan feb mar apr may jun jul aug sep oct nov dec

It is easy to check the stack for a particular symbol using the contains predicate:

> .c
> [months 12 at 13 remove-at contains no yes rot if println .c] is-month @
> [display newline] println @
> jan is-month
yes
> aug is-month
yes
> blah is-month
no

Symbols with a colon at the end are called “keywords”. (A colon by itself is not a keyword, it is a symbol.). Keywords are similar to symbols but are self evaluating and distinct from the symbol data type. They can be used for specifying key-value pairs and to pass named arguments to procedures:

> name: sam age: 34
> name: get println ; get helps us to retrive a value mapped to a key.
sam
> age: get println
34
> .c

The following procedure that computes the distance between two points on a Cartesian Plane accept keyword arguments, so the order in which the arguments are passed becomes irrelevant:

> [x2: get x1: get - 2 expt y2: get y1: get - 2 expt + sqrt] distance @  

> x1: 10 y1: 20 x2: 34 y2: 21 distance println .c
24.020824298928627
> x1: 10 y2: 21 x2: 34 y1: 20 distance println .c
24.020824298928627

The next post will explore the composite data types in Motto.