Job scheduler

This post describes a simple job scheduler. Each job description consists of a procedure and some ticks into the future. The scheduler will pick up the job after the given ticks (seconds) and run the procedure. A job description is represented using the following structure. (All code in Spark-Scheme.):

(define-struct job (run-at proc))

There is a fixed size vector that is used to store the jobs.

(define max-jobs 10)
(define jobs (make-vector max-jobs ()))

The index on which a job get stored is calculated as:

(remainder (+ (current-seconds) ticks) max-jobs)

The job is added to a list at the resulting index. The following procedure contain the complete logic for scheduling a job:

(define (schedule-job ticks job-proc)
  (let* ((run-at (+ (current-seconds) ticks))
         (job (make-job run-at job-proc))
         (index (remainder run-at max-jobs))
         (q (vector-ref jobs index)))
       (vector-set! jobs index (cons job q))))

The scheduler wakes up for each second and run all the jobs scheduled for that second. If the current list has jobs that are scheduled to run in the future, they are left back to be executed later:

(define (scheduler)
  (sleep 1)
  (run-jobs (current-seconds))
  (scheduler))

(define (run-jobs now)
  (let* ((index (remainder now max-jobs))
         (q (vector-ref jobs index)))
    (let loop ((old-q q) (new-q ()))
      (if (not (null? old-q))
          (let ((job (car old-q)))
            (if (run-job job now)
                (loop (cdr old-q) new-q)
                (loop (cdr old-q) (cons job new-q))))
          (vector-set! jobs index new-q)))))

 (define (run-job job now)
   (if (<= (job-run-at job) now)
       (begin
         (thread (job-proc job))
         #t)
       #f))

The scheduler runs in its own thread:

(define (run-scheduler)
  (thread scheduler))

A test run of the scheduler:

> (run-scheduler)
> (schedule-job 5 (lambda () (printf "group1 job1~n") (flush-output)))
> (schedule-job 5 (lambda () (printf "group1 job2~n") (flush-output)))
> (schedule-job 15 (lambda () (printf "group3 job1~n") (flush-output)))
> (schedule-job 10 (lambda () (printf "group2 job1~n") (flush-output)))
;; after 5 seconds:
group1 job1
group1 job2
;; after 10 seconds:
group2 job1
;; after 15 seconds:
group3 job1

Reference: The Design and Implementation of the FreeBSD Operating System.

kth order statistic

A worst-case O(n) algorithm to find the kth largest element in a list:

(define (quickselect alist k)
 ;; Divide the list into two:
 ;; elements greater and lesser than
 ;; the median.
(let* ((x (pick-mid alist))
       (lt (lesser x alist))
       (gt (greater x alist))
       (gt-len (length gt)))
 (cond ((<= k gt-len)
           ;; kth largest must be in gt
           (quickselect gt k))
          ((> k (+ gt-len 1))
           (quickselect lt (- k gt-len 1)))
         (else x))))

A sample run:

(define a '(10 3 5 8 20 90))
(printf "~a~n" (quickselect a 5)) ;; => 5
(printf "~a~n" (quickselect a 1)) ;; => 90
(printf "~a~n" (quickselect a 4)) ;; => 8
(printf "~a~n" (quickselect a 3)) ;; => 10
(printf "~a~n" (quickselect a 2)) ;; => 20

The helper functions used by quickselect:

(define (lesser x alist)
  (filter (lambda (a) (< a x)) alist))

(define (greater x alist)
  (filter (lambda (a) (> a x)) alist))

(define (pick-mid alist)
  (list-ref alist (round (/ (length alist) 2))))

Speeding up Fibonacci

The following definition of the Fibonacci function is found in almost all introductory programming texts that teach recursion:

(define (fib N)
  (if (or (= N 0) (= N 1))
      1
    (+ (fib (- N 1)) (fib (- N 2)))))

It works fine for very small values of N. But the algorithm starts to show its inefficiency as the value of N gets larger. An attempt to profile (fib 40) in Racket produced the following output:

(time (fib 40))


cpu time: 60763 real time: 61093 gc time: 0
165580141

With inputs like 29, 30 or 40, the function has to make millions of recursive calls. Usually authors just ignore this inefficiency. Lawrence C. Paulson might be an exception. In his book, ML for the Working Programmer, he shows how mathematical induction could be used to optimize this function. His algorithm is simple and is based on the following reasoning:

Each Fibonacci number is the sum of the previous two:

0 + 1 = 1, 1 + 1 = 2, 1 + 2 = 3, 2 + 3 = 5,  3 + 5 = 8 ...

It is easy to define a function that takes a pair of numbers and return their sum:

(define (sum pair) (+ (car pair) (cdr pair)))

Let us define a function that returns a pair that contain the last element of the input along with the sum so that recursively calling it will reproduce the above pattern:

(define (nextfib pair) (cons (cdr pair) (sum pair)))

> (nextfib (cons 0 1))
(1 . 1)
> (nextfib (cons 1 1))
(1 . 2)
> (nextfib (cons 1 2))
(2 . 3)
> (nextfib (cons 2 3))
(3 . 5)
> (nextfib (cons 3 5))
(5 . 8)

To compute the Nth element we just recursively call nextfib N times:

(define (fib n)
  (if (= n 0)
      (cons 0 1)
    (nextfib (fib (- n 1)))))

> (time (fib 40))
cpu time: 0 real time: 1 gc time: 0
(102334155 . 165580141)
> (time (fib 400))
cpu time: 0 real time: 0 gc time: 0
(176023680645013966468226945392411250770384383304492191886725992896575345044216019675
 .
284812298108489611757988937681460995615380088782304890986477195645969271404032323901)

The profiling results show that we indeed have a better performing fibonacci procedure! It is also possible to improve the space efficiency of nextfib if we take advantage of tail call optimization:

(define (nextfib n pair)
  (if (= n 0)
    pair
  (nextfib (- n 1) (cons (cdr pair) (sum pair)))))

(define (fib n)
  (nextfib n (cons 0 1)))

> (time (fib 40))
cpu time: 0 real time: 0 gc time: 0
(102334155 . 165580141)
> (time (fib 400))
cpu time: 0 real time: 0 gc time: 0
(176023680645013966468226945392411250770384383304492191886725992896575345044216019675
 .
284812298108489611757988937681460995615380088782304890986477195645969271404032323901)

A simple finite automaton

A finite automaton that recognizes the regular language of all strings (made of 0’s and 1’s) that contain the string 001 as a substring. For example, 001, 1001 and 1111001001 are valid in the language, but 11 and 010 are not.

(define (one? x)
  (= x 1))

;; State that skips all 1's.
;; When a 0 is encountered, state changes to q0.
(define (q x)
  (if (one? x)
      q
      q0))

;; If the pattern 00 is seen, moves to state q00.
;; Goes back to q if x = 1.
(define (q0 x)
  (if (one? x)
      q
      q00))

;; State is in the pattern 00.
;; If 1 is seen, move to the end state (e).
;; Otherwise, move back to q0.
(define (q00 x)
   (if (one? x)
       e
       q00))

;; Notifies the state machine that the pattern has been found.
(define (e)
  #t)

;; The state machine itself.
(define (m string)
  (let ((state q))
    (call/ec
      (lambda (k)
        (for x in string
            (set! state (state x))
            (if (eq? state e)
                (k (state))))
       #f))))

Tests:

> (m '(1 1 0 1 0))
#f
> (m '(1 1 0 0 1 0))
#t
> (m '(0 0 1))
#t
> (m '(1 1 1 1 1 1 1 0 0 1 1 1 1 1))
#t

We don't compress enough

Here is a problem on lossless data compression:

Write the entire 24 volumes of the Encyclopedia Brittanica on the head of a pin.

The person who raised the problem also explained how this could be accomplished:

The head of a pin is a sixteenth of an inch across. If you magnify it by 25,000 diameters, the area of the head of the pin is then equal to the area of all the pages of the Encyclopaedia Brittanica. Therefore, all it is necessary to do is to reduce in size all the writing in the Encyclopaedia by 25,000 times. Is that possible? The resolving power of the eye is about 1/120 of an inch—-that is roughly the diameter of one of the little dots on the fine half-tone reproductions in the Encyclopedia. This, when you demagnify it by 25,000 times, is still 80 angstroms in diameter—-32 atoms across, in an ordinary metal. In other words, one of those dots still would contain in its area 1,000 atoms. So, each dot can easily be adjusted in size as required by the photoengraving, and there is no question that there is enough room on the head of a pin to put all of the Encyclopedia Brittanica.

Furthermore, it can be read if it is so written. Let’s imagine that it is written in raised letters of metal; that is, where the black is in the Encyclopedia, we have raised letters of metal that are actually 1/25,000 of their ordinary size. How would we read it?

If we had something written in such a way, we could read it using techniques in common use today. (They will undoubtedly find a better way when we do actually have it written, but to make my point conservatively I shall just take techniques we know today.) We would press the metal into a plastic material and make a mold of it, then peel the plastic off very carefully, evaporate silica into the plastic to get a very thin film, then shadow it by evaporating gold at an angle against the silica so that all the little letters will appear clearly, dissolve the plastic away from the silica film, and then look through it with an electron microscope!

There is no question that if the thing were reduced by 25,000 times in the form of raised letters on the pin, it would be easy for us to read it today. Furthermore; there is no question that we would find it easy to make copies of the master; we would just need to press the same metal plate again into plastic and we would have another copy.

Here is the full transcript of the speech. I am sure you won’t be surprised by the name of the speaker!

Amortization

The notion of amortization is important in situations where we care only about the total running time of a group of operations and don’t care about the running time of each individual operation. For instance, given a sequence of n operations, we may wish to bound the total running time of the sequence by O(n) without insisting that each individual operation run in O(1) time. We might be satisfied if a few operations run in O(log n) or even O(n) time, provided the total cost of the sequence is only O(n). The following implementation of aQueue exploits the idea of amortization and provides O(1) snoc (insert) and remove operations. This is achieved by using two internal lists, one to represent the front of the queue and the other to represent the tail. The implementation contains no mutable operation, so it can be safely shared by concurrent processes without explicit synchronization:

(define (mk) (cons null null))

(define front car)
(define rear cdr)

(define (apply-invariant q)
  (if (and (null? (front q))
           (not (null? (rear q))))
      ;; reverse eats up the credits accumulated 
      ;; by the elements in the list.
      (cons (reverse (rear q)) null)
      q))

;; Every snoc adds a credit of one to the new element.
(define (snoc q e)
  (apply-invariant (cons (front q) (cons e (rear q)))))

(define (head q)
  (if (not (null? (front q)))
      (caar q)
      null))

(define (tail q)
  (if (not (null? (front q)))
      (apply-invariant (cons (cdr (front q)) (rear q)))
      (apply-invariant (cons null (rear q)))))

(define (remove q)
  (if (not (null? (front q)))
      (apply-invariant (cons (cdr (front q)) (rear q)))))

Sample usage:

> (define q (snoc (snoc (snoc (mk) 1) 2) 3))
> q
=> ((1) 3 2)
> (front q)
=> (1)
> (tail q)
=> ((2 3))
> (snoc q 4)
=> ((1) 4 3 2)
> (remove q)
=> ((2 3))
> (remove (remove q))
=> ((3))

Every function except tail takes O(1) worst-case time, but tail takes O(n) worst-case time. How is this cost amortized over time? This is done by assigning each element of the rear list a credit of 1. Every snoc into a non-empty queue takes one actual step and allocates a credit to the new element of the rear list, for an amortized cost of two. Every tail that does not reverse the rear list takes one actual step and neither allocates nor spends any credits, for an amortized cost of one. Finally, every tail that does reverse the rear list takes m + 1 actual steps, where m is the length of the rear list, and spends the m credits contained by that list, for an amortized cost of m + 1 - m = 1.

Reference: Purely Functional Data Structures by Chris Okasaki

To ref count or not

A ‘Stop-The-World’ garbage collector makes it almost impossible to get performance estimates right. Given the vast combination of parameters that will influence actual deployments, we can’t predict when the collector will run. A reference counted real-time collector is simpler to implement and provides room for making better runtime estimates. But the VM should still keep track of allocated objects for two reasons. One is for doing a full sweep to break cyclic references. (Memory exhaustion caused by cyclic references are extremely rare. So in actual practice this collector will almost never run and memory management could be considered ‘real-time’). A record of allocated objects is also needed for providing data required by profilers. The Limbo programming language implements this scheme.