Quicksort For SQR

We can’t always output information in the proper order by using the DBMS to sort the incoming data. Imagine a corporate roster showing who reports to whom. That’s going to require multiple selects of PS_EMPLOYEES. We could write the data to a scratch table and select it again in order. We could write it to a disk file, use a sort utility, and then read the file.

At moments like this, we face the greatest mystery of our time: why doesn’t SQR have a sort command? My theory is that SQ Software, Sybase, MITI/Sqribe, Brio, Hyperion, and Oracle never added a sort command for our own good, because programming sorts builds character.

(Seriously, does anyone know why there is no sort command in SQR?)

Quicksort is algorithm invented by C.A.R. Hoare in 1962*. It runs fast and small. It’s not optimal in every situation, but it’s usually a good choice.

The challenge for SQR programmers is that Quicksort is recursive and SQR is not. Quicksort’s basic approach is to move the “lesser” data to the first half of the list and the “greater” data to the second half of the list. Next, it calls itself twice, first to repeat the process on the first half of the list, then to repeat the process on the second half of the list.

I searched the Internet for “nonrecursive quicksort,” and found a Basic program by James Haley, dated August 28, 1985. I rewrote it for SQR and generalized it to handle any array. We can put the generic quicksort procedure in an sqc include file and write specific supporting procedures (save_pivot, compare_rows, and swap_rows) for each program.

Here is the generic quicksort procedure. I used the “local” option so that quicksort’s variables wouldn’t interfere with those variables in the main program.

(History quiz – does anyone know why #i and #j are used so often for loop counters?)

begin-procedure quicksort local
  create-array name=ranges size=32
    field=from:integer=0
    field=thru:integer=0

  clear-array name=ranges
  move 0 to #range_num
  put #first_row #last_row into ranges(#range_num)

  while #range_num >= 0
    get #from #thru from ranges(#range_num)
    subtract 1 from #range_num
    if #from < #thru
      do save_pivot
      let #i = #from
      let #j = #thru + 1
      while #i < #j
        move '<' to $relationship
        while $relationship = '<' and #i < #thru
          add 1 to #i
          do compare_rows(#i, $relationship)
        end-while
        move '>' to $relationship
        while $relationship = '>'
          subtract 1 from #j
          do compare_rows(#j, $relationship)
        end-while
        if #i < #j
          do swap_rows(#i, #j)
        end-if
      end-while
      do swap_rows(#from, #j)
      add 1 to #range_num
      let ranges.from(#range_num) = #from
      let ranges.thru(#range_num) = #j - 1
      add 1 to #range_num
      let ranges.from(#range_num) = #j + 1
      let ranges.thru(#range_num) = #thru
    end-if
  end-while
end-procedure quicksort

Each SQR program that uses quicksort needs to have its own versions of the save_pivot, compare_rows, and swap_rows procedures. For this example, I imagine a program that needs to sort three times. It uses the $sort variable to control evaluate commands so that quicksort will act on three different arrays (array, employee, and dept) and sort employee in two different ways (“by salary” and “by name”).

Note that save_pivot does not have local variables; it uses global variable $sort to set global variables #value2, or #salary2, or $last_name2 and $first_name2, or department $name2.

begin-procedure save_pivot
  evaluate $sort
    when = 'array'
      let #value2 = array.field(#from)
    when = 'employee by salary'
      let #salary2 = employee.salary(#from)
    when = 'employee by name'
      get $last_name2 $first_name2
        from employee(#from) last_name first_name
    when = 'dept by name'
      let $name2 = dept.name(#from)
  end-evaluate
end-procedure save_pivot

Note that compare_rows uses local variables, so it has to refer to $sort and the variables from save_pivot with underscores. It compares to rows from an array and returns the symbols for less than, equal to, or greater than in the $relationship variable. The value of $relationship tells quicksort whether #row1 should come before or after the pivot in the array. If we want to sort in descending order, we would reverse the rules. We can sort by multiple fields (see the employee by name example), with some field ascending and some fields descending.

begin-procedure compare_rows(#row1, :$relationship)
  evaluate $_sort
    when = 'array'
      let #value1 = array.field(#row1)
      evaluate #value1
        when < #_value2
          let $relationship = '<'
        when = #_value2
          let $relationship = '='
        when > #_value2
          let $relationship = '>'
      end-evaluate
    when = 'employee by salary'
      let #salary1 = employee.salary(#row1)
      evaluate #salary1
        when < #_salary2
          let $relationship = '<'
        when = #_salary2
          let $relationship = '='
        when > #_salary2
          let $relationship = '>'
      end-evaluate
    when = 'employee by name'
      get $last_name1 $first_name1
        from employee(#row1) last_name first_name
      evaluate $last_name1
        when < $_last_name2
          let $relationship = '<'
        when = $_last_name2
          evaluate $first_name1
            when < $_first_name2
              let $relationship = '<'
            when = $_first_name2
              let $relationship = '='
            when > $_first_name2
              let $relationship = '>'
          end-evaluate
        when > $_last_name2
          let $relationship = '>'
      end-evaluate
    when = 'dept by name'
      let $name1 = dept.name(#row1)
      evaluate $name1
        when < $_name2
          let $relationship = '<'
        when = $_name2
          let $relationship = '='
        when > $_name2
          let $relationship = '>'
      end-evaluate
  end-evaluate
end-procedure compare_rows

Swapping rows is easy when we’re dealing with just one array and just one field. For the general case, we need something this elaborate. Note that we can use the same swapping code for “employee by salary” and “employee by name.”

begin-procedure swap_rows(#row1, #row2)
  evaluate $_sort
    when = 'array'
      let #value1 = array.field(#row1)
      let #value2 = array.field(#row2)
      let array.field(#row2) = #value1
      let array.field(#row1) = #value2
    when = 'employee by salary'
    when = 'employee by name'
      get $emplid1 $last_name1 $first_name1 #salary1
        from employee(#row1)
      get $emplid2 $last_name2 $first_name2 #salary2
        from employee(#row2)
      put $emplid1 $last_name1 $first_name1 #salary1
        into employee(#row2)
      put $emplid2 $last_name2 $first_name2 #salary2
        into employee(#row1)
    when = 'dept by name'
      get $setid1 $deptid1 $name1 from dept(#row1)
      get $setid2 $deptid2 $name2 from dept(#row2)
      put $setid1 $deptid1 $name1 into dept(#row2)
      put $setid2 $deptid2 $name2 into dept(#row1)
  end-evaluate
end-procedure swap_rows

* March 2, 2009 addendum: C.A.R. Hoare is Sir Charles Antony Richard Hoare, a Britsh Computer Scientist. Not only did he invent Quicksort, he invented null pointers in 1965. He now calls null pointers his billion dollar mistake because of programmers like me who have written countless bugs based on null pointers. Thankfully, I haven’t dereferenced a pointer since I started programming in SQR.

Looking Ahead

Sometimes we sort data for use within the program, not just for output. Looking for data in an unsorted array requires a slow, linear search. Looking for data in a sorted array can use a fast, binary search. Next week we’ll look at binary searches and binary/linear searches. (A binary/linear search is my name for searching through sorted, effective-dated data.)

Brain Teaser

Here’s the warm-up: there are two variables, #x and #y. We need to set #min to the lesser of the two and #max to the greater of the two. Avoiding if commands, what are the formulae for #min and #max?

Here’s the challenge: there are three variables, #x, #y, and #z. What is the simplest code to set the values of #min, #mid, and #max?

13 Comments

  1. Bob Josephson says:

    To answer your history quiz question (”does anyone know why #i and #j are used so often for loop counters?”): yes, someone knows.

    (And one of those someones would be me.)

    (But what letter would you use if you had more than 6 nested loops? You can’t use ‘o’ as a loop variable…)

    (And what would ancient Roman FORTRAN programmers do if they had more than 5 nested loops?)

    By the way, I notice that as soon as you start talking about recursion and the presence or absence of sorts in a language, you stop keeping score between SQR and C. How conveeeenient!

    • administrator says:

      You seem to know the answer, so why don’t you explain it for the readers who aren’t as old as you are? If you do, I’ll tell you how Fortran II (yes, it was identified with roman numerals) supported more than six nested loops – for the sake of the readers who aren’t as old as I am.

      I stopped keeping score between SQR and C because I demonstrated that recursion is completely unnecessary. Next week, I will debunk pointers (smirk).

      • Bob Josephson says:

        OK: in the mid-1950s (before I was born – just for the record) FORTRAN defined variables as integers if their names started with an I, J, K, L, M, or N. Variables starting with other letters were real numbers.

        So ‘I’ was just the first and shortest legal choice for an integer loop counter.

        Now, I don’t have encyclopedic knowledge of features introduced in each version of FORTRAN, but at some point it allowed programmers to explicitly declare variables with any name to have any type. Is that what you’re referring to as the solution introduced in FORTRAN II?

        (Of course, I was being facetious in the first place about the problem with nested loop variables. Even the original FORTRAN allowed variables with multi-character names. As for the ancient Romans, they used APL, thus causing Servilius Casca to famously complain “it was Greek to me.”)

        As for recursion not being necessary: sure, you can, with varying degrees of difficulty, survive without it. But the same is true of the EVALUATE statement, and you were keeping score for that. I detect blatant pro-SQR bias!! :-)

        • administrator says:

          You’re right on both counts.

          First, FORTRAN, the IBM FORmula TRANslating System, had integer variables and floating point variables. Fortran variable names that started with I through N were integers.

          Second, Fortran allowed variables with up to five or six character names (I forget which). I don’t know whether that was in the first version; my earliest experience was with Fortran II and most of my experience was with Fortran IV. In Fortran IV, any variable could be declared as an integer.

  2. Nathan says:

    Regarding the brain teaser:

    Why are we avoiding IF? Can we use the cond() function? How about range()? Or are you trying to stick with “math” functions like sign()?

    • administrator says:

      I wanted to avoid IF because the solution with IF isn’t much of a brain teaser. COND() and RANGE() are good solutions, although they merely move the IF operation into the function. There are ways to get MIN and MAX without comparing the inputs. Note that the puzzle uses numeric variables, not string variables.

      • Nathan says:

        Okay, I was thinking you might be trying to go down the “algebra” road….

        For the two-variable case, I remembered from years ago the trick that relies on the fact that the average of two numbers is half-way between them, so if you start at the average and then move one-half of the difference between the two numbers you end back at one of the original two numbers — and by controling which direction you move, you will arrive at either the smaller or the larger of the two.

        So, simplifying the formula and converting it into SQR code produces:

        begin-procedure min_max(#x, #y, :#min, :#max)
        let #min= ( #x + #y - abs(#x - #y) )/2
        let #max= ( #x + #y + abs(#x - #y) )/2
        end-procedure

        For the three-variable case, I wasn’t able to convince myself that any code would be “simpler” than breaking down the problem into pairs and invoking the two-variable procedure on each pair:

        begin-procedure min_mid_max(#x, #y, #z, :#min, :#mid, :#max)
        do min_max(#x, #y, #lower, #higher)
        do min_max(#lower, #z, #min, #dummy1)
        do min_max(#higher, #z, #dummy2, #max)
        let #mid= #x + #y + #z - #min - #max
        end-procedure

        (We know that at least one of #dummy1 or #dummy2 is going to be #mid, but we don’t know which one, so as long as we are avoiding IF commands we just use addition/subtraction to calculate #mid.)

        Did you find anything simpler than that? (Well, other than just using IF, which would clearly be much simpler :) )

        • administrator says:

          I think you’ve found the simplest approach, maybe even simpler than using IF statements. I’ve tried to sketch out the code with IF statements in my head and it was surprisingly complicated.

          • Nathan says:

            Well, the thing about using the mathematical method is that I’m always slightly nervous that some math-implementation issue ( e.g. rounding errors ) might show up at some point in the life of the application. Given a choice, I’d lean toward straight MOVEs rather than relying on floating-point arithmetic, even though I know that most applications where SQR is used don’t involve numbers that push any implementation limits.

            So, I’d want to at least implement my “min_max” routine above with a simple IF statement (no nesting would be required for a two-variable min/max) instead of using calculations. (That would still leave the LET expression that calculats #mid, but I trust addition more than division. :) )

            But really I’d probably want to replace both min_max and min_mid_max with the following:

            begin-procedure min_mid_max_if(#x, #y, #z, :#min, :#mid, :#max)

            move #x to #min

            if #y < #min
              move #min to #mid
              move #y to #min
            else
              move #y to #mid
            end-if

            if #z < #min
              move #mid to #max
              move #min to #mid
              move #z to #min
            else
              if #z < #mid
                move #mid to #max
                move #z to #mid
              else
                move #z to #max
              end-if
            end-if

            end-procedure ! min_mid_max_if

            Although this routine involves more lines of code than the min_max/min_mid_max combo, I think it is somewhat easier to understand what each line is trying to accomplish (and it involves no arithmetic calculations at all)….

            • administrator says:

              This is better than what I had in mind; you’ve basically written a specialized, three element insertion sort. The advantage of this is that you could extend it to four elements more easily than extending a mathematical solution. The next step would be to try a three element bubble sort.

  3. John says:

    There is a problem with the “#from” variable. In the quicksort procedure, this is a local variable. When the save_pivot procedure tries to reference it, it’s not available. I change this to the global variable “#_from” in the quicksort procedure and it seems to be working now.

    • administrator says:

      John, thank you for discovering and fixing this. I guess we can take three approaches;

      (1) use #_from in the quicksort procedure
      (2) pass #from as a parameter to the save_pivot procedure
      (3) make quicksort a non-local procedure (I made it local so I could assign variable names freely)

      I should change the code, but I need to think about these alternatives.