Thursday, 12 August 2010

Sorting out Ruby

These days, writing your own sort is something that happens to most programmers only in their Data Structures and Algorithms classes (and again on job interviews). The rest of the time you get to use whatever sort method your standard library/API/framework provides and for most purposes it proves adequate, especially with the sort of computational horsepower available even on handheld devices these days.

Having said that, I wound up writing a toy class in Ruby that implements five different sorts (bubble, quick, insertion, shell and merge). Just on the offchance that this is useful to someone some day (when doing their DS&A coursework, for example), here is the listing:

#require 'profile'

class Sorter
  attr_accessor :array

  def initialize(array)
    @array = Array.new(array)
  end

  def bubblesort()
    n = @array.length
    begin
      newn = 0
      for i in 0..(n-2) do
        if @array[i] > @array[i+1]
          @array[i], @array[i+1] = @array[i+1], @array[i]
          newn = i +1
        end
      end
      n = newn
    end while n > 1
  end

  def quicksort()
    def partition(array, left, right, pivot_index)
      pivot_value = array[pivot_index]
      array[pivot_index], array[right] = array[right], array[pivot_index]
      store_index = left
      for i in left..right-1 do
        if array[i] <= pivot_value
          array[i], array[store_index] = array[store_index],array[i]
          store_index += 1
        end
      end
      array[store_index], array[right] = array[right], array[store_index]
      return store_index
    end
    def qsort(array, left, right)
      if right > left
        pivot_index = left + ( right - left ) / 2
        pivot_new_index = partition(array, left, right, pivot_index)
        qsort(array, left, pivot_new_index - 1)
        qsort(array, pivot_new_index + 1, right)
      end
    end
    qsort(@array, 0, @array.length - 1)
  end

  def shellsort()
    increment = Float(@array.length/2).round
    while increment > 0 do
      for i in increment..@array.length-1 do
        j = i
        temp = @array[i]
        while j >= increment and @array[j-increment] > temp do
          @array[j] = @array[j-increment]
          j -= increment
        end
        @array[j] = temp
      end
      increment = Float(increment/2.2).round
    end
  end

  def insertionsort()
    for i in 1..@array.length-1 do
      value = @array[i]
      j = i - 1
      done = false
      begin
        if @array[j] > value
          @array[j + 1] = @array[j]
          j -= 1
          if j < 0
            done = true
          end
        else
          done = true
        end
      end until done
      @array[j + 1] = value
    end
  end

  def mergesort()
  def msort(lo,hi)
    if lo < hi
    mid = (lo+hi)/2
    msort(lo,mid)
    msort(mid+1,hi)
    merge(lo,mid,hi)
    end
  end
  def merge(lo,mid,hi)
    i,j = 0,lo
    aux = Array.new
    while j <= mid
    aux[i] = @array[j]
    i += 1
    j += 1
    end
    i=0
    k=lo
    while k < j and j <= hi
    if aux[i] <= @array[j]
      @array[k] = aux[i]
      k += 1
      i += 1
    else
      @array[k] = @array[j]
      k += 1
      j += 1
    end
    end
    while k < j
    @array[k] = aux[i]
    k += 1
    i += 1
    end
  end
  msort(0,@array.length-1)
  end
end

sort_this = [ 0,7,9,7,4,1,4,5,4,2,8 ]
sorts = ["bubblesort","quicksort","shellsort","insertionsort","mergesort"]
sorterobject = Sorter.new(sort_this)

for sorttype in sorts do
  sortermethod = sorterobject.method(sorttype)
  sortermethod.call()
  puts sorttype
  print sort_this, " => ", sorterobject.array, "\n"
  sorterobject.array = Array.new(sort_this)
end

Friday, 6 August 2010

Ruby Tuesday

Well, Ruby Friday to be precise.

I've been looking at Ruby and Ruby-on-Rails for some cool web development action and have hit on a number of things that bug me. No doubt I'll find find good reasons for them, but at the moment...

 The one thing that really bugs me right now (quite unreasonably so, as it isn't broken, just an odd design decision) is the handling of constants. You define a constant by the mechanism of starting a variable name with a capital letter, so var_name is not a constant whilst Var_name or VAR_NAME is (I prefer all upper-case to make it absolutely screamingly obvious). This isn't a coding convention that the interpreter pays no heed to, it's an actual language convention, but the really annoying part is that a constant isn't constant.

If in Ruby, you try to assign to an already-defined constant, the interpreter will quite happily let you, but at the same time will warn you that you're doing a bad thing. This half-way house to me makes little sense; either a constant should be constant by language enforcement or completely by convention, especially as it is presumably possible to make optimizations in the interpreter if constants are genuinely constant that isn't the case if they might change.

It is said that Ruby follows the principle of least confusion; in this case, I'm not sure this is the case.

Saturday, 26 June 2010

Flattening lists of lists in Python; List comprehension edition

List comprehensions are a powerful part of Python and I am forever finding new ways to abuse them.

I'm writing a toy package for finding duplicate files across a directory structure, just to build up confidence in using Python. Basically we have a function to generate a list of files to be found in a given directory (and any sub-trees if so required), and several other functions that take a list of files and produce a list of lists of files where each sublist contains the files that match each other on some criteria such as filesize or MD5 hash of the first kilobyte of file or by byte-by-byte comparison.

The later file-matchers produce more accurate matches between files with less false positive matches, but execute much more slowly as a result. In order to reduce the time taken to get a good set of matches across a set of files, you feed the output of the loose-but-fast matchers into the accurate-but-slow matchers, the idea being that each step on the chain reduces the size of input for the next function.

This is where we hit an interesting problem. Each matcher expects a flat list of files to compare whilst producing a list of nested lists. Feeding the output of a matcher as-is into the input of another matcher will cause an error unless the list is flattened first.

For example:

>>> nested_list = [['a','b','c],['d','e','f'],['g','h','i']]

>>> flattened_list = ['a','b','c','d','e','f','g','h','i']


So how do we flatten the list? Python doesn't have a built in flatten operation, presumably because what exactly a programmer wants to flatten in a data structure depends on the data structure and what it is used for. Fortunately for me in my current problem, I want a very basic flatten- so basic it can be done with list comprehensions.

To flatten the nested list in the previous example, we can do this:


>>> comprehensively_flattened_list = [i for sublist in nested_list for i in sublist]
>>> print comprehensively_flattened_list
 ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']


Hooray!  But wait, will it work if the sublists are variable in length or will the comprehension try to treat each sublist as being the same length?

Only one way to find out...

>>> horrible_list =  [['b','a','d'],['t'],['e','s'],['t','l','i','s','t']]
>>> print horrible_list
[['b', 'a', 'd'], ['t'], ['e', 's'], ['t', 'l', 'i', 's', 't']]
>>> less_horrible_list = [i for sublist in horrible_list for i in sublist]
['b', 'a', 'd', 't', 'e', 's', 't', 'l', 'i', 's', 't']


Great stuff! For nested lists of known, uniform nesting depth, it looks like flattening can be neatly handled by list comprehensions alone. For more complex nested structures, I'm afraid it's back to boring old recursive traverse functions.

I give it a day before I find out that flattening through list comprehensions is a terrible idea.

Wednesday, 16 June 2010

10 PRINT "HELLO WORLD"

I started a general blog about a year ago, talking about life, the universe and my place in it. One of the minor issues is that whenever I get the urge to post something overly programmatic, I feel like I'm derailing the blog. As a result, I've started this new blog for all my hard-core technical gibberings with the other blog remaining vaguely general interest. In theory