Archive for the ‘Code’ Category

A nice interview on STL, coding and algorithms

Friday, December 25th, 2009

Alexander Stepanov is the main driver behind the OMFG-that’s-amazing Standard Template Library for C++. I rather agree with him on object oriented programming (OOP):

I find OOP technically unsound. It attempts to decompose the world in terms of interfaces that vary on a single type. To deal with the real problems you need multisorted algebras – families of interfaces that span multiple types. I find OOP philosophically unsound. It claims that everything is an object. Even if it is true it is not very interesting – saying that everything is an object is saying nothing at all. I find OOP methodologically wrong. It starts with classes. It is as if mathematicians would start with axioms. You do not start with axioms – you start with proofs. Only when you have found a bunch of related proofs, can you come up with axioms. You end with axioms. The same thing is true in programming: you have to start with interesting algorithms. Only when you understand them well, can you come up with an interface that will let them work.

The rest of the interview is quite good as well.

Paul wins an iPhone!

Thursday, July 23rd, 2009
The iPhone 3GS

The iPhone 3GS

On July 14th, hosting company Engine Yard posted a programming contest that was (presumably) to promote their services and generally create PR. First and second prize included…. new iPhones! (Actually, Dorian pointed the contest out to me – thanks!)

Now, I’ve only tried one programming contest, and it was An Epic Fail. EPIC.

Ahem. Traumatic memories.

Anyway, this time the coding problem was based around the SHA-1 hash algorithm, trying to find a close match to one posted by Engine Yard. What I know about hash algorithms is pretty basic, but we have a couple of friends who do lots of code & crypto, namely Dan Bernstein and Tanja Lange. I figured that if anyone had code laying about, it’d be them and that was in fact the case. I also guessed that CUDA would be the weapon of choice for this sort of pleasingly-parallel application; that also turned out to be correct. I fired off an email and promptly forgot about it in the rush to get ready to travel.

Luckily, Dan and Tanja had time, were interested, had compute resources and (most importantly to me) didn’t want an iPhone. Win!

(Insert scramble to find computers, of which I only got two and they got over 100)

Dan rolled out some seriously awesome code, starting with the OpenSSL SHA-1 codebase. His core2 tweaked version started at 2725 CPU cycles per hash, but by version 8 was down to 201 – a nice factor of 10+ speedup, and fast enough to do 10 million+ keys/second even on my laptop. It wasn’t parallel, so we just ran one version per CPU, which worked quite well.

As if that wasn’t enough, he wrote his first CUDA program and got it down to 35 cycles/hash, meaning that it really screamed. He and Tanja contacted a couple of their Taiwanese collaborators, who had GPUs to run the CUDA code, and we were off to the races. (They wanted the 2,000$ in cloud computing credits if we won, which was of no interest to me, Dan or Tanja – a perfect collaboration.) At peak, we were cranking over 3 billion keys per second, which was a tremendous advantage in a contest reliant on searching an enormous keyspace. You can see all the people, machines, rates and details on the page Tanja wrote for the contest.

It was a blast. We used Skype to text-chat during the contest, subversion for the code and that was all we needed. “We” is a bit strong, since I mainly cheerlead as far as code. (Dan doesn’t believe in comments, it would seem!) One odd thing was that the GPU version of the code gave my MacBook fits:

  • I had to compile it as 32 and not 64 bit, or I’d get compile errors in the C++ ‘new’ header file. That was a head scratcher.
  • You have to add -msse2 the compiler to get the cpucycles.c to compile
  • As before, you need the DYLD fix.
  • With gcc-4.0 and 4.2, running the binary would freeze the laptop. 100% repeatable, required a poweroff. Mouse still worked, but that’s it. The same code worked on the Linux boxes in Taiwan, so I’m stumped here. Other people had GPU code running on OSX, so it may be something in my configuration or the code.
And we won! At the last minute we got lucky, and a Tesla GPU in Taiwan found the winning key. We won by a single bit, but the odds of that were pretty low so it was actually a decent margin.
Of the two Macintoshes I brought in, the laptop (ebauche) found a 34, and the 8-CPU MacPro found a 33.
Technically, the main insight for speed was to pick words that totaled to 64 bytes in length, compute that hash and then feed that into a search of the 5-character keyspace. SHA-1 does blocks of 64 bytes, so this is twice as fast as computing the whole thing every time.
On the coincidence front, the GPUs exercised were quite possibly using some of my brother John’s code, as he does kernel driver development for nVidia. I only thought of this afterward, but it’s kinda cool. It’s a small world sometimes.
I’ve not talked to Engine Yard yet about the phone, and dunno how I’ll deal with service since we’re still on T-Mobile, but I’m still thrilled and can’t wait to get it. With any luck they’ll be willing to pay for the unlocked version…
Friends are wonderful. Dan and Tanja – thanks for a great deal of fun, for letting me claim the prize that was yours for the taking, and for teaching me lots of fast SHA-1 tricks in a very short period of time!

Useful zsh tricks

Friday, June 26th, 2009

I’m using zsh on my mac, mostly for its better completion system. A coworker asked me for a couple of tweaks, the answers took a bit to find so here’s a FYI for the command line nerds out there.

Q1: The cd command tab-completes files, which is totally broken.

Fix:

 compctl -/ cd
Q2: Tab-completion for git commands, please
Ans:

 compctl -k "(add bisect branch checkout co clone commit diff fetch grep init log merge mv pull push rebase reset rm show status tag)" git
The second just says that that ‘these words in the array are the arguments for the git command.’
There’s also the vastly more elaborate code here, which modifies the prompt to show git status when in a git-controlled repository. I’m trying that now.

Q3: Fix the prompt when using virtualenv
Ans: See this post (especially read the comments). Looks like 1.6.1 of virtualenvwrapper and newer should ‘just work.’

Progress on the Arduino

Wednesday, June 10th, 2009

I was having a bad karma day (OSX install failing, RabbitMQ borked, Debian server freezing, apt failures, etc, etc.) and worked a bit on the Arduino project as a break. Here’s the LM35 wired up and working:

img_0189

(That’s my ancient self-designed linear power supply with 5/6/12 and LM7805)

Temp out goes to ADC0. Arduino runs

void setup() {
// initialize the serial communication:
  Serial.begin(9600);
}

void loop() {
  // send the value of analog input 0:
  int val = analogRead(0);
  float fVal = map(val, 0, 1023, 0, 500);
  Serial.println(fVal);
  // wait a bit for the analog-to-digital converter
  // to stabilize after the last reading:
  delay(100);
}

and, on the mac, Processing runs

// Graphing sketch

// This program takes ASCII-encoded strings
// from the serial port at 9600 baud and graphs them. It expects values in the
// range 0 to 1023, followed by a newline, or newline and carriage return

// Created 20 Apr 2005
// Updated 18 Jan 2008
// by Tom Igoe

import processing.serial.*;

Serial myPort;        // The serial port
int xPos = 1;         // horizontal position of the graph

void setup () {
  // List all the available serial ports
  println(Serial.list());
  // I know that the first port in the serial list on my mac
  // is always my  Arduino, so I open Serial.list()[0].
  // Open whatever port is the one you're using.
  myPort = new Serial(this, Serial.list()[0], 9600);
  // don't generate a serialEvent() unless you get a newline character:
  myPort.bufferUntil('\n');
}
void draw () {
  // everything happens in the serialEvent()
}

void serialEvent (Serial myPort) {
  // get the ASCII string:
  String inString = myPort.readStringUntil('\n');

  println(trim(inString));
}

Notes so far:

  1. On OSX 10.5, Processing fails with RXTX errors. The solution, found here, was a new version of librxtxSerial.jnilib, which I copied into /Library/Java/Extensions along with the RXTXcomm.jar
  2. Code has a truncation bug – no fractional degrees.
  3. Arduino is pretty easy to code and use; I’m impressed.
Up next is the more complex SC600 humidity sensor, for which I need capacitors and a bit of wirewrap. Wondering how to plot the captured data on a web page…

Arduino!

Monday, June 8th, 2009

Yep, I finally had a project where I needed a cheap OSX-compatible analog input (gonna build a temp & humidity monitor using LM35CAZ and Ohmic SC-600), and finally had the forehead-slapping moment of

I don’t need an ADC per se; I need a board with ADCs on it.

Yep, for example an Arduino. So I’ve got one, $34, working already. One trick on OSX that tripped me up – you have to open the Arduino.app/Contents/Info.plist file and set the java version from 1.4+ to 1.5 – OSX bug. Otherwise you get RXTX errors, which are a red herring. Here it is, blink code and all:
picture-2
Nice board. I got mine from Sparkfun, and it was here in a few days. Should be easy to hack on, there sure is a lotta chatter around this board. I suspect that the Arduino, just like my beloved mechanical watches, are fun because they’re hackable and comprehensible, two things that modern computers are increasingly lacking.

Python continues to impress

Thursday, February 12th, 2009

I was browsing the Python 2.6 docs and saw the ‘datetime’ library. On a lark, I rewrote my old C-coded ‘days’ app as Python:

#!/usr/bin/env python
# Rewrite of days-old calculator using Python. Gotta love this language!
# pfh 2/12/09

import datetime

bd = datetime.date(year=1968,month=2,day=2)
today = datetime.date.today()

tdiff = today - bd

print("\nToday is %s and you are %s days old.\n" \
% (today.strftime("%A, %B %d %Y"), tdiff.days))

Took me all of ten minutes, and the code is shorter, easy to tweak and such. Damn, I’m liking Python.

Ze power of Python

Thursday, January 22nd, 2009

One of the languages I’ve wanted to learn for a while is Python. There’s a lot of interesting buzz about it, from Dr Chuck and the google app engine launch, and after skimming a book it looked quite cool. Going back a bit, Miro at Fermilab wrote their cluster control software in Python, and he and I worked together as his code was a client to my server. I also hacked around a bit in Sage for the ABC project.

However, for me at least, to learn a language I really need a project. Otherwise, as with Ruby, the new knowledge doesn’t get used and never transitions to permanent memory. When the chance came up recently at work to use Python on a NASA project, I was all over it and have been having an obscenely good time so far. It’s fast to learn, easy to code, almost fun to refactor, and has made tricky tasks quite simple. I’m a fan!

Today I found an interesting article in Embedded Systems, about a new platform from Synapse Wireless that uses a Python-based virtual machine for low-power mesh networks, on 8-bit chips. Similar to the Sun ‘Spot’ platform, but Python instead of Java. 

Those are some of the hardware for sale, there’s also an eval kit:

At 149 for the eval kit, I’m tempted. I’m a little annoyed that it doesn’t include ADC, though. You have to cough up $380 for their ‘network eval kit‘ or $145 for the ‘end device‘ to get that. Which is annoying; the entire idea of a sensor network is to have, y’know, sensors

Anyway.

Unlike the MSP430 ZigBee boards I’ve messed with, this one has some real advantages:

  • Coded in Python! VM only takes 40KB of memory, which is impressive for an 8-bit platform.
  • Write and deploy code interactively from the desktop via wireless – sorta like Spot but easier to use.
  • Nice terminal strips for connecting your own hardware.
I’m quite tempted, as I’ve a project in mind that would benefit from the wireless aspect. I really like the idea of Python as portable code for mesh networks; the complexity of coding these things is greatly reduced that way. Brilliant.

Laptop Go Fast

Sunday, November 9th, 2008

I was fiddling around with the CUDA code on the new Macbook and got some interesting-to-me benchmarks. Keep in mind that this is the cheaper of the Mac laptop lineup, using the slower Nvidia chip:

Yeesh!  1.6GB/sec to the device, and 5.8GB/sec once there!

Not sure how to interpret all of these yet; clearly each core has a fixed allocation of memory to play with. 800MHz seems slow, not sure if that’s correct or not.

16 cores on a laptop! Wow.

This seems to be a benchmark based on the Black-Scholes option pricing PDE. Again over 5GB/sec of usable device bandwidth. Damned impressive.

I’ve yet to try writing code, just running the sample code from the download, but it’s still amazing. Laptops are now portable supercomputers, particularly if you get the Macbook Pro – 24 vs 16 cores, even faster. This is really looking good for tightly-coupled code, it’ll be a challenge to port the lock-free ABC code to this model and make it run well.

How to define ‘ironic’

Tuesday, November 4th, 2008

I read the Joel on Software RSS feed for its occasional tidbits, and yesterday brought a lovely one. He wrote a column for Inc magazine, where he espoused the virtues of his latest effort, a website called Stack Overflow. So far so good, and I’m always happy to see new programmer resources online.

The irony emerges in his boasts of the sites’ stability and scalability:

We’re not going to need big racks of computers; it turns out that Jeff and his programmers were so good that they built a site that could serve 80,000 visitors a day (roughly 755,000 page views) using only one server that costs a few hundred bucks a month.

I should also note that ’stack overflow’ (Wikipedia has a nice page on it) is a bug/error that causes unpredictable failures.

So this morning I go to look and get this:

Yep, site down. Without the boasting it would have just been growing pains, but with the boasting it transcends into hubris and irony. Oops.

CUDA on the MacBook Aluminum

Tuesday, October 28th, 2008

One of the reasons to upgrade to the new MacBook is the new motherboard/chipset combo. It went from an Intel system to an Nvidia one, with the consequent addition of the NV9400M GPU. If you’ve not looked recently, this is a mobile part, which usually means slower and older than desktop chips. This one, however,

With 16 parallel processing cores and a 128-bit memory interface, the GeForce 9400M is able to deliver up to 54 Gigaflops of processor power

(That’s from this article)

54 gigaflops is fifty four billion floating point operations per second. That’s really, really fast. Nvidia has a programming system called CUDA that allows you to tap all of that goodness, and it’s available for OSX too. Version 2.0 works for me, but I had the following two problems that I wanted to mention:

  1. The initial ‘make’ from /Developer/CUDA fails with a library error. Running
     ranlib *
    

    in /Developer/CUDA/lib solves this and then the build succeeds.

  2. The programs then all fail to run with this error
    dyld: Library not loaded: @rpath/libcudart.dylib

    The fix for that, found here, is to define DYLD_LIBRARY_PATH. My zshrc looks like this, also showing the modified path:

    typeset CUDA_HOME=/usr/local/cuda
    
    typeset DYLD_LIBRARY_PATH=$CUDA_HOME/lib
    
    typeset PATH=${HOME}/bin:${ANT_HOME}/bin:/usr/local/bin:$CUDA_HOME/bin:/Library/
    OpenSC/bin:${PATH}

I’m now experimenting to see how fast it runs the examples, so more later. I wonder how well this’d run the alphabet code? Hmm.