June 13, 2011

Challenge: Fastest ‘atoi’ Implementation

Filed under: Technical — Tags: , — James Bunton @ 6:22 pm

At work recently I was writing some code that needs to parse very large input files consisting of decimal numbers. These need to be converted to binary. My first attempt at doing this with boost::lexical_cast() was laughably slow, so I did some experiments to see how fast I could get it.

Update 2011/06/14 – Add C++ and Java solutions and added new numbers for J2 after improvements

The problem

Read decimal digits from a file, separated by newline chars, into a 64bit long.
Generate a sample file like this:

#!/usr/bin/env python
SIZE=200 * 1024 * 1024
DIGITS=1

MIN=10**(DIGITS-1)
MAX=(10**DIGITS)-1
import random
random.seed(0)
f = open("sample%d" % DIGITS, "w")
for x in xrange(SIZE / (DIGITS+1)):
    f.write("%d\n" % random.randint(MIN, MAX))
f.close()

Results:

Run your program against the input more than once and ignore the first results until you get the whole input file in the disk cache. Here are the results of running a few tests piped to /dev/null on my Intel Core 2 Quad Q8400. The sample code was compiled with gcc 4.4.3 from Ubuntu 10.04 using -O3. Times are measured in seconds and are the result of averaging three runs.

Digits md5 X1 X2 X3 J1 J2
1 9edeadf12e178c93358ee3341048b7d8 8.7 17.7 42 1.1 0.5
5 a1d9cd533b121427fcb84acd9f54ded8 3.4 8.2 20 1.0 0.4
10 03d1cf4751f7f3b4526265913d13a028 2.2 5 15 1.0 0.4
15 cb9acdd57e77f98605815cbc80cf9927 1.8 3.9 13 1.0 0.4

Discussion

The goal is to run in less than 2sec, which means we’re processing ~100MB/sec, which is about how fast a hard disk can feed data to the CPU.

X1, X2 and X3 are what I considered to be obvious and reasonably optimised solutions in C, Java and C++. It was surprising to me that none of these were able to keep up with the disk IO.

J1 and J2 are two more optimised solutions I’ve written that I’ll post about later with more discussion. Good luck!

Sample solutions

X1 – stdcatoi.c
#include <stdio.h>
#include <stdlib.h>

int main() {
  char line[32]; // 64bit integers have < 20 digits
  long value;
  while(fgets(line, sizeof(line), stdin)) {
    value = atoll(line);
    fwrite(&value, sizeof(value), 1, stdout);
  }
  return 0;
}
X2 – jatoi.java
import java.io.BufferedReader;
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.InputStreamReader;

public class jatoi {
  public static void main(String[] args) throws Exception {
    DataOutputStream out = new DataOutputStream(new BufferedOutputStream(System.out));
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while((line = in.readLine()) != null) {
      out.writeLong(Long.parseLong(line));
    }
  }
}
X3 – cppatoi.cpp
#include <iostream>
using namespace std;

int main() {
  long x;
  while(cin >> x) {
    cout.write((char*)&x, 8);
  }
  return 0;
}

1 comment

Jono says:

Interesting, though I can’t really test this on the machine I’m using right now – I’ll code something up on a small case and then see how it does tonight :-)

Comments are closed.