Challenge: Fastest ‘atoi’ Implementation
At work recently I was writing some code that needs to parse very large input files consisting of decimal numbers. These need to be converted to binary. My first attempt at doing this with boost::lexical_cast() was laughably slow, so I did some experiments to see how fast I could get it.
Update 2011/06/14 – Add C++ and Java solutions and added new numbers for J2 after improvements
The problem
Read decimal digits from a file, separated by newline chars, into a 64bit long.
Generate a sample file like this:
#!/usr/bin/env python SIZE=200 * 1024 * 1024 DIGITS=1 MIN=10**(DIGITS-1) MAX=(10**DIGITS)-1 import random random.seed(0) f = open("sample%d" % DIGITS, "w") for x in xrange(SIZE / (DIGITS+1)): f.write("%d\n" % random.randint(MIN, MAX)) f.close()
Results:
Run your program against the input more than once and ignore the first results until you get the whole input file in the disk cache. Here are the results of running a few tests piped to /dev/null on my Intel Core 2 Quad Q8400. The sample code was compiled with gcc 4.4.3 from Ubuntu 10.04 using -O3. Times are measured in seconds and are the result of averaging three runs.
Digits | md5 | X1 | X2 | X3 | J1 | J2 |
1 | 9edeadf12e178c93358ee3341048b7d8 | 8.7 | 17.7 | 42 | 1.1 | 0.5 |
5 | a1d9cd533b121427fcb84acd9f54ded8 | 3.4 | 8.2 | 20 | 1.0 | 0.4 |
10 | 03d1cf4751f7f3b4526265913d13a028 | 2.2 | 5 | 15 | 1.0 | 0.4 |
15 | cb9acdd57e77f98605815cbc80cf9927 | 1.8 | 3.9 | 13 | 1.0 | 0.4 |
Discussion
The goal is to run in less than 2sec, which means we’re processing ~100MB/sec, which is about how fast a hard disk can feed data to the CPU.
X1, X2 and X3 are what I considered to be obvious and reasonably optimised solutions in C, Java and C++. It was surprising to me that none of these were able to keep up with the disk IO.
J1 and J2 are two more optimised solutions I’ve written that I’ll post about later with more discussion. Good luck!
Sample solutions
X1 – stdcatoi.c
#include <stdio.h> #include <stdlib.h> int main() { char line[32]; // 64bit integers have < 20 digits long value; while(fgets(line, sizeof(line), stdin)) { value = atoll(line); fwrite(&value, sizeof(value), 1, stdout); } return 0; }
X2 – jatoi.java
import java.io.BufferedReader; import java.io.BufferedOutputStream; import java.io.DataOutputStream; import java.io.InputStreamReader; public class jatoi { public static void main(String[] args) throws Exception { DataOutputStream out = new DataOutputStream(new BufferedOutputStream(System.out)); BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); String line; while((line = in.readLine()) != null) { out.writeLong(Long.parseLong(line)); } } }
X3 – cppatoi.cpp
#include <iostream> using namespace std; int main() { long x; while(cin >> x) { cout.write((char*)&x, 8); } return 0; }
Interesting, though I can’t really test this on the machine I’m using right now – I’ll code something up on a small case and then see how it does tonight :-)