TSV parsing is extremely slow #661

@philippbayer

I was comparing Julia to Python for iterating over tab-delimited files, and tried it on a 90 MB table and on an 800 MB table:

On 80mb:
$ python parse.py
0.797344923019

$ julia parse.jl
16789.66212272644

On 800mb:
$ python parse.py
6.59492301941

$ julia parse.jl
129588.43898773193

Here's the code for both the Python and Julia implementations, based on the benchmarks on the main page:

Python:

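import time

def parse():
    # Iterate over the file line by line and split each line on tabs.
    # The with-block closes the file handle on every run.
    with open("./2.txt") as file_handle:
        for line in file_handle:
            line = line.split("\t")

# Keep the best (minimum) wall-clock time, in seconds, over 5 runs.
tmin = float("inf")
for i in range(5):
    t = time.time()
    parse()
    t = time.time() - t
    if t < tmin:
        tmin = t

print(tmin)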

Julia:

macro timeit(ex,name)
    quote
        t = Inf
        for i=1:5
            t = min(t, @elapsed $ex)
        end
        println(t*1000)
    end
end

function parse()
    file = LineIterator(open("./2.txt"))
    for line in file
        split(line, "\t")
    end
end

@timeit parse() "parse"
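
One thing worth checking when reading the numbers above: the Python script prints `tmin` in seconds, while the `@timeit` macro prints `t*1000`, i.e. milliseconds. Here is a minimal sketch, in Python, that puts the four reported figures in the same unit and computes the ratio. The figures are copied from the output above; the variable and function names are made up for illustration.

```python
# Timings copied from the output above.
# Python prints seconds; the Julia macro prints t*1000 (milliseconds).
python_s = {"small": 0.797344923019, "large": 6.59492301941}
julia_ms = {"small": 16789.66212272644, "large": 129588.43898773193}

def slowdown(label):
    """Return how many times slower Julia was than Python for one table."""
    julia_s = julia_ms[label] / 1000.0  # milliseconds -> seconds
    return julia_s / python_s[label]

for label in ("small", "large"):
    print("%s table: Julia %.1f s, Python %.2f s, ratio %.1fx" % (
        label, julia_ms[label] / 1000.0, python_s[label], slowdown(label)))
```

Under that reading the gap is roughly 20x on both tables rather than several orders of magnitude, which is still a large difference.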

Why is this?
