TSV parsing is extremely slow

Was just playing around comparing Julia to Python for iterating over tab-delimited files. Tried it on a 90 MB table and on an 800 MB table:

On 80mb:
$ python parse.py
0.797344923019

$ julia parse.jl
16789.66212272644

On 800mb:
$ python parse.py
6.59492301941

$ julia parse.jl
129588.43898773193
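
A note on units, as a rough sketch rather than a definitive analysis: the Python script below prints `time.time()` differences, which are seconds, while the Julia macro prints `t*1000`, which is milliseconds if `@elapsed` returns seconds. Under that assumption, the figures can be put on the same scale like this (the dictionary names are made up for illustration):

```python
# Timings copied verbatim from the runs above.
python_seconds = {"80mb": 0.797344923019, "800mb": 6.59492301941}
julia_millis = {"80mb": 16789.66212272644, "800mb": 129588.43898773193}

slowdown = {}
for size, py_s in python_seconds.items():
    # Assumption: the Julia figure is in milliseconds (the macro prints t*1000).
    julia_s = julia_millis[size] / 1000.0
    slowdown[size] = julia_s / py_s
    print(f"{size}: Python {py_s:.2f} s, Julia {julia_s:.2f} s, ratio {slowdown[size]:.1f}x")
```

If the assumption holds, the gap is roughly a factor of twenty rather than several orders of magnitude, which is still slow but a different problem to chase.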

Here's the code for the Python and Julia implementations, based on the benchmarks on the main page:

``` python
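import time

def parse():
    # Split every line on tabs; the result is discarded, only the time matters.
    with open("./2.txt") as file_handle:
        for line in file_handle:
            line = line.split("\t")

# Best of five runs, in seconds.
tmin = float("inf")
for i in range(5):
    t = time.time()
    parse()
    t = time.time() - t
    if t < tmin:
        tmin = t

print(tmin)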
```
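
For anyone who wants to reproduce the Python side without the original `2.txt`, here is a minimal self-contained sketch. It generates a small tab-delimited file in a temporary directory and times the same split loop, best of five. The helper names (`write_sample_tsv`, `count_fields`, `best_of`) and the file size are made up for illustration, and `time.perf_counter` is swapped in for `time.time` because it is meant for measuring intervals.

```python
import os
import tempfile
import time


def write_sample_tsv(path, rows=1000, cols=5):
    """Write a small tab-delimited file (hypothetical stand-in for 2.txt)."""
    with open(path, "w") as handle:
        for r in range(rows):
            handle.write("\t".join(str(r * cols + c) for c in range(cols)) + "\n")


def count_fields(path):
    """Split every line on tabs and return the total number of fields."""
    total = 0
    with open(path) as handle:
        for line in handle:
            total += len(line.rstrip("\n").split("\t"))
    return total


def best_of(func, repeats=5):
    """Return the minimum wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.tsv")
    write_sample_tsv(sample_path)
    total_fields = count_fields(sample_path)
    best_seconds = best_of(lambda: count_fields(sample_path))

print(total_fields, "fields, best of 5:", best_seconds, "seconds")
```

Counting the fields gives the loop a result that can be checked, so a run that silently reads nothing does not look fast.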

---

Julia:

``` Julia
macro timeit(ex,name)
    quote
        t = Inf
        for i=1:5
            t = min(t, @elapsed $ex)
        end
        # @elapsed returns seconds, so this prints the best run in milliseconds
        println(t*1000)
    end
end

function parse()
    file = LineIterator(open("./2.txt"))
    for line in file
        split(line, "\t")
    end
end

@timeit parse() "parse"
```

Why is this?

