Rewrite database importer to work in memory #14

@jlnr

Description

Instead of downloading the full Wikipedia dump, extracting it, and then running a Ragel script over the XML file, can we just do it all in memory? Pseudocode:

curl -s http://dumps.wikimedia.org/.../enwiki-20170220-pages-articles-multistream.xml.bz2 | bzcat | ./extract-movies enwiki

Rationale: Having 100 GB of free space is a rare occurrence for me.
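
A minimal sketch of the in-memory idea in Python, under stated assumptions: `stream_xml` is a hypothetical helper (not part of this repo), the dump URL keeps the elided `...` path from above, and the decompressed chunks would still need to be fed to the existing extractor. Since the multistream dump is a series of concatenated bz2 streams, the decompressor has to be restarted each time one stream ends:

```python
import bz2
from urllib.request import urlopen

# Hypothetical: the real path is elided ("...") in the issue text above.
DUMP_URL = ("http://dumps.wikimedia.org/.../"
            "enwiki-20170220-pages-articles-multistream.xml.bz2")

def stream_xml(url, chunk_size=1 << 16):
    """Yield decompressed XML bytes straight off the wire, no temp files.

    The multistream dump concatenates many bz2 streams, so a fresh
    BZ2Decompressor is created whenever the current one reaches EOF.
    """
    dec = bz2.BZ2Decompressor()
    with urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            while chunk:
                yield dec.decompress(chunk)
                if dec.eof:
                    # Leftover bytes belong to the next bz2 stream.
                    chunk = dec.unused_data
                    dec = bz2.BZ2Decompressor()
                else:
                    chunk = b""

# Usage sketch: only one chunk is held in memory at a time; each chunk
# could be piped into the extractor's stdin or parsed incrementally here.
for xml_chunk in stream_xml(DUMP_URL):
    pass  # feed xml_chunk to the movie extractor
```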
