Skip to content

Memory leak and incorrect behavior of stop/resume #200

@georgyfarniev

Description

@georgyfarniev

Hello.

Thanks for creating this parser, it works fast and has nice documentation, really wonderful job. Unfortunately, I ran into issues with pausing stream. I need it for writing record to database before continuing.

I noticed incorrect behavior when I call stop/resume in closing tag handler, and I got out of memory crash on big xml file, but unfortunately I cannot provide this file, as it contains corporate information.

So, to reproduce, use scripts below. First script will generate large XML file, and second will show error. While there's 100000 records in sample.xml, parser will show less closed tags (in my case, 3274). If I remove stop and resume calls, it shows 100000 as it should.

On another file (which I cannot provide unfortunately) it causes out of memory crash. I will try to create auto-generated file which can reproduce this issue.

Sample generator:

const fs = require('fs')
const stream = require('stream')

const SAMPLE_RECORDS_COUNT = 100000
const SAMPLE_PATH = './sample.xml'

class DataStream extends stream.Readable {
  constructor() {
    super()
    this.count = 0
  }

  _read() {
    if (this.count === 0) {
      this.push('<?xml version="1.0" encoding="UTF-8"?><root>')
      this.count++
      return;
    }


    this.count++

    if (this.count > SAMPLE_RECORDS_COUNT) {
      this.push('</root>')
      this.push(null)
    } else {
      this.push('<child>text</child>\n')
    }

  }
}

const ws = fs.createWriteStream(SAMPLE_PATH)
const ds = new DataStream()

ds.pipe(ws).on('close', () => {
  console.log('done')
  process.exit()
})

Reproduce:

const fs = require('fs')
const expat = require('node-expat')

console.log('started')

const parser = new expat.Parser('UTF-8')
const rs = fs.createReadStream('./sample.xml')

parser.on('startElement', (elt, attrs) => {
})

let count = 0
parser.on('endElement', async (elt, attrs) => {
  count++
  parser.stop()
  parser.resume()
})

parser.on('error', console.error)
rs.on('end', () => console.log(count))

rs.pipe(parser)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions