
Zilla crashes when a large number of MQTT clients connect #793

@vordimous

Describe the bug
Running the taxi-demo and its load_test.sh script simulates a large number of connected MQTT clients, producing ~100k messages within a few minutes. Zilla crashes with the error below:

org.agrona.concurrent.AgentTerminationException: java.lang.NullPointerException: Cannot read field "initialAck" because "stream" is null
    at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:707)
    at org.agrona.core/org.agrona.concurrent.AgentRunner.doDutyCycle(AgentRunner.java:291)
    at org.agrona.core/org.agrona.concurrent.AgentRunner.run(AgentRunner.java:164)
    at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: java.lang.NullPointerException: Cannot read field "initialAck" because "stream" is null
    at io.aklivity.zilla.runtime.binding.kafka@0.9.60/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool$KafkaClientConnection.onConnectionWindow(KafkaClientConnectionPool.java:1574)
    at io.aklivity.zilla.runtime.binding.kafka@0.9.60/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool$KafkaClientConnection.onConnectionMessage(KafkaClientConnectionPool.java:1383)
    at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.handleReadInitial(DispatchAgent.java:1106)
    at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.handleRead(DispatchAgent.java:1041)
    at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.concurent.ManyToOneRingBuffer.read(ManyToOneRingBuffer.java:181)
    at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:701)
    ... 3 more
    Suppressed: java.lang.Exception: [engine/data#3]        [0x03030000000003e7] streams=[consumeAt=0x00644218 (0x0000000000644218), produceAt=0x006447b0 (0x00000000006447b0)]
            at io.aklivity.zilla.runtime.engine@0.9.60/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:705)
            ... 3 more
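The stack trace points at an unguarded field read after a map lookup returned null: a WINDOW frame arrives for a stream that has already been removed from the connection pool's stream table. Below is a minimal, hypothetical sketch of that failure pattern and the obvious guard — the names `Stream`, `streams`, and `onWindowGuarded` are illustrative and are not Zilla's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectionPoolSketch
{
    // Hypothetical per-stream state held by a connection pool
    static final class Stream
    {
        long initialAck;
    }

    final Map<Long, Stream> streams = new HashMap<>();

    // Unguarded variant: throws NullPointerException when the stream
    // was already cleaned up but a frame for it is still queued
    long onWindowUnguarded(long streamId, long ack)
    {
        Stream stream = streams.get(streamId);
        return stream.initialAck = ack; // NPE when stream == null
    }

    // Guarded variant: ignore stale frames for already-closed streams
    long onWindowGuarded(long streamId, long ack)
    {
        Stream stream = streams.get(streamId);
        if (stream == null)
        {
            return -1L; // stale frame; stream no longer tracked
        }
        return stream.initialAck = ack;
    }

    public static void main(String[] args)
    {
        ConnectionPoolSketch pool = new ConnectionPoolSketch();
        // No stream registered for id 42, mimicking a stream removed
        // before its queued WINDOW frame was processed
        System.out.println(pool.onWindowGuarded(42L, 100L)); // prints -1
    }
}
```

Under heavy load (many short-lived MQTT client streams) the race between stream cleanup and queued frames becomes far more likely, which would explain why the crash only appears during the load test.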

To Reproduce
Steps to reproduce the behavior:

  1. Go to taxi-demo on the load-test branch
  2. Run docker compose build
  3. Follow the demo instructions to start the demo
  4. Review the load testing instructions
  5. With replication set to 300, run the load_test.sh script 2-5 times, or until Zilla throws an error.
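Step 5 above can be sketched as a small retry loop. This is illustrative only: `run_until_failure` is a hypothetical helper, and in the demo the command passed to it would be `./load_test.sh` (the script name from this report):

```shell
#!/bin/sh
# Re-run a load test command until one run fails, e.g. when Zilla
# crashes and the script exits non-zero (assumption: the script
# propagates the failure via its exit code).
run_until_failure()
{
    cmd="$1"
    for i in 1 2 3 4 5
    do
        echo "load test run $i"
        "$cmd" || { echo "failed on run $i"; return 1; }
    done
}

# Example: run_until_failure ./load_test.sh
```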

Expected behavior
Zilla should handle this many concurrent clients without crashing.

Metadata

Labels: bug (Something isn't working)
Assignees: none
Milestone: none