Commit 52a36f8

GH-49656: [Ruby] Add benchmark for writers

Performance is important in Apache Arrow, so benchmarks are useful when developing an Apache Arrow implementation.

* Add benchmarks for file and streaming writers.
* Remove redundant type arguments from array constructors.

Here are benchmark results in my environment. The pure Ruby implementation is about 2-2.5x slower than the release build C++ implementation but about 2-2.5x faster than the debug build C++ implementation.

Release build C++/GLib:

File format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Arrow::Table#save    348.499 i/s -  374.000 times in 1.073175s (2.87ms/i)
  Arrow::RecordBatchFileWriter    353.426 i/s -  385.000 times in 1.089337s (2.83ms/i)
       ArrowFormat::FileWriter    133.293 i/s -  140.000 times in 1.050314s (7.50ms/i)
Calculating -------------------------------------
             Arrow::Table#save    336.984 i/s -   1.045k times in 3.101035s (2.97ms/i)
  Arrow::RecordBatchFileWriter    338.695 i/s -   1.060k times in 3.129655s (2.95ms/i)
       ArrowFormat::FileWriter    134.640 i/s -  399.000 times in 2.963462s (7.43ms/i)

Comparison:
  Arrow::RecordBatchFileWriter:    338.7 i/s
             Arrow::Table#save:    337.0 i/s - 1.01x slower
       ArrowFormat::FileWriter:    134.6 i/s - 2.52x slower
```

Streaming format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
               Arrow::Table#save    356.995 i/s -  385.000 times in 1.078447s (2.80ms/i)
  Arrow::RecordBatchStreamWriter    347.891 i/s -  374.000 times in 1.075050s (2.87ms/i)
    ArrowFormat::StreamingWriter    156.709 i/s -  160.000 times in 1.021004s (6.38ms/i)
Calculating -------------------------------------
               Arrow::Table#save    350.743 i/s -   1.070k times in 3.050665s (2.85ms/i)
  Arrow::RecordBatchStreamWriter    345.821 i/s -   1.043k times in 3.016011s (2.89ms/i)
    ArrowFormat::StreamingWriter    160.022 i/s -  470.000 times in 2.937090s (6.25ms/i)

Comparison:
               Arrow::Table#save:    350.7 i/s
  Arrow::RecordBatchStreamWriter:    345.8 i/s - 1.01x slower
    ArrowFormat::StreamingWriter:    160.0 i/s - 2.19x slower
```

Debug build C++/GLib:

File format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Arrow::Table#save     63.290 i/s -   66.000 times in 1.042815s (15.80ms/i)
  Arrow::RecordBatchFileWriter     62.655 i/s -   66.000 times in 1.053389s (15.96ms/i)
       ArrowFormat::FileWriter    138.082 i/s -  140.000 times in 1.013891s (7.24ms/i)
Calculating -------------------------------------
             Arrow::Table#save     63.165 i/s -  189.000 times in 2.992143s (15.83ms/i)
  Arrow::RecordBatchFileWriter     61.773 i/s -  187.000 times in 3.027220s (16.19ms/i)
       ArrowFormat::FileWriter    134.709 i/s -  414.000 times in 3.073285s (7.42ms/i)

Comparison:
       ArrowFormat::FileWriter:    134.7 i/s
             Arrow::Table#save:     63.2 i/s - 2.13x slower
  Arrow::RecordBatchFileWriter:     61.8 i/s - 2.18x slower
```

Streaming format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
               Arrow::Table#save     63.252 i/s -   66.000 times in 1.043439s (15.81ms/i)
  Arrow::RecordBatchStreamWriter     61.272 i/s -   66.000 times in 1.077162s (16.32ms/i)
    ArrowFormat::StreamingWriter    152.598 i/s -  160.000 times in 1.048506s (6.55ms/i)
Calculating -------------------------------------
               Arrow::Table#save     61.016 i/s -  189.000 times in 3.097525s (16.39ms/i)
  Arrow::RecordBatchStreamWriter     63.024 i/s -  183.000 times in 2.903642s (15.87ms/i)
    ArrowFormat::StreamingWriter    160.416 i/s -  457.000 times in 2.848846s (6.23ms/i)

Comparison:
    ArrowFormat::StreamingWriter:    160.4 i/s
  Arrow::RecordBatchStreamWriter:     63.0 i/s - 2.55x slower
               Arrow::Table#save:     61.0 i/s - 2.63x slower
```

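
The "x slower" figures in the Comparison sections can be recomputed from the iterations-per-second values. The following is a minimal sketch in plain Ruby, using the release-build file-format numbers from the message above; the `slowdowns` helper is a hypothetical name for illustration and is not part of benchmark-driver or Arrow.

```ruby
# Iterations per second copied from the "Calculating" section of the
# release-build file-format run in the commit message.
results = {
  "Arrow::Table#save" => 336.984,
  "Arrow::RecordBatchFileWriter" => 338.695,
  "ArrowFormat::FileWriter" => 134.640,
}

# Hypothetical helper: for each entry, return how many times slower it
# is than the fastest entry, ordered from fastest to slowest.
def slowdowns(results)
  fastest = results.values.max
  results.sort_by {|_name, ips| -ips}.to_h do |name, ips|
    [name, (fastest / ips).round(2)]
  end
end

slowdowns(results).each do |name, ratio|
  puts("#{name}: #{ratio}x")
end
```

The ratio is simply the fastest throughput divided by each entry's throughput, so the fastest entry always reports 1.0.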
1 parent ddc4229 commit 52a36f8

4 files changed, +293 -25 lines changed

ruby/red-arrow-format/benchmark/file-writer.yaml

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+prelude: |
+  Warning[:experimental] = false
+
+  require "arrow"
+  require "arrow-format"
+
+  seed = 29
+  random = Random.new(seed)
+
+  n_columns = 100
+  n_rows = 10000
+  max_uint32 = 2 ** 32 - 1
+  arrays = n_columns.times.collect do |i|
+    if i.even?
+      Arrow::UInt32Array.new(n_rows.times.collect {random.rand(max_uint32)})
+    else
+      Arrow::BinaryArray.new(n_rows.times.collect {random.bytes(random.rand(10))})
+    end
+  end
+  columns = arrays.collect.with_index {|array, i| [i, array]}
+  red_arrow_table = Arrow::Table.new(columns)
+
+  fields = arrays.collect.with_index do |array, i|
+    case array
+    when Arrow::UInt32Array
+      type = ArrowFormat::UInt32Type.singleton
+    when Arrow::BinaryArray
+      type = ArrowFormat::BinaryType.singleton
+    end
+    ArrowFormat::Field.new(i.to_s, type)
+  end
+  schema = ArrowFormat::Schema.new(fields)
+  def convert_buffer(buffer)
+    return nil if buffer.nil?
+    IO::Buffer.for(buffer.data.to_s.dup)
+  end
+  columns = fields.zip(arrays).collect do |field, array|
+    case array
+    when Arrow::UInt32Array
+      field.type.build_array(n_rows,
+                             convert_buffer(array.null_bitmap),
+                             convert_buffer(array.data_buffer))
+    when Arrow::BinaryArray
+      field.type.build_array(n_rows,
+                             convert_buffer(array.null_bitmap),
+                             convert_buffer(array.offsets_buffer),
+                             convert_buffer(array.data_buffer))
+    end
+  end
+  red_arrow_format_record_batch =
+    ArrowFormat::RecordBatch.new(schema, n_rows, columns)
+
+  GC.start
+  GC.disable
+benchmark:
+  "Arrow::Table#save": |
+    buffer = Arrow::ResizableBuffer.new(4096)
+    red_arrow_table.save(buffer, format: :arrow_file)
+  "Arrow::RecordBatchFileWriter": |
+    buffer = Arrow::ResizableBuffer.new(4096)
+    Arrow::BufferOutputStream.open(buffer) do |output|
+      schema = red_arrow_table.schema
+      Arrow::RecordBatchFileWriter.open(output, schema) do |writer|
+        writer.write_table(red_arrow_table)
+      end
+    end
+  "ArrowFormat::FileWriter": |
+    output = +"".b
+    writer = ArrowFormat::FileWriter.new(output)
+    writer.start(red_arrow_format_record_batch.schema)
+    writer.write_record_batch(red_arrow_format_record_batch)
+    writer.finish
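
The prelude seeds `Random` with a fixed value, so every run builds the same input data and results stay comparable between runs. The sketch below reproduces only that data-generation step with plain Ruby arrays instead of Arrow arrays; `build_columns` is a hypothetical helper introduced here for illustration, and the small sizes are chosen only to keep the example fast.

```ruby
# Hypothetical helper mirroring the prelude's data generation, but
# returning plain Ruby arrays so that no Arrow library is required.
def build_columns(seed, n_columns, n_rows)
  random = Random.new(seed)
  max_uint32 = 2 ** 32 - 1
  n_columns.times.collect do |i|
    if i.even?
      # Even columns: integers in 0...max_uint32 (the upper bound is exclusive).
      n_rows.times.collect {random.rand(max_uint32)}
    else
      # Odd columns: binary strings of 0 to 9 random bytes.
      n_rows.times.collect {random.bytes(random.rand(10))}
    end
  end
end

first = build_columns(29, 4, 5)
second = build_columns(29, 4, 5)
puts(first == second) # The same seed yields the same columns.
```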
ruby/red-arrow-format/benchmark/streaming-writer.yaml

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+prelude: |
+  Warning[:experimental] = false
+
+  require "arrow"
+  require "arrow-format"
+
+  seed = 29
+  random = Random.new(seed)
+
+  n_columns = 100
+  n_rows = 10000
+  max_uint32 = 2 ** 32 - 1
+  arrays = n_columns.times.collect do |i|
+    if i.even?
+      Arrow::UInt32Array.new(n_rows.times.collect {random.rand(max_uint32)})
+    else
+      Arrow::BinaryArray.new(n_rows.times.collect {random.bytes(random.rand(10))})
+    end
+  end
+  columns = arrays.collect.with_index {|array, i| [i, array]}
+  red_arrow_table = Arrow::Table.new(columns)
+
+  fields = arrays.collect.with_index do |array, i|
+    case array
+    when Arrow::UInt32Array
+      type = ArrowFormat::UInt32Type.singleton
+    when Arrow::BinaryArray
+      type = ArrowFormat::BinaryType.singleton
+    end
+    ArrowFormat::Field.new(i.to_s, type)
+  end
+  schema = ArrowFormat::Schema.new(fields)
+  def convert_buffer(buffer)
+    return nil if buffer.nil?
+    IO::Buffer.for(buffer.data.to_s.dup)
+  end
+  columns = fields.zip(arrays).collect do |field, array|
+    case array
+    when Arrow::UInt32Array
+      field.type.build_array(n_rows,
+                             convert_buffer(array.null_bitmap),
+                             convert_buffer(array.data_buffer))
+    when Arrow::BinaryArray
+      field.type.build_array(n_rows,
+                             convert_buffer(array.null_bitmap),
+                             convert_buffer(array.offsets_buffer),
+                             convert_buffer(array.data_buffer))
+    end
+  end
+  red_arrow_format_record_batch =
+    ArrowFormat::RecordBatch.new(schema, n_rows, columns)
+
+  GC.start
+  GC.disable
+benchmark:
+  "Arrow::Table#save": |
+    buffer = Arrow::ResizableBuffer.new(4096)
+    red_arrow_table.save(buffer, format: :arrow_streaming)
+  "Arrow::RecordBatchStreamWriter": |
+    buffer = Arrow::ResizableBuffer.new(4096)
+    Arrow::BufferOutputStream.open(buffer) do |output|
+      schema = red_arrow_table.schema
+      Arrow::RecordBatchStreamWriter.open(output, schema) do |writer|
+        writer.write_table(red_arrow_table)
+      end
+    end
+  "ArrowFormat::StreamingWriter": |
+    output = +"".b
+    writer = ArrowFormat::StreamingWriter.new(output)
+    writer.start(red_arrow_format_record_batch.schema)
+    writer.write_record_batch(red_arrow_format_record_batch)
+    writer.finish
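
Both benchmark files end their prelude with `GC.start` and `GC.disable`, and each case writes into an in-memory buffer. As a rough illustration of what an "i/s" figure means under those conditions, here is a minimal timing sketch in plain Ruby. It is not how benchmark-driver measures internally; `iterations_per_second` is a hypothetical helper, and the workload is a stand-in that appends bytes to a binary `String`, the same kind of output target the `ArrowFormat` cases use.

```ruby
# Hypothetical helper: run the block repeatedly for roughly `duration`
# seconds with GC disabled and return iterations per second.
def iterations_per_second(duration: 0.05)
  GC.start   # Begin from a freshly collected heap.
  GC.disable # Keep GC pauses out of the measurement.
  count = 0
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    yield
    count += 1
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    return count / elapsed if elapsed >= duration
  end
ensure
  GC.enable # Restore normal GC behavior for the rest of the program.
end

chunk = "x".b * 1024 # 1 KiB of binary data, reused on every iteration.
output = +"".b       # Mutable binary String, like the writers' output target.
ips = iterations_per_second do
  output.clear
  64.times {output << chunk}
end
puts("%.1f i/s (%.3f ms/i)" % [ips, 1000.0 / ips])
```

Reusing `output` and `chunk` keeps allocations per iteration low, which matters because nothing is collected while GC is disabled.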

ruby/red-arrow-format/lib/arrow-format/array.rb

Lines changed: 97 additions & 4 deletions
@@ -140,8 +140,8 @@ def slice_offsets_buffer(id, buffer, buffer_type)
   end

   class NullArray < Array
-    def initialize(type, size)
-      super(type, size, nil)
+    def initialize(size)
+      super(NullType.singleton, size, nil)
     end

     def each_buffer
@@ -186,6 +186,10 @@ def element_size
   end

   class BooleanArray < PrimitiveArray
+    def initialize(size, validity_buffer, values_buffer)
+      super(BooleanType.singleton, size, validity_buffer, values_buffer)
+    end
+
     def to_a
       return [] if empty?

@@ -209,51 +213,120 @@ def clear_cache
   end

   class IntArray < PrimitiveArray
+    def initialize(size, validity_buffer, values_buffer)
+      super(self.class.type, size, validity_buffer, values_buffer)
+    end
   end

   class Int8Array < IntArray
+    class << self
+      def type
+        Int8Type.singleton
+      end
+    end
   end

   class UInt8Array < IntArray
+    class << self
+      def type
+        UInt8Type.singleton
+      end
+    end
   end

   class Int16Array < IntArray
+    class << self
+      def type
+        Int16Type.singleton
+      end
+    end
   end

   class UInt16Array < IntArray
+    class << self
+      def type
+        UInt16Type.singleton
+      end
+    end
   end

   class Int32Array < IntArray
+    class << self
+      def type
+        Int32Type.singleton
+      end
+    end
   end

   class UInt32Array < IntArray
+    class << self
+      def type
+        UInt32Type.singleton
+      end
+    end
   end

   class Int64Array < IntArray
+    class << self
+      def type
+        Int64Type.singleton
+      end
+    end
   end

   class UInt64Array < IntArray
+    class << self
+      def type
+        UInt64Type.singleton
+      end
+    end
   end

   class FloatingPointArray < PrimitiveArray
+    def initialize(size, validity_buffer, values_buffer)
+      super(self.class.type, size, validity_buffer, values_buffer)
+    end
   end

   class Float32Array < FloatingPointArray
+    class << self
+      def type
+        Float32Type.singleton
+      end
+    end
   end

   class Float64Array < FloatingPointArray
+    class << self
+      def type
+        Float64Type.singleton
+      end
+    end
   end

   class TemporalArray < PrimitiveArray
   end

   class DateArray < TemporalArray
+    def initialize(size, validity_buffer, values_buffer)
+      super(self.class.type, size, validity_buffer, values_buffer)
+    end
   end

   class Date32Array < DateArray
+    class << self
+      def type
+        Date32Type.singleton
+      end
+    end
   end

   class Date64Array < DateArray
+    class << self
+      def type
+        Date64Type.singleton
+      end
+    end
   end

   class TimeArray < TemporalArray
@@ -318,8 +391,8 @@ class DurationArray < TemporalArray
   end

   class VariableSizeBinaryArray < Array
-    def initialize(type, size, validity_buffer, offsets_buffer, values_buffer)
-      super(type, size, validity_buffer)
+    def initialize(size, validity_buffer, offsets_buffer, values_buffer)
+      super(self.class.type, size, validity_buffer)
       @offsets_buffer = offsets_buffer
       @values_buffer = values_buffer
     end
@@ -364,18 +437,38 @@ def offset_size
   end

   class BinaryArray < VariableSizeBinaryArray
+    class << self
+      def type
+        BinaryType.singleton
+      end
+    end
   end

   class LargeBinaryArray < VariableSizeBinaryArray
+    class << self
+      def type
+        LargeBinaryType.singleton
+      end
+    end
   end

   class VariableSizeUTF8Array < VariableSizeBinaryArray
   end

   class UTF8Array < VariableSizeUTF8Array
+    class << self
+      def type
+        UTF8Type.singleton
+      end
+    end
   end

   class LargeUTF8Array < VariableSizeUTF8Array
+    class << self
+      def type
+        LargeUTF8Type.singleton
+      end
+    end
   end

   class FixedSizeBinaryArray < Array
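
The constructor change in this file follows one pattern: each concrete array class exposes its parameter-free type through a class-level `type` method, and the shared base constructor looks it up with `self.class.type` instead of taking it as an argument. The following is a self-contained sketch of that pattern. All `Sketch*` classes are hypothetical stand-ins written for this example, and the `singleton` implementation shown is an assumption, since the real `ArrowFormat` type classes are not part of this diff.

```ruby
# Hypothetical stand-in for a parameter-free type with one shared instance.
class SketchType
  class << self
    # Assumption: one memoized instance per class. The real
    # ArrowFormat implementation may differ.
    def singleton
      @singleton ||= new
    end
  end
end

class SketchInt8Type < SketchType; end
class SketchUInt8Type < SketchType; end

# Hypothetical base class that still accepts the type explicitly.
class SketchArray
  attr_reader :type, :size

  def initialize(type, size, validity_buffer)
    @type = type
    @size = size
    @validity_buffer = validity_buffer
  end
end

# Intermediate class: callers no longer pass the type. It is resolved
# from the concrete subclass through self.class.type.
class SketchIntArray < SketchArray
  def initialize(size, validity_buffer, values_buffer)
    super(self.class.type, size, validity_buffer)
    @values_buffer = values_buffer
  end
end

class SketchInt8Array < SketchIntArray
  class << self
    def type
      SketchInt8Type.singleton
    end
  end
end

class SketchUInt8Array < SketchIntArray
  class << self
    def type
      SketchUInt8Type.singleton
    end
  end
end

array = SketchInt8Array.new(3, nil, nil)
puts(array.type.class) # The type comes from the class, not from the caller.
```

With this shape, adding a new fixed-type array class only requires defining its `type` class method. The shared `initialize` stays in the intermediate class.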
