Conversation

@stevenzwu (Contributor) commented on Aug 2, 2024:

Last PR to put everything together from the project: [Priority 2] Flink: support range distribution.

.defaultValue(StatisticsType.Auto.name());

public static final ConfigOption<Double> CLOSE_FILE_COST_WEIGHT_PERCENTAGE =
ConfigOptions.key("close-file-cost-weight-percentage").doubleType().defaultValue(0.02d);
@stevenzwu (Contributor, Author) commented:
Open to feedback on the config name, type, and default value. 0.02 means the close-file cost is weighted at 2% of the target weight per task, which avoids placing more than 50 files in one writer task.
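A minimal arithmetic sketch, with made-up numbers rather than this PR's code, of why a 2% close-file cost caps the files per writer task at 50:

public class CloseFileCostMath {
  public static void main(String[] args) {
    double closeFileCostWeightPercentage = 0.02d; // the proposed default: 2%
    long targetWeightPerTask = 1_000_000L; // hypothetical: total traffic weight / writer parallelism

    // Every file assigned to a task contributes at least this much weight,
    // no matter how few rows it actually receives.
    double closeFileCost = closeFileCostWeightPercentage * targetWeightPerTask;

    // Worst case: the task's target weight is consumed by close costs alone,
    // so at most 1 / 0.02 = 50 files land on a single writer task.
    double maxFilesPerTask = targetWeightPerTask / closeFileCost;
    System.out.printf("close cost %.0f, max files per task %.0f%n", closeFileCost, maxFilesPerTask);
  }
}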

A reviewer (Contributor) replied:

It also keeps 50 writers open (which means a high memory footprint).

Does 0.02d mean 2 percent? In that case we could use close-file-cost-weight. Alternatively, if percentage is in the name, then the value should be 2.

@stevenzwu (Contributor, Author) replied:
Yes, 0.02d means 2%. I didn't go with percentage in the name and an integer value, in order to keep a bit more flexibility, e.g. 0.005d for 0.5%.

I agree the naming is probably not the best; maybe close-file-cost-weight. The doc already explains that 0.02d means 2%.

@stevenzwu (Contributor, Author) commented on Aug 16, 2024:
I renamed this config to RANGE_DISTRIBUTION_SORT_KEY_BASE_WEIGHT, as I think it is more accurate, and added more extensive Javadoc and an explanation in the doc. I hope it is clearer to users.

I will follow up with a separate PR to rename the internal code from closeFileCost to sortKeyBaseWeight; it will touch a bunch of internal files and lines.
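For reference, a hedged usage sketch of the renamed option through the sink builder's generic write-option hook; the key spelling follows this PR, while dataStream and tableLoader are placeholders for the application's own objects:

import org.apache.iceberg.DistributionMode;
import org.apache.iceberg.flink.sink.FlinkSink;

// Sketch only: assumes the builder's set(String, String) accepts the renamed key.
FlinkSink.forRowData(dataStream)
    .tableLoader(tableLoader)
    .distributionMode(DistributionMode.RANGE)
    .set("range-distribution-sort-key-base-weight", "0.02")
    .append();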


// Convert the requested flink table schema to flink row type.
RowType flinkRowType = toFlinkRowType(table.schema(), tableSchema);
int writerParallelism =
@stevenzwu (Contributor, Author) commented:

Writer parallelism is also needed by the distributeDataStream method, as the downstream operator parallelism for the range partitioner.
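A sketch of that resolution, assuming an unset write parallelism falls back to the input stream's parallelism; flinkWriteConf and rowDataInput stand in for the surrounding method's state:

// Assumption: explicit write parallelism wins, otherwise follow the input.
Integer configured = flinkWriteConf.writeParallelism();
int writerParallelism =
    configured == null ? rowDataInput.getParallelism() : configured;
// The same value is passed to distributeDataStream(...) so the range
// partitioner knows the parallelism of the downstream writer operator.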

@pvary (Contributor) commented on Aug 6, 2024:

@rodmeneses: This will affect your PR as well. Please sync with @stevenzwu about the order of the commits

@stevenzwu (Contributor, Author) replied:

> @rodmeneses: This will affect your PR as well. Please sync with @stevenzwu about the order of the commits

I don't think we should worry about the order. We can integrate range distribution with the v2 sink separately, after the v2 sink is merged.

@stevenzwu force-pushed the flink-range-distribution branch from 65edd7e to 2aaa30c on August 7, 2024 06:08
@stevenzwu force-pushed the flink-range-distribution branch from 2aaa30c to 46825f9 on August 16, 2024 05:25
@pvary (Contributor) left a review:

A few nits, but LGTM

@stevenzwu force-pushed the flink-range-distribution branch from 6e1cbc9 to 1c612da on August 16, 2024 16:46

Config value is a enum type: `Map`, `Sketch`, `Auto`.
<ul>
<li>Map: collect accurate sampling count for every single key.
A reviewer (Contributor) commented:
collect -> collects


### Range distribution (experimental)

RANGE distribution shuffle data by partition key or sort order via a custom range partitioner.
A reviewer (Contributor) commented:
shuffle -> shuffles ?
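For context, a hedged sketch of requesting RANGE distribution through Iceberg's write.distribution-mode table property, assuming the usual precedence of explicit write options over table properties:

import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;

// Sketch only: sets the table-level default; an explicit distributionMode(...)
// on the sink builder would still take precedence.
table.updateProperties()
    .set(TableProperties.WRITE_DISTRIBUTION_MODE, "range")
    .commit();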


#### Use cases

RANGE distribution can be applied an Iceberg table that either is partitioned or
A reviewer (Contributor) commented:
can be applied -> can be applied to

.withDescription("Type of statistics collection: Auto, Map, Sketch");

public static final ConfigOption<Double> RANGE_DISTRIBUTION_SORT_KEY_BASE_WEIGHT =
ConfigOptions.key("ange-distribution-sort-key-base-weight")
A reviewer (Contributor) commented:
ange-distribution-sort-key-base-weight -> range-distribution-sort-key-base-weight

<li>For low cardinality scenario (like hundreds or thousands),
HashMap is used to track traffic distribution for every key.
If a new sort key value shows up, range partitioner would just
round-robin it to the writer tasks before traffic distribution has been learned.
A reviewer (Contributor) commented:
learned. -> learned, let's remove the extra .
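To make the quoted paragraph concrete, a minimal self-contained sketch, with hypothetical names rather than this PR's implementation, of the round-robin fallback for keys whose traffic has not been learned yet:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the low-cardinality (Map) strategy: learned keys
// go to their assigned writer task; unseen keys are round-robined until the
// next statistics refresh assigns them.
class MapStatsPartitionerSketch {
  private final Map<String, Integer> learnedAssignment = new ConcurrentHashMap<>();
  private final AtomicInteger roundRobin = new AtomicInteger();
  private final int numWriterTasks;

  MapStatsPartitionerSketch(int numWriterTasks) {
    this.numWriterTasks = numWriterTasks;
  }

  int partition(String sortKey) {
    Integer task = learnedAssignment.get(sortKey);
    if (task != null) {
      return task; // traffic distribution for this key has been learned
    }
    // New key: spread evenly across writer tasks until statistics catch up.
    return Math.floorMod(roundRobin.getAndIncrement(), numWriterTasks);
  }

  void refresh(Map<String, Integer> newAssignment) {
    learnedAssignment.clear();
    learnedAssignment.putAll(newAssignment);
  }
}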

@stevenzwu force-pushed the flink-range-distribution branch from 1c612da to f8d559f on August 19, 2024 17:26
@stevenzwu merged commit ed07fd1 into apache:main on Aug 19, 2024
@stevenzwu deleted the flink-range-distribution branch on August 19, 2024 22:00
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Aug 19, 2024
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Aug 22, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
czy006 pushed commits to czy006/iceberg that referenced this pull request Apr 2, 2025