
Improve (or clarify) write scaling for Iceberg (and Hive?) tables #28465

@nikita-sheremet-java-developer

Description

I am trying to run a very simple query:

CREATE TABLE iceberg_catalog.my_schema.my_dest_table
  WITH (
         format = 'PARQUET',
         format_version = 2,
         location = 's3://my_bucket/my_schema/my_dest_table'
)
AS
SELECT * from iceberg_catalog.my_schema.my_source_table;

There can be variations (a more complex query, partitioned tables), but the problem appears regardless.

Expected behaviour:

  1. Trino estimates the input data
  2. picks up a piece of data
  3. processes that piece
  4. writes the processed piece
  5. repeats 2 - 4 until all data is processed

Actual behaviour:

  1. Trino reads all data into memory
  2. Trino writes all data to S3
  3. If a worker does not have enough memory, the query fails with Encountered too many errors talking to a worker node

Note about partitions

When the destination table has partitions, writers scale much better. But with a lot of data I still have to add an additional partition field. For example, if the table is partitioned by day and the query fails, I have to add bucketing on email, bucket(email, 10), so that less data goes to each worker and the write completes.
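For reference, the day-plus-bucket layout described above can be declared with the Iceberg connector's partitioning table property; a minimal sketch, assuming illustrative column names ts and email:

```sql
-- Hypothetical table; column names are illustrative.
CREATE TABLE iceberg_catalog.my_schema.my_dest_table (
    email varchar,
    ts    timestamp(6)
)
WITH (
    format = 'PARQUET',
    format_version = 2,
    -- day(ts) alone can funnel a whole day into few writers;
    -- bucket(email, 10) spreads each day's rows across 10 buckets
    partitioning = ARRAY['day(ts)', 'bucket(email, 10)']
);
```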

I have tried the writer properties, but they have no effect. The documentation also does not mention that these settings have no effect when the table has partitions.
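For context, these are the kinds of session properties I mean; the exact names vary between Trino releases, so treat this as a sketch and check SHOW SESSION on your cluster:

```sql
-- Property names differ across Trino versions; verify with SHOW SESSION.
SET SESSION scale_writers = true;
SET SESSION task_scale_writers_enabled = true;
-- newer releases use writer_scaling_min_data_processed
-- in place of the older writer_min_size
SET SESSION writer_scaling_min_data_processed = '100MB';
```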

What is the problem overall

There is no way to "tell" Trino: "slice the data into batches of size N" or "use this function to scale writes".
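Today the closest workaround I know of is to slice the load manually: create the table first, then run one INSERT per range so no single query buffers everything. A minimal sketch, assuming a hypothetical event_date column:

```sql
-- Create an empty copy of the source schema (WITH NO DATA skips the rows).
CREATE TABLE iceberg_catalog.my_schema.my_dest_table
WITH (format = 'PARQUET', format_version = 2)
AS
SELECT * FROM iceberg_catalog.my_schema.my_source_table
WITH NO DATA;

-- Load one slice at a time; event_date is an illustrative column name.
INSERT INTO iceberg_catalog.my_schema.my_dest_table
SELECT * FROM iceberg_catalog.my_schema.my_source_table
WHERE event_date BETWEEN DATE '2023-01-01' AND DATE '2023-01-31';
-- repeat with the next date range until all data is copied
```

This works, but it is exactly the manual batching that the feature request asks Trino to do automatically.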
