Description
I am trying to run a very simple query:
```sql
CREATE TABLE iceberg_catalog.my_schema.my_dest_table
WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 's3://my_bucket/my_schema/my_dest_table'
)
AS
SELECT * FROM iceberg_catalog.my_schema.my_source_table;
```
There can be variations (a more complex query, partitioned tables), but the problem appears anyway.
Expected behaviour:
1. Trino estimates the input data
2. picks up a chunk of data
3. processes the chunk
4. writes the processed chunk
5. repeats steps 2-4 until all data is processed
Actual behaviour:
- Trino reads all data into memory
- Trino writes all data to S3
- If a worker does not have enough memory, the query fails with:
`Encountered too many errors talking to a worker node`
Note about partitions
When the destination table has partitions, writers scale much better. But with a lot of data I still have to add an additional partition field. For example, if the table is partitioned by day and the query fails, I have to add bucketing on email, `bucket(email, 10)`, so that less data goes to each worker and the write completes.
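The workaround described above can be sketched as follows; this is a hypothetical illustration (the `created_at` column and the exact partition layout are assumptions, not from my actual schema):

```sql
-- Same CTAS as before, but with an extra bucket transform on email
-- added next to the existing day partition, so data spreads across
-- more writers. 'partitioning' is the Iceberg connector table property.
CREATE TABLE iceberg_catalog.my_schema.my_dest_table
WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 's3://my_bucket/my_schema/my_dest_table',
    partitioning = ARRAY['day(created_at)', 'bucket(email, 10)']
)
AS
SELECT * FROM iceberg_catalog.my_schema.my_source_table;
```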
I have tried the writer-related session properties, but they have no effect. The documentation also does not mention that these settings have no effect when the table has partitions.
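For reference, these are the kinds of properties I experimented with; a sketch only, since the exact property names vary between Trino versions and I may not have the current ones:

```sql
-- Writer scaling session properties (names assumed from Trino docs;
-- check SHOW SESSION for what your version actually supports)
SET SESSION scale_writers = true;
SET SESSION writer_scaling_min_data_processed = '100MB';
```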
What the problem is overall
There is no way to tell Trino "slice the data into batches of size N" or "use this function to scale writes".