|
| 1 | +--- |
| 2 | +title: Multiple datasets |
| 3 | +description: Learn how to use multiple datasets within your Actors to organize and store different types of data separately. |
| 4 | +slug: /actors/development/actor-definition/dataset-schema/multiple-datasets |
| 5 | +--- |
| 6 | + |
| 7 | +**Specify multiple datasets with different structures.**
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +Some Actors produce data with different structures. In such cases, it can be convenient to store the data in separate datasets instead of pushing everything to the default one. Multiple datasets let you specify those datasets upfront and enforce validation rules.
| 12 | +
| 13 | +New datasets are created when the run starts and follow its data retention.
| 14 | + |
| 15 | + |
| 16 | +## Defining multiple datasets |
| 17 | + |
| 18 | +Multiple datasets can be defined in the Actor schema using the `datasets` object:
| 19 | + |
| 20 | +```json title=".actor/actor.json" |
| 21 | +{ |
| 22 | + "actorSpecification": 1, |
| 23 | + "name": "this-is-book-library-scraper", |
| 24 | + "title": "Book Library scraper", |
| 25 | + "version": "1.0.0", |
| 26 | + "storages": { |
| 27 | + "datasets": { |
| 28 | + "products": "./products_dataset_schema.json", |
| 29 | + "categories": "./categories_dataset_schema.json" |
| 30 | + } |
| 31 | + } |
| 32 | +} |
| 33 | +``` |
| 34 | +Schemas of individual datasets can be provided as a file reference or inlined. |
| 35 | + |
| 36 | +The keys of the `datasets` object are **aliases**, which can be used to refer to the specific datasets. The example above defines two datasets, aliased as `products` and `categories`.
| 37 | + |
| 38 | +:::info |
| 39 | + |
| 40 | +An alias and a **name** are not the same thing. Named datasets have specific behavior on the Apify platform (e.g., the automatic data retention policy does not apply to them). Aliased datasets follow the data retention of their respective run. Aliases stay local to the run they belong to.
| 41 | + |
| 42 | +::: |
| 43 | + |
| 44 | +The `datasets` object must contain at least one dataset. The first one specified is treated as the default dataset wherever a default dataset is needed. For this reason, it is also automatically aliased as `default`. In the example above, the `products` dataset is used as the default one.
| 45 | +
| 46 | +The `datasets` and `dataset` objects are mutually exclusive; the schema can contain only one of them.
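
The two rules above (first dataset becomes the default, `dataset` and `datasets` cannot coexist) can be sketched as a small check over a parsed `actor.json`. This is only an illustration, not the platform's implementation; `listDatasetAliases` and the `defaultAlias` field are hypothetical names, and the sketch assumes both keys live under `storages`.

```javascript
// Illustrative sketch only, not the platform's implementation.
// `listDatasetAliases` and `defaultAlias` are hypothetical names.
function listDatasetAliases(actorJson) {
    const storages = actorJson.storages ?? {};

    // `dataset` and `datasets` are mutually exclusive.
    if (storages.dataset !== undefined && storages.datasets !== undefined) {
        throw new Error('`dataset` and `datasets` are mutually exclusive');
    }

    // Nothing to list when the multi-dataset form is not used.
    if (storages.datasets === undefined) {
        return { aliases: [], defaultAlias: null };
    }

    // Object.keys keeps declaration order for non-numeric string keys,
    // so the first key is the first dataset specified.
    const aliases = Object.keys(storages.datasets);
    if (aliases.length === 0) {
        throw new Error('`datasets` must contain at least one dataset');
    }

    return { aliases, defaultAlias: aliases[0] };
}
```

For the `actor.json` shown earlier, this sketch would report `products` as the alias that also acts as `default`.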
| 47 | + |
| 48 | +## Accessing the datasets in Actor code |
| 49 | + |
| 50 | +The mapping of aliases to dataset IDs is passed to the Actor in the JSON-encoded `ACTOR_STORAGE_IDS` environment variable.
| 51 | + |
| 52 | +```javascript |
| 53 | +const storageIds = JSON.parse(process.env.ACTOR_STORAGE_IDS);
| 54 | +const productsDataset = await Actor.openDataset(storageIds.datasets.products); |
| 55 | +``` |
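
When the variable is missing or an alias is misspelled, the lookup above quietly yields `undefined`. A minimal defensive sketch might look like the following; `resolveDatasetId` is a hypothetical helper, and the `{ datasets: { <alias>: <id> } }` shape is assumed from the snippet above rather than taken from a formal specification.

```javascript
// Hypothetical helper: resolves a dataset alias to its ID using the
// JSON-encoded ACTOR_STORAGE_IDS variable. The shape
// { datasets: { <alias>: <id> } } is an assumption based on the snippet above.
function resolveDatasetId(alias, env = process.env) {
    const raw = env.ACTOR_STORAGE_IDS;
    if (!raw) {
        throw new Error('ACTOR_STORAGE_IDS is not set');
    }

    const storageIds = JSON.parse(raw);
    const id = storageIds.datasets?.[alias];
    if (!id) {
        throw new Error(`No dataset with alias "${alias}"`);
    }
    return id;
}
```

The returned ID can then be passed to `Actor.openDataset()` as in the example above.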
| 56 | + |
| 57 | +Upcoming SDK support will allow opening a dataset directly by its alias:
| 58 | + |
| 59 | +```javascript |
| 60 | +const productsDataset = await Actor.openDataset({alias: 'products'}); |
| 61 | +``` |
| 62 | + |
| 63 | +## Showing data to users |
| 64 | + |
| 65 | +Actors with an output schema can refer to the datasets by alias through template variables:
| 66 | + |
| 67 | +```json |
| 68 | +{ |
| 69 | + "actorOutputSchemaVersion": 1, |
| 70 | + "title": "Output schema", |
| 71 | + "properties": { |
| 72 | + "products": { |
| 73 | + "type": "string", |
| 74 | + "title": "Products", |
| 75 | + "template": "{{storages.datasets.products.apiUrl}}/items" |
| 76 | + }, |
| 77 | + "categories": { |
| 78 | + "type": "string", |
| 79 | + "title": "Categories", |
| 80 | + "template": "{{storages.datasets.categories.apiUrl}}/items" |
| 81 | + } |
| 82 | + } |
| 83 | +} |
| 84 | +``` |
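
To make the `template` values more concrete, the sketch below expands `{{...}}` placeholders against a context object. It only illustrates the idea and is not how the platform resolves templates; `renderTemplate`, the context shape, and the placeholder dataset ID in the URL are all assumptions.

```javascript
// Illustrative sketch only: expands {{dotted.path}} placeholders using
// values from a context object. `renderTemplate` is a hypothetical name.
function renderTemplate(template, context) {
    return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
        // Walk the dotted path, stopping early if a segment is missing.
        const value = path
            .split('.')
            .reduce((obj, key) => (obj == null ? undefined : obj[key]), context);
        if (value === undefined) {
            throw new Error(`Unknown template variable: ${path}`);
        }
        return String(value);
    });
}

// Assumed context; the dataset ID in the URL is a made-up placeholder.
const context = {
    storages: {
        datasets: {
            products: { apiUrl: 'https://api.apify.com/v2/datasets/EXAMPLE_PRODUCTS_ID' },
        },
    },
};

const productsUrl = renderTemplate('{{storages.datasets.products.apiUrl}}/items', context);
```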
| 85 | + |
| 86 | +- TODO: Rely on default display |