Deprecate CSVDataSource.#1009
Conversation
kroenlein
left a comment
There was a problem hiding this comment.
Approve, subject to the context we've spoken about where I need CSVDataSources for high-throughput benchmarking.
| from citrine.informatics.descriptors import RealDescriptor | ||
| from citrine.informatics.predictors import AutoMLPredictor | ||
|
|
||
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) |
There was a problem hiding this comment.
Would it make more sense to document this via the from_gemtable method?
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) | |
| data_source = GemTableDataSource.from_gemtable(gemtable) |
There was a problem hiding this comment.
We already have a bunch of places across multiple files that construct it directly, most of which use these placeholder variable names. So I figure consistency wins out here.
Unless you're suggesting we make that change everywhere the ID and version aren't already available?
There was a problem hiding this comment.
That's what I was feeling, but I'm not going to block on it.
There was a problem hiding this comment.
I have some other doc updates to make for 4.0, so I may tack it on in that PR.
| "Poisson\'s Ratio": poissons_ratio | ||
| } | ||
| ) | ||
| training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1) |
There was a problem hiding this comment.
As above
| training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1) | |
| training_data = GemTableDataSource.from_gemtable(gemtable) |
| identifiers=['Ingredient id'] | ||
| ) | ||
| # Once you've ingested the data to the platform, plug in its table ID and version. | ||
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) |
There was a problem hiding this comment.
As above
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) | |
| data_source = GemTableDataSource.from_gemtable(gemtable) |
| if isinstance(data_source, CSVDataSource): | ||
| # TODO: There's no obvious way to recover the column_definitions & identifiers from the ID | ||
| with pytest.warns(UserWarning): | ||
| transformed = DataSource.from_data_source_id(data_source.to_data_source_id()) | ||
| assert isinstance(data_source, CSVDataSource) | ||
| assert transformed.file_link == data_source.file_link | ||
| else: | ||
| assert data_source == DataSource.from_data_source_id(data_source.to_data_source_id()) |
There was a problem hiding this comment.
I remember being so disappointed when I put that mess in there. Glad this is going away.
| GemTableDataSource(table_id=uuid.uuid4(), table_version="2"), | ||
| GemTableDataSource(table_id=uuid.uuid4(), table_version="2"), |
There was a problem hiding this comment.
Why are there two identical GemTableDataSources here? Obviously no relevant to immediate work.
b1c5a15 to
866c318
Compare
It has long been supplanted by GemTableDataSource, which is itself growing long in the tooth.
866c318 to
b7aff0f
Compare
kroenlein
left a comment
There was a problem hiding this comment.
This all looks good to me.
It has long been supplanted by GemTableDataSource, which is itself growing long in the tooth.
Citrine Python PR
Description
Please briefly explain the goal of the changes/this PR.
The reviewer should be able to understand why the change is being made by reading this description
and its links (e.g. JIRA tickets).
PR Type:
Adherence to team decisions