Large public data set and model inference ingestion

Various large public data sets, available on Kaggle and similar sites, can be used with ML pipelines.

- determine which data sets would produce large numbers of entries
- add logging to the pipelines that writes their data and inference results to FeatureBase
- build a visualizer example on top of the data, including queries that can be run to produce graphs
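The first two steps could be prototyped roughly as follows. This is a minimal Python sketch that assumes the data sets are CSV files and that records reach FeatureBase over HTTP. `count_rows`, `rank_by_size`, `BatchLogger`, `log_predictions`, and `make_http_sender` are hypothetical names. The endpoint URL and JSON payload shape are placeholders, not FeatureBase's documented ingest API, so check them against the FeatureBase docs before use.

```python
import csv
import json
import urllib.request
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
Sender = Callable[[List[Record]], None]


def count_rows(path: str) -> int:
    """Count data rows in a CSV file (header excluded) to gauge how many entries it yields."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if any
        return sum(1 for _ in reader)


def rank_by_size(paths: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (path, row_count) pairs with the largest data set first."""
    return sorted(((p, count_rows(p)) for p in paths), key=lambda t: t[1], reverse=True)


def make_http_sender(url: str, timeout: float = 10.0) -> Sender:
    """HYPOTHETICAL sender: POSTs each batch as a JSON array to `url`.
    The URL and payload format are placeholders, not a documented FeatureBase API."""
    def send(batch: List[Record]) -> None:
        body = json.dumps(batch).encode("utf-8")
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
    return send


class BatchLogger:
    """Buffers pipeline records and hands them to `sender` in fixed-size batches."""

    def __init__(self, sender: Sender, batch_size: int = 1000) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self.sender = sender
        self.batch_size = batch_size
        self._buffer: List[Record] = []

    def log(self, record: Record) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            # Swap in a fresh buffer so the sent batch is never mutated afterwards.
            batch, self._buffer = self._buffer, []
            self.sender(batch)

    def __enter__(self) -> "BatchLogger":
        return self

    def __exit__(self, *exc: Any) -> None:
        self.flush()  # send any remaining records when the pipeline finishes


def log_predictions(rows: Iterable[Record], predict: Callable[[Record], Any],
                    logger: BatchLogger) -> None:
    """Log each input row together with the model's inference result."""
    for i, row in enumerate(rows):
        logger.log({"id": i, **row, "prediction": predict(row)})
```

Batching keeps network round-trips low when a pipeline emits many rows. The sender is injected, so the same logger can be exercised locally with a plain list (e.g. `BatchLogger(batches.append)`) or pointed at whichever FeatureBase client or endpoint the pipeline ends up using.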