Add local mode support for json scan and json document scan #925
documents = []

def process_file(info):
I know the actual document parsing is somewhat different between the two json methods, but can we factor out the filesystem stuff? Seems common with BinaryScan.local_source as well, no?
Yes, that makes sense. I'll try to factor out as much as possible.
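One possible shape for that refactor, as a rough sketch only: a shared helper walks the local filesystem and delegates the format-specific parsing to a callback. The names `scan_local_files` and `parse_json_file` are hypothetical and not taken from this PR, and plain dicts stand in for the real document class.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable, List


def scan_local_files(
    root: str,
    process_file: Callable[[Path], List[Any]],
    suffixes: Iterable[str] = (".json",),
) -> List[Any]:
    """Walk root, call process_file on each matching file, collect the results."""
    wanted = {s.lower() for s in suffixes}
    root_path = Path(root)
    # Accept either a single file or a directory tree.
    candidates = [root_path] if root_path.is_file() else sorted(root_path.rglob("*"))
    documents: List[Any] = []
    for path in candidates:
        if path.is_file() and path.suffix.lower() in wanted:
            documents.extend(process_file(path))
    return documents


def parse_json_file(path: Path) -> List[Any]:
    """Format-specific part (hypothetical): one dict per file, tagged with its path."""
    with path.open("r", encoding="utf-8") as f:
        return [{"path": str(path), "properties": json.load(f)}]
```

With this split, each scan would only supply its own `process_file` callback, and the directory walking and suffix filtering would live in one place.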
assert doc.properties["props"] == "propValue"
assert isinstance(doc.properties["web-app"], dict)

def test_json_scan_all_props_local(self):
Do we not have tests for JsonDocumentScan?
No, we don't. I tested against the notebook that uses JsonDocumentScan.
if self._is_s3_scheme():
    document.properties["path"] = "s3://" + info.path
Is this s3 stuff needed in the other implementations of process_file?
I kept it the same. For json_scan, it is handled inside _to_document; for json_document_scan, I assume the path is whatever is inside the JSON file.
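If the prefixing ever needs to be shared across the `process_file` implementations, it could be pulled into a small pure function. This is only an illustrative sketch: `qualified_path` is a hypothetical name, and the `is_s3` flag stands in for whatever `self._is_s3_scheme()` returns.

```python
def qualified_path(path: str, is_s3: bool) -> str:
    """Return path with an s3:// prefix when the source is S3, else unchanged."""
    # Guard against double-prefixing if the path is already fully qualified.
    if is_s3 and not path.startswith("s3://"):
        return "s3://" + path
    return path
```

Local paths pass through untouched, so the same call would be safe in both local and S3 mode.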
No description provided.