Skip to content

Add local mode support for json scan and json document scan#925

Merged
bohou-aryn merged 1 commit into
mainfrom
jsonscan
Oct 17, 2024
Merged

Add local mode support for json scan and json document scan#925
bohou-aryn merged 1 commit into
mainfrom
jsonscan

Conversation

@bohou-aryn
Copy link
Copy Markdown
Collaborator

No description provided.


documents = []

def process_file(info):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the actual document parsing is somewhat different between the two json methods, but can we factor out the filesystem stuff? Seems common with BinaryScan.local_source as well, no?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, make sense, would try to factor out as much as possible.

assert doc.properties["props"] == "propValue"
assert isinstance(doc.properties["web-app"], dict)

def test_json_scan_all_props_local(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not have tests for JsonDocumentScan?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we don't have. Test against the notebook having jsondocumentscan

Comment on lines +211 to +215
if self._is_s3_scheme():
document.properties["path"] = "s3://" + info.path
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this s3 stuff needed needed in the other implementations of process_file?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep it the same, json_scan is handled inside the _to_document; for json_document_scan, I assume it's whatever inside the json file.

@bohou-aryn bohou-aryn merged commit ff2b558 into main Oct 17, 2024
@bohou-aryn bohou-aryn deleted the jsonscan branch October 17, 2024 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants