Open
Labels
Component: Python; Status: needs champion; Type: usage; good-first-issue
Description
Describe the bug, including details regarding any error messages, version, and platform.
pyarrow.fs incorrectly resolves a valid S3 URI that contains whitespace as a local path:
from pyarrow.fs import _resolve_filesystem_and_path, FileSystem
uri = "s3://bucket/prefix with space/a=a"
resolved_filesystem, resolved_path = _resolve_filesystem_and_path(uri, None)
resolved_filesystem
<pyarrow._fs.LocalFileSystem at 0x10316ff30>
This causes subsequent calls, such as getting the file info, to fail:
path_info = resolved_filesystem.get_file_info(resolved_path)
pyarrow.lib.ArrowInvalid: Expected a local filesystem path, got a URI...
A quick look at the method indicates that a LocalFileSystem is chosen by default and returned when no alternative filesystem is detected, which seems like a dubious strategy.
I assume this is where the S3 filesystem should be detected, but a URI containing whitespace seems to throw an exception even though it's valid:
filesystem, path = FileSystem.from_uri(uri)
Cannot parse URI: 's3://bucket/prefix with space/a=a/'
Component(s)
Python
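A possible workaround, not confirmed anywhere in this report, is to percent-encode the key before handing the URI to pyarrow, since RFC 3986 does not allow a literal space inside a URI. The sketch below uses only the standard library. The helper name encode_s3_uri is hypothetical, and whether FileSystem.from_uri accepts and decodes the encoded form is an assumption to verify against the installed pyarrow version.

```python
from urllib.parse import quote


def encode_s3_uri(uri: str) -> str:
    """Percent-encode everything after the scheme of a raw (unencoded) URI.

    Hypothetical helper for illustration; it is not part of pyarrow.
    Assumes the input has not been percent-encoded already, otherwise
    existing "%" characters would be encoded a second time.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI with a scheme: {uri!r}")
    # Keep "/" (path separator) and "=" (common in partition keys) literal;
    # spaces and other reserved characters are percent-encoded.
    return scheme + sep + quote(rest, safe="/=")


uri = "s3://bucket/prefix with space/a=a"
encoded = encode_s3_uri(uri)
print(encoded)  # s3://bucket/prefix%20with%20space/a=a
```

The encoded string could then be passed to FileSystem.from_uri in place of the raw one. Another option that avoids URI parsing altogether is to construct the S3 filesystem explicitly and pass the bucket/key path alongside it.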