[Python] Pyarrow fs incorrectly resolves S3 URIs with white space as a local path #41365

@jaidisido

Describe the bug, including details regarding any error messages, version, and platform.

pyarrow.fs incorrectly resolves a valid S3 URI that contains whitespace as a local path:

from pyarrow.fs import _resolve_filesystem_and_path, FileSystem

uri = "s3://bucket/prefix with space/a=a"

resolved_filesystem, resolved_path = _resolve_filesystem_and_path(uri, None)

resolved_filesystem
<pyarrow._fs.LocalFileSystem at 0x10316ff30>

This causes subsequent calls such as getting the file info to fail:

path_info = resolved_filesystem.get_file_info(resolved_path)

pyarrow.lib.ArrowInvalid: Expected a local filesystem path, got a URI...
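Until the resolution itself is fixed, a caller-side guard could surface the problem before the misleading local-filesystem error appears. The sketch below uses only the standard library; `looks_like_remote_uri` is a hypothetical helper name, not part of pyarrow, and the single-letter-scheme rule is an assumption made to avoid mistaking Windows drive letters for URI schemes.

```python
from urllib.parse import urlsplit


def looks_like_remote_uri(path: str) -> bool:
    """Return True if `path` carries a non-local URI scheme (hypothetical helper)."""
    scheme = urlsplit(path).scheme
    # Assumption: one-letter "schemes" are Windows drive letters such as "C:".
    return len(scheme) > 1 and scheme != "file"


uri = "s3://bucket/prefix with space/a=a"
if looks_like_remote_uri(uri):
    # A LocalFileSystem returned for this input would indicate the bug above.
    print("expected a remote filesystem for", uri)
```

With a check like this, a `LocalFileSystem` returned for an `s3://` input can be rejected immediately instead of failing later in `get_file_info`.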

A quick look at the method suggests that a LocalFileSystem is chosen by default and returned whenever no alternative filesystem is detected, which seems like a dubious strategy...

I assume this is where the S3 filesystem should be detected, but a URI containing whitespace raises an exception even though it is valid:

filesystem, path = FileSystem.from_uri(uri)

Cannot parse URI: 's3://bucket/prefix with space/a=a/'
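A possible stopgap is to percent-encode the path component before handing the URI to the parser. The sketch below uses only the standard library; `percent_encode_uri_path` is a hypothetical helper, and whether `FileSystem.from_uri` accepts the encoded form and decodes it back to the original key is an assumption that should be checked against the installed pyarrow version.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_uri_path(uri: str) -> str:
    """Percent-encode only the path component of a URI (hypothetical helper)."""
    parts = urlsplit(uri)
    # Keep "/" and "=" literal so partition-style segments such as "a=a" stay readable.
    encoded_path = quote(parts.path, safe="/=")
    return urlunsplit(parts._replace(path=encoded_path))


uri = "s3://bucket/prefix with space/a=a"
print(percent_encode_uri_path(uri))  # s3://bucket/prefix%20with%20space/a=a

# Assumption: the encoded URI could then be passed on, e.g.
#   filesystem, path = FileSystem.from_uri(percent_encode_uri_path(uri))
```

Note that a path that is already percent-encoded would be encoded a second time (`%` becomes `%25`), so the helper should only be applied to raw paths.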

Component(s)

Python
