-
Notifications
You must be signed in to change notification settings - Fork 65
Full Fledged CLI and Library #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
While a generic script is useful, there was alot of opportunity to stop duplicating the same functions we've written many times. split the functions in the script out into modules and introduce the library structure
tylerhutcherson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
|
|
||
| The project is brand new and will undergo improvements over time. | ||
| A CLI and Library to help with loading data into Redis specifically for | ||
| usage with RediSearch and Redis Vector Search capabilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should back link to our docs probably!
redisvl/cli/load.py
Outdated
|
|
||
| # Create Redis Connection | ||
| logger.info(f"Connecting to {args.host}:{str(args.port)}") | ||
| redis_conn = get_async_redis_connection(args.host, args.port, args.password) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good for us to also support env vars too (eventually)
The expectation for saved datasets has now been changed to require that a column in a pickled dataframe have byte vectors already created. This removes the need for the user to specify a column for the vector field. The goal is to eventually have helper functions in the library to make this easy. Lastly, a bunch of project helpers and python project standard items were added.
VoyageAI vectorizer and reranker
Refactor interfaces related to the Redis client in AsyncRedisIndex (and
possibly others).
## Constraints
- We want to initialize the Redis connection lazily, not at object
instantiation or module import.
- Python doesn't have async properties, so properties are limited to
sync function calls
- Therefore, we don't want properties to be part of the interface
- For functions we "remove," we'll leave them in place with a
deprecation warning
- Remaining backwards-compatible until the next major release
## The Design Today
- Can create a client and set it using `index.connect()`
- Pass in a `redis_url`
- Calls `index.set_client()`
- Can pass in a client and set it with `index.set_client()`
- Pass in a client instance
- **Can't** pass in a client instance with `__init__()`
- Or anything URL, etc.
- Can create a client with `index.from_existing()`
- Pass it a `redis_url`
- Calls `index.set_client()`
- **Can't** use index as an async context manager
- Requires explicit resource handling, `atexit` handler
- But this is broken
- The `setup_async_redis()` decorator creates a function wrapper that
calls `validate_async_redis()` with every `set_client()` call
- The `RedisConnectionFactory.connect()` returns incompatible types
(sync, async `Redis`)
- Sync vs. async depends on a parameter
## The Design After Refactor
- **Can** use as an async context manager
- Either disconnect the index manually with `disconnect()` or use it as
a context manager
- `async with Index()...` will `disconnect()` after you exit the context
block
- Lazily instantiate client with `self._get_client()` method ("private")
- Remove `set_client()` public interface if possible
- Lazily instantiate client
- Remove `connect()` public interface if possible
- Lazily instantiate client
- Leave `from_existing()`
- Leave `client()` property, now just returns `._redis_client` and can
be None
- Call `validate_async_redis()` when setting a new client instance for
the first time
- But make this an internal check rather than attached to public
interface
- Allow `redis_url`, `redis_kwargs`, or `redis_client` in `__init__()`
### Examples Post-Refactor
```python
#1: New instance with redis URL
index = AsyncRedisIndex(redis_url="...")
#2: New instance with existing client
index = AsyncRedisIndex(redis_client=client)
#3: New instance with `from_existing`
index = AsyncRedisIndex.from_existing(..., redis_url="...")
#4 Passing both a client and a URL is isallowed
try:
index = AsyncRedisIndex(redis_client=client, redis_url="...")
except:
pass
else:
raise RuntimeError("Should have raised!")
# The client is lazily connected here, and we send telemetry.
await index.something_using_redis()
await index.something_else_using_redis()
# Close the client.
await index.close()
async with AsyncRedisIndex(redis_url="...") as index:
# Same story: client is lazily created now
await index.something_using_redis()
await index.something_else_using_redis()
# Client is automatically closed
# __enter__ is reentrant
async with index:
# Lazily opens a new client connection.
await index.a_third_thing_using_redis()
# Client is automatically closed
```
---------
Co-authored-by: Tyler Hutcherson <[email protected]>
Description
While a generic script is useful, there was alot of opportunity to stop duplicating the same functions we've written many times.
The goal of this is to be able to use this library across our projects and use the cli where and when needed to reduce friction.
Schema
This also adds the ability to create flexible custom schema based on a yaml file. these fields largely follow what the
redis-pylibrary expects.Here is a sample
See the new README for more details.