
feat: ORM SQLAlchemyDataLayer#2760

Open
hayescode wants to merge 4 commits into `main` from `feat/sqlalchemy_orm`

Conversation


@hayescode (Contributor) commented Jan 17, 2026

Refactor SQLAlchemyDataLayer to use ORM

Summary

Refactors sql_alchemy.py from raw SQL queries to SQLAlchemy ORM, improving maintainability and adding multi-dialect database support.

Benefits

  • Multi-dialect support: Now works with PostgreSQL, SQLite, and MySQL/MariaDB out of the box
  • Type safety: ORM models provide better IDE autocompletion and type checking
  • Reduced code duplication: Removed ~200 lines of repetitive SQL string construction
  • Automatic schema creation: New create_tables=True parameter auto-creates tables on first use
  • Easier testing: Test fixtures reduced from 100+ lines of manual SQL to 4 lines

What's New

| File | Description |
| --- | --- |
| chainlit/data/models.py | New ORM models with CrossDialectJSON type for PostgreSQL/SQLite/MySQL compatibility |
| chainlit/data/sql_alchemy.py | Refactored to use ORM operations instead of raw SQL |
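The table mentions a `CrossDialectJSON` type but the summary doesn't show it. A minimal sketch of the idea, assuming the standard SQLAlchemy `TypeDecorator` pattern (the PR's actual class in `models.py` may differ in name and detail):

```python
from sqlalchemy import JSON
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.types import TypeDecorator


class CrossDialectJSON(TypeDecorator):
    """Hypothetical sketch: render as JSONB on PostgreSQL, generic JSON elsewhere."""

    impl = JSON
    cache_ok = True

    def load_dialect_impl(self, dialect):
        # Pick the dialect-specific column type when the statement is compiled.
        if dialect.name == "postgresql":
            return dialect.type_descriptor(JSONB())
        return dialect.type_descriptor(JSON())
```

With this, a single `Column(CrossDialectJSON)` declaration maps to `JSONB` on Postgres and to plain `JSON` on SQLite and MySQL/MariaDB.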

Migration

No breaking changes: the public API remains identical.

Optional: Users can now enable automatic table creation:

```python
from chainlit.data.sql_alchemy import SQLAlchemyDataLayer

data_layer = SQLAlchemyDataLayer(
    conninfo="sqlite+aiosqlite:///data.db",
    create_tables=True,  # NEW: auto-creates tables if they don't exist
)
```

Summary by cubic

Refactors SQLAlchemyDataLayer to use SQLAlchemy ORM, adding multi-dialect support and optional automatic table creation. Improves maintainability and type safety, keeps the public API unchanged, and simplifies tests.

  • New Features

    • Works with PostgreSQL, SQLite, and MySQL/MariaDB.
    • Optional create_tables=True to auto-create tables.
    • CrossDialectJSON for JSONB on Postgres and JSON elsewhere.
  • Refactors

    • Replaced raw SQL with ORM models and session-based operations; unified per-dialect upsert.
    • Removed SQL helpers and duplicated query code.
    • Simplified tests by enabling automatic schema creation.

Written for commit 469274c. Summary will update on new commits.

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. data layer Pertains to data layers. labels Jan 17, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 3 files



<file name="backend/chainlit/data/sql_alchemy.py">

<violation number="1" location="backend/chainlit/data/sql_alchemy.py:615">
P2: `autoPlay` is read from a non-existent attribute on `Element`, so auto-play settings are always saved as NULL. Use the `element_dict` (or `element.auto_play`) instead.</violation>

<violation number="2" location="backend/chainlit/data/sql_alchemy.py:616">
P2: `playerConfig` is read from a non-existent attribute on `Element`, so video player configuration never persists. Use the `element_dict` (or `element.player_config`) instead.</violation>
</file>


hayescode and others added 2 commits January 16, 2026 19:22
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

Sanady commented Jan 20, 2026

Hey @hayescode, is this pull request going to fix the issue that I reported? #2758


dominpm commented Feb 3, 2026

If we are currently using the official ChainlitDataLayer with a PostgreSQL database and an Azure blob container for `storage_client`, e.g.:

```python
@data_layer
def get_data_layer():
    return ChainlitDataLayer(
        database_url=DATABASE_URL,
        storage_client=AzureBlobStorageClient(
            container_name=DATALAYER_AZURE_BLOB_CONTAINER_NAME,
            storage_account=DATALAYER_AZURE_BLOB_STORAGE_ACCOUNT_NAME,
            storage_key=DATALAYER_AZURE_BLOB_STORAGE_ACCOUNT_KEY,
        ),
    )
```

should we migrate to this SQLAlchemy data layer?

And if so, is there a direct way?

PS: Is this data layer (https://github.com/Chainlit/chainlit-datalayer) still maintained?

@hayescode

That data layer is not maintained. The goal here is to consolidate the mess of data layers.


ilkersigirci commented Feb 16, 2026

Could you kindly share the status of this PR? It would be very good to have only one source of truth for database-related operations.


nzjrs commented Feb 18, 2026

@hayescode just sharing some observations: we are currently using the existing SQLAlchemy data layer and are patching around a few bugs when using it with SQLite. Hopefully these patches can be dropped when moving to the ORM approach.

Basically, running `SQLAlchemyDataLayer` with SQLite (`sqlite+aiosqlite:///`) when `auto_tag_thread = true` with chat profiles active, the emitter passes tags as a Python list (e.g. `['default']`) into `update_thread()`. SQLite raises `sqlite3.ProgrammingError: type 'list' is not supported` on every thread update. The fix was:

```python
import json

from chainlit.data.sql_alchemy import SQLAlchemyDataLayer


class SQLiteDataLayer(SQLAlchemyDataLayer):
    """SQLAlchemyDataLayer with SQLite-compatible tags serialization.

    Chainlit passes tags as a Python list, which PostgreSQL binds natively
    but SQLite rejects. Serialize to JSON on write, deserialize on read.
    """

    async def update_thread(self, thread_id, name=None, user_id=None,
                            metadata=None, tags=None):
        if isinstance(tags, list):
            tags = json.dumps(tags)
        return await super().update_thread(
            thread_id, name=name, user_id=user_id,
            metadata=metadata, tags=tags,
        )

    async def get_all_user_threads(self, user_id=None, thread_id=None):
        threads = await super().get_all_user_threads(user_id, thread_id)
        if threads:
            for t in threads:
                if isinstance(t.get("tags"), str):
                    try:
                        t["tags"] = json.loads(t["tags"])
                    except (json.JSONDecodeError, TypeError):
                        t["tags"] = []
        return threads
```

We also have another patch: `get_all_user_threads` returns element props as a raw JSON string instead of a parsed dict. The write path (`sql_alchemy.py:549`) parses it fine, but the read path (~line 848) doesn't deserialize, so we manually parse `props` in our chat-resume handler before the thread dict reaches the frontend.
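A rough sketch of that read-side workaround (hypothetical helper; the exact shape of the thread dict and its `elements`/`props` fields is assumed here, not taken from the PR):

```python
import json


def parse_element_props(thread: dict) -> dict:
    """Hypothetical sketch: deserialize element 'props' returned as raw JSON strings."""
    for element in thread.get("elements") or []:
        props = element.get("props")
        if isinstance(props, str):
            try:
                element["props"] = json.loads(props)
            except (json.JSONDecodeError, TypeError):
                # Fall back to an empty dict rather than passing a bad string on.
                element["props"] = {}
    return thread
```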

Hopefully both of these go away once the ORM layer knows which types each underlying database handles. Just a note to make sure these cases are tested.

@dokterbob (Collaborator) left a comment


This definitely looks like welcome and much needed cleanup.

I'd say LGTM, but I am reluctant to approve without other users taking it for a spin on their own apps, particularly in non-SQLite environments (PostgreSQL, MySQL/MariaDB).

Could anyone provide feedback?



```python
class Base(DeclarativeBase):
    """Shared base for all ORM models. Required so Base.metadata.create_all() discovers all tables."""
```
@dokterbob (Collaborator):

Inheriting directly from DeclarativeBase doesn't work?

@dokterbob

Does this mean we can close #1365?

@dokterbob dokterbob added the review-me Ready for review! label Feb 23, 2026