
Ubiquité

Ubiquité is an open-source Perplexity clone.

This project is inspired by other Perplexity clones such as Perplexica and Morphic. I made this project because both Perplexica and Morphic were not exactly what I needed.

Compared to those projects, Ubiquité aims to improve on the following points:

  • Multiple LLM providers: Connect to several LLM providers at once.
  • Easy model switching: Switch models easily for different use cases while staying connected to multiple AI providers.
  • Unified and complete configuration: All configuration lives in a single file that exposes every option for each provider and model, and the file is editable from the UI.
  • KaTeX support: Math rendering with KaTeX.
  • Map Tool: A tool that allows the LLM to generate maps based on user queries.

[Screenshot: Map Tool (MistralAI • magistral-medium-latest)]

Features

Installation (with Docker)

  1. Clone the repository:

    git clone https://github.com/edoigtrd/ubiquite.git
    cd ubiquite
  2. Move the example configuration file:

    mv config.sample.toml config.toml
  3. Run the application using Docker Compose:

    docker-compose -p ubiquite up -d
  4. Access the application at http://localhost:6003. You can edit the configuration file from the UI (go to the settings).

Note: Ubiquité needs your location to work properly, and browser geolocation does not work on "insecure" origins. If you serve the app from your own domain over plain HTTP, you can mark it as trusted in Chrome using chrome://flags/#unsafely-treat-insecure-origin-as-secure

How to configure

Move config.sample.toml to config.toml and fill in the required fields.

In this file you can configure:

Providers

In the provider section, you can configure the LLM provider you want to use.

[provider]
[provider.openai]
type = "openai"
api_key = "your_api_key"
openai_api_base = "https://api.openai.com/v1" # Optional; defaults to OpenAI's endpoint, set this only if you want to use a different one

Supported providers are:

  • anthropic
  • groq
  • mistral
  • openai (+ any openai compatible endpoint such as xai, nous ...)
  • ollama

Note: the codebase makes it easy to add new providers; see infrastructure/providers.py

If your provider exposes an OpenAI-compatible endpoint (xAI, Nous, ...), you can use the openai type and set openai_api_base to your provider's endpoint.

Technically speaking, every parameter in a model entry is unpacked as keyword arguments into the appropriate LangChain provider class.
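
For illustration, an OpenAI-compatible provider such as xAI could be configured like this (the xai entry name and endpoint URL below are examples, not part of the sample config; check your provider's documentation for the correct base URL):

[provider.xai]
type = "openai"                          # reuse the openai provider type
api_key = "your_xai_api_key"
openai_api_base = "https://api.x.ai/v1"  # the provider's OpenAI-compatible endpoint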

Models presets

In the models section you can configure model presets.

[models]
[models.fast]
provider = "groq" # Point towards the provider configured in the provider section, in this case provider.groq
model_name = "moonshotai/kimi-k2-instruct-0905"

Standard presets are:

  • fast: A fast and cheap model for quick responses.
  • smart: A smarter model for more complex queries.
  • balanced: A balanced model for general use.
  • related: The model used to generate related questions.
  • title: The model used to generate titles for saved conversations.
  • image_search: The model used to generate image search queries.
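
For illustration, a fuller [models] section might map different presets to different providers, assuming matching entries (provider.groq, provider.openai, provider.mistral) exist in the provider section. The model names below are only examples, not recommendations:

[models]
[models.fast]
provider = "groq"
model_name = "moonshotai/kimi-k2-instruct-0905"

[models.smart]
provider = "openai"
model_name = "gpt-4o"

[models.balanced]
provider = "mistral"
model_name = "mistral-large-latest"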

Prompts

Each prompt is configurable.

[prompts]
[prompts.search]
template = """
You are a helpful AI assistant with access to real-time web search, content retrieval, video search capabilities, and the ability to ask clarifying questions.

[REDACTED FOR BREVITY]

Additional information:
{additional_context}
"""

The search and related prompts are copied from Morphic; image_search is from Perplexica.

Available prompts are:

  • search: The main prompt used to answer user queries. Note that the {additional_context} placeholder will be replaced with YAML containing user information such as timezone, location and other preferences.
  • related: The prompt used to generate related questions.
  • image_search: The prompt used to generate image search queries.
  • title: The prompt used to generate titles for saved conversations.

SearX configuration

In the SearX section you can configure the SearX instance you want to use.

[searx]
url = "https://your-searx-instance.com"

You can find a list of public SearX instances here.

Database configuration

In the database section you can configure the database you want to use.

[database]
url = "sqlite:///data/ubiquite.db"

I have only tested SQLite, but any URL supported by SQLModel should work.
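
For example (untested, as noted above), a PostgreSQL database could be configured with a standard SQLAlchemy-style URL:

[database]
url = "postgresql://user:password@localhost:5432/ubiquite"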

Focus configuration

Focus mode lets you concentrate on a specific topic by adding custom conditions to SearX search queries. These conditions can use any valid SearX syntax, such as site:, file:, intitle:, or others.

Example:

[focuses]
[focuses.reddit]
cond = ["site:reddit.com"]
name = "Reddit"
icon = "logos:reddit-icon"
description = "Reddit focus"
llm_description = """
**Reddit focus:**
If the user has activated the Reddit Focus, it is likely because they want to know users' opinions and get responses based on real experiences from online discussions.
"""
  • cond — List of conditions to append to the search query.

    • Multiple conditions are joined with the OR operator by default.

    • If you need different logic, you can group conditions manually, e.g.:

      cond = ["site:example.com AND file:pdf"]
    • You can use any SearX-compatible flags like filetype:, inurl:, intitle:, etc.

    • You can also leave this list empty if your focus only provides additional LLM context (e.g. a behavioral mode or reasoning focus); see the sketch after this list.

  • name — Display name of the focus (UI label).

  • icon — Iconify icon name (see the icônes collection).

  • description — Shown in the UI.

  • llm_description — Description injected into the system prompt when this focus is active.
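
As an illustration (this focus is not in the sample config; the name, icon and descriptions are made up), a focus with an empty cond list that only steers the LLM could look like this:

[focuses.academic]
cond = []
name = "Academic"
icon = "mdi:school"
description = "Academic focus"
llm_description = """
**Academic focus:**
If the user has activated the Academic Focus, answer in a formal, well-sourced style and prefer primary sources from the search results.
"""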

Map Tool (Nominatim / Geocoder Configuration)

The map tool (my personal favorite feature) allows the LLM to answer geolocation-related questions by generating a GeoJSON object that is then rendered as an interactive map.
It relies on any geopy geocoder for address lookup, and you can configure the geocoder URL in the configuration file.

By default, the map tool uses the public OpenStreetMap Nominatim instance.
The public instance is not recommended for heavy usage, as it enforces strict rate limits and usage policies that may lead to increased errors.
You can either host your own instance or use a different geocoding provider by changing the cls parameter in the [nominatim] section of the configuration file.

For example, here is my personal GoogleV3 configuration:

[nominatim]
cls = "geopy.geocoders.GoogleV3"
api_key = ""
ratelimiter = { min_delay_seconds = 1.0 }

cls refers to the full import path of the geopy geocoder class you want to use. You can find the list of supported classes here.
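
For a self-hosted Nominatim instance, a sketch could look like the following. Whether extra keys such as user_agent and domain are forwarded to the geocoder constructor is an assumption based on the GoogleV3 example above; both are standard geopy Nominatim parameters:

[nominatim]
cls = "geopy.geocoders.Nominatim"
user_agent = "ubiquite"                   # required by geopy's Nominatim
domain = "nominatim.example.org"          # your self-hosted instance
ratelimiter = { min_delay_seconds = 1.0 }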

I haven’t tested all geocoders, so please refer to the geopy documentation for details on how to configure each one. If your geocoder doesn’t work properly with Ubiquité, it’s probably because its response paths are not compatible with reverse_geocode_city in geo.py. In that case, please open an issue or a pull request.

Image search

Image search uses the same SearX instance configured for web search. Under the hood, the prompts.image_search prompt is used to generate an image search query from the user query, the SearX instance is then queried for image results, and a reranker reorders those results by relevance to the user query. The default (and recommended) reranker is the ClipReranker, which uses a CLIP model to rank the images.

Here is the recommended configuration for the reranker:

[images_search.reranker]
class = "ClipReranker"
[images_search.reranker.config]
model_name = "hf-hub:apple/MobileCLIP-S1-OpenCLIP"
device = "auto"

You can change the model_name to use a different CLIP model. The list of available models can be found here.

If you want to write your own reranker, you can start by looking at the ClipReranker implementation in clip.py.

Your reranker should return a sorted list of ImageResult objects. The ImageResult definition is in master.py.

You'll need to register your reranker in the ImageRerankerRegistry to make it available:

master.ImageRerankerRegistry.get_default_registry().register(MyReranker)

And that's it! Note that the content of images_search.reranker.config will be passed to your reranker constructor as unpacked kwargs.
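
As a starting point, here is a minimal sketch of a custom reranker. The method name (rerank), its signature and the import path are assumptions for illustration only; check the ClipReranker implementation in clip.py and the ImageResult / registry definitions in master.py for the actual interface.

# Hypothetical sketch: method name, signature and import path are assumed,
# see clip.py and master.py for the real interface.
from ubiquite.images_search import master  # assumed import path

class KeywordReranker:
    """Toy reranker: favors results whose title contains words from the query."""

    def __init__(self, boost: float = 1.0):
        # Receives the unpacked contents of [images_search.reranker.config]
        self.boost = boost

    def rerank(self, query, results):
        # results is assumed to be a list of master.ImageResult objects
        words = set(query.lower().split())

        def score(result):
            title = (getattr(result, "title", "") or "").lower()
            return self.boost * sum(word in title for word in words)

        return sorted(results, key=score, reverse=True)

# Register the reranker so it can be selected from the config file
master.ImageRerankerRegistry.get_default_registry().register(KeywordReranker)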

Architecture

Ubiquité is composed of two main parts:

  • The backend: A FastAPI server that handles the LLM requests, search requests and database.
  • The frontend: A React app that handles the user interface.

The database is a SQLite database managed by SQLModel. All the LLM requests are handled by LangChain.

Contributing

Contributions are welcome!
If you'd like to add features or improve the project, please fork the repo and submit a pull request.

Roadmap

  • Ability to fork a conversation.
  • Deep research mode.
  • Find something to do with placeholder quick actions.
  • Logging (debugging)
  • Token usage tracking, ability to generate invoices.
  • Animation when waiting for first token.
  • Frontend responsive design.
  • Discovery page
  • Better settings page (I think we can't do better actually :D)
