
Commit 8a6d773

Update README.md (#289)
1 parent 863274a commit 8a6d773

2 files changed

Lines changed: 47 additions & 130 deletions

File tree

README.md

Lines changed: 46 additions & 129 deletions
Original file line number | Diff line number | Diff line change
@@ -16,51 +16,64 @@ Magentic-UI is a **research prototype** of a human-centered interface powered by
1616

1717
https://github.com/user-attachments/assets/7975fc26-1a18-4acb-8bf9-321171eeade7
1818

19-
19+
## 🚀 Quick Start
2020

2121
Here's how you can get started with Magentic-UI:
2222

23-
> **Note**: Before installing, please read the [pre-requisites](#-pre-requisites) carefully. Magentic-UI requires Docker to run, and if you are on Windows, you will need WSL2. We recommend using [uv](https://docs.astral.sh/uv/getting-started/installation/) for a quicker installation. If you are using Mac or Linux, you can skip the WSL2 step.
24-
2523
```bash
24+
# 1. Set up environment
2625
python3 -m venv .venv
2726
source .venv/bin/activate
2827
pip install magentic-ui --upgrade
29-
# export OPENAI_API_KEY=<YOUR API KEY>
28+
29+
# 2. Set your API key
30+
export OPENAI_API_KEY="your-api-key-here"
31+
32+
# 3. Launch Magentic-UI
3033
magentic-ui --port 8081
3134
```
32-
If your port is 8081, you can then access Magentic-UI at <http://localhost:8081>.
3335

36+
Then open <http://localhost:8081> in your browser to interact with Magentic-UI!
37+
38+
> **Prerequisites**: Requires Docker and Python 3.10+. Windows users should use WSL2. See [detailed installation](#️-installation) for more info.
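
The prerequisites above can be checked before launching. The following is a minimal sketch, not part of Magentic-UI: the `preflight` helper and its messages are hypothetical. It assumes the default OpenAI setup (so `OPENAI_API_KEY` is expected), and it only checks that a `docker` executable is on `PATH`, not that the Docker daemon is actually running.

```python
import os
import shutil
import sys


def preflight(env, python_version, docker_found):
    """Return a list of problems; an empty list means the basics look ready.

    Hypothetical helper for illustration only (not a Magentic-UI API).
    """
    problems = []
    # The README asks for Python 3.10 or newer.
    if tuple(python_version) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    # Docker is required unless the limited no-Docker mode is used.
    if not docker_found:
        problems.append("docker executable not found on PATH")
    # Assumption: the default OpenAI client is used, so a key is needed.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


# Check the current machine and report anything that looks missing.
for problem in preflight(
    os.environ, sys.version_info[:2], shutil.which("docker") is not None
):
    print("Problem:", problem)
```

If nothing is printed, the basic requirements appear to be in place and `magentic-ui --port 8081` can be tried.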
39+
40+
## ✨ What's New
3441

35-
If you are not able to setup Docker, you can run a limited version of Magentic-UI which does not have the ability to execute code, navigate files or display the browser in the interface with the command:
42+
- **File Upload Support**: Upload any file through the UI for analysis or modification
43+
- **MCP Agents**: Extend capabilities with your favorite MCP servers
44+
- **Easier Installation**: We have uploaded our Docker containers to GHCR, so you no longer need to build any containers! Installation is now much quicker.
3645

46+
## Alternative Usage Options
47+
48+
**Without Docker** (limited functionality: no code execution):
3749
```bash
3850
magentic-ui --run-without-docker --port 8081
3951
```
4052

41-
You can also run Magentic-UI in a command-line-interface:
53+
**Command Line Interface**:
4254
```bash
4355
magentic-cli --work-dir PATH/TO/STORE/DATA
4456
```
4557

46-
To use Azure models or Ollama please install with the optional dependencies:
58+
**Custom LLM Clients**:
4759
```bash
48-
# for Azure
49-
pip install magentic-ui[azure]
50-
# for Ollama
60+
# Azure
61+
pip install magentic-ui[azure]
62+
63+
# Ollama (local models)
5164
pip install magentic-ui[ollama]
5265
```
5366

54-
For further details on installation please read the <a href="#%EF%B8%8F-installation">🛠️ Installation</a> section. For common installation issues and their solutions, please refer to the [troubleshooting document](TROUBLESHOOTING.md).
67+
For further details on installation, please read the <a href="#-installation">🛠️ Installation</a> section. For common installation issues and their solutions, please refer to the [troubleshooting document](TROUBLESHOOTING.md). For advanced usage instructions, run `magentic-ui --help`.
5568

5669

5770
## Quick Navigation:
5871
<p align="center">
5972
<a href="#-how-it-works">🟪 How it Works</a> &nbsp;|&nbsp;
60-
<a href="#%EF%B8%8F-installation">🛠️ Installation</a> &nbsp;|&nbsp;
61-
<a href="#%EF%B8%8F-troubleshooting">⚠️ Troubleshooting</a> &nbsp;|&nbsp;
62-
<a href="#-contributing">🤝 Contributing</a> &nbsp;|&nbsp;
63-
<a href="#-license">📄 License</a>
73+
<a href="#-installation">🛠️ Installation</a> &nbsp;|&nbsp;
74+
<a href="#troubleshooting">⚠️ Troubleshooting</a> &nbsp;|&nbsp;
75+
<a href="#contributing">🤝 Contributing</a> &nbsp;|&nbsp;
76+
<a href="#license">📄 License</a>
6477
</p>
6578

6679
---
@@ -74,7 +87,7 @@ Magentic-UI is especially useful for web tasks that require actions on the web (
7487

7588
The interface of Magentic-UI is displayed in the screenshot above and consists of two panels. The left side panel is the sessions navigator where users can create new sessions to solve new tasks, switch between sessions and check on session progress with the session status indicators (🔴 needs input, ✅ task done, ↺ task in progress).
7689

77-
The right-side panel displays the session selected. This is where you can type your query to Magentic-UI alongside text and image attachments and observe detailed task progress as well as interact with the agents. The session display itself is split in two panels: the left side is where Magentic-UI presents the plan, task progress and asks for action approvals, the right side is a browser view where you can see web agent actions in real time and interact with the browser. Finally, at the top of the session display is a progress bar that updates as Magentic-UI makes progress.
90+
The right-side panel displays the session selected. This is where you can type your query to Magentic-UI alongside any file attachments and observe detailed task progress as well as interact with the agents. The session display itself is split in two panels: the left side is where Magentic-UI presents the plan, task progress and asks for action approvals, the right side is a browser view where you can see web agent actions in real time and interact with the browser. Finally, at the top of the session display is a progress bar that updates as Magentic-UI makes progress.
7891

7992

8093
The example below shows a step by step user interaction with Magentic-UI:
@@ -104,33 +117,6 @@ What differentiates Magentic-UI from other browser use offerings is its transpar
104117
▶️ <em> Click to watch a video and learn more about Magentic-UI </em>
105118
</div>
106119

107-
### ℹ️ Agentic Workflow
108-
109-
Magentic-UI's underlying system is a team of specialized agents adapted from AutoGen's Magentic-One system illustrated in the figure below.
110-
111-
<p align="center">
112-
<img src="./docs/img/magenticui.jpg" alt="Magentic-UI" height="400">
113-
</p>
114-
115-
The agents work together to create a modular system:
116-
117-
- 🧑‍💼 **Orchestrator** is the lead agent, powered by a large language model (LLM), that performs co-planning with the user, decides when to ask the user for feedback, and delegates sub-tasks to the remaining agents to complete.
118-
- 🌐 **WebSurfer** is an LLM agent equipped with a web browser that it can control. Given a request by the Orchestrator, it can click, type, scroll, and visit pages in multiple rounds to complete the request from the Orchestrator. This agent is a significant improvement over the AutoGen ``MultimodalWebSurfer`` in terms of the actions it can do (tab management, select options, file upload, multimodal queries).
119-
To learn more how this agent is built, follow along this [Tutorial: Building a Browser Use Agent From Scratch and with Magentic-UI
120-
](docs/tutorials/web_agent_tutorial_full.ipynb).
121-
- 💻 **Coder** is an LLM agent equipped with a Docker code-execution container. It can write and execute Python and shell commands and provide a response back to the Orchestrator.
122-
- 📁 **FileSurfer** is an LLM agent equipped with a Docker code-execution container and file-conversion tools from the MarkItDown package. It can locate files in the directory controlled by Magentic-UI, convert files to markdown, and answer questions about them.
123-
- 🧑 **UserProxy** is an agent that represents the user interacting with Magentic-UI. The Orchestrator can delegate work to the user instead of the other agents.
124-
125-
To interact with Magentic-UI, **users can enter a text message and attach images**. In response, Magentic-UI creates a natural-language step-by-step plan with which users can interact through a plan-editing interface. **Users can add, delete, edit, regenerate steps, and write follow-up messages to iterate on the plan.** While the user editing the plan adds an upfront cost to the interaction, it can potentially save a significant amount of time in the agent executing the plan and increase its chance at success.
126-
127-
The plan is stored inside the Orchestrator and is used to execute the task. **For each step of the plan, the Orchestrator determines which of the agents (WebSurfer, Coder, FileSurfer) or the user should complete the step.** Once that decision is made, the Orchestrator sends a request to one of the agents or the user and waits for a response. After the response is received, the Orchestrator decides whether that step is complete. If the step is complete, the Orchestrator moves on to the following step.
128-
129-
**Once all steps are completed, the Orchestrator generates a final answer that is presented to the user.** If, while executing any of the steps, the Orchestrator decides that the plan is inadequate (for example, because a certain website is unreachable), the Orchestrator can replan with user permission and execute a new plan.
130-
131-
All intermediate progress steps are clearly displayed to the user. Furthermore, the user can pause the execution of the plan and send additional requests or feedback. The user can also configure through the interface whether agent actions (e.g., clicking a button) require approval.
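
The execution loop described above can be sketched roughly as follows. This is an illustrative simplification, not Magentic-UI's actual implementation: `Step`, `run_plan`, `is_complete`, and `max_attempts` are hypothetical names, each step's agent is fixed up front rather than chosen by an LLM, and a step that never completes simply raises instead of triggering a replan with user permission.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    description: str  # natural-language step from the plan
    agent: str        # hypothetical: name of the agent assigned to the step


def run_plan(
    plan: List[Step],
    agents: Dict[str, Callable[[str], str]],
    is_complete: Callable[[Step, str], bool],
    max_attempts: int = 3,
) -> List[str]:
    """Execute each step in order and collect the accepted responses."""
    results = []
    for step in plan:
        handler = agents[step.agent]  # delegate to the assigned agent
        for _ in range(max_attempts):
            response = handler(step.description)  # send request, wait for reply
            if is_complete(step, response):
                results.append(response)
                break  # step is done, move on to the next one
        else:
            # No attempt was accepted; a real system would replan here.
            raise RuntimeError(f"step needs replanning: {step.description}")
    return results


# Toy agents standing in for WebSurfer and Coder.
toy_agents = {
    "web_surfer": lambda request: "web:" + request,
    "coder": lambda request: "code:" + request,
}
toy_plan = [Step("find data", "web_surfer"), Step("plot it", "coder")]
print(run_plan(toy_plan, toy_agents, lambda step, response: True))
```

Keeping the completion check as a separate callable mirrors the description: the decision about whether a step is finished belongs to the Orchestrator, not to the agent that did the work.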
132-
133-
134120
### Autonomous Evaluation
135121

136122
To evaluate its autonomous capabilities, Magentic-UI has been tested against several benchmarks when running with o4-mini: [GAIA](https://huggingface.co/datasets/gaia-benchmark/GAIA) test set (42.52%), which assesses general AI assistants across reasoning, tool use, and web interaction tasks; [AssistantBench](https://huggingface.co/AssistantBench) test set (27.60%), focusing on realistic, time-consuming web tasks; [WebVoyager](https://github.com/MinorJerry/WebVoyager) (82.2%), measuring end-to-end web navigation in real-world scenarios; and [WebGames](https://webgames.convergence.ai/) (45.5%), evaluating general-purpose web-browsing agents through interactive challenges.
@@ -141,8 +127,7 @@ To reproduce these experimental results, please see the following [instructions]
141127
If you're interested in reading more, check out our [blog post](https://www.microsoft.com/en-us/research/blog/magentic-ui-an-experimental-human-centered-web-agent/).
142128

143129
## 🛠️ Installation
144-
145-
### 📝 Pre-Requisites
130+
### Pre-Requisites
146131

147132
**Note**: If you're using Windows, we highly recommend using [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install) (Windows Subsystem for Linux).
148133

@@ -154,7 +139,7 @@ If using Docker Desktop, make sure it is set up to use WSL2:
154139

155140

156141

157-
2. During the Installation step, you will need to set up your `OPENAI_API_KEY`. To use other models, review the [Custom Client Configuration](#Configuration) section below.
142+
2. During the Installation step, you will need to set up your `OPENAI_API_KEY`. To use other models, review the [Custom Client Configuration](#configuration) section below.
158143

159144
3. You need at least [Python 3.10](https://www.python.org/downloads/) installed.
160145

@@ -190,89 +175,23 @@ To run Magentic-UI, make sure that Docker is running, then run the following com
190175
magentic-ui --port 8081
191176
```
192177

193-
The first time that you run this command, it will take a while to build the Docker images -- go grab a coffee or something. The next time you run it, it will be much faster as it doesn't have to build the Docker again.
194-
195-
If you have trouble building the dockers, please try to rebuild them with the command:
178+
> **Note**: Running this command for the first time will pull two Docker images required for the Magentic-UI agents. If you encounter problems, you can build them directly with the following commands:
196179
```bash
197-
magentic-ui --rebuild-docker --port 8081
180+
cd docker
181+
sh build-all.sh
198182
```
199-
If you face further issues, please refer to the [TROUBLESHOOTING.md](TROUBLESHOOTING.md) document.
200-
201-
Once the server is running, you can access the UI at <http://localhost:8081>.
202183

184+
If you face issues with Docker, please refer to the [TROUBLESHOOTING.md](TROUBLESHOOTING.md) document.
203185

204-
You can also run a command line interface (CLI) for Magentic-UI with the command:
186+
Once the server is running, you can access the UI at <http://localhost:8081>.
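
Because the first launch may take some time while images are pulled, it can help to wait until the port accepts connections before opening the browser. The snippet below is a minimal sketch under that assumption: `wait_for_port` is a hypothetical helper, not something Magentic-UI provides, and an open port only shows that something is listening, not that the UI has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8081)` returns `True` once the server started with `--port 8081` is reachable, and `False` if it is still unreachable after the timeout.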
205187

206-
```bash
207-
magentic-cli --work-dir PATH_TO_STORE_LOGS
208-
```
209188

210189
### Configuration
211190

212191
#### Model Client Configuration
213192

214-
If you want to use a different OpenAI key, or if you want to configure use with Azure OpenAI or Ollama, you can do so inside the UI by navigating to settings (top right icon) and changing model configuration with the format of the `config.yaml` file below. You can also create a `config.yaml` and import it inside the UI or point Magentic-UI to its path at startup time:
215-
```bash
216-
magentic-ui --config path/to/config.yaml
217-
```
218-
219-
An example `config.yaml` for OpenAI is given below:
220-
221-
```yaml
222-
# config.yaml
223-
224-
######################################
225-
# Default OpenAI model configuration #
226-
######################################
227-
model_config: &client
228-
provider: autogen_ext.models.openai.OpenAIChatCompletionClient
229-
config:
230-
model: gpt-4o
231-
api_key: <YOUR API KEY>
232-
max_retries: 10
233-
234-
##########################
235-
# Clients for each agent #
236-
##########################
237-
orchestrator_client: *client
238-
coder_client: *client
239-
web_surfer_client: *client
240-
file_surfer_client: *client
241-
action_guard_client: *client
242-
```
243-
244-
The corresponding configuration for Azure OpenAI is:
193+
If you want to use a different OpenAI key, or configure Magentic-UI for use with Azure OpenAI or Ollama, you can do so inside the UI by navigating to settings (top right icon) and changing the model configuration.
245194

246-
```yaml
247-
# config.yaml
248-
249-
######################################
250-
# Azure model configuration #
251-
######################################
252-
model_config: &client
253-
provider: AzureOpenAIChatCompletionClient
254-
config:
255-
model: gpt-4o
256-
azure_endpoint: "<YOUR ENDPOINT>"
257-
azure_deployment: "<YOUR DEPLOYMENT>"
258-
api_version: "2024-10-21"
259-
azure_ad_token_provider:
260-
provider: autogen_ext.auth.azure.AzureTokenProvider
261-
config:
262-
provider_kind: DefaultAzureCredential
263-
scopes:
264-
- https://cognitiveservices.azure.com/.default
265-
max_retries: 10
266-
267-
##########################
268-
# Clients for each agent #
269-
##########################
270-
orchestrator_client: *client
271-
coder_client: *client
272-
web_surfer_client: *client
273-
file_surfer_client: *client
274-
action_guard_client: *client
275-
```
276195

277196
#### MCP Server Configuration
278197

@@ -322,7 +241,7 @@ git clone https://github.com/microsoft/magentic-ui.git
322241
cd magentic-ui
323242
```
324243

325-
#### 3. Install Magentic-UI's dependencies with uv:
244+
#### 3. Install Magentic-UI's dependencies with uv or your favorite package manager:
326245

327246
```bash
328247
# install uv through https://docs.astral.sh/uv/getting-started/installation/
@@ -357,11 +276,6 @@ yarn build
357276
magentic-ui --port 8081
358277
```
359278

360-
>**Note**: Running this command for the first time will pull two docker images required for the Magentic-UI agents. If you encounter problems, you can build them directly with the following command:
361-
```bash
362-
cd docker
363-
sh build-all.sh
364-
```
365279

366280
#### Running the UI from source
367281

@@ -394,19 +308,22 @@ magentic-ui --port 8081
394308
The frontend from source will be available at <http://localhost:8000>, and the compiled frontend will be available at <http://localhost:8081>.
395309

396310

397-
## ⚠️ Troubleshooting
398311

399-
If you were unable to get Magentic-UI running, do not worry! The first step is to make sure you have followed the steps outlined above, particularly with the [pre-requisites](#-pre-requisites).
312+
313+
## Troubleshooting
314+
315+
316+
If you were unable to get Magentic-UI running, do not worry! The first step is to make sure you have followed the steps outlined above, particularly with the [pre-requisites](#pre-requisites).
400317

401318
For common issues and their solutions, please refer to the [TROUBLESHOOTING.md](TROUBLESHOOTING.md) file in this repository. If you do not see your problem there, please open a `GitHub Issue`.
402319

403-
## 🤝 Contributing
320+
## Contributing
404321

405322
This project welcomes contributions and suggestions. For information about contributing to Magentic-UI, please see our [CONTRIBUTING.md](CONTRIBUTING.md) guide, which includes current issues to be resolved and other forms of contributing.
406323

407324
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
408325

409-
## 📄 License
326+
## License
410327

411328
Microsoft, and any contributors, grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT). See the [LICENSE](LICENSE) file.
412329

src/magentic_ui/version.py

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,3 @@
1-
VERSION = "0.0.6"
1+
VERSION = "0.1.0"
22
__version__ = VERSION
33
APP_NAME = "Magentic-UI"
