From 33abc167fb81f1d7529509fe0378d99e24af2728 Mon Sep 17 00:00:00 2001 From: zilto Date: Thu, 23 May 2024 15:19:19 -0400 Subject: [PATCH 1/7] jupyter V2.1 with backwards compatible changes --- examples/jupyter_notebook_magic/README.md | 30 +- .../jupyter_notebook_magic/tutorial.ipynb | 2093 +++++++++++++++++ hamilton/plugins/jupyter_magic.py | 591 ++--- 3 files changed, 2415 insertions(+), 299 deletions(-) create mode 100644 examples/jupyter_notebook_magic/tutorial.ipynb diff --git a/examples/jupyter_notebook_magic/README.md b/examples/jupyter_notebook_magic/README.md index ed1a939a5..9c0a4a73d 100644 --- a/examples/jupyter_notebook_magic/README.md +++ b/examples/jupyter_notebook_magic/README.md @@ -1,25 +1,27 @@ -# This example shows a notebook using the Hamilton Jupyter magic +# Hamilton notebook extension + +One of the best parts about notebooks is the ability to execute code and immediately inspect the results. They provide a "read-eval-print" loop (REPL) coding experience. However, the way Hamilton separates dataflow definition (functions in a module) from execution (building and executing a driver) creates an extra step that can slow down this loop. + +We built the Hamilton notebook extension to tighten that loop and even improve on the core notebook experience! To load the magic: ```python -# load some extensions / magic... %load_ext hamilton.plugins.jupyter_magic ``` -Then to use it: +For example, this lets you define the module `joke` directly from your notebook: ```python -%%cell_to_module -m MODULE_NAME # more args -``` -Other arguments (--help to print this.): - -m, --module_name: Module name to provide. Default is jupyter_module. - -c, --config: JSON config string, or variable name containing config to use. - -r, --rebuild-drivers: Flag to rebuild drivers. - -d, --display: Flag to visualize dataflow. - -v, --verbosity: of standard output. 0 to hide. 1 is normal, default.
+%%cell_to_module joke --display +def topic() -> str: + return "Cowsay" -Example use: +def joke_prompt(topic: str) -> str: + return f"Knock, knock. Who's there? {topic}" -```python -%%cell_to_module -m MODULE_NAME --display --rebuild-drivers +def reply(joke_prompt: str) -> str: + _, _, right = joke_prompt.partition("? ") + return f"{right} who?" ``` + +Go explore `tutorial.ipynb` to learn about all interactive features! diff --git a/examples/jupyter_notebook_magic/tutorial.ipynb b/examples/jupyter_notebook_magic/tutorial.ipynb new file mode 100644 index 000000000..602a795e0 --- /dev/null +++ b/examples/jupyter_notebook_magic/tutorial.ipynb @@ -0,0 +1,2093 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "cff304da", + "metadata": {}, + "source": [ + "# Hamilton notebook extension\n", + "Jupyter magics are commands that can be executed in notebooks using `%` and `%%` in code cells.\n", + "- **Line magics** start with `%` and apply to the current line\n", + "- **Cell magics** start with `%%`, need to be the first line of a cell, and apply to the entire cell.\n", + "\n", + "You can think of them as Python decorators for lines and cells.\n", + "\n", + "> For example, `%timeit complex_function()` returns the time to execute `complex_function()`, and `%%timeit` returns the time to execute the entire cell.\n", + "\n", + "This notebook is a tutorial on the Hamilton Jupyter magics and how they can improve your interactive development experience. It is meant to be read from top to bottom, executing all cells in order.\n", + "\n", + "- **Section 2** - Dataflow definition\n", + "- **Section 3** - Dataflow execution\n", + "\n", + "> ⚠ This notebook extension is something we're actively developing. If you find any bugs, edge cases, performance impacts, or if you have feature requests, let us know." + ] + }, + { + "cell_type": "markdown", + "id": "245fa568", + "metadata": {}, + "source": [ + "## 1.
Loading the extension" + ] + }, + { + "cell_type": "markdown", + "id": "bddc7450", + "metadata": {}, + "source": [ + "To load our Jupyter Magic, we use `%load_ext` with the import path for the Python module (as if you did `import ...`). You only need to load it once, and will need to reload it if you restart the kernel just like you would for a Python module." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "302defaa", + "metadata": {}, + "outputs": [], + "source": [ + "%reload_ext hamilton.plugins.jupyter_magic\n", + "from hamilton import driver # we'll need this later" + ] + }, + { + "cell_type": "markdown", + "id": "bb28d555", + "metadata": {}, + "source": [ + "After loading the extension, Hamilton magics become available:\n", + "- `%%cell_to_module`\n", + "- `%module_to_cell`\n", + "\n", + "This notebook will cover them one-by-one, but if you need a quick refresher you can prepend `?` to get help." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "f4a9d444", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[0;31mDocstring:\u001b[0m\n", + "::\n", + "\n", + " %cell_to_module [-m [MODULE_NAME]] [-d [DISPLAY]] [-x [EXECUTE]]\n", + " [-b BUILDER] [-c CONFIG] [-i INPUTS] [-o OVERRIDES]\n", + " [--hide_results] [-w [WRITE_TO_FILE]]\n", + " [module_name]\n", + "\n", + "Turn a notebook cell into a Hamilton module definition. This allows you to define\n", + "and execute a dataflow from a single cell.\n", + "\n", + "For example:\n", + "```\n", + "%%cell_to_module dataflow --display --execute\n", + "def A() -> int:\n", + " return 37\n", + "\n", + "def B(A: int) -> bool:\n", + " return (A % 3) > 2\n", + "```\n", + "\n", + "positional arguments:\n", + " module_name Name for the module defined in this cell.\n", + "\n", + "options:\n", + " -m <[MODULE_NAME]>, --module_name <[MODULE_NAME]>\n", + " Alias for positional argument `module_name`. There for\n", + " backwards compatibility. 
Prefer the position arg.\n", + " -d <[DISPLAY]>, --display <[DISPLAY]>\n", + " Display the dataflow. The argument is the variable\n", + " name of a dictionary of visualization kwargs; else {}.\n", + " -x <[EXECUTE]>, --execute <[EXECUTE]>\n", + " Execute the dataflow. The argument is the variable\n", + " name of a list of nodes; else execute all nodes.\n", + " -b BUILDER, --builder BUILDER\n", + " Builder to which the module will be added and used for\n", + " execution. Allows to pass Config and Adapters\n", + " -c CONFIG, --config CONFIG\n", + " Config to build a Driver. Passing -c/--config at the\n", + " same time as a Builder -b/--builder with a config will\n", + " raise an exception.\n", + " -i INPUTS, --inputs INPUTS\n", + " Execution inputs. The argument is the variable name of\n", + " a dict of inputs; else {}.\n", + " -o OVERRIDES, --overrides OVERRIDES\n", + " Execution overrides. The argument is the variable name\n", + " of a dict of overrides; else {}.\n", + " --hide_results Hides the automatic display of execution results.\n", + " -w <[WRITE_TO_FILE]>, --write_to_file <[WRITE_TO_FILE]>\n", + " Write cell content to a file. The argument is the file\n", + " path; else write to {module_name}.py\n", + "\u001b[0;31mFile:\u001b[0m ~/projects/dagworks/hamilton/hamilton/plugins/jupyter_magic.py" + ] + } + ], + "source": [ + "?%%cell_to_module" + ] + }, + { + "cell_type": "markdown", + "id": "8db7e808", + "metadata": {}, + "source": [ + "## 2. Define a Hamilton dataflow" + ] + }, + { + "cell_type": "markdown", + "id": "6d310207", + "metadata": {}, + "source": [ + "### 2.1 Basics\n", + "The main magic is `%%cell_to_module MODULE_NAME` which turns a cell into a temporary Python module in-memory. Successful cell execution means it's a valid Hamilton dataflow." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "a2f33575", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke -d\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Tell me a short joke about {topic}\"" + ] + }, + { + "cell_type": "markdown", + "id": "54e60f6b", + "metadata": {}, + "source": [ + "The module name namespaces the functions: you can call them through the module (e.g., `joke.joke_prompt`) or directly." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "b9676cc6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tell me a short joke about hello\n", + "Tell me a short joke about greetings\n" + ] + } + ], + "source": [ + "print(joke.joke_prompt(topic=\"hello\"))\n", + "print(joke_prompt(topic=\"greetings\"))" + ] + }, + { + "cell_type": "markdown", + "id": "15977369", + "metadata": {}, + "source": [ + "### 2.2 Module imports\n", + "Code found in cells with `%%cell_to_module` is treated like an isolated `.py` file. This means you need to define Python imports in the cell itself. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "aaf63f03", + "metadata": {}, + "outputs": [], + "source": [ + "%%cell_to_module joke\n", + "from typing import Optional # remove to get `NameError: name 'Optional' is not defined``\n", + "\n", + "def joke_prompt(topic: Optional[str] = None) -> str:\n", + " return f\"Tell me a short joke about {topic}\"" + ] + }, + { + "cell_type": "markdown", + "id": "b9e3f0a0", + "metadata": {}, + "source": [ + "### 2.3 Display module\n", + "You can visualize with the `--display / -d` argument. It can receive a dictionary of [visualization `kwargs`](https://hamilton.dagworks.io/en/latest/reference/drivers/Driver/#hamilton.driver.Driver.display_all_functions)." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "378a4552", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Tell me a short joke about {topic}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "94772b5a", + "metadata": {}, + "outputs": [], + "source": [ + "display_config = dict(orient=\"TB\") # orient visualization top to bottom" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "af95a65e", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", 
+ "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display display_config\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Tell me a short joke about {topic}\"" + ] + }, + { + "cell_type": "markdown", + "id": "f8b86615", + "metadata": {}, + "source": [ + "### 2.4 Write module to file\n", + "To make the transition from notebook to module easy and avoid copy-pasting, you can use `--write_to_file / -w`. This writes the content of the cell to `{MODULE_NAME}.py`. You can also specify a destination file path explicitly.\n", + "\n", + "> ⛔ Be careful: this command can overwrite existing files. Use git to version your files.\n", + "\n", + "After running the next cell, you should see `joke.py` generated in your directory." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "0b428a30", + "metadata": {}, + "outputs": [], + "source": [ + "%%cell_to_module joke --write_to_file\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "f52076cb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there? 
Cowsays\"" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import joke\n", + "joke.joke_prompt(\"Cowsays\")" + ] + }, + { + "cell_type": "markdown", + "id": "6f10bf5f", + "metadata": {}, + "source": [ + "### 2.5 Configure the `Driver`\n", + "When using the `@config` function modifier, you might need to pass a configuration to properly build your dataflow. You can do this inline with the `-c/--config` argument. It supports 3 formats:\n", + "\n", + "1. **Variable name** `--config my_config` where `my_config` is a variable, e.g., `my_config=dict(a=True, b=-1)`\n", + "2. **Key-value** `--config a=True, b=-1` evaluates to `dict(a=\"True\", b=\"-1\")` since everything is interpreted as strings.\n", + "3. **JSON** `--config '{\"a\": true, \"b\": -1}'` evaluates to `dict(a=True, b=-1)`. For valid JSON, use double quotes `\"` inside and wrap the whole string in single quotes `'`.\n", + "\n", + "Here are examples. Notice how the config is properly displayed."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "3c2b11be", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --config '{\"knock_joke\": \"true\"}'\n", + "from hamilton.function_modifiers import config\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? 
{topic}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "39bc255e", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --config knock_joke=true\n", + "from hamilton.function_modifiers import config\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? 
{topic}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "ebcc2729", + "metadata": {}, + "outputs": [], + "source": [ + "my_config = dict(knock_joke=\"true\")" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "a30d4cec", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --config my_config\n", + "from hamilton.function_modifiers import config\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0fc21caf", + "metadata": {}, + "source": [ + "### 2.6 Build a `Driver`" + ] + }, + { + "cell_type": "markdown", + "id": "37873813", + "metadata": {}, + "source": [ + "Some Hamilton dataflows need a specific `Driver` definition to build properly, in particular those using `.with_config() / @config` and `Parallelizable[]/Collect[]`. You can pass a `Builder` object using `-b/--builder`.\n", + "\n", + "Here are examples. Notice how the config from the `Builder` is properly displayed."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "2582a54d", + "metadata": {}, + "outputs": [], + "source": [ + "my_builder = (\n", + " driver.Builder()\n", + " .enable_dynamic_execution(allow_experimental_mode=True)\n", + " .with_config({\"knock_joke\": \"true\"})\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "922a4255", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", + "\n", + "\n", + "\n", + "topic\n", + "\n", + "\n", + "topic\n", + "Parallelizable\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", + "\n", + "\n", + "\n", + "topic->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt_collection\n", + "\n", + "\n", + "joke_prompt_collection\n", + "list\n", + "\n", + "\n", + "\n", + "joke_prompt->joke_prompt_collection\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "expand\n", + "\n", + "\n", + "expand\n", + "\n", + "\n", + "\n", + "collect\n", + "\n", + "\n", + "collect\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display -b my_builder\n", + "from hamilton.htypes import Parallelizable, Collect\n", + "from hamilton.function_modifiers import config\n", + "\n", + "def topic() -> Parallelizable[str]:\n", + " for t in [\"Tom\", \"Jerry\"]:\n", + " yield t\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? 
{topic}\"\n", + "\n", + "def joke_prompt_collection(joke_prompt: Collect[str]) -> list:\n", + " return list(joke_prompt)" + ] + }, + { + "cell_type": "markdown", + "id": "8c2774be", + "metadata": {}, + "source": [ + "### 2.7 Load external modules\n", + "While developing your dataflow with `%%cell_to_module`, you might want to load nodes from another Python module. To do so, import it and add it to the `Driver` using `.with_modules()`." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "c59a633b", + "metadata": {}, + "outputs": [], + "source": [ + "my_builder = driver.Builder().with_modules(joke)" + ] + }, + { + "cell_type": "markdown", + "id": "eb6c1b06", + "metadata": {}, + "source": [ + "The nodes `topic` and `joke_prompt` originate from `joke.py`." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "2ca32ee2", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_response\n", + "\n", + "joke_response\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt->joke_response\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module reply --display --builder my_builder\n", + "def joke_response(joke_prompt: str) -> str:\n", + " return f\"{joke_prompt}\\n\\nCowsay who?\"" + ] + }, + { + "cell_type": "markdown", + "id": "2c7624a0", + "metadata": 
{}, + "source": [ + "### 2.8 Edit external modules\n", + "It is also possible to load the content of a Python module into a notebook cell to edit it interactively!\n", + "\n", + "This is essentially the reverse operation of `%%cell_to_module`, hence the name `%module_to_cell`. It is a line magic (single `%`) that reads the rest of the line as the path to a `.py` file." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "d72ac640", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# execute this to generate a new cell\n", + "%module_to_cell ./joke.py" + ] + }, + { + "cell_type": "markdown", + "id": "c8331b92", + "metadata": {}, + "source": [ + "If you executed the previous cell, a new code cell was created above with the content of `joke.py`. You can add `--write_to_file` to write the notebook cell back to the file." + ] + }, + { + "cell_type": "markdown", + "id": "ed689861", + "metadata": {}, + "source": [ + "## 3. Execute a dataflow\n", + "One of the best parts about notebooks is the ability to execute code and immediately inspect the results. They provide a \"read-eval-print\" loop (REPL) coding experience. With this extension, you can use a single notebook cell to define and execute your dataflow for a tight feedback loop.\n", + "\n", + "If you're familiar with Hamilton, you probably figured out that you can build a `Driver` from the dynamically defined modules (as in the next cell). But we have better interactive options!" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "ed95343f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'joke_prompt': \"Knock, knock. Who's there? 
Cowsay\"}" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dr = driver.Builder().with_modules(joke).build()\n", + "results = dr.execute([\"joke_prompt\"], inputs=dict(topic=\"Cowsay\"))\n", + "results" + ] + }, + { + "cell_type": "markdown", + "id": "b36418f1", + "metadata": {}, + "source": [ + "### 3.1 Execute cell\n", + "By adding `--execute / -x` to your module definition, the defined dataflow will be executed using `Driver.execute()` with all available nodes.\n", + "\n", + "The `--display` visualization should now include **output** nodes reflecting the executed nodes." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "0da888d5", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "a_dataframe\n", + "\n", + "a_dataframe\n", + "DataFrame\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ab
00a
11b
22c
33d
\n", + "
" + ], + "text/plain": [ + " a b\n", + "0 0 a\n", + "1 1 b\n", + "2 2 c\n", + "3 3 d" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there? Cowsay\"" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Cowsay who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'No, Cowsay MooOOooo'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --execute\n", + "import pandas as pd\n", + "\n", + "def joke_prompt() -> str:\n", + " return f\"Knock, knock. Who's there? Cowsay\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? \")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"\n", + "\n", + "def a_dataframe() -> pd.DataFrame:\n", + " return pd.DataFrame({\"a\": [0, 1, 2, 3], \"b\": [\"a\", \"b\", \"c\", \"d\"]})" + ] + }, + { + "cell_type": "markdown", + "id": "8026ec86", + "metadata": {}, + "source": [ + "👆 As you can see, node results are automatically displayed in topologically sorted order. You can hide them with `--hide_results`." + ] + }, + { + "cell_type": "markdown", + "id": "9a5738f6", + "metadata": {}, + "source": [ + "### 3.2 Requesting nodes\n", + "You can pass a variable name to `--execute` to specify the list of nodes to execute. This will be reflected in the visualization."
+ ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "d20198e1", + "metadata": {}, + "outputs": [], + "source": [ + "node_to_execute = [\"reply\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "b7eea8d6", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "topic\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "topic->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Cowsay who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --execute node_to_execute\n", + "def topic() -> str:\n", + " return \"Cowsay\"\n", + "\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? \")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" + ] + }, + { + "cell_type": "markdown", + "id": "e10dd1d9", + "metadata": {}, + "source": [ + "### 3.3 Inspecting results\n", + "Ok, but how do you access results? With the node name! 
Magic 🧙" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "5eab142c", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "topic\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "topic->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --execute --hide_results\n", + "def topic() -> str:\n", + " return \"Cowsay\"\n", + "\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? \")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "607d3863", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'No, Cowsay MooOOooo'" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "punchline" + ] + }, + { + "cell_type": "markdown", + "id": "ad5f79be", + "metadata": {}, + "source": [ + "The results are assigned to variables matching their node names. 
This means you can quickly access them by typing their name. In most notebook environments, you get tab-completion for them and can view results in the variable inspector (especially useful for dataframes).\n", + "\n", + "What's the magic trick? 🐰\n", + "\n", + "When executing the cell, we are effectively:\n", + "1. Loading its content as a module\n", + "2. Building a `Driver` with this module\n", + "3. Executing the module with all nodes\n", + "4. Assigning values from the results of `.execute()` to variables\n", + "\n", + "Consequently, functions defined in `%%cell_to_module` are replaced by their \"value\". You can still access the functions through their module:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "f9a9df33", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'No, Foxey MooOOooo'" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "joke.punchline(\"Foxey\")" + ] + }, + { + "cell_type": "markdown", + "id": "72b09bee", + "metadata": {}, + "source": [ + "### 3.4 Inputs & overrides\n", + "In Hamilton, *inputs* are values external to the dataflow and *overrides* are values that replace the output of a node (this effectively skips upstream operations). You can use `--inputs / -i` and `--overrides / -o` to pass dictionaries of values for execution."
+ ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "ce7d51d4", + "metadata": {}, + "outputs": [], + "source": [ + "my_inputs = dict(topic=\"monday\")\n", + "my_overrides = dict(punchline=\"Bingo bongo!\")" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "83c61edd", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n", + "override\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "override\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'monday'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there? 
monday\"" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'monday who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Bingo bongo!'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --execute --inputs my_inputs --overrides my_overrides\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? \")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" + ] + }, + { + "cell_type": "markdown", + "id": "2c21fd58", + "metadata": {}, + "source": [ + "### 3.5 Driver Adapters\n", + "You can modify execution by passing a `Builder` configured with adapters to `--builder / -b`. Adapters are flexible tools that can provide a variety of features. For instance, the next cells use `PrintLn()` to print the execution status of each node." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "abe2edb4", + "metadata": {}, + "outputs": [], + "source": [ + "from hamilton.lifecycle.default import PrintLn\n", + "my_builder = driver.Builder().with_adapters(PrintLn()) # add the adapter" + ] + }, + { + "cell_type": "markdown", + "id": "7d0b177c", + "metadata": {}, + "source": [ + "Notice in the printed output how `PrintLn()` reports the status of each node as it executes."
+ ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "ac0a6585", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Executing node: joke_prompt.\n", + "Finished debugging node: joke_prompt in 53.4μs. Status: Success.\n", + "Executing node: reply.\n", + "Finished debugging node: reply in 10.5μs. Status: Success.\n", + "Executing node: punchline.\n", + "Finished debugging node: punchline in 9.78μs. Status: Success.\n" + ] + }, + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there? Cowsay\"" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Cowsay who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'No, Cowsay MooOOooo'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --builder my_builder --display --execute \n", + "def joke_prompt() -> str:\n", + " return f\"Knock, knock. Who's there? Cowsay\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? 
\")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" + ] + }, + { + "cell_type": "markdown", + "id": "15cea16a", + "metadata": {}, + "source": [ + "There are tons of awesome adapters that can improve your notebook experience. Here are a few notable mentions:\n", + "\n", + "1. `hamilton.lifecycle.default.CacheAdapter()` automatically versions each node's code and input values and stores its result on disk. When the same (code, inputs) pair is executed again, it reads the value from disk instead of recomputing it. This can help save LLM API costs!\n", + "2. `hamilton.plugins.h_diskcache.DiskCacheAdapter()` provides the same core features as `CacheAdapter()`, with more utilities for cache management.\n", + "3. `hamilton.lifecycle.default.PrintLn()` prints the execution status of each node.\n", + "4. `hamilton.plugins.h_tqdm.ProgressBar()` adds a progress bar to execution.\n", + "5. `hamilton.lifecycle.default.PDBDebugger()` lets you step into a node with a Python debugger and execute code line by line.\n", + "\n", + "Note that all of these adapters work with Hamilton outside notebooks too!" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/hamilton/plugins/jupyter_magic.py b/hamilton/plugins/jupyter_magic.py index 5b9947b94..92c626a1c 100644 --- a/hamilton/plugins/jupyter_magic.py +++ b/hamilton/plugins/jupyter_magic.py @@ -1,29 +1,83 @@ -""" -Module for use with jupyter notebooks to facilitate Hamilton development.
+import argparse +import ast +import json +import os +from pathlib import Path +from typing import Any, Dict, List, Set, Tuple -Usage: - > # To load it - > %load_ext hamilton.plugins.jupyter_magic +from IPython.core.magic import Magics, cell_magic, line_magic, magics_class +from IPython.core.magic_arguments import argument, magic_arguments, parse_argstring +from IPython.core.shellapp import InteractiveShellApp +from IPython.display import HTML, display +from IPython.utils.process import arg_split - > %%cell_to_module -m MODULE_NAME - > def my_hamilton_funcs(): ... +from hamilton import ad_hoc_utils, driver -If you are developing on this module you'll then want to use: - > %reload_ext hamilton.plugins.jupyter_magic +def get_assigned_variables(module_node: ast.Module) -> Set[str]: + """Get the set of variable names assigned in an AST module""" + assigned_vars = set() -""" + def visit_node(ast_node): + """Recursive function looking for assigned variable names""" + if isinstance(ast_node, ast.Assign): + for target in ast_node.targets: + if isinstance(target, ast.Name): + assigned_vars.add(target.id) -import json -import os -from pathlib import Path -from types import ModuleType + for child_node in ast.iter_child_nodes(ast_node): + visit_node(child_node) + + visit_node(module_node) + return assigned_vars + + +def execute_and_get_assigned_values(shell: InteractiveShellApp, cell: str) -> Dict[str, Any]: + """Execute source code from a cell in the user namespace and collect
+ """ + shell.ex(cell) + expr = shell.input_transformer_manager.transform_cell(cell) + expr_ast = shell.compile.ast_parse(expr) + return {name: shell.user_ns[name] for name in get_assigned_variables(expr_ast)} -from IPython.core.magic import Magics, cell_magic, line_magic, magics_class -from IPython.core.magic_arguments import argument, magic_arguments, parse_argstring -from IPython.display import HTML, Code, display -from hamilton import ad_hoc_utils, driver, lifecycle +def topological_sort(nodes): + """Sort the nodes for nice output display""" + + def dfs(node, visited, stack): + visited.add(node) + for neighbor in graph.get(node, []): + if neighbor not in visited: + dfs(neighbor, visited, stack) + stack.append(node) + + graph = {n.name: set([*n.required_dependencies, *n.optional_dependencies]) for n in nodes} + visited = set() + stack = [] + + for node in graph: + if node not in visited: + dfs(node, visited, stack) + + return stack + + +def _normalize_result_names(node_name: str) -> str: + """Remove periods from the name of dynamically generated Hamilton nodes""" + return node_name.replace(".", "__") + + +def display_in_databricks(dot): + try: + display(HTML(dot.pipe(format="svg").decode("utf-8"))) + except Exception as e: + print( + f"Failed to display graph: {e}\n" + "Please ensure graphviz is installed via `%sh apt install -y graphviz`" + ) + return + return dot def insert_cell_with_content(): @@ -77,159 +131,272 @@ def insert_cell_with_content(): display(HTML(js_script)) -def find_all_hamilton_drivers_using_this_module(shell, module_name: str) -> list: - """Find all Hamilton drivers in the notebook that use the module `module_name`. 
+def determine_notebook_type() -> str: + if "DATABRICKS_RUNTIME_VERSION" in os.environ: + return "databricks" + return "default" - :param shell: the ipython shell object - :param module_name: the module name to search for - :return: the list of (driver variable name, drivers) that use the module - """ - driver_instances = { - var_name: shell.user_ns[var_name] - for var_name in shell.user_ns - if isinstance(shell.user_ns[var_name], driver.Driver) and var_name != f"{module_name}_dr" - } - impacted_drivers = [] - for var_name, dr in driver_instances.items(): - for driver_module in dr.graph_modules: - if driver_module.__name__ == module_name: - impacted_drivers.append((var_name, dr)) - break - return impacted_drivers - - -def rebuild_drivers(shell, module_name: str, module_object: ModuleType, verbosity: int = 1) -> dict: - """Function to rebuild drivers that use the module `module_name` with the new module `module_object`. - - This finds the drivers and rebuilds them if it knows how. It will skip rebuilding if the driver has an adapter. - - :param shell: - :param module_name: - :param module_object: - :param verbosity: - :return: - """ - impacted_drivers = find_all_hamilton_drivers_using_this_module(shell, module_name) - drivers_rebuilt = {} - for var_name, dr in impacted_drivers: - modules_to_use = [mod for mod in dr.graph_modules if mod.__name__ != module_name] - modules_to_use.append(module_object) - # TODO: make this more robust by providing some better APIs. 
- if ( - dr.adapter - and hasattr(dr.adapter, "_adapters") - and ( - not dr.adapter._adapters - or isinstance(dr.adapter._adapters[0], lifecycle.base.LifecycleAdapterSet) - ) - ): - # TODO: make this more robust with a better API - _config = dr.graph._config - dr = ( - driver.Builder() - .with_modules(*modules_to_use) - .with_config(_config) - .with_adapter(dr.adapter) - .build() - ) - drivers_rebuilt[var_name] = dr - if verbosity > 0: - print( - f"Rebuilt {var_name} with module {module_name}, using it's config of {_config}" - ) - else: - if verbosity > 0: - print(f"Driver {var_name} has an adapter passed, skipping rebuild.") +def parse_known_argstring(magic_func, argstring) -> Tuple[argparse.Namespace, List[str]]: + """IPython magic arguments parsing doesn't allow unknown args. + Used instead of IPython.core.magic_arguments.parse_argstring - return drivers_rebuilt + IPython ref: https://github.com/ipython/ipython/blob/43781b39a67f02ff4e9ae63484387f654dd045d4/IPython/core/magic_arguments.py#L164 + argparse ref: https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.parse_known_args + """ + argv = arg_split(argstring) + # magic_func.parser is an argparse.ArgumentParser subclass + known, unknown = magic_func.parser.parse_known_args(argv) + return known, unknown -def determine_notebook_type() -> str: - if "DATABRICKS_RUNTIME_VERSION" in os.environ: - return "databricks" - return "default" +def parse_config(config_string): + config = {} + for item in config_string.split(): + key, value = item.split("=") + config[key] = value + return config @magics_class class HamiltonMagics(Magics): - """Magics to facilitate Hamilton development in Jupyter notebooks""" + """Magics to facilitate interactive Hamilton development in notebooks.""" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self.builder = None + self.notebook_env = determine_notebook_type() + + def resolve_unknown_args_cell_to_module(self, unknown: List[str]): + """Handle unknown 
arguments. It won't make the magic execution fail.""" + + # deprecated in V2 because it's less useful since `%%cell_to_module` can execute itself + if any(arg in ("-r", "--rebuild-drivers") for arg in unknown): + print( + "DeprecationWarning: -r/--rebuild-drivers no longer does anything and will be removed in a future release." + ) + + # deprecated in V2 because it relates to the deprecated -r/--rebuild-drivers + if any(arg in ("-v", "--verbosity") for arg in unknown): + print( + "DeprecationWarning: -v/--verbosity no longer does anything and will be removed in a future release." + ) + + # kept for backwards compatibility. Equivalent to calling `%%cell_to_module?` + # not included as @argument because it's not really a function arg to %%cell_to_module + if any(arg in ("-h", "--help") for arg in unknown): + print(help(self.cell_to_module)) @magic_arguments() # needed on top to enable parsing + @argument("module_name", nargs="?", help="Name for the module defined in this cell.") @argument( - "-m", "--module_name", help="Module name to provide. Default is jupyter_module." - ) # keyword / optional arg + "-m", + "--module_name", + nargs="?", + const=True, + help="Alias for the positional argument `module_name`, kept for backwards compatibility. Prefer the positional argument.", + ) @argument( - "-c", "--config", help="JSON config, or variable name containing config to use." - ) # keyword / optional arg + "-d", + "--display", + nargs="?", + const=True, + help="Display the dataflow. The argument is the variable name of a dictionary of visualization kwargs; else {}.", + ) @argument( - "-r", "--rebuild-drivers", action="store_true", help="Flag to rebuild drivers" - ) # Flag / optional arg + "-x", + "--execute", + nargs="?", + const=True, + help="Execute the dataflow. The argument is the variable name of a list of nodes; else execute all nodes.", + ) @argument( - "-d", "--display", action="store_true", help="Flag to visualize dataflow."
- ) # Flag / optional arg + "-b", + "--builder", + help="Variable name of a Builder to which the module is added and that is used for execution. Allows passing a config and adapters.", + ) + @argument( + "-c", + "--config", + help="Config to build the Driver: a variable name, key=value pairs, or a JSON string. Passing -c/--config together with a Builder -b/--builder that already has a config is an error.", + ) + @argument( + "-i", + "--inputs", + help="Execution inputs. The argument is the variable name of a dict of inputs; else {}.", + ) @argument( - "-v", "--verbosity", type=int, default=1, help="0 to hide. 1 is normal, default" - ) # keyword / optional arg + "-o", + "--overrides", + help="Execution overrides. The argument is the variable name of a dict of overrides; else {}.", + ) + @argument( + "--hide_results", + action="store_true", + help="Hide the automatic display of execution results.", + ) + @argument( + "-w", + "--write_to_file", + nargs="?", + const=True, + help="Write cell content to a file. The argument is the file path; else write to {module_name}.py", + ) @cell_magic def cell_to_module(self, line, cell): - """Execute the cell and dynamically create a Python module from its content. - A Hamilton Driver is automatically instantiated with that module for variable `{MODULE_NAME}_dr`. - - > %%cell_to_module -m MODULE_NAME --display --rebuild-drivers + """Turn a notebook cell into a Hamilton module definition. This allows you to define + and execute a dataflow from a single cell. + + For example: + ``` + %%cell_to_module dataflow --display --execute + def A() -> int: + return 37 + + def B(A: int) -> bool: + return (A % 3) > 2 + ``` """ - if "--help" in line.split(): - print("Help for %%cell_to_module magic:") - print(" -m, --module_name: Module name to provide.
Default is jupyter_module.") - print(" -c, --config: JSON config string, or variable name containing config to use.") - print(" -r, --rebuild-drivers: Flag to rebuild drivers.") - print(" -d, --display: Flag to visualize dataflow.") - print(" -v, --verbosity: of standard output. 0 to hide. 1 is normal, default.") - return # Exit early - if not hasattr(self, "notebook_env"): - # doing this so I don't have to deal with the constructor - self.notebook_env = determine_notebook_type() # shell.ex() is equivalent to exec(), but in the user namespace (i.e. notebook context). # This allows imports and functions defined in the magic cell %%cell_to_module to be # directly accessed from the notebook self.shell.ex(cell) - args = parse_argstring(self.cell_to_module, line) # specify how to parse by passing method - module_name = self.get_module_name(args) + args, unknown_args = parse_known_argstring( + self.cell_to_module, line + ) # specify how to parse by passing method + self.resolve_unknown_args_cell_to_module(unknown_args) + + # validate variables exist in the user namespace expect `config` because it's a special case + # will exit using `return` in case of error + args_that_read_user_namespace = ["display", "builder", "inputs", "overrides"] + for name, value in vars(args).items(): + if name not in args_that_read_user_namespace: + continue + + # special case: `display` can be passed as a flag (=True), without a config + if name == "display" and value is True: + continue + + # main case: exit if variable is not in user namespace + if value and self.shell.user_ns.get(value) is None: + return f"KeyError: Received `--{name} {value}` but variable not found." + + # special case: `config` expects potentially a JSON string + # there for backwards compatibility + + # default case: didn't receive `-c/--config`. 
Set an empty dict + if args.config is None: + config = {} + # case 1, 2, 3: `-c/--config` is specified + # case 1: -c/--config refers to variable in the user namespace + elif self.shell.user_ns.get(args.config): + config = self.shell.user_ns.get(args.config) + # case 2: parse using key=value + elif "=" in args.config: + config = parse_config(args.config) + # case 3: parse as JSON + elif ":" in args.config: + try: + # strip quotation marks added by IPython and avoid mutating `args` + config_str = args.config.strip("'\"") + config = json.loads(config_str) + except json.JSONDecodeError: + print(f"JSONDecodeError: Failed to parse `config` as JSON. Received {value}") + return - display_config = self.get_display_config(args) + # get the default values of args + module_name = args.module_name + base_builder = self.shell.user_ns[args.builder] if args.builder else driver.Builder() + inputs = self.shell.user_ns[args.inputs] if args.inputs else {} + overrides = self.shell.user_ns[args.overrides] if args.overrides else {} + display_config = ( + self.shell.user_ns[args.display] if args.display not in [True, None] else {} + ) + + # Decision: write to file before trying to build and execute Driver + # See argument `help` for behavior details + if args.write_to_file: + if isinstance(args.write_to_file, str): + file_path = Path(args.write_to_file) + else: + file_path = Path(f"{module_name}.py") + file_path.write_text(cell) - module_object = ad_hoc_utils.create_module(cell, module_name, verbosity=args.verbosity) + # create_module() is preferred over module_from_source() to simplify + # the integration with the Hamilton UI which assumes physical Python modules + cell_module = ad_hoc_utils.create_module(cell, module_name) + self.shell.push({module_name: cell_module}) - # shell.push() assign a variable in the notebook. 
The dictionary keys are variable name - self.shell.push({module_name: module_object}) + # determine the Driver config + if config and base_builder.config: + return "AssertionError: Received a config -c/--config and a Builder -b/--builder with an existing config. Pass either one." - # shell.user_ns is a dictionary of all variables in the notebook - # rebuild drivers that use this module - if args.rebuild_drivers: - rebuilt_drivers = rebuild_drivers( - self.shell, module_name, module_object, verbosity=args.verbosity - ) - self.shell.user_ns.update(rebuilt_drivers) + # build the Driver. the Builder is copied to avoid conflict with the user namespace + builder = base_builder.copy() + dr = builder.with_config(config).with_modules(cell_module).build() - # create a driver to display things for every cell with %%with_functions - dr = driver.Builder().with_modules(module_object).with_config(display_config).build() - self.shell.push({f"{module_name}_dr": dr}) + # determine final vars + if args.execute not in [True, None]: + final_vars = self.shell.user_ns[args.execute] + else: + nodes = [n for n in dr.list_available_variables() if not n.is_external_input] + final_vars = topological_sort(nodes) + + # visualize if args.display: - graphviz_obj = dr.display_all_functions() - if self.notebook_env == "databricks" and graphviz_obj: - try: - display(HTML(graphviz_obj.pipe(format="svg").decode("utf-8"))) - except Exception as e: - print(f"Failed to display graph: {e}") - print("Please ensure graphviz is installed via `%sh apt install -y graphviz`") + # try/except `display_config` or inputs/overrides may be invalid + try: + if args.execute: + dot = dr.visualize_execution( + final_vars=final_vars, + inputs=inputs, + overrides=overrides, + **display_config, + ) + else: + dot = dr.display_all_functions(**display_config) + except Exception as e: + print(f"Failed to display {e}.\n\nThe display config was: {display_config}") + dot = dr.display_all_functions() + + # handle output environment 
+ if self.notebook_env == "databricks": + display_in_databricks(dot) + else: + display(dot) + + # execute + if args.execute: + results = dr.execute( + final_vars=final_vars, + inputs=inputs, + overrides=overrides, + ) + # normalize variable names that contain a `.` character like @pipe(step()) + results = {_normalize_result_names(name): value for name, value in results.items()} + self.shell.push(results) + + if args.hide_results: return - # return will go to the output cell. To display multiple elements, use - # IPython.display.display(print("hello"), dr.display_all_functions(), ...) - return graphviz_obj + # results follow the order of `final_vars` (topologically sorted if all nodes); keys are normalized names + display(*(results[_normalize_result_names(n)] for n in final_vars)) + + @magic_arguments() + @argument("name", type=str, help="Name of the variable that will store the dictionary created from the cell's content.") + @cell_magic + def set_dict(self, line: str, cell: str): + """Execute the cell and store all assigned variables in a dictionary (e.g., to use as inputs)""" + args = parse_argstring(self.set_dict, line) + self.shell.user_ns[args.name] = execute_and_get_assigned_values(self.shell, cell) @line_magic def insert_module(self, line): + """Alias for `%module_to_cell`.""" + self.module_to_cell(line) + + @line_magic + def module_to_cell(self, line): """Insert in the next cell the source code from the module (.py) at the path specified by `line`. @@ -242,159 +409,13 @@ def insert_module(self, line): module_path = Path(line) # insert our custom %%with_functions magic at the top of the cell - header = f"%%cell_to_module -m {module_path.stem}\n\n" + header = f"%%cell_to_module {module_path.stem}\n" module_source = module_path.read_text() - # insert source code as text in the next cell self.shell.set_next_input(header + module_source, replace=False) - @magic_arguments() # needed on top to enable parsing - @argument("module_name", help="Module name to provide") # keyword / optional arg - @argument( - "-i", - "--identifier", - help="Identifier for this cell w.r.t.
the module being created. " - "Integer or String. Integer is simplest.", - ) # required argument - @argument( - "-c", "--config", help="JSON config, or variable name containing config to use." - ) # keyword / optional arg - @argument( - "-d", "--display", action="store_true", help="Flag to visualize dataflow." - ) # Flag / optional arg - @argument( - "-v", "--verbosity", type=int, default=1, help="0 to hide. 1 is normal, default" - ) # keyword / optional arg - @cell_magic - def incr_cell_to_module(self, line, cell): - """Incrementally build a module. This executes the cell and dynamically creates a Python module from its content. - A Hamilton Driver is automatically instantiated with that module for variable `{MODULE_NAME}_dr`. - - > %%incr_cell_to_module -m MODULE_NAME -i IDENTIFIER --display - """ - if "--help" in line.split(): - print("Help for %%incr_cell_to_module magic:") - print("module_name: Module name to provide. Required.") - print(" -i, --identifier: the ID for this cell w.r.t. to the module name. Required.") - print(" -c, --config: JSON config string, or variable name containing config to use.") - print(" -d, --display: Flag to visualize dataflow.") - print(" -v, --verbosity: of standard output. 0 to hide. 1 is normal, default.") - return # Exit early - - if not hasattr(self, "notebook_env"): - self.notebook_env = determine_notebook_type() - if not hasattr(self, "module_to_cell_mapping"): - self.module_to_cell_mapping = {} # dict of dicts - - args = parse_argstring( - self.incr_cell_to_module, line - ) # specify how to parse by passing method - if args.identifier is None: - raise ValueError("Identifier is required. Please provide an identifier for this cell.") - - # shell.ex() is equivalent to exec(), but in the user namespace (i.e. notebook context). 
- # This allows imports and functions defined in the magic cell %%cell_to_module to be - # directly accessed from the notebook - self.shell.ex(cell) - - module_name = args.module_name - - display_config = self.get_display_config(args) - - if module_name not in self.module_to_cell_mapping: - self.module_to_cell_mapping[module_name] = {} - - self.module_to_cell_mapping[module_name][args.identifier] = cell - module_source = self.get_module_source(module_name) - module_object = ad_hoc_utils.create_module( - module_source, module_name, verbosity=args.verbosity - ) - - # shell.push() assign a variable in the notebook. The dictionary keys are variable name - self.shell.push({module_name: module_object}) - - # shell.user_ns is a dictionary of all variables in the notebook - # create a driver to display things for every cell with %%with_functions - dr = driver.Builder().with_modules(module_object).with_config(display_config).build() - self.shell.push({f"{module_name}_dr": dr}) - if args.display: - graphviz_obj = dr.display_all_functions() - if self.notebook_env == "databricks" and graphviz_obj: - try: - display(HTML(graphviz_obj.pipe(format="svg").decode("utf-8"))) - except Exception as e: - print(f"Failed to display graph: {e}") - print("Please ensure graphviz is installed via `%sh apt install -y graphviz`") - return - # return will go to the output cell. To display multiple elements, use - # IPython.display.display(print("hello"), dr.display_all_functions(), ...) - return graphviz_obj - - def get_display_config(self, args) -> dict: - """Gets the display config from args if they exist""" - display_config = {} - if args.config: - if args.config in self.shell.user_ns: - display_config = self.shell.user_ns[args.config] - else: - try: - if args.config.startswith("'") or args.config.startswith('"'): - # strip quotes if present - args.config = args.config[1:-1] - display_config = json.loads(args.config) - except json.JSONDecodeError: - print("Failed to parse config as JSON. 
Please ensure it's a valid JSON string:") - print(args.config) - return display_config - - @magic_arguments() # needed on top to enable parsing - @argument("module_name", help="Module name to print.") # required argument - @line_magic - def print_module(self, line): - """Prints the contents of a dynamic module we've been creating.""" - if not hasattr(self, "notebook_env"): - self.notebook_env = determine_notebook_type() - if not hasattr(self, "module_to_cell_mapping"): - self.module_to_cell_mapping = {} - args = parse_argstring( - self.incr_cell_to_module, line - ) # specify how to parse by passing method - module_name = args.module_name - if module_name not in self.module_to_cell_mapping: - raise ValueError(f"Module {module_name} not found.") - module_source = self.get_module_source(module_name) - display(Code(module_source)) - - def get_module_source(self, module_name: str) -> str: - """Creates the module source from incremental code.""" - module_dict = self.module_to_cell_mapping[module_name] - module_order = sorted(list(module_dict.keys())) - module_source = "\n\n".join([module_dict[k] for k in module_order]) - return module_source - - @magic_arguments() # needed on top to enable parsing - @argument("module_name", help="Module to print.") # required argument - @line_magic - def reset_module(self, line): - if not hasattr(self, "notebook_env"): - self.notebook_env = determine_notebook_type() - if not hasattr(self, "module_to_cell_mapping"): - self.module_to_cell_mapping = {} - args = parse_argstring( - self.incr_cell_to_module, line - ) # specify how to parse by passing method - module_name = args.module_name - if module_name in self.module_to_cell_mapping: - print(f"Reset {module_name}") - del self.module_to_cell_mapping[module_name] - - def get_module_name(self, args, default_name: str = "jupyter_module") -> str: - """Gets the module name, else returns the default.""" - module_name = default_name if args.module_name is None else args.module_name - return 
module_name - -def load_ipython_extension(ipython): +def load_ipython_extension(ipython: InteractiveShellApp): """ Any module file that define a function named `load_ipython_extension` can be loaded via `%load_ext module.path` or be configured to be From d13ef5b1bdf2bce51a477e2c1ed8cdc1112bb6cc Mon Sep 17 00:00:00 2001 From: zilto Date: Fri, 24 May 2024 16:18:18 -0400 Subject: [PATCH 2/7] updated incr_cell_to_module --- hamilton/plugins/jupyter_magic.py | 219 +++++++++++++++++++++++++----- 1 file changed, 186 insertions(+), 33 deletions(-) diff --git a/hamilton/plugins/jupyter_magic.py b/hamilton/plugins/jupyter_magic.py index 92c626a1c..f8ee0043f 100644 --- a/hamilton/plugins/jupyter_magic.py +++ b/hamilton/plugins/jupyter_magic.py @@ -2,13 +2,14 @@ import ast import json import os +from collections import defaultdict from pathlib import Path -from typing import Any, Dict, List, Set, Tuple +from typing import Any, Dict, List, Set, Tuple, Union from IPython.core.magic import Magics, cell_magic, line_magic, magics_class from IPython.core.magic_arguments import argument, magic_arguments, parse_argstring from IPython.core.shellapp import InteractiveShellApp -from IPython.display import HTML, display +from IPython.display import HTML, Code, display from IPython.utils.process import arg_split from hamilton import ad_hoc_utils, driver @@ -150,7 +151,7 @@ def parse_known_argstring(magic_func, argstring) -> Tuple[argparse.Namespace, Li return known, unknown -def parse_config(config_string): +def parse_key_value_config(config_string): config = {} for item in config_string.split(): key, value = item.split("=") @@ -166,6 +167,7 @@ def __init__(self, **kwargs): super().__init__(**kwargs) self.builder = None self.notebook_env = determine_notebook_type() + self.incremental_cells_state = defaultdict(dict) def resolve_unknown_args_cell_to_module(self, unknown: List[str]): """Handle unknown arguments. 
It won't make the magic execution fail.""" @@ -187,6 +189,28 @@ def resolve_unknown_args_cell_to_module(self, unknown: List[str]): if any(arg in ("-h", "--help") for arg in unknown): print(help(self.cell_to_module)) + def resolve_config_arg(self, config_arg) -> Union[bool, dict]: + # default case: didn't receive `-c/--config`. Set an empty dict + if config_arg is None: + config = {} + # case 1, 2, 3: `-c/--config` is specified + # case 1: -c/--config refers to variable in the user namespace + elif self.shell.user_ns.get(config_arg): + config = self.shell.user_ns.get(config_arg) + # case 2: parse using key=value + elif "=" in config_arg: + config = parse_key_value_config(config_arg) + # case 3: parse as JSON + elif ":" in config_arg: + try: + # strip quotation marks added by IPython and avoid mutating `args` + config_str = config_arg.strip("'\"") + config = json.loads(config_str) + except json.JSONDecodeError: + print(f"JSONDecodeError: Failed to parse `config` as JSON. Received {config_arg}") + return False + return config + @magic_arguments() # needed on top to enable parsing @argument("module_name", nargs="?", help="Name for the module defined in this cell.") @argument( @@ -269,43 +293,25 @@ def B(A: int) -> bool: # validate variables exist in the user namespace expect `config` because it's a special case # will exit using `return` in case of error - args_that_read_user_namespace = ["display", "builder", "inputs", "overrides"] + args_that_read_user_namespace = ["display", "builder", "final_vars", "inputs", "overrides"] for name, value in vars(args).items(): if name not in args_that_read_user_namespace: continue - # special case: `display` can be passed as a flag (=True), without a config - if name == "display" and value is True: + # special case: args that can be passed as a flag (=True) without values + if name in ["display", "final_vars"] and value is True: continue # main case: exit if variable is not in user namespace if value and 
self.shell.user_ns.get(value) is None: return f"KeyError: Received `--{name} {value}` but variable not found." - # special case: `config` expects potentially a JSON string - # there for backwards compatibility + # parse config; exit if config is invalid + config = self.resolve_config_arg(args.config) + if config is False: + return - # default case: didn't receive `-c/--config`. Set an empty dict - if args.config is None: - config = {} - # case 1, 2, 3: `-c/--config` is specified - # case 1: -c/--config refers to variable in the user namespace - elif self.shell.user_ns.get(args.config): - config = self.shell.user_ns.get(args.config) - # case 2: parse using key=value - elif "=" in args.config: - config = parse_config(args.config) - # case 3: parse as JSON - elif ":" in args.config: - try: - # strip quotation marks added by IPython and avoid mutating `args` - config_str = args.config.strip("'\"") - config = json.loads(config_str) - except json.JSONDecodeError: - print(f"JSONDecodeError: Failed to parse `config` as JSON. Received {value}") - return - - # get the default values of args + # resolve the values of args module_name = args.module_name base_builder = self.shell.user_ns[args.builder] if args.builder else driver.Builder() inputs = self.shell.user_ns[args.inputs] if args.inputs else {} @@ -314,6 +320,11 @@ def B(A: int) -> bool: self.shell.user_ns[args.display] if args.display not in [True, None] else {} ) + # determine the Driver config + # can't check from args.builder because it might be None + if config and base_builder.config: + return "AssertionError: Received a config -c/--config and a Builder -b/--builder with an existing config. Pass either one." 
+ # Decision: write to file before trying to build and execute Driver # See argument `help` for behavior details if args.write_to_file: @@ -328,10 +339,6 @@ def B(A: int) -> bool: cell_module = ad_hoc_utils.create_module(cell, module_name) self.shell.push({module_name: cell_module}) - # determine the Driver config - if config and base_builder.config: - return "AssertionError: Received a config -c/--config and a Builder -b/--builder with an existing config. Pass either one." - # build the Driver. the Builder is copied to avoid conflict with the user namespace builder = base_builder.copy() dr = builder.with_config(config).with_modules(cell_module).build() @@ -382,6 +389,152 @@ def B(A: int) -> bool: # results will follow the order of `final_vars` or topologically sorted if all vars display(*(results[n] for n in final_vars)) + # TODO unify the API and logic of `%%cell_to_module` and `%%incr_cell_to_module` + @magic_arguments() + @argument("module_name", nargs="?", help="Name for the module defined in this cell.") + @argument( + "-id", + "--identifier", + type=int, + help="Identifier for this cell w.r.t. the module being created. ", + ) # required argument + @argument( + "-c", + "--config", + help="Config to build a Driver. Passing -c/--config at the same time as a Builder -b/--builder with a config will raise an exception.", + ) + @argument( + "-b", + "--builder", + help="Builder to which the module will be added and used for execution. Allows to pass Config and Adapters", + ) + @argument( + "-d", + "--display", + nargs="?", + const=True, + help="Display the dataflow. The argument is the variable name of a dictionary of visualization kwargs; else {}.", + ) + @argument( + "-w", + "--write_to_file", + nargs="?", + const=True, + help="Write cell content to a file. The argument is the file path; else write to {module_name}.py", + ) + @cell_magic + def incr_cell_to_module(self, line, cell): + """Incrementally build a module. 
This executes the cell and dynamically creates a Python module from its content. + A Hamilton Driver is automatically instantiated with that module for variable `{MODULE_NAME}_dr`. + + > %%incr_cell_to_module -m MODULE_NAME -i IDENTIFIER --display + """ + # This function mimics the logic of `.cell_to_module()`. Find more comments there. + # Start by trying to execute the code cell. + self.shell.ex(cell) + + # parse user inputs + args, unknown_args = parse_known_argstring(self.incr_cell_to_module, line) + self.resolve_unknown_args_cell_to_module(unknown_args) + + # check user inputs pointing to variables in user namespace + args_that_read_user_namespace = ["display", "builder"] + for name, value in vars(args).items(): + if name not in args_that_read_user_namespace: + continue + + # special case: `display` can be passed as a flag (=True), without a config + if name in ["display"] and value is True: + continue + + # main case: exit if variable is not in user namespace + if value and self.shell.user_ns.get(value) is None: + return f"KeyError: Received `--{name} {value}` but variable not found." + + # TODO convert -i to -id + if args.identifier is None: + raise ValueError("`-id/--identifier` is required. Please provide an id for this cell.") + + # parse config; exit if config is invalid + config = self.resolve_config_arg(args.config) + if config is False: + return + + # set parsed arguments + module_name = args.module_name + base_builder = self.shell.user_ns[args.builder] if args.builder else driver.Builder() + display_config = ( + self.shell.user_ns[args.display] if args.display not in [True, None] else {} + ) + + # determine the Driver config + # can't check from args.builder because it might be None + if config and base_builder.config: + return "AssertionError: Received a config -c/--config and a Builder -b/--builder with an existing config. Pass either one." 
+ + # store current cell in state + self.incremental_cells_state[module_name][args.identifier] = cell + + # build module source from multiple cells + module_dict = self.incremental_cells_state[module_name] + sorted_module_keys = sorted(list(module_dict[module_name].keys())) + module_source = "\n\n".join([module_dict[k] for k in sorted_module_keys]) + multi_cell_module = ad_hoc_utils.create_module(module_source, module_name) + self.shell.push({module_name: multi_cell_module}) + + # Decision: write to file before trying to build and execute Driver + # See argument `help` for behavior details + if args.write_to_file: + if isinstance(args.write_to_file, str): + file_path = Path(args.write_to_file) + else: + file_path = Path(f"{module_name}.py") + file_path.write_text(module_source) + + # build Driver + builder = base_builder.copy() + dr = builder.with_config(config).with_modules(multi_cell_module).build() + + # visualize + if args.display: + # try/except `display_config` or inputs/overrides may be invalid + try: + dot = dr.display_all_functions(**display_config) + except Exception as e: + print(f"Failed to display {e}.\n\nThe display config was: {display_config}") + dot = dr.display_all_functions() + + # handle output environment + if self.notebook_env == "databricks": + display_in_databricks(dot) + else: + display(dot) + + @magic_arguments() # needed on top to enable parsing + @argument("module_name", help="Module name to print.") # required argument + @line_magic + def print_module(self, line): + """Prints the contents of a dynamic module we've been creating.""" + args = parse_argstring(self.print_module, line) + if args.module_name in self.incremental_cells_state: + module_dict = self.incremental_cells_state[args.module_name] + sorted_module_keys = sorted(list(module_dict.keys())) + module_source = "\n\n".join([module_dict[k] for k in sorted_module_keys]) + display(Code(module_source)) + else: + print(f"KeyError: `{args.module_name}` not
found.") + + @magic_arguments() # needed on top to enable parsing + @argument("module_name", help="Module to reset.") # required argument + @line_magic + def reset_module(self, line): + args = parse_argstring(self.reset_module, line) + if args.module_name in self.incremental_cells_state: + del self.incremental_cells_state[args.module_name] + print(f"Reset `{args.module_name}`") + else: + print(f"KeyError: `{args.module_name}` not found.") + @magic_arguments() @argument("name", type=str, help="Creates a dictionary fromt the cell's content.") @cell_magic From 4a2a69f1fb1423f75342df1a72777b1466f15b75 Mon Sep 17 00:00:00 2001 From: Thierry Jean <68975210+zilto@users.noreply.github.com> Date: Sat, 25 May 2024 10:09:30 -0400 Subject: [PATCH 3/7] Update hamilton/plugins/jupyter_magic.py Co-authored-by: Stefan Krawczyk --- hamilton/plugins/jupyter_magic.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hamilton/plugins/jupyter_magic.py b/hamilton/plugins/jupyter_magic.py index f8ee0043f..5f34b3e99 100644 --- a/hamilton/plugins/jupyter_magic.py +++ b/hamilton/plugins/jupyter_magic.py @@ -427,7 +427,7 @@ def incr_cell_to_module(self, line, cell): """Incrementally build a module. This executes the cell and dynamically creates a Python module from its content. A Hamilton Driver is automatically instantiated with that module for variable `{MODULE_NAME}_dr`. - > %%incr_cell_to_module -m MODULE_NAME -i IDENTIFIER --display + > %%incr_cell_to_module MODULE_NAME -i IDENTIFIER --display """ # This function mimics the logic of `.cell_to_module()`. Find more comments there. # Start by trying to execute the code cell.
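Review note on the incremental-module logic added in patches 2 and 6: below is a minimal standalone sketch of how `%%incr_cell_to_module` assembles one module from several cells, assuming the state layout `module name -> {cell identifier -> cell source}` used by `self.incremental_cells_state`. The helpers `register_cell` and `module_source` are hypothetical names for illustration and are not part of `hamilton.plugins.jupyter_magic`. The sketch also shows why the sorted keys must come from the per-module dict itself rather than from `module_dict[module_name]`.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical stand-in for `self.incremental_cells_state`:
# module name -> {cell identifier -> cell source}
incremental_cells_state: Dict[str, Dict[int, str]] = defaultdict(dict)


def register_cell(module_name: str, identifier: int, cell: str) -> None:
    """Store (or overwrite) the source of one cell for a module."""
    incremental_cells_state[module_name][identifier] = cell


def module_source(module_name: str) -> str:
    """Join a module's stored cells in ascending identifier order."""
    cells = incremental_cells_state[module_name]
    # sort the identifiers of this module's cells; indexing
    # `cells[module_name]` instead would raise a KeyError
    return "\n\n".join(cells[k] for k in sorted(cells))
```

Re-running a cell with the same identifier overwrites its previous source, and cells are ordered by identifier rather than by execution order, so editing an earlier cell in the notebook does not reshuffle the module.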
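A similar standalone sketch of the config precedence implemented by `resolve_config_arg` in patch 2: the name of a variable in the user namespace wins, then `key=value` pairs, then a JSON string. Here `user_ns` is a plain dict standing in for `self.shell.user_ns`, and the body of `parse_key_value_config` is assumed from the lines visible in the diff (values stay strings). Unlike the patched method, which leaves `config` unbound when none of the branches match, this sketch returns `False` explicitly for unrecognized input.

```python
import json
from typing import Any, Dict, Optional, Union


def parse_key_value_config(config_string: str) -> Dict[str, str]:
    """Parse 'a=1 b=2' into {'a': '1', 'b': '2'}; values stay strings."""
    config = {}
    for item in config_string.split():
        key, value = item.split("=")
        config[key] = value
    return config


def resolve_config_arg(
    config_arg: Optional[str], user_ns: Dict[str, Any]
) -> Union[bool, dict]:
    """Return a config dict, or False if `config_arg` can't be resolved."""
    # default: `-c/--config` not passed
    if config_arg is None:
        return {}
    # 1. name of a (truthy) variable in the notebook namespace
    if user_ns.get(config_arg):
        return user_ns[config_arg]
    # 2. whitespace-separated key=value pairs
    if "=" in config_arg:
        return parse_key_value_config(config_arg)
    # 3. JSON string; strip the quotes IPython leaves around the argument
    if ":" in config_arg:
        try:
            return json.loads(config_arg.strip("'\""))
        except json.JSONDecodeError:
            print(f"JSONDecodeError: Failed to parse `config` as JSON. Received {config_arg}")
            return False
    # fail explicitly instead of falling through with `config` unbound
    print(f"Could not resolve `config`. Received {config_arg}")
    return False
```

Because the `=` check runs before the JSON check, a JSON string whose values contain `=` is routed to the key=value parser; this sketch keeps that ordering from the patch.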
From 99a3c23fb9e074a155b68edd8622824f639b718c Mon Sep 17 00:00:00 2001 From: zilto Date: Tue, 28 May 2024 13:34:51 -0400 Subject: [PATCH 4/7] fixed dynamic module registration leading to a bug for function_modifiers.macros --- hamilton/ad_hoc_utils.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hamilton/ad_hoc_utils.py b/hamilton/ad_hoc_utils.py index a0b85b933..e5a9591f5 100644 --- a/hamilton/ad_hoc_utils.py +++ b/hamilton/ad_hoc_utils.py @@ -101,10 +101,10 @@ def create_module(source: str, module_name: str = None, verbosity: int = 0) -> M # Load the module from the temporary file spec = importlib.util.spec_from_file_location(module_name, module_path) module_object = importlib.util.module_from_spec(spec) - spec.loader.exec_module(module_object) # Register the module in sys.modules sys.modules[module_name] = module_object + spec.loader.exec_module(module_object) # Clean up the temporary file on interpreter shutdown def cleanup(module_path=module_path): From b35d96ae2a7721d922077689a0b8c8a78d5eed5a Mon Sep 17 00:00:00 2001 From: zilto Date: Tue, 28 May 2024 13:35:34 -0400 Subject: [PATCH 5/7] added try/except over dynamic module creation --- hamilton/plugins/jupyter_magic.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hamilton/plugins/jupyter_magic.py b/hamilton/plugins/jupyter_magic.py index f8ee0043f..a3aefb539 100644 --- a/hamilton/plugins/jupyter_magic.py +++ b/hamilton/plugins/jupyter_magic.py @@ -336,7 +336,11 @@ def B(A: int) -> bool: # create_module() is preferred over module_from_source() to simplify # the integration with the Hamilton UI which assumes physical Python modules - cell_module = ad_hoc_utils.create_module(cell, module_name) + try: + cell_module = ad_hoc_utils.create_module(cell, module_name) + except BaseException as e: + print("Failed to build the module. Stack trace:") + raise e self.shell.push({module_name: cell_module}) # build the Driver. 
the Builder is copied to avoid conflict with the user namespace From 8d115e207aff8b3979b265aeb7581ee4bc53aebf Mon Sep 17 00:00:00 2001 From: zilto Date: Tue, 28 May 2024 14:25:37 -0400 Subject: [PATCH 6/7] -i/--identifier of incr_cell_to_module is now a required positional arg --- hamilton/plugins/jupyter_magic.py | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/hamilton/plugins/jupyter_magic.py b/hamilton/plugins/jupyter_magic.py index a3aefb539..68ec4c194 100644 --- a/hamilton/plugins/jupyter_magic.py +++ b/hamilton/plugins/jupyter_magic.py @@ -397,11 +397,8 @@ def B(A: int) -> bool: @magic_arguments() @argument("module_name", nargs="?", help="Name for the module defined in this cell.") @argument( - "-id", - "--identifier", - type=int, - help="Identifier for this cell w.r.t. the module being created. ", - ) # required argument + "identifier", type=int, help="Identifier for this cell w.r.t. the module being created." + ) @argument( "-c", "--config", @@ -481,7 +478,7 @@ def incr_cell_to_module(self, line, cell): # build module source from multiple cells module_dict = self.incremental_cells_state[module_name] - sorted_module_keys = sorted(list(module_dict[module_name].keys())) + sorted_module_keys = sorted(list(module_dict.keys())) module_source = "\n\n".join([module_dict[k] for k in sorted_module_keys]) multi_cell_module = ad_hoc_utils.create_module(module_source, module_name) self.shell.push({module_name: multi_cell_module}) From bbf490f310b7051dd934c4f4a79656e14e530d5c Mon Sep 17 00:00:00 2001 From: zilto Date: Tue, 28 May 2024 14:27:14 -0400 Subject: [PATCH 7/7] updated existing examples/ notebooks --- .../LLM_Workflows/NER_Example/notebook.ipynb | 14 +- .../simple_pipeline.ipynb | 20 +- examples/jupyter_notebook_magic/example.ipynb | 2324 +++++++++++++---- .../jupyter_notebook_magic/tutorial.ipynb | 2093 --------------- 4 files changed, 1884 insertions(+), 2567 deletions(-) delete mode 100644 
examples/jupyter_notebook_magic/tutorial.ipynb diff --git a/examples/LLM_Workflows/NER_Example/notebook.ipynb b/examples/LLM_Workflows/NER_Example/notebook.ipynb index 79d202089..f71f65d92 100644 --- a/examples/LLM_Workflows/NER_Example/notebook.ipynb +++ b/examples/LLM_Workflows/NER_Example/notebook.ipynb @@ -190,7 +190,7 @@ } ], "source": [ - "%%incr_cell_to_module ner_module -i 1 --display\n", + "%%incr_cell_to_module ner_module 1 --display\n", "\n", "from datasets import Dataset\n", "from hamilton.function_modifiers import load_from, save_to, source, value\n", @@ -419,7 +419,7 @@ } ], "source": [ - "%%incr_cell_to_module ner_module -i 2 --display\n", + "%%incr_cell_to_module ner_module 2 --display\n", "\n", "import torch\n", "from transformers import (\n", @@ -505,7 +505,7 @@ "source": [ "# this is what the NER pipeline produces\n", "text = \"The Mars Rover from NASA reached the red planet yesterday.\"\n", - "ner_pipeline(model(NER_model_id()), tokenizer(NER_model_id()), \"cpu\")([text])" + "ner_module.ner_pipeline(model(NER_model_id()), tokenizer(NER_model_id()), \"cpu\")([text])" ] }, { @@ -720,7 +720,7 @@ } ], "source": [ - "%%incr_cell_to_module ner_module -i 3 --display\n", + "%%incr_cell_to_module ner_module 3 --display\n", "from sentence_transformers import SentenceTransformer\n", "\n", "def retriever(\n", @@ -767,7 +767,7 @@ ], "source": [ "# what the embedding model produces -- just show first 10 numbers\n", - "retriever(\"cpu\").encode([\"this is some text\"])[0][0:10]" + "ner_module.retriever(\"cpu\").encode([\"this is some text\"])[0][0:10]" ] }, { @@ -1060,7 +1060,7 @@ } ], "source": [ - "%%incr_cell_to_module ner_module -i 4 --display\n", + "%%incr_cell_to_module ner_module 4 --display\n", "from datasets.formatting.formatting import LazyBatch\n", "from typing import Union\n", "\n", @@ -1937,7 +1937,7 @@ } ], "source": [ - "%%incr_cell_to_module ner_module -i 5 --display \n", + "%%incr_cell_to_module ner_module 5 --display \n", "\n", "import 
lancedb\n", "import numpy as np\n", diff --git a/examples/LLM_Workflows/RAG_document_extract_chunk_embed/simple_pipeline.ipynb b/examples/LLM_Workflows/RAG_document_extract_chunk_embed/simple_pipeline.ipynb index 8f39f04dc..c57e99a8d 100644 --- a/examples/LLM_Workflows/RAG_document_extract_chunk_embed/simple_pipeline.ipynb +++ b/examples/LLM_Workflows/RAG_document_extract_chunk_embed/simple_pipeline.ipynb @@ -131,7 +131,7 @@ } ], "source": [ - "%%incr_cell_to_module doc_pipeline -i 1 --display\n", + "%%incr_cell_to_module doc_pipeline 1 --display\n", "\n", "from typing import NamedTuple, Optional\n", "\n", @@ -266,7 +266,7 @@ } ], "source": [ - "%%incr_cell_to_module doc_pipeline -i 2 --display\n", + "%%incr_cell_to_module doc_pipeline 2 --display\n", "import requests \n", "import re\n", "import uuid\n", @@ -320,7 +320,7 @@ "# print(requests.get(\"https://hamilton.dagworks.io/en/latest/concepts/best-practices/code-organization/\").text)\n", "# we can test that this works by running the functions:\n", "url = \"https://hamilton.dagworks.io/en/latest/concepts/best-practices/code-organization/\"\n", - "raw_document(url, html_regex())" + "doc_pipeline.raw_document(url, doc_pipeline.html_regex())" ] }, { @@ -480,7 +480,7 @@ } ], "source": [ - "%%incr_cell_to_module doc_pipeline -i 3 --display\n", + "%%incr_cell_to_module doc_pipeline 3 --display\n", "\n", "from langchain import text_splitter\n", "\n", @@ -548,7 +548,7 @@ ], "source": [ "# example what the HTML chunker is doing:\n", - "html_chunker().split_text(\"

title

some text

some more text

subsection1

section text
more text

\")" + "doc_pipeline.html_chunker().split_text(\"

title

some text

some more text

subsection1

section text
more text

\")" ] }, { @@ -579,7 +579,7 @@ ], "source": [ "# example what the text chunker is doing\n", - "text_chunker(5, 0).split_text(\"this is some text\")" + "doc_pipeline.text_chunker(5, 0).split_text(\"this is some text\")" ] }, { @@ -771,7 +771,7 @@ } ], "source": [ - "%%incr_cell_to_module doc_pipeline -i 4 --display\n", + "%%incr_cell_to_module doc_pipeline 4 --display\n", "import openai\n", "\n", "def client() -> openai.OpenAI:\n", @@ -1813,7 +1813,7 @@ ], "source": [ "# example\n", - "client().embeddings.create(input=\"this is some text that will change into a vector\", model=\"text-embedding-3-small\").data[0].embedding" + "doc_pipeline.client().embeddings.create(input=\"this is some text that will change into a vector\", model=\"text-embedding-3-small\").data[0].embedding" ] }, { @@ -2011,7 +2011,7 @@ } ], "source": [ - "%%incr_cell_to_module doc_pipeline -i 5 --display\n", + "%%incr_cell_to_module doc_pipeline 5 --display\n", "import pandas as pd\n", "\n", "def store(\n", @@ -3051,7 +3051,7 @@ "metadata": {}, "outputs": [], "source": [ - "%%incr_cell_to_module parallel_pipeline -i 1 \n", + "%%cell_to_module parallel_pipeline\n", "# we create a new module called parallel_pipeline\n", "from hamilton.htypes import Collect, Parallelizable\n", "import pandas as pd\n", diff --git a/examples/jupyter_notebook_magic/example.ipynb b/examples/jupyter_notebook_magic/example.ipynb index b1150fb87..602a795e0 100644 --- a/examples/jupyter_notebook_magic/example.ipynb +++ b/examples/jupyter_notebook_magic/example.ipynb @@ -1,106 +1,157 @@ { "cells": [ { - "cell_type": "code", - "execution_count": 13, - "id": "initial_id", - "metadata": { - "ExecuteTime": { - "start_time": "2024-02-13T06:33:30.183459Z" - }, - "is_executing": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "The hamilton.plugins.jupyter_magic extension is already loaded. 
To reload it, use:\n", - " %reload_ext hamilton.plugins.jupyter_magic\n" - ] - } - ], + "cell_type": "markdown", + "id": "cff304da", + "metadata": {}, + "source": [ + "# Hamilton notebook extension\n", + "Jupyter magics are commands that can be executed in notebooks using `%` and `%%` in code cells.\n", + "- **Line magics** start with `%` and apply to the current line\n", + "- **Cell magics** start with `%%`, need to be the first line of a cell, and apply to the entire cell.\n", + "\n", + " You can think of them as Python decorators for lines and cells.\n", + "\n", + "> For example, `%timeit complex_function()` will return the time to execute `complex_function()` and adding `%%timeit` will return the time to execute the entire cell.\n", + "\n", + "This notebook is a tutorial on the Hamilton Jupyter magics and how they can improve your interactive development experience. It is meant to be read and have all cells executed linearly.\n", + "\n", + "- **Section 2** - Dataflow definition\n", + "- **Section 3** - Dataflow execution\n", + "\n", + "> ⚠ This notebook extension is something we're actively developing. If you find any bugs, edge cases, performance impacts, or if you have feature requests, let us know." + ] + }, + { + "cell_type": "markdown", + "id": "245fa568", + "metadata": {}, "source": [ - "# load some extensions / magic...\n", - "%load_ext hamilton.plugins.jupyter_magic" + "## 1. Loading the extension" ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "f829eb1d88585ff", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:15.104283Z", - "start_time": "2024-02-13T06:29:15.098328Z" - }, - "collapsed": false - }, - "outputs": [], + "cell_type": "markdown", + "id": "bddc7450", + "metadata": {}, "source": [ - "# import hamilton modules\n", - "from hamilton import driver\n", - "from hamilton import lifecycle" + "To load our Jupyter Magic, we use `%load_ext` with the import path for the Python module (as if you did `import ...`). 
You only need to load it once, and will need to reload it if you restart the kernel just like you would for a Python module." ] }, { "cell_type": "code", - "execution_count": 7, - "id": "f84593b0496cadd1", - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2024-02-13T18:14:06.109282Z", - "start_time": "2024-02-13T18:14:06.087426Z" - } - }, + "execution_count": 1, + "id": "302defaa", + "metadata": {}, "outputs": [], "source": [ - "?%%cell_to_module \n", - "# one way to show usage notes" + "%reload_ext hamilton.plugins.jupyter_magic\n", + "from hamilton import driver # we'll need this later" + ] + }, + { + "cell_type": "markdown", + "id": "bb28d555", + "metadata": {}, + "source": [ + "After loading the extension, Hamilton magics become available:\n", + "- `%%cell_to_module`\n", + "- `%module_to_cell`\n", + "\n", + "This notebook will cover them one-by-one, but if you need a quick refresher you can prepend `?` to get help." ] }, { "cell_type": "code", + "execution_count": 2, + "id": "f4a9d444", + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Help for %%cell_to_module magic:\n", - " -m, --module_name: Module name to provide. Default is jupyter_module.\n", - " -c, --config: JSON config string, or variable name containing config to use.\n", - " -r, --rebuild-drivers: Flag to rebuild drivers.\n", - " -d, --display: Flag to visualize dataflow.\n", - " -v, --verbosity: of standard output. 0 to hide. 1 is normal, default.\n" + "\u001b[0;31mDocstring:\u001b[0m\n", + "::\n", + "\n", + " %cell_to_module [-m [MODULE_NAME]] [-d [DISPLAY]] [-x [EXECUTE]]\n", + " [-b BUILDER] [-c CONFIG] [-i INPUTS] [-o OVERRIDES]\n", + " [--hide_results] [-w [WRITE_TO_FILE]]\n", + " [module_name]\n", + "\n", + "Turn a notebook cell into a Hamilton module definition. 
This allows you to define\n", + "and execute a dataflow from a single cell.\n", + "\n", + "For example:\n", + "```\n", + "%%cell_to_module dataflow --display --execute\n", + "def A() -> int:\n", + " return 37\n", + "\n", + "def B(A: int) -> bool:\n", + " return (A % 3) > 2\n", + "```\n", + "\n", + "positional arguments:\n", + " module_name Name for the module defined in this cell.\n", + "\n", + "options:\n", + " -m <[MODULE_NAME]>, --module_name <[MODULE_NAME]>\n", + " Alias for positional argument `module_name`. There for\n", + " backwards compatibility. Prefer the position arg.\n", + " -d <[DISPLAY]>, --display <[DISPLAY]>\n", + " Display the dataflow. The argument is the variable\n", + " name of a dictionary of visualization kwargs; else {}.\n", + " -x <[EXECUTE]>, --execute <[EXECUTE]>\n", + " Execute the dataflow. The argument is the variable\n", + " name of a list of nodes; else execute all nodes.\n", + " -b BUILDER, --builder BUILDER\n", + " Builder to which the module will be added and used for\n", + " execution. Allows to pass Config and Adapters\n", + " -c CONFIG, --config CONFIG\n", + " Config to build a Driver. Passing -c/--config at the\n", + " same time as a Builder -b/--builder with a config will\n", + " raise an exception.\n", + " -i INPUTS, --inputs INPUTS\n", + " Execution inputs. The argument is the variable name of\n", + " a dict of inputs; else {}.\n", + " -o OVERRIDES, --overrides OVERRIDES\n", + " Execution overrides. The argument is the variable name\n", + " of a dict of overrides; else {}.\n", + " --hide_results Hides the automatic display of execution results.\n", + " -w <[WRITE_TO_FILE]>, --write_to_file <[WRITE_TO_FILE]>\n", + " Write cell content to a file. 
The argument is the file\n", + " path; else write to {module_name}.py\n", + "\u001b[0;31mFile:\u001b[0m ~/projects/dagworks/hamilton/hamilton/plugins/jupyter_magic.py" ] } ], "source": [ - "%%cell_to_module --help \n", - "# shows --help message" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2024-02-13T18:14:07.660691Z", - "start_time": "2024-02-13T18:14:07.644540Z" - } - }, - "id": "a7907aac424f56f4", - "execution_count": 8 + "?%%cell_to_module" + ] + }, + { + "cell_type": "markdown", + "id": "8db7e808", + "metadata": {}, + "source": [ + "## 2. Define a Hamilton dataflow" + ] + }, + { + "cell_type": "markdown", + "id": "6d310207", + "metadata": {}, + "source": [ + "### 2.1 Basics\n", + "The main magic is `%%cell_to_module MODULE_NAME` which turns a cell into a temporary Python module in-memory. Successful cell execution means it's a valid Hamilton dataflow." + ] }, { "cell_type": "code", - "execution_count": 4, - "id": "f76707690893f061", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:19.482875Z", - "start_time": "2024-02-13T06:29:17.737719Z" - }, - "collapsed": false - }, + "execution_count": 3, + "id": "a2f33575", + "metadata": {}, "outputs": [ { "data": { @@ -108,266 +159,217 @@ "\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", "\n", "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "joke_messages\n", - "\n", - "joke_messages\n", - "list\n", - "\n", - "\n", - "\n", - "joke_response\n", - "\n", - "joke_response\n", - "str\n", - "\n", - "\n", - "\n", - "joke_messages->joke_response\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "llm_client\n", - "\n", - "llm_client\n", - "OpenAI\n", - "\n", - "\n", - "\n", - "llm_client->joke_response\n", - "\n", - "\n", + "\n", + "Legend\n", "\n", "\n", - "\n", + "\n", "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt->joke_messages\n", - "\n", - "\n", + "\n", + 
"joke_prompt\n", + "str\n", "\n", "\n", - "\n", + "\n", "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", + "\n", + "topic\n", + "str\n", "\n", "\n", - "\n", + "\n", "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", - "\n", + "\n", "input\n", - "\n", - "input\n", + "\n", + "input\n", "\n", "\n", - "\n", + "\n", "function\n", - "\n", - "function\n", + "\n", + "function\n", "\n", "\n", "\n" ], "text/plain": [ - "" + "" ] }, - "execution_count": 4, "metadata": {}, - "output_type": "execute_result" + "output_type": "display_data" } ], "source": [ - "%%cell_to_module -m joke --display --rebuild-drivers \n", - "# The above directive does three things: \n", - "# 1. it creates a module with the contents of this cell\n", - "# 2. it imports the module under the name `joke`.\n", - "# 3. it displays the contents of the module.\n", - "# 4. if changes are made to the module, it will be reloaded automatically. If you constructed a driver with this module, you won't need to re-create it (in most cases).\n", - "# 5. 
if there is configuration passed, it will be used to help display the module.\n", - "# %%write_file joke.py\n", - "# Once you are happy with your code, you can write it to a file using the `write_file` magic command, thereby creating a module that can be imported elsewhere.\n", - "from typing import List\n", - "\n", - "import openai\n", - "\n", - "\n", - "def llm_client() -> openai.OpenAI:\n", - " return openai.OpenAI()\n", - "\n", - "\n", + "%%cell_to_module joke -d\n", "def joke_prompt(topic: str) -> str:\n", - " return f\"Tell me a short joke about {topic}\"\n", - "\n", - "\n", - "def joke_messages(joke_prompt: str) -> List[dict]:\n", - " return [{\"role\": \"user\", \"content\": joke_prompt}]\n", - "\n", - "\n", - "def joke_response(llm_client: openai.OpenAI,\n", - " joke_messages: List[dict]) -> str:\n", - " response = llm_client.chat.completions.create(\n", - " model=\"gpt-3.5-turbo\",\n", - " messages=joke_messages,\n", - " )\n", - " return response.choices[0].message.content" + " return f\"Tell me a short joke about {topic}\"" ] }, { - "cell_type": "code", - "execution_count": 5, - "id": "42b6eb0c9ac397bc", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:19.517474Z", - "start_time": "2024-02-13T06:29:19.502194Z" - }, - "collapsed": false - }, - "outputs": [], + "cell_type": "markdown", + "id": "54e60f6b", + "metadata": {}, "source": [ - "# create a driver --- this will be auto rebuilt (if you turn that on)\n", - "dr = (\n", - " driver.Builder()\n", - " .with_modules(joke)\n", - " .with_config({\"dummy\": \"config\"})\n", - " .build()\n", - ")" + "The module name allows to namespace functions " ] }, { "cell_type": "code", - "execution_count": 6, - "id": "e372203e0f3ab1b6", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:20.866327Z", - "start_time": "2024-02-13T06:29:19.569013Z" - }, - "collapsed": false - }, + "execution_count": 4, + "id": "b9676cc6", + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ - "{'joke_response': 'Sure, here it is:\\n\\nWhy did the ice cream go to therapy?\\nBecause it had too many scoops of emotions!'}\n" + "Tell me a short joke about hello\n", + "Tell me a short joke about greetings\n" ] } ], "source": [ - "print(dr.execute([\"joke_response\"],\n", - " inputs={\"topic\": \"ice cream\"}))" + "print(joke.joke_prompt(topic=\"hello\"))\n", + "print(joke_prompt(topic=\"greetings\"))" + ] + }, + { + "cell_type": "markdown", + "id": "15977369", + "metadata": {}, + "source": [ + "### 2.2 Module imports\n", + "Code found in cells with `%%cell_to_module` is treated like an isolated `.py` file. This means you need to define Python imports in the cell itself. " ] }, { "cell_type": "code", - "execution_count": 7, - "id": "5d20afdf40d37441", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:20.897770Z", - "start_time": "2024-02-13T06:29:20.886666Z" - }, - "collapsed": false - }, + "execution_count": 5, + "id": "aaf63f03", + "metadata": {}, "outputs": [], "source": [ - "dr2 = (\n", - " driver.Builder()\n", - " .with_modules(joke)\n", - " .with_adapters(lifecycle.PrintLn()) # this driver will not be rebuilt because of the adapter.\n", - " .build()\n", - ")" + "%%cell_to_module joke\n", + "from typing import Optional # remove to get `NameError: name 'Optional' is not defined`\n", + "\n", + "def joke_prompt(topic: Optional[str] = None) -> str:\n", + " return f\"Tell me a short joke about {topic}\"" + ] + }, + { + "cell_type": "markdown", + "id": "b9e3f0a0", + "metadata": {}, + "source": [ + "### 2.3 Display module\n", + "You can visualize with the `--display / -d` argument.
] }, { "cell_type": "code", - "execution_count": 8, - "id": "fcedb47a8a4c1792", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:22.187332Z", - "start_time": "2024-02-13T06:29:21.011468Z" - }, - "collapsed": false - }, + "execution_count": 6, + "id": "378a4552", + "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Executing node: llm_client.\n", - "Finished debugging node: llm_client in 9.78ms. Status: Success.\n", - "Executing node: joke_prompt.\n", - "Finished debugging node: joke_prompt in 15.7μs. Status: Success.\n", - "Executing node: joke_messages.\n", - "Finished debugging node: joke_messages in 11μs. Status: Success.\n", - "Executing node: joke_response.\n", - "Finished debugging node: joke_response in 1.25s. Status: Success.\n", - "{'joke_response': 'Why did the corn chips go to therapy?\\n\\nBecause they were feeling a little \"salty\" about always being \"dipped\" in salsa without any appreciation!'}\n" - ] + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" } ], "source": [ - "print(dr2.execute([\"joke_response\"],\n", - " inputs={\"topic\": \"corn chips\"}))" + "%%cell_to_module joke --display\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Tell me a short joke about {topic}\"" ] }, { "cell_type": "code", - "execution_count": 9, - "id": "cfd9cbe7f9630ade", - "metadata": { - "ExecuteTime": { - "end_time": 
"2024-02-13T06:29:22.195026Z", - "start_time": "2024-02-13T06:29:22.187770Z" - }, - "collapsed": false - }, + "execution_count": 7, + "id": "94772b5a", + "metadata": {}, "outputs": [], "source": [ - "# some configuration \n", - "conf = {\"some_key\":\"some_value\"}" + "display_config = dict(orient=\"TB\") # orient visualization top to bottom" ] }, { "cell_type": "code", - "execution_count": 10, - "id": "f7c21d5eaa95cc0a", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:22.701173Z", - "start_time": "2024-02-13T06:29:22.399709Z" - }, - "collapsed": false - }, + "execution_count": 8, + "id": "af95a65e", + "metadata": {}, "outputs": [ { "data": { @@ -375,66 +377,103 @@ "\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", "\n", "cluster__legend\n", - "\n", - "Legend\n", + "\n", + "Legend\n", "\n", - "\n", + "\n", "\n", - "some_key\n", - "\n", - "\n", - "\n", - "some_key\n", - "typing.Any\n", - "\n", - "\n", - "\n", - "world\n", - "\n", - "world\n", - "str\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", "\n", - "\n", - "\n", - "hello\n", - "\n", - "hello\n", - "str\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", "\n", - "\n", + "\n", "\n", - "hello->world\n", - "\n", - "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "config\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", "\n", "\n", - "\n", + "\n", "function\n", - "\n", - "function\n", + "\n", + "function\n", "\n", "\n", "\n" ], "text/plain": [ - "" + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display display_config\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Tell me a short joke about {topic}\"" + ] + }, + { + "cell_type": "markdown", + "id": "f8b86615", + "metadata": {}, + "source": [ + "### 2.4 Write module to file\n", + "To make the transition from 
notebook to module easy and avoid copy-pasting, you can use `--write_to_file / -w`. This will write the content of the cell to `{MODULE_NAME}.py`. You can also specify a destination file path explicitly.\n", + "\n", + "> ⛔ Be careful: this command can overwrite existing files. Use git to version your files.\n", + "\n", + "After running the next cell, you should see `joke.py` generated in your directory." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "0b428a30", + "metadata": {}, + "outputs": [], + "source": [ + "%%cell_to_module joke --write_to_file\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "f52076cb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there? Cowsays\"" ] }, "execution_count": 10, @@ -443,33 +482,30 @@ } ], "source": [ - "%%cell_to_module -m hello --display --config conf --rebuild-drivers\n", - "# shows how to pass in configuration for display\n", - "from hamilton.function_modifiers import config\n", - "\n", - "def hello()->str:\n", - " return \"hi\"\n", + "import joke\n", + "joke.joke_prompt(\"Cowsays\")" + ] + }, + { + "cell_type": "markdown", + "id": "6f10bf5f", + "metadata": {}, + "source": [ + "### 2.5 Configure the `Driver`\n", + "When using the `@config` function modifier, you might need to pass a configuration to properly build your dataflow. You can do this inline with the `-c/--config` argument. It supports 3 different formats:\n", "\n", - "@config.when(some_key=\"some_value\")\n", - "def world__1(hello: str)-> str:\n", - " return f\"{hello} world\"\n", + "1. **Variable name** `--config my_config` where `my_config` is a variable e.g., `my_config=dict(a=True, b=-1)`\n", + "2. **Key-value** `--config a=True, b=-1` evaluates to `dict(a=\"True\", b=\"-1\")` since everything is interpreted as strings.\n", + "3.
**JSON** `--config '{\"a\": true, \"b\": -1}'` evaluates to `dict(a=True, b=-1)`. For valid JSON, you need double quotes `\"` inside and have single quotes `'` outside.\n", "\n", - "@config.when_not(some_key=\"some_value\")\n", - "def world__2(hello: str)-> str:\n", - " return f\"World {hello}\"" + "Here are examples. Notice how the config is properly displayed." ] }, { "cell_type": "code", "execution_count": 11, - "id": "fb7135d8a7078a40", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:23.617271Z", - "start_time": "2024-02-13T06:29:23.339236Z" - }, - "collapsed": false - }, + "id": "3c2b11be", + "metadata": {}, "outputs": [ { "data": { @@ -477,191 +513,1565 @@ "\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", "\n", "cluster__legend\n", - "\n", - "Legend\n", + "\n", + "Legend\n", "\n", - "\n", + "\n", "\n", - "joke_messages\n", - "\n", - "joke_messages\n", - "list\n", - "\n", - "\n", - "\n", - "joke_response\n", - "\n", - "joke_response\n", - "str\n", - "\n", - "\n", - "\n", - "joke_messages->joke_response\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "world\n", - "\n", - "world\n", - "str\n", - "\n", - "\n", - "\n", - "some_key\n", - "\n", - "\n", - "\n", - "some_key\n", - "typing.Any\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", "\n", "\n", - "\n", + "\n", "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt->joke_messages\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "llm_client\n", - "\n", - "llm_client\n", - "OpenAI\n", - "\n", - "\n", - "\n", - "llm_client->joke_response\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "hello\n", - "\n", - "hello\n", - "str\n", - "\n", - "\n", - "\n", - "hello->world\n", - "\n", - "\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", "\n", "\n", - "\n", + "\n", "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", + "\n", + "topic\n", + "str\n", "\n", "\n", - "\n", + "\n", 
"_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", - "\n", + "\n", "config\n", - "\n", - "\n", - "\n", - "config\n", + "\n", + "\n", + "\n", + "config\n", "\n", "\n", - "\n", + "\n", "input\n", - "\n", - "input\n", + "\n", + "input\n", "\n", "\n", - "\n", + "\n", "function\n", - "\n", - "function\n", + "\n", + "function\n", "\n", "\n", "\n" ], "text/plain": [ - "" + "" ] }, - "execution_count": 11, "metadata": {}, - "output_type": "execute_result" + "output_type": "display_data" } ], "source": [ - "# shows multiple modules -- this will be rebuilt correctly if either \"module\" is updated if you\n", - "# turn rebuilding drivers on (`--rebuild-drivers`).\n", - "dr3 = (\n", - " driver.Builder()\n", - " .with_modules(joke, hello)\n", - " .with_config(conf)\n", - " .build()\n", - " )\n", - "dr3.display_all_functions()" + "%%cell_to_module joke --display --config '{\"knock_joke\": \"true\"}'\n", + "from hamilton.function_modifiers import config\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? 
{topic}\"" ] }, { "cell_type": "code", "execution_count": 12, - "id": "a2b844b28df52160", - "metadata": { - "ExecuteTime": { - "end_time": "2024-02-13T06:29:24.456838Z", - "start_time": "2024-02-13T06:29:24.447092Z" - }, - "collapsed": false - }, + "id": "39bc255e", + "metadata": {}, "outputs": [ { "data": { - "text/plain": [ - "{'world': 'hi world'}" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --config knock_joke=true\n", + "from hamilton.function_modifiers import config\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? 
{topic}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "ebcc2729", + "metadata": {}, + "outputs": [], + "source": [ + "my_config = dict(knock_joke=\"true\")" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "a30d4cec", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --config my_config\n", + "from hamilton.function_modifiers import config\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0fc21caf", + "metadata": {}, + "source": [ + "### 2.6 Build a `Driver`" + ] + }, + { + "cell_type": "markdown", + "id": "37873813", + "metadata": {}, + "source": [ + "The `Driver` definition can be required to properly build some Hamilton dataflows, in particular those using `.with_config() / @config` and `Parallelizable[]/Collect[]`. We can pass a `Builder` object using `-b/--builder`.\n", + "\n", + "Here is an example. Notice how the config from the `Builder` is properly displayed."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "2582a54d", + "metadata": {}, + "outputs": [], + "source": [ + "my_builder = (\n", + " driver.Builder()\n", + " .enable_dynamic_execution(allow_experimental_mode=True)\n", + " .with_config({\"knock_joke\": \"true\"})\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "922a4255", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "\n", + "\n", + "\n", + "knock_joke\n", + "true\n", + "\n", + "\n", + "\n", + "topic\n", + "\n", + "\n", + "topic\n", + "Parallelizable\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt: knock_joke\n", + "str\n", + "\n", + "\n", + "\n", + "topic->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt_collection\n", + "\n", + "\n", + "joke_prompt_collection\n", + "list\n", + "\n", + "\n", + "\n", + "joke_prompt->joke_prompt_collection\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "config\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "expand\n", + "\n", + "\n", + "expand\n", + "\n", + "\n", + "\n", + "collect\n", + "\n", + "\n", + "collect\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display -b my_builder\n", + "from hamilton.htypes import Parallelizable, Collect\n", + "from hamilton.function_modifiers import config\n", + "\n", + "def topic() -> Parallelizable[str]:\n", + " for t in [\"Tom\", \"Jerry\"]:\n", + " yield t\n", + "\n", + "@config.when(knock_joke=\"true\")\n", + "def joke_prompt__knock(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? 
{topic}\"\n", + "\n", + "def joke_prompt_collection(joke_prompt: Collect[str]) -> list:\n", + " return list(joke_prompt)" + ] + }, + { + "cell_type": "markdown", + "id": "8c2774be", + "metadata": {}, + "source": [ + "### 2.7 Load external modules\n", + "While developing your dataflow with `%%cell_to_module`, you might want to load nodes from another Python module. To do so, import it and add it to the `Driver` using `.with_modules()`." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "c59a633b", + "metadata": {}, + "outputs": [], + "source": [ + "my_builder = driver.Builder().with_modules(joke)" + ] + }, + { + "cell_type": "markdown", + "id": "eb6c1b06", + "metadata": {}, + "source": [ + "The nodes `topic` and `joke_prompt` originate from `joke.py`." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "2ca32ee2", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_response\n", + "\n", + "joke_response\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt->joke_response\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module reply --display --builder my_builder\n", + "def joke_response(joke_prompt: str) -> str:\n", + " return f\"{joke_prompt}\\n\\nCowsay who?\"" + ] + }, + { + "cell_type": "markdown", + "id": "2c7624a0", + "metadata":
{}, + "source": [ + "### 2.8 Edit external modules\n", + "It is also possible to load the content of a Python module into a notebook cell to edit it interactively!\n", + "\n", + "This is essentially the reverse operation of `%%cell_to_module`, hence the name `%module_to_cell`. This is a line magic (single `%`) that interprets the rest of the line as the path to a `.py` file." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "d72ac640", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# execute this to generate a new cell\n", + "%module_to_cell ./joke.py" + ] + }, + { + "cell_type": "markdown", + "id": "c8331b92", + "metadata": {}, + "source": [ + "If you executed the previous cell, a new code cell was created above with the content of `joke.py`. You can add `--write_to_file` to write the notebook cell back to the file." + ] + }, + { + "cell_type": "markdown", + "id": "ed689861", + "metadata": {}, + "source": [ + "## 3. Execute a dataflow\n", + "One of the best parts about notebooks is the ability to execute code and immediately inspect results. They provide a \"read-eval-print\" loop (REPL) coding experience. With this extension, you can use a single notebook cell to define and execute your dataflow for a tight feedback loop.\n", + "\n", + "If you're familiar with Hamilton, you probably figured out that you can build a `Driver` from the dynamically defined modules (as in the next cell). But we have better interactive options!" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "ed95343f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'joke_prompt': \"Knock, knock. Who's there?
Cowsay\"}" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dr = driver.Builder().with_modules(joke).build()\n", + "results = dr.execute([\"joke_prompt\"], inputs=dict(topic=\"Cowsay\"))\n", + "results" + ] + }, + { + "cell_type": "markdown", + "id": "b36418f1", + "metadata": {}, + "source": [ + "### 3.1 Execute cell\n", + "By adding `--execute / -x` to your module definition, the defined dataflow will be executed using `Driver.execute()` with all available nodes.\n", + "\n", + "The `--display` visualization should now include **output** nodes reflecting the executed nodes." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "0da888d5", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "a_dataframe\n", + "\n", + "a_dataframe\n", + "DataFrame\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ab
00a
11b
22c
33d
\n", + "
" + ], + "text/plain": [ + " a b\n", + "0 0 a\n", + "1 1 b\n", + "2 2 c\n", + "3 3 d" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there? Cowsay\"" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Cowsay who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'No, Cowsay MooOOooo'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --execute\n", + "import pandas as pd\n", + "\n", + "def joke_prompt() -> str:\n", + " return f\"Knock, knock. Who's there? Cowsay\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? \")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"\n", + "\n", + "def a_dataframe() -> pd.DataFrame:\n", + " return pd.DataFrame({\"a\": [0, 1, 2, 3], \"b\": [\"a\", \"b\", \"c\", \"d\"]})" + ] + }, + { + "cell_type": "markdown", + "id": "8026ec86", + "metadata": {}, + "source": [ + "👆 As you can see, node results are automatically displayed in topologically sorted order. You can hide them with `--hide_results`." + ] + }, + { + "cell_type": "markdown", + "id": "9a5738f6", + "metadata": {}, + "source": [ + "### 3.2 Requesting nodes\n", + "You can pass to `--execute` the name of a variable holding the list of nodes to execute. This will be reflected in the visualization."
+ ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "d20198e1", + "metadata": {}, + "outputs": [], + "source": [ + "node_to_execute = [\"reply\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "b7eea8d6", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "topic\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "topic->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Cowsay who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --execute node_to_execute\n", + "def topic() -> str:\n", + " return \"Cowsay\"\n", + "\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? \")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" + ] + }, + { + "cell_type": "markdown", + "id": "e10dd1d9", + "metadata": {}, + "source": [ + "### 3.3 Inspecting results\n", + "Ok, but how do you access results? With the node name! 
Magic 🧙" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "5eab142c", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "topic\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "topic->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } ], "source": [ - "dr3.execute([\"world\"], inputs={})" + "%%cell_to_module joke --display --execute --hide_results\n", + "def topic() -> str:\n", + " return \"Cowsay\"\n", + "\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"?
\")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" ] }, { "cell_type": "code", - "execution_count": null, - "id": "f8ef467eeab94c4d", - "metadata": { - "collapsed": false - }, + "execution_count": 27, + "id": "607d3863", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'No, Cowsay MooOOooo'" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "punchline" + ] + }, + { + "cell_type": "markdown", + "id": "ad5f79be", + "metadata": {}, + "source": [ + "The results are assigned to variables matching their node names. This means you can quickly access them by typing their name. In most notebook environments, you get tab-completion for these variables and can view results in the variable inspector (especially useful for dataframes).\n", + "\n", + "What's the magic trick? 🐰\n", + "\n", + "When executing the cell, we are effectively:\n", + "1. Loading its content as a module\n", + "2. Building a `Driver` with this module\n", + "3. Executing the dataflow for all nodes\n", + "4. Assigning values from the results of `.execute()` to variables\n", + "\n", + "Consequently, functions defined in `%%cell_to_module` are replaced by their \"value\". You can still access the functions through their module:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "f9a9df33", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'No, Foxey MooOOooo'" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "joke.punchline(\"Foxey\")" + ] + }, + { + "cell_type": "markdown", + "id": "72b09bee", + "metadata": {}, + "source": [ + "### 3.4 Inputs & overrides\n", + "In Hamilton, *inputs* are values external to the dataflow and *overrides* are values that replace the output of a node (effectively skipping the operations upstream of it).
You can use `--inputs / -i` and `--overrides / -o` to pass dictionaries of values for execution." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "ce7d51d4", + "metadata": {}, + "outputs": [], + "source": [ + "my_inputs = dict(topic=\"monday\")\n", + "my_overrides = dict(punchline=\"Bingo bongo!\")" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "83c61edd", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs\n", + "\n", + "topic\n", + "str\n", + "\n", + "\n", + "\n", + "_joke_prompt_inputs->joke_prompt\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "input\n", + "\n", + "input\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n", + "override\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "override\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'monday'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there?
monday\"" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'monday who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Bingo bongo!'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --display --execute --inputs my_inputs --overrides my_overrides\n", + "def joke_prompt(topic: str) -> str:\n", + " return f\"Knock, knock. Who's there? {topic}\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? \")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" + ] + }, + { + "cell_type": "markdown", + "id": "2c21fd58", + "metadata": {}, + "source": [ + "### 3.5 Driver Adapters\n", + "You can modify execution by passing a `Builder` with adapters to `--builder / -b`. Adapters are flexible tools that provide a variety of features. For instance, the next cells use `PrintLn()` to print the execution status of each node." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "abe2edb4", + "metadata": {}, + "outputs": [], + "source": [ + "from hamilton.lifecycle.default import PrintLn\n", + "my_builder = driver.Builder().with_adapters(PrintLn()) # add the adapter" + ] + }, + { + "cell_type": "markdown", + "id": "7d0b177c", + "metadata": {}, + "source": [ + "Notice how the printed statements report the execution status and duration of each node."
+ ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "ac0a6585", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "%3\n", + "\n", + "\n", + "cluster__legend\n", + "\n", + "Legend\n", + "\n", + "\n", + "\n", + "reply\n", + "\n", + "reply\n", + "str\n", + "\n", + "\n", + "\n", + "punchline\n", + "\n", + "punchline\n", + "str\n", + "\n", + "\n", + "\n", + "reply->punchline\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "joke_prompt\n", + "\n", + "joke_prompt\n", + "str\n", + "\n", + "\n", + "\n", + "joke_prompt->reply\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "function\n", + "\n", + "function\n", + "\n", + "\n", + "\n", + "output\n", + "\n", + "output\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Executing node: joke_prompt.\n", + "Finished debugging node: joke_prompt in 53.4μs. Status: Success.\n", + "Executing node: reply.\n", + "Finished debugging node: reply in 10.5μs. Status: Success.\n", + "Executing node: punchline.\n", + "Finished debugging node: punchline in 9.78μs. Status: Success.\n" + ] + }, + { + "data": { + "text/plain": [ + "\"Knock, knock. Who's there? Cowsay\"" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'Cowsay who?'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'No, Cowsay MooOOooo'" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%%cell_to_module joke --builder my_builder --display --execute \n", + "def joke_prompt() -> str:\n", + " return f\"Knock, knock. Who's there? Cowsay\"\n", + "\n", + "def reply(joke_prompt: str) -> str:\n", + " _, _, right = joke_prompt.partition(\"? 
\")\n", + " return f\"{right} who?\"\n", + "\n", + "def punchline(reply: str) -> str:\n", + " left, _, _ = reply.partition(\" \")\n", + " return f\"No, {left} MooOOooo\"" + ] + }, + { + "cell_type": "markdown", + "id": "15cea16a", + "metadata": {}, + "source": [ + "There are tons of awesome adapters that can help with your notebook experience. Here are a few notable mentions:\n", + "\n", + "1. `hamilton.lifecycle.default.CacheAdapter()` automatically versions the node's code and input values and stores its result on disk. When the same (code, inputs) pair is executed again, it reads the value from disk instead of recomputing it. This can help save LLM API costs!\n", + "2. `hamilton.plugins.h_diskcache.DiskCacheAdapter()` offers the same core features as `CacheAdapter()`, with more utilities for cache management.\n", + "3. `hamilton.lifecycle.default.PrintLn()` prints the execution status.\n", + "4. `hamilton.plugins.h_tqdm.ProgressBar()` adds a progress bar to execution.\n", + "5. `hamilton.lifecycle.default.PDBDebugger()` lets you step into a node with the Python debugger and execute its code line by line.\n", + "\n", + "Note that all of these adapters work with Hamilton outside notebooks too!"
+ ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "venv", "language": "python", "name": "python3" }, @@ -675,7 +2085,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.13" + "version": "3.11.1" } }, "nbformat": 4, diff --git a/examples/jupyter_notebook_magic/tutorial.ipynb b/examples/jupyter_notebook_magic/tutorial.ipynb deleted file mode 100644 index 602a795e0..000000000 --- a/examples/jupyter_notebook_magic/tutorial.ipynb +++ /dev/null @@ -1,2093 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "cff304da", - "metadata": {}, - "source": [ - "# Hamilton notebook extension\n", - "Jupyter magics are commands that can be executed in notebooks using `%` and `%%` in code cells.\n", - "- **Line magics** start with `%` and apply to the current line\n", - "- **Cell magics** start with `%%`, need to be the first line of a cell, and apply to the entire cell.\n", - "\n", - " You can think of them as Python decorators for lines and cells.\n", - "\n", - "> For example, `%timeit complex_function()` will return the time to execute `complex_function()` and adding `%%timeit` will return the time to execute the entire cell.\n", - "\n", - "This notebook is a tutorial on the Hamilton Jupyter magics and how they can improve your interactive development experience. It is meant to be read and have all cells executed linearly.\n", - "\n", - "- **Section 2** - Dataflow definition\n", - "- **Section 3** - Dataflow execution\n", - "\n", - "> ⚠ This notebook extension is something we're actively developing. If you find any bugs, edge cases, performance impacts, or if you have feature requests, let us know." - ] - }, - { - "cell_type": "markdown", - "id": "245fa568", - "metadata": {}, - "source": [ - "## 1. 
Loading the extension" - ] - }, - { - "cell_type": "markdown", - "id": "bddc7450", - "metadata": {}, - "source": [ - "To load our Jupyter Magic, we use `%load_ext` with the import path for the Python module (as if you did `import ...`). You only need to load it once, and will need to reload it if you restart the kernel just like you would for a Python module." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "302defaa", - "metadata": {}, - "outputs": [], - "source": [ - "%reload_ext hamilton.plugins.jupyter_magic\n", - "from hamilton import driver # we'll need this later" - ] - }, - { - "cell_type": "markdown", - "id": "bb28d555", - "metadata": {}, - "source": [ - "After loading the extension, Hamilton magics become available:\n", - "- `%%cell_to_module`\n", - "- `%module_to_cell`\n", - "\n", - "This notebook will cover them one-by-one, but if you need a quick refresher you can prepend `?` to get help." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "f4a9d444", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[0;31mDocstring:\u001b[0m\n", - "::\n", - "\n", - " %cell_to_module [-m [MODULE_NAME]] [-d [DISPLAY]] [-x [EXECUTE]]\n", - " [-b BUILDER] [-c CONFIG] [-i INPUTS] [-o OVERRIDES]\n", - " [--hide_results] [-w [WRITE_TO_FILE]]\n", - " [module_name]\n", - "\n", - "Turn a notebook cell into a Hamilton module definition. This allows you to define\n", - "and execute a dataflow from a single cell.\n", - "\n", - "For example:\n", - "```\n", - "%%cell_to_module dataflow --display --execute\n", - "def A() -> int:\n", - " return 37\n", - "\n", - "def B(A: int) -> bool:\n", - " return (A % 3) > 2\n", - "```\n", - "\n", - "positional arguments:\n", - " module_name Name for the module defined in this cell.\n", - "\n", - "options:\n", - " -m <[MODULE_NAME]>, --module_name <[MODULE_NAME]>\n", - " Alias for positional argument `module_name`. There for\n", - " backwards compatibility. 
Prefer the position arg.\n", - " -d <[DISPLAY]>, --display <[DISPLAY]>\n", - " Display the dataflow. The argument is the variable\n", - " name of a dictionary of visualization kwargs; else {}.\n", - " -x <[EXECUTE]>, --execute <[EXECUTE]>\n", - " Execute the dataflow. The argument is the variable\n", - " name of a list of nodes; else execute all nodes.\n", - " -b BUILDER, --builder BUILDER\n", - " Builder to which the module will be added and used for\n", - " execution. Allows to pass Config and Adapters\n", - " -c CONFIG, --config CONFIG\n", - " Config to build a Driver. Passing -c/--config at the\n", - " same time as a Builder -b/--builder with a config will\n", - " raise an exception.\n", - " -i INPUTS, --inputs INPUTS\n", - " Execution inputs. The argument is the variable name of\n", - " a dict of inputs; else {}.\n", - " -o OVERRIDES, --overrides OVERRIDES\n", - " Execution overrides. The argument is the variable name\n", - " of a dict of overrides; else {}.\n", - " --hide_results Hides the automatic display of execution results.\n", - " -w <[WRITE_TO_FILE]>, --write_to_file <[WRITE_TO_FILE]>\n", - " Write cell content to a file. The argument is the file\n", - " path; else write to {module_name}.py\n", - "\u001b[0;31mFile:\u001b[0m ~/projects/dagworks/hamilton/hamilton/plugins/jupyter_magic.py" - ] - } - ], - "source": [ - "?%%cell_to_module" - ] - }, - { - "cell_type": "markdown", - "id": "8db7e808", - "metadata": {}, - "source": [ - "## 2. Define a Hamilton dataflow" - ] - }, - { - "cell_type": "markdown", - "id": "6d310207", - "metadata": {}, - "source": [ - "### 2.1 Basics\n", - "The main magic is `%%cell_to_module MODULE_NAME` which turns a cell into a temporary Python module in-memory. Successful cell execution means it's a valid Hamilton dataflow." 
- ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "a2f33575", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "input\n", - "\n", - "input\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke -d\n", - "def joke_prompt(topic: str) -> str:\n", - " return f\"Tell me a short joke about {topic}\"" - ] - }, - { - "cell_type": "markdown", - "id": "54e60f6b", - "metadata": {}, - "source": [ - "The module name allows to namespace functions " - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "b9676cc6", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tell me a short joke about hello\n", - "Tell me a short joke about greetings\n" - ] - } - ], - "source": [ - "print(joke.joke_prompt(topic=\"hello\"))\n", - "print(joke_prompt(topic=\"greetings\"))" - ] - }, - { - "cell_type": "markdown", - "id": "15977369", - "metadata": {}, - "source": [ - "### 2.2 Module imports\n", - "Code found in cells with `%%cell_to_module` is treated like an isolated `.py` file. This means you need to define Python imports in the cell itself. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "aaf63f03", - "metadata": {}, - "outputs": [], - "source": [ - "%%cell_to_module joke\n", - "from typing import Optional # remove to get `NameError: name 'Optional' is not defined``\n", - "\n", - "def joke_prompt(topic: Optional[str] = None) -> str:\n", - " return f\"Tell me a short joke about {topic}\"" - ] - }, - { - "cell_type": "markdown", - "id": "b9e3f0a0", - "metadata": {}, - "source": [ - "### 2.3 Display module\n", - "You can visualize with the `--display / -d` argument. It can receive a dictionary of [visualization `kwargs`](https://hamilton.dagworks.io/en/latest/reference/drivers/Driver/#hamilton.driver.Driver.display_all_functions)." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "378a4552", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "input\n", - "\n", - "input\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display\n", - "def joke_prompt(topic: str) -> str:\n", - " return f\"Tell me a short joke about {topic}\"" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "94772b5a", - "metadata": {}, - "outputs": [], - "source": [ - "display_config = dict(orient=\"TB\") # orient visualization top to bottom" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "af95a65e", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", 
- "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "input\n", - "\n", - "input\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display display_config\n", - "def joke_prompt(topic: str) -> str:\n", - " return f\"Tell me a short joke about {topic}\"" - ] - }, - { - "cell_type": "markdown", - "id": "f8b86615", - "metadata": {}, - "source": [ - "### 2.4 Write module to file\n", - "To make the transition from notebook to module easy and avoid copy-pasting, you can use `--write_to_file / -w`. This will copy the content of the file to `{MODULE_NAME}.py`. You can also specify a destination file path explicitly.\n", - "\n", - "> ⛔ Be careful with overwriting files with this command. Use git to version your files.\n", - "\n", - "After the running the next cell, you should see `joke.py` generated in your directory." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "0b428a30", - "metadata": {}, - "outputs": [], - "source": [ - "%%cell_to_module joke --write_to_file\n", - "def joke_prompt(topic: str) -> str:\n", - " return f\"Knock, knock. Who's there? {topic}\"" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "f52076cb", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\"Knock, knock. Who's there? 
Cowsays\"" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import joke\n", - "joke.joke_prompt(\"Cowsays\")" - ] - }, - { - "cell_type": "markdown", - "id": "6f10bf5f", - "metadata": {}, - "source": [ - "### 2.5 Configure the `Driver`\n", - "When using the `@config` function modifiers, you might need to pass a configuration to properly build your dataflow. You can do this inline with the `-c/--config` argument. It supports 3 different format:\n", - "\n", - "1. **Variable name** `--config my_config` where `my_config` is a variable e.g., `my_config=dict(a=True, b=-1)`\n", - "2. **Key-value** `--config a=True, b=-1` evaluates to `dict(a=\"True\", b=\"-1\")` since everything is interpreted as strings.\n", - "3. **JSON** `--config '{\"a\": true, \"b\": -1}'` evaluates to `dict(a=True, b=-1)`. For valid JSON, you need double quotes `\"` inside and have single quotes `'` outside.\n", - "\n", - "Here are examples. Notice how the config is properly displayed." 
- ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "3c2b11be", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "true\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt: knock_joke\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "input\n", - "\n", - "input\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display --config '{\"knock_joke\": \"true\"}'\n", - "from hamilton.function_modifiers import config\n", - "\n", - "@config.when(knock_joke=\"true\")\n", - "def joke_prompt__knock(topic: str) -> str:\n", - " return f\"Knock, knock. Who's there? 
{topic}\"" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "39bc255e", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "true\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt: knock_joke\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "input\n", - "\n", - "input\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display --config knock_joke=true\n", - "from hamilton.function_modifiers import config\n", - "\n", - "@config.when(knock_joke=\"true\")\n", - "def joke_prompt__knock(topic: str) -> str:\n", - " return f\"Knock, knock. Who's there? 
{topic}\"" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "ebcc2729", - "metadata": {}, - "outputs": [], - "source": [ - "my_config = dict(knock_joke=\"true\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "a30d4cec", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "true\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt: knock_joke\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "input\n", - "\n", - "input\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display --config my_config\n", - "from hamilton.function_modifiers import config\n", - "\n", - "@config.when(knock_joke=\"true\")\n", - "def joke_prompt__knock(topic: str) -> str:\n", - " return f\"Knock, knock. Who's there? {topic}\"" - ] - }, - { - "cell_type": "markdown", - "id": "0fc21caf", - "metadata": {}, - "source": [ - "### 2.6 Build a `Driver`" - ] - }, - { - "cell_type": "markdown", - "id": "37873813", - "metadata": {}, - "source": [ - "The `Driver` definition can be required to properly build some Hamilton dataflow, in particular those using `.with_config() / @config` and `Parallelizable[]/Collect[]`. We can pass a `Builder` object using `-b/--builder`.\n", - "\n", - "Here are examples. Notice how the config from the `Builder` is properly displayed." 
- ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "2582a54d", - "metadata": {}, - "outputs": [], - "source": [ - "my_builder = (\n", - " driver.Builder()\n", - " .enable_dynamic_execution(allow_experimental_mode=True)\n", - " .with_config({\"knock_joke\": \"true\"})\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "922a4255", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "\n", - "\n", - "\n", - "knock_joke\n", - "true\n", - "\n", - "\n", - "\n", - "topic\n", - "\n", - "\n", - "topic\n", - "Parallelizable\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt: knock_joke\n", - "str\n", - "\n", - "\n", - "\n", - "topic->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "joke_prompt_collection\n", - "\n", - "\n", - "joke_prompt_collection\n", - "list\n", - "\n", - "\n", - "\n", - "joke_prompt->joke_prompt_collection\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "config\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n", - "expand\n", - "\n", - "\n", - "expand\n", - "\n", - "\n", - "\n", - "collect\n", - "\n", - "\n", - "collect\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display -b my_builder\n", - "from hamilton.htypes import Parallelizable, Collect\n", - "from hamilton.function_modifiers import config\n", - "\n", - "def topic() -> Parallelizable[str]:\n", - " for t in [\"Tom\", \"Jerry\"]:\n", - " yield t\n", - "\n", - "@config.when(knock_joke=\"true\")\n", - "def joke_prompt__knock(topic: str) -> str:\n", - " return f\"Knock, knock. Who's there? 
{topic}\"\n", - "\n", - "def joke_prompt_collection(joke_prompt: Collect[str]) -> list:\n", - " return list(joke_prompt)" - ] - }, - { - "cell_type": "markdown", - "id": "8c2774be", - "metadata": {}, - "source": [ - "### 2.7 Load external modules\n", - "While developing your dataflow with `%%cell_to_module`, you might want to load nodes from another Python module. To do, simply import it and add it to the `Driver` using `.with_modules()` " - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "id": "c59a633b", - "metadata": {}, - "outputs": [], - "source": [ - "my_builder = driver.Builder().with_modules(joke)" - ] - }, - { - "cell_type": "markdown", - "id": "eb6c1b06", - "metadata": {}, - "source": [ - "The nodes `topic` and `joke_prompt` origin from `joke.py`" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "id": "2ca32ee2", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "reply\n", - "\n", - "reply\n", - "str\n", - "\n", - "\n", - "\n", - "punchline\n", - "\n", - "punchline\n", - "str\n", - "\n", - "\n", - "\n", - "reply->punchline\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "joke_response\n", - "\n", - "joke_response\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt->reply\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "joke_prompt->joke_response\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module reply --display --builder my_builder\n", - "def joke_response(joke_prompt: str) -> str:\n", - " return f\"{joke_prompt}\\n\\nCowsay who?\"" - ] - }, - { - "cell_type": "markdown", - "id": "2c7624a0", - "metadata": 
{}, - "source": [ - "### 2.8 Edit external modules\n", - "It is also possible to load the content of a Python module into a notebook cell to be able to edit it interactively!\n", - "\n", - "This is essentially the reverse operation of `%%cell_to_module` hence why it's called `%module_to_cell`. This is a line magic (single `%`) and it reads the content of the line as a file path to a `.py` file." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "d72ac640", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# execute this to generate a new cell\n", - "%module_to_cell ./joke.py" - ] - }, - { - "cell_type": "markdown", - "id": "c8331b92", - "metadata": {}, - "source": [ - "If you executed the previous cell, a new code cell was created above with the content of `joke.py`. You can add `--write_to_file` to write the notebook cell back to the file." - ] - }, - { - "cell_type": "markdown", - "id": "ed689861", - "metadata": {}, - "source": [ - "## 3. Execute a dataflow\n", - "One of the best part about notebooks is the ability to execute and immediately inspect results. They provide a \"read-eval-print\" loop (REPL) coding experience. With this extension, you can use a single notebook cell to define and execute your dataflow for a tight feedback loop.\n", - "\n", - "If you're familiar with Hamilton, you probably figured out that you can build a `Driver` from the dynamically defined modules (like the next cell). But we have better interactive options!" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "ed95343f", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'joke_prompt': \"Knock, knock. Who's there? 
Cowsay\"}" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dr = driver.Builder().with_modules(joke).build()\n", - "results = dr.execute([\"joke_prompt\"], inputs=dict(topic=\"Cowsay\"))\n", - "results" - ] - }, - { - "cell_type": "markdown", - "id": "b36418f1", - "metadata": {}, - "source": [ - "### 3.1 Execute cell\n", - "By adding `--execute / -x` to your module definition, the defined dataflow will be executed using `Driver.execute()` with all available nodes.\n", - "\n", - "The `--display` visualization should now include **output** nodes reflecting the executed nodes." - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "0da888d5", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "a_dataframe\n", - "\n", - "a_dataframe\n", - "DataFrame\n", - "\n", - "\n", - "\n", - "reply\n", - "\n", - "reply\n", - "str\n", - "\n", - "\n", - "\n", - "punchline\n", - "\n", - "punchline\n", - "str\n", - "\n", - "\n", - "\n", - "reply->punchline\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt->reply\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n", - "output\n", - "\n", - "output\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ab
00a
11b
22c
33d
\n", - "
" - ], - "text/plain": [ - " a b\n", - "0 0 a\n", - "1 1 b\n", - "2 2 c\n", - "3 3 d" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\"Knock, knock. Who's there? Cowsay\"" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'Cowsay who?'" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'No, Cowsay MooOOooo'" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display --execute\n", - "import pandas as pd\n", - "\n", - "def joke_prompt() -> str:\n", - " return f\"Knock, knock. Who's there? Cowsay\"\n", - "\n", - "def reply(joke_prompt: str) -> str:\n", - " _, _, right = joke_prompt.partition(\"? \")\n", - " return f\"{right} who?\"\n", - "\n", - "def punchline(reply: str) -> str:\n", - " left, _, _ = reply.partition(\" \")\n", - " return f\"No, {left} MooOOooo\"\n", - "\n", - "def a_dataframe() -> pd.DataFrame:\n", - " return pd.DataFrame({\"a\": [0, 1, 2, 3], \"b\": [\"a\", \"b\", \"c\", \"d\"]})" - ] - }, - { - "cell_type": "markdown", - "id": "8026ec86", - "metadata": {}, - "source": [ - "👆 As you see, node results are automatically displayed in topologically sorted order. You can hide them with `--hide_results`." - ] - }, - { - "cell_type": "markdown", - "id": "9a5738f6", - "metadata": {}, - "source": [ - "### 3.2 Requesting nodes\n", - "You can a variable name to `--execute` which specifies the list of nodes to execute. This will be reflected in the visualization." 
- ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "d20198e1", - "metadata": {}, - "outputs": [], - "source": [ - "node_to_execute = [\"reply\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "b7eea8d6", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "topic\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "topic->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "reply\n", - "\n", - "reply\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt->reply\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n", - "output\n", - "\n", - "output\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'Cowsay who?'" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display --execute node_to_execute\n", - "def topic() -> str:\n", - " return \"Cowsay\"\n", - "\n", - "def joke_prompt(topic: str) -> str:\n", - " return f\"Knock, knock. Who's there? {topic}\"\n", - "\n", - "def reply(joke_prompt: str) -> str:\n", - " _, _, right = joke_prompt.partition(\"? \")\n", - " return f\"{right} who?\"\n", - "\n", - "def punchline(reply: str) -> str:\n", - " left, _, _ = reply.partition(\" \")\n", - " return f\"No, {left} MooOOooo\"" - ] - }, - { - "cell_type": "markdown", - "id": "e10dd1d9", - "metadata": {}, - "source": [ - "### 3.3 Inspecting results\n", - "Ok, but how do you access results? With the node name! 
Magic 🧙" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "5eab142c", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "reply\n", - "\n", - "reply\n", - "str\n", - "\n", - "\n", - "\n", - "punchline\n", - "\n", - "punchline\n", - "str\n", - "\n", - "\n", - "\n", - "reply->punchline\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "topic\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "topic->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "joke_prompt->reply\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n", - "output\n", - "\n", - "output\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display --execute --hide_results\n", - "def topic() -> str:\n", - " return \"Cowsay\"\n", - "\n", - "def joke_prompt(topic: str) -> str:\n", - " return f\"Knock, knock. Who's there? {topic}\"\n", - "\n", - "def reply(joke_prompt: str) -> str:\n", - " _, _, right = joke_prompt.partition(\"? \")\n", - " return f\"{right} who?\"\n", - "\n", - "def punchline(reply: str) -> str:\n", - " left, _, _ = reply.partition(\" \")\n", - " return f\"No, {left} MooOOooo\"" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "id": "607d3863", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'No, Cowsay MooOOooo'" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "punchline" - ] - }, - { - "cell_type": "markdown", - "id": "ad5f79be", - "metadata": {}, - "source": [ - "The results are assigned to variables matching their node names. 
This means you can quickly access them by typing their name. In most notebook environment, you get tab-completion for the and can view results in the variable inspector (especially useful for dataframes).\n", - "\n", - "What's the magic trick? 🐰\n", - "\n", - "When executing the cell, we are effectively:\n", - "1. Loading it's content as a module\n", - "2. Building a `Driver` with this module\n", - "3. Executing the module with all nodes\n", - "4. Assigning values from the results of `.execute()` to variables\n", - "\n", - "Consequently, functions defined in `%%cell_to_node` are replaced by their \"value\". You access functions directly through their module:" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "id": "f9a9df33", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'No, Foxey MooOOooo'" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "joke.punchline(\"Foxey\")" - ] - }, - { - "cell_type": "markdown", - "id": "72b09bee", - "metadata": {}, - "source": [ - "### 3.4 Inputs & overrides\n", - "In Hamilton, *inputs* are values external to the dataflow and *overrides* are values to replace the output of a node (it effectively skips upstream operations). You can use `--inputs / -i` and `--outputs / -o` to pass dictionaries of values for execution." 
- ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "ce7d51d4", - "metadata": {}, - "outputs": [], - "source": [ - "my_inputs = dict(topic=\"monday\")\n", - "my_overrides = dict(punchline=\"Bingo bongo!\")" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "id": "83c61edd", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "reply\n", - "\n", - "reply\n", - "str\n", - "\n", - "\n", - "\n", - "punchline\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "punchline\n", - "str\n", - "\n", - "\n", - "\n", - "reply->punchline\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt->reply\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs\n", - "\n", - "topic\n", - "str\n", - "\n", - "\n", - "\n", - "_joke_prompt_inputs->joke_prompt\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "input\n", - "\n", - "input\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n", - "output\n", - "\n", - "output\n", - "\n", - "\n", - "\n", - "override\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "override\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'monday'" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "\"Knock, knock. Who's there? 
monday\"" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'monday who?'" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'Bingo bongo!'" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --display --execute --inputs my_inputs --overrides my_overrides\n", - "def joke_prompt(topic: str) -> str:\n", - "    return f\"Knock, knock. Who's there? {topic}\"\n", - "\n", - "def reply(joke_prompt: str) -> str:\n", - "    _, _, right = joke_prompt.partition(\"? \")\n", - "    return f\"{right} who?\"\n", - "\n", - "def punchline(reply: str) -> str:\n", - "    left, _, _ = reply.partition(\" \")\n", - "    return f\"No, {left} MooOOooo\"" - ] - }, - { - "cell_type": "markdown", - "id": "2c21fd58", - "metadata": {}, - "source": [ - "### 3.5 Driver Adapters\n", - "You can modify execution by passing a `driver.Builder` with adapters to `--builder / -b`. Adapters are flexible tools that provide a variety of features. For instance, the next cells use `PrintLn()` to print the execution status of each node." - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "id": "abe2edb4", - "metadata": {}, - "outputs": [], - "source": [ - "from hamilton.lifecycle.default import PrintLn\n", - "my_builder = driver.Builder().with_adapters(PrintLn()) # add the adapter" - ] - }, - { - "cell_type": "markdown", - "id": "7d0b177c", - "metadata": {}, - "source": [ - "The printed statements report the execution status of each node. Since this cell doesn't pass `--overrides`, `punchline` is executed too."
- ] - }, - { - "cell_type": "code", - "execution_count": 32, - "id": "ac0a6585", - "metadata": {}, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "%3\n", - "\n", - "\n", - "cluster__legend\n", - "\n", - "Legend\n", - "\n", - "\n", - "\n", - "reply\n", - "\n", - "reply\n", - "str\n", - "\n", - "\n", - "\n", - "punchline\n", - "\n", - "punchline\n", - "str\n", - "\n", - "\n", - "\n", - "reply->punchline\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "joke_prompt\n", - "\n", - "joke_prompt\n", - "str\n", - "\n", - "\n", - "\n", - "joke_prompt->reply\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "function\n", - "\n", - "function\n", - "\n", - "\n", - "\n", - "output\n", - "\n", - "output\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Executing node: joke_prompt.\n", - "Finished debugging node: joke_prompt in 53.4μs. Status: Success.\n", - "Executing node: reply.\n", - "Finished debugging node: reply in 10.5μs. Status: Success.\n", - "Executing node: punchline.\n", - "Finished debugging node: punchline in 9.78μs. Status: Success.\n" - ] - }, - { - "data": { - "text/plain": [ - "\"Knock, knock. Who's there? Cowsay\"" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'Cowsay who?'" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "'No, Cowsay MooOOooo'" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "%%cell_to_module joke --builder my_builder --display --execute \n", - "def joke_prompt() -> str:\n", - " return f\"Knock, knock. Who's there? Cowsay\"\n", - "\n", - "def reply(joke_prompt: str) -> str:\n", - " _, _, right = joke_prompt.partition(\"? 
\")\n", - "    return f\"{right} who?\"\n", - "\n", - "def punchline(reply: str) -> str:\n", - "    left, _, _ = reply.partition(\" \")\n", - "    return f\"No, {left} MooOOooo\"" - ] - }, - { - "cell_type": "markdown", - "id": "15cea16a", - "metadata": {}, - "source": [ - "There are a ton of adapters that can improve your notebook experience. Here are a few notable ones:\n", - "\n", - "1. `hamilton.lifecycle.default.CacheAdapter()` automatically versions a node's code and input values and stores its result on disk. When the same (code, inputs) pair is executed again, it reads the value from disk instead of recomputing it. This can help save LLM API costs!\n", - "2. `hamilton.plugins.h_diskcache.DiskCacheAdapter()` provides the same core features as `CacheAdapter()`, with more utilities for cache management.\n", - "3. `hamilton.lifecycle.default.PrintLn()` prints the execution status of each node.\n", - "4. `hamilton.plugins.h_tqdm.ProgressBar()` adds a progress bar for execution.\n", - "5. `hamilton.lifecycle.default.PDBDebugger()` lets you step into a node with a Python debugger and execute its code line by line.\n", - "\n", - "Note that all of these adapters work with Hamilton outside notebooks too!" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "venv", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.1" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}
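The four steps listed in the tutorial under "What's the magic trick?" can be sketched in plain Python to show the mechanics, including how inputs and overrides from section 3.4 interact.

This is a minimal, stdlib-only toy under stated assumptions. It is NOT how `hamilton.plugins.jupyter_magic` or Hamilton's `Driver` is implemented. The helpers `load_module`, `execute_all`, and `run_cell` are hypothetical, and dependency resolution is reduced to matching parameter names against function names, inputs, and overrides.

```python
import inspect
import types

# Hypothetical cell body mirroring the tutorial's `joke` dataflow (section 3.4).
CELL = '''
def joke_prompt(topic: str) -> str:
    return f"Knock, knock. Who's there? {topic}"

def reply(joke_prompt: str) -> str:
    _, _, right = joke_prompt.partition("? ")
    return f"{right} who?"

def punchline(reply: str) -> str:
    left, _, _ = reply.partition(" ")
    return f"No, {left} MooOOooo"
'''


def load_module(name: str, source: str) -> types.ModuleType:
    """Step 1: load the cell's content as a module (hypothetical helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def execute_all(module: types.ModuleType, inputs=None, overrides=None) -> dict:
    """Steps 2-3: resolve dependencies by parameter name and compute every node.

    Toy resolver, not Hamilton's Driver. Overrides replace a node's value so its
    function is never called; inputs feed parameters that no function produces.
    """
    inputs, overrides = dict(inputs or {}), dict(overrides or {})
    # Only functions defined in this cell's module count as nodes.
    functions = {
        name: obj
        for name, obj in vars(module).items()
        if inspect.isfunction(obj) and obj.__module__ == module.__name__
    }
    results = {}

    def compute(node: str):
        if node not in results:
            if node in overrides:
                results[node] = overrides[node]
            elif node in functions:
                fn = functions[node]
                # Each parameter name refers to another node or an input.
                kwargs = {p: compute(p) for p in inspect.signature(fn).parameters}
                results[node] = fn(**kwargs)
            elif node in inputs:
                results[node] = inputs[node]
            else:
                raise KeyError(f"No function, input, or override for {node!r}")
        return results[node]

    for node in functions:  # request every node, like the magic does
        compute(node)
    return results


def run_cell(name: str, source: str, namespace: dict, inputs=None, overrides=None) -> dict:
    """Step 4: bind node values to variables; the module keeps the functions."""
    module = load_module(name, source)
    results = execute_all(module, inputs, overrides)
    namespace[name] = module     # functions stay reachable, e.g. joke.punchline
    namespace.update(results)    # node values shadow the function names
    return results


if __name__ == "__main__":
    ns = {}
    run_cell("joke", CELL, ns, inputs={"topic": "Cowsay"})
    print(ns["punchline"])                # No, Cowsay MooOOooo
    print(ns["joke"].punchline("Foxey"))  # No, Foxey MooOOooo

    run_cell("joke", CELL, ns, inputs={"topic": "monday"},
             overrides={"punchline": "Bingo bongo!"})
    print(ns["reply"], "|", ns["punchline"])  # monday who? | Bingo bongo!
```

In the second run, `reply` is still computed because every node is requested. However, `punchline`'s own function is never called, which matches the values displayed in the tutorial's section 3.4 cell.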