Skip to content

Support notebooks#333

Merged
pappasam merged 16 commits intopappasam:mainfrom
seeM:notebook-support
Mar 17, 2025
Merged

Support notebooks#333
pappasam merged 16 commits intopappasam:mainfrom
seeM:notebook-support

Conversation

@seeM
Copy link
Contributor

@seeM seeM commented Feb 27, 2025

This PR adds notebook support. Fixes #324.

How it works

  1. Identify when a textDocument/* request is for a notebook cell.
  2. Use the concatenated notebook source instead of just the individual cell.
  3. Change any input positions and ranges to be relative to the concatenated notebook instead of the individual cell.
  4. Finally, change any output positions and ranges to be relative to individual cells instead of the concatenated notebook.

Unchanged language features

The following methods are excluded because the full notebook source shouldn't affect the result of a given cell:

  1. textDocument/documentSymbol
  2. textDocument/highlight

Unsupported cases

There are also a few methods which work well if the source of the request is from a notebook cell, but ignore any other .ipynb files. I think the root of this issue is that Jedi doesn't support .ipynb files:

  1. textDocument/references
  2. textDocument/rename
  3. workspace/symbol

Questions

Possible diagnostics performance concern

I decided to publish diagnostics independently for each cell. This means we might start many threads when opening a large notebook. Do you think that's a concern?

It should be less of an issue for changing notebooks, since I don't think many cells should change in a short period of time.

We could instead:

  1. Calculate diagnostics once-off for the concatenated notebook. It'd be less demanding but would hide any syntax errors after the first in the notebook.
  2. Limit the number of concurrent diagnostics requests, either globally or for each notebook.

Package for use by other pygls servers?

I ended up keeping jedi_utils as untouched as possible to keep the existing decoupling, and adding most of the new logic into notebook_utils. This ends up being quite verbose in server but I think it keeps things much simpler overall. What do you think?

I think it might be possible to totally decouple this approach from jedi-language-server, either making it a separate package, or as a PR to pygls. Kind of like a decorator over LSP feature functions that modifies document.source and any positions/ranges in params, and similarly for the output. The benefit would be that it could be used in other pygls servers too. I haven't given it too much thought though, so there might be some hard blockers there.

@seeM
Copy link
Contributor Author

seeM commented Feb 28, 2025

@pappasam this is ready for review

@pappasam
Copy link
Owner

Looks like the tests are failing on Windows because of line endings

@seeM
Copy link
Contributor Author

seeM commented Mar 3, 2025

@pappasam this is ready for review again. I looked into the Windows issue and it appears to be caused by an unrelated bug in how pygls handles line endings versus how parso does (#159 (comment)). I've skipped the test on Windows for now.

Copy link
Owner

@pappasam pappasam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for spending the time crafting this thorough PR! Support for .ipynb will definitely be a killer feature.

Responses

  1. I agree with your decision to stay away from jedi_utils, especially during this early stage. We can always generalize later.
  2. I think it's worth exploring a limit on the number of concurrent diagnostics requests, either globally or for each notebook. I'd prefer that we slow things down instead of reducing capabilities.

Architectural Thoughts

One concern I have is how we're considering both .py and .ipynb dynamically for each server endpoint. We're adding filetype checking logic to each endpoint, which complicates the code.

Might it be better to duplicate server.py into something like server_ipynb.py and provide either a command line flag (--target notebook/script) or a different binary (jedi-language-notebook-server...)? A project with both notebooks and scripts might need two server instances, but this would keep implementations separate and provide a cleaner path for supporting additional filetypes.

Still thinking through this one, but putting these thoughts here to spur discussion if you've considered these points already.

Testing Request

Could you provide:

  1. A simple example notebook
  2. Steps for setting up a basic Jupyter environment
  3. Instructions for configuring an editor to use our language server with notebooks

Since I use Neovim, I'd likely need to test through the notebook server in a browser, so any guidance there would be especially helpful.

@seeM
Copy link
Contributor Author

seeM commented Mar 4, 2025

Thank you for the feedback! Here's how you can test the branch in an editor:

  1. Download VSCode.

  2. Install the Python extension.

  3. Check out this branch and override the bundled jedi-language-server in the Python extension with this branch:

    pip install <path to jedi-language-server at branch seeM:notebook-support> --no-deps -t ~/.vscode/extensions/ms-python.python-2025.0.0-darwin-arm64/python_files/lib/jedilsp/ --upgrade

    The extension might be at a slightly different location depending on version, platform, and architecture. I think you'll have to repeat this if you update the extension. You can rerun the same command replacing the path with jedi-language-server to revert this.

  4. Configure the python.languageServer setting to "Jedi".

  5. Run the "Create: New Jupyter Notebook" command.

  6. Select a Python environment to use. It'll prompt you to install ipykernel (or you can install it manually into an environment first).

You can try out any of the test notebooks used in this branch. The main thing is testing that things work across cells.

@seeM
Copy link
Contributor Author

seeM commented Mar 4, 2025

One concern I have is how we're considering both .py and .ipynb dynamically for each server endpoint. We're adding filetype checking logic to each endpoint, which complicates the code.

@pappasam I agree that the code is complicated by the script/notebook decision spread over too many places. Although I think we can address that without requiring an additional server.

We could add a server_ipynb.py as you suggest, but have a single actual server that dispatches requests to the correct implementation. I suspect we'd want to reuse code across those, so we might end up extracting a function for each LSP method – maybe those could live in jedi_utils.

I doubt that the dispatching would add any real performance overhead. What do you think?

A project with both notebooks and scripts might need two server instances, but this would keep implementations separate and provide a cleaner path for supporting additional filetypes.

I suspect that .ipynb is a unique case and that we wouldn't need to support other file types anytime soon.

@pappasam
Copy link
Owner

pappasam commented Mar 6, 2025

We could add a server_ipynb.py as you suggest, but have a single actual server that dispatches requests to the correct implementation. I suspect we'd want to reuse code across those, so we might end up extracting a function for each LSP method – maybe those could live in jedi_utils.

Yes, I like this approach!

Only clarification I'd like to make here: jedi_utils modularizes the Jedi API itself. Abstractions for the server functions themselves (a pieced together mish-mash of jedi_utils, text_edit_utils, pygls_utils, and the new notebook_utils) should probably live in a new module called server_utils (or something like that...)

@seeM
Copy link
Contributor Author

seeM commented Mar 7, 2025

@pappasam there's a very rough but working approach in 3e0b496 that uses a decorator to add notebook support to LSP features. 4ed862f is closer to what we were discussing, but I think the decorator approach is more concise and still sufficient. What do you think?

@pappasam
Copy link
Owner

pappasam commented Mar 9, 2025

Exciting progress! I finally spun up code to test and noticed a couple things:

First: my (albeit underpowered) laptop is on fire

With code+notebook open:

image

Once code is closed:

image

Second:

It looks like Goto Definition doesn't work for variables that are defined in cells (it does work for things installed in the virtualenv):

Peek 2025-03-09 09-58

In fact, it looks like cross-cell references don't really work. Do you have a different experience? If yes, maybe you can share a video demonstrating successful usage of the various features.

My system:

OS:

Distributor ID: Ubuntu
Description:       Ubuntu 24.04.2 LTS
Release:               24.04
Codename:             noble

Laptop:
  System Information
      Manufacturer: Dell Inc.
      Product Name: XPS 15 9560

@seeM
Copy link
Contributor Author

seeM commented Mar 9, 2025

That's unfortunate! Can you try on 936f4a9? I might've broken something while refactoring.

I'll take a look at the CPU usage, fix any bugs that might've crept in, and post a video on Monday.

Does CPU usage only start when the language server is busy in a notebook or could that be related to something else in VSCode?

@pappasam
Copy link
Owner

pappasam commented Mar 9, 2025

Hard to pin down the exact difference, but FWIW, I tested out vscode with a regular .py file on jls's lastest main branch, and my old CPU's are working less hard (non-zero effort, but my fans aren't blowing out):

image

@seeM
Copy link
Contributor Author

seeM commented Mar 11, 2025

@pappasam Sorry about that, I had introduced a bug while refactoring. It should work on the latest commit, here's a video:

Screen.Recording.2025-03-11.at.13.44.44.mov

Note that if the Python extension updated, you'll have to change the path you pip install to and restart VSCode.

  • I haven't been able to reproduce the performance issues you saw. I wouldn't be surprised if it was caused by something else in VSCode, especially if it was your first time using the editor e.g. the Python extension will try to discover all interpreters without a cache. You could also try checking CPU usage during the tests since they're end-to-end.
  • I've also limited the number of concurrent diagnostics to 10.
  • I've found that the extract variable/function code actions sometimes add new text to the wrong cells, so it might be better to disable those code actions in notebooks for now.
  • I tried out a generalized decorator approach but admit that maybe something more direct/verbose would be simpler. Only challenge there is how to actually implement the dispatcher.

Please take a look and let me know what you think 😄

Copy link
Owner

@pappasam pappasam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response, I was traveling and finally found time to devote to this.

  1. The performance issues don't seem to be affecting with your latest commits! Either it was a spurious vs-code issue, or your fixes did the trick.
  2. Overall, I've manually tested various features, and things appear to work great (goto definition successfully navigatest to appropriate cells, python files open, auto-completion works, etc).

One question for you inline regarding the semaphore.

All-in-all, this looks like it's just about ready to go!

Comment on lines +83 to +84
func(*args, **kwargs)
_debounce_semaphore.release()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we're not wrapping this in a try/finally? Haven't fully wrapped my head around the implication, but it feels like we might want to release the semaphore even if an exception occurs in func.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@pappasam
Copy link
Owner

@pappasam Sorry about that, I had introduced a bug while refactoring. It should work on the latest commit, here's a video:
Screen.Recording.2025-03-11.at.13.44.44.mov

Note that if the Python extension updated, you'll have to change the path you pip install to and restart VSCode.

* I haven't been able to reproduce the performance issues you saw. I wouldn't be surprised if it was caused by something else in VSCode, especially if it was your first time using the editor e.g. the Python extension will try to discover all interpreters without a cache. You could also try checking CPU usage during the tests since they're end-to-end.

* I've also limited the number of concurrent diagnostics to 10.

* I've found that the extract variable/function code actions sometimes add new text to the wrong cells, so it might be better to disable those code actions in notebooks for now.

* I tried out a generalized decorator approach but admit that maybe something more direct/verbose would be simpler. Only challenge there is how to actually implement the dispatcher.

Please take a look and let me know what you think 😄

Yeah, we should probably disable variable/function code actions for notebooks.

Note: after seeing it, I don't hate the decorator approach you've designed and implemented here

@seeM
Copy link
Contributor Author

seeM commented Mar 17, 2025

Thanks for the review! I've wrapped the semaphore release and disabled code actions for notebooks.

Note: after seeing it, I don't hate the decorator approach you've designed and implemented here

Cool, let's go with this approach for now since I'm also running out of time on this.


If we ever revisit this in future, I saw a middleware approach in the JS language client that might be more generally useful. The idea is to be able to register a middleware class that implements methods like:

def completion(self, server, params, next):

Where next calls the next registered middleware in the chain. That would ideally be implemented in pygls but could be done here too.

@seeM seeM requested a review from pappasam March 17, 2025 11:06
Copy link
Owner

@pappasam pappasam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Amazing, thanks so much for contributing this feature, and for all the thought you've put into both the current design and next steps

@pappasam pappasam merged commit 821c55e into pappasam:main Mar 17, 2025
12 checks passed
@seeM seeM deleted the notebook-support branch March 17, 2025 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for ipynb Notebooks

2 participants