Skip to content

flask.copy_current_request_context+gevent+FlaskInstrumentor cause improper span lifecycles #1551

@matthewgrossman

Description

@matthewgrossman

high-level problem

To create a new request-aware greenlet within a flask request, we copy the existing flask-req-ctx and push a new req-ctx onto the req-ctx-stack. When this greenlet+req-ctx ends, it tries to call the patched _teardown_request within flask-instrumentor, which throws errors because the greenlet doesn't "own" the span, and therefore shouldn't be allowed to end it.

Describe your environment

I created a test repo with all of the source code for reproduction: https://github.com/matthewgrossman/otel-flask-repro. The important file is the app.py.

From trying a few different versions, it doesn't seem that the pinned deps matter too much, but here's a requirements.txt on macOS 13.0.1:

gevent==22.10.2
flask==1.1.4
opentelemetry-instrumentation-flask
opentelemetry-sdk
opentelemetry-api

If you'd prefer, I could turn this env into a docker image to make reproing easier for others. They key deps are flask, gevent, and opentelemetry-instrumentation-flask, and I've noticed this problem in every variation of those deps that I've tried.

Steps to reproduce

$ python3.8 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -m flask run -p 8080

in other shell:

curl localhost:8080/test -H 'x-test-header: val`

In the running shell with the flask service, you should see an error like:

(venv)  ~/src/otel-flask-repro  » python -m flask run -p 8080
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
Failed to detach context
Traceback (most recent call last):
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/__init__.py", line 157, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x10f888040> at 0x10f88f240> was created in a different Context
Failed to detach context
Traceback (most recent call last):
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/__init__.py", line 157, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x10f888040> at 0x10f88f180> was created in a different Context
127.0.0.1 - - [15/Dec/2022 11:17:54] "GET /test HTTP/1.1" 200 -

What is the expected behavior?

I wouldn't expect to see this error message.

What is the actual behavior?

This error message is caused by the greenlet's flask-req-ctx trying to call _teardown_request for the overall parent flask-req-ctx. This causes issues with contextvars and span ownership.

When a new request comes into the flask application, FlaskInstrumentor set the current span+ctxmanager on the flask.request.environ, seen here: flask-instrumentor

When we create a new greenlet, we copy in the current contextvars context, as well as create a new flask-req-context based upon the existing flask-req-context. When this greenlet ends, the teardown gets called and attempts to call activation.__exit__. This causes the error seen in the above reproduction steps.

Just a few lines above, there already is a guard clause meant to stop this exact problem: https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-flask/src/opentelemetry/instrumentation/flask/__init__.py#L434-L435.

However, this only ends up protecting us in situations where the ending-req-ctx has a fresh flask.request.environ, like with a test_request_context (noted in the comments of the guard clause). When creating greenlets based on the existing flask request context, this guard clause doesn't sufficiently realize that this ending-req-ctx should also escape early, and not attempt to end the span

Additional context

In the reproduction source code, I provided a workaround. When you create a new greenlet based on the current request context, first pop off the _ENVIRON_ACTIVATION_KEY to ensure the code does go through the guard clause (workaround source code). Commenting out the existing wrap_fn_in_req_context__broken and commenting-in the wrap_fn_in_req_context__workaround removes the error.

However, I'm hoping there's some way to fix such that my code shouldn't need to pop a specific key from an environ before spawning greenlets. A couple options:

  • Somehow indicate in the environ which flask-req-ctx-id ended up creating the span. When calling _teardown_request, escape early if the current flask-req-ctx doesn't match the original req-ctx that created the span in the first place
  • ?? open to suggestions!

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions