Remove global from task_runner supervisor-comms#59876
Remove global from task_runner supervisor-comms#59876jscheffl wants to merge 11 commits intoapache:mainfrom
Conversation
d1fdaaa to
4bb5081
Compare
4bb5081 to
780ce42
Compare
There was a problem hiding this comment.
imo globals can be bad, but this specific use case is one of the legitimate ones, in fact better than the approach without globals.
def supervisor_comms() -> CommsDecoder[ToTask, ToSupervisor]:
return _SupervisorCommsHolder.commsBut then violates it in supervisor.py:1669:
task_runner._SupervisorCommsHolder.comms = temp_comms # and every call site becomes verbose:
Before:
SUPERVISOR_COMMS.send(msg)After (unnecessary function call)
supervisor_comms().send(msg)Every access now requires a function call + None check instead of direct variable access.
My 2 cents :)
| # If this is set it means are in some kind of execution context (Task, Dag Parse or Triggerer perhaps) | ||
| # and should use the Task SDK API server path | ||
| if hasattr(sys.modules.get("airflow.sdk.execution_time.task_runner"), "SUPERVISOR_COMMS"): | ||
| from airflow.sdk.execution_time.task_runner import is_supervisor_comms_initialized |
There was a problem hiding this comment.
this used hasattr on sys so we don't need to have airflow.sdk installed on the server components
There was a problem hiding this comment.
Yeah. I think we should solve it differently, I would say here just checking if "task-sdk" is installed should be a better check?
There was a problem hiding this comment.
Or smth else - but this should be revised after we complete the isolation.
There was a problem hiding this comment.
Okay, have reverted/adjusted to previous state - as it was repeated Copy&Paste have moved it into a utility in https://github.com/apache/airflow/pull/59876/changes#diff-7694d13e2f87c84d20b0b8b44797bf96d754ae270204217e518082decc74649bR104
There was a problem hiding this comment.
Would leave other improvements of detection to another PR.
I think if we add I like this pattarn that Jens introduced a lot more to be honest, it is more unit-test friendly IMHO. Also that removes the None-check for every call. When None is detected in a first call - RuntimeException is thrown and interpreter will exit - so None will be checked only at first use when cache is set. |
amoghrajesh
left a comment
There was a problem hiding this comment.
I would be OK with the current pattern we have for the sake of ease of understanding but if we have to change it, I do not think there is a better solution than this one. Would require some getting used to to use the new pattern
@kaxil @amoghrajesh fair points and I am also not 100% convinced the solution is perfect. Te usage of the supervisor_send(data)...as a shortcut? Would it make it better to handle the sending in the function and have better readable code w/o global? By the way with |
I don't think so. Calling a function in Python is generally very slow operation. It's not a classic pointer jump. Functions calls in Python are done by interpreter, they are not using jumps as C programs do. Interpreter has to look-up the method to call, create a new frame, push it on "Python stack" and clean the frame after it returns. This is all done "in the interpreter" - it's not even using the processor stack. "Python stack" for method frames is actually stored in heap memory, not in processor stack - so any stack manipulation (calling and returning from function) is kinda slow. I did some basic micro-benchmarks: import time
from functools import lru_cache
class MethodBenchmark:
def __init__(self):
self.call_count = 0
def empty_method(self):
"""Empty method without caching"""
self.call_count += 1
return None
@lru_cache(maxsize=128)
def cached_empty_method(self):
"""Empty method with caching"""
return None
def benchmark():
obj = MethodBenchmark()
iterations = 1_000_000
# Benchmark non-cached method
start = time.perf_counter()
for _ in range(iterations):
obj.empty_method()
non_cached_time = time.perf_counter() - start
# Benchmark cached method
start = time.perf_counter()
for _ in range(iterations):
obj.cached_empty_method()
cached_time = time.perf_counter() - start
# Results
print(f"Non-cached method: {non_cached_time:.6f} seconds")
print(f"Cached method: {cached_time:.6f} seconds")
print(f"Speedup: {non_cached_time / cached_time:.2f}x")
print(f"Non-cached call count: {obj.call_count}")
if __name__ == "__main__":
benchmark()Result with Python 3.10 This means that when you put I modified the code of both methods to do single if: And there are the results: (100001 is because cached call increased it by 1) There are plenty of optimisations in Python 3.11 - 3.14 that might skew this simple example (specializing adaptive interpreter changes and JIT) - so I run it with Python 3.10 Similar discussion: https://stackoverflow.com/questions/14648374/python-function-calls-are-really-slow |
I think that's a good idea, it's likely better to expose "comms actions" than "comm" itself. |
|
BTW. To be perfectly honest, In this case I think performance is not as important (at least until we will start doing the communication very frequently - for example following @dabla optimisation / async task reporting back status of individual async coroutines back to scheduler). The comms overhead for inter-process communication and serialization of data involved (every such call needs to serialize data sent across the wire - even if shared memory is used to communicate between processes) is already likely order of magnitude slower than single method call in Python, so this should not be too much of a concern. In this case I think we should optimise for readability, I also do not like the extra |
I agree with this. If we have to go down the way of removing globals here, I would prioritise using |
780ce42 to
df71bea
Compare
|
Thanks for the feedback, have it now adjusted to use Re-review appreciated :-D (Am surprised actually that I wanted to remove two global statements but actually am slimming down the codebase by 40 LoC now :-D) |
2df8bfe to
18f93b3
Compare
|
Up by another 109 commits - can I have another round of feedback @ashb / @amoghrajesh / @kaxil ? |
18f93b3 to
c103ddf
Compare
|
Up by another 59 commits |
I like that refactor. |
c103ddf to
a5aeb48
Compare
|
up by 178 commits |
a5aeb48 to
66b9a1a
Compare
66b9a1a to
9ec7b07
Compare
|
Up by 200 commits |
dabla
left a comment
There was a problem hiding this comment.
Really like this refactoring, DRY-principle was applied, even more typing, looks really clean to me.
|
Much as @kaxil said in his very first comment, I'm still -1 on this, You've changed one kind of global for another. before the global was task_runner.SUPERVISOR_COMMS = comms_decoderAfter the global is SupervisorComms().set_comms(comms_decoder)This is a change for no difference to my mind. Honestly I don't see the point of the change. |
I'm not that opinionated on that point, so I'm neutral here but I like the other refactorings that have been done:
So I think this PR has added value, but that's my personal opinion. |
I tried to follow your request to adjust. I had a static class instance before and went into a singleton pattern. Also please see that compared to Yes, technically a singleton is also a global but with the construct it allows a controlled access and makes clear that access is validated. All calls can trust that the method will fail with a clear error. If the codebase grows we will lose control over time where the global is spread in the context. I saw multiple times when testing changes here that a pytest failed on the global set to None and needed to search where some other test messed-up the global. I put effort in this because I'd prefer to have a codebase where we are clean and can be proud of, local optimizations with |
It’s a bit unfortunate that Python doesn’t have a built-in Dependency Injection mechanism comparable to what frameworks like Spring provide in the Java ecosystem. Then components like the CommsSupervisor would be a singleton by default and lifecycle/visibility are explicitly controlled via configuration or annotations in Java which would translate to decorators in Python. |
Another small (in this case rather medium complex) increment to remove global statements for PR #58116
This removes 2 global statements from task_runner.py where explicitly a global wariable was used as shared SUPERVISOR_COMMS by intent. Proposing to change with via a static class and accessor-methods to prevent usage of global variables.
globalis evil.For this PR to merge seeking for explicit approval from one of the Task SDK creators @ashb, @kaxil and/or @amoghrajesh