An IPython/Jupyter extension that instruments registered Python function calls and sends non-blocking usage telemetry to remote endpoints.
Full documentation: https://access-py-telemetry.readthedocs.io
- Non-blocking telemetry — async in Jupyter; subprocess outside Jupyter
- YAML-driven configuration — register any function from any package to track
- CLI for enabling/disabling telemetry in IPython startup
- Decorator API for library authors to instrument functions at definition time
```shell
pip install access_py_telemetry
```

Or via conda:

```shell
conda install accessnri::access_py_telemetry
```

The `access-py-telemetry` CLI manages the IPython startup script that registers telemetry across all notebook cells.
It installs the following code to your IPython startup profile:
```python
try:
    from access_py_telemetry import capture_registered_calls
    from IPython import get_ipython

    get_ipython().events.register("shell_initialized", capture_registered_calls)
    print("Intake telemetry extension loaded")
except ImportError as e:
    print("Intake telemetry extension not loaded")
    raise e
```

If you are using the conda/analysis3 environment, telemetry will be enabled by default.
To enable, disable, or check the status of telemetry from a notebook:
```python
!access-py-telemetry --enable
!access-py-telemetry --disable
!access-py-telemetry --status
```

Or from the command line:

```shell
$ access-py-telemetry --enable
$ access-py-telemetry --disable
$ access-py-telemetry --status
```

This needs to be added to the system config for IPython, or it can be added to your user config (`~/.ipython/profile_default/startup/`) for testing. See the IPython documentation for more information.
If this package is used within a Jupyter notebook, telemetry calls are made asynchronously, so as to not block the execution of the notebook: they run in the background and will not affect the performance of the notebook.
Outside a Jupyter notebook, telemetry calls are made in a new Python process using the multiprocessing module, so they are also non-blocking but may incur a small overhead.
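The out-of-notebook behaviour amounts to a fire-and-forget pattern, which can be sketched as follows. This is an illustrative sketch only, not the package's internal code; `_post_telemetry` and `send_nonblocking` are hypothetical names:

```python
import multiprocessing


def _post_telemetry(payload: dict) -> None:
    # Hypothetical worker: in the real package this step would POST the
    # payload to the configured endpoint. Here it is a stand-in.
    _ = payload


def send_nonblocking(payload: dict) -> multiprocessing.Process:
    """Run the telemetry call in a separate process so the caller never blocks."""
    proc = multiprocessing.Process(
        target=_post_telemetry, args=(payload,), daemon=True
    )
    proc.start()  # returns immediately; the work happens in the child process
    return proc
```

The daemon flag means a lingering telemetry call will never keep the interpreter alive after the main program exits.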
- License: Apache Software License 2.0
- Documentation: https://access-py-telemetry.readthedocs.io
The TelemetryRegister class is used to register and deregister functions for telemetry. By default, it will read from config.yaml to get the list of functions to register.
A sample config.yaml file is shown below:
```yaml
intake:
  catalog:
    - esm_datastore.search
    - DfFileCatalog.search
    - DfFileCatalog.__getitem__
payu:
  run:
    - Experiment.run
  restart:
    - Experiment.restart
```

This config file has two main purposes: to provide a list of function calls which ought to be tracked, and to specify where the telemetry data should be sent.
In this example, there are three endpoints:

- `intake/catalog`
- `payu/run`
- `payu/restart`

which track the corresponding sets of functions:

- `{esm_datastore.search, DfFileCatalog.search, DfFileCatalog.__getitem__}`
- `{Experiment.run}`
- `{Experiment.restart}`
Service names are built from the config file by replacing the `/` in the endpoint name with a `_`, i.e.

- `intake_catalog` <=> `intake/catalog`
- `payu_run` <=> `payu/run`
- `payu_restart` <=> `payu/restart`
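The naming rule above is simple enough to reproduce in a few lines of Python. This is an illustration of the mapping only, not the package's internal code:

```python
# Nested config: {app: {view: [tracked functions]}}, mirroring config.yaml above.
config = {
    "intake": {"catalog": ["esm_datastore.search", "DfFileCatalog.search"]},
    "payu": {"run": ["Experiment.run"], "restart": ["Experiment.restart"]},
}


def service_name(endpoint: str) -> str:
    """Derive a service name by replacing '/' in the endpoint with '_'."""
    return endpoint.replace("/", "_")


# Build every endpoint from the config, then derive its service name.
endpoints = [f"{app}/{view}" for app, views in config.items() for view in views]
services = [service_name(ep) for ep in endpoints]
# services == ["intake_catalog", "payu_run", "payu_restart"]
```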
Typically, the top-level part of the service name (eg. `intake`) will correspond to both a Django app and a single client-side package that you wish to track (eg. intake, payu, etc.), and the rest of the endpoint will correspond to a view within that app. For example, if you had a package named `executor` for which you wanted to track the `run` and `save_results` functions in separate tables, you would have the following config:
```yaml
executor:
  run:
    - executor.run
  save_results:
    - executor.save_results
```

The corresponding models in the tracking_services Django app would be `ExecutorRun` and `ExecutorSaveResults`:
```python
class ExecutorRun(models.Model):
    function_name = models.CharField(max_length=255)
    args = JSONField()
    kwargs = JSONField()
    session_id = models.CharField(max_length=255)
    interesting_data = JSONField()
    timestamp = models.DateTimeField(auto_now_add=True)


class ExecutorSaveResults(models.Model):
    function_name = models.CharField(max_length=255)
    args = JSONField()
    kwargs = JSONField()
    session_id = models.CharField(max_length=255)
    timestamp = models.DateTimeField(auto_now_add=True)
    save_filesize = models.IntegerField()
    user_id = models.CharField(max_length=255)
    execution_time = models.FloatField()
    memory_usage = models.FloatField()
    cpu_usage = models.FloatField()
```

To add a function to the list of functions about which usage information is collected when telemetry is enabled, use the `TelemetryRegister` class and its `register` method. You can pass the function name as a string, or the function itself.
```python
from access_py_telemetry.registry import TelemetryRegister

registry = TelemetryRegister('my_service')
registry.register('some_func')
```

You can additionally register a number of functions at once, by passing either the functions or their names as strings:

```python
registry.register(some_func, 'some_other_func', another_func)
```

To remove a function from the list of functions about which usage information is collected when telemetry is enabled, use the `deregister` method:

```python
registry.deregister(some_func)
```

or

```python
registry.deregister(some_func, some_other_func, another_func)
```

If you plan to add telemetry to your library and its main use case is within a Jupyter notebook, it is recommended to use the `ipy_register_func` decorator to register your functions.
Otherwise, use the register_func decorator to register your functions.
To register a user-defined function, apply the decorator at definition time:
```python
from access_py_telemetry.decorators import ipy_register_func

@ipy_register_func("my_service")
def my_func():
    ...
```

or

```python
from access_py_telemetry.decorators import ipy_register_func

@ipy_register_func("my_service", extra_fields=[
    {"interesting_data_1": something},
    {"interesting_data_2": something_else},
])
def my_func():
    ...
```

Specifying the `extra_fields` argument will add additional fields to the telemetry data sent to the endpoint. Alternatively, these can be added later:
```python
from access_py_telemetry.api import ApiHandler
from access_py_telemetry.decorators import ipy_register_func

@ipy_register_func("my_service")
def my_func():
    ...

api_handler = ApiHandler()
api_handler.add_extra_field("my_service", {"interesting_data": interesting_data})
```

Adding fields later may sometimes be necessary, as the data may not be available at the time of registration/function definition, but will be when the function is called.
We can also remove fields from the telemetry data, using the `pop_fields` method. This can be handy if you want to remove a default field. For example, telemetry will include a session ID (bound to the Python interpreter lifetime) by default; if you are writing a CLI tool, you will probably want to remove this field.
```python
from access_py_telemetry.api import ApiHandler
from access_py_telemetry.decorators import register_func

@register_func("my_service", extra_fields=[{"cli_config": ...}, {"interesting_data": ...}])
def cli_execute():
    """
    Function to execute the CLI tool.
    """
    ...

api_handler = ApiHandler()
api_handler.pop_fields("my_service", ["session_id"])
```

Note: Wherever you instantiate the `ApiHandler` class, the same `ApiHandler` instance will be returned; you do not need to pass around a single `ApiHandler` instance to ensure consistency. See Implementation details for more information.
```python
from access_py_telemetry.decorators import register_func

@register_func("my_service", extra_fields=[
    {"interesting_data_1": something},
    {"interesting_data_2": something_else},
])
def my_func():
    pass
```

(Assuming `my_func` has been registered as above)
```python
>>> intake_registry = TelemetryRegister('intake_catalog')
>>> print(intake_registry)
["esm_datastore.search", "DfFileCatalog.search", "DfFileCatalog.__getitem__"]

>>> my_registry = TelemetryRegister('my_service')
>>> print(my_registry)
["my_func"]
```

When you are happy with your telemetry configuration, you can update the default registry with your custom registry. This should be done via a PR in which you update the config.yaml file with the additional functionality you wish to track.
In the case of my_service, you would add the following to config.yaml:
```diff
 intake:
   catalog:
     - esm_datastore.search
     - DfFileCatalog.search
     - DfFileCatalog.__getitem__
+ my:
+   service:
+     - my_func
+     - my_other_func
```

In order to send telemetry, you will need an endpoint in the ACCESS-NRI Tracking Services to send the telemetry to.
If you do not have an endpoint, you can use the following endpoint for testing purposes:
TBA

Presently, please raise an issue on the tracking-services repository to request an endpoint.
Once you have an endpoint, you can send telemetry using the ApiHandler class.
```python
from access_py_telemetry.api import ApiHandler
from xyz import interesting_data

my_service_name = "my_service"

api_handler = ApiHandler()
api_handler.add_extra_field(my_service_name, {"interesting_data": interesting_data})

# NB: If you try to add extra fields to a service without an endpoint, it will raise an exception:
api_handler.add_extra_field("my_other_service", {"interesting_data": interesting_data})
```

```
> KeyError: Endpoint 'my_other_service' not found. Please add an endpoint for this service.
```

The `ApiHandler` class will send telemetry data to the endpoint you specify. To send telemetry data, use the `ApiHandler.send_api_request()` method.
If you visit the endpoint in your browser, you should see the sent data, which will be of the format:

```
{
    "id": 1,
    "timestamp": "2024-12-19T07:34:44.229048Z",
    "name": "u1166368",
    "function": "function_name",
    "args": [],
    "kwargs": {
        "test": true,
        "variable": "search"
    },
    "session_id": "83006a25092df6bae313f1e4b6be93f81e62205967fa5aa68fc4f1b081095299",
    "interesting_data": interesting_data
}
```

If you have not registered any extra fields, the `interesting_data` field will not be present.
Configuration of extra fields, etc., should be performed as import-time side effects of your code, in order to ensure telemetry data are sent correctly and consistently.
The `ApiHandler` class is a singleton, so if you want to configure extra fields to send to your endpoint, you do not need to take care to pass the correct instance around. Simply instantiate the `ApiHandler` class in the module where your extra data is, and call the `add_extra_field` method on it.

eg. `myservice/component1.py`:

```python
from access_py_telemetry.api import ApiHandler

api_handler = ApiHandler()

service_component1_config = {
    "component_1_config": interesting_data_1
}

api_handler.add_extra_field("myservice", service_component1_config)
```

and `myservice/component2.py`:

```python
from access_py_telemetry.api import ApiHandler

api_handler = ApiHandler()

service_component2_config = {
    "component_2_config": interesting_data_2
}

api_handler.add_extra_field("myservice", service_component2_config)
```

Then, when telemetry is sent, you will see the `component_1_config` and `component_2_config` fields in the telemetry data:
```
{
    "id": 1,
    "timestamp": "2024-12-19T07:34:44.229048Z",
    "name": "u1166368",
    "function": "function_name",
    "args": [],
    "kwargs": {
        "test": true,
        "variable": "search"
    },
    "session_id": "83006a25092df6bae313f1e4b6be93f81e62205967fa5aa68fc4f1b081095299",
    "component_1_config": interesting_data_1,
    "component_2_config": interesting_data_2
}
```

In order to track user sessions, this package uses a session identifier, generated using the `SessionID` class:
```python
>>> from access_py_telemetry.api import SessionID

>>> session_id = SessionID()
>>> session_id
"83006a25092df6bae313f1e4b6be93f81e62205967fa5aa68fc4f1b081095299"
```

Session identifiers are unique to each Python interpreter, and only change when the interpreter is restarted.