Skip to content

Add pymemcache instrumentation#772

Merged
toumorokoshi merged 33 commits intoopen-telemetry:masterfrom
DataDog:add_pymemcache
Jun 12, 2020
Merged

Add pymemcache instrumentation#772
toumorokoshi merged 33 commits intoopen-telemetry:masterfrom
DataDog:add_pymemcache

Conversation

@ericmustin
Copy link
Contributor

summary

👋 This PR adds pymemcache instrumentation and tests, addressing: #766 . Still a few open questions I have, but at a place where I think this can be reviewed. By and large I tried to port things from dd-trace-py's pymemcache instumentation

  • Open Questions / Todos:

    • not 100% how to handle changelog and versioning, as well as setup.cfg and setup.py files. Took some educated guesses and basically just borrowed the structure of other integrations as a guide, let me know if I've botched anything here

    • Tried to attach licenses everywhere I could but lmk if I've missed anything.

    • this opentelemetry-ext-pymemcache implementation differs a bit from dd-trace-py, in that dd-trace-py replaces the entire Client with a wrapped client, whereas opentelemetry-ext-pymemcache is only wrapping specific methods on the original Client. The dd-trace-py approach relies on Pin, which isn't a construct available in opentelemetry-python afaik. There were a few gotchas with this approach (like ensuring methods aren't patched multiple times) but overall it behaves the same.

    • Tried to write the tests as closely as I could to the originals in dd-trace-py but was not super sure how to test some of config/client setting stuff, so left that out for now.

  • Example Usage:

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.ext.pymemcache import PymemcacheInstrumentor
    trace.set_tracer_provider(TracerProvider())
    from pymemcache.client.base import Client
    client = Client(('localhost', 11211))
    client.set('some_key', 'some_value')

Thanks for taking a look, happy to help however i can and address any feedback.

@ericmustin ericmustin requested a review from a team June 3, 2020 15:13
Copy link
Contributor

@majorgreys majorgreys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on a first pass!

"""
Usage
-----
The OpenTelemetry ``jinja2`` integration traces templates loading, compilation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this usage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good catch , updated

_CMD, kind=SpanKind.INTERNAL, attributes={}
) as span:
try:
span.set_attribute("service", tracer.instrumentation_info.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service name is handled by the Resource in OpenTelemetry so we can skip setting this attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah gotcha, makes sense, i've removed this entirely then.

Copy link
Member

@cnnradams cnnradams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Have a few minor comments, but other than that looks great!

from opentelemetry.ext.pymemcache import PymemcacheInstrumentor
trace.set_tracer_provider(TracerProvider())
from pymemcache.client.base import Client
client = Client(('localhost', 11211))
Copy link
Member

@cnnradams cnnradams Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a PymemcacheInstrumentor().instrument() before creating the client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch, ty, updated

See `BaseInstrumentor`
"""

def _instrument(self, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of other similar instrumentors also expose an instrument_client(client) function that would instrument methods on a single instantiated Client instead of every Client in every file that is created. Is that possible here, and if so do you think it provides enough value to add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sounds possible, I think some of the dd-trace-py instrumentations expose something to that effect, and it could have some value, but I don't think it's trivial to add. Is that convention established anywhere in opentelemetry-python at the moment?

Copy link
Member

@cnnradams cnnradams Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, I don't think it's written down anywhere, I just see it a lot (all db instrumentations,most webserver instrumentations). There's probably no need to add it in this PR unless there was actually a use case where you would want to instrument one client but not others

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong convention there, but see flask for an example:https://github.com/open-telemetry/opentelemetry-python/blob/master/ext/opentelemetry-ext-flask/src/opentelemetry/ext/flask/__init__.py#L194

One thing is that "app" is probably a misnomer in the context of memcache, so the convention is loose at best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no convention for any other method outside _instrument and _uninstrument. Since these instrumentations deal with different third party libraries it is ok to add more methods as needed (like it was done for the Flask instrumentation) to better adapt the instrumentation to the characteristics of the third party libraries.

instance.server
)

except Exception: # pylint: disable=broad-except
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would cause an exception here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just was being overly defensive, removed

elif isinstance(arg, bytes):
keys = arg.decode()
elif isinstance(arg, list) and len(arg) >= 1:
if isinstance(arg[0], str):
Copy link
Member

@cnnradams cnnradams Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we guaranteed that every argument in the argument list is a string if the first one is? not familiar with pymemcache, so maybe it is guaranteed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a pymemcache expert but yes I believe that's the case, first arg is either:

  • keys, a list or dict of key strings, like here: https://github.com/pinterest/pymemcache/blob/f02ddf73a28c09256589b8afbb3ee50f1171cac7/pymemcache/client/base.py#L503
  • or a single key, a string: https://github.com/pinterest/pymemcache/blob/f02ddf73a28c09256589b8afbb3ee50f1171cac7/pymemcache/client/base.py#L324

Also, just to clarify, we're only grabbing the keys of the query, not the full command


_set_connection_attributes(span, instance)
except Exception: # pylint: disable=broad-except
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be logged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 added a warning log

return address_attributes


def _get_query_string(args):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function only ever deals with the first element of the list, may be it should take only a single element as argument instead of a list?

Copy link
Contributor

@ocelotl ocelotl Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this function is only used in one place in your code, better to move it there to avoid having to jump to some other file to read your code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owais agreed, i've updated this function (and moved it to the same file as the function where it's being called) to simplify things a bit

# pull out the first arg which will contain any key
arg = args[0]

# if we get a dict, convert to list of keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: these comments are a bit unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, removed

def test_set_success(self):
client = self.make_client([b"STORED\r\n"])
result = client.set(b"key", b"value", noreply=False)
assert result is True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert result is True
self.assertTrue(result)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 updated

Copy link
Contributor

@ocelotl ocelotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 9, 2020

CLA Check
The committers are authorized under a signed CLA.

Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 100% how to handle changelog and versioning, as well as setup.cfg and setup.py files. Took some educated guesses and basically just borrowed the structure of other integrations as a guide, let me know if I've botched anything here

Looks good! Not much to it. See https://github.com/open-telemetry/opentelemetry-python/blob/master/RELEASING.md for our whole process which includes bumping versions.

this opentelemetry-ext-pymemcache implementation differs a bit from dd-trace-py, in that dd-trace-py replaces the entire Client with a wrapped client, whereas opentelemetry-ext-pymemcache is only wrapping specific methods on the original Client. The dd-trace-py approach relies on Pin, which isn't a construct available in opentelemetry-python afaik. There were a few gotchas with this approach (like ensuring methods aren't patched multiple times) but overall it behaves the same.

Do you have any docs / links to "Pin"? Maybe it's a concept we should add in.

Tried to write the tests as closely as I could to the originals in dd-trace-py but was not super sure how to test some of config/client setting stuff, so left that out for now.

Do you have links? curious about these as well.

Thanks! overall LGTM.

return keys


def _unwrap(obj, attr):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a now a utility in opentelemetry-instrumentation. it would be great to refactor to use that method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah nice, updated to use the util method

See `BaseInstrumentor`
"""

def _instrument(self, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong convention there, but see flask for an example:https://github.com/open-telemetry/opentelemetry-python/blob/master/ext/opentelemetry-ext-flask/src/opentelemetry/ext/flask/__init__.py#L194

One thing is that "app" is probably a misnomer in the context of memcache, so the convention is loose at best.

Copy link
Contributor

@ocelotl ocelotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, I commented about a couple minor changes that could be addressed 👍

from opentelemetry.ext.pymemcache.version import __version__
from opentelemetry.instrumentation.instrumentor import BaseInstrumentor
from opentelemetry.trace import SpanKind, get_tracer
from opentelemetry.trace.status import Status, StatusCanonicalCode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, updated (removed)

See `BaseInstrumentor`
"""

def _instrument(self, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no convention for any other method outside _instrument and _uninstrument. Since these instrumentations deal with different third party libraries it is ok to add more methods as needed (like it was done for the Flask instrumentation) to better adapt the instrumentation to the characteristics of the third party libraries.

@toumorokoshi
Copy link
Member

@ericmustin do you have time to look at Diego's comments today? We're good to merge after maybe a final look by you.

return wrapped(*args, **kwargs)


def _get_query_string(arg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if some of these methods defined here could go into util.py and leave only the wrapping, instrumenting code in __init__.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericmustin Thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies, i missed this comment. yea i mean, i don't have strong opinions here at all, this was actually originally in util and it was suggested to move it here 😅 #772 (comment)

Happy to move _get_query_string back if that's preferred

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ocelotl has a point that it's only being used once. Maybe just move all the funcitons into this file and get rid of the utils.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I think that probably makes sense, updated (merged util.py helpers into __init__.py and removed util.py ).

Copy link
Contributor

@lzchen lzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Some non-blocking comments.

@ericmustin
Copy link
Contributor Author

@toumorokoshi apologies, i'm afk rn (i'm in CEST timezone 🇫🇷 ) so likely cannot grab these before EOD US 🇺🇸 . let me circle back to these first thing in the morning, everyone's comments seem very reasonable i should be able to address them pretty quickly. thanks for all the the thoughtful reviews everyone!

@toumorokoshi
Copy link
Member

@toumorokoshi apologies, i'm afk rn (i'm in CEST timezone ) so likely cannot grab these before EOD US . let me circle back to these first thing in the morning, everyone's comments seem very reasonable i should be able to address them pretty quickly. thanks for all the the thoughtful reviews everyone!

no worries! sorry there's no timeline here. Just @ me when things look good and I'll merge.

@ericmustin
Copy link
Contributor Author

@toumorokoshi I think I've addressed the relevant feedback here from @ocelotl , @lzchen , @cnnradams and others. Thanks again all for the great feedback! I'm still a little bit confused on what I should include in the CHANGELOG for pymemcache, and whether I should be cutting a release or whatnot, I assume not since I'm not a maintainer?

Addressing the other feedback points:

  1. regarding the question on Pin
    • Do you have any docs / links to "Pin"? Maybe it's a concept we should add in.

  • Yup sure, you can find some details on it in the dd-trace-py pypi docs

    • As I understand it, it basically allows user to add specific context to different connections, and it exposes to helper methods to do so

    • Pin (a.k.a Patch Info) is a small class which is used to set tracing metadata on a particular traced connection. This is useful if you wanted to, say, trace two different database clusters.

  1. regarding the client/config tests that i didn't port over
  • for reference they can be found here . Taking a closer look it's mostly just testing some datadog specific settings, so I don't believe it's useful.
  1. Lastly, regarding adding a method to trace specific pymemcache clients, as opposed to all of them.
  • I think it's useful and the flask example is helpful to look at for guidance, but would prefer to move it outside the scope of this PR, as I think it's probably worth considering adding a general helper class like "Pin" first to make this sort of work more standard.

Lmk if there's anything else needed here or I've missed something, happy to help make sure this gets brought across the finish line.

@lzchen
Copy link
Contributor

lzchen commented Jun 10, 2020

@ericmustin
As for your CHANGELOG question, a good example would be this PR in what to include for the CHANGELOG for a new package. Place it under unreleased and when we do an actual release, the maintainers will update all the CHANGELOG and version.py files to the correct ones.

@ericmustin ericmustin requested a review from majorgreys June 10, 2020 19:34
@toumorokoshi
Copy link
Member

@ericmustin sorry, but this needs to be updated to the new version (0.10.dev.0) since that was released yesterday.

can you either update the branch yourself, or change the PR configuration so we can make changes to your branch?

@ericmustin
Copy link
Contributor Author

@toumorokoshi Updated to 0.10.dev.0 , thanks! Going forward i'll allow maintainer edits on the PR to make stuff like this easier.

@toumorokoshi toumorokoshi merged commit ca232c9 into open-telemetry:master Jun 12, 2020
@toumorokoshi
Copy link
Member

Great, thanks a lot! Sorry for all the back and forth.

srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants