Skip to content

Support for initializing span processors and exporters without initializing the NodeSDK #594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wrose504 opened this issue Apr 17, 2025 · 6 comments · Fixed by #596
Closed

Comments

@wrose504
Copy link

Currently calling initialize imported from @traceloop/node-server-sdk will initialize the OpenTelemetry SDK by instantiating NodeSDK, configuring it, and then calling sdk.start(). It would be good to provide a way to just get the OLTP exporter configured for Traceloop, and the batch or simple span processor linked to that exporter, so these can be added to an existing configuration that is already calling sdk.start().

While trying to copy just the parts of the initialize() implementation necessary to set up the span processor, I ended up having to copy a significant chunk of code as the span processor refers to context keys (e.g. WORKFLOW_NAME_KEY, ENTITY_NAME_KEY) that are also not exported, meaning that I need to create copies of these keys and then make copies of withEntity, etc, that use the copies.

I think this could be solved by making an exported function that constructs a span processor given the exporter and disableBatch flags, or by exporting the context keys so that a span processor implementation can be built that still interacts with withEntity.

Would a PR for either of these approaches be acceptable?

@galkleinman
Copy link
Contributor

Hey @wrose504, and thanks for submitting this one! 💪

A PR will be more than welcome in general :)
I'll give it a look later today and try to asses what will be the best approach for this one, if it'll require a major effort we'll do it ourselves and otherwise you can definitely submit a PR.

Let's keep in touch.
We've a shared slack channel btw (#traceloop-picnichealth) so we can also communicate over there/DMs rather than Github.

@wrose504
Copy link
Author

wrose504 commented Apr 17, 2025

Thanks! I should possibly describe a little more what I'm trying to do in case I'm asking for a weird solution to a non-problem 😅

We have a Hapi.js app server where a handful of our routes make AI agent calls and Traceloop is monitoring these calls because we've called initialize() at server startup.

Now I want to add in tracing generally for all our app routes, again using OpenTelemetry, with several other instrumentations added in (a customized Hapi instrumentation, plus Postgres and Redis instrumentations).

Ideally I don't want to send all the traces for all these services and all app API calls to Traceloop. I do want to send all our Hapi/Postgres/Redis traces to GCP Cloud Trace, or some other observability platform. I can send the AI agent traces to the other observability platform too, but I won't do a lot of analysis with them there given we're using Traceloop for that.

I'd also like to use GCP resource detection since we run in a GKE cluster.

I'm struggling with this as while it looks like I could customize the exporter via initialize(), I can't necessarily add resource detectors, and I can't do anything clever in terms of filtering. I might be able to deploy the OpenTelemetry collector, direct all my tracing there, and then try to modify the spans to add resource information and split out what I want to send to Traceloop and what I want to send to GCP Cloud Trace.

But looking at the OpenTelemetry Node.js SDK, it seems like I should also be able to set up multiple span processors and do the same work in-process without needing to spin up the OpenTelemetry Collector. So my first thought was that I needed to be able to construct NodeSDK like this:

// import {TraceExporter} from '@google-cloud/opentelemetry-cloud-trace-exporter'
export const gcpTraceExporter = new TraceExporter({resourceFilter: /.*/});
export const gcpSpanProcessor = new BatchSpanProcessor(gcpTraceExporter);

// Custom sampler for sampling decisions
const sampler = new PicnicTraceSampler();

// Pretend I can import these from '@traceloop/node-server-sdk'
const traceExporter = setupTraceloopTraceExporter(config.traceloopApiKey);
const traceloopSpanProcessor = setupTraceloopSpanProcessor(traceExporter);

const sdk = new NodeSDK({
  serviceName: 'picnic-app',

  resourceDetectors: [
    hostDetector,
    gcpDetector
  ],
  autoDetectResources: true,

  spanProcessors: [
    gcpSpanProcessor,
    traceloopSpanProcessor
  ],

  sampler: sampler,

  instrumentations: [
    // General app instrumentations
    new PostgresInstrumentation(),
    new RedisInstrumentation(),
    new HttpInstrumentation(),
    new UndiciInstrumentation(),

    // Traceloop instrumentations
    new OpenAIInstrumentation(),
    new AzureOpenAIInstrumentation()
  ]
});

Where this is falling apart is that setupTraceloopSpanProcessor doesn't exist, hence my original issue.

@galkleinman
Copy link
Contributor

Initial thoughts:

  • Since our SDK is doing some black-magic under the hood in addition to the SpansProccessor, I think that neither exposing our BatchProcessor out nor the context keys will actually work (unfortunately).
  • Internally we're initialising the NodeSDK of Otel, so we can actually support all of its capabilities (eg. Resource Detectors) and in general it's better to avoid initialize it twice.
  • The OTEL way of "routing" different spans to different destinations in case you don't want to bootstrap a collector will be to use two different FilteringSpanProcessors as far as i understand.

Suggested Solution:

Given all of the above, what I suggest, is that we'll expose in our SDK all of the capabilities of the NodeSDK, either by letting you pass the NodeSDK instance as an argument to Traceloop SDK or by exposing the same arguments exposed by the NodeSDK constructor.

You'll then initialize traceloop sdk instead of the NodeSDK, with the Resource Detector and everything you with, and with two different FilteringSpanProcessors and exporters, one for GoogleCloudTrace filtering the relevant spans, and one for Traceloop.

@wrose504 LMK WDYT

@wrose504
Copy link
Author

wrose504 commented Apr 18, 2025

Thanks @galkleinman! I'm not completely sure I understand how I'll be able to provide a filtering span processor that can wrap Traceloop's span processor. Will there be an extra initialization option like decorateSpans or something that accepts a function that wraps a span processor?

What I have done so far to get things working is to import most of the Traceloop SDK code that works with the spans into a new file traceloop_sdk.ts and made a filter traceloop_filter.ts that I'm then initializing in my instrumentation.ts file. I've copied the code to a gist so you can see what I'm doing. I've had to adapt places in our code that were importing the decorators (i.e. withEntity, withWorkflow, withTask etc) to use the versions I'm defining in traceloop_sdk.ts as those need to use the same context keys.

Part of the gist shows how I'm wrapping the Traceloop span processor:

let traceloopSpanProcessor: SpanProcessor | undefined;

if (traceloopApiKey) {
  // Setup the Traceloop trace exporter and span processor.
  const traceExporter = setupTraceloopTraceExporter(traceloopApiKey);
  traceloopSpanProcessor = setupTraceloopSpanProcessor(traceExporter);

  // Wrap the span processor in a filter that only emits spans that are
  // related to Traceloop.
  traceloopSpanProcessor = new TraceloopSpanFilter(traceloopSpanProcessor);
} else {
  console.warn('Traceloop API key not provided, skipping initialization');
}

I think you're proposing allowing Partial<NodeSDKConfiguration> as input to Traceloop's initialize(), such that I could pass in my own array of spanProcessors, but if you don't expose something like setupTraceloopSpanProcessor, I'm not sure how I would instantiate new TraceloopSpanFilter(traceloopSpanProcessor) to then pass into initialize().

@wrose504
Copy link
Author

With a bit more work I think I'm starting to see what you meant about black magic @galkleinman 😅 I had to copy enough of the SDK that in the end it was easier to create a fork and make some edits there.

Here's a diff that I hope is publicly accessible.

@galkleinman galkleinman linked a pull request Apr 21, 2025 that will close this issue
@galkleinman
Copy link
Contributor

Well, just saw your latest comment here @wrose504, right after I decided to take your approach and try to expose a standalone span processor 😂

Anyway, i'll keep you posted both here and on Slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants