How to Use the DiscoveryAgent Probe in Asynchronous Scenarios

HaojunRen edited this page Aug 2, 2025 · 19 revisions

DiscoveryAgent is not limited to the Discovery framework. It also applies to any base framework (for example, Dubbo) or business system with similar usage scenarios

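The Discovery framework propagates context across the full call chain in the following scenarios:

  • Strategy routing headers passed from the gateway to the services along the full chain
  • Tracing instrumentation passed from the gateway to the services along the full chain
  • Business-defined custom context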

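This context is lost in the following asynchronous scenarios:

  • WebFlux Reactor reactive asynchronous calls
  • Spring asynchronous calls made with the @Async annotation
  • Hystrix thread pool isolation mode
  • Threads and thread pools
  • SLF4J asynchronous logging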

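DiscoveryAgent addresses these pain points. The Discovery framework uses DiscoveryAgent's bytecode enhancement to keep context available under asynchronous execution in the following call scenarios:

  • Context propagation in Spring Cloud Gateway filters
  • Context propagation in Zuul filters
  • Context forwarding in Feign interceptors
  • Context forwarding in RestTemplate interceptors
  • Context forwarding in WebClient interceptors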

The DiscoveryAgent solution for asynchronous scenarios

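ThreadLocal provides thread-local variables: in a multithreaded environment, each thread's copy of a ThreadLocal variable is independent of every other thread's copy. Asynchronous execution switches threads, for example from a main thread to a child thread, so the ThreadLocal context of the original thread is lost. DiscoveryAgent solves this problem as a Java agent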
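The loss described above can be reproduced without any framework. The following is a minimal sketch in plain Java; the class name `ThreadLocalLossDemo` and the value `route=v1` are made up for illustration and are not part of DiscoveryAgent.

```java
import java.util.concurrent.atomic.AtomicReference;

// Shows that a value stored in a plain ThreadLocal by one thread is not visible in another.
final class ThreadLocalLossDemo {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Sets a value in the calling thread, then reads the same ThreadLocal from a new child thread.
    static String readInChildThread(String value) {
        CONTEXT.set(value);
        final AtomicReference<String> seenByChild = new AtomicReference<>("unset");
        Thread child = new Thread(new Runnable() {
            @Override
            public void run() {
                // The child thread has its own, still empty slot, so this stores null.
                seenByChild.set(CONTEXT.get());
            }
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenByChild.get();
    }

    public static void main(String[] args) {
        System.out.println("child sees: " + readInChildThread("route=v1"));
        System.out.println("parent sees: " + CONTEXT.get());
    }
}
```

Run without an agent, the child thread reads `null` while the parent still reads its own value. This is the gap that the agent is meant to close.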

It targets asynchronous code across Java frameworks and addresses the loss of ThreadLocal context in the following 10 asynchronous scenarios:

  • WebFlux Reactor
  • @Async
  • Hystrix Thread Pool Isolation
  • Runnable
  • Callable
  • Supplier
  • Single Thread
  • Thread Pool
  • Virtual Thread
  • SLF4J MDC

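Note that DiscoveryAgent does not support asynchronous code written with lambda syntax. The class that implements a Runnable/Callable lambda is generated by the JVM at run time instead of being loaded from a class file like an ordinary anonymous inner class, so DiscoveryAgent cannot modify the implementation class behind the lambda expression. The reasons are as follows:

  • Timing of bytecode generation

    A lambda expression is not compiled into a complete implementation class. The compiler emits the lambda body as a method plus an invokedynamic call site, and the JVM generates the implementing class dynamically at run time. A Java agent usually transforms bytecode at class-loading time, when the implementation class of the lambda expression does not exist yet

  • Special handling of the generated classes

    At the bytecode level a lambda expression is an invokedynamic instruction backed by a generated class. That class is created while the JVM is running, not at compile time, so a Java agent cannot intercept and modify it during the class-loading phase

  • Complexity of method handles

    Lambda expressions rely on the MethodHandle mechanism, which makes them more complex at the bytecode level than ordinary method calls and hard for conventional bytecode tools such as ASM to process correctly

  • Class-loading order

    The classes that support lambda expressions, such as LambdaMetafactory, are loaded by the bootstrap class loader, and a Java agent usually cannot modify classes loaded by the bootstrap class loader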
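The difference between the two kinds of implementation class can be observed directly. The sketch below only inspects the runtime classes and uses no agent API; the class name `LambdaClassDemo` is made up for illustration, and the exact name printed for the lambda class is a JVM implementation detail that varies between JDK versions.

```java
// Compares the runtime class behind an anonymous inner class with the one behind a lambda.
final class LambdaClassDemo {
    // Compiled by javac into its own class file (LambdaClassDemo$1.class).
    static Runnable anonymousTask() {
        return new Runnable() {
            @Override
            public void run() {
            }
        };
    }

    // Compiled into an invokedynamic call site; the implementing class is spun at run time.
    static Runnable lambdaTask() {
        return () -> { };
    }

    public static void main(String[] args) {
        Class<?> anonymous = anonymousTask().getClass();
        Class<?> lambda = lambdaTask().getClass();
        System.out.println(anonymous.getName()
                + " anonymous=" + anonymous.isAnonymousClass()
                + " synthetic=" + anonymous.isSynthetic());
        System.out.println(lambda.getName() + " synthetic=" + lambda.isSynthetic());
    }
}
```

The anonymous class exists as a class file on disk that a class-file transformer can see when it is loaded, whereas the lambda's class has no class file of its own.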

Code that uses newer JDK features can be rewritten in the following form to avoid lambda expressions

CompletableFuture.runAsync(new Runnable() {
    @Override
    public void run() {

    }
});

CompletableFuture<String> completableFuture = CompletableFuture.supplyAsync(new Supplier<String>() {
    @Override
    public String get() {
        return "";
    }
});	

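Obtaining the asynchronous cross-thread DiscoveryAgent

There are two ways to obtain the plugin

Contents of the asynchronous cross-thread DiscoveryAgent

① discovery-agent-starter-${discovery.version}.jar is the agent bootstrap program, loaded when the JVM starts

② agent.config is the configuration file for the base scan packages

In most cases it needs no changes, although users can add or remove base scan packages in agent.config. The default configuration is as follows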

# Base thread scan packages
agent.plugin.thread.scan.packages=reactor.core.publisher;org.springframework.aop.interceptor;com.netflix.hystrix

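The base scan packages have the following meanings

  • reactor.core.publisher is the scan package for the WebFlux Reactor asynchronous scenario
  • org.springframework.aop.interceptor is the scan package for the @Async scenario
  • com.netflix.hystrix is the scan package for the Hystrix thread pool isolation scenario

③ The plugin/discovery-agent-starter-plugin-strategy-${discovery.version}.jar plugin handles the Nepxion Discovery context in asynchronous scenarios

④ The plugin/discovery-agent-starter-plugin-mdc-${discovery.version}.jar plugin handles the SLF4J MDC logging context in asynchronous scenarios

⑤ Business systems can define their own plugins to handle business-defined context in asynchronous scenarios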
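To make the semicolon-separated format concrete, here is a minimal sketch of how such a value could be split and used to decide whether a class falls under a scan package. The class `ScanPackagesDemo` and the prefix-matching rule are assumptions for illustration; the agent's real matching logic is not shown on this page and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses a scan package list and matches class names by package prefix.
final class ScanPackagesDemo {
    // Splits "a.b;c.d" into ["a.b", "c.d"], ignoring blanks and empty items.
    static List<String> parse(String value) {
        List<String> packages = new ArrayList<>();
        if (value == null) {
            return packages;
        }
        for (String item : value.split(";")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                packages.add(trimmed);
            }
        }
        return packages;
    }

    // Assumed rule: a class matches when its fully qualified name starts with "<package>.".
    static boolean matches(String className, List<String> packages) {
        for (String pkg : packages) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> packages = parse("reactor.core.publisher;org.springframework.aop.interceptor;com.netflix.hystrix");
        System.out.println(matches("com.netflix.hystrix.HystrixCommand", packages));
        System.out.println(matches("java.lang.Thread", packages));
    }
}
```

Appending the dot before comparing keeps a package such as com.netflix.hystrix from accidentally matching a sibling package whose name merely starts with the same characters.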

Using the asynchronous cross-thread DiscoveryAgent

① Usage example

  • Start the application with -javaagent. The basic format is as follows
-javaagent:C:/opt/discovery-agent/discovery-agent-starter-${discovery.agent.version}.jar -Dthread.scan.packages=com.nepxion.discovery.guide.service.feign

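② Parameters

  • C:/opt/discovery-agent: the directory that contains the agent. Replace it with the actual directory
  • -Dthread.scan.packages: the packages to scan for asynchronous classes such as Runnable/Callable/Thread/ThreadPool/Virtual Thread. All asynchronous classes under these packages are decorated
    • Keep the scan packages as narrow and precise as possible. The more specific they are, the fewer objects are decorated, which can improve performance to some extent
    • Separate multiple scan packages with ";"
    • A value that contains ";" may not be parsed correctly on some operating systems. Wrap it in double quotes, for example -Dthread.scan.packages="com.abc;com.xyz"
    • If there are no asynchronous classes such as Runnable/Callable/Thread/ThreadPool to scan, thread.scan.packages does not need to be configured, and the startup command line reduces to -javaagent:C:/opt/discovery-agent/discovery-agent-starter-${discovery.agent.version}.jar
  • -Dthread.gateway.enabled: propagates strategy headers on the Spring Cloud Gateway side to asynchronous child threads. Enabled by default
  • -Dthread.zuul.enabled: propagates strategy headers on the Zuul side to asynchronous child threads. Enabled by default
  • -Dthread.service.enabled: propagates strategy headers on the service side to asynchronous child threads. Enabled by default
  • -Dthread.mdc.enabled: propagates the SLF4J MDC logging context to asynchronous child threads. Enabled by default
  • -Dthread.request.decorator.enabled: decorates the service-side Request in asynchronous call scenarios. When the main thread finishes before the child thread, the Request is destroyed and the child thread can no longer read its headers. Enabling the decorator ensures that the headers remain available. Enabled by default; in practice most scenarios need this switch on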
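The reason a request decorator helps can be illustrated with a small model. In the sketch below, `RecyclableRequest` is a hypothetical stand-in for a container-managed request that is cleared once the main thread has finished, and `snapshot` copies the headers before that happens. None of these names come from DiscoveryAgent, and the real decorator may work differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Models why headers must be copied before the owning request is destroyed.
final class RequestSnapshotDemo {
    // Hypothetical request object whose state is wiped when the container recycles it.
    static final class RecyclableRequest {
        private final Map<String, String> headers = new HashMap<>();

        void addHeader(String name, String value) {
            headers.put(name, value);
        }

        String getHeader(String name) {
            return headers.get(name);
        }

        Map<String, String> copyHeaders() {
            return new HashMap<>(headers);
        }

        void recycle() {
            headers.clear();
        }
    }

    // Takes an independent, read-only copy that outlives the request itself.
    static Map<String, String> snapshot(RecyclableRequest request) {
        return Collections.unmodifiableMap(request.copyHeaders());
    }

    public static void main(String[] args) {
        RecyclableRequest request = new RecyclableRequest();
        request.addHeader("x-demo-version", "1.0");
        Map<String, String> headers = snapshot(request);
        request.recycle(); // the main thread has finished and the request is destroyed
        System.out.println("live request: " + request.getHeader("x-demo-version"));
        System.out.println("snapshot: " + headers.get("x-demo-version"));
    }
}
```

A child thread that holds only the live request reads nothing after recycling, while one that holds the snapshot still sees the header.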

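③ Installation validation

Applications on Spring Cloud 202x support the following setting, which is usually turned on or off with -Dspring.application.strategy.agent.validation.enabled=true or false

# Enables or disables DiscoveryAgent installation validation. When enabled, the application throws an error and exits if DiscoveryAgent is not installed. This setting applies only to Spring Cloud 202x. Defaults to true when absent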
# spring.application.strategy.agent.validation.enabled=true
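For a quick manual check that the JVM was started with the agent, the standard `RuntimeMXBean` exposes the JVM input arguments. The following is a sketch under the assumption that looking for a `-javaagent:` argument containing the jar name is good enough; the class `AgentCheckDemo` is made up for illustration, and the framework's own validation may work differently.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical check: was the JVM started with a -javaagent argument for a given jar?
final class AgentCheckDemo {
    // Returns true when any JVM argument is a -javaagent entry mentioning the jar name fragment.
    static boolean hasJavaAgent(List<String> jvmArguments, String jarNameFragment) {
        for (String argument : jvmArguments) {
            if (argument.startsWith("-javaagent:") && argument.contains(jarNameFragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // getInputArguments() lists the arguments passed to the JVM itself, not to main().
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("agent present: " + hasJavaAgent(jvmArguments, "discovery-agent-starter"));
    }
}
```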

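Conflicts with the asynchronous cross-thread DiscoveryAgent

The IDEA debug agent supports Reactor debugging for Reactive Streams. When it is enabled, DiscoveryAgent's Reactor module stops working, so IDEA's Reactor debug mode must be turned off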

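Extending the asynchronous cross-thread DiscoveryAgent

  • Develop a plugin according to the specification. The plugin provides hook functions: when a given class is loaded, the plugin can register an event with the thread-context-switch events, which enables cross-thread propagation of a business-defined ThreadLocal
  • The plugin directory holds the custom plugins that propagate ThreadLocal values when threads switch. Once a business plugin has been developed, place it in the plugin directory

The detailed steps are as follows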
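The hook idea can be pictured as a decorator around Runnable: capture the context in the thread that creates the task, install it in the thread that runs the task, and clear it afterwards. The sketch below is an illustration only. The `ContextHook` interface, `HookedRunnable` and `StringContextHook` are hypothetical names that mirror a create/before/after lifecycle; the agent achieves a comparable effect through bytecode enhancement, not through a wrapper that application code has to call.

```java
// Sketch of context propagation through a decorating Runnable.
final class HookedRunnableDemo {
    // Hypothetical hook contract with a create/before/after lifecycle.
    interface ContextHook {
        Object create();

        void before(Object captured);

        void after();
    }

    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Hook that moves the String held in CONTEXT from one thread to another.
    static final class StringContextHook implements ContextHook {
        @Override
        public Object create() {
            return CONTEXT.get();
        }

        @Override
        public void before(Object captured) {
            CONTEXT.set((String) captured);
        }

        @Override
        public void after() {
            CONTEXT.remove();
        }
    }

    static final class HookedRunnable implements Runnable {
        private final Runnable delegate;
        private final ContextHook hook;
        private final Object captured;

        HookedRunnable(Runnable delegate, ContextHook hook) {
            this.delegate = delegate;
            this.hook = hook;
            this.captured = hook.create(); // runs in the thread that constructs the task
        }

        @Override
        public void run() {
            hook.before(captured); // runs in the thread that executes the task
            try {
                delegate.run();
            } finally {
                hook.after(); // always clear, so pooled threads do not leak context
            }
        }
    }

    // Sets the context in the caller, then reports what a decorated child task sees.
    static String runInChild(String value) {
        CONTEXT.set(value);
        final String[] seen = new String[1];
        Thread child = new Thread(new HookedRunnable(new Runnable() {
            @Override
            public void run() {
                seen[0] = CONTEXT.get();
            }
        }, new StringContextHook()));
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("child sees: " + runInChild("route=v1"));
    }
}
```

Capturing in the constructor matters: the task object is created while the parent thread's context is still in place, even if it only runs later on another thread.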

① SDK side

  • Create a ThreadLocal context class
import java.util.HashMap;
import java.util.Map;

public class MyContext {
    private static final ThreadLocal<MyContext> THREAD_LOCAL = new ThreadLocal<MyContext>() {
        @Override
        protected MyContext initialValue() {
            return new MyContext();
        }
    };

    public static MyContext getCurrentContext() {
        return THREAD_LOCAL.get();
    }

    public static void clearCurrentContext() {
        THREAD_LOCAL.remove();
    }

    private Map<String, String> attributes = new HashMap<>();

    public Map<String, String> getAttributes() {
        return attributes;
    }

    public void setAttributes(Map<String, String> attributes) {
        this.attributes = attributes;
    }
}

② Agent side

  • Create a new module and add the following dependency
<dependency>
    <groupId>com.nepxion</groupId>
    <artifactId>discovery-agent-starter</artifactId>
    <version>${discovery.agent.version}</version>
    <scope>provided</scope>
</dependency>
  • Create a ThreadLocalHook class that extends AbstractThreadLocalHook
public class MyContextHook extends AbstractThreadLocalHook {
    @Override
    public Object create() {
        // Read the context object from the main thread's ThreadLocal and return it
        return MyContext.getCurrentContext().getAttributes();
    }

    @Override
    public void before(Object object) {
        // Put the context object obtained in create() into the child thread's ThreadLocal
        if (object instanceof Map) {
            MyContext.getCurrentContext().setAttributes((Map<String, String>) object);
        }
    }

    @Override
    public void after() {
        // The thread has finished; destroy the context object
        MyContext.clearCurrentContext();
    }
}
  • Create a Plugin class that extends AbstractPlugin
public class MyContextPlugin extends AbstractPlugin {
    private Boolean threadMyPluginEnabled = Boolean.valueOf(System.getProperty("thread.myplugin.enabled", "false"));

    @Override
    protected String getMatcherClassName() {
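        // Return the name of the class that holds the ThreadLocal object. Plugins are pluggable, so this must be a string; an explicit class reference is not allowed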
        return "com.nepxion.discovery.example.sdk.MyContext";
    }

    @Override
    protected String getHookClassName() {
        // Return the ThreadLocalHook class name
        return MyContextHook.class.getName();
    }

    @Override
    protected boolean isEnabled() {
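        // The -Dthread.myplugin.enabled=true/false runtime argument controls whether this plugin takes effect. The parent class returns true, so plugins are enabled by default; this override treats a missing property as false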
        return threadMyPluginEnabled;
    }
}
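  • Define the SPI extension by creating an SPI file in the src/main/resources/META-INF/services directory

The file name is fixed, as follows

com.nepxion.discovery.agent.plugin.Plugin

Its content is the fully qualified name of the Plugin class

com.nepxion.discovery.example.agent.MyContextPlugin
  • Run the Maven build and put the resulting jar in the discovery-agent/plugin directory

  • Add the startup arguments to the service and start it, as follows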

-javaagent:C:/opt/discovery-agent/discovery-agent-starter-${discovery.agent.version}.jar -Dthread.scan.packages=com.nepxion.discovery.example.application -Dthread.myplugin.enabled=true
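The SPI file in the steps above follows the standard META-INF/services naming convention that `java.util.ServiceLoader` reads. As a rough sketch of how implementations registered this way can be discovered, the example below uses a hypothetical `DemoPlugin` interface in place of the real com.nepxion.discovery.agent.plugin.Plugin. Whether the agent itself uses ServiceLoader or its own loader is an assumption not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Sketch of SPI discovery with the JDK's ServiceLoader.
final class SpiLookupDemo {
    // Hypothetical plugin contract used only for this illustration.
    public interface DemoPlugin {
        String name();
    }

    // Collects the names of all DemoPlugin implementations registered under
    // META-INF/services/<binary name of DemoPlugin> and visible to the class loader.
    static List<String> loadPluginNames(ClassLoader classLoader) {
        List<String> names = new ArrayList<>();
        for (DemoPlugin plugin : ServiceLoader.load(DemoPlugin.class, classLoader)) {
            names.add(plugin.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // No provider file is registered in this sketch, so the list is expected to be empty.
        System.out.println(loadPluginNames(SpiLookupDemo.class.getClassLoader()));
    }
}
```

Dropping a jar that contains both an implementation and its services file onto the class path is enough for the loop to pick it up, which is what makes the plugin directory approach pluggable.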

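③ Application side

  • Run MyApplication. It simulates putting Map data into the main thread's ThreadLocal; the child thread obtains that Map through DiscoveryAgent and prints it. In the log messages, 【主】 marks the main thread and 【子】 marks the child thread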
@SpringBootApplication
@RestController
public class MyApplication {
    private static final Logger LOG = LoggerFactory.getLogger(MyApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);

        invoke();
    }

    public static void invoke() {
        RestTemplate restTemplate = new RestTemplate();

        for (int i = 1; i <= 10; i++) {
            restTemplate.getForEntity("http://localhost:8080/index/" + i, String.class).getBody();
        }
    }

    @GetMapping("/index/{value}")
    public String index(@PathVariable(value = "value") String value) throws InterruptedException {
        Map<String, String> attributes = new HashMap<String, String>();
        attributes.put(value, "MyContext");

        MyContext.getCurrentContext().setAttributes(attributes);

        LOG.info("【主】线程ThreadLocal:{}", MyContext.getCurrentContext().getAttributes());

        new Thread(new Runnable() {
            @Override
            public void run() {
                LOG.info("【子】线程ThreadLocal:{}", MyContext.getCurrentContext().getAttributes());

                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }

                LOG.info("Sleep 5秒之后,【子】线程ThreadLocal:{} ", MyContext.getCurrentContext().getAttributes());
            }
        }).start();

        return "";
    }
}

The output is as follows

2020-11-09 00:08:14.330  INFO 16588 --- [nio-8080-exec-1] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{1=MyContext}
2020-11-09 00:08:14.381  INFO 16588 --- [       Thread-4] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{1=MyContext}
2020-11-09 00:08:14.402  INFO 16588 --- [nio-8080-exec-2] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{2=MyContext}
2020-11-09 00:08:14.403  INFO 16588 --- [       Thread-5] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{2=MyContext}
2020-11-09 00:08:14.405  INFO 16588 --- [nio-8080-exec-3] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{3=MyContext}
2020-11-09 00:08:14.406  INFO 16588 --- [       Thread-6] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{3=MyContext}
2020-11-09 00:08:14.414  INFO 16588 --- [nio-8080-exec-4] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{4=MyContext}
2020-11-09 00:08:14.414  INFO 16588 --- [       Thread-7] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{4=MyContext}
2020-11-09 00:08:14.417  INFO 16588 --- [nio-8080-exec-5] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{5=MyContext}
2020-11-09 00:08:14.418  INFO 16588 --- [       Thread-8] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{5=MyContext}
2020-11-09 00:08:14.421  INFO 16588 --- [nio-8080-exec-6] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{6=MyContext}
2020-11-09 00:08:14.422  INFO 16588 --- [       Thread-9] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{6=MyContext}
2020-11-09 00:08:14.424  INFO 16588 --- [nio-8080-exec-7] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{7=MyContext}
2020-11-09 00:08:14.425  INFO 16588 --- [      Thread-10] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{7=MyContext}
2020-11-09 00:08:14.427  INFO 16588 --- [nio-8080-exec-8] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{8=MyContext}
2020-11-09 00:08:14.428  INFO 16588 --- [      Thread-11] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{8=MyContext}
2020-11-09 00:08:14.430  INFO 16588 --- [nio-8080-exec-9] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{9=MyContext}
2020-11-09 00:08:14.431  INFO 16588 --- [      Thread-12] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{9=MyContext}
2020-11-09 00:08:14.433  INFO 16588 --- [io-8080-exec-10] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{10=MyContext}
2020-11-09 00:08:14.434  INFO 16588 --- [      Thread-13] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{10=MyContext}
2020-11-09 00:08:19.382  INFO 16588 --- [       Thread-4] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{1=MyContext} 
2020-11-09 00:08:19.404  INFO 16588 --- [       Thread-5] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{2=MyContext} 
2020-11-09 00:08:19.406  INFO 16588 --- [       Thread-6] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{3=MyContext} 
2020-11-09 00:08:19.416  INFO 16588 --- [       Thread-7] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{4=MyContext} 
2020-11-09 00:08:19.418  INFO 16588 --- [       Thread-8] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{5=MyContext} 
2020-11-09 00:08:19.422  INFO 16588 --- [       Thread-9] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{6=MyContext} 
2020-11-09 00:08:19.425  INFO 16588 --- [      Thread-10] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{7=MyContext} 
2020-11-09 00:08:19.428  INFO 16588 --- [      Thread-11] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{8=MyContext} 
2020-11-09 00:08:19.432  INFO 16588 --- [      Thread-12] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{9=MyContext} 
2020-11-09 00:08:19.434  INFO 16588 --- [      Thread-13] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{10=MyContext} 

Without the asynchronous agent the output is as follows. The ThreadLocal context is lost in every child thread

2020-11-09 00:01:40.133  INFO 16692 --- [nio-8080-exec-1] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{1=MyContext}
2020-11-09 00:01:40.135  INFO 16692 --- [       Thread-8] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.158  INFO 16692 --- [nio-8080-exec-2] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{2=MyContext}
2020-11-09 00:01:40.159  INFO 16692 --- [       Thread-9] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.162  INFO 16692 --- [nio-8080-exec-3] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{3=MyContext}
2020-11-09 00:01:40.163  INFO 16692 --- [      Thread-10] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.170  INFO 16692 --- [nio-8080-exec-5] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{4=MyContext}
2020-11-09 00:01:40.170  INFO 16692 --- [      Thread-11] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.173  INFO 16692 --- [nio-8080-exec-4] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{5=MyContext}
2020-11-09 00:01:40.174  INFO 16692 --- [      Thread-12] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.176  INFO 16692 --- [nio-8080-exec-6] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{6=MyContext}
2020-11-09 00:01:40.177  INFO 16692 --- [      Thread-13] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.179  INFO 16692 --- [nio-8080-exec-8] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{7=MyContext}
2020-11-09 00:01:40.180  INFO 16692 --- [      Thread-14] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.182  INFO 16692 --- [nio-8080-exec-7] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{8=MyContext}
2020-11-09 00:01:40.182  INFO 16692 --- [      Thread-15] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.185  INFO 16692 --- [nio-8080-exec-9] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{9=MyContext}
2020-11-09 00:01:40.186  INFO 16692 --- [      Thread-16] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:40.188  INFO 16692 --- [io-8080-exec-10] c.n.d.example.application.MyApplication  : 【主】线程ThreadLocal:{10=MyContext}
2020-11-09 00:01:40.189  INFO 16692 --- [      Thread-17] c.n.d.example.application.MyApplication  : 【子】线程ThreadLocal:{}
2020-11-09 00:01:45.136  INFO 16692 --- [       Thread-8] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.160  INFO 16692 --- [       Thread-9] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.163  INFO 16692 --- [      Thread-10] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.171  INFO 16692 --- [      Thread-11] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.174  INFO 16692 --- [      Thread-12] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.177  INFO 16692 --- [      Thread-13] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.181  INFO 16692 --- [      Thread-14] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.183  INFO 16692 --- [      Thread-15] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.187  INFO 16692 --- [      Thread-16] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 
2020-11-09 00:01:45.190  INFO 16692 --- [      Thread-17] c.n.d.example.application.MyApplication  : Sleep 5秒之后,【子】线程ThreadLocal:{} 

For a complete example, see https://github.com/Nepxion/DiscoveryAgent/tree/master/discovery-agent-example. A custom plugin built as described above resolves the loss of ThreadLocal context when execution switches threads




2017-2050 ©Nepxion Studio Apache License

           
