- Stanford University
- Stanford, CA
- miaolu3.github.io
Pinned
- Regularized-Preference-Optimization (forked from YSLIU627/Regularized-Preference-Optimization)
  Code for: [NeurIPS 2024] Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
  Python
- MEX (forked from agentification/MEX)
  Code for: [NeurIPS 2023] Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
  Python
- YSLIU627/RL-for-Markov-Exchange-Economy
  Code for: [ICML 2022] Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
  Jupyter Notebook · 6 stars
- Learning-Pruning-Friendly-Networks-via-Frank-Wolfe-One-Shot-Any-Sparsity-and-No-Retraining
  Code for: [ICLR 2022] Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, and No Retraining
- RL-SCPO (forked from MIRALab-USTC/RL-SCPO)
  Code for: [AAAI 2022] Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization
  Python