Reinforcement learning class (RofuncRL)

1.  Online reinforcement learning

RofuncRL

Overview of the RofuncRL subpackage

RofuncRL A2C (Advantage Actor-Critic)

A2C implementation and explanation of tricks in RofuncRL

RofuncRL PPO (Proximal Policy Optimization)

PPO implementation and explanation of tricks in RofuncRL

RofuncRL TD3 (Twin Delayed Deep Deterministic Policy Gradient)

TD3 implementation and explanation of tricks in RofuncRL

RofuncRL SAC (Soft Actor-Critic)

SAC implementation and explanation of tricks in RofuncRL

2.  Offline reinforcement learning

RofuncRL CQL (Conservative Q-Learning)

CQL implementation and explanation of tricks in RofuncRL

RofuncRL BCQ (Batch-Constrained Q-Learning)

BCQ implementation and explanation of tricks in RofuncRL

RofuncRL DTrans (Decision Transformer)

DTrans implementation and explanation of tricks in RofuncRL

RofuncRL TD3+BC (Twin Delayed Deep Deterministic Policy Gradient with Behavior Cloning)

TD3+BC implementation and explanation of tricks in RofuncRL

RofuncRL EDAC (Ensemble-Diversified Actor Critic)

EDAC implementation and explanation of tricks in RofuncRL

3.  Mixline (Mixing online and offline) reinforcement learning

RofuncRL AMP (Adversarial Motion Priors)

AMP implementation and explanation of tricks in RofuncRL

RofuncRL ASE (Adversarial Skill Embeddings)

ASE implementation and explanation of tricks in RofuncRL

RofuncRL ODTrans (Online Decision Transformer)

ODTrans implementation and explanation of tricks in RofuncRL