Reinforcement learning class (RofuncRL)

1. Online reinforcement learning

RofuncRL: Overview of the RofuncRL subpackage
RofuncRL A2C (Advantage Actor-Critic): A2C implementation and explanation of tricks in RofuncRL
RofuncRL PPO (Proximal Policy Optimization): PPO implementation and explanation of tricks in RofuncRL
RofuncRL TD3 (Twin Delayed Deep Deterministic Policy Gradient): TD3 implementation and explanation of tricks in RofuncRL
RofuncRL SAC (Soft Actor-Critic): SAC implementation and explanation of tricks in RofuncRL

2. Offline reinforcement learning

RofuncRL CQL (Conservative Q-Learning): CQL implementation and explanation of tricks in RofuncRL
RofuncRL BCQ (Batch-Constrained Q-Learning): BCQ implementation and explanation of tricks in RofuncRL
RofuncRL DTrans (Decision Transformer): DTrans implementation and explanation of tricks in RofuncRL
RofuncRL TD3+BC (Twin Delayed Deep Deterministic Policy Gradient with Behavior Cloning): TD3+BC implementation and explanation of tricks in RofuncRL
RofuncRL EDAC (Ensemble-Diversified Actor Critic): EDAC implementation and explanation of tricks in RofuncRL

RofuncRL AMP (Adversarial Motion Priors): AMP implementation and explanation of tricks in RofuncRL
RofuncRL ASE (Adversarial Skill Embeddings): ASE implementation and explanation of tricks in RofuncRL
RofuncRL ODTrans (Online Decision Transformer): ODTrans implementation and explanation of tricks in RofuncRL