rofunc.learning.RofuncRL.models.actor_models

1. Module Contents

1.1. Classes

BaseActor

ActorPPO_Beta

ActorPPO_Gaussian

ActorSAC

ActorTD3

ActorAMP

1.2. API

class rofunc.learning.RofuncRL.models.actor_models.BaseActor(cfg: omegaconf.DictConfig, observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space, List]], action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], state_encoder: Optional[torch.nn.Module] = EmptyEncoder())

Bases: torch.nn.Module

state_norm(state: torch.Tensor) → torch.Tensor

freeze_parameters(freeze: bool = True) → None

Freeze or unfreeze the internal parameters.

Parameters:
freeze – Freeze (True) or unfreeze (False) the parameters.

update_parameters(model: torch.nn.Module, polyak: float = 1) → None

Update the internal parameters with a hard or soft (Polyak averaging) update:

- Hard update: \(\theta = \theta_{net}\)
- Soft (Polyak averaging) update: \(\theta = (1 - \rho) \theta + \rho \theta_{net}\)

Parameters:
model – Model whose parameters are used to update the internal parameters.
polyak – Polyak hyperparameter \(\rho\), between 0 and 1 (default: 1). A value of 1 performs a hard update.
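The update rule above can be illustrated with a minimal sketch. It applies \(\theta = (1 - \rho) \theta + \rho \theta_{net}\) to plain Python dictionaries that stand in for a model's tensors. The helper name `polyak_update` and the dictionary representation are hypothetical, so this is not the rofunc implementation. It only shows how the `polyak` value behaves.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of values.
Params = Dict[str, List[float]]


def polyak_update(target: Params, source: Params, polyak: float = 1.0) -> None:
    """Update `target` in place: theta = (1 - rho) * theta + rho * theta_net.

    `polyak` plays the role of rho and must lie in [0, 1]; 1 means a hard copy.
    """
    if not 0.0 <= polyak <= 1.0:
        raise ValueError("polyak must be between 0 and 1")
    for name in target:
        values, net_values = target[name], source[name]
        if polyak == 1.0:
            # Hard update: copy the source parameters verbatim.
            target[name] = list(net_values)
        else:
            # Soft update: move each value a fraction rho toward the source.
            target[name] = [
                (1.0 - polyak) * t + polyak * s for t, s in zip(values, net_values)
            ]


target = {"w": [0.0, 2.0]}
source = {"w": [1.0, 4.0]}
polyak_update(target, source, polyak=0.5)
print(target)  # {'w': [0.5, 3.0]}
polyak_update(target, source)  # default polyak=1 -> hard update
print(target)  # {'w': [1.0, 4.0]}
```

Small values of \(\rho\) make the target parameters trail the online model slowly. This smoothing is the usual reason soft updates are used for target networks.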

class rofunc.learning.RofuncRL.models.actor_models.ActorPPO_Beta(cfg: omegaconf.DictConfig, observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], state_encoder: Optional[torch.nn.Module] = EmptyEncoder())

Bases: rofunc.learning.RofuncRL.models.actor_models.BaseActor

forward(state: torch.Tensor)

get_dist(state)

mean(state)

class rofunc.learning.RofuncRL.models.actor_models.ActorPPO_Gaussian(cfg: omegaconf.DictConfig, observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], state_encoder: Optional[torch.nn.Module] = EmptyEncoder())

Bases: rofunc.learning.RofuncRL.models.actor_models.BaseActor

forward(state, action=None, deterministic=False)

get_entropy()

get_value(state)
class rofunc.learning.RofuncRL.models.actor_models.ActorSAC(cfg: omegaconf.DictConfig, observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], state_encoder: Optional[torch.nn.Module] = EmptyEncoder())

Bases: rofunc.learning.RofuncRL.models.actor_models.BaseActor

forward(state, action=None)
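SAC actors commonly use a tanh-squashed Gaussian. A pre-squash sample u is drawn from a Gaussian, the action is a = tanh(u), and the log-probability is corrected by subtracting log(1 - tanh(u)^2). Whether `ActorSAC.forward` does exactly this is not documented here, so treat the sketch below as background under that assumption. The helper names are hypothetical. The correction is written in the numerically stable form 2 (log 2 - u - softplus(-2u)).

```python
import math
import random
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_log_det(u: float) -> float:
    """log(1 - tanh(u)^2), computed as 2 * (log 2 - u - softplus(-2u)) for stability."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def squashed_gaussian_sample(mean: Sequence[float], log_std: Sequence[float],
                             rng: random.Random) -> Tuple[List[float], float]:
    """Sample a = tanh(u) with u ~ N(mean, exp(log_std)^2); return (a, log pi(a))."""
    action: List[float] = []
    log_prob = 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)
        u = rng.gauss(mu, std)
        z = (u - mu) / std
        # Gaussian log-density of the pre-squash sample u.
        log_prob += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        # Change of variables for a = tanh(u): subtract log |da/du|.
        log_prob -= tanh_log_det(u)
        action.append(math.tanh(u))
    return action, log_prob


action, log_prob = squashed_gaussian_sample([0.2, -1.0], [-0.5, -1.0], random.Random(0))
print(action, log_prob)  # each action component lies in [-1, 1]
```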
class rofunc.learning.RofuncRL.models.actor_models.ActorTD3(cfg: omegaconf.DictConfig, observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], state_encoder: Optional[torch.nn.Module] = EmptyEncoder())

Bases: rofunc.learning.RofuncRL.models.actor_models.ActorSAC

forward(state)
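`ActorTD3.forward(state)` takes only the state, which fits TD3's deterministic policy. In TD3 the target action is usually smoothed with clipped Gaussian noise (target policy smoothing) before it is fed to the target critics. The sketch below shows that step in isolation. The function name and the defaults `noise_std=0.2` and `noise_clip=0.5` are illustrative assumptions, and this page does not say whether rofunc applies the smoothing in the actor or in the agent.

```python
import random
from typing import List, Sequence


def clip(x: float, low: float, high: float) -> float:
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(target_action: Sequence[float], low: float, high: float,
                           rng: random.Random, noise_std: float = 0.2,
                           noise_clip: float = 0.5) -> List[float]:
    """TD3-style target policy smoothing: add clipped Gaussian noise, then clip to bounds."""
    return [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip), low, high)
        for a in target_action
    ]


print(smoothed_target_action([0.9, -0.2], -1.0, 1.0, random.Random(0)))
```

Clipping the noise keeps the smoothed action close to the deterministic one. The outer clip keeps it inside the valid action range.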
class rofunc.learning.RofuncRL.models.actor_models.ActorAMP(cfg: omegaconf.DictConfig, observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]], state_encoder: Optional[torch.nn.Module] = EmptyEncoder())

Bases: rofunc.learning.RofuncRL.models.actor_models.ActorPPO_Gaussian