Diversity Is All You Need: Learning Skills without a Reward Function #7

@flrngel

https://arxiv.org/abs/1802.06070

Abstract

  • Learn skills without a reward function by maximizing an information-theoretic objective with a maximum entropy policy
  • After unsupervised learning, train a typical reinforcement learning task starting from the best learned skill

1. Introduction

  • A skill is just a policy, conditioned on a latent variable z
  • The key idea is discriminability of skills
    • Skills have to be distinguishable from each other
    • Skills have to be as diverse as possible

2. Related Work

  • Three important distinctions of this paper
    1. Uses maximum entropy policies to force skills to be diverse
    2. Fixes the prior distribution p(z) instead of learning it
    3. The discriminator looks at every state, not only the final state
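A small sketch of why fixing p(z) to a uniform distribution is convenient: a uniform prior over K skills already has the maximum possible entropy, log K. The helper `entropy` and the example priors below are hypothetical and only for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(nonzero * np.log(nonzero)))

num_skills = 4  # hypothetical number of skills
uniform_prior = np.full(num_skills, 1.0 / num_skills)
skewed_prior = np.array([0.7, 0.1, 0.1, 0.1])  # made-up non-uniform prior

# The uniform prior reaches log(num_skills); any skewed prior is lower.
print(entropy(uniform_prior), np.log(num_skills))
print(entropy(skewed_prior))
```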

The paper argues that maximizing diversity works better than a task-specific reward for learning complex behaviors.

3. Diversity is all you need

image
image

3.1. How it works

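H[a|s] - MI(a;z|s) = H[a|s,z] (for a continuous action space), so the objective simplifies:

F(Θ) = MI(s;z) + H[a|s] - MI(a;z|s)
     = (H[z] - H[z|s]) + H[a|s] - (H[a|s] - H[a|s,z])
     = H[a|s,z] + H[z] - H[z|s]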

  • H[a|s,z] (maximize): each skill should act as randomly as possible
  • H[z] (maximize): fix p(z) so that it has high entropy
  • H[z|s] (minimize): z should be easy to infer from the current state
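As a rough illustration of how the H[z] and H[z|s] terms become a trainable signal: the paper replaces the task reward with a pseudo-reward log q(z|s) - log p(z), where q is a learned discriminator. The sketch below assumes a uniform p(z) and a discriminator that outputs one logit per skill; the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

def pseudo_reward(disc_logits, z, num_skills):
    """log q(z|s) - log p(z), assuming a uniform prior p(z) = 1 / num_skills."""
    log_q = log_softmax(np.asarray(disc_logits, dtype=float))
    log_p_z = -np.log(num_skills)
    return log_q[..., z] - log_p_z

# Discriminator is unsure which skill produced the state: reward is 0.
print(pseudo_reward([0.0, 0.0, 0.0, 0.0], z=1, num_skills=4))
# Discriminator is confident it was skill 0: positive reward for skill 0.
print(pseudo_reward([10.0, 0.0, 0.0, 0.0], z=0, num_skills=4))
```

The reward is positive when the discriminator identifies the skill better than chance and negative when it does worse, which pushes each skill toward states the other skills do not visit.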

3.2. Implementation

image
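The overall loop can be sketched on a toy problem. This is not the paper's implementation (which uses soft actor-critic and a neural network discriminator); it is a tabular stand-in under stated assumptions: a 1-D chain of states, a softmax policy per skill updated with plain REINFORCE (no entropy bonus), and a count-based discriminator. All names and hyperparameters are hypothetical.

```python
import numpy as np

def train_diayn_toy(num_skills=2, num_states=9, horizon=8,
                    episodes=500, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((num_skills, num_states, 2))  # policy logits per (skill, state, action)
    counts = np.ones((num_states, num_skills))     # discriminator counts (Laplace-smoothed)
    log_p_z = -np.log(num_skills)                  # fixed uniform prior p(z)

    for _ in range(episodes):
        z = rng.integers(num_skills)               # sample a skill once per episode
        s = num_states // 2                        # always start in the middle
        steps = []
        for _ in range(horizon):
            logits = theta[z, s]
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            a = rng.choice(2, p=probs)             # 0 = left, 1 = right
            s_next = int(np.clip(s + (1 if a == 1 else -1), 0, num_states - 1))
            # Pseudo-reward from the discriminator: log q(z|s') - log p(z)
            q = counts[s_next] / counts[s_next].sum()
            r = np.log(q[z]) - log_p_z
            steps.append((s, a, probs, r))
            counts[s_next, z] += 1                 # discriminator update
            s = s_next

        # Policy update: REINFORCE with reward-to-go (stand-in for SAC)
        g = 0.0
        for s_t, a_t, probs_t, r_t in reversed(steps):
            g += r_t
            grad = -probs_t                        # gradient of log-softmax w.r.t. logits
            grad[a_t] += 1.0
            theta[z, s_t] += lr * g * grad
    return theta, counts

theta, counts = train_diayn_toy()
# Rows of q(z|s): how strongly each state is associated with each skill.
print(counts / counts.sum(axis=1, keepdims=True))
```

The structure to note is the order of operations per step: sample z once per episode, act with the skill-conditioned policy, score the new state with the discriminator, then update both the discriminator and the policy.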

4. What skills are learned?

image
(alpha = 0.01 gives the most clearly discriminable skills in this illustration)

Questions

  • Is this model similar to a random forest?
  • What is a critic network?
  • What is an M-projection?
