Your task is to build a function f that takes the current state observation (a 41-dimensional vector) and returns the action, a vector of muscle excitations (18-dimensional), in a way that maximizes the reward.
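To make the expected shapes concrete, here is a minimal sketch of such a function. The policy is only a random placeholder, not a recommended controller. The names `OBSERVATION_SIZE` and `ACTION_SIZE`, and the assumption that each excitation lies in the range [0, 1], are illustrative and not stated above.

```python
import random

OBSERVATION_SIZE = 41  # length of the state observation vector
ACTION_SIZE = 18       # number of muscle excitations to return


def f(observation):
    """Map a 41-dimensional observation to an 18-dimensional action."""
    if len(observation) != OBSERVATION_SIZE:
        raise ValueError(
            f"expected {OBSERVATION_SIZE} values, got {len(observation)}"
        )
    # Placeholder policy: ignore the observation and sample random
    # excitations. The [0, 1] range is an assumption of this sketch.
    return [random.uniform(0.0, 1.0) for _ in range(ACTION_SIZE)]
```

A trained model would replace the random sampling with a mapping that actually uses `observation`.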

The trial ends when the pelvis of the model drops below 0.65 meters or when you reach 1000 iterations (corresponding to 10 seconds in the virtual environment), whichever comes first. Your total reward is the position of the pelvis on the x axis after the last iteration, minus a penalty for using ligament forces. Ligaments are tissues that prevent your joints from bending too much; overusing them leads to injuries, so we want to avoid it. The penalty in the total reward equals the sum of the forces generated by the ligaments over the trial, divided by 10,000,000.
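The termination rule and the total reward can be written down as a short sketch. The function and parameter names here are hypothetical, and `iteration` is assumed to count the iterations completed so far. The environment computes these quantities itself, so this is only meant to make the arithmetic explicit.

```python
MIN_PELVIS_HEIGHT = 0.65      # meters
MAX_ITERATIONS = 1000         # corresponds to 10 seconds of simulated time
LIGAMENT_DIVISOR = 10_000_000


def trial_is_over(pelvis_height, iteration):
    """Return True if the pelvis is too low or the iteration limit is hit."""
    return pelvis_height < MIN_PELVIS_HEIGHT or iteration >= MAX_ITERATIONS


def total_reward(final_pelvis_x, ligament_forces):
    """Final x position of the pelvis minus the scaled ligament penalty.

    `ligament_forces` holds one summed ligament force per iteration.
    """
    penalty = sum(ligament_forces) / LIGAMENT_DIVISOR
    return final_pelvis_x - penalty
```

For example, a trial that ends with the pelvis at x = 3.0 after accumulating ligament forces of 20,000,000 in total would score 3.0 - 2.0 = 1.0.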

After each iteration you receive a reward equal to the change in the x position of the pelvis during that iteration, minus the penalty for the ligament forces used in that iteration.
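The per-iteration reward can be sketched as follows. This assumes that the per-iteration penalty uses the same divisor of 10,000,000 as the total penalty and that the pelvis starts at x = 0, so that the per-iteration rewards add up to the total reward. Neither assumption is confirmed by the text here, and the function name is hypothetical.

```python
import math

LIGAMENT_DIVISOR = 10_000_000  # assumed to apply per iteration as well


def step_reward(x_before, x_after, ligament_force):
    """Change in pelvis x position minus the scaled ligament penalty."""
    return (x_after - x_before) - ligament_force / LIGAMENT_DIVISOR


# Pelvis x positions before the first iteration and after each iteration.
positions = [0.0, 0.5, 1.25, 2.0]
# Summed ligament force in each of the three iterations.
forces = [1e7, 0.0, 2e7]

rewards = [
    step_reward(positions[i], positions[i + 1], forces[i])
    for i in range(len(forces))
]

# Under the assumptions above, the step rewards telescope to the
# final x position minus the total scaled penalty.
expected_total = positions[-1] - sum(forces) / LIGAMENT_DIVISOR
assert math.isclose(sum(rewards), expected_total)
```

Because the position differences telescope, moving forward steadily and moving forward in bursts give the same positional term; only the ligament penalty depends on how the distance was covered.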

You can test your model on your local machine. For submission, you will need to interact with the remote environment: crowdAI sends you the current observation, and you send back the action you take in that state. You will be evaluated at three different levels of difficulty.