Pole Balancing Applet [Theory]


From: Thorsten Kiefer on 9 Aug 2010 15:10

Hi Folks,
I wrote this little applet:
http://tokis-edv-service.de/index.php/beispiele/pole-balancing

My program uses Q(0) learning with Dynamic Programming.
The state space is continuous, so I partitioned it into discrete cells to
make Dynamic Programming possible.
Unfortunately, it never learns to balance the pole.
I think I implemented the Q(0) algorithm correctly.
If anyone could have a look at my code and tell me what is wrong, I'd be
very grateful.
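
For anyone following along, here is a minimal Python sketch of the general
technique described above: partition the continuous state into cells, then
apply a one-step Q-learning backup on a table indexed by cell and action.
It is not the applet's code. The bin edges, the two-action set, the default
alpha, gamma and epsilon values, and all function names are assumptions
chosen for illustration only.

```python
import random

# Hypothetical bin edges for (x, x_dot, theta, theta_dot).
# Illustrative values only; they are not taken from the applet.
BIN_EDGES = [
    [-0.8, 0.8],                          # cart position
    [-0.5, 0.5],                          # cart velocity
    [-0.105, -0.017, 0.0, 0.017, 0.105],  # pole angle (rad)
    [-0.87, 0.87],                        # pole angular velocity (rad/s)
]
ACTIONS = (0, 1)  # assumed action set: 0 = push left, 1 = push right


def discretize(state):
    """Map a continuous state to a tuple of bin indices, one per dimension."""
    cell = []
    for value, edges in zip(state, BIN_EDGES):
        # The index is the number of edges at or below the value.
        cell.append(sum(1 for edge in edges if value >= edge))
    return tuple(cell)


def q_update(q, cell, action, reward, next_cell, done, alpha=0.1, gamma=0.99):
    """One-step Q-learning backup on a dict-based table.

    Unvisited (cell, action) pairs default to 0.0. When the episode has
    ended (done is True), the target is the reward alone, with no
    bootstrapping from next_cell.
    """
    if done:
        best_next = 0.0
    else:
        best_next = max(q.get((next_cell, a), 0.0) for a in ACTIONS)
    old = q.get((cell, action), 0.0)
    q[(cell, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(cell, action)]


def choose_action(q, cell, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over the table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((cell, a), 0.0))
```

Two things worth checking in any implementation of this kind, offered as
general suggestions and not as a diagnosis of the applet: whether the
terminal step (pole fallen) still bootstraps from the next state, and
whether the partition is fine enough around the upright position for
neighbouring cells to need different actions.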

Best wishes
Thorsten

From: Thorsten Kiefer on 11 Aug 2010 19:05

OK, I fixed it.
I found some minor bugs and tuned the parameters.
Please visit my applet again, let it train for about 5 minutes, and see
the result.
Please leave me a comment to tell me how you like it.

Thanks and best wishes
Thorsten

From: Thorsten Kiefer on 11 Aug 2010 19:08

I'd like to start a discussion about why I chose such parameters and how I
found them.
I'll also add some gimmicks to the applet.
