Issue452

Title Adding e-greedy random exploration
Priority feature Status resolved
Superseder Nosy List jendrik, malte, masataro, pvonreth
Assigned To pvonreth Keywords
Optional summary

Created on 2014-08-13.10:43:06 by pvonreth, last changed by malte.

Messages
msg5314 (view) Author: malte Date: 2016-05-08.21:39:19
Hi Masataro,

sorry we didn't answer to this yet. I got swamped in email and this one got
buried in the pile. Let me quickly revive this by replying to some of your
comments. Please write again if there are more points we should discuss.

'RE: "It is usually desirable to" ...  FIFO is not desirable at all. It
 is the *worst*. (Asai,Fukunaga 2016).'

Does your paper continue a study of epsilon-greedy search? From my memory, I
thought it was about optimal planning only, but if there is something about
satisficing planning in there, it's a completely different ballgame and I have
to check again. But if you're referring to your tie-breaking for A* here, please
note that the starting point is very different here because FIFO vs. LIFO (for
example) tie-breaking has a huge impact on plan quality in greedy search.

'RE: "If FIFO ordering is ignored ... Otherwise, the removal is linear"
 Removal of the first/last entry from deque is O(1), not
 linear. Therefore O(m+n) is actually O(n).'

Please reread the text, as it already acknowledges this: "If FIFO ordering is
ignored, one can use swap-and-pop to remove the entry in constant time." But the
point is that in the general case we guarantee maintaining the FIFO ordering,
and then the operation takes linear time. (Swap-and-pop would destroy the ordering.)

Obviously, other alternatives to FIFO ordering are possible, and you're very
welcome to explore them and report the results to us. :-)
msg5197 (view) Author: masataro Date: 2016-03-20.05:49:54
Removal of the first/last entry from deque is O(1), not linear. << I mainly use 
http://en.cppreference.com/w/cpp/container/deque as a reference. If this
reference cite is wrong, then I might be wrong.
msg5196 (view) Author: masataro Date: 2016-03-20.05:46:23
Current implementation assumes last-resort FIFO queue when h-value is the same. 
Changing the operator< definition from pair(id, h) < pair(other.id, other.h) to
something like (h < other.h && id > other.id)  would implement last-resort LIFO
queue, but I'm not sure if I can implement last-resort RANDOM queue this way.
msg5195 (view) Author: masataro Date: 2016-03-20.05:40:57
- Therefore O(m+n) is actually O(n).
+ Therefore O(m+n) is actually O(m).
msg5194 (view) Author: masataro Date: 2016-03-20.05:37:19
Sorry, my previous comment was not even correct, so I deleted it. However, I
still have some concern about the implementation. Here is the comment in my
current working diff ---

    Epsilon-greedy open list based on Valenzano et al. (ICAPS 2014).

    With probability epsilon the next entry is selected uniformly
    randomly, otherwise the minimum entry is chosen. While the original
    implementation by Valenzano et al. is based on buckets (personal
    communication with the authors), this implementation stores entries
    in a heap. It is usually desirable to let open lists break ties in
    FIFO order. When using a heap, this can be achieved without using
    significantly more time by assigning increasing IDs to new entries
    and using the IDs as tiebreakers for entries with identical values.
    On the other hand, FIFO tiebreaking induces a substantial worst-case
    runtime penalty for bucket-based implementations. In the table below
    we list the worst-case time complexities for the discussed
    implementation strategies.

    n: number of entries
    m: number of buckets

                            Buckets       Buckets (no FIFO)    Heap
    Insert entry            O(log(m))     O(log(m))            O(log(n))
    Remove random entry     O(m + n)      O(m)                 O(log(n))
    Remove minimum entry    O(log(m))     O(log(m))            O(log(n))
        
    These results assume that the buckets are implemented as deques and
    are stored in a sorted dictionary, mapping from evaluator values to
    buckets. For inserting a new entry and removing the minimum entry the
    bucket-based implementations need to find the correct bucket
    (O(log(m))) and can then push or pop from one end of the deque
    (O(1)). For returning a random entry, bucket-based implementations
    need to loop over all buckets (O(m)) to find the one that contains
    the randomly selected entry. If FIFO ordering is ignored, one can use
    swap-and-pop to remove the entry in constant time. Otherwise, the
    removal is linear in the number of entries in the bucket (O(n), since
    there could be only one bucket).
    
    Comments from Masataro Asai:
    
    I reverted above changes because of the following reasons.
    
    RE: "It is usually desirable to" ...  FIFO is not desirable at all. It
    is the *worst*. (Asai,Fukunaga 2016).
    
    RE: "If FIFO ordering is ignored ... Otherwise, the removal is linear"
    Removal of the first/last entry from deque is O(1), not
    linear. Therefore O(m+n) is actually O(n).
    
    RE:"loop over all buckets (O(m)) ..." This can be improved to
    O(log(m)) using a binary search tree (not implemented yet).
    
    In this BST, each node have a pointer to the parent, a value, and two
    children.  The value of each node is the sum of the values of its two
    children.  Leaf nodes have a pointer to a bucket, and its value is the
    number of elements in the bucket.  The root node has the sum of the
    sizes of all buckets.
        
    When inserting a search node, it should propagate the increase up to the
root, which is O(log(m)). When
    removing a node, select LEFT node with probability
    value(LEFT)/(value(LEFT)+value(RIGHT)), (resp. RIGHT), which requires
O(log(m)) coin flips.
    There is a hashtable from each bucket to the leaf node, enabling propagation.
    The tree should be balanced. Thus the actual implementation would use
Red-Black trees.

    Overall, switching FIFO/LIFO/RANDOM last-resort tiebreaking 
    on heap-based implementation does not seem trivial. this is the main reason
I fall back to the
    bucket-basaed implementation.
msg5189 (view) Author: malte Date: 2016-03-19.18:12:59
Hi Masataro, regarding your comment, unfortunately the C++ standard library
interface doesn't allow removing a random element and doesn't expose the "bubble
up" or "trickle down" operations, which is why we had to build our own heap
implementation here.

It would be better to use the existing algorithms where possible, but I don't
think it's possible. If we overlooked something we could use here, we're open to
suggestions!
msg4600 (view) Author: jendrik Date: 2015-09-16.11:22:36
This has been merged recently. Thanks again, Patrick!
msg3330 (view) Author: pvonreth Date: 2014-08-20.17:19:05
Dear Prof. Helmert,
The issue should be ready for your review.
I hope you agree with my changes.
msg3315 (view) Author: pvonreth Date: 2014-08-13.14:24:18
https://bitbucket.org/TheOneRing/hg.fast-downward.org/pull-request/2/adding-e-
greedy-random-exploration
msg3314 (view) Author: pvonreth Date: 2014-08-13.10:43:06
Adding e-greedy random exploration based on 
"Richard Valenzano, Nathan R. Sturtevant, Jonathan Schaeffer, and Fan Xie. A 
comparison of knowledge-based GBFS enhancements and knowledge-free exploration. 
In Proceedings of the Twenty-Fourth International Conference on Automated 
Planning and Scheduling (ICAPS 2014), pages 375–379. AAAI Press, 2014."
To test it run 
downward --search "lazy(random(ff(),epsilon=0.04,pref_only=false))"
History
Date User Action Args
2016-05-08 21:39:20maltesetmessages: + msg5314
2016-03-20 05:49:54masatarosetmessages: + msg5197
2016-03-20 05:46:23masatarosetmessages: + msg5196
2016-03-20 05:40:57masatarosetmessages: + msg5195
2016-03-20 05:37:19masatarosetmessages: + msg5194
2016-03-19 18:12:59maltesetnosy: + masataro
messages: + msg5189
2016-03-19 17:40:09masatarosetsummary: The code implements an original version of a heap structure by itself. It should be better to use the standard C++ library if possible. ->
2016-03-19 17:39:28masatarosetsummary: The code implements an original version of a heap structure by itself. It should be better to use the standard C++ library if possible.
2015-09-16 11:22:36jendriksetstatus: in-progress -> resolved
messages: + msg4600
2014-08-21 22:40:55jendriksetassignedto: pvonreth
2014-08-20 17:19:05pvonrethsetmessages: + msg3330
2014-08-13 14:24:18pvonrethsetmessages: + msg3315
2014-08-13 10:43:06pvonrethcreate