| msg3455 (view) |
Author: florian |
Date: 2014-09-19.14:51:18 |
|
Merged.
|
| msg3454 (view) |
Author: malte |
Date: 2014-09-19.13:43:46 |
|
Fine with me.
|
| msg3453 (view) |
Author: florian |
Date: 2014-09-19.13:40:43 |
|
Great, should I merge it then?
|
| msg3452 (view) |
Author: malte |
Date: 2014-09-19.13:39:42 |
|
The experiments show no serious degradation in performance (and we can't read
much into total times in debug mode anyway), so I think we can go ahead with
this one. Failed assertions aren't really a problem for this issue if they
also occurred before we worked on it.
|
| msg3451 (view) |
Author: florian |
Date: 2014-09-19.12:33:22 |
|
What are the next steps for this issue? Should we wait for any of the issues
that fix failed assertions? (issue467, issue468, issue469, issue470)
|
| msg3448 (view) |
Author: malte |
Date: 2014-09-18.18:39:56 |
|
Thanks! It would probably be cleaner if the portfolio code included a line of
output that lab could detect to dispatch automatically.
Once we've verified that this now works as intended, the attributes that don't
make sense for portfolios should be removed from the portfolio report.
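For illustration, such a marker might look like this on the planner side (a
hypothetical sketch; neither this exact line nor the lab-side matching exists
yet):

    #include <iostream>

    // Hypothetical: print one fixed, easily matchable line at startup
    // so that lab could dispatch to its portfolio search_parser.
    void announce_portfolio(int num_components) {
        std::cout << "portfolio: " << num_components
                  << " component configurations" << std::endl;
    }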
|
| msg3447 (view) |
Author: jendrik |
Date: 2014-09-18.18:35:56 |
|
lab uses a different search_parser depending on whether we parse the results of a portfolio or a
single search. Since the built-in portfolios can be added to an experiment via the "ipc"
command-line switch without using --portfolio, we have to detect them automatically in lab. I
had correctly added the FDSS portfolios to the list of built-in portfolios, but forgot the M&S
ones. This is fixed now.
|
| msg3441 (view) |
Author: florian |
Date: 2014-09-18.17:11:18 |
|
Here are the satisficing configs:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue462-sat-compare-core-configs.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue462-sat-compare-portfolio-configs.html
I don't know why some attributes are shown and others are not. Maybe Jendrik can
help?
|
| msg3440 (view) |
Author: malte |
Date: 2014-09-18.14:58:14 |
|
The optimal core configs look fine.
For the portfolio configs, is there a reason why so much data is missing? For
example, there are results for score_expansions (and other attributes) for the
merge-and-shrink portfolio, but not for any of the others. One might say that
certain attributes don't make sense for portfolios, but then why are they
present for merge-and-shrink?
|
| msg3439 (view) |
Author: florian |
Date: 2014-09-18.12:37:33 |
|
Here are the separated reports for portfolios and core configs for the optimal
planning test:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue462-opt-compare-core-configs.html#search_time
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue462-opt-compare-portfolio-configs.html#first_run_search_time
The h2 config ran slower across all domains, but the effect is small. For the
portfolios, the numbers are much more mixed.
|
| msg3432 (view) |
Author: malte |
Date: 2014-09-17.17:55:52 |
|
> Ahh, I think lab changed in a way that I didn't expect. Portfolios now do not
> have a search_time entry anymore.
I searched the lab documentation for some explanation, but didn't find any. In
general, documentation of the reports is a bit sparse. Once we consider them to
be somewhat stable, we should perhaps change this.
Separating portfolio configs from others makes sense to me. There are many
attributes for which it's not clear how to best compare them.
|
| msg3431 (view) |
Author: florian |
Date: 2014-09-17.17:53:00 |
|
> But it isn't empty -- it shows cost and coverage. Why do these behave
> differently from, say, score_total_time?
Ahh, I think lab changed in a way that I didn't expect. Portfolios now do not
have a search_time entry anymore. I guess that makes sense because a portfolio
actually has a separate time for each component config. We could probably split
everything up into two reports: one for regular configs comparing 'search_time'
and one for portfolios comparing the first entry of 'search_time_all'. For this
issue, we could also ignore the portfolios since we already test their
components. What do you think?
|
| msg3426 (view) |
Author: malte |
Date: 2014-09-17.14:10:58 |
|
Also, I don't think it's the case that all tasks have unexplained errors. If I
looked at the opt results correctly, there are 663 tasks with unexplained errors
in at least one config. I don't recall exactly how many tasks are in this suite,
but I think there are 1000+. For example, there don't seem to be unexplained
errors for zenotravel tasks 1-8.
|
| msg3425 (view) |
Author: malte |
Date: 2014-09-17.13:57:49 |
|
> This is the main reason why I think it will be difficult to evaluate this
> issue on debug experiments before the assertions are fixed. A lot of runs
> fail, so their data does not show up in the tables. If there is at least one
> config for which a task fails, the task will not show up in the summary. In
> our case, this happens for all tasks, so the summary stays empty.
But it isn't empty -- it shows cost and coverage. Why do these behave
differently from, say, score_total_time?
|
| msg3424 (view) |
Author: florian |
Date: 2014-09-17.13:55:25 |
|
> I don't understand why the summary table only contains "cost" and "coverage".
> Can the lab experts explain?
This is the main reason why I think it will be difficult to evaluate this issue
on debug experiments before the assertions are fixed. A lot of runs fail, so
their data does not show up in the tables. If there is at least one config for
which a task fails, the task will not show up in the summary. In our case, this
happens for all tasks, so the summary stays empty.
|
| msg3420 (view) |
Author: malte |
Date: 2014-09-17.13:46:31 |
|
> I suggest we run experiments for this issue in release mode, so we do not
> have to hold off on merging until we fix the other things.
I'm not a fan. We have ~500 assertions in the code, and many of them are about
bounds and such and are directly affected by the int vs. size_t changes. For
example, 68 assertions in the default branch contain the word "size". How can we
have confidence that we didn't mess them up if we don't test them?
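For instance, switching an index from int to size_t can silently turn an
assertion into a tautology (a generic sketch, not a particular place in our
code):

    #include <cassert>
    #include <cstddef>

    void check_bounds(std::size_t state_id, std::size_t num_states) {
        // When state_id was an int, "state_id >= 0" was a meaningful
        // check; as a size_t it is always true, so the assertion below
        // is all that is left guarding the index.
        assert(state_id < num_states);
    }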
We should of course fix the problems with assertions we currently have, but we
can do a before/after comparison even without fixing them. The main thing to
test is that we don't have huge changes in behaviour anywhere, and I think this
shouldn't be difficult to test even in the presence of the unexplained errors. I
had a look at the "opt" results, and coverage looks OK, but I don't understand
why the summary table only contains "cost" and "coverage". Can the lab experts
explain?
Of course I don't mind release experiments *in addition* to the debug
experiments if someone wants to see this data.
|
| msg3417 (view) |
Author: florian |
Date: 2014-09-17.12:19:51 |
|
Unfortunately, the new experiments also had heaps of unexplained errors. I ran
the experiments in debug mode as you suggested and this showed at least four
(unrelated) problems. The most common was an assertion about the memory padding.
I will try to reproduce them and open issues for them. I suggest we run
experiments for this issue in release mode, so we do not have to hold off on
merging until we fix the other things.
Here are the reports for the debug experiments if you are interested (WARNING:
they are ~70MB in size and hard to read because so many tasks are classified as
errors):
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue462-base-issue462-v1-compare-opt.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue462-base-issue462-v1-compare-sat.html
|
| msg3416 (view) |
Author: florian |
Date: 2014-09-16.12:31:17 |
|
The first batch of experiments failed on a lot of tasks because the timeout
(30s) was so small that we ran into issues with the grid file system, which
resulted in wall-clock times of over a minute.
I'll run a new experiment with a 5 minute time limit today.
|
| msg3415 (view) |
Author: malte |
Date: 2014-09-16.12:12:01 |
|
It would be good to merge this soon because it will cause many merge conflicts.
Are the experiments running?
|
| msg3406 (view) |
Author: malte |
Date: 2014-09-03.13:21:29 |
|
> I fixed most of the warnings but wasn't sure how best to handle the cases in
> priority_queue.h (lines 174 and 206 now). Could you fix those?
Done.
> I agree, we should test different g++ versions with the buildbot (maybe
> also one version with OSI and one without).
Any volunteers to open an issue for this?
|
| msg3405 (view) |
Author: florian |
Date: 2014-09-03.01:16:21 |
|
I fixed most of the warnings but wasn't sure how best to handle the cases in
priority_queue.h (lines 174 and 206 now). Could you fix those?
I agree, we should test different g++ versions with the buildbot (maybe also
one version with OSI and one without).
|
| msg3404 (view) |
Author: malte |
Date: 2014-09-02.23:40:30 |
|
Those seem to be legitimate complaints that we should fix, though it's hard to
be sure without seeing the full output.
In any case, if there's a strong compiler version dependency on this, it may be
problematic to keep this warning enabled *and* have it as an error, since it
means the build can easily break for others without us noticing.
Of course, there is the more general problem of the build working on some g++
versions but not others, which we've already faced in the past for other reasons
(e.g. implied #includes). This will only get worse once we start using C++11
features.
One idea: have the buildbot try to compile on as many g++ versions as we can
easily support. It should be no problem to install multiple ones in parallel,
and e.g. my notebook (Ubuntu 14.04) comes with 4.4, 4.6, 4.7 and 4.8 all in the
standard repository. Our buildbot machine seems to have 4.4, 4.5 and 4.6,
although I didn't try to install them all yet to see if they all work.
|
| msg3403 (view) |
Author: florian |
Date: 2014-09-02.23:30:54 |
|
The old g++ seems to be a bit more pedantic about comparing ints and size_ts.
The following popped up:

open_lists/open_list_buckets.cc, line 58:

    int key = last_evaluated_value;
    assert(key >= 0);
    if (key >= buckets.size())

per_state_information.h, lines 185, 199 and 200:

    size_t virtual_size = registry->size();
    assert(state_id >= 0 && state_id < virtual_size);
    ...
    int state_id = state.get_id().value;
    assert(state_id >= 0 && state_id < registry->size());
    if (state_id >= entries->size()) {

priority_queue.h, lines 171, 173 and 205:

    if (key >= buckets.size())
        buckets.resize(key + 1);
    else if (key < current_bucket_no)
    ...
    if (key >= MIN_BUCKETS_BEFORE_SWITCH && key > num_pushes) {
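For what it's worth, the usual fix pattern for such warnings (a generic
sketch, not necessarily how each case was resolved in the branch) is to cast
once after the sign check, so the comparison is unsigned against unsigned:

    #include <cassert>
    #include <cstddef>
    #include <vector>

    void grow_if_needed(std::vector<int> &buckets, int key) {
        assert(key >= 0);
        // Cast after the sign check; the comparison no longer mixes
        // signed and unsigned, so -Wsign-compare stays quiet.
        if (static_cast<std::size_t>(key) >= buckets.size())
            buckets.resize(key + 1);
    }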
|
| msg3402 (view) |
Author: malte |
Date: 2014-09-02.23:15:07 |
|
Even if that works for the grid, we don't really want to break compatibility
with older g++ versions needlessly.
|
| msg3401 (view) |
Author: jendrik |
Date: 2014-09-02.23:14:12 |
|
You could also try to load a newer gcc with "module load gcc" or similar, but
the last time I tried to do that for mercurial, all kinds of incompatibility
problems arose.
|
| msg3400 (view) |
Author: malte |
Date: 2014-09-02.22:52:47 |
|
Change it to "-pedantic" then, which is the previous name.
|
| msg3399 (view) |
Author: florian |
Date: 2014-09-02.22:50:04 |
|
I tried to start experiments today, but unfortunately g++ on our grid is still
on version 4.4.7 and does not recognize "-Wpedantic".
|
| msg3397 (view) |
Author: malte |
Date: 2014-09-02.11:40:00 |
|
Please run a set of experiments in debug mode, too, as we touched many of the
assertions. For all experiments, a short timeout (e.g. 30 seconds) is enough.
|
| msg3396 (view) |
Author: malte |
Date: 2014-09-02.10:26:05 |
|
No; if we want to set some new conventions there, there's no reason to do it
within the scope of this issue. I'll look at your changes as requested, and then
we can start the experiments.
|
| msg3395 (view) |
Author: florian |
Date: 2014-09-02.00:34:46 |
|
I think "++array[index];" is clear enough, though I see your point about the
reading order and some of the other expressions are confusing me too. Other than
that I don't have much of an opinion about this, both of your suggestions sound
ok to me. Should I wait for this with the experiments?
|
| msg3394 (view) |
Author: malte |
Date: 2014-08-31.13:14:21 |
|
Side note: I found some of the more complex pre-/post-decrement/increment cases
very hard to read.
Examples:
https://bitbucket.org/malte/downward/commits/1f71ad9c5e787ce001defda8bce14abdbfe4ea3d?at=issue462
https://bitbucket.org/malte/downward/commits/31a77a58b00f246b880e472f31310df2e00b4a09?at=issue462
You will find expressions like
"--unsat_pc_count_[info.first].second[info.second]" and
"--unary_op->unsatisfied_preconditions", which I find difficult for two reasons:
1) The reading order doesn't match the order of operation. You first see that
something is being decremented, and only at the end do you see what. For all
other similar operations (e.g. decrementing by 2, not by 1), the "action" would
be given at the end, not the start.
I think this may be one of the reasons why originally in C the post-increment
form was usually preferred. (The performance issues only really arose in C++
with operator overloading.)
2) Knowing that "--" has a rather high priority, I am frequently uncertain about
the operator precedence in the expression. For example, is
"--unary_op->unsatisfied_preconditions" equal to
"(--unary_op)->unsatisfied_preconditions" or to
"--(unary_op->unsatisfied_preconditions)"? (This particular case might be
simpler to understand if you weren't brought up on pointer arithmetic and hence
the wrong first interpretation appears unlikely, but then you're up for a nasty
surprise with expressions like "x = *p++".)
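To make this concrete, here is a small self-contained example (hypothetical
struct, mirroring the names above) showing how these expressions actually
parse; "->" and postfix "++" bind tighter than the prefix operators:

    #include <cassert>

    struct UnaryOp {
        int unsatisfied_preconditions;
    };

    int main() {
        UnaryOp op = {3};
        UnaryOp *unary_op = &op;
        // Parses as --(unary_op->unsatisfied_preconditions):
        // the member is decremented, not the pointer.
        --unary_op->unsatisfied_preconditions;
        assert(op.unsatisfied_preconditions == 2);

        int arr[] = {10, 20};
        int *p = arr;
        // Parses as *(p++): read through p, then advance the pointer.
        int x = *p++;
        assert(x == 10 && p == arr + 1);
        return 0;
    }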
So I'm considering a suggestion for our style guide: only use -- and ++ with
expressions that consist of a single identifier. In other cases, write
"expression -= 1" or "expression += 1".
(Exception: when advancing/rewinding an iterator, we must use ++/--. In these
cases we should perhaps also strive to make the expression as simple as
possible, but I didn't find complex cases of this in the code anyway, and I
recall none that looked ambiguous.)
Alternatively, I wouldn't mind something even stricter: only use "++" and "--"
in the third part of a for statement or to advance/move back an iterator.
What do you think? (Before you answer, it may be worth going over the two
commits above and mentally comparing the "++" and "+= 1" forms. I think they
contain all cases that would be affected by the first version of the rule. The
second one would affect a few more.)
|
| msg3392 (view) |
Author: malte |
Date: 2014-08-31.12:32:06 |
|
I've pushed some more changes to the two Makefiles.
One thing I noticed was that the preprocessor and search Makefile were out of
sync again. (I had recently synced them.) After changes to one of the Makefiles,
it's generally a good idea to run meld on the two Makefiles and see that the
only differences are in places where we expect them to differ. I think they went
out of sync in the recent change for Windows compilation.
Having them as similar as possible helps with future maintenance, and differences
are also often a sign of bugs. Looking at the diff found two bugs in the
Makefile this time.
|
| msg3391 (view) |
Author: malte |
Date: 2014-08-31.12:03:07 |
|
PS: Warning, that link only shows you the diff for the most recent commit on the
branch. This is perhaps a better link:
https://bitbucket.org/malte/downward/commits/branch/issue462
|
| msg3390 (view) |
Author: malte |
Date: 2014-08-31.02:13:44 |
|
Done, but untested:
https://bitbucket.org/malte/downward/commits/issue462
There are several large commits, and not all the changes were completely
trivial. Is it worth doing a real experiment for this? If yes, can someone start it?
Also, anyone willing to do a code review? If yes, how should I set it up? (I
don't know if it would be better to review the different commits individually,
or to review all changes in the branch at once.)
|
| msg3389 (view) |
Author: malte |
Date: 2014-08-30.14:58:12 |
|
I'd suggest only making this change in the search code, not in the preprocessor,
given that the preprocessor is supposed to go away.
|
| msg3388 (view) |
Author: malte |
Date: 2014-08-30.14:52:54 |
|
As discussed recently on downward-dev, we want to clean up the usage of size_t
vs. int in the code. The goal is to be able to get rid of "-Wno-sign-compare" in
the Makefiles.
At the same time, it makes sense to change our occurrences of i++ into ++i, as
these affect mostly the same places in the code.
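To illustrate the kind of change involved (a generic sketch, not a specific
place in the planner): once "-Wno-sign-compare" is dropped, comparing a signed
loop index against the unsigned result of size() triggers a warning, and the
cleanup switches the index to size_t and the post-increment to pre-increment.

    #include <cstddef>
    #include <vector>

    int sum(const std::vector<int> &values) {
        int total = 0;
        // Old style: "for (int i = 0; i < values.size(); i++)" warns
        // under -Wsign-compare because int is compared with size_t.
        for (std::size_t i = 0; i < values.size(); ++i)
            total += values[i];
        return total;
    }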
|