Common Scenarios¶
Within the SMT community, the notion of failure is generally defined as one of the following:

- abnormal termination or crashes (including segmentation faults and assertion failures)
- performance regressions (the solver performs significantly worse on an input than a reference solver)
- unsoundness (the solver answers sat instead of unsat, or vice versa)
- model unsoundness (the model determined for a satisfiable input is incorrect)
- conflict unsoundness (the determined proof of unsatisfiability is incorrect)
Abnormal termination and crashes are the most common failures and the default use case of ddSMT.
Model unsoundness and conflict unsoundness issues, on the other hand, are more involved, since they typically require a checking mechanism to determine whether a generated model or proof is incorrect. Most SMT solvers implement such mechanisms and, in debug mode, raise an assertion failure when such an issue is detected; in that case, the failure falls into the first category. If they don't, however, external checking tools are required, and ddSMT does not provide such tools.
For performance regressions and unsoundness issues, ddSMT provides easy-to-use and easy-to-adapt wrapper scripts in directory scripts.
In this guide, we show how to debug issues that fall into these two categories
and address how to approach other common scenarios users may face.
Debugging Unsoundness¶
Unsoundness issues are issues where a solver terminates successfully, but with the wrong answer (sat instead of unsat, or vice versa).
This is a particularly nasty case for delta debugging if the solver cannot detect the issue itself (for example, by checking the generated model or proof).
Some solvers check their answer against the expected status of an SMT-LIB input if it is provided via (set-info :status ...), but for delta debugging purposes it is a bad idea to rely on this status to check for soundness: mutations can flip the actual status of the input while still revealing the underlying problem.
A better approach is to check the answer against the answer of a reference solver.
This can be achieved either with the built-in --cross-check option, or with the provided scripts/result_differs.py script.
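To illustrate why relying on (set-info :status ...) is fragile, consider the following illustrative fragment (not taken from an actual benchmark): removing the second assertion flips the actual status from unsat to sat, while the :status annotation still claims unsat. A check based on the annotation would then report a spurious unsoundness.

```smtlib
(set-info :status unsat)
(declare-const x Int)
(assert (> x 0))
(assert (< x 0)) ; removing this assertion makes the input sat,
                 ; but :status still says unsat
(check-sat)
```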
Using Option --cross-check¶
If --cross-check is given, ddSMT checks whether the command under test and the reference command behave the same as their respective golden runs (see How Behavior is Compared with the Golden Run).
As with the default use case, cross-checking can be refined in terms of what output to consider via --ignore-output-cc (ignore both the stdout and stderr channels), --match-err-cc (pattern match stderr against a given string) and --match-out-cc (pattern match stdout against a given string).
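An invocation might then look as follows. This is a sketch only: it assumes that --cross-check takes the reference solver command as its argument, and reference-solver and solver-under-test are placeholders; check ddsmt --help for the exact syntax.

```shell
$ ddsmt --cross-check "reference-solver --option" <input> <output> solver-under-test --option
```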
Using Wrapper Script result_differs.py¶
For cases where the --cross-check option is not flexible enough, ddSMT provides the wrapper script scripts/result_differs.py.
The script runs two solvers A and B, expects both to output sat or unsat, and compares their answers.
This is useful if you want to allow the input to flip from sat to unsat (or vice versa) as long as the two solvers still disagree.
The script can also be adapted to any further analysis that is not supported by the cross-check option, for example, comparing the solvers' full output in order to compare models or error messages.
#!/usr/bin/env python3

import signal
import subprocess
import sys


def handler(sig, frame):
    print("Aborted")
    sys.exit(1)


# terminate gracefully when ddSMT sends SIGTERM
signal.signal(signal.SIGTERM, handler)


def run(cmd):
    # run the given solver command on the input file passed by ddSMT
    cmd = cmd + [sys.argv[1]]
    res = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return res.stdout.decode('utf8').strip()


A = run(['solverA', '--option'])
B = run(['solverB', '--option'])

if A not in ['sat', 'unsat']:
    print(f'Unexpected output for A: {A}')
    sys.exit(2)

if B not in ['sat', 'unsat']:
    print(f'Unexpected output for B: {B}')
    sys.exit(2)

print(f'{A} / {B}')
if A == B:
    sys.exit(0)
else:
    sys.exit(1)
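The sat/unsat comparison can be generalized to compare the solvers' complete standard output, for example to catch differing models or error messages. The following is a minimal sketch along those lines; solverA, solverB and --option are placeholders for the actual solver commands.

```python
# Hedged sketch: a variant of result_differs.py that compares the
# solvers' complete stdout instead of just sat/unsat.
import subprocess


def output_of(cmd, filename):
    """Run cmd on the given file and return its stripped stdout."""
    res = subprocess.run(cmd + [filename],
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    return res.stdout.decode('utf8').strip()


def outputs_differ(cmd_a, cmd_b, filename):
    """Return True if the two commands print different stdout."""
    return output_of(cmd_a, filename) != output_of(cmd_b, filename)
```

In a wrapper script you would then call outputs_differ(['solverA', '--option'], ['solverB', '--option'], sys.argv[1]) and sys.exit(1) if it returns True, so that ddSMT treats differing output as the failure to preserve.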
Debugging Performance Issues¶
Performance issues are issues where a solver performs significantly worse than a reference solver. Typically, this means that the solver under test terminates successfully but with a significantly higher run time, or does not terminate at all within a given time limit. Performance issues are similarly nasty to delta debug as unsoundness issues, and as with unsoundness, it usually only makes sense to debug them with respect to a reference solver.
ddSMT provides the wrapper script scripts/compare_time.py for debugging performance issues.
It runs two solvers A and B and checks whether the slower one is (still) much slower than the faster one.
Since it outputs some information about the run time of the two solvers, it should be used in combination with option --ignore-output.
#!/usr/bin/env python3

import signal
import subprocess
import sys
import time

filename = sys.argv[1]

solvers = {
    'A': ['solverA', '--option'],
    'B': ['solverB', '--option'],
}
# minimal factor between the faster and the slower solver
threshold = 10

slow = 'A'

dur = {}

CUR_PROC = None


def stop_process(proc):
    proc.terminate()
    try:
        proc.wait(timeout=0.5)
    except subprocess.TimeoutExpired:
        proc.kill()
        try:
            proc.wait(timeout=0.5)
        except subprocess.TimeoutExpired:
            print(f'Killing pid {proc.pid} failed. '
                  'Please check manually to avoid memory exhaustion.')


def handler(sig, frame):
    if CUR_PROC is not None:
        stop_process(CUR_PROC)


# stop the current solver process when ddSMT terminates this script
signal.signal(signal.SIGTERM, handler)


for s in solvers:
    before = time.perf_counter()
    try:
        CUR_PROC = subprocess.Popen(solvers[s] + [filename],
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE)
        CUR_PROC.communicate(timeout=10)
    except subprocess.TimeoutExpired:
        stop_process(CUR_PROC)
    duration = time.perf_counter() - before
    print('{} -> {}'.format(s, duration))
    dur[s] = duration

factor = max(dur.values()) / min(dur.values())
print('Factor: {}'.format(factor))

# discard if the slower solver became too fast
if dur[slow] < 3:
    print('-> 0')
    sys.exit(0)

if factor > threshold:
    # still much slower
    print('-> 1')
    sys.exit(1)
# not that much slower
print('-> 0')
sys.exit(0)
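Putting it together, a delta-debugging run with this wrapper could look as follows (a sketch: the script must first be adapted to your solvers and made executable):

```shell
$ ddsmt --ignore-output <input> <output> ./scripts/compare_time.py
```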
Eliminate let Binders¶
SMT-LIB let binders are notorious for impeding simplification during delta debugging.
While the mutators that target let binders (ddsmt.mutators_smtlib.LetElimination and ddsmt.mutators_smtlib.LetSubstitution) are applied at an early stage by both the ddmin and the hierarchical strategy, it may still take a while until all of them are taken care of.
If the input contains a lot of let binders, it is sometimes useful to first eliminate them by using only these two mutators, and then delta debug the resulting file without let binders.
$ ddsmt --disable-all --let-elimination --let-substitution <input> <output> <solver>
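For illustration, this is the kind of rewriting the two mutators perform (an illustrative example: the substituted occurrence of the bound variable is replaced by its definition, after which the now-redundant binder can be dropped):

```smtlib
; before
(assert (let ((x (bvadd a b))) (bvult x c)))
; after substituting and eliminating the binder
(assert (bvult (bvadd a b) c))
```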
If you are reasonably sure that the issue is unrelated to the let binders themselves, you can use option --unchecked to not even execute the command under test.
Note that in this case you should do two separate runs, one for let substitution and one for let elimination:
$ ddsmt --unchecked --disable-all --let-substitution <input> <output> <solver>
$ ddsmt --unchecked --disable-all --let-elimination <input> <output> <solver>