Common Scenarios

Within the SMT community, the notion of failure is generally defined as one of the following:

  • abnormal termination or crashes (including segmentation faults and assertion failures)

  • performance regressions (solver performs significantly worse on an input than reference solver)

  • unsoundness (solver answers sat instead of unsat and vice versa)

  • model unsoundness (the determined satisfiability model is incorrect)

  • conflict unsoundness (the determined proof of unsatisfiability is incorrect)

Abnormal termination and crashes are the most common type of failure and the default use case of ddSMT.
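For this default use case, typically no extra setup is required; for example (using the same placeholder arguments as the commands shown later in this guide):

$ ddsmt <input> <output> <solver>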

Model unsoundness and conflict unsoundness issues, on the other hand, are more involved since they typically require some checking mechanism to determine whether a generated model or proof is incorrect. Most SMT solvers implement such mechanisms and throw an assertion failure in debug mode when such a failure is detected; in that case, the failure falls into the first category. If they don't, however, external checking tools are required, and ddSMT does not provide such tools.
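If you nevertheless need to delta debug a model unsoundness issue, one option is to wrap an external checker in a small script and hand that script to ddSMT as the command under test, in the style of the wrapper scripts shown below. The following is only a minimal sketch: solverA, its options, and modelchecker are hypothetical placeholders (ddSMT ships no such checker), and the exit-code convention (1 means the failure is still present) matches the other wrapper scripts.

#!/usr/bin/env python3

import subprocess
import sys

# ddSMT passes the current input file as the first argument.
filename = sys.argv[1]

# Run the solver under test; 'solverA' and its options are placeholders.
solver = subprocess.run(['solverA', '--option', filename],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
answer = solver.stdout.decode('utf8').strip()

# Only sat answers come with a model to check.
if not answer.startswith('sat'):
    sys.exit(0)

# 'modelchecker' is a hypothetical external tool that reads the model from
# stdin and validates it against the input file; ddSMT does not provide one.
check = subprocess.run(['modelchecker', filename], input=solver.stdout,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Exit code 1 tells ddSMT that the failure (an incorrect model) still occurs.
sys.exit(1 if check.returncode != 0 else 0)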

For performance regressions and unsoundness issues, ddSMT provides easy-to-use and easy-to-adapt wrapper scripts in the scripts directory. In this guide, we show how to debug issues that fall into these two categories and address how to approach other common scenarios users may face.

Debugging Unsoundness

Unsoundness issues are issues where a solver terminates successfully, but with the wrong answer (sat instead of unsat and vice versa). This is a particularly nasty case for delta debugging if the solver cannot detect the issue itself (for example, by checking the generated model or proof). Some solvers check their answer against the expected status of an SMT-LIB input if it is provided via (set-info :status ...), but for delta debugging purposes it is a bad idea to rely on this status to check for soundness: mutations can change the actual status of the input, so the annotation becomes stale even though the mutated input still reveals the underlying problem.

A better approach is to check the answer against the answer of a reference solver. This can be achieved either via the built-in --cross-check option or via the provided scripts/result_differs.py script.

Using Option --cross-check

If --cross-check is given, ddSMT checks whether the command under test and the reference command behave the same as in their respective golden runs (see How Behavior is Compared with the Golden Run). As with the default use case, cross-checking can be refined in terms of what output to consider via --ignore-output-cc (ignore both the stdout and stderr channels), --match-err-cc (pattern match stderr against a given string), and --match-out-cc (pattern match stdout against a given string).
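For example, assuming the reference solver's command is passed as the argument to --cross-check, an invocation could look like this:

$ ddsmt --cross-check <reference-solver> <input> <output> <solver>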

Using Wrapper Script result_differs.py

For cases where the --cross-check option is not flexible enough, ddSMT provides the wrapper script scripts/result_differs.py. The script runs two solvers A and B, expects them to output sat or unsat, and compares their output. This can be useful if you want to allow the input to flip from sat to unsat (or vice versa) as long as the two solvers still disagree.

The script can be adapted to any other kind of analysis that you might need but that the cross-check option does not support, for example, accepting arbitrary output in order to compare models or error messages.

#!/usr/bin/env python3

import signal
import subprocess
import sys


def handler(sig, frame):
    # Report and exit cleanly when ddSMT terminates this script.
    print("Aborted")
    sys.exit(1)


signal.signal(signal.SIGTERM, handler)


def run(cmd):
    # ddSMT passes the current input file as the first argument.
    cmd = cmd + [sys.argv[1]]
    res = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return res.stdout.decode('utf8').strip()


# Solver under test and reference solver; adapt commands and options.
A = run(['solverA', '--option'])
B = run(['solverB', '--option'])

# Discard inputs on which either solver gives no definite answer.
if A not in ['sat', 'unsat']:
    print(f'Unexpected output for A: {A}')
    sys.exit(2)

if B not in ['sat', 'unsat']:
    print(f'Unexpected output for B: {B}')
    sys.exit(2)

print(f'{A} / {B}')
# Exit code 1 signals that the disagreement is still present.
if A == B:
    sys.exit(0)
else:
    sys.exit(1)
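To use the script, adapt the solver commands and pass the script itself as the command under test. Since the script prints the two answers (which may legitimately change across mutations) and communicates its verdict via the exit code, you will likely want to combine it with --ignore-output:

$ ddsmt --ignore-output <input> <output> ./scripts/result_differs.py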

Debugging Performance Issues

Performance issues are issues where a solver performs significantly worse than a reference solver. Typically, this means that the solver under test terminates successfully but with a significantly higher run time, or doesn't terminate at all within a given time limit. Performance issues are similarly nasty to delta debug as unsoundness issues, and as with unsoundness, it usually only makes sense to debug them with respect to a reference solver.

ddSMT provides a wrapper script scripts/compare_time.py for debugging performance issues. It runs two solvers A and B and checks whether the slower one is (still) much slower than the faster one. Since it outputs some information about the run time of the two solvers, it should be used in combination with option --ignore-output.

#!/usr/bin/env python3

import signal
import subprocess
import sys
import time

# ddSMT passes the current input file as the first argument.
filename = sys.argv[1]

# Solver under test and reference solver; adapt commands and options.
solvers = {
    'A': ['solverA', '--option'],
    'B': ['solverB', '--option'],
}
# minimal factor between the faster and the slower solver
threshold = 10

# the solver that is expected to be the slower one
slow = 'A'

dur = {}

CUR_PROC = None


def stop_process(proc):
    # Terminate gracefully first, then kill if the process does not react.
    proc.terminate()
    try:
        proc.wait(timeout=0.5)
    except subprocess.TimeoutExpired:
        proc.kill()
    try:
        proc.wait(timeout=0.5)
    except subprocess.TimeoutExpired:
        print(f'Killing pid {proc.pid} failed. '
              'Please check manually to avoid memory exhaustion.')


def handler(sig, frame):
    # Clean up the running solver process when ddSMT terminates this script.
    if CUR_PROC is not None:
        stop_process(CUR_PROC)


signal.signal(signal.SIGTERM, handler)


# Run both solvers with a timeout and record their wall-clock run times.
for s in solvers:
    before = time.perf_counter()
    try:
        CUR_PROC = subprocess.Popen(solvers[s] + [filename],
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE)
        CUR_PROC.communicate(timeout=10)
    except subprocess.TimeoutExpired:
        stop_process(CUR_PROC)
    duration = time.perf_counter() - before
    print('{} -> {}'.format(s, duration))
    dur[s] = duration

factor = max(dur.values()) / min(dur.values())
print('Factor: {}'.format(factor))

# discard if the slower solver became too fast (less than 3 seconds)
if dur[slow] < 3:
    print('-> 0')
    sys.exit(0)

if factor > threshold:
    # still much slower: exit code 1 signals the issue is still present
    print('-> 1')
    sys.exit(1)
# not that much slower
print('-> 0')
sys.exit(0)
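For example, an invocation following the note above could look like this (the solver commands, timeout, and thresholds in the script are placeholders to adapt):

$ ddsmt --ignore-output <input> <output> ./scripts/compare_time.py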

Eliminate let Binders

SMT-LIB let binders are notorious for impeding simplification during delta debugging. While the mutators that target let binders (ddsmt.mutators_smtlib.LetElimination and ddsmt.mutators_smtlib.LetSubstitution) are applied in an early stage for both strategies ddmin and hierarchical, it may still take a while until they are taken care of. If the input contains a lot of let binders, it is sometimes useful to first eliminate them by using only these two mutators, and then delta debug the resulting input without let binders.

$ ddsmt --disable-all --let-elimination --let-substitution <input> <output> <solver>

If you are reasonably sure that the issue is unrelated to the let binders themselves, you can use option --unchecked to not even execute the command under test. Note that in this case you should do two separate runs for let substitution and let elimination:

$ ddsmt --unchecked --disable-all --let-substitution <input> <output> <solver>
$ ddsmt --unchecked --disable-all --let-elimination <input> <output> <solver>