Academic Session: 2021-22
Lab 2: Reasoning in FOL
Francisco & Giles
A few points before we begin:
• Submission is via git for this lab - see the section at the end of this document about submission. You
should use your comp24412 repository (branch lab2). To get the provided scripts you will need to run
git checkout lab2
./refresh.sh
• This lab isn’t about writing code.
• Most of the lab is about running the Vampire theorem prover and explaining the results. If you fully
understood what happened in lectures then exercises 1-5 should take less than 30 minutes.But I expect
that this won’t be the case for most people and it will be necessary to revise some concepts.
• The last part is about modelling something in FOL. You can be as ambitious as you want here but be
warned - it can be addictive. I suggest (i) writing a simple fallback knowledge base to start with, (ii)
keep lots of copies of your knowledge base as you try out different things - you will make a change you
don’t like and want to ’roll back’!
• Some of the activities don’t have any associated marks - these are clearly labelled.
• In addition, every place where you can get marks is signposted with symbol ☞
Good luck and have fun!
Learning Objectives
At the end of this lab you should be able to:
1. Translate first-order formulas into the TPTP format
2. Run Vampire on TPTP problems and explain what the proofs mean
3. Relate Vampire options to the given-clause algorithm (and related concepts e.g. ordered resolution,
redundancy, reasoning with theory axioms etc) given in lectures
4. Use Vampire to answer queries on a ‘real’ knowledge base
1
Activity 1: Starting with Vampire (Unmarked)
We’ve looked at simple reasoning methods which could be implemented in a few 100 lines of Python. In this
part we’re going to play with a complicated reasoning tool. The Vampire theorem prover consists of about
200k of C++ code, although the parts you will use probably use less than half of this.
In this activity you’ll get used to using Vampire. Nothing is submitted or assessed. The aim here is to run
Vampire on a few small problems and in doing so learn
1. How to get hold of Vampire and where to look for more information
2. How to represent first-order formulas in the TPTP syntax
3. How to run Vampire form the command line
4. How to read Vampire proofs
Getting Vampire
Before you can begin you’ll need to grab a version of Vampire. You can either download a pre-compiled
binary or build your own (e.g. on your own machine) by following instructions here: https://vprover.
github.io/download.html. Any relatively modern version of Vampire will work, although you’ll need a
version with Z3 for parts of Activity 7. There are additional notes on getting Vampire on Blackboard.
Getting More Information
Hopefully this lab manual is reasonably self contained but if you want more information about the things
mentioned in it then here are some places to look:
• There is an Academic Paper from 2013 that gives a good theoretical overview of Vampire and how it
works
• Giles gave a day tutorial on Vampire in 2017 and the material can be found here although it is not
very well organised at the moment.
• There is a lot more information about TPTP at their website and information about other solvers can
be found at the competition website
Writing Problems in TPTP
The first thing we’re going to try and solve in Vampire is the following small example problem from lectures.
We want to check the following entailment.
∀x.(rich(x) → happy(x))
rich(giles)
|= happy(giles)
The first step is to write this problem down in a format that Vampire can read. The default1
input format
for Vampire is TPTP, which is almost exactly an ACSII version of the syntax we’re used to.
The above problem will be written as follows in TPTP
fof(one,axiom, ![X] : (rich(X) => happy(X))).
fof(two,axiom, rich(giles)).
fof(three, conjecture, happy(giles)).
1Vampire also accepts SMT-LIB format, which is a lisp-based syntax used by SMT solvers.
2
This consists of three formulas where a formula is of the form form(name,kind,formula) where
• form describes the logic being used. Here we will usually use fof standing for first-order form, cnf
standing for conjunctive normal form, or tff standing for typed first-order form.
• name is a name given to the formula. Vampire only uses this in proofs. It doesn’t impact proof search
at all. Names don’t even need to be distinct.
• kind describes the kind of formula. We will mainly use axiom and conjecture although later you
will see us using question as well. The meaning of a TPTP file is that the conjunction of the axiom
formulas entails the conjecture formula. You can have at most one conjecture. You can have no
conjecture and then you are checking the consistency of the axioms.
• formula is the logical formula being described, we discuss this further below.
The syntax of formulas is what one might expect for an ASCII based formula representation. It is important
to note that TPTP copies Prolog by using words beginning with upper-case letters for variables and those
beginning with lower-case letters for constant/function/predicate symbols. Briefly the relevant syntax is:
• ~ for not
• & for conjunction (and), | for disjunction (or)
• => for implication, <=> for equivalence
• ![X]: for universal quantification. Note that you can group these e.g. ![X,Y]:
• ?[X]: for existential quantification
• = for equality and != for disequality
Now that you know the syntax try writing out the following formula in TPTP syntax
∀x.((∃y.p(x, y)) → q(f(x)))
if that was tricky then go back and read the previous explanation again or ask for help. You can check
whether your solution is syntactically correct by running ./vampire --mode ouptut problem.p where you
put your TPTP problem in problem.p. This will print your problem back to you if you got the syntax right
or give you a (hopefully helpful) error message if you did not.
Proving things with Vampire
Save the above TPTP problem for the rich(X) => happy(X) example in rich.p. The first thing Vampire
will do is turn the problem into a set of clauses. We can see how Vampire does this by running the following
command
./vampire --mode clausify rich.p
Does the result make sense? Can you write out the resulting clauses in English and in the first-order logic
syntax we are used to?
Now let’s solve the problem by running
./vampire rich.p
The proof shows the full clausification process and then the reasoning steps. The clausification process
contains a flattening step, this is just an intermediate rewriting step that Vampire often performs
- here it was unnecessary. The proof should consist of two reasoning steps labelled resolution and
subsumption resolution. Remember, this second rule is a special kind of resolution that we perform
as a simplification. We don’t see the difference in the proof as this just impacts the search space (we delete
one of the parents of the resolution). So, for the proof you can think of submsumption resolution just as
resolution. However, remember that the submsumption resolution rule is applied at a different point in
proof search than resolution. We can turn this off with the option -fsr off.
3
Now try running
./vampire -fsr off --selection 0 rich.p
Do you get a different proof? You should because we have changed how we are doing literal selection for
ordered resolution. The value 0 just means select everything so we get unordered resolution in this case. If
you run
./vampire -fsr off --selection 0 rich.p --show_active on --show_new on
you will see the order in which clauses are selected and which clauses are produced by activating (selecting)
them.
Reminder about given clause algorithm. Vampire runs a given-clause algorithm (as discussed in the lectures).
Remember that this works by iteratively selecting a clause and adding it to a set of active (selected) clauses
and performing all inferences between the selected clause and the active set. This is illustrated in the below
diagram where we have an active set, a passive set, and an unprocessed set (as we did in the algorithm shown
in lectures). When we do --show_active on we are asking Vampire to print out the clauses it selects for
activation. When we do --show_new on we are asking Vampire to print out all of the clauses produced by
inferences.
Active b Passive
Unprocessed
To avoid doubt, we will reproduce what you should have found here. I have also added the option
--proof extra free to include some extra information in the output. Without setting selection to 0 you
should get the following new and active clauses
[SA] new: 7. happy(X0) | ~rich(X0) [cnf transformation 6] {a:0,w:4}
[SA] new: 8. rich(giles) [cnf transformation 2] {a:0,w:2}
[SA] new: 9. ~happy(giles) [cnf transformation 5] {a:0,w:2,goal:1}
[SA] active: 9. ~happy(giles) [cnf transformation 5] {a:0,w:2,nSel:1,goal:1}
[SA] active: 8. rich(giles) [cnf transformation 2] {a:0,w:2,nSel:1}
[SA] active: 7. ~rich(X0) | happy(X0) [cnf transformation 6] {a:0,w:4,nSel:1}
[SA] new: 10. happy(giles) [resolution 7,8] {a:1,w:2}
[SA] active: 10. happy(giles) [resolution 7,8] {a:1,w:2,nSel:1}
[SA] new: 11. $false [resolution 10,9] {a:2,w:0,goal:1}
and with selection as 0 you should get
[SA] new: 7. happy(X0) | ~rich(X0) [cnf transformation 6] {a:0,w:4}
[SA] new: 8. rich(giles) [cnf transformation 2] {a:0,w:2}
[SA] new: 9. ~happy(giles) [cnf transformation 5] {a:0,w:2,goal:1}
[SA] active: 9. ~happy(giles) [cnf transformation 5] {a:0,w:2,nSel:1,goal:1}
[SA] active: 8. rich(giles) [cnf transformation 2] {a:0,w:2,nSel:1}
[SA] active: 7. happy(X0) | ~rich(X0) [cnf transformation 6] {a:0,w:4,nSel:2}
[SA] new: 10. ~rich(giles) [resolution 7,9] {a:1,w:2,goal:1}
[SA] new: 11. happy(giles) [resolution 7,8] {a:1,w:2}
[SA] active: 10. ~rich(giles) [resolution 7,9] {a:1,w:2,nSel:1,goal:1}
[SA] new: 12. $false [resolution 10,8] {a:2,w:0,goal:1}
4
What’s the difference? Can you explain what’s happening. Note that this is ordered resolution being
performed and without setting selection to 0 we use a custom selection function.
Let me briefly explain what we mean by the line
[SA] active: 10. ~rich(giles) [resolution 7,9] {a:1,w:2,nSel:1,goal:1}
The part [SA] active: means that this is coming from the Saturation Algorithm and is about clause
activation (when a clause is selected in the given clause algorithm). The 10 is the id number of the clause
- clauses are numbered as they are produced. Then we have the clause ~rich(giles) in cnf format. The
next bit, [resolution 7,9] describes how the clause was generated e.g. this was generated by resolution
from clauses 7 and 9. Finally, the extra string {a:1,w:2,nSel:1,goal:1} gives the age of the clause, the
weight of the clause, the number of selected literals in the clause (which are always the leftmost literals in
active clauses), and whether this clause is derived from the goal. This is the information that is used for
clause and literal selection.
This problem is not big enough yet to be able to explore different clause selection strategies but you can
control how Vampire performs clause selection using the --age_weight_ratio option where you specify a
ratio between selecting clauses by age and by weight e.g. 10:1 says that for every 10 clauses selected by age
you select one by weight.
On Equality. For now you should add the option
--equality_proxy RSTC
to translate equality into a special predicate as described in lectures. If you drop this then you will see steps
marked as superposition or demodulation, which are techniques mentioned briefly in the final part of lectures.
WARNING: Remember that equality here is different from the notion of equality we met in the Prolog
fragment. There we were working with assumptions such as unique name assumption and a term algebra
interpretation where equality was the same as syntactic equality (e.g. unifiability). Here equality is a binary
predicate with some well-defined axioms.
Missing Things
There are lots of things in Vampire that I haven’t told you about. We want to turn off most of these during
this lab so you should generally run Vampire using the following set of options
./vampire -updr off -fde none -nm 0 -av off -fsr off -s 1 -sa discount -ep RSTC
To help I have created a shell script run_vampire that wraps up Vampire and uses this set of options for
you. If you don’t use this then you might get different results in these and later exercises. You can just add
extra options after this and they override the ones in the file e.g.
./run_vampire --selection 0 problem.p
You need to have the vampire binary in the same directory for the script to work.
If you’re interested in what an option does then you can find out by running, for example,
./vampire -explain updr
Entailment and Question Answering
The above shows how to check ground entailments by writing a ground conjecture such as the conjecture
fof(three,conjecture, happy(giles)). We can also use conjectures to ask non-ground conjectures e.g.
we can ask if there exists somebody who is happy as fof(three, conjecture, ?[X] : happy(X)) - what
does the proof look like? We can also check universal entailments e.g.
fof(a,axiom, ![X,Y] : (f(X,Y) => ~f(Y,X))).
fof(a,conjecture, ![X] : ~f(X,X)).
which checks whether f being anti-symmetric entails f being irreflexive.
5
This is fine but we aren’t getting the nice answers we had for Datalog and Prolog. As mentioned in lectures,
we can use a trick to extract these. If we add --question_answering answer_literal to the Vampire options
and run with an existential conjecture we get an answer tuple. This works by rewriting the conjecture
such that solving the conjecture would involve a single final resolution step and then intercepting this step.
At the time of writing advanced features for question answering in Vampire exist in an experimental branch
but I am exploring putting them into the master branch and this text will be updated if I do this. Currently
Vampire will only return one answer but the advanced features allow it to return many answers. Note that
in general, due to the open-world interpretation in first-order logic, there may no longer be a notion of all
of the answers.
Exercise 2: Exploring Some Simple Proofs
Consider these two simple problems:
∀x.(happy(x) ↔ ∃y.(loves(x, y)))
∀x.(rich(x) → loves(x, money))
rich(giles)
|= happy(giles)
∀x.(require(x) → require(depend(x)))
depend(a) = b
depend(b) = c
require(a)
|= require(c)
For each problem you should do the following:
☞ 1 Translate the problem to TPTP
☞ 2 Clausify the problem by hand and compare this to the output of clausification with Vampire
☞ 3 Solve each problem using Vampire and record the proofs
☞ 4 For the first problem:
(a) Solve the problem using the default options (the ones provided in the run_vampire script) and
select a resolution step and identify the selected literals and the unifier required to perform the
step
(b) Solve the first problem using unordered resolution (add --selection 0) and comment on how
proof search changes. If you get a different proof then why?
☞ 5 For the second problem:
(a) Solve the problem using default options (including -ep RSTC)
(b) Solve it again adding --selection 0 --age_weight_ratio 10:1 and comment on how proof
search changes.
(c) Now add -ep off (keeping the options from the previous step) and comment on what happens
when we don’t translate away equality. Can you identify a non-trivial case (e.g. where some
unification is required).
You should save (and submit) your TPTP problems as greed.p and dependency.p respectively. Your
exploration and answers to questions should be saved in a file exercise2.txt.
6
Exercise 3: Exploring a More Complex Proof
Download the problem PUZ031+1.p from TPTP. Back in the 80s this was a problem that was famously too
difficult to solve due to the state space explosion.
Do the following steps and answer the associated questions:
☞ 1 Clausify the problem. How many clauses are there? How many Skolem functions are introduced?
☞ 2 Try and solve the problem using unordered resolution (add --selection 0). What happens?
☞ 3 Now run with the option --statistics full using the default given options. This tells you what
Vampire did during proof search e.g. the number of clauses generated and activated and different kinds
of inferences performed. Answer the following questions:
(a) How many clauses were generated during proof search?
(b) How many clauses were activated (selected)?
(c) How many times was the resolution rule applied?
(d) How long is the proof (i.e. how many steps)? (Hint: this is not the same as the number of the
last clause as many generated clauses are not included in the proof)
☞ 4 Play with the value of age_weight_ratio to see if selecting more clauses by age or weight generates
fewer clauses overall for this particular problem. Remember, -awr 10:1 selects 10 clauses by age for
every 1 it selects by weight and vice versa for -awr 1:10.
☞ 5 Run with --show_reductions on and extract and explain a subsumption. Recheck the statistics and
see how many subsumption deletions there were.
☞ 6 Now let’s turn subsumption deletion off with --forward_subsumption off. How does this impact (i)
the number of generate clauses, and (ii) the length of the proof?
You should save your answers in exercise3.txt
Exercise 4: Exploring Saturations
So far we have just looked at proofs of entailment involving producing the empty clause. Here we briefly
look at the converse; saturation showing consistency.
Consider this modified version of a previous problem
∀x.(happy(x) ↔ ∃y.(loves(x, y)))
∀x.(rich(x) → loves(x, money))
happy(giles)
|= rich(giles)
You should
☞ 1 Translate the problem into TPTP (this should just involve copying and modifying your previous file)
☞ 2 Try and prove the problem using default options and describe what happens and what this means
☞ 3 Repeat the previous step with unordered resolution (--selection 0) and explain why something
different happens
☞ 4 Now run with the option --unused_predicate_definition_removal on and explain why something
different happens. It might help to run ./vampire -explain updr.
☞ 5 Finally, run with the option -sos on and explain why something different happens. Again, running
./vampire -explain sos is likely to help. It might also help to refer back to the above description
of the given clause algorithm and consider what saturation means.
You should save your answers in exercise4.txt.
7
Activity 5: Exploring Model Building (Unmarked)
I haven’t spoken to you about model building but it can be interesting to see alternative reasoning methods.
Take the problem from Exercise 4 (on saturation e.g. we know that the problem is consistent) and add an
axiom to express giles 6= money and then run
./run_vampire --saturation_algorithm fmb problem.p
What you will get is a finite model that makes the axioms true but the conjecture false. This is why the
reported status is CounterSatisfiable. The finite model building process implemented in Vampire works
by translating the problem of finding a model of a particular size into a SAT problem and iterating this
process for larger model sizes.
Exercise 6: Modelling and Answering Questions
This is a more open-ended exercise. The work required to get full marks is high (but it shouldn’t take that
much effort to get most of the way there). There are four steps for you to follow - you should pick a domain,
model it, check your model makes sense, and then write some conjectures representing queries that you want
to check for entailment. See below for details.
Your knowledge base in TPTP format should go in exercise6.p and your conjectures should go in
exercise6_conjectures.p with one conjecture per line in TPTP format (so we can check them automatically
- you don’t need to put any explanations in this file). Finally, you should write a brief explanation
of what you’ve done in exercise6.txt - we suggest including a heading per step and writing something
beneath each heading. In particular, if you find that you need to run Vampire in a certain way to get things
to work then mention this in exercise6.txt.
☞ Step 1 - The Domain. You can pick anything you want but you will probably find it easier if you pick
something where there are some clear extensional relations (e.g. objects you can describe the properties of)
and some clear intensional relations (i.e. those defined in terms of other relations). Ideally you will have a
recursively defined relation. Some ideas include modelling a football league, different pizza toppings, parts
of the body, the rules of a simple game, or your degree courses.
☞ Step 2 - Modelling. Use the TPTP language to model your chosen domain. By now you should be
familiar with the TPTP syntax (see above for links if not). Remember that parenthesis are important for
defining the scope of quantification! You will need to choose either the fof or tff language. In general, you
are likely to have the following things in your model:
1. If you choose tff then you will need to define the signature e.g. the types of the symbols you will use.
2. Defining some individuals (constants) and their properties and relations to each other (using ground
facts). You will probably want to declare that these individuals are unique and you may want to close
a particular (sub)domain - see below for hints.
3. Defining some helper predicates (relations). For example, you may want to define a structure (e.g.
lists), an ordering, or a way of converting between objects.
4. Defining the main predicates (relations) and their relation with each other. You should aim to make
use of first-order features that are not available in the Datalog or Prolog fragments e.g. disjunctive
’rule heads’ and (nested) existential quantification.
In the last section of this document Giles gives more detailed hints on modelling and a couple of examples.
8
☞ Step 3 - Checking your Knowledge Base. Ideally your knowledge base will be consistent and minimal.
Write down some thoughts in exercise6.txt on how you can do this with Vampire. Why do we want our
knowledge base to be consistent? What might we mean by minimal? As an example, consider this small
knowledge base:
∀x.playsIn(x, homeGround(x)
∀x.(¬team(x) ∨ hasManager (x, managerOf (x)))
∀x.(¬team(x) ∨ womansTeam(x) ∨ hasManager (x, managerOf (x)))
homeGround(arsenal) = emirates
playsIn(arsenal, emirates)
Are there any axioms that could be removed without changing what is true?
☞ Step 4 - Queries and Conjectures. Finally, you should write some queries (e.g. ground or existential
conjectures) and/or conjectures that you want to check on your knowledge base. You can ask things that
you expect to be entailed and those you don’t expect to be but in the latter case you won’t get an answer
if you use arithmetic (recall that reasoning in FOL with arithmetic is always incomplete). If you can’t get
Vampire to answer your queries/conjectures in the default mode provided try the portfolio mode which uses
a lot of options you haven’t met in a long sequence of strategies. (The next activity invites you to experiment
with --mode portfolio)
Activity 7: Exploring Arithmetic (Unmarked)
Let’s take a look at how Vampire deals with arithmetic, a common domain required for modelling interesting
things. In lectures we heard that reasoning with arithmetic is hard. Let’s look at some examples.
Download the following four problems:
• GEG025=1.p - Given some information about cities show that there exists a big nearby city.
• ARI744=1.p - Showing that the definition of square root entails the square root of 4 being 2.
• MSC023=2.p - Demonstrating Fahrenheit 451 to Celcius conversion
• DAT105=1.p - Showing that a certain definition of lists entails a given property
Write out the conjectures in English. You may want to break it down into bullet points, especially for the
first one (you can summarise what the two d and inh predicates are doing).
Now try and solve each problem with Vampire. Here you should use Vampire directly instead of via the
run_vampire script as the problems are hard and you want to use the things that this script turns off - in
particular equality proxy doesn’t work with arithmetic. By default Vampire will add theory axioms and try
and use them to solve the problem. Which problems can Vampire solve in this way? Which theory axioms
did we use (look at the proof)?
There should be at least one problem that we cannot solve (in any reasonable time) just using theory axioms.
There are two approaches that can help in such circumstances:
• Running with -sas z3 runs the method mentioned in lectures that uses an SMT solver, as described
in my paper AVATAR Modulo Theories. This helps if some ground part of the search space is theory
inconsistent.
• Running with -thi all -uwa all is the most general way of running the techniques described in Giles’
paper Unification with Abstraction and Theory Instantiation. Together these help make unification
aware of theories e.g. to allow us to resolve things when we can make things equal in the theory, not
just syntactically.
9
Do either of these help with the problem(s) you couldn’t solve? Can you see which bit of the proof uses the
new options (don’t worry if you can’t).
Finally, there is the ‘killer’ option of --mode portfolio that runs Vampire in a mode where it attempts
lots of different complicated strategies in sequence. This kind of portfolio mode (there are others) is how
Vampire is typically run in the ‘real world’. If you still haven’t solved one of the above problems, does this
help? If so, what is the ‘strategy’ string that is used?
A strategy string is something like
dis+1011_10_add=large:afr=on:afp=4000:afq=1.0:amm=off:anc=none:lma=on:nm=64:nwc=4:sac=on:sp=occurrence_2
and is an encoding of the set of options that define a strategy. You can ask Vampire what any of the named
options mean e.g. ./vampire -explain nm but the explanations for some of the more experimental options
can be a bit vague!
Submission and Mark Scheme
As usual, submission is via git. You must work on the lab2 branch and tag your submission as lab2 solution.
The provided submit.sh script does this for you. Note that you must push your tagged commit to Gitlab
before the deadline (it is not good enough just to make the commit locally).
Your work will be marked offline and returned to you with comments.
The marking scheme is given below. Note that the marks may seem a bit uneven. I have purposefully given
Exercise 2 more weight as it contains the fundamental things I want you to take away from this lab. The
challenging parts are the second marks of exercise 4 and the final two marks of exercise 6 (e.g. the difference
between 80% and 100%).
In general, written answers do not need to be long (some answers may be a few words) but they need to
address the question being asked.
Mark Scheme:
Ex 2 (6 marks) - Three marks for each problem (1 mark for steps 1+2, 2 marks for steps 2+4/5). Full marks
for good answers, half marks for reasonable attempts.
Ex 3 (4 marks) - There are 0.5 marks for each question except questions 3 and 4 which are worth 1 mark
each.
Ex 4 (2 marks) - There are 0.5 marks each for steps 2-5.
Ex 6 (6 marks) - This is split as follows:
– 4 marks for the knowledge base itself. Something that is simple but (i) goes beyond what we
can do in Prolog/Datalog and (ii) works (e.g. the Simple Example) gets 2 marks. An extra
mark for additional complexity (e.g. good use of functions, recursive relations, and/or quantifier
alternation). The final mark is for something impressive (you can ask). A maximum of 2
marks will be awarded if the explanation is inadequate.
– 1 mark for the set of conjectures. For full marks you need at least four conjectures, with at least
one being universally quantified, and a discussion for each conjecture covering whether it holds.
– 1 mark for the answer to Step 3 (0.5 for thoughts on consistency, 0.5 for thoughts on minimality)
10
Hints for Exercise 6
This section can be updated on request although also see the forum.
You should apply Occam’s razor - simplicity is best for two reasons: (1) it’s easier to write, and (2) it will be
easier to do reasoning. These two are often at odds. For example, for (2) it can be better to avoid equality
and the built-in arithmetic language but these can help with (1) e.g. make things easier to write. Don’t be
afraid to reduce, reformulate or remove something later if it turns out to be unhelpful for reasoning.
Some further hints:
• In TPTP one can write fof(d,axiom,$distinct(a,b,c)). as a shortcut for the formula fof(d,axiom,
a!=b & a!=c & b!=c) e.g. that the constants are distinct. This only works on the formula level and
only works with constants. It is useful for encoding unique names.
• If you want to specify domain closure, e.g. that the only fruit are bananas or apples, then the standard
way of doing this is
∀x.(fruit(x) → (x = apple ∨ x = banana))
i.e. if something is a fruit then it is either an apple or banana.
• In the tff language you can introduce the notion of sorts to reduce the amount of guarding you need
to do in different places. For example, if I have sorts I can replace the above domain closure axiom
with a non-guarded one:
tff(fruitType, type, fruit : $tType).
tff(appleType, type, apple : fruit).
tff(bananaType, type, banana : fruit).
tff(dcf, axiom, ![X:fruit] : (X=apple | X=banana)).
although in this case the guard ¬fruit(x) is something that can be selected, thus blocking unnecessary
use of the axioms in certain cases - so you might make reasoning harder by removing it. If that doesn’t
make sense, try the two alternatives and see what happens!
• If you’re tempted to use numbers and orderings on them just because you want to order things then
pause and remember to keep things simple. Numbers come with lots of extra baggage. You probably
just want to add your own ordering with axioms. Consider whether you really need transitivity as in
general it slows down proof search. If you have a small (and finite) number of objects you are ordering
you might find it easier to make the ordering explicit.
• Remember that equality in FOL is not the same as equality in Prolog where equality was just syntactic
equality. This is helpful, as it allows us to write useful things, but it is also important that you don’t
rely on this.
• TPTP doesn’t have a native idea of lists. Lists can be captured by the function cons(x, list) which
constructs a list out of a new element x and another list list and the constant empty list empty. You
will also need to add some other axioms:
– ∀x, list.cons(x, list) 6= empty
– ∀x1, x2, list1, list2.(cons(x1, list1) = cons(x2, list2) → (x1 = x2 ∧ list1 = list2))
11
To prevent infinite models of lists we also need to ensure that lists are acyclic, which requires an acyclic
sublist relation:
– ∀x, list.(sublist(list, cons(x, list)))
– ∀x, y, z.((sublist(x, y) ∧ sublist(y, z)) → sublist(x, z))
– ∀x.¬sublist(x, x)
Note that, due to the remark below, this is a conservative extension of the theory of lists as we cannot
use first-order logic to make sublist the true transitive closure of the direct sublist relation.
Remark. We cannot represent the transitive closure of a relation in first-order logic. We do it all the time
in Prolog so that might seem strange. What we can do is state the a relation is transitive (the definition
should be obvious) but to say that one relation is the transitive closure of another requires us to say that it
is the smallest such relation and first-order logic isn’t strong enough to say this as it requires us to quantify
over relations (which we can do in second-order logic but not first-order). So why can we do it in Prolog?
Because of the closed-world assumption which implicitly quantifies over all defined relations and says that
they are the smallest relations satisfying the given constraints! What does this mean practically? Any proof
using a transitive relation still holds for the smallest such relation, but non-proofs (e.g. saturations) may
not as we don’t know which relation is used by the associated model.
A Simple Example. Imagine your domain is about fruit and vegetables. You might write the following
simple knowledge base which describes some vegetables and fruit.
fof(one,axiom, vegetable(carrot)).
fof(two,axiom, vegetable(lettuce)).
fof(three,axiom, vegetable(cucumber)).
fof(four,axiom, fruit(apple)).
fof(five,axiom, fruit(banana)).
fof(distinct,axiom, ![X] : ~(fruit(X) & vegetable(X))).
fof(distinct,axiom, $distinct(carrot,lettuce,cucumber,apple,banana)).
fof(six,axiom, ![X] : (food(X) <=> (vegetable(X) | fruit(X)))).
i.e. that we have some vegetables and fruit, that something cannot be a fruit and a vegetable, that all named
vegetables and fruit are distinct, and that all food is either vegetable or fruit. Note that I haven’t specified
that the only fruit/vegetables are the ones named (domain closure).
I might have initially added
fof(tomato, axiom, vegetable(tomato)).
fof(tomato, axiom, fruit(tomato)).
but then found that my knowledge base was consistent. I could try and add some exceptions to my rules
e.g. I could extend my knowledge base as follows (keeping axioms 1-6 the same):
fof(tomato, axiom, vegetable(tomato)).
fof(tomato, axiom, fruit(tomato)).
fof(exception,axiom, ![X] : (exception(X) <=> (X=tomato))).
fof(distinct,axiom, ![X] : (~exception(X) => ~(fruit(X) & vegetable(X)))).
fof(distinct,axiom, $distinct(carrot,lettuce,cucumber,apple,banana,tomato)).
Now I have one place I can list my exceptions to the ‘fruit and vegetables are different’ rule and I use this
exception to guard the mutual exclusion rule.
12
If we wanted to know if there is some food that is not a vegetable (it’s not guaranteed just by axiom six)
then we want to ask if such a thing exists e.g.
∃x.(food(x) ∧ ¬vegetable(x))
Expressing this as a conjecture
fof(query,conjecture, ?[X] : ( food(X) & ~vegetable(X))).
will produce a proof with some final steps looking something like this
46. vegetable(apple) [resolution 41,35]
48. $false [resolution 46,36]
That is, before we added exceptions. After adding them we need to show that we are not in an exceptional
case so the last step will probably involve apple != tomato. From this we could infer that the food that is
not a vegetable was an apple but doing this by looking at the proof is cumbersome. We can use the question
answering mode mentioned earlier. If we run
./run_vampire fruit.p --question_answering answer_literal
then we get
% SZS answers Tuple [[apple]|_] for fruit
Next I might want to know whether being a fruit implies being a food. I could check this by posing the
conjecture
fof(query, conjecture, ![X] : (fruit(X) => food(X))).
which is something we can’t do in Prolog. Here we are looking for a Theorem result - it’s not an existential
query so we don’t use question answering. If Vampire returns Theorem it means that the conjecture is always
true (valid), thus the entailment holds.
If I posed a different conjecture e.g.
fof(query, conjecture, ![X] : (food(X) => fruit(X))).
I might get a CounterSatisfiable answer, which means that there is a model for the knowledge base that
is also a model for the negated conjecture e.g. the conjecture does not hold in all models of the knowledge
base and is not entailed by it.
If you use the experimental question answering branch (see Blackboard) you can get more answers e.g.
./vampire_qa --question_answering answer_literal -av off fruit.p
% SZS answers Tuple [[banana],[apple]|_] for fruit
This experimental branch won’t terminate when it finds one answer but carry on going until there are no
more answers or it has found question_count answers (specified by an option). Because of this, you won’t
get a proof (as there would need to be a proof per answer).
13
An Advanced Example. Most of you will have a knowledge base that models some real world thing with
individuals, properties, and relations. That’s fine and you can (quite straightforwardly) get full marks for a
knowledge base of that kind. But you can also model something a bit more abstract. Here I have a go at
modelling the quicksort algorithm and attempt to show that it always sorts lists. I don’t quite get to the
end of this journey but the journey would definitely count as ’impressive’ in the mark scheme.
The first thing I need to do is define our notion of lists and ordering, both necessary for the question of
sorting. As per the above hints, I can do this as follows. Note that I introduce my own ordering rather than
relying on something like < from the theory of arithmetic.
% Basic List axioms
fof(l1, axiom, ![X,L] : cons(X,L) != empty).
fof(l2, axiom, ![X1,X2,L1,L2] : ( cons(X1,L1) = cons(X2,L2) => (X1=X2 & L1=L2))).
fof(l3, axiom, ![X,L] : sublist(L,cons(X,L))).
fof(l4, axiom, ![X,Y,Z] : ( ( sublist(X,Y) & sublist(Y,Z)) => sublist(X,Z) )).
fof(l5, axiom, ![X] : ~sublist(X,X)).
% ordering of elements
fof(o1, axiom, ![X,Y] : ( (ord(X,Y) & ord(Y,X)) => X=Y)). % antisymmetry
fof(o2, axiom, ![X,Y,Z] : ((ord(X,Y) & ord(Y,Z)) => ord(X,Z))). % transitivity
fof(o3, axiom, ![X] : ord(X,X)). % reflexivity
fof(o4, axiom, ![X,Y] : (ord(X,Y) | ord(Y,X) )). % totality
Note that axioms l4 and o2 are dangerous. They allow me to generate arbitrarily complex consequences.
One way of taming either (a little) would be to stratify by using a ’direct’ sublist/less-than relation. These
relations are the main (but not only) reason that some reasoning steps below don’t work. In some cases I
could turn off (comment out) some of these axioms as they are not always useful for proof search.
The next thing I’m going to specify is sortedness. I introduce a predicate on lists that is meant to be true if
that list is sorted (wrt my ordering relation).
% sorted
fof(s1, axiom, sorted(empty)).
fof(s2, axiom, ![X] : sorted(cons(X,empty))).
fof(s3, axiom, ![X,Y,L] : (sorted(cons(X,cons(Y,L))) <=> (sorted(cons(Y,L)) & ord(X,Y)))).
At this point I can test this using a few conjectures. For this I need some ordered elements. I can either add
some extra facts to my knowledge base for testing i.e.
% My elements
fof(e1, axiom, ord(a,b) & ord(b,c) & ord(c,d) & ord(d,e) & ord(e,f) & ord(f,g)).
fof(e2, axiom, $distinct(a,b,c,d,e,f,g)).
and use these to ask conjectures such as
fof(c, conjecture, sorted(cons(a,cons(b,cons(c,cons(d,empty)))))).
or I could include the ordering information in the conjecture e.g.
fof(c,conjecture, ($distinct(a,b,c) & ord(a,b) & ord(b,c))
=> sorted(cons(a,cons(b,cons(c,empty))))).
14
Now I can start so specify quicksort but before I do that I realise that I’m going to need a function to append
lists. So I specify this as
% append
fof(a1, axiom, ![L]: append(empty,L) = L).
fof(a2, axiom, ![X,L1,L2]: append(cons(X,L1),L2) = append(L1,cons(X,L2))).
and can test it using conjectures such as
fof(c, conjecture, append(cons(a,empty),cons(b,empty)) = cons(a,cons(b,empty))).
At this point I could try and establish more general properties such as the length of append(X, Y ) is equal
to the length of X plus the length of Y . However, I would need induction for this and first-order logic isn’t
strong enough for that by itself - we’ll return to this point later.
Okay, now I’m ready to specify quicksort and I do so as follows:
% quicksort
fof(q1, axiom, quicksort(empty) = empty).
fof(q2, axiom, ![L] : (L!=empty => quicksort(L) =
append(quicksort(left(L)),cons(pivot(L),quicksort(right(L)))))).
I made some choices here that could have been made differently. I specify quicksort simply in terms of a
recursive call to a left and right partition of the list. The left and right partitions are defined in terms of a
general partition function that I define next. At this point I reflect that FOL is not a programming language
but my definitions here look quite similar to something I might write in a logical or functional programming
language.
For partition I choose to separate out the different cases rather than attempt to put them all in one definition.
But I have decided to have a single partition predicate that takes an ’order’ argument rather than defining
something like filterLess and filterGreater. Note that partition and left and right are undefined
for empty lists. This doesn’t mean they don’t equal something, I just don’t constrain what they are allowed
to equal.
% partition
fof(p1, axiom, ![L] : (L!=empty => left(L) = partition(L,pivot(L),less))).
fof(p2, axiom, ![L] : (L!=empty => right(L) = partition(L,pivot(L),greater))).
fof(p3, axiom, ![E,C] : partition(empty,E,C) = empty).
fof(p4, axiom, ![E,X,L] : ((ord(X,E) & X!=E) =>
partition(cons(X,L),E,less) = cons(X,partition(L,E,less)))).
fof(p5, axiom, ![E,X,L] : ((ord(E,X))
=> partition(cons(X,L),E,less) = partition(L,E,less))).
fof(p5, axiom, ![E,X,L] : ((ord(E,X) & X!=E)
=> partition(cons(X,L),E,greater) = cons(X,partition(L,E,greater)))).
fof(p6, axiom, ![E,X,L] : ((ord(X,E))
=> partition(cons(X,L),E,greater) = partition(L,E,greater))).
% pivot picks the first element. I could try something more complicated but that
% would probably be too complicated
fof(p7, axiom, ![X,L] : pivot(cons(X,L)) = X).
It might have been more general to write partition as
fof(p8, axiom, ![E,X,L,C] : ((((C=greater & ord(E,X) & X!=E)) | (C=less & ord(X,E) & X!=E))
=> partition(cons(X,L),E,C) = cons(X,partition(L,E,C)))).
fof(p9, axiom, ![E,X,L,C] : ((((C=greater & ord(X,E))) | (C=less & ord(E,X)))
=> partition(cons(X,L),E,C) = partition(L,E,C))).
But that gets a bit complicated and the extra structure can bet in the way during proof search.
15
I test the partition function with a few of conjectures:
fof(t1,axiom, test = cons(b,cons(g,cons(c,cons(d,cons(a,cons(f,empty)))))) ).
fof(c, conjecture, partition(test,c,less) = cons(b,cons(a,empty))).
fof(c, conjecture, partition(test,c,greater) = cons(g,cons(d,cons(f,empty)))).
Remember that we only add one conjecture at a time. But trying to solve them with default mode times
out. So I resort too --mode portfolio and one of the strategies solves each problem (I have two conjectures
so two problems). I’ve minimised the strategy to
ott+1002_8:1_bd=off:bs=unit_only:fsr=off:nm=32:nwc=10:sp=occurrence:urr=on:updr=off_14
based on my understanding of what the options mean. A few options here are interesting. The nwc option
makes proof search more goal-directed. The sp option changes the symbol ordering. The options bd, bs, fs,
and updr are turning off certain simplifications.
Now that I’m finished I can test the quicksort function with a number of ground queries
fof(c, conjecture, quicksort(cons(a,cons(b,empty))) = cons(a,cons(b,empty))).
fof(c, conjecture, quicksort(cons(b,cons(a,empty))) = cons(a,cons(b,empty))).
fof(c, conjecture, quicksort(cons(c,cons(b,cons(a,empty)))) = cons(a,cons(b,cons(c,empty)))).
fof(c, conjecture, sorted(quicksort(cons(a,cons(b,empty))))).
fof(c, conjecture, sorted(quicksort(test))).
Again, I need portfolio mode to solve these. The last one requires me to turn off transitivity of ord in a bit
of a hack as otherwise this transitivity swamps the search space.
At one point things weren’t working and I used question answering mode to find out what was going wrong
by asking
fof(c,conjecture, ?[X,L] : quicksort(cons(a,cons(b,empty))) = cons(X,L)).
Note that I cannot simply ask ?[L] : quicksort(cons(a,cons(b,empty))) = L as in FOL we immediately
have a term that equals L - it’s quicksort(cons(a,cons(b,empty))).
Finally, I can ask some more general questions about arbitary lists as follows.
fof(c, conjecture, ![X] : quicksort(cons(X,empty)) = cons(X,empty) ).
fof(c, conjecture, ![X] : (ord(X,Y)
=> quicksort(cons(X,cons(Y,empty))) = cons(X,cons(Y,empty)) )).
fof(c, conjecture, ![X,Y,L] : ((ord(X,Y) & sorted(cons(Y,L)))
=> quicksort(cons(X,cons(Y,L))) = cons(X,cons(Y,L)) )).
fof(c, conjecture, ![L] : sorted(quicksort(L))).
But currently Vampire can only establish the first one. The middle two shouldn’t be out of its reach and
I haven’t explored what’s going on (perhaps my model is a bit off). I know that the last one isn’t possible
without explicitly providing induction axioms (or using the in-built induction for the theory of datatypes
but that requires rewriting the problem in a different format).
16
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。