Feature Vector Indexing

 

Experimental Results


Feature vector indexing is a simple but practically efficient non-perfect indexing technique for first-order clause subsumption. Clauses are represented by constant-lengths vectors of numerical feature values, where each feature has the property that a subsuming clause never has a larger value than a subsumed clause. The index is a trie representing these feature vectors, with indexed clauses stored at the leafs. The same index supports retrieval of subsuming clauses (at each node, follow all branches with feature values smaller than or equal to the query), and subsumed clauses (follow branches with feature values greater than or equal to the query).


Feature vector indexing was first described in [Schulz:IJCAR-WS-2004]. For the data used in the old paper, see the link on the main E Data page. I have recently revisited the topic and made some additional experiments with a newer release of E, a much more current version of the TPTP, and on modern hardware. Moreover, E has been instrumented for profiling code segments of particular interest, so the new experiments allow for a much more detailed analysis.


  1. Test runs were performed with the development versions E 1.4-010 to E 1.4-013. The latest version contains all the tested feature vector functions.

  2. E 1.4-013 source (~1.7 MB, .tgz)

  3. Test runs were done on the University of Miami Pegasus Cluster

  4. Jobs were run on 2.6 GHz 8 Core Intel Xeon machines with 16 GB of main memory each

  5. 8 concurrent processes were scheduled per node, i.e. one per core

  6. A CPU time limit of 300 seconds and a memory limit of 512 MB were in force for each individual test run

  7. The tests were run on all CNF and FOF problems from TPTP-5.2.0.

  8. The results are stored in protocol files

  9. Each protocol contains results for all TPTP problems for a given parameter setting

  10. Lines starting with a # are comment lines

  11. In particular, the first line gives the command line options for the test run

  12. Lines starting with a problem name contain data for that particular problem

  13. The important columns are described in the legend.txt file.

  14. The results: EFVRes.tgz (~6.3 MB)