PEST: Precision Estimated by Sampling Traits

Version 3.0
by Cynthia Zujko-Miller and Jeremy A. Miller
Overview

This is version 3.0 of PEST. Here is a brief history of PEST versions: PEST 2.0 (Zujko-Miller and Miller, 2003) is an application written in Microsoft .NET, developed to assess congruence between tree files for Continuous Jackknife Function analysis as described in Miller (2003). Version 2.1 (Zujko-Miller and Miller, 2003) added the capability of parsing tree files in NEXUS format. Version 2.2 (Zujko-Miller and Miller, 2003) added the capability of counting conflicts between trees. Version 2.3 added the capability of counting unresolved nodes and also added an option for discounting the most inclusive resolved node. This latter modification means that users no longer have to modify the reference tree by collapsing the basal node to form a trichotomy. Version 3.0 has an entirely new interface with the capability to spawn tree searches in NONA (Goloboff, 1993). Miller (2003) used PEST 1.0, written in Java, to describe Continuous Jackknife Function analysis. PEST 2.3 is faster, easier to use, and more feature-rich than PEST 1.0. However, PEST 2.0 and later versions abandon the cross-platform potential of the Java language and require installation of the .NET Framework. The original PEST 1.0 is available from this link; PEST 2.3 is available from this link. PEST 3.0 has been tested in Windows 2000 and XP.

Download

Pest.zip program files, demonstration files, .NET Framework, and instructions

Documentation

ReadMe.txt instructions for running a Continuous Jackknife Function analysis using PEST 3.0

Links

	Microsoft .NET Framework
	NONA and WinClada
	Systematic Biology

Introduction to Continuous Jackknife Function Analysis

Systematists expect their hypotheses to be asymptotically precise; that is, as the number of phylogenetically informative characters increases, a given set of relationships should stabilize on some topology. The best way to assess progress toward phylogenetic precision is a graph where some index of congruence is plotted against the accumulated number of characters. Continuous Jackknife Function (CJF) analysis is a graphical method for assessing whether the available data are converging on a specified phylogenetic hypothesis, the reference tree. The method involves removing characters with increasing probability, running phylogenetic analysis on the rarefied data matrices, and assessing the number of clades shared between each of the resulting trees and the reference tree. As more characters are removed, the number of clades shared between trees from a rarefied matrix and the reference tree should decrease. Stable phylogenies take the form of a decreasing curve with nearly 100% congruence for a substantial part of the curve. Less stable phylogenies will lose congruent nodes quickly as characters are excluded, resulting in a more or less straight inclined slope or a decreasing function with much less than 100% congruence. Curves can be interpreted as predictors of whether the addition of new data of the same type is likely to alter the phylogenetic hypothesis under test. Continuous Jackknife Function analysis makes statistical assumptions about the collection of character data. To the extent that CJF curves are sensitive to violations of unbiased character collection, they will be misleading as predictors. Convergence of data on a reference tree does not guarantee that the hypothesis is historically accurate, but it does predict that the accumulation of further data of the same type will not lead to rapid changes in the hypothesis.
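
The character-removal step described above can be illustrated with a minimal Python sketch. This is not PEST's own code: the function name rarefy, the dictionary layout of the matrix, and the toy character states are assumptions made for illustration only. Each character (column) is dropped independently with the stated probability, which is how the text describes the rarefaction.

```python
import random

def rarefy(matrix, removal_probability, rng):
    """Return a copy of the matrix in which each character (column) has been
    removed independently with the given probability.

    matrix maps taxon name -> list of character states (hypothetical layout).
    """
    n_chars = len(next(iter(matrix.values())))
    # Decide once per character so every taxon keeps the same columns.
    kept = [i for i in range(n_chars) if rng.random() >= removal_probability]
    return {taxon: [states[i] for i in kept] for taxon, states in matrix.items()}

# Toy matrix, invented for illustration only.
matrix = {
    "taxon_A": list("0011010"),
    "taxon_B": list("0111010"),
    "taxon_C": list("1100101"),
    "taxon_D": list("1100111"),
}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
for percent in (10, 50, 90):
    reduced = rarefy(matrix, percent / 100, rng)
    n_kept = len(next(iter(reduced.values())))
    print(f"{percent}% removal probability: {n_kept} of 7 characters kept")
```

In a real analysis each rarefied matrix would then be passed to a tree search, and the resulting trees compared with the reference tree.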

What You Need

	The Pest.zip file contains the installation program including the .NET framework
	NONA (Goloboff, 1993), which can be downloaded from http://www.cladistics.com
	An xread data file that is compatible with NONA, or NEXUS tree files
	Microsoft Excel
	Optional: A tree file containing the set of most parsimonious trees

Instructions

Run setup.exe and follow the installation instructions. Pest should now run by launching Pest.exe. Pest.exe should bring up a window with four tabs across the top: PEST, Tree search, Congruence, and About. The PEST tab is used to select which portions of the program will be run and how tree congruence will be assessed. The Tree search tab is used to select options related to the use of NONA. The Congruence tab is available for users who want more flexibility in selecting the trees they compare. Users can compare trees found by PEST and NONA at an earlier date or trees found by a variety of phylogenetic software packages, including those that use NEXUS format. Note that options under the Congruence tab are only available when Tree search in NONA (under the PEST tab window) is unchecked and Assess congruence (under the PEST tab window) is checked. The About tab has information on version, authors, and availability.

The PEST tab.
Tree search in NONA. This check box determines whether PEST will be used to launch a tree search pattern in NONA.
Assess congruence. This check box determines whether PEST will assess congruence between trees. Three options for assessing congruence are available: matches, conflicts, and unresolved. Matches counts the number of clades with identical composition (although not necessarily identical internal arrangement) in a comparison of two trees; conflicts counts the number of nodes in conflict between two trees; unresolved counts polytomies in rarefied trees that are resolved in the reference tree. The sum of matches, conflicts, and unresolved nodes will be equal to the number of resolved nodes in the unrooted reference tree.
Reference tree. The user can specify an xread tree file to serve as the reference tree or can allow NONA to determine the reference tree. Under the default setting, the strict consensus of the most parsimonious trees found in NONA under the options selected in the Tree search tab will serve as the reference tree.
Collapse basal node? PEST scores are calculated on an unrooted network. Networks contain one fewer resolved node than rooted trees. NONA saves tree files with this additional node, not a basal trichotomy. It is recommended that unmodified trees saved using NONA have the basal node collapsed.
Output file. Select the name of the output file and use the browse button to select its location. The output file will be a tab delimited text file. When PEST is finished running, import this file into Excel and graph the results.
Run. Begin analysis. Make sure the proper settings under the Tree search tab have been selected first.
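
The three counts described under Assess congruence can be sketched in Python as follows. This is an illustrative approximation, not PEST's implementation: it assumes each tree is given as a set of clades (each clade a frozenset of taxon names) and works on rooted clades rather than on the unrooted network PEST uses. The names is_conflict and score are hypothetical. A reference clade is a match if the same taxon set appears in the comparison tree, a conflict if some comparison clade overlaps it without either containing the other, and unresolved otherwise.

```python
def is_conflict(a, b):
    """Two clades conflict when they share taxa but neither contains the other."""
    return bool(a & b) and not a <= b and not b <= a

def score(reference, comparison):
    """Classify every resolved clade of the reference tree against a comparison tree."""
    counts = {"matches": 0, "conflicts": 0, "unresolved": 0}
    for ref_clade in reference:
        if ref_clade in comparison:
            counts["matches"] += 1
        elif any(is_conflict(ref_clade, clade) for clade in comparison):
            counts["conflicts"] += 1
        else:
            counts["unresolved"] += 1
    return counts

def clades(*groups):
    """Helper: build a set of clades from strings of single-letter taxon names."""
    return {frozenset(group) for group in groups}

# Reference tree ((A,B),((C,D),E)), listed without the all-inclusive clade.
reference = clades("AB", "CD", "CDE")

# Comparison tree ((A,B),(C,(D,E))): the clade D+E contradicts C+D.
print(score(reference, clades("AB", "DE", "CDE")))
# {'matches': 2, 'conflicts': 1, 'unresolved': 0}

# Comparison tree (A,B,(C,D,E)): A+B and C+D are collapsed into polytomies.
print(score(reference, clades("CDE")))
# {'matches': 1, 'conflicts': 0, 'unresolved': 2}
```

In both examples the three counts sum to 3, the number of resolved clades in the reference tree, which mirrors the property stated above.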

Tree search tab.
Path to NONA. Use the Browse button to indicate the path to NONA.
Path to matrix. Use the Browse button to indicate the path to the data matrix.
Replicates for each batch of rarefied characters. This is the number of times the data matrix will be subjected to each level of character removal.
Interval between batches. This will determine how fine grained the analysis is. If 2 is selected, NONA will first run a series of analyses in which probability of any character being removed from the matrix is 2%, then 4%, until the probability of removal for each character is 98%. If 5 is selected, the probability of character removal will first be 5%, then 10%, continuing until 95%. It is recommended that the interval between batches selected divide evenly into 100.
Number of trees to keep after each search. This is the HOLD/ command in NONA. This limits the number of trees kept in each replication of "MULT." Keeping this number low speeds up individual replicate searches, but can result in missing most parsimonious trees.
Number of random taxon addition replicates. This is the MULT* command in NONA. MULT* performs TBR branch swapping on a Wagner tree generated using random taxon addition. The number selected indicates the number of MULT* replicates for each rarefied matrix. High numbers can dramatically increase analysis time, but low numbers can lead to inaccurate results.
Branch swapping. This is the MAX* command in NONA. This can increase the accuracy of results, but will increase analysis time. Some branch swapping already occurs under the MULT* search.
amb= / amb-. The amb= setting allows NONA to count a node as resolved if its only and uncontradicted support comes from an ambiguously optimized character that can be alternatively optimized to support some other node; the amb- setting would consider that node collapsed.
Tree files. Use the Browse button to select a location where tree files will be saved.
Tree file prefix. Give a name to identify the tree files. One tree file will be created for each percent probability of data removal. A number indicating the percent probability data removal will be appended to the root of the file name.
Save the batch file. This is an option to save the series of commands given to NONA. Use the Browse button to select the path and file name for the batch file. Save the batch file with current settings using the Write Batch File button.
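
The effect of the Interval between batches and Replicates settings can be worked out with a short Python sketch. The function name removal_schedule and the replicate count of 10 are assumptions for illustration; the sketch only reproduces the arithmetic described above (an interval of 2 gives 2%, 4%, ... 98%; an interval of 5 gives 5%, 10%, ... 95%).

```python
def removal_schedule(interval):
    """Percent probabilities of character removal for a given batch interval."""
    return list(range(interval, 100, interval))

replicates = 10  # hypothetical value for "Replicates for each batch"

for interval in (2, 5):
    schedule = removal_schedule(interval)
    total = len(schedule) * replicates
    print(f"interval {interval}: {schedule[0]}% to {schedule[-1]}%, "
          f"{len(schedule)} batches, {total} rarefied matrices")
```

Halving the interval roughly doubles the number of rarefied matrices to be searched, so a fine-grained analysis costs proportionally more time.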

Congruence tab.
This tab is useful for comparing trees found using other programs (including NEXUS programs like PAUP*), or comparing trees found using PEST and/or NONA at an earlier date. Options under this tab will only be available when Tree search in NONA (under the PEST tab window) is unchecked and Assess congruence (under the PEST tab window) is checked. A reference tree must also be selected under the PEST tab window.
Comparison trees. Use the browse button to select the folder containing the comparison trees.
Filter trees. This is the file extension used to identify tree files. PEST will compare all files with this extension found in the selected folder to the reference tree selected in the PEST tab window.
Tree format. Select xread or NEXUS file format. Both the reference and comparison trees must be in the same file format.

Typical Use of PEST.
Under the PEST tab, the following options should be selected: Tree search in NONA, Assess congruence, Reference tree: use default reference tree from NONA search. Under Output data, select Matches. Choose whether to also include Conflicts and/or Unresolved. Enter the name and directory path for the output file. The output file will be a tab-delimited text file that can be imported into Excel.
Under the Tree search tab, use the Path to NONA browse button to indicate the path to NONA. Use the Path to matrix browse button to indicate the path to the xread data matrix that is being analyzed. Select the number of replicate matrices to be created for each level of character removal. Select the interval between batches. Select the number of trees to hold for each MULT* search. Select the number of MULT* search replicates for each rarefied matrix. Choose whether to use MAX* branch swapping. Choose whether to use the amb= or amb- criterion for dealing with ambiguous character support for nodes. Select a folder where output trees will be saved. Indicate the file name prefix for tree files. Choose whether to generate a batch file containing the commands that will be sent to NONA.
Return to the PEST tab and press the Run button.

When the analysis is complete, open the output file in Excel. The content of this file will depend on the Output data selections made (matches, conflicts, unresolved). If matches were selected, the output file will contain congruence, which is the average raw number of matching clades for each percent probability character removal, and scaled congruence score, which is the congruence divided by the number of nodes in the reference tree. If conflicts were selected, the output file will contain conflicts, which is the average raw number of conflicting clades for each percent probability character removal, and scaled conflicts score, which is the conflicts divided by the number of nodes in the reference tree. If unresolved was selected, the output file will contain unresolved, which is the average raw number of unresolved clades for each percent probability character removal, and scaled unresolved score, which is the unresolved divided by the number of nodes in the reference tree. All of these results are given for each value of percent probability character removal (remember, the higher the number, the more data is removed). Create an XY plot in Excel with percent probability character removal as the X-axis, and the scaled results (e.g., scaled congruence score) as the Y-axis.
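
The relationship between the raw and scaled scores described above can be checked with a small Python sketch. The replicate counts below are invented for illustration and are not PEST results, and the function name scaled_scores is hypothetical. For each percent probability of character removal, the congruence is the mean number of matching clades across replicates, and the scaled congruence score is that mean divided by the number of nodes in the reference tree.

```python
def scaled_scores(matches_by_percent, reference_nodes):
    """Return (percent, congruence, scaled congruence) rows, sorted by percent."""
    rows = []
    for percent in sorted(matches_by_percent):
        counts = matches_by_percent[percent]
        congruence = sum(counts) / len(counts)  # mean over replicates
        rows.append((percent, congruence, congruence / reference_nodes))
    return rows

# Made-up replicate counts for a reference tree with 8 nodes.
matches_by_percent = {
    10: [8, 8, 7, 8],
    50: [6, 5, 7, 6],
    90: [1, 2, 0, 1],
}

for percent, congruence, scaled in scaled_scores(matches_by_percent, 8):
    print(f"{percent}\t{congruence}\t{scaled}")
```

The same calculation applies to conflicts and unresolved counts. Plotting the first column against the third gives the CJF curve.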

Literature Cited

Goloboff, P. 1993. NONA (a bastard son of Pee-Wee). Version 2.0. Program and documentation. Available from http://www.cladistics.com. Published by the author, Tucumán, Argentina.

Nixon, K. C. 2002. WinClada. Version 1.00.08. Program and documentation. Available from http://www.cladistics.com. Published by the author, Ithaca, NY, USA.

Miller, J. A. 2003. Assessing progress in systematics with continuous jackknife function analysis. Systematic Biology, 52:55-65.

Zujko-Miller, C. and J. A. Miller. 2002, 2003, 2004. PEST. Version 1.0, 2.0, 2.1, 2.2, and 3.0. Program and documentation. Available online at http://www.gwu.edu/~spiders/pest.htm. Published by the authors, Washington, D.C., USA.
