How do I run PolyPhred on a Macintosh computer?
PolyPhred, like Phred, Phrap and Consed, runs on UNIX or Linux operating systems. Mac OS X provides a way to run such programs within a UNIX-like environment. To do this, open a "terminal window" using the following steps: open the Applications folder, then the Utilities folder, and click on the Terminal application.
Within the terminal window, you need to enter UNIX commands to set up directories, move files around, and run the programs. If you are unfamiliar with UNIX, here is a brief tutorial to help get started. Then follow the instructions in the PolyPhred documentation under "Installing PolyPhred", and below that, "Running PolyPhred".
Why does PolyPhred miss SNPs in a data set with only a few reads?
Before PolyPhred searches for SNPs, it carries out a processing step. During this phase, PolyPhred calculates at each position in the alignment an average homozygous peak. To do this, it uses all of the sites in each read that appear initially to be homozygous.
During the search phase, PolyPhred compares each site against these average homozygous peaks. Significant differences from the average contribute to a high score for heterozygosity.
If the data set contains only a few reads, then the average homozygous peaks might not represent a true average at some positions. This is particularly true if one is trying to analyze data sets with only one or two reads. In this case, at positions where there are no homozygous sites represented, an average homozygous site will not be calculated, and PolyPhred will fail to find the heterozygotes at that position.
We have not conducted a rigorous study to determine the optimal number of reads to include in a data set. On statistical grounds, the more reads there are (up to a point), the more reliable the average should be. In our experience, a data set containing eight reads was enough to find most of the known SNPs.
If one wishes to analyze only one or a few traces at a time, one could include in the data set several traces from a source that is known to be homozygous at all positions. These traces should come from independent sequencing runs.
When I turn indel detection on, Consed reports an error and does not run. How do I fix this problem?
When PolyPhred identifies a putative indel site, it inserts an 'indelSite' tag in the ace file and 'indel' tags in some of the phd files. The current version of Consed is not able to interpret these tags without customizing the .consedrc file. Among other things, this file allows for the specification of user-defined tags.
To edit the .consedrc file, one must first locate the file, or if it does not exist, create one. The easiest way to determine if the file exists and where it is located is to ask the person who installed Consed. If this is not possible (or if that person is you), then you will need to locate the file or create it yourself.
First, try locating the file using the following procedures.
1) Type this line:
env | grep CONSED
If this line appears:
CONSED_PARAMETERS=[path]
where [path] is the directory containing the .consedrc file, then you have located the file. Skip to 'Editing the .consedrc file' below.
2) Look in your home directory. Type:
cd
ls -a
If .consedrc is listed among the files in your home directory, you have found it. Skip to 'Editing the .consedrc file' below.
3) Try the slocate command. Type:
slocate .consedrc
If this works, it will show you where a .consedrc file is located. Skip to 'Editing the .consedrc file' below.
4) Look to see if it is with the Consed executable file. Type:
where consed
Typically, the executable is in a directory like:
/usr/local/genome/bin/
If this is the case, then type something similar to:
cd /usr/local/genome/
find . -name .consedrc -print
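The search steps above can be strung together in a small shell function. This is only a sketch for sh-compatible shells and is not part of PolyPhred or Consed. The function name find_consedrc is hypothetical. Because CONSED_PARAMETERS may name either the .consedrc file itself or the directory that holds it, the sketch accepts both. It leaves out the slocate step, which depends on a system-wide file index.

```shell
# Hypothetical helper: print the path of the first .consedrc found, or fail.
find_consedrc() {
    # Step 1: CONSED_PARAMETERS may name the file itself or its directory.
    if [ -n "${CONSED_PARAMETERS:-}" ]; then
        if [ -f "$CONSED_PARAMETERS" ]; then
            echo "$CONSED_PARAMETERS"; return 0
        elif [ -f "$CONSED_PARAMETERS/.consedrc" ]; then
            echo "$CONSED_PARAMETERS/.consedrc"; return 0
        fi
    fi
    # Step 2: look in the home directory.
    if [ -f "$HOME/.consedrc" ]; then
        echo "$HOME/.consedrc"; return 0
    fi
    # Step 4: search the tree above the Consed executable,
    # for example /usr/local/genome/ when consed is in /usr/local/genome/bin/.
    exe=$(command -v consed 2>/dev/null) || exe=""
    if [ -n "$exe" ]; then
        root=$(dirname "$(dirname "$exe")")
        found=$(find "$root" -name .consedrc -print 2>/dev/null | head -n 1)
        if [ -n "$found" ]; then
            echo "$found"; return 0
        fi
    fi
    return 1
}
```

After defining the function, typing find_consedrc prints the location of the file if one of the steps succeeds, and prints nothing otherwise.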
Editing the .consedrc file
Once you have located a .consedrc file, you are ready to edit the file. If you could not locate the file, then you will need to create it. You can create the file in your home directory, which will give access only to you. Or you can create it in a 'global' directory like /usr/local/genome/ so that other Consed users can access it.
To edit the .consedrc file, open it with your favorite text editor. Add the following lines:
consed.customConsensusTag1: indelSite
consed.tagColorCustomConsensusTag1: DarkCyan
consed.customTag1: indel
consed.tagColorCustomTag1: DarkOrange
If the tags already exist in the file, then change the final '1' to a different number to make the tags unique.
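If you would rather not edit the file by hand, the same change can be scripted. The following is a minimal sketch, assuming an sh-compatible shell. The function name add_polyphred_tags is hypothetical and is not part of Consed or PolyPhred. It appends the four lines only when no indelSite definition is present, and it refuses to write when tag number 1 is already in use, since in that case you need to choose a different number yourself.

```shell
# Hypothetical helper: append the PolyPhred tag definitions to a .consedrc
# file, creating the file if it does not exist.
add_polyphred_tags() {
    rc=$1
    touch "$rc"
    # Nothing to do if the indelSite tag is already defined.
    if grep -q 'indelSite' "$rc"; then
        echo "indelSite tag already defined in $rc"
        return 0
    fi
    # Do not overwrite a different tag that already uses number 1.
    if grep -Eq '^consed\.(customConsensusTag1|customTag1):' "$rc"; then
        echo "tag number 1 is already in use in $rc; edit the file by hand" >&2
        return 1
    fi
    printf '%s\n' \
        'consed.customConsensusTag1: indelSite' \
        'consed.tagColorCustomConsensusTag1: DarkCyan' \
        'consed.customTag1: indel' \
        'consed.tagColorCustomTag1: DarkOrange' >> "$rc"
}
```

For example, add_polyphred_tags "$HOME/.consedrc" would create or update the file in your home directory.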
Finally, if the .consedrc file is located in a directory other than your home directory, you need to add a line to your shell startup script that tells Consed where to find it. Locate the startup script in your home directory and add one of these lines (where [path] is the complete path to the .consedrc file). For csh or tcsh:
setenv CONSED_PARAMETERS [path]
For sh or bash:
CONSED_PARAMETERS=[path]
and add CONSED_PARAMETERS to the 'export' line.
What is the format of the .poly files?
The .poly files are written by the program Phred when the -dd flag is used. These files provide additional information that PolyPhred needs to identify putative SNPs.
The first line of the poly file contains the name of the corresponding trace file, followed by five numbers. The first number is the minimum of the following four numbers. These four numbers are scaling factors for the A trace, C trace, G trace and T trace, respectively.
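To make the header layout concrete, here is a small sketch that reads the first line of a .poly file and checks the relationship described above. It assumes an sh-compatible shell with awk, and a trace file name that contains no spaces. The function name poly_header is hypothetical; it is not a Phred or PolyPhred command.

```shell
# Hypothetical helper: print the four scaling factors from the first line
# of a .poly file and check that the first number is their minimum.
poly_header() {
    head -n 1 "$1" | awk '
        # Expect the trace name followed by five numbers.
        NF != 6 { print "unexpected header: " $0; exit 1 }
        {
            # Find the smallest of the A, C, G and T scaling factors.
            min = $3
            for (i = 4; i <= 6; i++) if ($i + 0 < min + 0) min = $i
            printf "trace=%s A=%s C=%s G=%s T=%s\n", $1, $3, $4, $5, $6
            if ($2 + 0 != min + 0) {
                print "first number is not the minimum"
                exit 1
            }
        }'
}
```

Typing poly_header followed by the name of a .poly file prints the trace name and the four scaling factors, and returns a non-zero status if the header does not have the expected shape.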
The remaining lines have information for each called base in the sequence. The fields are: