Documentation for haploPowerCalc Version 1.0 Beta
Last modified: 2007/07/25

Contents

1. Description of Features
Introduction
Haplotype estimation
Flags
The output report

2. Setup and Operating Instructions
Input file format
Running haploPowerCalc

3. More Information
Whom to contact with questions and problems

1. Description of Features

Introduction

HaploPowerCalc is a tool for estimating the power to detect
disease association with a set of markers (e.g. a tag SNP panel or the SNPs on an
array), at any user-specified polymorphic site(s), under an arbitrary disease
model and arbitrary sample sizes. It is intended for users who wish to estimate the
power (or the sample sizes required to obtain adequate power) of their association
study. HaploPowerCalc uses an approach based on haplotype sampling.
In brief, it works as follows:

It first determines a region-wide (or chromosome-wide)
significance threshold on the value of the test statistic by randomly sampling
haplotypes (see below for more on haplotype estimation) to create
control-control (null) panels. Results obtained using these null panels are
comparable to those obtained by permutation tests; we prefer this approach because
it is computationally less demanding. Based on the disease prevalence, the relative
risks and the allele frequency at every causal SNP specified by the user, it then
computes the genotype frequencies in cases and controls. The genotype
frequencies in controls are assumed to be the same as in the input data, while
those in cases are computed by numerically solving the system of equations that
these quantities induce. Conditional on the genotype, it then samples haplotypes to
create a case-control panel. The user can specify the number of cases and controls.
It simulates a user-specified number of such panels and computes power as the
proportion of panels in which the computed test statistic exceeds the region-wide
threshold. The user can perform either the standard chi-square test or the
Cochran-Armitage trend test.
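
As an illustration of how prevalence, relative risks and allele frequency
determine the genotype frequencies in cases, here is a minimal Python sketch.
It is not the program's own code: it assumes a single causal SNP and
Hardy-Weinberg proportions in the general population, which admits a
closed-form answer, whereas haploPowerCalc solves its equations numerically.
The function name case_genotype_freqs and the example values are hypothetical.

```python
def case_genotype_freqs(p, prevalence, rr_het, rr_hom):
    """Return (P(aa|case), P(Aa|case), P(AA|case)) for risk allele A.

    Assumption: genotypes are in Hardy-Weinberg proportions in the
    general population, with risk allele frequency p.
    """
    q = 1.0 - p
    pop = (q * q, 2.0 * p * q, p * p)   # P(aa), P(Aa), P(AA)
    rel = (1.0, rr_het, rr_hom)         # risks relative to genotype aa
    # Baseline penetrance Pr(disease|aa), fixed by the overall prevalence.
    f0 = prevalence / sum(g * r for g, r in zip(pop, rel))
    penetrance = tuple(f0 * r for r in rel)
    if max(penetrance) > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    # Bayes' rule: P(genotype|case) = P(genotype) * Pr(disease|genotype) / prevalence
    return tuple(g * f / prevalence for g, f in zip(pop, penetrance))


# Hypothetical example: risk allele frequency 0.2, prevalence 0.01,
# heterozygous relative risk 1.5, homozygous relative risk 2.25.
cases = case_genotype_freqs(0.2, 0.01, 1.5, 2.25)
print(cases)
```

With both relative risks set to 1, the case frequencies reduce to the
population frequencies, which is a quick sanity check on the calculation.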

Haplotype estimation

HaploPowerCalc takes as input the estimated haplotype
frequencies, in a format similar to that of the PHASE v2 output (_freq file). The
user can, however, estimate haplotypes using any preferred method.

Flags

-a [value]  number of case (affected) individuals. Default 1000.
-d [value]  disease prevalence. Default value 0.01
-f [file ]  PHASE output _freq file
-h           display this help
-i [file ]  PHASE input file .inp standard format
-N [value]  number of case-control panels. Default 100
-r [file ]  file containing rs numbers (or any unique label) of target and tagging SNPs
-S          random number seed
-s          significance level. Default 0.05
-t          association test, e.g. ca (Cochran-Armitage trend test). Default is chi-square if no -t flag is used
-u [value]  number of controls (unaffected). Default 1000.
-x [value]  heterozygous relative risk Pr(disease|Aa)/ Pr(disease|aa)
-y [value]  homozygous relative risk Pr(disease|AA)/ Pr(disease|aa)
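
For readers unfamiliar with the trend test selected by "-t ca", the following
Python sketch computes the textbook Cochran-Armitage statistic for a 2 x 3
genotype table with additive weights (0, 1, 2). It is an independent
illustration, not the program's implementation; the function name
cochran_armitage and the example counts are hypothetical.

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Return (chi-square statistic, p-value) for the trend test, 1 d.f.

    cases and controls are genotype counts ordered (aa, Aa, AA).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * c for t, c in zip(weights, totals))
    sum_t2n = sum(t * t * c for t, c in zip(weights, totals))
    denom = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    if denom == 0:
        raise ValueError("trend statistic is undefined for this table")
    stat = n * (n * sum_tr - n_cases * sum_tn) ** 2 / denom
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for genotypes (aa, Aa, AA).
stat, p_value = cochran_armitage((10, 20, 30), (30, 20, 10))
print(stat, p_value)
```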

The output report
Results are written to standard output. The first few lines,
starting with #, report the values of the parameters used. These lines are
followed by a table with one SNP (from the target SNPs; see below) per row. The
columns are as follows:
SNP location
risk allele frequency
label1 (name of the SNP set)
power to detect the SNP using the SNP-set label1
label2
power to detect the SNP using the SNP-set label2
.
.
labelN
power to detect the SNP using the SNP-set labelN

2. Setup and Operating Instructions
Input file format:
See the included example files:
a4galt.ceu.hapHyb.inp (PHASE input file)
a4galt.ceu.phase.out_freqs (PHASE output freq file)
a4galt.snplabels

The rs file (a4galt.snplabels) contains three columns:
rs Number (or any unique label)
SNP location
SNP label

The SNP label "target" is reserved for the SNPs for which the power is to be computed.
The remaining SNP labels correspond to SNP sets (e.g. in the included file, ilmn
corresponds to Illumina 650K). NB: if a SNP has multiple labels (i.e. the SNP is common
to multiple sets), a new line is required for every label.
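
The three-column layout and the one-line-per-label rule can be illustrated with
a short Python sketch that groups SNPs by label. It assumes the columns are
separated by whitespace, which this documentation does not state; check the
included a4galt.snplabels file. The function name parse_snp_labels and the
sample records are hypothetical.

```python
from collections import defaultdict


def parse_snp_labels(lines):
    """Split rs-file records into target SNPs and named SNP sets.

    Assumption: each record is 'rs_id location label', whitespace separated.
    """
    targets = []
    snp_sets = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("expected 3 columns, got: %r" % line)
        rs_id, location, label = fields
        if label == "target":
            targets.append((rs_id, location))
        else:
            snp_sets[label].append((rs_id, location))
    return targets, dict(snp_sets)


# Hypothetical records: rs0001 is both a target and a member of the ilmn
# set, so it appears on two lines.
example = [
    "rs0001 1000 target",
    "rs0001 1000 ilmn",
    "rs0002 2500 ilmn",
    "rs0003 4000 otherset",
]
targets, snp_sets = parse_snp_labels(example)
print(targets)
print(snp_sets)
```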

Running HaploPowerCalc
Currently we distribute only Linux binaries (executables) of the program.
An example command line for running the program is in the included shell script com.sh.
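
To show how the flags listed under "Flags" fit together, the Python sketch
below assembles a command line from the included example files. The script
com.sh remains the authoritative example. The executable path
./haploPowerCalc and the relative risk values are assumptions made for
illustration, and the sketch only prints the command rather than running it.

```python
import shlex

# Assumption: the binary is named haploPowerCalc and is in the current
# directory. See com.sh for the actual invocation.
EXECUTABLE = "./haploPowerCalc"


def build_command(inp_file, freq_file, rs_file, n_cases=1000, n_controls=1000,
                  prevalence=0.01, n_panels=100, rr_het=1.5, rr_hom=2.25):
    """Return the argument list for one run (relative risks are hypothetical)."""
    return [
        EXECUTABLE,
        "-i", inp_file,          # PHASE input file
        "-f", freq_file,         # PHASE output _freq file
        "-r", rs_file,           # SNP labels file
        "-a", str(n_cases),      # number of cases
        "-u", str(n_controls),   # number of controls
        "-d", str(prevalence),   # disease prevalence
        "-N", str(n_panels),     # number of case-control panels
        "-x", str(rr_het),       # heterozygous relative risk
        "-y", str(rr_hom),       # homozygous relative risk
    ]


command = build_command("a4galt.ceu.hapHyb.inp",
                        "a4galt.ceu.phase.out_freqs",
                        "a4galt.snplabels")
print(shlex.join(command))
```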

Whom to contact with questions and problems

If you have questions regarding PHASE or you need to obtain a version of PHASE, please 
see the web site at: http://stephenslab.uchicago.edu/

If you have questions or problems with haploPowerCalc,
please
1. read this documentation carefully;
2. visit web site: http://pga.gs.washington.edu