STEP-BY-STEP LINKAGE PROGRAMS GUIDE FOR HUMGEN USERS

The following instructions are directed toward humgen users in the Wash. Univ. Division of Human Genetics.

FORMATTING DATA FOR LINKAGE PROGRAMS

The major linkage programs (for disease gene mapping) require two input files, often referred to as datain.dat and pedin.dat. The pedin.dat file contains pedigree information and marker genotypes. The datain.dat file is a formatted parameter file describing the disease model, the markers and their allele frequencies, and the marker map. The map and marker information in the datain.dat file must apply to the specific data represented in the pedin.dat file. The primary data must be free of meiotic errors, have its alleles recoded, and be reformatted exactly as each analysis program requires.
There are many steps required to format data. Some of the formatting programs have interactive interfaces and require some preparation before they are run.
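
To illustrate what a meiotic (Mendelian inheritance) error is, here is a minimal Python sketch that checks a single father-mother-child trio at one marker. It is not part of the humgen tools and does not reproduce what adbcheck.pl or newprep do; the function name and the convention that allele code 0 means "untyped" are assumptions made for this example.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype could come from these parents.

    Each genotype is a pair of allele codes. In this sketch an allele
    code of 0 is assumed to mean "untyped".
    """
    # A genotype with a missing allele cannot be checked, so it is
    # treated as consistent rather than flagged.
    if 0 in father or 0 in mother or 0 in child:
        return True
    # The child must carry one paternal allele and one maternal allele.
    return any(
        sorted((pa, ma)) == sorted(child)
        for pa, ma in product(father, mother)
    )


# The child (1, 3) can inherit 1 from the father and 3 from the mother.
print(is_mendelian_consistent((1, 2), (3, 4), (1, 3)))
# The child (1, 2) would need both alleles from the father: an error.
print(is_mendelian_consistent((1, 2), (3, 4), (1, 2)))
```

A real checker works across whole pedigrees and handles missing parents; this sketch shows only the rule being tested.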

General formatting steps are:

First prepare a Genotyper table (ABI software) containing the primary data and transfer the file (GenotyperTable) to your Unix account.

In your Unix account:
  1. Prepare a pedigreeFile using a text editor (or use one that is already prepared, e.g., with a Filemaker database). Prepared pedigree files are located on humgen under /usr/local/pedigree.

  2. Run the gtyper2.pl program using pedigreeFile and GenotyperTable files as input.
    (outputs *.adb, *.pre, *.name files)

  3. Perform error-checking.
    Start with running the appropriate program to check for meiotic inheritance errors:

    • Run adbcheck.pl on the .adb file (for sib-pair data only).
    • Run pedmanager's newprep program on the .pre file (for sib-pairs and large pedigrees).
    • Use program output to examine Genotyper calls (on Macintosh) and correct errors in table.
    • Repeat process until no errors are left, or you're sure of calls (remaining meiotic errors can be zeroed out before running linkage analyses).

  4. If you need to combine several tables for your analyses, you may cat (concatenate) files and then run gtyper2.pl. Or you can merge and unmerge .adb files using the utility programs adbmerge.pl and adbunmerge.pl, then output the desired .pre file using another program, adb2pre.pl.

  5. Note the marker order in the .adb (and the corresponding *.pre) file. Adb files store marker genotypes in alpha-numeric order. Determine the actual map order and intermarker recombination fractions for the markers (usually using the Marshfield comprehensive maps and/or physical map information). You will use this information when creating two files: linkage.ped (precursor to the pedin.dat) and datain.dat.

  6. Run the corrected .pre file through newprep again, with these differences:

    • Issue the command 'zero errors on' before loading the .pre file.
    • Direct newprep to create linkage.ped file(s) with the command 'linkage pedigrees'. A linkage.ped file is a linkage.pre file with a liability class column; when 'zero errors on' is set, the resulting file will have meiotic errors zeroed.
    • Direct newprep to use 'all' pedigrees (or choose a subset of pedigrees from the list given).
    • Finally, specify the markers in the order you wish them to be in the resulting linkage.ped file. You may repeat the linkage pedigrees command to produce additional .ped files from the same input, if desired.

  7. Run makeped with linkage.ped file as input (command structure: makeped linkage.ped linkage.mk).

  8. Run recode with the makeped output (linkage.mk). Recode interactively asks for information about the input, then produces three files: the recoded linkage.rec, key.dat, and recode.dat. The recode.dat file is used in constructing the datain.dat file.

  9. Prepare the datain.dat file in a text editor, using information from recode.dat along with disease model and map information.

  10. The linkage.rec file can be used in LINKAGE, but needs to be processed further for GENEHUNTER and ANALYZE. The final processing removes some of the extra columns that makeped inserted. For GENEHUNTER, the linkage.rec file is processed with an awk script called mk2gh. For ANALYZE, the linkage.rec file is processed with an awk script called awk11 (removes the 11th column in the file). ANALYZE requires that the input files be called datain.dat and pedin.dat.
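
The column removal described in step 10 can be sketched in Python as follows. This is not the awk11 script itself, whose contents are not shown here; it only illustrates dropping the 11th column, assuming the columns are whitespace-delimited. The function name is hypothetical, and the sketch normalizes spacing to single spaces, which the real script may not do.

```python
def drop_column(line, column=11):
    """Remove one whitespace-delimited column (1-based) from a line.

    Lines with fewer than `column` fields are returned with their
    fields unchanged (spacing is normalized to single spaces).
    """
    fields = line.split()
    if len(fields) >= column:
        del fields[column - 1]
    return " ".join(fields)


# A made-up line with 14 numbered fields, used only for illustration.
example = " ".join(str(i) for i in range(1, 15))
print(drop_column(example))
```

To process a whole file, the same function could be applied to each line of linkage.rec and the results written to pedin.dat.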


Detailed explanations, starting with the primary data.

last updated Feb 3, 2001