STEP-BY-STEP LINKAGE PROGRAMS GUIDE FOR HUMGEN USERS
FIRST, we must make sure that the information you want analyzed is in the proper format. Our linkage studies primarily concern mapping human disease genes. Most computer programs for disease gene mapping use input files in the pre-LINKAGE input format, which we call ".pre" format. Getting our genotype data into ".pre" format takes several steps.
After the DNA samples are PCR-amplified with microsatellite markers and run on ABI sequencing machines, genotypes are determined using an ABI program, Genotyper (on the Macintosh). The Genotyper table, which has one line per genotype, is transferred to the Sun.
The items in the ABI Genotyper table have been standardized for our lab. There is a header followed by lines of genotype data, e.g.:
File Name   Lane   Dye   Category   Label(s)   Overflow
1331-01   14   B   D11S1385   205   209
1331-02   15   B   D11S1385   209   209
1331-01   14   B   D11S903   99   103
1331-02   15   B   D11S903   99   99
This table, together with another file describing the pedigree relationships and affection status for individuals in the families (a pedigree file, described below), is used as input to the data formatting program, gtyper2.pl.
The important factors for the format of this Genotyper table are:
- The data items are separated by tabs or spaces
- None of the 'items' have spaces in them
- The first item of the data rows is always the family-individual. The family and individual are hyphenated (e.g., 1331-01 is individual 01 from family 1331).
- The second and third data items (Lane Dye) are present
- The fourth data item (Category) is the markername
- The fifth and sixth data items (Labels) are the alleles, which if present, must be integers.
- The marker has also been run with control DNA of known genotype, to which all genotypes for that marker have been standardized.
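The format rules above can be checked mechanically before the table is handed to gtyper2.pl. The following is a minimal Python sketch, not part of gtyper2.pl: the function name `parse_genotyper_table` is hypothetical, and it assumes that the first non-blank line is the header, that a row with no alleles should be recorded as `0 0` (as in the .adb examples), and that a row with exactly one allele is an error.

```python
def parse_genotyper_table(text):
    """Return {(family_individual, marker): (allele1, allele2)} from a Genotyper table.

    Hypothetical helper for illustration; it only applies the format rules listed above.
    """
    genotypes = {}
    rows = [ln for ln in text.splitlines() if ln.strip()]
    for line in rows[1:]:  # assumption: the first non-blank line is the header
        items = line.split()  # items may be separated by tabs or spaces
        if len(items) < 4:
            raise ValueError(f"expected at least 4 items: {line!r}")
        fam_ind, _lane, _dye, marker = items[:4]
        if "-" not in fam_ind:
            raise ValueError(f"first item is not family-individual: {line!r}")
        alleles = items[4:6]  # fifth and sixth items, if present
        if len(alleles) == 2:
            pair = (int(alleles[0]), int(alleles[1]))  # int() rejects non-integers
        elif not alleles:
            pair = (0, 0)  # assumption: a missing genotype is stored as 0 0
        else:
            raise ValueError(f"only one allele given: {line!r}")
        genotypes[(fam_ind, marker)] = pair
    return genotypes
```

Calling it on the example table returns one entry per family-individual and marker, for instance `(205, 209)` for individual 1331-01 at D11S1385.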
You can use a text editor to prepare any of the input files. However, when you have many families, individuals, or marker genotypes, it is easier to let computer programs format the data for you, and the result will not include transcription errors.
PEDIGREE FILE
This is the file that contains the familial information of the pedigree(s) you are analyzing. Each line represents one individual, and has the same number of items (5 or 6), separated by spaces:
- Line Item 1 - Hyphenated representation of the family and individual (e.g. CA-100, where CA is the family ID and 100 is the individual ID number). Individual IDs must be integers. Family IDs are usually integers too (if not, they will be changed to integers later by the program MAKEPED). Each family-individual item must be unique in the pedigree file.
- Line Item 2 - The ID number of the individual's father
- Line Item 3 - The ID number of the individual's mother
- Line Item 4 - The individual's sex, where 1 = male, 2 = female, and 0 = unknown
- Line Item 5 - The disease affected status, where 1 = unaffected, 2 = affected, and 0 = unknown
- Line Item 6 (OPTIONAL) - The liability code number, usually age-related and specific to the disease in the study
In the example below, the file name is pCA. The family ID is CA. There is no liability column.
CA-027   0   0   1   2
CA-031   0   0   1   1
CA-032   0   0   2   1
CA-104   0   0   2   1
CA-101   027   104   1   1
CA-102   027   104   1   1
CA-103   031   032   2   2
CA-105   031   030   2   2
CA-100   027   104   2   1
CA-030   0   0   2   2
The pedigree file is one of the input files to gtyper2.pl, which creates the .adb and .pre files. In general, no ABI control lanes are included in the pedigree file because gtyper2.pl writes every line of the pedigree file to the output .adb file. The allele information comes from the Genotyper table.
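The pedigree file rules listed above lend themselves to an automatic check. The Python sketch below is illustrative only and is not part of gtyper2.pl: the function names `parse_pedigree` and `missing_parents` are hypothetical, and the requirement that every nonzero parent ID has a line of its own in the same family is an assumption (it holds for the pCA example, but it is not stated in the rules above).

```python
def parse_pedigree(text):
    """Parse pedigree lines into {family-individual: record}, checking the format rules."""
    people = {}
    for line in text.splitlines():
        items = line.split()
        if not items:
            continue  # ignore blank lines
        if len(items) not in (5, 6):
            raise ValueError(f"expected 5 or 6 items: {line!r}")
        fam_ind, father, mother, sex, status = items[:5]
        # Split on the last hyphen; the individual ID must be an integer.
        family, _, individual = fam_ind.rpartition("-")
        if not family or not individual.isdigit():
            raise ValueError(f"bad family-individual item: {fam_ind!r}")
        if fam_ind in people:
            raise ValueError(f"duplicate entry: {fam_ind!r}")
        if sex not in ("0", "1", "2") or status not in ("0", "1", "2"):
            raise ValueError(f"bad sex or affection code: {line!r}")
        people[fam_ind] = {
            "family": family,
            "father": father,
            "mother": mother,
            "sex": sex,
            "status": status,
            "liability": items[5] if len(items) == 6 else None,
        }
    return people


def missing_parents(people):
    """List (child, parent ID) pairs where a nonzero parent has no line of its own.

    Assumption: parents are expected to appear in the same file and family.
    """
    missing = []
    for fam_ind, rec in people.items():
        for parent in (rec["father"], rec["mother"]):
            if parent != "0" and f"{rec['family']}-{parent}" not in people:
                missing.append((fam_ind, parent))
    return missing
```

The parent check runs only after the whole file has been read, because a parent may be listed after the child, as CA-030 is in the pCA example.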
Gtyper2.pl is a utility program that, given a file with pedigree relationships, converts data from ABI Genotyper tables into the pre-LINKAGE program format. Gtyper2.pl produces the .pre (pre-LINKAGE) file and a corresponding .name file. The .pre file has relationship information for individual family members and their genotypes, but no marker names are associated with the genotypes (!). The .name file has the marker names. To keep the genotype data associated with marker names all in one file, we developed the gtyper2.pl program to put family/relationship, marker, and genotype data into one additional file. We call this file an ".adb" file (Allele DataBase). The .adb files can then be manipulated to produce the expected input files for the linkage programs. The .adb files can be merged and unmerged using adbmerge.pl and adbunmerge.pl. At any time a .adb file can be converted to the pre-LINKAGE (.pre) file format with the conversion program adb2pre.pl.
RUNNING GTYPER2.PL
You need to have two files ready before running gtyper2.pl: the pedigree file and the Genotyper table file, both described above.
Gtyper2.pl takes two arguments: the name of the pedigree file and the name of the ABI Genotyper file. For example:
user@humgen% gtyper2.pl pCA tab1
The program automatically creates three output files whose name prefix is derived from the input file names (pCAtab1 in this example). The file name extensions reflect the type of file ('.adb', '.pre' or '.name'). You can change the names of the output files using the options listed below.
You must provide a pedigree file and an allele file.
Usage: gtyper2.pl [options] ped_file allele_file
Options:
-h --help Prints this message.
-v --verbose Lots of logging.
-a --adb FILE Prints the resulting adb file to FILE.
-p --pre FILE Prints the resulting pre file to FILE.
-n --name FILE Prints the resulting name file to FILE.
-l --log FILE Logs to file.
OUTPUT FILE FORMATS
The .adb file has two components: the ordered marker list (one marker per line), and a sorted list of individuals with their specific data. Each individual is listed with line items separated by spaces: familyID-individualID, fatherID, motherID, sex, affected status, followed by allele pairs corresponding to the ordered marker list. The two differences between the second component of .adb files and .pre files are the heading (PEDIGREE-ALLELE-DATA) and the hyphenation of familyID-individualID in the .adb file (the hyphen is replaced by a space in the .pre file).
.adb File (Allele DataBase)
Lists the marker names, family relationships, and genotypes. This file may be used to merge and unmerge sets of markers for the same individuals with the programs adbmerge.pl and adbunmerge.pl. The program adb2pre.pl produces .pre and .name files from .adb file input.
FILE NAME: pCAtab1.adb
MARKER LIST:
D10S677
D1S1679
PEDIGREE-ALLELE-DATA:
CA-027   0   0   1   2   0   0   0   0
CA-030   0   0   2   2   0   0   0   0
CA-031   0   0   1   1   0   0   0   0
CA-032   0   0   2   1   0   0   0   0
CA-100   027   104   2   1   213   221   160   168
CA-101   027   104   1   1   197   213   0   0
CA-102   027   104   1   1   213   213   160   172
CA-103   031   032   2   2   201   201   156   164
CA-104   0   0   2   1   0   0   0   0
CA-105   031   030   2   2   0   0   156   156
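To make the layout concrete, the sketch below assembles text in the .adb layout shown above from pedigree rows and a genotype lookup. It is a Python illustration written for this guide, not the gtyper2.pl source: the function name `build_adb` is hypothetical, the "MARKER LIST:" and "PEDIGREE-ALLELE-DATA:" headings are copied from the example, and plain string sorting of markers and individuals is an assumption that happens to reproduce the order in the example.

```python
def build_adb(pedigree_rows, genotypes):
    """Return .adb-style text.

    pedigree_rows: list of item lists, e.g. ["CA-100", "027", "104", "2", "1"]
    genotypes: {(family_individual, marker): (allele1, allele2)}
    """
    # Assumption: markers and individuals are ordered by plain string sort.
    markers = sorted({marker for (_fam_ind, marker) in genotypes})
    lines = ["MARKER LIST:"] + markers + ["PEDIGREE-ALLELE-DATA:"]
    for row in sorted(pedigree_rows, key=lambda r: r[0]):
        alleles = []
        for marker in markers:
            # An individual without a genotype for this marker gets 0 0.
            a1, a2 = genotypes.get((row[0], marker), (0, 0))
            alleles += [str(a1), str(a2)]
        lines.append("   ".join(list(row) + alleles))
    return "\n".join(lines) + "\n"
```

Every pedigree row appears in the output whether or not it has genotypes, which mirrors the statement that gtyper2.pl uses each line of the pedigree file in the .adb file.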
.pre File (in 'pre-MAKEPED', or 'pre-LINKAGE' format)
Lists family relationships and assigned alleles. This file is an initial input file to the LINKAGE calculation programs.
The file contains the following information:
CA 027   0   0   1   2   0   0   0   0
CA 030   0   0   2   2   0   0   0   0
CA 031   0   0   1   1   0   0   0   0
CA 032   0   0   2   1   0   0   0   0
CA 100   027   104   2   1   213   221   160   168
CA 101   027   104   1   1   197   213   0   0
CA 102   027   104   1   1   213   213   160   172
CA 103   031   032   2   2   201   201   156   164
CA 104   0   0   2   1   0   0   0   0
CA 105   031   030   2   2   0   0   156   156
.name File
An ordered marker list for allele data in the .pre file.
MARKER-LIST for pCAtab1.pre
D10S677
D1S1679
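The relationship between the three formats can be summarized in a few lines of code. The following Python sketch splits .adb-style text into .pre and .name text using only the two differences described above (the headings are dropped and the hyphen becomes a space). It is not the adb2pre.pl program: the function name `adb_to_pre_and_name` and the default file name `example.pre` are hypothetical, and the exact headings are assumed to match the examples shown here.

```python
def adb_to_pre_and_name(adb_text, pre_filename="example.pre"):
    """Return (pre_text, name_text) derived from .adb-style text."""
    lines = [ln.strip() for ln in adb_text.splitlines() if ln.strip()]
    # Assumption: the headings appear exactly as in the example .adb file.
    start = lines.index("MARKER LIST:")
    split = lines.index("PEDIGREE-ALLELE-DATA:")
    markers = lines[start + 1:split]

    pre_lines = []
    for line in lines[split + 1:]:
        fam_ind, rest = line.split(None, 1)  # first item, then everything else
        family, _, individual = fam_ind.rpartition("-")
        # In .pre format the hyphen between family and individual is a space.
        pre_lines.append(f"{family} {individual}   {rest}")

    pre_text = "\n".join(pre_lines) + "\n"
    name_text = "\n".join([f"MARKER-LIST for {pre_filename}"] + markers) + "\n"
    return pre_text, name_text
```

The marker order in the .name text is taken unchanged from the .adb marker list, so the allele pairs in each .pre line stay aligned with their marker names.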
Last updated Feb. 3, 2001