STEP-BY-STEP LINKAGE PROGRAMS GUIDE FOR HUMGEN USERS


    DETAILS IN FORMATTING DATA FOR LINKAGE PROGRAMS


    PRIMARY DATA

    FIRST, we must make sure that the information you want analyzed is in the proper format.

    Our linkage studies primarily concern mapping human disease genes. Most computer programs for disease gene mapping use input files in the pre-LINKAGE input format, which we call ".pre" format. Getting our genotype data into ".pre" format takes several steps.

    After the DNA samples are PCR-amplified with microsatellite markers and run on ABI sequencing machines, genotypes are determined using an ABI program, Genotyper (on the Macintosh). The Genotyper table, which has one line per genotype, is transferred to the Sun.

    The items in the ABI Genotyper table have been standardized for our lab. There is a header followed by lines of genotype data, e.g.:

     
    File Name Lane Dye     Category     Label(s)      Overflow
    1331-01   14    B       D11S1385     205    209 
    1331-02   15    B       D11S1385     209    209 
    1331-01   14    B       D11S903       99    103
    1331-02   15    B       D11S903       99     99
    

    The important factors for the format of this Genotyper table are:

    • The data items are separated by tabs or spaces
    • None of the 'items' have spaces in them
    • The first item of the data rows is always the family-individual. The family and individual are hyphenated (e.g., 1331-01 is individual 01 from family 1331).
    • The second and third data items (Lane Dye) are present
    • The fourth data item (Category) is the markername
    • The fifth and sixth data items (Labels) are the alleles, which, if present, must be integers.
    • The marker has also been used with control DNA of known genotype, to which all genotypes for that marker have been standardized.
    This table, together with another file describing the pedigree relationships and affection status for individuals in the families (a pedigree file, described below), is used as input to the data formatting program, gtyper2.pl.
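The format rules above can be illustrated with a short Python sketch that reads one line of a Genotyper table. This is not part of gtyper2.pl; the names `Genotype` and `parse_genotyper_line` are hypothetical, and two behaviors are assumptions rather than documented rules: the header is recognized by its first word ("File"), and missing alleles are recorded as 0.

```python
from typing import NamedTuple, Optional


class Genotype(NamedTuple):
    family: str
    individual: str
    marker: str
    allele1: int
    allele2: int


def parse_genotyper_line(line: str) -> Optional[Genotype]:
    """Parse one Genotyper table row; return None for blank lines and the header."""
    items = line.split()  # items may be separated by tabs or spaces
    if not items or items[0] == "File":  # assumption: header starts with "File Name"
        return None
    if len(items) < 4:
        raise ValueError(f"expected at least 4 items, found {len(items)}: {line!r}")
    family_individual, _lane, _dye, marker = items[:4]
    # The family and individual are joined by a hyphen, e.g. 1331-01.
    family, hyphen, individual = family_individual.rpartition("-")
    if not hyphen or not family:
        raise ValueError(f"first item is not family-individual: {family_individual!r}")
    # Alleles, if present, must be integers; int() raises ValueError otherwise.
    alleles = [int(a) for a in items[4:6]]
    while len(alleles) < 2:
        alleles.append(0)  # assumption: a missing allele is recorded as 0
    return Genotype(family, individual, marker, alleles[0], alleles[1])


if __name__ == "__main__":
    print(parse_genotyper_line("1331-01   14    B       D11S1385     205    209"))
```

A check like this can be run over a table before formatting, so that rows with non-integer alleles or a malformed family-individual item are caught early.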

    You can use a text-editor to prepare any of the input files. However, when you have many families and/or individuals and marker genotypes, it is easier to use computer programs to format the data for you, and the result won't include transcription errors.


    THE PEDIGREE FILE

    This is the file that contains the familial information of the pedigree(s) you are analyzing. Each line represents one individual, and has the same number of items (5 or 6), separated by spaces:

    • Line Item 1 - Hyphenated representation of the family and individual (e.g., CA-100, where CA is the family ID and 100 is the individual ID number). Individual IDs must be integers. Family IDs are usually integers too; if not, they will be changed to integers later by the program MAKEPED. Each family-individual item must be unique in the pedigree file.
    • Line Item 2 - The ID number of the individual's father
    • Line Item 3 - The ID number of the individual's mother
    • Line Item 4 - The individual's sex, where 1 = male and 2 = female (0= unknown)
    • Line Item 5 - The disease affected status, where 1 = unaffected, 2 = affected, and 0 = unknown
    • Line Item 6 (OPTIONAL) - The liability code number, which is usually age-related and specific to the disease in the study

    In the example shown below, the file name is pCA. The family ID is CA. There is no liability column.

    CA-027   0   0   1   2
    CA-031   0   0   1   1
    CA-032   0   0   2   1
    CA-104   0   0   2   1
    CA-101   027   104   1   1
    CA-102   027   104   1   1
    CA-103   031   032   2   2
    CA-105   031   030   2   2
    CA-100   027   104   2   1
    CA-030   0   0   2   2
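The rules for pedigree-file lines can be checked mechanically. The following Python sketch is not one of the lab's programs; `check_pedigree` is a hypothetical helper that tests the item count, the hyphenated first item, the sex and affected-status codes, uniqueness, and whether each non-zero parent ID appears in the same family. It compares IDs as integers (so 027 and 27 are treated as the same individual), which is an assumption.

```python
def check_pedigree(lines):
    """Return a list of problem descriptions for the lines of a pedigree file."""
    problems = []
    parents = {}         # (family, individual number) -> (father item, mother item)
    item_counts = set()  # every line should have the same number of items
    for n, line in enumerate(lines, start=1):
        items = line.split()
        if not items:
            continue
        if len(items) not in (5, 6):
            problems.append(f"line {n}: expected 5 or 6 items, found {len(items)}")
            continue
        item_counts.add(len(items))
        family, hyphen, individual = items[0].rpartition("-")
        if not hyphen or not family or not individual.isdigit():
            problems.append(f"line {n}: bad family-individual item {items[0]!r}")
            continue
        if items[3] not in ("0", "1", "2"):
            problems.append(f"line {n}: sex must be 0, 1 or 2")
        if items[4] not in ("0", "1", "2"):
            problems.append(f"line {n}: affected status must be 0, 1 or 2")
        key = (family, int(individual))
        if key in parents:
            problems.append(f"line {n}: {items[0]} appears more than once")
        else:
            parents[key] = (items[1], items[2])
    if len(item_counts) > 1:
        problems.append("lines do not all have the same number of items")
    # Second pass: a parent ID of 0 means unknown; any other ID must be listed.
    for (family, individual), pair in parents.items():
        for role, parent in zip(("father", "mother"), pair):
            if not parent.isdigit():
                problems.append(f"{family}-{individual}: {role} ID {parent!r} is not an integer")
            elif int(parent) != 0 and (family, int(parent)) not in parents:
                problems.append(f"{family}-{individual}: {role} {parent} is not in family {family}")
    return problems


if __name__ == "__main__":
    sample = ["CA-027   0   0   1   2", "CA-104   0   0   2   1",
              "CA-101   027   104   1   1"]
    print(check_pedigree(sample))
```

An empty list means none of these checks failed. It does not guarantee that the pedigree is biologically consistent.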

    The pedigree file is one of the input files to gtyper2.pl, which creates the .adb and .pre files. In general, no ABI control lanes are included in the pedigree file, because gtyper2.pl will use each line of the pedigree file in the output .adb file. The allele information comes from the Genotyper table.


    USING GTYPER2.PL

    Gtyper2.pl is a utility program that, given a file with pedigree relationships, converts data from ABI-Genotyper Tables into the pre-Linkage program format. Gtyper2.pl produces the .pre (pre-Linkage) file and a corresponding .name file. The .pre file has relationship information for individual family members and their genotypes, but no marker names are associated with the genotypes (!). The .name file has the marker names. To keep the genotype data associated with marker names all in one file, we developed the gtyper2.pl program to put family/relationship, marker, and genotype data into one additional file. We call this file an ".adb" file (Allele DataBase). The .adb files can then be manipulated to produce the expected input files for the linkage programs.
    The .adb files can be merged and unmerged using adbmerge.pl and adbunmerge.pl. At any time, a .adb file can be converted to the pre-LINKAGE (.pre) file format with the conversion program adb2pre.pl.


    RUNNING GTYPER2.PL

    You need to have two files ready before running gtyper2.pl : the pedigree file and the genotyper table file, described above.

    Gtyper2.pl takes two arguments: the name of the pedigree file and the name of the ABI Genotyper file. The program automatically creates three output files, with their name prefix derived from the input file names. The file name extensions reflect the type of file ('.adb', '.pre' or '.name'). You can change the naming of the output files using options.

    user@humgen% gtyper2.pl pCA tab1

    If you do not include the input files when you invoke gtyper2.pl, you will see the help message, which lists the available options:

    You must provide a pedigree file and an allele file.
    Usage: gtyper2.pl [options] ped_file allele_file
    
    Options: 
      -h --help      Prints this message.
      -v --verbose   Lots of logging.
      -a --adb FILE  Prints the resulting adb file to FILE.
      -p --pre FILE  Prints the resulting pre file to FILE.
      -n --name FILE Prints the resulting name file to FILE.
      -l --log FILE  Logs to file.
     


    OUTPUT FILE FORMATS

    The .adb file has two components: the ordered marker list (one marker per line), and a sorted list of individuals with their specific data. Each individual is listed with line items separated by spaces: familyID-individualID, fatherID, motherID, sex, affected status, followed by allele pairs corresponding to the ordered marker list. The two differences between the second component of .adb files and .pre files are the heading (PEDIGREE-ALLELE-DATA) and the hyphenation of familyID-individualID in the .adb file (the hyphen is replaced by a space in the .pre file).

    .adb File (alleles data base)
    Lists the marker names, family relationships, and genotypes. This file may be used to merge and unmerge sets of markers for the same individuals with the programs adbmerge.pl and adbunmerge.pl. The program adb2pre.pl produces .pre and .name files from .adb file input.
    FILE NAME: pCAtab1.adb
    MARKER LIST:
    D10S677
    D1S1679
    PEDIGREE-ALLELE-DATA:
    CA-027   0   0   1   2   0   0   0   0
    CA-030   0   0   2   2   0   0   0   0
    CA-031   0   0   1   1   0   0   0   0
    CA-032   0   0   2   1   0   0   0   0
    CA-100   027   104   2   1   213   221   160   168
    CA-101   027   104   1   1   197   213   0   0
    CA-102   027   104   1   1   213   213   160   172
    CA-103   031   032   2   2   201   201   156   164
    CA-104   0   0   2   1   0   0   0   0
    CA-105   031   030   2   2   0   0   156   156
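To make the .adb layout concrete, here is a minimal Python sketch that assembles lines in this layout from pedigree rows and genotypes. It is not the gtyper2.pl implementation; `build_adb` and its argument shapes are hypothetical. Ordering the markers and individuals by plain string sort is an assumption (it happens to agree with the example above), as is writing 0 0 for an individual with no genotype for a marker.

```python
def build_adb(file_name, pedigree_rows, genotypes):
    """Return the lines of an .adb-style file.

    pedigree_rows: list of item lists from the pedigree file, for example
                   ["CA-100", "027", "104", "2", "1"].
    genotypes:     dict mapping (family-individual, marker) -> (allele1, allele2).
    """
    # Assumption: markers are listed in plain string order.
    markers = sorted({marker for _person, marker in genotypes})
    lines = [f"FILE NAME: {file_name}", "MARKER LIST:"]
    lines.extend(markers)
    lines.append("PEDIGREE-ALLELE-DATA:")
    # Every pedigree line appears in the output, sorted by family-individual.
    for row in sorted(pedigree_rows, key=lambda r: r[0]):
        items = list(row)
        for marker in markers:
            # Assumption: an untyped marker is written as the allele pair 0 0.
            pair = genotypes.get((row[0], marker), (0, 0))
            items.extend(str(allele) for allele in pair)
        lines.append("   ".join(items))
    return lines


if __name__ == "__main__":
    pedigree = [["CA-100", "027", "104", "2", "1"], ["CA-027", "0", "0", "1", "2"]]
    typed = {("CA-100", "D10S677"): (213, 221), ("CA-100", "D1S1679"): (160, 168)}
    print("\n".join(build_adb("pCAtab1.adb", pedigree, typed)))
```

Because each pedigree line is carried into the output whether or not the individual was genotyped, untyped founders such as CA-027 appear with zeros for every marker.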

    .pre File (in 'pre-MAKEPED', or 'pre-LINKAGE' format)
    Lists family relationships and assigned alleles. This file is an initial input file to the LINKAGE calculation programs.
    The file contains the following information:
    • Column 1: pedigree number
    • Column 2: individual ID number
    • Column 3: father's ID number
    • Column 4: mother's ID number
    • Column 5: sex (1=male, 2=female)
    • Column 6: affected status (1=unaffected, 2=affected)
    • Subsequent Columns: alleles for each marker, ordered as they appear in the .name file
    CA 027   0   0   1   2   0   0   0   0
    CA 030   0   0   2   2   0   0   0   0
    CA 031   0   0   1   1   0   0   0   0
    CA 032   0   0   2   1   0   0   0   0
    CA 100   027   104   2   1   213   221   160   168
    CA 101   027   104   1   1   197   213   0   0
    CA 102   027   104   1   1   213   213   160   172
    CA 103   031   032   2   2   201   201   156   164
    CA 104   0   0   2   1   0   0   0   0
    CA 105   031   030   2   2   0   0   156   156

    .name File
    An ordered marker list for allele data in the .pre file.
    MARKER-LIST for pCAtab1.pre
    D10S677
    D1S1679
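The relationship between the three file types can be sketched in Python as follows. This is an illustration of the layouts shown above, not the code of adb2pre.pl; the function name `adb_to_pre` is hypothetical, and the assumption is that the section headings appear exactly as in the example (MARKER LIST: and PEDIGREE-ALLELE-DATA:).

```python
def adb_to_pre(adb_lines, pre_file_name):
    """Split .adb-style lines into (.pre lines, .name lines)."""
    markers = []
    pre_lines = []
    section = None
    for line in adb_lines:
        text = line.strip()
        if not text or text.startswith("FILE NAME:"):
            continue
        if text == "MARKER LIST:":
            section = "markers"
        elif text == "PEDIGREE-ALLELE-DATA:":
            section = "people"
        elif section == "markers":
            markers.append(text)
        elif section == "people":
            items = text.split()
            # The hyphen in familyID-individualID becomes a space in .pre format.
            family, _hyphen, individual = items[0].rpartition("-")
            pre_lines.append(f"{family} {individual}   " + "   ".join(items[1:]))
    # The .name file keeps the marker order that the allele columns follow.
    name_lines = [f"MARKER-LIST for {pre_file_name}"] + markers
    return pre_lines, name_lines


if __name__ == "__main__":
    adb = [
        "FILE NAME: pCAtab1.adb",
        "MARKER LIST:",
        "D10S677",
        "D1S1679",
        "PEDIGREE-ALLELE-DATA:",
        "CA-100   027   104   2   1   213   221   160   168",
    ]
    pre, name = adb_to_pre(adb, "pCAtab1.pre")
    print("\n".join(pre))
    print("\n".join(name))
```

The marker names leave the genotype file at this step, which is why the .pre and .name files need to be kept together.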


    TROUBLESHOOTING




    Last updated Feb. 3, 2001