STEP-BY-STEP LINKAGE PROGRAMS GUIDE FOR HUMGEN USERS

ERROR CHECKING GENOTYPE DATA FILES

The following instructions are directed toward humgen users in the Wash. Univ. Division of Human Genetics.

Because most linkage analysis programs cannot calculate a LOD score for a marker that shows non-Mendelian inheritance (the LOD score for that marker will be zero or, in the worst case, the program may crash), all genotype data must undergo meiotic error checking before linkage analysis. We use two programs for checking errors: adbcheck.pl and newprep.

Adbcheck.pl was written here by Kevin Krouse for quick checking of *.adb files. It works very well for checking marker data from simple family structures, such as sib pairs and two parents with offspring, and it produces easy-to-read output.

Newprep is part of Mapmaker/Pedmanager, written by Eric Lander's group at MIT. The authors describe it as unfinished and under development (as of June 1997), but no update has been made available at their FTP site. Newprep takes the *.pre file as input. Its error-checking algorithm is more extensive and reliable than that of adbcheck.pl, but it is slower and its output is harder to interpret. Newprep is, however, the program to use for larger family structures.

These step-by-step instructions will guide you through the programs. Each set of instructions is followed by what appears on the user's screen; user input is the text typed after the prompt (user@humgen% or pedmngr:N>).


ADBCHECK.PL

ADBCHECK.PL INPUT

Input to the adbcheck.pl program is an .adb file (.adb files are created by the gtyper2.pl program). Invoke adbcheck.pl with the command adbcheck.pl. If no arguments are given, the usage/help message appears, as in the example below. Note that STDOUT means standard output (which prints to the screen).

user@humgen% adbcheck.pl
Usage: adbcheck.pl [-h] [-s] [-l LOGFILE] [-o OUTFILE] file.adb

Options:
  -h --help             prints this message.
  -s --no-summary       suppresses printing of the summary.
  -l --log, LOGFILE     prints warnings and messages to LOGFILE.
  -o --out, OUTFILE     errors are zeroed and printed to OUTFILE.

Note:
  If the -o flag is not used, the default is to create
  a file called "file.adb.fix" with errors zeroed.

  If the -l flag is not used, the default is to print
  all warnings and messages to STDOUT.
Normal use of the program is to invoke adbcheck.pl with an .adb file and redirect the output to a new file. In the example below, the .adb file is called pBjd19.T.adb and the output file (with the error list) is called err-test. The file pBjd19.T.adb would have been produced by the gtyper2.pl program from an ABI Genotyper table called jd19.T and a pedigree file called pB. After running adbcheck.pl, the user views the file with the less command.
user@humgen% adbcheck.pl pBjd19.T.adb > err-test
user@humgen% less err-test
Entry has non-Mendelian inheritance for marker D19S432.
    child:   14-11: 184, 200
    father:  14-13: 180, 184
    mother:   14-7: 184, 188
Setting 14-11's alleles for D19S432 to 0.

=== SUMMARY ===
Input file: pBjd19.T.adb
Output file: pBjd19.T.adb.fix
Log file: STDOUT
Format Error Count: 0
Mendelian Error Count: 1
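A long error list can be summarized by marker to show which markers to recheck first in the Genotyper table. The sketch below is a hypothetical sh/bash function (tally_adb_errors is our name, not part of adbcheck.pl; csh does not support functions). It assumes every Mendelian error is reported on a line worded exactly like the "Entry has non-Mendelian inheritance for marker ..." line in the sample output above.

```shell
# Hypothetical helper, not part of adbcheck.pl; run from sh or bash.
# Count adbcheck.pl Mendelian errors per marker, most frequent first.
# ASSUMPTION: each error is reported on a line of the form
#   Entry has non-Mendelian inheritance for marker D19S432.
tally_adb_errors() {
    # Keep only the marker name from each error line, then count duplicates.
    sed -n 's/^Entry has non-Mendelian inheritance for marker \(.*\)\.$/\1/p' "$1" |
        sort | uniq -c | sort -rn
}
```

For example, tally_adb_errors err-test prints each marker name preceded by its number of errors.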
Use the output to check the original Genotyper table for errors, correct any that can be corrected, and repeat the process until all errors are corrected. If you wish to use the error-zeroed ('corrected') data directly for linkage analysis, without going through newprep (described below), run the adb2pre.pl program on the *.adb.fix file to create a *.pre linkage file for input to the makeped program, as in the example below.
user@humgen% adbcheck.pl pBjd19.T.adb > err-test
user@humgen% adb2pre.pl pBjd19.T.adb.fix

 files created for pBjd19.T.adb.fix.pre, and pBjd19.T.adb.fix.name
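When several .adb files are checked at once, the check-and-convert cycle can be scripted. The hypothetical sh/bash function below (check_and_convert is our name, not an existing program) runs adbcheck.pl on each file, saves each error list as err-<filename>, and calls adb2pre.pl on the default file.adb.fix output only when the summary reports zero format and zero Mendelian errors. It assumes the summary lines read exactly as in the sample output above.

```shell
# Hypothetical helper (not an existing humgen program); run from sh or bash.
# For each .adb file given: run adbcheck.pl, save its output as err-<file>,
# and convert <file>.fix with adb2pre.pl only if no errors were reported.
# ASSUMPTION: a clean file's summary contains the lines
# "Format Error Count: 0" and "Mendelian Error Count: 0", as in the sample.
check_and_convert() {
    for adb in "$@"; do
        err="err-$(basename "$adb")"
        adbcheck.pl "$adb" > "$err"
        if grep -q '^Format Error Count: 0$' "$err" &&
           grep -q '^Mendelian Error Count: 0$' "$err"; then
            adb2pre.pl "$adb.fix"    # writes <file>.fix.pre and <file>.fix.name
        else
            echo "$adb: errors remain, see $err" >&2
        fi
    done
}
```

Usage: check_and_convert *.adb. Files that still contain errors are listed on the screen so they can be corrected in the Genotyper table and rechecked. Converting only clean files is a choice made here; if you prefer to use the zeroed data, call adb2pre.pl on the .fix file regardless.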


NEWPREP

Newprep identifies meiotic errors in a *.pre file and can produce output with the erroneous genotypes zeroed. It also lets us output marker genotypes in any desired marker order (described later, under 'Final Data Formatting'). Two of its functions, error checking and zeroing errors in files, work well, and we use them routinely.

The help documentation for newprep is available at /usr/local/pedmanager/pedprep.help and describes in detail the extensive error checking newprep performs. To read the help, type at your prompt:

less /usr/local/pedmanager/pedprep.help

Note that the documentation says the Unix command 'pedmngr' invokes the program, but our compiled version is actually invoked with the command newprep (hence the name 'newprep').

Newprep processes data from a '.pre' file (prepared either by hand or with gtyper2.pl). The user interface is interactive, but when there are many files, the newprep commands may instead be written into one or more files and run from the Unix command line. Instructions for both approaches follow.


RUNNING NEWPREP INTERACTIVELY

These step-by-step instructions will guide you through the program. Each step's instructions are followed by what appears on the user's screen; user input is the text typed after the prompt.

1. At the user prompt, type newprep and press (RETURN).
user@humgen% newprep
2. A PEDMANAGER title screen should appear. Wait for the pedmngr:1> prompt before typing a command (it may take a few seconds). At the prompt, type photo followed by the name of the text file in which you wish to save the output, and press (RETURN).

If you do not specify an extension (e.g., .txt), the program automatically gives your file name a .out extension. Next, type load followed by the name of your *.pre file to enter it as input, and press (RETURN). In this example, the photo file is err-pedigree1.pre and the input file is pedigree1.pre.


*****************************************************************
*                                                               *
*     PEDMANAGER - check, preprocess, and draw pedigrees        *
*                       (version 0.9)                           *
*                                                               *
*****************************************************************

Type 'help' or '?' for help.

pedmngr:1> photo err-pedigree1.pre

'photo' is on: file is 'err-pedigree1.pre'

pedmngr:2> load pedigree1.pre
If there are any non-Mendelian inheritance errors, each one is listed in the format shown below.

NOTE: The output shown below is taken from an actual analysis and does not correspond to the sample data file pedigree1.pre.


=============================================================
Non-Mendelian inheritance in pedigree CA at marker 3
can't assign child 129's alleles

parents' IDs: 108 & 018
parents' genotypes:         230 230, and   0   0
parents' forced/(assigned): 230 230, and   0   0

child  024's alleles:   0   0, (forced alleles:   0   0)
child  129's alleles: 234 242
=============================================================
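With many errors, a photo file can be summarized by pedigree and marker position. The hypothetical sh/bash function below (tally_newprep_errors is our name, not part of newprep) assumes each error entry begins with a header line worded like "Non-Mendelian inheritance in pedigree CA at marker 3", as in the sample above; markers are identified by their position in the .pre file, as newprep reports them.

```shell
# Hypothetical helper, not part of newprep; run from sh or bash.
# Count newprep errors per pedigree and marker position, most frequent first.
# ASSUMPTION: each error entry starts with a header line of the form
#   Non-Mendelian inheritance in pedigree CA at marker 3
tally_newprep_errors() {
    # Reduce each header line to "PEDIGREE MARKER", then count duplicates.
    sed -n 's/^Non-Mendelian inheritance in pedigree \([^ ]*\) at marker \([0-9]*\).*$/\1 \2/p' "$1" |
        sort | uniq -c | sort -rn
}
```

For example, tally_newprep_errors err-pedigree1.pre prints one line per pedigree and marker position, giving the error count, the pedigree, and the marker position.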
At the end of the list of errors, there will be a summary showing the total number of errors (non-Mendelian inheritances) and pedigree statistics.

NOTE: The output shown below is taken from an actual analysis and does not correspond to the sample data file pedigree1.pre.


1 pedigree checked -> 27 Mendelian-inheritance errors


             ==========PEDIGREE STATS==========

1 pedigree loaded  --> genotype data for 7 markers

Average pedigree size: 71.0 (71 individuals total)
        Smallest pedigree 71 individuals (CA pedigree)
        Largest pedigree 71 individuals (CA pedigree)

Number of individuals missing genotypes at all markers: 40 (56.3%)
   for the remaining individuals 4.6% of the genotypes are missing
             ==================================


pedmngr:3>
If there are no Mendelian inheritance errors, the output should look something like the example below.

NOTE: The output shown below is taken from an actual analysis and does not correspond to the sample data file pedigree1.pre.


no errors -> 1 pedigree checked and loaded


             ==========PEDIGREE STATS==========

1 pedigree loaded  --> genotype data for 4 markers

Average pedigree size: 71.0 (71 individuals total)
        Smallest pedigree 71 individuals (CA pedigree)
        Largest pedigree 71 individuals (CA pedigree)

Number of individuals missing genotypes at all markers: 57 (80.3%)
   for the remaining individuals 75.0% of the genotypes are missing
             ==================================


pedmngr:3> 
3. You may exit the program now by typing q and pressing (RETURN). The text file containing the error report should now appear in the user's directory. Examine the file with the less command:
pedmngr:3> q
user@humgen% less err-pedigree1.pre
At this point, the error report does not give the marker names, since the program works only on *.pre data (which include no marker names). The report identifies each marker by its position in the .pre file, along with family and individual ID information.

4. To attach the marker names to the output, use the cat command at the user prompt to concatenate the corresponding *.name file (listing the marker names) with the error report file.

Here the user creates a new file, eped1, which is the concatenation of the marker names file (pedigree1.name) and the error report file (err-pedigree1.pre). When eped1 is printed, the marker names appear immediately before the error report.


user@humgen% cat pedigree1.name err-pedigree1.pre > eped1
user@humgen% lpr eped1
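As an alternative to printing the names file ahead of the report, the marker name can be written next to each marker number. The hypothetical sh/bash function label_markers below rests on an assumption you should check before relying on it: that the .name file lists one marker name per line (the first word of the line is used), in the same order as the marker columns of the .pre file, so that "marker N" in the report corresponds to line N of the .name file. Marker numbers with no matching line are left unchanged.

```shell
# Hypothetical helper; run from sh or bash.
# Usage: label_markers NAMEFILE REPORT
# Rewrites each "at marker N" in REPORT as "at marker N (NAME)".
# ASSUMPTION: line N of NAMEFILE holds the name of marker N (first word used).
label_markers() {
    awk 'NR == FNR { name[FNR] = $1; next }      # first file: read the names
         match($0, /at marker [0-9]+/) {
             # "at marker " is 10 characters; the rest of the match is N
             n = substr($0, RSTART + 10, RLENGTH - 10)
             if (n in name)
                 $0 = substr($0, 1, RSTART + RLENGTH - 1) " (" name[n] ")" substr($0, RSTART + RLENGTH)
         }
         { print }' "$1" "$2"
}
```

For example: label_markers pedigree1.name err-pedigree1.pre > eped1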


ZEROING ERRORS AND CREATING THE LINKAGE.PED FILE

--under construction--

Run the corrected .pre file through newprep again, with these differences:

Next Step: MAKEPED


PROCESSING MULTIPLE FILES THROUGH NEWPREP

*.pre files may be processed directly from the Unix command line instead of through the newprep user interface, which can save considerable time when many files must be processed at once. This requires some additional files, described below.

CREATE A NEWPREP RUN-FILE

In the directory where your gtyper2.pl *.pre files are to be processed, prepare (or copy in) one or more newprep run-files.

  • Each line of a run-file is a command you would normally type when using the newprep user interface.
  • The commands should be in the same sequence as with the user interface.
  • Save each run-file under its own filename, ending in the extension '.in' (the naming convention used throughout this guide).
The contents of the example file run1.in (for gel 1) are shown below. In this example, the gtyper2.pl *.pre file that is the input to newprep is called gtyper1.pre, and the output file is named err-gtyper1.pre. Prepare a similar run-file for each *.pre file you will be error-checking. Because in this example the files will not be processed for linkage analysis until error checking is complete, there is no need to zero errors. You may write more elaborate run-files for zeroing errors and creating linkage.ped files by following the guidelines above.

photo err-gtyper1.pre
load gtyper1.pre
q

To call newprep to run this example file, the user would type at the prompt:
newprep < run1.in

The '<' tells the shell to feed the contents of the run-file (here run1.in) to newprep as its input. Output from PEDMANAGER is sent to the screen AND written to the photo file specified in your run-file (in the example above, err-gtyper1.pre).

user@humgen% newprep < run1.in 
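Rather than typing a run-file for every gel, the run-files can be generated in a loop. The hypothetical sh/bash function below (make_runfiles is our name) writes run1.in through runN.in, each containing the same three commands as run1.in above; it assumes your input files follow the gtyper<i>.pre naming used in this example.

```shell
# Hypothetical helper; run from sh or bash in the directory holding the .pre files.
# Write run1.in ... runN.in, each with the same three newprep commands as
# run1.in above. ASSUMPTION: input files are named gtyper1.pre, gtyper2.pre, ...
make_runfiles() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        # photo file, input file, and quit: one newprep command per line
        printf 'photo err-gtyper%d.pre\nload gtyper%d.pre\nq\n' "$i" "$i" > "run$i.in"
        i=$((i + 1))
    done
}
```

For example, make_runfiles 10 creates run1.in through run10.in; adjust the printf line if your files are named differently.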
RUNNING MORE THAN ONE RUN-FILE AT A TIME

We often have files from multiple gels to process at the same time, but we wish to keep the output separate for error-checking purposes. Running multiple run-files through newprep takes a little more Unix work. Create another file listing the Unix commands that run each of your run-files (one per line), and make that file executable. Running that file from the prompt then causes the system to execute it, calling newprep once for each run-file you specified.

Here are the contents of an example file, 10runs, which will call newprep 10 times with a different run-file each time:



newprep < run1.in 
newprep < run2.in 
newprep < run3.in 
newprep < run4.in 
newprep < run5.in 
newprep < run6.in 
newprep < run7.in 
newprep < run8.in 
newprep < run9.in 
newprep < run10.in 
After the file 10runs is created, the next step is to make it executable. To do this, type chmod u+x filename at your prompt. This adds execute permission for you (the file's owner) to the read and write permissions a newly created file normally has. See man chmod for more information about mode changes. The example file is then executed with the command ./10runs (the ./ can be omitted if the current directory is in your PATH).
user@humgen% chmod u+x 10runs
user@humgen% ./10runs
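The 10runs file can likewise be generated rather than typed. This hypothetical sh/bash function (make_batch is our name) writes one newprep line per run-file, adds a #!/bin/sh first line so that the file is run by the Bourne shell regardless of your login shell, and makes the file executable with chmod u+x.

```shell
# Hypothetical helper; run from sh or bash.
# Usage: make_batch SCRIPTNAME N
# Writes an executable script calling "newprep < runI.in" for I = 1..N.
make_batch() {
    script=$1
    n=$2
    echo '#!/bin/sh' > "$script"          # run with sh whatever the login shell
    i=1
    while [ "$i" -le "$n" ]; do
        echo "newprep < run$i.in" >> "$script"
        i=$((i + 1))
    done
    chmod u+x "$script"                   # same mode change as above
}
```

For example, make_batch 10runs 10 followed by ./10runs reproduces the 10runs example.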
10runs calls newprep once per run-file. Although not shown here, the photo command in each run-file names a different output file, so the single command produces 10 separate error reports.

To see that the error reports have been created, give the command to list all filenames starting with 'err': ls err*

user@humgen% ls err*
err-gtyper1.pre
err-gtyper2.pre
err-gtyper3.pre
err-gtyper4.pre
err-gtyper5.pre
err-gtyper6.pre
err-gtyper7.pre
err-gtyper8.pre
err-gtyper9.pre
err-gtyper10.pre
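To see at a glance which gels have errors, the summary line newprep writes at the end of each check (for example, "1 pedigree checked -> 27 Mendelian-inheritance errors" or "no errors -> 1 pedigree checked and loaded") can be pulled from each report. The hypothetical sh/bash function below (error_summary is our name) assumes that wording; reports without such a line are flagged.

```shell
# Hypothetical helper; run from sh or bash.
# Print each newprep photo file name with its final summary line.
# ASSUMPTION: the summary reads "... -> N Mendelian-inheritance errors"
# or "no errors -> ...", as in the sample output.
error_summary() {
    for f in "$@"; do
        line=$(grep -E 'Mendelian-inheritance error|^no errors ->' "$f" | tail -n 1)
        printf '%s: %s\n' "$f" "${line:-no summary line found}"
    done
}
```

For example: error_summary err-gtyper*.pre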
These error reports may be combined into one file using the cat command. In this example, the resulting concatenated file is called allerr, which is then printed to the user's default printer using the lpr command.
user@humgen% cat err* > allerr
user@humgen% lpr allerr
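The shell expands err* in alphabetical rather than numeric order, so err-gtyper10.pre is placed ahead of err-gtyper2.pre in allerr. The hypothetical sh/bash function below (combine_reports is our name) concatenates the reports in numeric order instead and puts a header line naming each source file before its contents; it assumes the err-gtyper<i>.pre naming used above.

```shell
# Hypothetical helper; run from sh or bash.
# Usage: combine_reports N > allerr
# Concatenate err-gtyper1.pre ... err-gtyperN.pre in numeric order, each
# preceded by a header naming the file; missing files are reported and skipped.
combine_reports() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        f="err-gtyper$i.pre"
        if [ -f "$f" ]; then
            echo "===== $f ====="
            cat "$f"
        else
            echo "combine_reports: $f not found" >&2
        fi
        i=$((i + 1))
    done
}
```

For example, combine_reports 10 > allerr followed by lpr allerr prints the ten reports in gel order.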


Use the error report to correct the meiotic errors in the Genotyper table. If a genotype appears to be correctly called, leave it in the table as a record of your call; the apparent error may be due to mutation at the marker locus or a sample mix-up. In either case, the anomalous genotype should be zeroed by newprep before running the linkage analyses.

Next Step: MAKEPED


Last updated Feb. 3, 2001