GRFT – genetic records family tree web applet
Department of Biology, Stanford University, Stanford, CA, USA
Researchers whose model organism reproduces sexually must keep track of each cross performed in order to trace lineages, determine the genetic make-up of specific individuals, and track the progeny of an individual. The most common choices for keeping these records are Microsoft Excel and general-purpose database applications such as MS Access or FileMaker Pro, because of their wide availability, familiarity, and, in the case of Excel, overlap with data gathering and analysis features. Although these general-use tools allow searches to find crosses of interest, determining the lineage of an individual requires multiple lookups, patience, and often a sheet of paper to outline a clear graphical pedigree or “family tree.” In these situations, a computer-generated family tree display for an individual would be extremely useful, particularly if combined with the ability to redraw the tree using any individual within the currently displayed tree. Some software packages to display family trees already exist, including Pedigraph, PediTree, and CraneFoot. However, these tools generate static pictures of family trees rather than providing an interactive interface for exploring them, and they generally require local installation (Van Berloo and Hutten, 2005; Mäkinen, 2006; Garbe and Da, 2008).
Materials and Methods
The data file should be in tab-delimited text format, and each row should contain values for the five built-in columns in order (name, male parent, female parent, description, and owner) followed by values for any custom columns. It should not contain column headings. If an individual is missing data for certain columns, those values can generally be omitted without problems, although this may limit how the data can be displayed. For example, the male and female parent names need to be included, and each must match an individual name from a prior generation, in order to generate an accurate and complete family tree. The records should be organized sequentially with the oldest individuals at the beginning and the newest at the end so that the list function displays them in the correct order.
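As an illustration of this file layout, the following is a minimal sketch of how such a file could be read into record objects. It is not the code in grft.js: the function name parseRecords, the property names, the custom column "line", and the owner labels in the sample rows are all assumptions made for this example.

```javascript
// Built-in columns, in the order the text describes them.
// The property names themselves are illustrative, not taken from grft.js.
const BUILT_IN_COLUMNS = ["name", "maleParent", "femaleParent", "description", "owner"];

// Parse tab-delimited text (no header row) into an array of record objects.
// customColumns is a hypothetical list of names for any extra columns.
function parseRecords(text, customColumns = []) {
  const columns = BUILT_IN_COLUMNS.concat(customColumns);
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "") // skip blank lines
    .map((line) => {
      const values = line.split("\t");
      const record = {};
      columns.forEach((column, i) => {
        // Values missing at the end of a row become empty strings.
        record[column] = (values[i] || "").trim();
      });
      return record;
    });
}

// Made-up sample rows, oldest individual first.
const sampleText = [
  "X82\t\t\tacquired line\towner1",
  "Y10\tX82\tX82\tself-cross of X82\towner2",
  "YY43\tY10\tY51\tcross of Y51 by Y10\towner2\tB73",
].join("\n");

const records = parseRecords(sampleText, ["line"]);
console.log(records.length);          // 3
console.log(records[2].femaleParent); // Y51
```

Because the rows are kept in file order, an array like this already satisfies the oldest-first ordering that the list function relies on.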
Further customization is also possible. In the method getRecordList near the end of the script all the columns in the data file (male parent, female parent, etc.) are defined and named. These built-in columns should not be removed, but a user can add additional lines to define custom columns. A comment in the script at this location gives more detailed instructions.
The input file of genetic records data contains a unique name or identifier for each individual as well as its male and female parents. In addition, it normally includes a short description of the individual and the name of the person who conducted the cross. The tool loads the file into memory on startup and displays a message indicating the number of records read. At this point the user can access the data using any of the three display methods: family tree, list, or search.
Feature: Family Tree Display
The family tree feature requires two parameters – the query individual (i.e., generation 0) and the number of historical generations to include. Although the screen size can be limiting, up to 4 parental generations can normally be displayed with no overlapping branches. A single row at the bottom shows the individuals for which the query individual was the male or female parent (Figure 1).
Figure 1. Family tree display. A family tree of individual YY43 showing four parental generations and one child generation of two offspring. Each box represents an individual used in or resulting from a cross. Individuals used as a maternal parent are colored pink, paternal parents are colored blue, and self-crossed individuals are orange. Individuals without a record in the database are yellow (e.g., an acquired individual). The child generation colors indicate whether individual YY43 was a female parent, male parent, or self-crossed.
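The two parameters of the family tree display can be pictured with a short sketch that gathers the ancestors of a query individual one generation at a time, together with its direct progeny. This is an assumed reconstruction for illustration only; the function names, the record layout, the assignment of V112 and W23 to female and male roles, and the child ZZ1 are hypothetical and do not come from grft.js.

```javascript
// Illustrative records in file order (oldest first). Parentage details that
// the text does not state are made up for this example.
const records = [
  { name: "V112", maleParent: "", femaleParent: "" },
  { name: "W23", maleParent: "", femaleParent: "" },
  { name: "X82", maleParent: "", femaleParent: "" },
  { name: "Y51", maleParent: "W23", femaleParent: "V112" },
  { name: "Y10", maleParent: "X82", femaleParent: "X82" }, // self-cross
  { name: "YY43", maleParent: "Y10", femaleParent: "Y51" },
  { name: "ZZ1", maleParent: "YY43", femaleParent: "Y51" }, // hypothetical child
];
const byName = new Map(records.map((r) => [r.name, r]));

// Return an array in which element g lists the names in parental generation g
// (element 0 holds the query individual itself).
function ancestorsByGeneration(queryName, generations) {
  const levels = [[queryName]];
  for (let g = 1; g <= generations; g++) {
    const next = [];
    for (const name of levels[g - 1]) {
      const record = byName.get(name);
      if (!record) continue; // no record, e.g. an acquired individual
      // A self-crossed parent is listed once rather than twice.
      const parents = new Set([record.femaleParent, record.maleParent]);
      for (const parent of parents) {
        if (parent) next.push(parent);
      }
    }
    if (next.length === 0) break; // nothing further back is recorded
    levels.push(next);
  }
  return levels;
}

// Individuals for which the query individual was the male or female parent.
function childrenOf(queryName) {
  return records
    .filter((r) => r.maleParent === queryName || r.femaleParent === queryName)
    .map((r) => r.name);
}

console.log(ancestorsByGeneration("YY43", 4));
// [ [ 'YY43' ], [ 'Y51', 'Y10' ], [ 'V112', 'W23', 'X82' ] ]
console.log(childrenOf("YY43")); // [ 'ZZ1' ]
```

Redrawing the tree for a clicked individual then amounts to calling the same two functions with a different query name.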
Individuals acting as female parents in the pedigree are colored pink, individuals acting as male parents are colored blue, and individuals that have been self-crossed are colored orange. The query individual is white. A yellow individual indicates a parent whose information is not in the data file (e.g., an acquired line).
For the progeny of the query individual, color reflects the parental role of the query individual in the cross. For example, a pink child indicates that the query individual acted as the female parent (see Figure 1).
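The coloring rules can be summarized in a small sketch. The function names, the record layout, and the names ACQ1, ACQ2, and ZZ1 are hypothetical, and the choice to let yellow (no record) take precedence over the role colors is an assumption rather than documented behavior of the applet.

```javascript
// Illustrative records; ACQ1 and ACQ2 deliberately have no record of their own.
const records = [
  { name: "X82", maleParent: "ACQ2", femaleParent: "ACQ1" },
  { name: "Y10", maleParent: "X82", femaleParent: "X82" }, // self-cross of X82
  { name: "Y51", maleParent: "", femaleParent: "" },
  { name: "YY43", maleParent: "Y10", femaleParent: "Y51" },
  { name: "ZZ1", maleParent: "Y10", femaleParent: "YY43" }, // hypothetical child
];
const byName = new Map(records.map((r) => [r.name, r]));

// Color for the role that parentName played in the cross producing child.
function roleColor(parentName, child) {
  const asFemale = child.femaleParent === parentName;
  const asMale = child.maleParent === parentName;
  if (asFemale && asMale) return "orange"; // self-cross
  if (asFemale) return "pink";
  if (asMale) return "blue";
  throw new Error(parentName + " is not a parent of " + child.name);
}

// Ancestor box: yellow if the parent has no record, otherwise its role color.
function ancestorColor(parentName, child) {
  if (!byName.has(parentName)) return "yellow";
  return roleColor(parentName, child);
}

// Progeny box: colored by the role the query individual played in the cross.
function progenyColor(queryName, child) {
  return roleColor(queryName, child);
}

console.log(ancestorColor("Y51", byName.get("YY43"))); // pink
console.log(ancestorColor("X82", byName.get("Y10")));  // orange
console.log(ancestorColor("ACQ1", byName.get("X82"))); // yellow
console.log(progenyColor("YY43", byName.get("ZZ1")));  // pink
```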
When the user clicks on an individual in the tree, the tree is immediately redrawn with that individual as the query individual (generation 0). As shown in Figure 2, hovering the mouse pointer over an individual on the tree diagram will open a pop-up window containing more information on the individual, including the description if present.
Figure 2. Detail of pop-up box feature of the family tree display. A pop-up box with a gray background is shown for individual Y51, the maternal parent of YY43. Mousing over an individual generates a pop-up.
Each individual in the tree is shown below in a table containing all the genetic information supplied in the data file (see Figure 3). The table is organized by order of increasing parental generation with the list of any F1 progeny at the end. Rows in the table can be selected and copy/pasted into a document editing or spreadsheet application.
Figure 3. Color-coded table provided with the family tree display. Table of all individuals charted in the family tree for individual YY43. Row color coding matches the boxes in the family tree drawing.
The code in the first column, labeled “Lvl,” indicates the parental generation with a “P” followed by the generation number. In each row, the columns following “Lvl” are color-coded the same as in the tree, so the color indicates the male and female parent for an individual in the next (more recent) generation. In contrast, the color of the “Lvl” column shows whether that individual is from the maternal or paternal side of the tree with respect to the query individual. For example, Figure 3 shows three rows with a “Lvl” code of “P2.” The first two such rows, for individuals V112 and W23, have the “Lvl” column colored pink to indicate they are from the maternal side of the query individual’s family tree (in this case the parents of the query individual’s mother). The third row, for individual X82, is colored blue in the “Lvl” column to indicate it is from the paternal side of the query individual’s family tree and the rest of the columns are colored orange to indicate this individual was used in a self-cross to generate individual Y10 (the query individual’s father) in the P1 generation.
To avoid printing duplicate sections of a tree, individuals that would otherwise appear in more than one place in the tree are replaced with a code of one or more asterisks. Figure 4 shows an example of a duplicated branch representing the parents of individual SS66. This branch is printed once on the paternal side of the tree and marked there with a single asterisk, with the subsequent duplicate branch on the maternal side replaced by the single asterisk. Each distinct duplicated branch is marked with a different number of asterisks as needed (for example the distinct lineages of individuals SS64 and QH63 in Figure 4 are marked by sets of two and three asterisks respectively).
Figure 4. Family tree containing duplicate record markings. Family tree for individual UU94 showing duplicate marking “*” on individual SS66 because it is used both as the paternal grandmother and a maternal great-grandmother. Parental information for SS66 is therefore only shown once.
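One way to obtain this kind of duplicate marking is a depth-first walk that remembers which individuals have already had their parents drawn. The sketch below is an assumption about how this could be done, not the applet's actual routine: the function name markDuplicates, the paternal-first traversal order, and every individual other than UU94 and SS66 are hypothetical.

```javascript
// Illustrative pedigree in which SS66 is reached twice from UU94.
// RR1 and RR2 have no records of their own.
const records = [
  { name: "SS66", maleParent: "RR1", femaleParent: "RR2" },
  { name: "TT1", maleParent: "", femaleParent: "SS66" },
  { name: "SS20", maleParent: "", femaleParent: "SS66" },
  { name: "TT2", maleParent: "", femaleParent: "SS20" },
  { name: "UU94", maleParent: "TT1", femaleParent: "TT2" },
];
const byName = new Map(records.map((r) => [r.name, r]));

// Walk the pedigree depth-first (paternal side first, an assumption) and
// return one row per box. A branch is expanded only at its first occurrence;
// every occurrence of a repeated individual carries the same asterisk marker.
function markDuplicates(queryName, generations) {
  const markers = new Map();  // name -> "*", "**", ...
  const expanded = new Set(); // names whose parents were already drawn
  const rows = [];

  function visit(name, level) {
    if (expanded.has(name)) {
      // Repeat occurrence: give this branch the next unused marker.
      if (!markers.has(name)) markers.set(name, "*".repeat(markers.size + 1));
      rows.push({ name, level, expandedHere: false });
      return;
    }
    rows.push({ name, level, expandedHere: true });
    const record = byName.get(name);
    const parents = record
      ? [record.maleParent, record.femaleParent].filter(Boolean)
      : [];
    if (parents.length === 0 || level === generations) return;
    expanded.add(name);
    // A Set keeps a self-crossed parent from being visited twice.
    for (const parent of new Set(parents)) visit(parent, level + 1);
  }

  visit(queryName, 0);
  return rows.map((row) => ({ ...row, marker: markers.get(row.name) || "" }));
}

const rows = markDuplicates("UU94", 4);
console.log(rows.filter((row) => row.name === "SS66"));
// [ { name: 'SS66', level: 2, expandedHere: true, marker: '*' },
//   { name: 'SS66', level: 3, expandedHere: false, marker: '*' } ]
```

Markers are assigned only after the whole walk, so the first, fully drawn occurrence receives the same asterisk code as the later occurrences that replace the branch.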
Feature: Sequential List of Individuals
Selecting the “list” option will generate a list of individuals based on up to three user parameters. Figure 5 shows a list example with the standard input file fields. The first parameter is the name of an individual in the data file and provides a starting point for the list. The second parameter is the number of individuals from that generation to include in the list. Individuals will be listed in the same order as they were recorded in the input file. The third, optional parameter allows the user to choose an “owner” (person who has conducted crosses) from a drop-down menu and filter the results so that only the crosses conducted by that individual are displayed.
Figure 5. Sequential list display. Table showing 12 individuals starting from individual Z10 and listed in the order they are found in the data file. Table columns are customizable.
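The list parameters map naturally onto a slice of the in-memory records. The following sketch assumes the records are held in an array in file order; the function name listFrom, the owner labels, and the decision to apply the owner filter before counting are assumptions for illustration.

```javascript
// Illustrative records in file order; owners "A" and "B" are made up.
const records = [
  { name: "Z10", owner: "A" },
  { name: "Z11", owner: "B" },
  { name: "Z12", owner: "A" },
  { name: "Z13", owner: "A" },
];

// Return up to count records starting at startName, in file order.
// An empty owner means "no owner filter".
function listFrom(allRecords, startName, count, owner = "") {
  const start = allRecords.findIndex((r) => r.name === startName);
  if (start === -1) return []; // unknown starting individual
  return allRecords
    .slice(start)
    .filter((r) => owner === "" || r.owner === owner)
    .slice(0, count);
}

console.log(listFrom(records, "Z11", 2).map((r) => r.name));      // [ 'Z11', 'Z12' ]
console.log(listFrom(records, "Z10", 2, "A").map((r) => r.name)); // [ 'Z10', 'Z12' ]
```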
Feature: List of Individuals Based on a Keyword Search
The search feature examines all fields in the data for one or more given search terms as illustrated in Figure 6. It is case sensitive and will return results in which the search term is a fragment of a longer word. As with the list, the results can be filtered by owner. Multiple keywords are separated by spaces and the search can either return records with all of the keywords (using the AND option) or with any of the keywords (using the OR option).
Figure 6. Search display. Table showing search results of 15 records having either “mac1” or “am1” or “afd1” keywords. Table columns are customizable.
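The search behavior described above (case-sensitive matching of word fragments, AND/OR combination, optional owner filter) can be sketched as follows. The function name searchRecords, the record contents, and the rule that each keyword may match in any field are assumptions; only the keywords mac1, am1, and afd1 come from the figure.

```javascript
// Illustrative records; names, descriptions, and owners are made up.
const records = [
  { name: "A1", description: "mac1 mutant", owner: "o1" },
  { name: "A2", description: "am1 afd1 double", owner: "o2" },
  { name: "A3", description: "Mac1 wild type", owner: "o1" },
];

// Case-sensitive substring search over every field of every record.
// mode is "AND" (all keywords required) or "OR" (any keyword suffices).
function searchRecords(allRecords, query, mode = "AND", owner = "") {
  const keywords = query.split(/\s+/).filter((k) => k !== "");
  if (keywords.length === 0) return [];
  return allRecords.filter((record) => {
    if (owner !== "" && record.owner !== owner) return false;
    const fields = Object.values(record).map(String);
    const found = (keyword) => fields.some((field) => field.includes(keyword));
    return mode === "OR" ? keywords.some(found) : keywords.every(found);
  });
}

console.log(searchRecords(records, "mac1 am1", "OR").map((r) => r.name));  // [ 'A1', 'A2' ]
console.log(searchRecords(records, "am1 afd1", "AND").map((r) => r.name)); // [ 'A2' ]
console.log(searchRecords(records, "mac", "AND").map((r) => r.name));      // [ 'A1' ]
```

The last call shows both properties noted in the text: the fragment "mac" matches "mac1", while "Mac1" is not matched because the comparison is case sensitive.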
A major advantage of the GRFT tool is its speed. Even with over 15,000 founding individuals in our data file (representing hundreds of thousands of crosses), the individual lists, family trees, and searches can be called up almost instantaneously once the input file has been read into memory when the page first loads (a process which usually requires several seconds, depending on the connection speed). This speed is possible because the full set of records resides in memory and the graphics are implemented entirely with HTML and CSS. Combined with the feature that allows users to redraw the tree by clicking on individuals within it, GRFT’s speed allows researchers to explore and interact with lineages in an unprecedented way. Other software packages for pedigree visualization, which generate static images and require more complicated commands to create them, do not support this level of interactivity.
The applet can be used with any sexually reproducing organism. All that it requires is parental information provided in a tab-delimited text file, which is easily generated using Microsoft Excel or any database or scripting program. With a slight modification to the script (described in Materials and Methods), users can include additional columns with useful information such as the inbred or hybrid line, the cytoplasmic designation of the maternal parent, or genotype information.
One limitation of the GRFT tool is its lack of support for modifying genetic records. Currently any changes to the genetic records must be entered in the original application containing the records. Afterward, the complete set of records must be re-exported to the web server. This makes updating the data file cumbersome. In future releases of the applet, however, features could be extended to allow updates from users.
A sample web tool is available at http://stanford.edu/~walbot/grft-sample.html.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at http://www.frontiersin.org/plant_genetics_and_genomics/10.3389/fgene.2011.00014/abstract
File S1. grft.js – main source file of the GRFT web applet.
File S2. genetic_records_sample.txt – sample input file of tab-delimited genetic records.
File S3. grft-sample.html – source file of the host webpage needed to run the GRFT applet.
File S4. grft.css – style sheet controlling the format and appearance of text and images in the GRFT applet.
Additional libraries used:
attributes.js and attributes.htc – helper script designed to resolve an inconsistency in the way different browsers set style attributes. Written by Paul Sowden. Available at http://delete.me.uk/2004/09/ieproto.html (paste the sections of code written out in the blog post into files named attributes.js and attributes.htc respectively).
Keywords: pedigree, family tree, records visualization
Citation: Pimentel S, Walbot V and Fernandes J (2011) GRFT – genetic records family tree web applet. Front. Gene. 2:14. doi:10.3389/fgene.2011.00014
Received: 18 February 2011; Paper pending published: 23 February 2011; Accepted: 03 March 2011; Published online: 25 March 2011.
Edited by: Shawn Kaeppler, University of Wisconsin-Madison, USA
Reviewed by: Shawn Kaeppler, University of Wisconsin-Madison, USA; Karen McGinnis, Florida State University, USA
Copyright: © 2011 Pimentel, Walbot and Fernandes. This is an open-access article subject to an exclusive license agreement between the authors and Frontiers Media SA, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
*Correspondence: Samuel Pimentel, Department of Biology, Stanford University, Stanford, CA 94305-5020, USA.