The wide use of automated equipment makes it possible to generate sample profiles automatically and store them in files. Suppose, for example, that you need to compare the genotypes of an evidentiary sample and a suspect, and these genotypes are stored in an Excel table. Entering this data manually is time-consuming and a source of possible errors. Grape provides import tools that significantly reduce this routine work and minimize the chance of entering data incorrectly.

The import procedure includes several steps. First, save your original file in text format; as a rule, spreadsheet programs (such as Excel) allow you to do this. The resulting text file is the source for Grape. Clearly, the format of this source file can be almost arbitrary. This is the main problem: a computer program cannot import data if the data format is not rigorously specified.

We introduce a methodology that lets you import data from virtually any such file. For this purpose, you first need to create an import template. An import template is a file (usually with the extension .imp) that specifies, roughly speaking, where Grape should look for particular pieces of data (allele notations, genotypes, etc.). Once the template is created, Grape imports data in accordance with it. So generally you need to create an import template only once, provided that you always import data in the same format.

The import functionality can be used in two different ways that share the same user interface.

(a) Importing onto Desktop. For a Simple Case or a Typical Rape Case, it is possible to create the complete case by choosing "Import Data" in the "File" menu. Grape will automatically load the genotypes of the Evidentiary Sample, Suspect and Victim (if applicable).

(b) Importing into Database. In more complex cases, with more persons involved, Grape uses another, more universal approach. Data from the file is imported into a database of your choice using the "Import Records" button in the Samples Database dialog. You can create your case in the regular manner and, when you need to enter the genotypes of the corresponding person in the Set Genotypes dialog, use the "Load from Database" button rather than entering the genotypes manually.

Below we explain in detail how to create an import template for both situations (a) and (b). For now, let us assume that you have already done so and want to import data. Choose "File", "Import Data" in the menu for (a), or click the "Import Records" button for (b), and you will get a dialog similar to the following one.



Browse to the data file you want to import. At the bottom of the dialog there is a box listing all available import templates. Choose the one that corresponds to your data file and press the "Open" button. Grape imports the data automatically.

If you don't see the import template you need in the list, it is probably located in a different folder. In this case, check the "Add new templates" box and press the "Open" button to browse. Do the same if you want to create a new template. The following dialog appears.



Now you can browse to find the import template you need. Then press "Open" to return to the previous dialog. The file you have found will be the first in the list. Grape remembers the location where you found the template and next time you import data it will list all template files from that location automatically.

If you want to make changes to a template file, select the file and press the "Edit/New" button. Do the same if you want to create a new template file, but in this case enter a new file name. You will see a dialog that lets you edit the template (you can also do this in any text editor).




Edit the file in this window exactly in the same manner as in any text editor (like Microsoft Notepad) and press "OK" when finished. You will return to the previous dialog.

Now let us explain in detail the structure of an import template file and how a new template can be created. An import template can contain the following keywords, which differ for (a) and (b).


For (a) the list of keywords includes:

DOCUMENT_NAME   The name that appears as the title of the Grape document after the data is imported.
LOCUS_NAME   The notation of a locus.
SUBPOPULATION   The notation of a subpopulation for the current locus.
EVIDENCE_ALLELE   The notation of the evidentiary sample's allele.
SUSPECT_ALLELE   The notation of the suspect's allele.
VICTIM_ALLELE   The notation of the victim's allele.
THETA   Coancestry coefficient.
VOID   Any information that is irrelevant and should be ignored by Grape.

For (b) the list includes:

DOCUMENT_NAME   The name that appears as the title of the Grape document after the data is imported.
LOCUS_NAME   The notation of a locus.
SUBPOPULATION   The notation of a subpopulation for the current locus.
SAMPLE_NAME   The name of the sample to be imported.
REFERENCE_NUMBER   The reference number of the sample to be imported.
NOTES   Any additional information relevant to the sample to be imported.
FOLDER   The name of the folder the sample will be imported into.
ALLELE_NAME   The notation of an allele of the sample to be imported (for the given locus).
THETA   Coancestry coefficient.
VOID   Any information that is irrelevant and should be ignored by Grape.

A template can also contain arbitrary delimiters between keywords and the special symbols {, } and ... (three dots). Their use is explained below.

Now we give several examples to clarify usage of templates: first for (a) and then for (b).

Importing onto Grape's desktop.

Example 1(a). Let your data file be the following:

D21S11
27 -> 28

The first line is the locus notation; the second line contains the alleles of the victim, separated by a Tab. Here and below we use the symbol -> to designate a Tab. The template for such a file is the following:

LOCUS_NAME
VICTIM_ALLELE -> VICTIM_ALLELE

Grape takes the first line in the data file and treats it as a locus name, because the first line in the template is LOCUS_NAME. Note that all information below it will be treated as information about this particular locus, D21S11. If you need to change the locus, the keyword LOCUS_NAME should appear in the template once more, and the corresponding place in the data file should give the new locus name. Clearly, keywords like VICTIM_ALLELE should appear after the keyword LOCUS_NAME; otherwise Grape would not know which locus they refer to.

After reading the string D21S11, Grape compares this name against the set of all available loci (listed in the file loci.loc). An error message appears if the name cannot be found. Make sure that the locus notations in your data file are exactly the same as in the Available Loci library.

The Tab (i.e. ->) in the second line is the delimiter between keywords. You can use not only a Tab but virtually any combination of symbols as a delimiter. Grape finds this delimiter and treats the data before it (i.e. 27) as an allele of the victim. The rest of the line after the delimiter (i.e. 28) is treated as the notation of the victim's second allele.

It is important to note that this template is invalid for a homozygote. If the data file looks like

D21S11
27

then an error occurs. Grape tries to find the Tab delimiter in the second line, as the template requires. But now there is no Tab, and you will get an error message.
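
The failure can be illustrated with a minimal Python sketch. It is not Grape's actual code: the function name read_two_alleles is hypothetical, and the sketch only assumes that a fixed pattern of the form ALLELE, delimiter, ALLELE requires the delimiter to be present.

```python
def read_two_alleles(line: str, delimiter: str = "\t") -> tuple[str, str]:
    """Read one data line against the fixed pattern ALLELE <delimiter> ALLELE."""
    first, found, second = line.partition(delimiter)
    if not found:
        # The pattern demands the delimiter, so a homozygote line fails here.
        raise ValueError(f"delimiter {delimiter!r} not found in {line!r}")
    return first, second
```

Under these assumptions a heterozygote line such as 27 -> 28 is read as two alleles, while the homozygote line 27 raises an error.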

There is a special tool in Grape to handle such situations. Write the template as follows:

LOCUS_NAME
{VICTIM_ALLELE -> }...

When Grape meets a string of the form {EXPRESSION}... in the template, it reads the data according to the EXPRESSION format again and again for as long as the data string matches EXPRESSION. This is the same as writing EXPRESSION many times in the template. The biggest advantage is that you do not need to specify how many times: Grape detects it automatically. In the given example, Grape takes 27 as the victim's first allele and then examines the rest of the data string. If no other information is present (as for a homozygote), the reading is complete. Otherwise, Grape skips the Tab delimiter and takes another piece of data (such as 28 in the heterozygote case). This reading cycle continues as many times as needed. It is especially convenient for mixed stains, because the evidentiary sample can contain many different alleles.

Below we give more complex examples of using the { }... construction. For now let us note one technical difficulty associated with it. Grape interprets

{VICTIM_ALLELE -> }...

as

VICTIM_ALLELE ->

for a homozygote and

VICTIM_ALLELE -> VICTIM_ALLELE ->

for a heterozygote. Strictly speaking, neither data string, 27 or 27 -> 28, matches these templates: the Tab delimiter is absent at the end of both strings. For an exact correspondence between data and template, the data strings would have to be 27 -> and 27 -> 28 ->. But Grape is smart enough to understand what is going on and reads the data correctly.
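
The repeated reading, including the tolerance for a missing final delimiter, can be sketched in Python as follows. This is only an illustration of the behaviour described above under simple assumptions, not Grape's implementation; the name read_repeated is hypothetical.

```python
def read_repeated(line: str, delimiter: str = "\t") -> list[str]:
    """Apply the pattern {ALLELE <delimiter>}... to one line of data."""
    alleles = []
    rest = line
    while rest:
        # Take the piece before the next delimiter; if no delimiter is left,
        # the whole remainder is the last item and the loop ends.
        item, _, rest = rest.partition(delimiter)
        if item:
            alleles.append(item)
    return alleles
```

The same function handles 27, 27 -> 28 and 27 -> 28 -> without being told in advance how many alleles to expect.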

Example 2(a). Suppose you have the following data file:

******************************************
******* Here are some comments ***********
******************************************
D21S11
27

D16S539
12 14

The first 3 lines here contain information that is not needed. Then the locus name D21S11 and the victim's genotype follow. The same block of information is repeated for the locus D16S539. The delimiters between allele notations are now just spaces, not Tabs as in Example 1(a).

The simplest import template that corresponds to this data file is the following:

VOID
VOID
VOID
LOCUS_NAME
{VICTIM_ALLELE }...
VOID
LOCUS_NAME
{VICTIM_ALLELE }...
VOID

We put the keyword VOID any time we want the information to be ignored by Grape. Note that we put VOID even for blank lines.

This template is good only for two loci. We have a situation very similar to the Example 1(a). The solution is almost the same: a universal template that is good for any number of loci has the following form:

VOID
VOID
VOID
{
LOCUS_NAME
{VICTIM_ALLELE }...
VOID
}
...

Grape will repeat the block

LOCUS_NAME
{VICTIM_ALLELE }...
VOID

as many times as needed: until the data no longer fits the block's format or until the end of the file. In difficult situations it is a good idea to mark the end of the reading zone in some way. For example, let the data file contain some information at the end that is irrelevant to the case under study:

******************************************
******* Here are some comments ***********
******************************************
D21S11
27

D16S539
12 14

The rest of the file contains
irrelevant information

We need to ignore the last part of the file while importing data. Generally, Grape does this automatically by comparing the structure of the repeating block with the data. But in this particular case an error will occur: Grape will interpret the string "The rest of the file contains" as a new locus name. Then it will look at the next string and find that it contains two words, "irrelevant" and "information", separated by a space, i.e. by exactly the same delimiter that separates alleles. So nothing in the file prevents Grape from interpreting these two words as the victim's allele notations. As a result you will get an error message: Grape will be unable to import the information.

To avoid this kind of mistake, put any specific word (an arbitrary combination of symbols) at the end of the data to be imported and the same word in the template. For example, put the words End of data as in the example below.

******************************************
******* Here are some comments ***********
******************************************
D21S11
27

D16S539
12 14

End of data
The rest of the file contains
irrelevant information

Grape reads this file correctly using the following template:

VOID
VOID
VOID
{
LOCUS_NAME
{VICTIM_ALLELE }...
VOID
}
...
End of dataVOID
{
VOID
}
...

Note that we put VOID after the words End of data in the import template. This is optional. Generally, every line of an import template should contain at least one keyword, but Grape adds VOID at the end of a line if no other keywords are found in that line.
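
As an illustration of this rule, here is a small Python sketch. It is not Grape's code: the function name complete_template_line is hypothetical, and the sketch assumes that lines consisting only of the loop symbols {, } or ... are left untouched.

```python
# Keywords listed above for cases (a) and (b).
KEYWORDS = (
    "DOCUMENT_NAME", "LOCUS_NAME", "SUBPOPULATION", "EVIDENCE_ALLELE",
    "SUSPECT_ALLELE", "VICTIM_ALLELE", "SAMPLE_NAME", "REFERENCE_NUMBER",
    "NOTES", "FOLDER", "ALLELE_NAME", "THETA", "VOID",
)
LOOP_SYMBOLS = {"{", "}", "..."}


def complete_template_line(template_line: str) -> str:
    """Append VOID to a template line that contains no keyword."""
    if template_line.strip() in LOOP_SYMBOLS:
        return template_line  # assumption: loop markers are not data lines
    if any(keyword in template_line for keyword in KEYWORDS):
        return template_line
    return template_line + "VOID"
```

With this sketch the template line End of data becomes End of dataVOID, while lines that already contain a keyword are returned unchanged.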

Grape considers the phrase End of data as a delimiter. So delimiters are not necessarily simple symbols like spaces or tabs: a delimiter is an arbitrary string that we put between keywords. Delimiters often act like bookmarks that indicate the beginning and the end of the useful piece of information inside the import file. For example, in the example above the first three lines of the import file contain irrelevant information and should be ignored. But what should we do if we do not know in advance how many such garbage lines the file contains (or if the number differs from case to case)? It is clearly inconvenient to keep a separate template for each possible number of lines to be ignored (1, 2, 3, ...). A much better way is to put a specific string at the beginning of the useful piece of data to indicate that all information above it should be skipped. Consider the following example:

******************************************
******* Here are some comments ***********
******************************************

Vict.
D21S11
27

D16S539
12 14

End of data
The rest of the file contains
irrelevant information

The difference from the previous example is the word Vict., which we put just before the beginning of the data. Now it no longer matters that the beginning of the file contains exactly 3 lines of irrelevant information (which would require exactly 3 VOID lines), and we may construct the template in the following way:

{
VOID
}
...
Vict.VOID
{
LOCUS_NAME
{VICTIM_ALLELE }...
VOID
}
...
End of dataVOID
{
VOID
}
...

Grape will find the delimiter Vict. in the import file. According to the template, all lines above Vict. are VOID, and Grape ignores them. The first line after Vict. is treated as a locus name, followed by the victim's alleles, and so on until Grape meets the words End of data. According to the template, these words indicate that the rest of the file contains garbage and can be ignored.

This approach is useful if we need to separate pieces of data about DNA profiles of the evidentiary sample, victim and suspect. For example, let the import file be the following:

******************************************
******* Here are some comments ***********
******************************************
Vict.
D21S11
27

D16S539
12 14

Evid. sample
D21S11
27 30 31

D16S539
12 14

Susp.
D21S11
30 31

D16S539
12

End of data
The rest of the file contains
irrelevant information

The words Vict., Evid. sample and Susp. indicate that the information below them relates to the victim, the evidentiary sample and the suspect, respectively. The import template should be the following:

{
VOID
}
...
Vict.VOID
{
LOCUS_NAME
{VICTIM_ALLELE }...
VOID
}
...
Evid. sampleVOID
{
LOCUS_NAME
{EVIDENCE_ALLELE }...
VOID
}
...
Susp.VOID
{
LOCUS_NAME
{SUSPECT_ALLELE }...
VOID
}
...
End of dataVOID
{
VOID
}
...
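
The way such bookmarks split a file into sections can be sketched in Python. This is a simplified illustration, not Grape's implementation: the name read_sections is hypothetical, and the sketch assumes that each bookmark stands alone on its line and that every locus name is followed by exactly one line of space-separated alleles.

```python
SECTION_MARKERS = {"Vict.": "victim", "Evid. sample": "evidence", "Susp.": "suspect"}
END_MARKER = "End of data"


def read_sections(text: str) -> dict[str, dict[str, list[str]]]:
    """Collect alleles per locus for each bookmarked section of the file."""
    profiles = {name: {} for name in SECTION_MARKERS.values()}
    current = None  # no section yet: leading lines play the role of VOID
    locus = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == END_MARKER:
            break  # everything after the end bookmark is ignored
        if line in SECTION_MARKERS:
            current = profiles[SECTION_MARKERS[line]]
            locus = None
        elif current is None or not line:
            continue  # garbage before the first bookmark, or a blank line
        elif locus is None:
            locus = line  # first line of a block is the locus name
        else:
            current[locus] = line.split()  # next line holds the alleles
            locus = None
    return profiles
```

Applied to the data file above, the sketch would return the victim's, the evidentiary sample's and the suspect's alleles grouped by locus, and would never look at the lines after End of data.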

There can be several such "bookmarks" even within a single line. For example, the same DNA information as in the example above can be presented in the following form:

D21S11
Vict. 27 Evid. sample 27 30 31 Susp. 30 31
D16S539
Vict. 12 14 Evid. sample 12 14 Susp. 12

The corresponding import template is the following:

{
LOCUS_NAME
Vict. {VICTIM_ALLELE }...Evid. sample {EVIDENCE_ALLELE }...Susp. {SUSPECT_ALLELE }...
}
...
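
For comparison, the same one-line layout can be read in Python with a regular expression in which the three bookmarks act as fixed delimiters. This is only a sketch under the assumption that all three bookmarks are present in every line; the name read_locus_line is hypothetical and the code is not part of Grape.

```python
import re

# The bookmarks are literal text; the groups between them hold the alleles.
LINE_PATTERN = re.compile(
    r"Vict\.(?P<victim>.*?)Evid\. sample(?P<evidence>.*?)Susp\.(?P<suspect>.*)"
)


def read_locus_line(line: str) -> dict[str, list[str]]:
    """Split one genotype line into victim, evidence and suspect alleles."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"line does not fit the expected format: {line!r}")
    return {role: alleles.split() for role, alleles in match.groupdict().items()}
```

Each group may hold any number of alleles, which mirrors the role of the { }... construction in the template.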

There are three more keywords, DOCUMENT_NAME, THETA and SUBPOPULATION, that we have not explained yet. THETA is the coancestry coefficient that will be used in calculations. DOCUMENT_NAME is the name of the Grape document that appears when the file is imported. If no DOCUMENT_NAME is specified in the import file, the default name "untitled" is used.

SUBPOPULATION is the name of the subpopulation to be used for the current locus. It should exactly match a subpopulation name in the Available Loci library. If the given locus has no subpopulations, there is no need to specify SUBPOPULATION. Alternatively, it can be left blank. For example, if no subpopulations are specified for locus D21S11, then the file (the first line of the file is blank)


D21S11
Vict. 27

will be successfully imported with either one of the following two import templates:

VOID
LOCUS_NAME
Vict. {VICTIM_ALLELE }...

or

SUBPOPULATION
LOCUS_NAME
Vict. {VICTIM_ALLELE }...

In the first case Grape just ignores the first blank line. In the second case Grape takes this blank line and treats it as the name of the subpopulation for locus D21S11. Since this line contains no symbols, Grape does not assign any subpopulation to the locus D21S11, so both templates do the same thing. The second option is useful if some of the loci have subpopulations and some do not. For example, suppose no subpopulation data is available for locus D21S11, but for locus D16S539 we have data for the Caucasian and Afro-American subpopulations and we want to use the Caucasian data. Then the file

Caucasian
D16S539
Vict. 12 14

D21S11
Vict. 27

can be imported with the following template:

{
SUBPOPULATION
LOCUS_NAME
Vict. {VICTIM_ALLELE }...
}
...

This example also shows that the subpopulations used do not all need to be the same, i.e. different subpopulations can be used for different loci if needed. This can be useful if population statistics for a given subpopulation are not available for some loci. However, it is important to specify the subpopulation before the locus name, or at least on the same line; this is an internal restriction of Grape. The subpopulation stays in effect until you change it again. If you do not specify a subpopulation at all, it is considered blank. For example, let us modify the example above as follows:

D21S11
Vict. 27
D16S539 Caucasian
Vict. 12 14

This file will be correctly imported with the use of the following import template:

{
LOCUS_NAME SUBPOPULATION
Vict. {VICTIM_ALLELE }...
}
...
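
A rough Python equivalent of reading this layout is shown below. It is a sketch only: the name read_loci is hypothetical, and it assumes that the file consists of pairs of lines (a locus line followed by a Vict. line) and that a locus line without a second word has a blank subpopulation.

```python
def read_loci(text: str) -> list[tuple[str, str, list[str]]]:
    """Return (locus, subpopulation, alleles) for each pair of data lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    for header, genotype in zip(lines[0::2], lines[1::2]):
        # The first word is the locus; whatever follows is the subpopulation.
        locus, _, subpopulation = header.partition(" ")
        alleles = genotype.removeprefix("Vict.").split()
        records.append((locus, subpopulation.strip(), alleles))
    return records
```

For the file above this would yield a blank subpopulation for D21S11 and Caucasian for D16S539.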

Importing into Grape database.

The main difference from the case of importing onto Grape's desktop considered above is that now we can import many samples at the same time. The situation is similar to the one discussed above: if the number of loci is not known in advance, it is necessary to use a loop construction of the type

{

}
...

to read data about all loci. With many samples to be imported (and still many loci), it is typical to have two nested loops of the type

{
{

}
...
}
...

Let us give a few examples.

Example 1(b). The data file is the following:

**** Any irrelevant information at the beginning of the file ****
*******************************************************************

begin data
Sample 10.1
1234
Folder 1
My notes
Afghanistan HumVWA 14 15
D3S1358 13 14 16

begin data
Sample 10.2
4321
Folder 2
Other notes
Afghanistan HumVWA 14 16
D3S1358 13 18 20

We are going to import two samples named Sample 10.1 and Sample 10.2 into a samples database. These samples should be imported into the folders Folder 1 and Folder 2 and get the reference numbers 1234 and 4321. "My notes" and "Other notes" should be the notes associated with those samples. Two loci, HumVWA and D3S1358, are considered, both for the subpopulation Afghanistan. The genotypes for these loci are 14,15 and 13,14,16 for Sample 10.1, and 14,16 and 13,18,20 for Sample 10.2. The template that is suitable for importing this data is the following:

{
VOID
}
...
{
begin data
SAMPLE_NAME
REFERENCE_NUMBER
FOLDER
NOTES
SUBPOPULATION LOCUS_NAME {ALLELE_NAME }...
{
LOCUS_NAME {ALLELE_NAME }...
}
...
VOID
}
...

Grape proceeds in the following way. The first part of the data, before the first appearance of the phrase "begin data" (which serves as a delimiter or "bookmark", as explained above), is VOID, so it is ignored. "begin data" marks the beginning of the outer loop (with respect to samples), so the block of data between two successive appearances of these words contains the information about a single sample. The first four lines of any block are the sample name, the sample reference number, the folder and the notes associated with this sample. After these four lines, the inner loop (with respect to loci) follows. The first line with genotype information (like Afghanistan HumVWA 14 15) also contains the name of the subpopulation and in that sense differs from all other lines (like D3S1358 13 14 16). Grape stays in the inner loop and tries to interpret each line according to the format LOCUS_NAME {ALLELE_NAME }... (i.e. the first word is the locus name, then a space, then the allele notations, also separated by spaces). In the example under consideration the genotype information takes just two lines: the blank line that follows D3S1358 13 14 16 does not fit this format. This tells Grape that the inner loop is finished. Grape interprets the blank line as VOID according to the template and then switches back to the first line of the outer loop, i.e. to the words "begin data". This cycle continues as many times as needed to read the data about all samples.
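
The two nested loops can be mimicked with the following Python sketch. It is an illustration under simplifying assumptions (each block has exactly four header lines and at least one genotype line, and a blank line or the end of the file ends the inner loop), not a description of Grape's internals; the name read_samples is hypothetical.

```python
def read_samples(text: str) -> list[dict]:
    """Read sample blocks that each start with the bookmark 'begin data'."""
    lines = text.splitlines()
    samples = []
    i = 0
    while i < len(lines):
        if lines[i].strip() != "begin data":
            i += 1  # VOID: skip everything outside a sample block
            continue
        name, reference, folder, notes = (s.strip() for s in lines[i + 1:i + 5])
        i += 5
        # The first genotype line also carries the subpopulation name.
        subpopulation, locus, *alleles = lines[i].split()
        genotypes = {locus: alleles}
        i += 1
        # Inner loop over loci: ends at the first blank line or end of file.
        while i < len(lines) and lines[i].strip():
            locus, *alleles = lines[i].split()
            genotypes[locus] = alleles
            i += 1
        samples.append({
            "name": name, "reference": reference, "folder": folder,
            "notes": notes, "subpopulation": subpopulation,
            "genotypes": genotypes,
        })
    return samples
```

Applied to the data file of Example 1(b), the sketch would produce two records, one for Sample 10.1 and one for Sample 10.2, each with genotypes for HumVWA and D3S1358.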

There can be difficult situations in which it is not clear whether a loop ends or not. This happens if the format of the next line fits both assumptions: that the loop continues and that it breaks. Generally, in such situations Grape always tries to leave the loop if the format of the following data fits the part of the template that follows the loop; Grape has a special "look ahead" capability for this purpose. The only exception to this policy is the situation in which the next template line is VOID (as in the example under consideration). The logic behind this exception is easy to see in the current example: since VOID fits any data string, the inner loop (with respect to loci) would otherwise always be terminated after the first step, which does not make sense: D3S1358 13 14 16 fits VOID, and that would be a reason for Grape to terminate the inner loop.
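
The decision rule described in this paragraph can be written down as a short Python sketch. The function and parameter names are hypothetical, and the sketch reflects only our reading of the rule as stated here, not Grape's actual look-ahead code.

```python
def should_leave_loop(fits_loop_body: bool,
                      fits_continuation: bool,
                      continuation_is_void: bool) -> bool:
    """Decide whether to leave a { }... loop before reading the next line."""
    if not fits_loop_body:
        return True  # the data no longer matches the loop: leave it
    if continuation_is_void:
        # Exception: VOID fits any string, so it is not a reason to leave.
        return False
    # Ambiguous case: prefer leaving when the data fits what follows the loop.
    return fits_continuation
```

For the line D3S1358 13 14 16 in Example 1(b), the line fits the loop body and the next template line is VOID, so under this rule the loop continues.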

Example 2(b). Consider another example:

Sample 1
Ref #: 1234
Folder_1
My notes

Sample 2
Ref #: 4321
Folder_2
Other notes

/// HumVWA    Afghanistan ///

Sample 1
14 15   
Sample 2
14 16

/// D3S1358    Iran ///

Sample 1
13 14 16   
Sample 2
13 14 18

There are two main differences from Example 1(b). First, the information concerning names, reference numbers, folders and notes is now located at the beginning of the file, before the genotypes. Second, the outer loop is now with respect to loci and the inner loop is with respect to samples (i.e. we list the genotypes of all samples for the first locus, then do the same for the second locus, and so on).

The template that allows this data to be imported correctly is the following:

{
SAMPLE_NAME
Ref #: REFERENCE_NUMBER
FOLDER
NOTES
VOID
}
...
{
/// LOCUS_NAME    SUBPOPULATION VOID
VOID
{
SAMPLE_NAME
{ALLELE_NAME }...
}
...
VOID
}
...

Grape begins extracting data in accordance with the loop

{
SAMPLE_NAME
Ref #: REFERENCE_NUMBER
FOLDER
NOTES
VOID
}
...

This template block is applied twice in the example under consideration. The absence of the words "Ref #: " (which again serve as a delimiter or bookmark) tells Grape to leave the loop. The second part of the template is needed to extract the genotypes. Each "/// " marks the beginning of the block of data related to one particular locus (the outer loop). The same line contains the subpopulation name, the next line is VOID (an empty line in the data file), and then the inner loop follows. The inner loop reads all the genotypes: each entry begins with the sample name, and the next line contains the allele notations separated by spaces.
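
The two-part reading can be sketched in Python as follows. This is a simplified illustration, not Grape's code: the name read_by_locus is hypothetical, and the sketch assumes that every sample block is followed by one blank line, that locus and subpopulation names contain no spaces, and that each sample name in the second part is followed by a line of alleles.

```python
def read_by_locus(text: str) -> dict[str, dict]:
    """Read sample headers first, then genotypes grouped by locus."""
    lines = [line.strip() for line in text.splitlines()]
    samples = {}
    i = 0
    # First loop: a block belongs to it only if its second line starts
    # with the bookmark "Ref #: ".
    while i + 3 < len(lines) and lines[i + 1].startswith("Ref #: "):
        samples[lines[i]] = {
            "reference": lines[i + 1][len("Ref #: "):],
            "folder": lines[i + 2],
            "notes": lines[i + 3],
            "genotypes": {},
        }
        i += 5  # four data lines plus the blank VOID line
    # Second loop: "///" opens the block for one locus.
    locus = subpopulation = None
    while i < len(lines):
        line = lines[i]
        if line.startswith("///"):
            locus, subpopulation = line.strip("/ ").split()
            i += 1
        elif line and locus is not None:
            # A sample name followed by its alleles on the next line.
            samples[line]["genotypes"][locus] = (subpopulation, lines[i + 1].split())
            i += 2
        else:
            i += 1  # blank line (VOID)
    return samples
```

Because the genotypes are attached by sample name, the names in the second part of the file must match those in the first part exactly.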

Example 3(b). This example is very similar to Example 2(b). The data file is the following:

Sample 1
1234
Folder_1
My notes

Sample 2
4321
Folder_2
Other notes

/// HumVWA    Afghanistan ///

1234: 14 15
4321: 14 16

/// D3S1358    Iraq ///

1234: 13 14 16   
4321: 13 14 18

The first part of the file is almost the same as in Example 2(b). While reading this part, Grape creates new entries in a database corresponding to Sample 1 and Sample 2 and fills in the fields "Reference Number", "Folder" and "Notes". The second part of the data file contains the genotypes, but they are listed under reference numbers rather than names as in Example 2(b). The template that "decodes" the data file is the following:

{
SAMPLE_NAME
REFERENCE_NUMBER
FOLDER
NOTES
VOID
}
...
{
/// LOCUS_NAME    SUBPOPULATION VOID
VOID
{
REFERENCE_NUMBER: {ALLELE_NAME }...
}
...
VOID
}
...
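
A Python sketch of the same idea, with genotypes attached by reference number, is given below. It is an illustration only: the name read_by_reference is hypothetical, and the sketch assumes that the sample blocks run until the first line starting with "///" and that each block has four lines followed by a blank line.

```python
def read_by_reference(text: str) -> dict[str, dict]:
    """Read sample headers, then genotypes keyed by reference number."""
    lines = [line.strip() for line in text.splitlines()]
    samples = {}  # keyed by reference number
    i = 0
    # Sample blocks continue until the first "///" locus header.
    while i < len(lines) and not lines[i].startswith("///"):
        name, reference, folder, notes = lines[i:i + 4]
        samples[reference] = {
            "name": name, "folder": folder, "notes": notes, "genotypes": {},
        }
        i += 5  # four data lines plus the blank VOID line
    locus = subpopulation = None
    for line in lines[i:]:
        if line.startswith("///"):
            locus, subpopulation = line.strip("/ ").split()
        elif line:
            # "1234: 14 15" -> reference number, then the alleles.
            reference, _, alleles = line.partition(":")
            samples[reference]["genotypes"][locus] = (subpopulation, alleles.split())
    return samples
```

The colon after REFERENCE_NUMBER plays the same role here as in the template: it is the delimiter that separates the reference number from the alleles.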