GENica CMap Import File Preparation
The CMap import file format
- CMap is free software from the GMOD project. Please also refer to the GMOD CMap web page for further information, e.g. details on the map data import format.
- Use a spreadsheet program to generate a table as shown here, then use the "Save as" option to save the table as a "tab-delimited text file (*.txt)".
- See the next section for a list of all available fields and for information on how to use them.
- The first line of the finished file should be the tab-separated names of these fields. The order of fields is not important.
- You should use the names for the fields as listed in the next section, but you can use spaces and capitalization for the column names, if you like, as spaces will be converted to underscores and the names lowercased (e.g. "Feature Alt Name" will become "feature_alt_name").
- Important: Please open the CMap import file in "Notepad" or a similar program and verify that no "text qualifiers" (e.g. quotation marks) were automatically inserted when the spreadsheet was saved as a tab-delimited *.txt file. If any were inserted, remove them, e.g. with the "find / replace" function.
- If you need help with preparing a CMAP import file, please contact the GENica CMap curator for advice.
Example of a CMap import file. The example shows two fictional barley maps (one for linkage group "1H", one for "2H"; note the entries in column A) prepared for import into CMap. Please note the use of feature type "accession IDs" in the column entitled "feature_type_acc": "A" for "AFLP", etc.
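The header conversion and the text qualifier check described above can be sketched in a few lines of Python. This is only an illustration, not part of CMap: the function names and the sample rows are hypothetical, trimming surrounding whitespace from headings is an assumption, and a cell is treated as "qualified" only when it both starts and ends with a double quotation mark, so that quoted values inside "feature_attributes" are not flagged.

```python
def normalize_field_name(column_name):
    """Lowercase a column heading and convert spaces to underscores."""
    # Trimming surrounding whitespace is an assumption, not documented behaviour.
    return column_name.strip().lower().replace(" ", "_")


def find_wrapped_cells(lines):
    """Return (line, column) positions of cells wrapped in double quotes.

    Both numbers are 1-based. Only cells that start AND end with a
    double quotation mark are reported (assumed text qualifier pattern).
    """
    hits = []
    for line_number, line in enumerate(lines, start=1):
        cells = line.rstrip("\r\n").split("\t")
        for column_number, cell in enumerate(cells, start=1):
            if len(cell) >= 2 and cell.startswith('"') and cell.endswith('"'):
                hits.append((line_number, column_number))
    return hits


# Hypothetical file content: one header line and two data lines.
sample = [
    "Map Name\tFeature Name\tFeature Alt Name\tFeature Start",
    '1H\t"mk2"\t\t12.5',
    "1H\tmk1\t\t20.1",
]

header = [normalize_field_name(name) for name in sample[0].split("\t")]
print(header)
print(find_wrapped_cells(sample))
```

Any position reported by the second function points at a cell that should be inspected in a text editor before the file is submitted.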
Fields available in the CMap import file
- map_name [REQUIRED FIELD]
The name of the map (AKA "linkage group", "chromosome", etc.); (character(64)).
- Please note: When importing data for a map set that already has data, all existing maps and features with the same name as maps and features in your data will be updated. You also have the option of deleting any data that isn't updated. If you choose this "overwrite" option, any pre-existing maps or features that aren't updated will be deleted, as it will be assumed that they are no longer present in the dataset.
- map_acc
The accession ID of the map; (character(20)).
- Leave this field out, unless you want to update / overwrite an existing map.
- map_display_order
The order in which to display the map; (integer).
- map_start
The start position of the map; (double(8,2)).
- Optional. If not supplied, this is determined from the minimal start position of a map's features, after all the features have been imported.
- map_stop
The stop position of the map; (double(8,2)).
- Optional. If not supplied, this is determined from the maximal stop position of a map's features, after all the features have been imported.
- feature_acc
The accession ID of the feature; (character(20)).
- Leave this field out, unless you want to update / overwrite an existing feature.
- feature_name [REQUIRED FIELD]
The name of the feature; (character(32)).
- feature_aliases
Any number of comma-delimited aliases, each of which can be a maximum of 255 characters; (character(255) each).
- feature_start [REQUIRED FIELD]
The starting position of the feature on the map; (double(8,2)).
- feature_stop
The ending position of the feature on the map; (double(8,2)).
- If this is not supplied, it is automatically set to be identical to "feature_start".
- feature_direction
The direction of the feature on the map, either 1 or -1 (default: 1); (double(8,2)).
- feature_type_acc [REQUIRED FIELD]
The accession ID of the feature's type; (character(32)).
- Use the feature type "accession ID" (not its name) in this section, e.g. use "A", not "AFLP".
- If you use a feature_type_acc entry that does not exist in the CMap Live database, the import will fail when it encounters that entry. You can find the accession IDs for currently existing feature types on the GENica CMap Live Feature Type Info webpage.
- Beside these format requirements, please also refer to the usage conventions described below in "How to select feature types".
- feature_attributes
A semi-colon-delimited list of feature attributes.
- Optional.
- Feature attributes are defined as key:value pairs, separated by semi-colons. The example in the box below defines two separate attributes, one of type "Genbank ID" with the value "BH245189" and another of type "Overgo" with the value "SOG1776". It isn't strictly necessary to place double quotes around the attribute values, but it is recommended:
Genbank ID: "BH245189"; Overgo: "SOG1776";
- Importing cross references for hundreds of features in the "feature_attributes" field can lead to performance issues and is therefore discouraged. If cross references for all the features in a map are to be entered, please contact the GENica CMap curator who will pass your request on to the software developers.
- is_landmark
Whether or not the feature should be marked as a landmark feature (allowing it to be labelled when a user chooses "Landmarks" for the "Label Features" option); ("1" or "0").
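The field rules above lend themselves to a quick check before a file is submitted. The following Python sketch is not the CMap importer; it only mirrors the rules stated in this section (the four required fields, "feature_stop" defaulting to "feature_start", map extents taken from the minimal start and maximal stop of a map's features, and semi-colon-delimited key:value attributes). All function names and the sample rows are hypothetical, and the attribute parser assumes that values contain no semi-colons or colons.

```python
REQUIRED_FIELDS = ("map_name", "feature_name", "feature_start", "feature_type_acc")


def missing_required(row):
    """Return the required fields that are absent or empty in a row."""
    return [field for field in REQUIRED_FIELDS if not row.get(field, "").strip()]


def fill_feature_stop(row):
    """Return a copy of the row with feature_stop defaulting to feature_start."""
    filled = dict(row)
    if not filled.get("feature_stop", "").strip():
        filled["feature_stop"] = filled["feature_start"]
    return filled


def map_extents(rows):
    """Derive (start, stop) per map from its features' minimal start and maximal stop."""
    extents = {}
    for row in rows:
        start, stop = float(row["feature_start"]), float(row["feature_stop"])
        low, high = extents.get(row["map_name"], (start, stop))
        extents[row["map_name"]] = (min(low, start), max(high, stop))
    return extents


def parse_feature_attributes(text):
    """Split 'key: "value"; key: "value";' into (key, value) pairs.

    Assumption: values contain no semi-colons or colons.
    """
    pairs = []
    for chunk in text.split(";"):
        if chunk.strip():
            key, _, value = chunk.partition(":")
            pairs.append((key.strip(), value.strip().strip('"')))
    return pairs


# Hypothetical rows, as they might look after reading the tab-delimited file.
rows = [
    {"map_name": "1H", "feature_name": "mk1", "feature_start": "12.5", "feature_type_acc": "A"},
    {"map_name": "1H", "feature_name": "mk2", "feature_start": "3.0",
     "feature_stop": "4.5", "feature_type_acc": "A"},
    {"map_name": "2H", "feature_name": "", "feature_start": "7.0", "feature_type_acc": "A"},
]

complete = [fill_feature_stop(row) for row in rows if not missing_required(row)]
print(map_extents(complete))
print(parse_feature_attributes('Genbank ID: "BH245189"; Overgo: "SOG1776";'))
```

In this sample the third row is skipped because its "feature_name" is empty; in a real file such a row would need to be corrected rather than dropped.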
How to format and choose feature names and aliases
Feature names
- Different laboratories use different marker naming conventions, which is reflected in the alternative versions of marker names found in GENica CMap Live records.
- For a given map, feature names (that are displayed in the map viewer) should be chosen according to a consistent system.
- Where possible, the following considerations should be taken into account when entering feature names and / or aliases:
- "X" is not necessary in front of marker names (gwm197 instead of Xgwm197)
- Chromosome locations are not required after marker names (gwm107 instead of gwm107-3B)
- Numbers need not be preceded by leading zeros (wmc50 instead of wmc0050). This does not apply to feature names that are GenBank accession IDs, e.g. AE004092 cannot be shortened to AE4092.
- Marker names need not be written in italics
- Marker names written in lower case (e.g. gwm107) or in mixed case (e.g. EBmac623) can be easier to read than if they are converted to all upper case (e.g. GWM107)
- Letter suffixes are the preferred way of indicating multiple loci (gwm107b instead of gwm107.2)
- For EST markers, entering the NCBI GenBank accession ID (e.g. EF212872) as a feature name and / or alias allows linking to the relevant GenBank entry
- For AFLP markers, the preferred format is as in this example: P32/M50-138 (instead of E32M36d, E03M30-3, etc.)
- QTLs:
- A wheat QTL format was provided in the 1998 Wheat Gene Catalogue, section 6.2.2:
- "Locus symbols: The 'Q' should be followed by a trait designator, a period, a laboratory designator (see Section 5.6 [of the 1998 Wheat Gene Catalogue]), a hyphen (-) and the symbol for the chromosome in which the QTL is located. The trait designator should consist of no more than four and preferably three letters, the first of which is capitalized. Different QTLs for the same trait that are identified in one chromosome should be assigned the same symbol except for the addition of a period and an Arabic numeral after the chromosome designation. All characters in the locus symbol should be italicized. For example, QYld.psr-7B.1 and QYld.psr-7B.2 would designate two yield QTLs identified in chromosome 7B by the John Innes Centre. On a map of 7B, these could be abbreviated as QYld.psr.1 and QYld.psr.2."
- GrainGenes provides a format for barley QTLs that may be useful for both barley and wheat:
- "The proposed name for the barley QTL, consisting of a "Q", a 2-4 letter acronym for the trait, a ".", a four letter string of the first two letters of the parents, a "-", and the "H" chromosome, e.g. QHD.StMo-2H. See the "Traits" worksheet in this workbook (i.e. the GrainGenes barley QTL workbook) for the current trait list."
(This list of considerations is based on the outcome of a GRDC-funded workshop led by Rudi Appels.)
- Providing a feature name based on these considerations either as the "feature name" or as one of its "aliases" consistently throughout GENica CMAP increases the likelihood that the CMAP application can identify correspondences between features on different maps.
Feature aliases
- CMAP can identify correspondences between maps by matching up features with identical entries in either "feature_name" or "feature_aliases" (see How feature names and types are used to make "Name-Based Correspondences" below for further details). Providing feature_aliases therefore increases the likelihood of correspondences being identified, e.g. when two maps refer to the same locus with different names.
- You can identify the different names currently in use for a feature by carrying out a CMap Live feature search. Consider entering several of the alternative spellings that you find into "feature_aliases".
- Provide NCBI GenBank accession IDs as an alias, if possible, even when this is the only marker name used (e.g. for an EST). This is because links to the NCBI GenBank Search are only created for features that have an alias.
- For marker names with a letter appended to indicate multiple loci, consider providing an alias without this suffix to allow correspondences between inconsistently named loci to be identified.
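Several of the naming considerations above are mechanical enough to generate candidate aliases automatically. The Python sketch below is a heuristic illustration only: the function name, the regular expressions and the pattern used to recognise GenBank-style accession IDs are assumptions, and every suggestion should be checked by a person before it is entered in "feature_aliases".

```python
import re

# Assumed pattern for GenBank-style accession IDs (e.g. AE004092); such names are left untouched.
GENBANK_LIKE = re.compile(r"[A-Z]{1,2}\d{5,8}")

# Each rule removes one element that the considerations above describe as unnecessary.
RULES = [
    (r"^X(?=[a-z])", ""),            # leading "X": Xgwm197 -> gwm197
    (r"-[1-7][ABDH]$", ""),          # chromosome location: gwm107-3B -> gwm107
    (r"(?<=[A-Za-z])0+(?=\d)", ""),  # leading zeros: wmc0050 -> wmc50
    (r"(?<=\d)[a-z]$", ""),          # locus letter suffix: gwm107b -> gwm107
]


def suggest_aliases(name):
    """Return progressively simplified spellings of a marker name."""
    if GENBANK_LIKE.fullmatch(name):
        return []
    aliases = []
    current = name
    for pattern, replacement in RULES:
        shortened = re.sub(pattern, replacement, current)
        if shortened != current:
            aliases.append(shortened)
            current = shortened
    return aliases


for marker in ("Xgwm107-3B", "wmc0050", "gwm107b", "AE004092"):
    print(marker, suggest_aliases(marker))
```

The rules are applied cumulatively, so a name such as "Xgwm107-3B" yields both the intermediate form and the fully simplified form as candidates.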
How to select feature types
- Please refer to the GENica CMap Live Feature Type Info webpage for a list of feature types currently available in the database.
- If a new feature type is required, contact the GENica CMap curator, who will check that the requested feature type does not already exist before passing the information on to the software developers.
- Please follow these usage conventions, as outlined in the GENica CMap Live data standards:
- If information on a "biological aspect" of a feature is available, then the corresponding "biological" feature type (e.g. "Gene" or "Traitlocus") should be used, in preference to a feature type describing a "technical aspect" of how the feature was detected (e.g. "SSR" or "RFLP").
- For marker loci for which a biological feature type cannot be assigned, use the specific marker type (e.g. "RFLP" or "SSR").
- If specific biological or technical information is not available, or if there is any doubt about the appropriate feature type, choose the generic feature type "Locus" (any feature that has been genetically mapped).
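The order of preference in these conventions can be written down as a small decision function. This is a sketch only; the function and parameter names are hypothetical, and it returns feature type names, whereas the import file requires the corresponding accession ID in "feature_type_acc".

```python
def choose_feature_type(biological_type=None, marker_type=None):
    """Pick a feature type name following the conventions above.

    A biological type (e.g. "Gene") takes precedence over a specific
    marker type (e.g. "SSR"); "Locus" is the fallback when neither is
    known or when there is any doubt.
    """
    if biological_type:
        return biological_type
    if marker_type:
        return marker_type
    return "Locus"


print(choose_feature_type(biological_type="Gene", marker_type="SSR"))
print(choose_feature_type(marker_type="RFLP"))
print(choose_feature_type())
```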
How feature names and types are used to make "Name-Based Correspondences"
- As a default, "Name-Based Correspondences" are made between features of the exact same type. For example, no correspondences would be made between an "SSR"-type and an "RFLP"-type feature, even if both had the same name.
- GENica CMap has been specially configured to allow "name-based correspondences"...
- ...between features of type "Locus" and features of any other type. This is why "Locus" should be used as the "default" feature type "when in doubt".
- ...between features of type "Gene" and features of specified types*.
- ...between features of type "Traitlocus" and features of specified types*.
- *specified types (12 April 2007):
- In CMap Live: AFLP, DArT, INDEL, ISOENZ, RAPD, RFLP, SNP and SSR
- In CMap Staging: AFLP, CAPS, DArT, EST, INDEL, ISOENZ, Maize EST, Maize Marker, MMP Unigene, PCR, RAPD, Repeat Region, RFLP, Rice Marker, Rice SSR, SNP, Sorghum CP, Sorghum EST, Sorghum GSS, Sorghum Marker, SSR, Traitlocus and Wheat EST Marker
- If correspondences are not made automatically, they can be manually entered with the "Make Name Based Correspondences" admin tool. Contact the GENica CMap curator if you find that known correspondences between a map set entered on your behalf and other map sets are not identified by the CMap application. (Please be aware that CMap normally identifies correspondences during the night that follows a map set's import.)
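The type rules above can be summarised in a short Python sketch. It is not CMap's own matching code: the function names and the feature dictionaries are hypothetical, the set of specified types is the CMap Live list quoted above, and exact, case-sensitive name comparison is an assumption.

```python
# Specified types for "Gene" and "Traitlocus" in CMap Live, as listed above (12 April 2007).
SPECIFIED_TYPES = frozenset(
    {"AFLP", "DArT", "INDEL", "ISOENZ", "RAPD", "RFLP", "SNP", "SSR"}
)


def types_can_correspond(type_a, type_b):
    """Return True if name-based correspondences are allowed between two feature types."""
    if type_a == type_b:
        return True  # default: features of the exact same type
    if "Locus" in (type_a, type_b):
        return True  # "Locus" corresponds with any other type
    for special in ("Gene", "Traitlocus"):
        if type_a == special and type_b in SPECIFIED_TYPES:
            return True
        if type_b == special and type_a in SPECIFIED_TYPES:
            return True
    return False


def names_of(feature):
    """Collect a feature's name and aliases into one set."""
    return {feature["name"], *feature.get("aliases", ())}


def has_name_based_correspondence(feature_a, feature_b):
    """Assumed rule: the types are compatible and at least one name or alias is identical."""
    if not types_can_correspond(feature_a["type"], feature_b["type"]):
        return False
    return bool(names_of(feature_a) & names_of(feature_b))


# Hypothetical features on two different maps.
ssr = {"name": "gwm107b", "aliases": ["gwm107"], "type": "SSR"}
rflp = {"name": "gwm107", "type": "RFLP"}
locus = {"name": "gwm107", "type": "Locus"}

print(has_name_based_correspondence(ssr, rflp))
print(has_name_based_correspondence(ssr, locus))
```

The sample also shows why an alias without the locus letter suffix is useful: "gwm107b" and "gwm107" only share a name through the alias.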
This information has been prepared according to the "GENica CMap Live Data Standards", version 1, and the "GENica CMAP Contributors' Roles" description, version 2.