CSV File Formats

These pages provide information and formatting assistance for the construction of valid CSV Trial/Nursery files for import into KDSmart and KDXplore.

KDSmart and KDXplore share the same import code base, designed to make the task easier. This introduction will help with understanding what needs to be prepared for a CSV import using either application.

Note

Other data import methods exist, instead of CSV files, for loading Trial/Nursery/Trait/Tag data into KDSmart/KDXplore which can offer better and longer term advantages.


KDSmart/KDXplore Import Mapping

Importing a Trial using CSV format in KDSmart/KDXplore commences with an attribute mapping table, similar to the examples in the following illustration. This is an important step which helps you setup how your Trial or Nursery information is loaded and defined in each application.

Note: This method of attribute selection is employed to make the import task easier, rather than keep editing and adjusting the CSV file format.

Before the import proceeds, CSV Column Headings in the file need to be assigned an Attribute Type using the “Import As” column.

Trait Name Style Import Trial from CSV Column Mapping Example

KDSmart

KDXplore

KDSmart CSV Column Mapping Example
KDXplore CSV Column Mapping Example

The interface examples above illustrate the:

  • File chosen to import is “maize-with-data.csv”;

  • Name of the CSV’s column Headings;

  • First data row; and

  • Blue (KDSmart) ‘Import As’ column.

Selecting the ‘Import As’ entry, for each row as necessary, reveals a drop down list of attributes that can be assigned to the column. KDSmart/KDXplore makes a ‘best guess’ of the attribute type where possible based upon the first row of data.


Trials and Plots

The following tables list each Attribute Type. Pay particular attention to the ones marked as blue text:

Trait Name Style Attribute Type Usage

Attribute Type

Usage

  • Trial Attribute or

  • Plot Attribute

Indicates KDSmart must retain the value as an attribute, either for the Trial or for each Plot respectively.

  • Trait or

  • Trait for Sub-Plot

Applying either of these Attributes to a column causes a Trait of that name to be added to the list of Traits for the Trial.

Ignore

Use this Attribute to ignore columns in the CSV file.

The Attribute Types listed above may be applied to ANY heading in the CSV file.

Any columns with a name starting with “Link:”, “Date:”, “Date_”, “Sub-Plot#:”, or “Sub-Plot#_” will be automatically assigned as Ignore (but you can always change that if you wish).


Trial & Plot Import Options

When importing a heading classified as Trait or Trait for Sub-Plot KDSmart/KDXplore uses hints in the heading to determine whether it is only a Trait Name or if the heading also identifies a Trait Instance.

This is designated as the Trait Name Style.

You may choose one of the following options:

Trait Name Style options:

Option

Base Instance Number

Format

Example / Description

No Trait Instance detection

All Trait headings will have a single Trait Instance created. Headings of:
AMT1,
AMT:1,
AMT__1
will correspond to three different Traits with exactly the names provided.

Ends in colon followed by digits

1

<traitName> : <instanceNumber>

AMT, AMT:2 results in Trait Instance numbers of 1 and 2.

Ends in two underscores then digits

1

<traitName> __ <instanceNumber>

AMT, AMT__2 results in Trait Instance numbers of 1 and 2. Note: Don’t include spaces - only depicted here for display purposes.

The other options you need to select are how KDSmart/KDXplore determines Sub-Plots from the CSV headings. This is described in the Sub-Plot Count and Trait Instances section below.

To specify that a column is for a particular Sub-Plot, add the following suffix #nn (i.e. # followed by digits, which must be an integer greater than zero) where the number identifies the individual.

You may specify whether or not “sub-plot” data is to be collected here or while scoring by touching the Plot icon: plot-icon

The following three tables list, in turn, the headings that are used for different parts of a Trial Import.
Abbreviations used:
ID - Required for Identification
OPT - Optional
ATTR - Attribute.


Trial Specific Data

The first table lists the headings that pertain to Trial-specific data.

Headings For Trial Specific Data

Attribute Type

Headings Automatically Recognised

Description

Trial Planting Date
OPT

  • PlantingDate

  • TrialPlantingDate

  • DatePlanted

  • Planted

  • DateSown

  • Sown

  • SownDate

The base value used for computing ELAPSED_DAYS values for Traits, e.g. Days to Flowering. (Can be changed manually after import)

Trial Name
OPT

Trial Name

If present, this is used as the name of the Trialin. If not present, the name of the CSV file will be used.

Trial Alias
OPT

  • Abbreviation

  • TrialAbbreviation

  • TrialAcronym

  • Acronym

A short name for the Trial that will be used on screens where there is limited space.

Trial Column Name
OPT

  • ColumnName

  • NameForColumn

This is the word used for the X coordinate (see the attribute type X-Column in the Plot headings table below).
Example: for “Range/Row” the value here would be “Range”.

Trial Row Name
OPT

  • RowName

  • NameForRow

This is the word used for the Y coordinate (see the attribute type Y-Row in the Plot headings table).
Example: for “Range/Row” the value here would be “Range”.

Trial Plot Name
OPT

  • CellName

  • NameForCell

  • PlotName

  • NameForPlot

This is the word used to describe each “cell”. For many plant-related field trials this is likely to be “Plot”. For other trials/experiments it may be “Panel”, “Pond”, “Test-tube” etc.

Trial Bundle
OPT

  • TrialBundle

  • TrialGroup

  • TB

  • TG

Assigns the name of a Trial Bundle for the Trial

Trial Database Id
OPT

  • DatabaseTrialId

  • TrialId

  • TrialNumber

May be provided to cross-reference to the originating database.

Trial Attribute
ATTR

May apply to any heading in the file.
See Trial Attribute Headings below.

Retain the value as an attribute of the Trial. If you apply this type to a column then every row must have the same value.


Plot Specific Data

Headings For Plot Specific Data

Attribute Type

Headings Automatically Recognised

Description

Plot Id
ID
Integer

  • CellId

  • PlotId

  • EntryId

  • Plot

If present, uniquely identifies the Plot in the Trial. Must be unique.

PlotBlock
ID
Integer

  • PlotBlock

  • BlockNo

If present uniquely identifies the Block in the Trial.

Plot Column (X)
ID
Integer

  • PlotColumn

  • X

  • Column

  • Col

  • ColumnX

  • X-Column

  • Column_No

  • BlockX

The X–coordinate of the Plot within the Trial.

If present, the (X,Y) pair must uniquely identify the Plot in the Trial, OR

If PlotBlock is selected the (X,Y) pair must be unique within the Block.

Plot Row (Y)
ID
Integer

  • PlotRow

  • Y

  • Row

  • RowY

  • Y-Row

  • Range_No

  • Range

  • BlockY

The Y–coordinate of the Plot within the Trial.

If present, the (X,Y) pair must uniquely identify the Plot in the Trial, OR

If PlotBlock is selected the (X,Y) pair must be unique within the Block.

Plot Note
OPT

  • PlotNote

  • Note

  • TrialUnitComment

A description or information for the imported plot.

Plot Type
OPT

PlotType

If present, this specifies the type of plot (e.g. Control, Check, etc.).

Plot Barcode
OPT

  • PlotBarcode

  • Barcode

  • TrialUnitBarcode

If Plot Barcode column provided it must not be blank.
See the User Guide section on Barcode Scanning for further details of how the data in this column is used.

Plot Tags
OPT

  • Tags,

  • PlotComments,

  • Comments

Additional comments for the imported plot. A list of pipe-separated ( | ) Tag labels.

Plot Database Id
OPT

  • DatabasePlotId,

  • TrialUnitId,

  • TrialUnitNumber

May be provided to cross-reference to the originating database.

Sub-Plot Count
OPT
Integer

  • SpecimenCount,

  • IndividualCount,

  • Plant Count

Number of Sub-Plots or plants present within the plot to be imported.
Please read the section below Sub-Plot Count and Trait Instances.

Plot Attribute
ATTR

May be applied to any heading in CSV.
See Plot Attribute Headings below.

Retain the value as an attribute of the Plot.
Each row may have a different value (including blanks). Choose any subset to display as Plot Info while scoring.


Sub-Plot Specific Data

Headings For Sub-Plot Specific Data

Attribute Type

Headings Automatically Recognised

Description

Sub-Plot Id
ID
Integer

  • SpecimenDatabaseId

  • Sub-PlotDatabaseId

  • Sub-PlotDbId

  • PlantDatabaseId

  • IndividualDatabaseId

If present, uniquely identifies the Sub-Plot in the Trial. Must be unique.

Sub-Plot Tags
OPT

  • Sub-PlotTags

  • SPTags

  • PlantTags

  • IndividualTags

  • SpecimenTags

Additional comments for the imported Sub-Plot. A list of pipe-separated ( | ) Tag labels.

Sub-Plot Traits
OPT

Any heading with # that is not matched to one of the above Sub-Plot specific headings will default to a Sub-Plot Trait.
E.g Plant_Height#6 will default to Sub-Plot Trait.


Attribute Types for Trait Variations

This table continues from the two above but lists the Attribute Types for Trait variations and for excluding a column from import.

Attribute Types For Trait Variations

Attribute Type

Section

Headings Automatically Recognised

Description

Trait
OPT

Trait

May be applied to any heading in the CSV file.

Marks this column as representing a Trait that is scored only for Plots.

The Trait Name Style governs if a Trait Instance is being referenced or not.

Trait for Sub-Plot
OPT

Trait

May be applied to any heading in the CSV file.

Marks this column as representing a Trait that is scored only for Sub-Plots.

The Trait Name Style governs if a Trait Instance is being referenced or not.

Ignore

May be applied to any heading in the CSV file.

Indicates that this column should not be imported.

For an example see the following A Simple Example topic.

Sub-Plot Count and Trait Instances

Multiple instances for a Trait can be imported by including the instance number in the heading. For example, importing the second instance of the Trait ‘Plant_Height’ can be achieved by including a colon Plant_Height:2.

Alternatively using the Sub-Plot Count column instructs KDSmart/KDXplore to create the specified number of Sub-Plots in each plot.

When Sub-Plot Count is present and non-blank, the integer value in this column will cause creation of the given number of Sub-Plots in the Plot.

This will apply to all of the columns that have an Attribute Type of Trait or Trait for Sub-Plot.

If a column is NOT marked as Sub-Plot Count, no Sub-Plots will be created during the import of the data file; in this case, if you wish to add a Sub-Plot to a plot, use the popup menu that appears when you touch the Plot icon: plot-icon (during the scoring activity) and choose the Add Sub-Plot option. This presents a list of all the Traits that are defined as Trait for Sub-Plot and one of these may be selected to score for the new Sub-Plot.


Notes For Importing Subplot Data

The TRAIT FOR SUB-PLOT column heading requires a suffix of # with the Sub-Plot number the value is referring to.

For example, importing a Plant_Height value for Sub-Plot three have a heading Plant_Height#3. To import for all of the Sub-Plots available (Number seen in Sub-Plot count column) use #*.
If the trait is a specific instance, the instance number can be imported as you would typically just with the Sub-Plot number appended e.g. Plant_Height:2#3.


Conditions For Importing Subplot Data

The Sub-Plot Count column import will create the provided number of empty Sub-Plots for the specified plots.

  • Trait for Sub-Plot will create the required number of Sub-Plots for the value (‘#Number’).

  • Trait Sub-Plot numbers (‘#Number’) cannot be higher than the Sub-Plot Count if both are being imported.

  • Sub-Plot Count column is required if #* (for all Sub-Plots) is used.

  • There cannot be two identical column names with Trait for Sub-Plot (e.g. Plant_Heigh#1, Plant_Height#1).

  • If #* is used for the same Trait as #Number (e.g. Plant_Height#*, Plant_Height#1) the numbered column (Plant_Height#1) takes priority for the specified Sub-Plot.


Plot Attribute Headings

The items below identify headings that are automatically classified as Plot Attributes by KDSmart.

  • Treatment

  • SelectionHistory, Selection_History

  • ReplicateNumber, Replicate, Rep, Repeticion, REP_NO

  • GenotypeName, Genotype, GID

  • Origin, 0rigin (that’s a “zero”)

  • Designation

  • Type

  • CID, SID, CROSS, SOURCE, ENTRY_NO, SUB_BLOCK, PLOT_NO

  • PedigreeName, Pedigree, BreedersPedigree

  • Type

Note

This list will be replaced by the CSV Import Profile functionality at a later release. Also the spelling of ‘Repeticion’ is required as it is within the ‘code’.


Trial Attribute Headings

The following are automatically classified as Plot Attributes by KDSmart:

  • SiteName, Site, Location

  • SiteYear, Year

  • TrialType, TrialTypeName

  • TrialStartDate, StartDate


Values in CSV files

CSV files may contain data in the various data “columns”.

For columns marked to be imported as Traits, the following rules apply:

  1. Blank values will be treated as un-scored Traits as will the word UNSCORED (the latter is not required unless you want to make un-scored trait values very obvious in a CSV file).

  2. The word MISSING will be imported as a special token denoting a missing value.

  3. The word NA will be imported as a special token denoting “not-applicable” or “not-available” (whatever you take it to mean).

Note

All other values will be checked using the validation rule for the trait. Any failure to pass the validation check will cause the entire import to fail.

If you select a column to be used as the Trial Name, the value in the first data line may not be the same as any existing Trial in the database.

Similarly, if you have not chosen a column as the Trial Name, the name of the file (excluding anything from the last “.” onwards) will be used to check for a pre-existing trial in the database.

Note

Trial Name checks are not case sensitive.

A Simple Example

Example File Construction

For this simple example (line no.s shown for clarity) the CSV file with three lines consists of:

1date sown,Row,Range,MZC,MZP,#Sub-Plots,LLEN
22014-12-15,23,4,,,2,
32014-12-15,21,5,,,0,

Using the “After Sub-Plot Count”, described in Sub-Plot Count and Trait Instances, the following definition could be used in KDSmart:

Note: In the following table, the top heading row is NOT part of the data file.

Simple Example: Three Row CSV File (Header Row and Two Data Rows)

Attribute Type

Plating Date

X-Column

Y-Row

Trait

Trait

Sub-Plot Count

Trait for Sub-Plot

CSV Line 1

date sown

Row

Range

MZC

MZP

#Sub-Plots

LLEN

CSV Line 2

2014-12-15

23

4

2

CSV Line 3

2014-12-15

21

5

0


Example Result

The result of importing this diminutive Trial will be:

  • Only two plots at positions (23,4) and (21,5);

  • Two Traits, MZC and MZP, are being scored for each Plot;

  • The first Plot (CSV Line 2) has two Sub-Plots for which the trait LLEN (leaf length) will be scored;

  • The second Plot (CSV Line 3) has no Sub-Plots requiring scoring.
    However, if while Scoring a Sub-Plot was added to the latter Plot, you will then be asked to score the LLEN Trait for the new Sub-Plot.

Additionally, as no columns in this example were designated as X-Column-Name or Y-Row-Name, the default names of “Column” and “Row” is used to describe the Plot coordinates. Following import, the Trial Details can be edited in KDSmart and the row/column names changed to your choice e.g. “Row” and “Range” respectively, if required to make the nomenclature align more closely to your usual vocabulary. Best practice however, is to name them in the CSV file rather than change them in KDSmart.


Spaces, Case Sensitivity and Blank Lines

When reading and parsing the headers, KDSmart removes all spaces and performs a case-insensitive comparison to match CSV headings with the automatically recognised built-in headings.

It also skips over any blank lines and takes the first non-blank line as the headings line.

Traits

The following table describes the required and optional Trait headings for a CSV file that contains details of Traits that will be used by Trials.

Required and Optional Trait CSV File Headings

Heading

Alternatives

Notes

TraitName

Name

This column heading must be present. All others are optional.
The value should be kept short and, if you wish to use the name in KDXplore with CALC Traits no spaces are permitted and the name must begin with a letter.
We recommend using the underscore ( _ ) character or CamelCase to improve readability.
See also the description of the import option regarding Trait Name Style.

TraitAlias OPT

Alias

If provided, should be a shorter form of the TraitName to use during scoring.

TraitLevel OPT

Level

Default: Plot. Versions before (KDSmart V3.0.3, KDXplore Beta1.1.4 or Prod 2.1.4) TraitLevel could specify only Plot or Sub-Plot. For versions equal to or above, TraitLevel can specify the trait is either for the: Plot Level by using “Plot” or “TrialUnit”; or
the Sub-Plot level by using “Sub-Plot”, “Sub-Plot”, “Individual” or “Plant”.

TraitBarcode OPT

Barcode

Used to identify the Trait when using a barcode scanner for scoring.

TraitUnit

Unit, UnitName, TraitUnitName

You can provide anything you like that will assist you in remembering what to enter while Scoring.

TraitDescription

Description, Desc

A longer description of the Trait.

TraitDatatype

Datatype

One of CATEGORICAL, DATE, ELAPSED_DAYS, INTEGER, DECIMAL or TEXT. See Traits for further details.

TraitValidation

TraitValRule, Validation, ValidationRule

See Traits for further details.

DatabaseTraitId

TraitId

If you provide this column, KDSmart will record the value and export it on request (a reference to the originating database). See Protected Traits below.

Note

A new Trait can be created directly in KDSmart. If you do this and are also uploading or synchronising your data to KDXplore or KDDart you may have to reconcile the differences if there is already a Trait of the same name (compared in a case-insensitive manner).

Overwriting Existing Traits

Entries for existing Traits can be specified in a CSV file. The Traits are matched by Trait Name (ignoring case) and incoming Traits with the same name as one already in the database will be checked for compatibility:

Trait Import Overwriting Behaviours

Trait in Database

Trait in CSV File

Action Taken

Data Type is TEXT

Any Data Type

The Trait Data Type in the CSV file will replace the data type in the database.

Any other data type

Data Type is TEXT

A warning is issued identifying the line and the import continues excluding the line from the CSV file.

Any other data type

Data Type is not TEXT

A warning is issued if the data types are not the same (and the import continues)

Description

Description

Descriptions are truncated to the current database limit then compared for equality.

If they are the same (ignoring case), a warning will be issued but the new Trait’s description will replace the current one in the database. Otherwise:

  • CHOICE: must have the choices or the old choices must be a subset of the new choices

  • INTEGER and DECIMAL: the limit “exclusions” must be identical, however the new Trait’s limits may be “wider” than the current Trait in the database

If any of incompatibilities above are identified, a warning is issues and the import continues without altering the current Trait in the database.


Protected Traits

If a Trait has been imported with a DatabaseTraitId, it is deemed to be “protected”.

In this case you may only edit the Trait Alias and changes to the validation rule are not permitted except for INTEGER and DECIMAL Traits where you may change the range of acceptable values but only to make the range smaller.

For example, if an INTEGER Trait has been defined to have values from 0 to 100 you may only change the lower limit to be 1 or more and the upper limit to be 99 or less.


Calculated Traits

These have the Trait Data Type of CALC. No measurements are collected by KDSmart for these Traits.

The value may only be viewed in KDXplore during data curation. KDXplore is able to calculate Trait values from other Traits via entering a formula (similar to Excel).

Note

New Traits cannot be derived from Trait names containing spaces. Recommendation that underscores are used in place of spaces for Trait names, especially if they are intended for use in calculations.

See: Calculated or Derived Traits

Tags

The following details the required and optional headings in a CSV file that contains details of Tags that will be used to annotate Plots and Sub-Plots when Scoring a Trial.

Heading

Alternatives

Mandatory

Description

Label

TagLabel, Comment

Yes

Name of the Tag e.g. BD for bird damage

Description

TagDescription, Desc, CommentDescription

Yes

Tag description

DatabaseLabelId

LabelId, Id

No

Useful for cross-reference to the originating database


Sub-Plot Level Trait Import Examples

The following six cases illustrate part of the input CSV file and how the Trait will be represented in KDSmart. The last example, case 7, shows what not to do!

l-case1

l-case2

l-case3

l-case4

l-case5

l-case6

l-case7

Excel Formula for Converting ISO-8601 Timestamp to Excel Date

When KDXplore or KDSmart use a timestamp for a scored trait value the timestamp is recorded in a format compatible with ISO-8601 format.

This formula should work for any timestamp produced by KDXplore or KDSmart, but is not universally applicable to all date/times produced in the ISO-8601 format format.

Using this format means that you can correctly compare data collected in different time zones.

Unfortunately Microsoft Excel does not automatically recognise this format. However, using the formula below, you can easily convert the date/time string in ISO-8601 format into an Excel date/time value and render this in a more useful form by using an appropriate cell format (e.g. yyyy-mm-dd hh:mm:ss).

The formula is:

=DATEVALUE(LEFT(A1,10))
+ TIMEVALUE(MID(A1,12,8))
+ IF("Z"=MID(A1,20,LEN(A1)-19),0,(INT(MID(A1,20,LEN(A1)-19)/100)*60+MOD(MID(A1,20,LEN(A1)-19),100))/1440)

If you wish to know what the formula is doing, the explanation is below. The formula above is split across three lines to facilitate the following explanation.

Excel’s internal date time format is a real number where the part before the decimal point is the number of days and the portion after the decimal point is the fraction of a day.

The first line of the formula extracts the year/month/day portion and converts it into the full days portion of the result

The second part of the formula extracts the hours/minutes/seconds portion and converts it into the day fraction portion.

The third and final part obtains the time zone portion of the ISO-8601 timestamp and converts it into another day fraction part. It checks for the special Z time zone that indicates a zero offset. If it is not Z, the time zone is in ±HHMM format so the formula extracts the HH portion separate from the MM portion, computes the total as a number of minutes and divides by 1440 (the number of minutes in a day) to arrive at a final day fraction component that is added to the amount determined by the first two parts of the formula. Any leading plus or minus sign is used to correctly perform this computation.