This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Critical Zone Collaborative Network Data Best Practices

This repository contains best practices developed by the Critical Zone Collaborative Network (CZ Net) for sharing data related to the Critical Zone.

For overall data sharing and publication guidance, please read the following first:

Guidance for all CZ Net Data Publication

For guidance on specific data types, please consultt the following best practice guides:

  1. Sharing Samples and Sample Metadata
  2. Sharing Laboratory Analytical Data
  3. Sharing Geospatial Data
  4. Sharing Environmental Time Series Data
  5. Sharing Multiple Data Types Together

License

The content in this repository is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

CC BY 4.0

Funding and Acknowledgements

This material is based upon work supported by the National Science Foundation (NSF) under awards 2012893, 2012748, and 2012593. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

1 - Example Page

A short lead description about this content page. It can be bold or italic and can be split over multiple paragraphs.

This is a placeholder page. Replace it with your own content.

Text can be bold, italic, or strikethrough. Links should be blue with no underlines (unless hovered over).

There should be whitespace between paragraphs. Vape migas chillwave sriracha poutine try-hard distillery. Tattooed shabby chic small batch, pabst art party heirloom letterpress air plant pop-up. Sustainable chia skateboard art party banjo cardigan normcore affogato vexillologist quinoa meggings man bun master cleanse shoreditch readymade. Yuccie prism four dollar toast tbh cardigan iPhone, tumblr listicle live-edge VHS. Pug lyft normcore hot chicken biodiesel, actually keffiyeh thundercats photo booth pour-over twee fam food truck microdosing banh mi. Vice activated charcoal raclette unicorn live-edge post-ironic. Heirloom vexillologist coloring book, beard deep v letterpress echo park humblebrag tilde.

90’s four loko seitan photo booth gochujang freegan tumeric listicle fam ugh humblebrag. Bespoke leggings gastropub, biodiesel brunch pug fashion axe meh swag art party neutra deep v chia. Enamel pin fanny pack knausgaard tofu, artisan cronut hammock meditation occupy master cleanse chartreuse lumbersexual. Kombucha kogi viral truffaut synth distillery single-origin coffee ugh slow-carb marfa selfies. Pitchfork schlitz semiotics fanny pack, ugh artisan vegan vaporware hexagon. Polaroid fixie post-ironic venmo wolf ramps kale chips.

There should be no margin above this first sentence.

Blockquotes should be a lighter gray with a border along the left side in the secondary color.

There should be no margin below this final sentence.

First Header 2

This is a normal paragraph following a header. Knausgaard kale chips snackwave microdosing cronut copper mug swag synth bitters letterpress glossier craft beer. Mumblecore bushwick authentic gochujang vegan chambray meditation jean shorts irony. Viral farm-to-table kale chips, pork belly palo santo distillery activated charcoal aesthetic jianbing air plant woke lomo VHS organic. Tattooed locavore succulents heirloom, small batch sriracha echo park DIY af. Shaman you probably haven’t heard of them copper mug, crucifix green juice vape single-origin coffee brunch actually. Mustache etsy vexillologist raclette authentic fam. Tousled beard humblebrag asymmetrical. I love turkey, I love my job, I love my friends, I love Chardonnay!

Deae legum paulatimque terra, non vos mutata tacet: dic. Vocant docuique me plumas fila quin afuerunt copia haec o neque.

On big screens, paragraphs and headings should not take up the full container width, but we want tables, code blocks and similar to take the full width.

Scenester tumeric pickled, authentic crucifix post-ironic fam freegan VHS pork belly 8-bit yuccie PBR&B. I love this life we live in.

Second Header 2

This is a blockquote following a header. Bacon ipsum dolor sit amet t-bone doner shank drumstick, pork belly porchetta chuck sausage brisket ham hock rump pig. Chuck kielbasa leberkas, pork bresaola ham hock filet mignon cow shoulder short ribs biltong.

Header 3

This is a code block following a header.

Next level leggings before they sold out, PBR&B church-key shaman echo park. Kale chips occupy godard whatever pop-up freegan pork belly selfies. Gastropub Belinda subway tile woke post-ironic seitan. Shabby chic man bun semiotics vape, chia messenger bag plaid cardigan.

Header 4

  • This is an unordered list following a header.
  • This is an unordered list following a header.
  • This is an unordered list following a header.
Header 5
  1. This is an ordered list following a header.
  2. This is an ordered list following a header.
  3. This is an ordered list following a header.
Header 6
WhatFollows
A tableA header
A tableA header
A tableA header

There’s a horizontal rule above and below this.


Here is an unordered list:

  • Liverpool F.C.
  • Chelsea F.C.
  • Manchester United F.C.

And an ordered list:

  1. Michael Brecker
  2. Seamus Blake
  3. Branford Marsalis

And an unordered task list:

  • Create a Hugo theme
  • Add task lists to it
  • Take a vacation

And a “mixed” task list:

  • Pack bags
  • ?
  • Travel!

And a nested list:

  • Jackson 5
    • Michael
    • Tito
    • Jackie
    • Marlon
    • Jermaine
  • TMNT
    • Leonardo
    • Michelangelo
    • Donatello
    • Raphael

Definition lists can be used with Markdown syntax. Definition headers are bold.

Name
Godzilla
Born
1952
Birthplace
Japan
Color
Green

Tables should have bold headings and alternating shaded rows.

ArtistAlbumYear
Michael JacksonThriller1982
PrincePurple Rain1984
Beastie BoysLicense to Ill1986

If a table is too wide, it should scroll horizontally.

ArtistAlbumYearLabelAwardsSongs
Michael JacksonThriller1982Epic RecordsGrammy Award for Album of the Year, American Music Award for Favorite Pop/Rock Album, American Music Award for Favorite Soul/R&B Album, Brit Award for Best Selling Album, Grammy Award for Best Engineered Album, Non-ClassicalWanna Be Startin’ Somethin’, Baby Be Mine, The Girl Is Mine, Thriller, Beat It, Billie Jean, Human Nature, P.Y.T. (Pretty Young Thing), The Lady in My Life
PrincePurple Rain1984Warner Brothers RecordsGrammy Award for Best Score Soundtrack for Visual Media, American Music Award for Favorite Pop/Rock Album, American Music Award for Favorite Soul/R&B Album, Brit Award for Best Soundtrack/Cast Recording, Grammy Award for Best Rock Performance by a Duo or Group with VocalLet’s Go Crazy, Take Me With U, The Beautiful Ones, Computer Blue, Darling Nikki, When Doves Cry, I Would Die 4 U, Baby I’m a Star, Purple Rain
Beastie BoysLicense to Ill1986Mercury RecordsnoawardsbutthistablecelliswideRhymin & Stealin, The New Style, She’s Crafty, Posse in Effect, Slow Ride, Girls, (You Gotta) Fight for Your Right, No Sleep Till Brooklyn, Paul Revere, Hold It Now, Hit It, Brass Monkey, Slow and Low, Time to Get Ill

Code snippets like var foo = "bar"; can be shown inline.

Also, this should vertically align with this and this.

Code can also be shown in a block element.

foo := "bar";
bar := "foo";

Code can also use syntax highlighting.

func main() {
  input := `var foo = "bar";`

  lexer := lexers.Get("javascript")
  iterator, _ := lexer.Tokenise(nil, input)
  style := styles.Get("github")
  formatter := html.New(html.WithLineNumbers())

  var buff bytes.Buffer
  formatter.Format(&buff, style, iterator)

  fmt.Println(buff.String())
}
Long, single-line code blocks should not wrap. They should horizontally scroll if they are too long. This line should be long enough to demonstrate this.

Inline code inside table cells should still be distinguishable.

LanguageCode
Javascriptvar foo = "bar";
Rubyfoo = "bar"{

Small images should be shown at their actual size.

Large images should always scale down and fit in the content container.

The photo above of the Spruce Picea abies shoot with foliage buds: Bjørn Erik Pedersen, CC-BY-SA.

Components

Alerts

Another Heading

Add some sections here to see how the ToC looks like. Bacon ipsum dolor sit amet t-bone doner shank drumstick, pork belly porchetta chuck sausage brisket ham hock rump pig. Chuck kielbasa leberkas, pork bresaola ham hock filet mignon cow shoulder short ribs biltong.

This Document

Inguina genus: Anaphen post: lingua violente voce suae meus aetate diversi. Orbis unam nec flammaeque status deam Silenum erat et a ferrea. Excitus rigidum ait: vestro et Herculis convicia: nitidae deseruit coniuge Proteaque adiciam eripitur? Sitim noceat signa probat quidem. Sua longis fugatis quidem genae.

Pixel Count

Tilde photo booth wayfarers cliche lomo intelligentsia man braid kombucha vaporware farm-to-table mixtape portland. PBR&B pickled cornhole ugh try-hard ethical subway tile. Fixie paleo intelligentsia pabst. Ennui waistcoat vinyl gochujang. Poutine salvia authentic affogato, chambray lumbersexual shabby chic.

Contact Info

Plaid hell of cred microdosing, succulents tilde pour-over. Offal shabby chic 3 wolf moon blue bottle raw denim normcore poutine pork belly.

Stumptown PBR&B keytar plaid street art, forage XOXO pitchfork selvage affogato green juice listicle pickled everyday carry hashtag. Organic sustainable letterpress sartorial scenester intelligentsia swag bushwick. Put a bird on it stumptown neutra locavore. IPhone typewriter messenger bag narwhal. Ennui cold-pressed seitan flannel keytar, single-origin coffee adaptogen occupy yuccie williamsburg chillwave shoreditch forage waistcoat.

This is the final element on the page and there should be no margin below this.

2 -

Best Practices for Sharing and Publishing CZNet Data

The CZ Hub Team has developed a Data Submission Portal (https://contribute.criticalzone.org) to meet the needs of the CZ Collaborative Network (CZNet). The Data Submission Portal provides guidance regarding which data repositories into which products should be deposited and also provides functionality that allows submission of data to the appropriate repository through the Portal. We recommend that all data and research products be submitted to the appropriate repository through the Portal, where possible. We provide the following additional guidance for submitting data for sharing and publication.

Content of this General Best Practice Document

How to Contact Us

If you need help with data management or have questions about the guidance provided here, you can contact us in a number of ways.

Join Us in the CZNet Slack Workspace

The CZNet Slack workspace was established to foster communication between the thematic clusters and with the CZNet Hub team. If you haven’t joined the CZNet Slack workspace, we encourage you to join us there. There are specific channels set up within the workspace for topical discussions. If you would like to join the CZNet Slack workspace, please email us at help@cuahsi.org.

Contact Us Directly via Email

For specific questions about CZNet data management, please contact us at help@cuahsi.org. Be sure to include “CZNet” in the subject line of the email you send. We will route these emails to the person(s) who we think can best answer your questions.

Which Repository Should I Use?

Our initial focus in developing the CZ Hub Data Submission Portal and for this data publication guidance is on the repositories listed below. We anticipate adding support for additional repositories via the Data Submission Portal as resources allow us to add them. If the supported repositories listed below will not meet your needs, choose a trusted repository that follows international best practices by offering:

  1. A public landing page for your dataset/research product.
  2. A unique identifier, which may be a digital object identifier (DOI), and a persistent URL (the URL does not change) for accessing the landing page for your dataset/product.
  3. A formal citation for your dataset/product.

You can find directories of data repositories at https://www.re3data.org and https://fairsharing.org.

We provide the general guidance with regard to supported repositories below, but we are happy to discuss any questions you may have and make more specific recommendations:

  1. HydroShare: A general purpose repository for submitting water-science related datasets and models. Submit datasets, models, and other research products that are water-related. This includes time series of hydrologic observations, time series of data from weather stations, geospatial datasets, etc. HydroShare is flexible and allows upload of files having any format. HydroShare also includes a linked JupyterHub server if you wish to upload executable Jupyter Notebooks using Python or R with your content. For information about how to submit to HydroShare, you can access HydroShare’s help system at https://help.hydroshare.org.
  2. EarthChem: A repository for submitting data derived from material samples such as soil, sediment, pore water, or rock specimens; cores; and other physical objects. Primary focus is on geochemical, geochronological, petrological, and mineralogical data. EarthChem strongly recommends use of data submission templates available on the EarthChem web site (https://earthchem.org/ecl/templates/) that provide guidance for properly documenting data quality and provenance. Though data can be contributed to the EarthChem Library in a wide range of formats (https://ecl.earthchem.org/fileformat.php), it always needs to be documented with relevant information regarding the analytical data quality. Submission guidelines for EarthChem can be accessed at https://www.earthchem.org/ecl/submission-guidelines/. For help, go to https://earthchem.org/resources/support/earthchem-library-documentation/ or write to info@earthchem.org. Regardless of whether you submit directly to the EarthChem Library or you submit to the EarthChem Library through the CZNet Data Submission Portal, you shoud review and use the templates and guidance for your data files linked here.
  3. Zenodo: A catch-all repository that may be an appropriate place to upload content that is not appropriate for the other repositories. Zenodo does not limit what you can upload and is not domain specific. For more information about Zenodo and submitting data, see https://zenodo.org/record/787445.
  4. SESAR: A sample registry that catalogs sample metadata, sample images, and other information to make samples more discoverable, accessible, and reusable. SESAR allows you to get an IGSN for your samples, a globally unique identifier that is essential for unambiguously citing samples in datasets and publications (https://www.igsn.org). Use SESAR to submit metadata about your samples and obtain IGSN Global Sample Numbers. Resources for researchers can be accessed at https://www.geosamples.org/resources/researchers, and tutorials and FAQs can be accessed at https://www.geosamples.org/resources/help.
  5. OpenTopography: A repository that facilitates community access to high-resolution, Earth science-oriented, topography data and related tools and resources. Submit high resolution topography data acquired with lidar and other technologies. A tutorial describing how to submit data to OpenTopography can be accessed at: https://cloud.sdsc.edu/v1/AUTH_opentopography/www/docs/CommunityDataspaceTutorial.pdf.

If you choose to submit your data or research products to a repository other than the ones listed above, we will still need to know about those submissions. Please register these datasets with the CZNet Data Submission Portal.

Please see our specific guidance for CZNet data and research products below for information on what to do if you submit to a different repository.

General Guidance for all CZNet Data and Research Products to be Submitted

We offer the following general guidelines for submitting your data and other research products. For a full overview, click here.

  1. Use the Best Practices and Templates Described in this Repository: Please follow the specific best practices, data formats, and templates described in this repository.
  2. Start assembling your data now: We recommend that you begin thinking about which products (e.g., data, images, samples, code) you will generate. Assemble and document them as early as possible after generating them (it will save you time later), and register them within one of the above repositories. You need not wait until your data are ready for formal publication. HydroShare, EarthChem, and Zenodo all offer the ability to upload private content, allowing you to capture and store your data safely while your research and publication process is ongoing. Then share and publish when you are ready. Register your samples online in SESAR.
  3. Create a descriptive title for your data product: Put some thought into the title of your dataset. See the guidance in the section below about how to construct a descriptive title for your dataset.
  4. Tag your data with appropriate metadata: Use the repository’s submission form and templates to enter descriptive metadata for your products. Metadata should include information needed to discover and cite your products, but also additional information required to interpret your content (e.g., procedures used to collect or analyze samples or observations).
  5. Use common, open, and accepted data formats: We know that the data you are creating are diverse and that there is not always specific guidance for how to organize your data and which file formats to use. We have provided some links to general data management resources in the section at the end of this document. If you have questions about what to do, please contact us.
  6. Include a “readme” file: A readme file can include detailed information about the structure or content of your data or research product. It can also be used to describe how to perform specific analyses to make your research results more reproducible. If you upload a “readme.txt” or “readme.md” file to your resource in HydroShare, it will be displayed on the landing page for your resource. Using “readme.md” allows you to use markdown formatting in your readme file that will be rendered on the landing page for the resource.

We have included links to additional documentation that may assist you in preparing your datasets and research products for sharing and publication. We are always happy to discuss specific questions you may have.

Specific Guidance for CZNet Data and Research Products

The CZ Hub Team is providing a coordinated view of all CZNet data/research products submitted to reputable data repositories via the CZnet Cataloging and Discovery system https://discover.criticalzone.org. We are using specific information from the metadata of the resources you submit to find and catalog resources submitted to the different repositories (see below). If the information we request below is missing, we may not be able to find your submitted resource and include it in the CZNet metadata index and discovery tools - or, your resource may not respond to the different search and filtering tools we have provided.

NOTE: The CZNet Data Submission Portal guides you to provide the following information to be entered for all datasets and products submitted through the Portal. However, we know that some may choose to submit directly to a repository without using the Data Submission Portal, so it is important to specify this information regardless of how or when you submit your data.

To accomplish this, we ask that you do the following, regardless of which repository you deposit your data in:

  1. Tag your dataset/product with the subject keyword “CZNet”.
  2. Tag your dataset/product with a subject keyword that is unique to your Thematic Cluster project. Use this keyword consistently for all products you share.
  3. Ensure that you enter funding information, including funding agency (National Science Foundation), award number, and, if possible given the metadata elements available, award title. See the additional recommendations below for funding agency information.
  4. Make sure that your dataset is publicly available. HydroShare, EarthChem, and Zenodo enable private content. We may not be able to discover and catalog your datasets/products until you have made them publicly available.

Guidance for HydroShare, EarthChem, and Zenodo

We will be using “community” and “group” functionality within the different repositories to help group and present data from the cluster projects. When you submit to HydroShare, EarthChem, or Zenodo, we ask that you associate your dataset/product with an appropriate Group or Community, depending on which repository you submit to:

  • HydroShare - share your dataset/product with your Cluster’s HydroShare Group. We have created a HydroShare “Group” for each thematic cluster project. You can share your HydroShare resource with your Group by clicking on the “Manage Access” button on the resource’s landing page and giving view or edit access to your cluster’s group. This can be done at any time after you create your resource in HydroShare.
  • EarthChem - associate your dataset/product with the “Critical Zone” community within EarthChem. See https://earthchem.org/communities/cznet/
  • Zenodo - associate your dataset/product with the “Critical Zone Data and Research Products” community (https://www.zenodo.org/communities/czdata/). When you create a new upload, search for “czdata” or “Critical Zone Data and Research Products” to find the correct community.

Guidance for Data/Products Submitted to Other Repositories

We know that our list of supported repositories may not meet the needs of all of the CZNet cluster projects for all of the different types of data products. If you choose to submit your data/product(s) to a repository other than the ones in our list of supported repositories, we will be unable to discover and catalog those products unless you tell us about them. Please make sure that you register these data products with the CZNet Data Submission Portal.

Construct a Descriptive Title for Your Data Product

One of the most important metadata elements for your data product is the title. This piece of information is cataloged to enable discovery of your data product, and it is also included in the citation for your dataset. Anyone who uses your data product will cite your data using the title you construct. Consider the following suggestions when constructing a title for your data product:

  • Make your title brief, but descriptive of the contents of the dataset. Since data products are first class research products - consider giving your data product a title that is similar to the type of title you construct for your journal papers.

Example: Seasonal changes in organic matter composition and nutrients in coastal urban canals of Miami-Date County, Florida

  • You may want to include a geographic area, site name or code, or an indication that the data product is affiliated with your thematic cluster project in the title.

Example: Air temperature data for Dead Run watershed, Baltimore County, MD, USA

  • Consider including brief information about the content of your data product in the title.

Example: Watershed boundaries, hydrography, stream gage locations, and impervious surface area shapefiles for Pennypack Creek watershed in Bucks, Montgomery and Philadelphia Counties, PA, USA

  • Avoid including acronyms in your title.
  • Avoid including special characters (e.g., “&”, “+”, “%”, or characters that are reserved for programming languages) in your title.
  • Avoid including metadata in your title that belongs in other metadata fields. Try not to overload your title by concatenating a bunch of information into a cryptic title that nobody will understand.
  • If your data product is specifically tied to a journal publication (i.e., it contains the exact data used in that publication), consider titling your data product similar to “Supporting data for X” where “X” is the title of the related publication. If you anticipate that your data product will be used by multiple journal publications, a more generic title would be better.

Example: Supporting data and tools for “Impact of data temporal resolution on quantifying residential end uses of water”

Create a Descriptive Abstract for your Data Product

The Abstract for your data product is important as it’s likely the first (and perhaps last) piece of information a prospective user of your data will read before making a determination of whether they want to spend any more time evaluating or accessing your dataset. In general, you should approach writing the abstract for your data product in the same way you would approach writing the abstract for a journal or conference paper. In general, your Abstract should be a single paragraph, and may include the following elements:

  • Brief background information, if needed
  • Brief description of what the resource contains - what data are in the resource?
  • Brief description of the purpose of the data - why was it produced/collected?
  • Brief description of your research design and methods - how was the data produced/collected?
  • Brief description of location - NOTE: Detailed descriptions of data collection sites and locations should be included in a readme file and/or in a data file or shapefile in your content files.
  • Brief description of the variables observed - NOTE: Also consider adding these as keywords.
  • Brief information about what the data show - e.g., what results have been derived from the data?
  • Brief statement of how the data might be useful or how it might be used

Avoid the following when creating your Abstract:

  • Overloading the Abstract with metadata that belongs elsewhere. For example:
    • Do not put citations for related resources in the Abstract, use the Related Resources metadata instead
    • Put funding agency information in the Credits/Funding Agency metadata rather than in your abstract
  • Going too in depth into the description or structure of your resource - use a readme file for that instead
  • Using complex formatting - this is usually lost when being transferred among different systems*

Select Keywords for your Data Product

Keywords are important metadata that help categorize and describe your data product. They are also important in enabling other scientists and prospective data users to discover your data product. Use the following guidelines to select appropriate keywords to describe your dataset:

  • Tag your dataset/product with the subject keyword “CZNet” (see specific guidance for CZNet data products above).
  • Tag your dataset/product with a subject keyword that is unique to your Thematic Cluster project. Use this keyword consistently for all products you share (see specific guidance for CZNet data products above).
  • Add keywords that describe the subject of your data product (e.g., “Critical Zone”, “Streamflow”, “Water quality”).
  • Add keywords that describe the geographic area to which your dataset applies (e.g., “Utah”, “Logan River Watershed”)
  • Capitalize the first letter of keywords. If your keyword is a phrase, capitalize only the first word (e.g., “Water quality”). If your keyword is a proper name, capitalize the first letter of each word (e.g., “Logan River Watershed”).
  • Avoid misspelling or using variants of the same keyword (e.g., use “Temperature” consistently rather than “Temp.”)
  • Full phrases will have more meaning, but add acronyms if you want to enable discovery using that term. For example “Logan River Observatory” is more informative than “LRO”, but some users may want to search using “LRO”. In this case you may want to add both.
  • Keywords may be a single word or several words (e.g., phrases), but keep them very brief. This isn’t the place for sentences.
  • Choose keywords that are commonly used to describe the type of data product you are describing.
  • Avoid special characters (e.g., characters reserved for programming languages), abbreviations, and contractions.
  • Do not use punctuation.
  • You may consider keywords that describe observed variables (e.g., “Temperature”, “Nitrogen”), and/or methods used to produce the data product. However, avoid using units in your keywords (e.g., use “Discharge” not “Discharge (m3/s)”).
  • Where you choose keywords describing observed variables, consider selecting them from the ODM2 VariableName controlled vocabulary http://vocabulary.odm2.org/variablename/.
  • You may consider using keywords that have been used by other, similar data products to ensure that they can be discovered together. You can even test this by searching HydroShare, EarthChem, or other repositories using specific keywords to see what is returned before you select your final set of keywords.

Remember - the keywords you use to describe your data product will be part of the metadata that is cataloged and used to enable discovery of your data product. Choose keywords that you think will help people find your data product. Leave all of the other descriptive metadata for the other descriptive metadata elements (e.g., Title, Abstract, etc.).

Provide Author Information

Given that data are considered primary products of research that are citable for scholarly credit, it is important that you provide full author information for the data/research products that you deposit into a data repository. Author information ends up in the citation for resources and is also included in the CZNet data discovery capabilities, so it is also important that Author information is specified consistently across all submitted resources with which an Author is affilated. For resources submitted through or registered with the Data Submission Portal, we suggest the following format for Author names:

Family, Given

For example:

Horsburgh, Jeffery S.

The researchers you work with may have standardized on a particular way of expressing their name so that all scholarly products are correctly attributed to them. We encourage anyone who is submitting resources on behalf of another party (e.g., Cluster Data Managers) to check with investigators to ensure that Author names are expressed correctly prior to permanently publishing resources. Once a resource is published and a DOI is issued, author information generally cannot be changed.

It is best practice to also provide affiliation information for each author. Affiliations should be specified as the affiliation under which the resource was created. We recommend that Data Managers work with authors to ensure that affiliation information is entered correctly and consistently (e.g., use the same Organization Name, Address, etc. for each resource with which an Author is affiliated to avoid ambiguity).

Provide Spatial and Temporal Coverage Information

Spatial and temporal coverage information is important for people who may be searching for data collected within or near a particular place or region and/or that were collected within or are representative of a specific time period. The purpose of coverage information is not to visualize the spatial or temporal qualities or complexity of your data. Instead, it is meant to provide viewers with information about where the spatial and/or temporal region to which the data are applicable. The following are recommendations for specifying spatial and temporal coverage for the different repositories supported by CZNet:

  • HydroShare: HydroShare represents spatial coverage information as a bounding box or a discrete point in space. You can choose which to use, and you can input the coordinates for the bounding box or point manually or by using the map control to draw the point or box on the map. HydroShare allows you to enter the temporal coverage as a time period by choosing a starting and ending date.
  • EarthChem: The EarthChem repository allows you to specify the spatial coverage as “Global”, “Regional (Continents, Oceans)”, or “Space/Planetary” along with adding geographic keywords. EarthChem then draws coverage information from the coordinates associated with the dataset’s individual samples. Samples must be registered with SESAR and receive IGSNs for this association to work correctly.
  • Zenodo: The Zenodo repository does not include spatial or temporal coverage information as part of its metadata. If you want to share a resource in Zenodo and you want to specify spatial or temporal coverage information, you could add this information to the “Additional notes” metadata field in Zenodo.
  • Other Repositories: If you are using the Data Submission Portal to register a dataset from a different repository, specifying the spatial and temporal coverage is the same as for the HydroShare repository.

NOTE: The CZNet Cataloging and Discovery system includes a filter for data temporal coverage. Given that specifying the temporal coverage of your dataset is optional, only datasets that have information for temporal coverage will respond to this filter. The CZ Hub team is also working spatial filtering capability, which, similar to temporal coverage, will only work for those datasets for which spatial coverage information is specified.

Provide Funding Agency Information

It is best practice to provide an acknowledgment to the agency or funding source that provided the support under which data/research products were produced. For CZNet Thematic Cluster projects, this is the NSF award that funded the Cluster project. Several of the Thematic Cluster projects have collaborative awards, which means that there are multiple NSF award numbers.

We recommend that you provide full funding agency information for each data/research product that is submitted to a repository. For each product, you should provide the following information:

  • Agency Name - e.g., “National Science Foundation”
  • Award Title - e.g., “Collaborative Research: Network Hub: Enabling, Supporting, and Communicating Critical Zone Research”
  • Award Numver - e.g., “2012748”
  • Funding Agency URL - e.g., “https://www.nsf.gov/awardsearch/showAward?AWD_ID=2012748

For those Thematic Cluster projects that have more than one NSF award, where appropriate we recommend adding multiple funding agency metadata elemements for each award. We also recommend this for products that may have been co-funded by multiple projects and/or agencies. You should include a funding agency element for each one.

The CZNet Catalog and Discovery system uses the NSF award numbers to match your data products to your Thematic Cluster project. Thus, if you want your data products to respond to the CZ Project filter in the Catalog and Discovery system, you must include at least one of your Thematic Cluster project’s NSF award numbers in the funding agency metadata.

Use Common Quality Control/Data Levels

We suggest a set of standardized quality control/data levels that can be used to describe data. These data levels are independent of data type. Including a specific data level as part of the metadata for a dataset will help potential users understand how the data can/should be used. Suggested data levels are listed in the following table.

LevelDefinitionExplanation
0Raw dataRaw and unprocessed data and data products that have not undergone quality control. Depending on the variable, data type, and data transmission system, raw data may be available within seconds or minutes after the measurements have been made. Examples include real time precipitation, streamflow, and water quality measurements.
1Quality controlled dataQuality controlled data that have passed quality assurance procedures such as routine estimation of timing and sensor calibration or visual inspection and removal of obvious errors. An example is USGS published streamflow records following parsing through USGS quality control procedures.
2Derived productsDerived products that require scientific and technical interpretation and may include multiple-sensor data. An example is basin average precipitation derived from rain gages using an interpolation procedure.
3Interpreted productsInterpreted products that require researcher driven analysis and interpretation, model-based interpretation using other data and/or strong prior assumptions. An example is basin average precipitation derived from the combination of rain gages and radar return data.
4Knowledge productsKnowledge products that require researcher driven scientific interpretation and multidisciplinary data integration and include model-based interpretation using other data and/or strong prior assumptions. An example is percentages of old or new water in a hydrograph inferred from an isotope analysis.

Use Consistent Missing Data Labels (NoData Values)

Missing data labels, or NoData values, are values that you insert into your data to indicate that a value for an observation should exist but for some reason doesn’t. Usually, these values are selected such that they are easily recognizable and out of the valid range of data values for whatever the observed variable is. We recommend the following practices for CZNet datasets

  • If you use a NoData value, make sure you define it in associated metadata
  • The preferred NoData value for CZNet datasets is -9999
  • Don’t mix strings and numeric values in a single data column. For example, don’t insert a NoData value of “NA” into a numeric data column.
  • Work to educate students, colleagues, and collaborators what -9999 means in a dataset to minimize potential errors in analyses.

Linking Datasets with Publications

We strongly recommend creating formal linkages between the datasets you have deposited within the above mentioned repositories and any manuscripts that you submit for publication that use or are based on the data. This ensures that anyone who discovers the paper can link to and access the data used and anyone who discovers the data can link to and access publications that have used the data.

After depositing your datasets within one of the above repositories we recommend formally citing the data within any research manuscripts based on the data that you submit for publication. Recommendations in the Author Guide for the journal to which you submit may determine how datasets are cited in the text of the paper. In the absence of specific instructions from the journal, we recommend citing data within the text of the paper using the journal’s citation style (e.g., Author(s), Year) and including a full bibliographic citation to the dataset within the references section of the paper.

Once a paper has been accepted for publication and a digital object identifier (DOI) or citation has been issued for the paper, you should modify the metadata for the dataset in the repository to include a link to the paper as a “related resource.”

Potentally Useful Data Management Resources

The following references are provided for publications that you may find useful in considering the data management needs of your thematic cluster as well as cross-cluster and other collaborations:

Guidelines for Structuring and Formatting Data

Borer, E.T., Seabloom, E.W., Jones, M.B., Schildhauer, M. (2009). Some simple guidelines for effective data management, Bulletin Ecological Society of America, 90(2), 205-214, https://doi.org/10.1890/0012-9623-90.2.205

Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. The American Statistician, 72(1), 2–10. https://doi.org/10.1080/00031305.2017.1375989

Goodman, A., Pepe, A., Blocker, A. W., Borgman, C. L., Cranmer, K., Crosas, M., Di Stefano, R., Gil, Y., Groth, P., Hedstrom, M., Hogg, D. W., Kashyap, V., Mahabal, A., Siemiginowska, A., & Slavkovic, A. (2014). Ten Simple Rules for the Care and Feeding of Scientific Data. PLoS Computational Biology, 10(4), e1003542. https://doi.org/10.1371/journal.pcbi.1003542

Hart, E. M., Barmby, P., LeBauer, D., Michonneau, F., Mount, S., Mulrooney, P., Poisot, T., Woo, K. H., Zimmerman, N. B., & Hollister, J. W. (2016). Ten Simple Rules for Digital Data Storage. PLOS Computational Biology, 12(10), e1005097. https://doi.org/10.1371/journal.pcbi.1005097

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), Article 10. https://doi.org/10.18637/jss.v059.i10

Guidelines for Citing Data

Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLOS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416

Guidelines for Making Data More Reusable

White, E., Baldridge, E., Brym, Z., Locey, K., McGlinn, D., & Supp, S. (2013). Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution, 6(2), Article 2. https://doi.org/10.4033/iee.2013.6b.6.f

Guidelines for Selecting a Data Repository

Sansone, S.-A., McQuilton, P., Cousijn, H., Cannon, M., et al (2020). Data Repository Selection: Criteria That Matter. Zenodo. https://doi.org/10.5281/zenodo.4084763