Research Data Management: Finding data for your research

Does the data you need already exist?

 

In today's high-volume, data-rich environment, it is worth investigating whether the data you need for your research project already exists. However, finding data can present a few challenges:

  • There are so many places to look - government and organisational sources, published and unpublished academic sources, portals and repositories all house data ... but where to start? Third-party data can be a daunting and frustrating information space
  • A lot of the data is user-supplied - when this happens you tend to get an uneven level of documentation and metadata, which makes both searching for and using the data difficult
  • Technical hurdles have to be negotiated - many data sources require you to be familiar with particular software, or even to know some coding principles, before you can use them
  • Data can be hard to rely upon - if you aren't completely sure how the data was collected, who collected it, and why, then it will be hard to be confident enough in its provenance to use it


Despite this, there are various strategies and techniques you can employ to ensure you get the best quality data available, in a timely and economical fashion. Read on to learn more...

How to find data

 

There is a range of places to look for data, but you could start your search in Google or a similar search engine.

Tips on using Google to find data. 

  • Google Advanced Search (links below) - Advanced Search gives you more control over the way your keywords are searched for, and also lets you refine your search results by location (eg: a certain country), domain type (eg: a .gov or .org website) or filetype (eg: .xls, .kml)
  • Google Dataset Search - Dataset Search looks for structured data, which is often tabular in nature and is described with metadata tags derived from an internationally recognised schema (Schema.org)
    Google Dataset Search
  • You can also use the Library to access free versions of pay-to-view datasets you might find via Google or Google Dataset Search. Examples include Statista, OECD iLibrary and IBISWorld. All three can be found using the Library's Find Databases page.
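
The domain and filetype refinements described above can also be typed straight into the search box using Google's `site:` and `filetype:` operators. The following is a minimal sketch of how such a query could be assembled. The helper name `build_search_url` and the example keywords are illustrative assumptions, not part of any Google tool.

```python
from urllib.parse import urlencode


def build_search_url(keywords, site=None, filetype=None):
    """Build a Google search URL narrowed by domain and file type.

    `build_search_url` is a hypothetical helper written for this guide.
    """
    terms = [keywords]
    if site:
        # site: restricts results to a domain, e.g. gov.au or org
        terms.append(f"site:{site.lstrip('.')}")
    if filetype:
        # filetype: restricts results to a file extension, e.g. xls or kml
        terms.append(f"filetype:{filetype.lstrip('.')}")
    query = " ".join(terms)
    return "https://www.google.com/search?" + urlencode({"q": query})


# Example keywords are invented for illustration only.
print(build_search_url("rainfall statistics", site="gov.au", filetype=".xls"))
```

Pasting the printed URL into a browser should give the same results as entering the keywords, domain and filetype in the Advanced Search form.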

Data repositories

  • Explore platforms like Dryad, Figshare, Zenodo, Dataverse, or discipline-specific repositories depending on your field.  See re3 data below to find other repositories. 
  • re3 data - re3 stands for 'registry of research data repositories'. It is similar to Google Dataset Search in that it searches for structured data with the same metadata underpinning it (Schema.org). However, re3 looks for data repositories rather than discrete datasets. re3 has a browse-by-subject option and a powerful search engine that can restrict your search by geographic location, licence type and more.
    re3 data
  • Research Data Australia - a discovery portal that allows researchers to find, access, and reuse data from over 100 Australian research organisations (including UTS), government agencies, and cultural institutions. It provides descriptions of, and links to, available datasets.
  • Australian Bureau of Statistics (ABS) - provides a variety of data for researchers, including official statistics on a wide range of economic, social, population, and environmental matters.  More detailed Microdata, custom tables via TableBuilder, and integrated datasets on health, education, and employment are available to UTS researchers, and accessible through the ABS's secure platforms such as DataLab.  Create an account using your UTS email address.
  • UTS Library's Data Science study guide - provides a fuller list of data locations, and links to data tools and training. 

Evaluating data

 

Whilst there is no universally accepted way of establishing data quality (ARDC, 2022), there are ways in which you can test and validate the data you find online and elsewhere.

Properties of high-quality data

  • Authors - Ideally you want the people who are creating the data you are using to be as reputable and visible in their field as the academics whose articles and literature you would be citing. Take time to investigate who has produced the data you have found, and the context in which it was created. 
  • Publishers - governmental bodies and accredited industry and professional bodies lend authority to any data they publish. Similarly, data from well-known organisational and academic publishers gives you confidence that nothing has been overlooked in its collection and presentation
  • Links to published research - many researchers prefer data that is connected to a research output, as this suggests that the data has been through peer review. When you find an article and are interested in its data, check whether the authors have supplied the full dataset in a supplement or repository. If needed, you can also write to the author(s) and ask whether they would be willing to share their data with you for your research.
  • Data labelling - data that is well labelled, licensed for reuse, and accompanied by contextual information about how it was collected will give you confidence when relying upon it. Good labelling is also a property of well-managed data, which is further described by the FAIR principles outlined below.
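
The properties listed above can be treated as a simple checklist to run against a dataset's metadata record. The sketch below assumes the record is a plain dictionary. The field names are loosely modelled on the Schema.org Dataset type, and the record itself is invented for illustration, so adjust both to whatever your source actually provides.

```python
# Assumed field names, loosely modelled on Schema.org's Dataset type.
CHECKS = {
    "creator": "Authors: who produced the data?",
    "publisher": "Publishers: which body released it?",
    "citation": "Links to published research",
    "license": "Data labelling: is it licensed for reuse?",
    "description": "Data labelling: how was it collected?",
}


def missing_quality_signals(metadata):
    """Return the checklist items for which the record has no value."""
    return [item for field, item in CHECKS.items() if not metadata.get(field)]


# Invented record, for illustration only.
record = {
    "name": "Example survey results",
    "creator": "Example Research Group",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "description": "",
}

for item in missing_quality_signals(record):
    print("Missing:", item)
```

A record that passes every check is not automatically trustworthy. The checklist only flags where you need to investigate further.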

Australian Research Data Commons. (2022, January 27). Describing dataset quality — towards a global best practice. https://ardc.edu.au/article/describing-dataset-quality-towards-a-global-best-practice/

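Properties of well-managed data (the FAIR principles)

Well-managed data follows the FAIR principles: it is Findable, Accessible, Interoperable and Reusable. A quality repository will also have the elements of TRUST, which stands for Transparency, Responsibility, User focus, Sustainability and Technology.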

Data repositories that meet a high standard of TRUST can apply for a badge called the CoreTrustSeal. Repositories that have the CoreTrustSeal are linked below, along with more information about the TRUST framework.

Data licensing

 

Depending on how you are receiving (or sharing) data, a data licence or a data sharing agreement (or both) may be required. For example, the data you want to use may be released under a Creative Commons ShareAlike licence, which would mean that if you used that data you would have to license any work built on it in the same way. If you acquire data from a corporate or industry source, you may also have to enter into a data sharing agreement with the data providers.

Data sharing agreements

A data sharing agreement brings clarity to what can and cannot be done with the data you are sharing as the owner or accessing as the requester. For example, third-party data providers will often want to:

  • Preserve the confidentiality of their data
  • Prohibit reuse of the data beyond the agreed scope of the research project
  • Have access to the work and the data that you might be producing

When created carefully, a data sharing agreement should lay out these expectations, and give you confidence that you are doing the right thing with the data you've been provided, or that your collaborators will be doing the right thing with yours.

To help you create a data sharing agreement, we've linked the ARDC's Data Sharing Agreement Development Guidelines below.

Data licences

A data licence is similar to a data sharing agreement. However, a licence is applied unilaterally to a work that is made publicly available, whereas a data sharing agreement is negotiated directly between two parties. By applying a data licence to your work, you can therefore control how your data is reused by others, regardless of whether they are in direct contact with you. We have linked a page on data licences from the ARDC below.

  • The content in this section was sourced from the article 'Data licensing: Taking into account data ownership and use' by Thomson Reuters. The full text of the article is linked below.