
Data Science: Finding & Evaluating Datasets

This guide will help Data Science students find information they can use for their studies.

How to find datasets? 

 

Datasets are:

  • most often produced by government agencies or non-profit organizations
  • usually free to view or download
  • located by identifying the agency or organization that focuses on a specific research area of interest

For example, IBISWorld is a good place to look for information on Australian industries, and the Australian Bureau of Statistics is a good source of population data.

Using Google to find datasets

Google Dataset Search looks for datasets in thousands of repositories across the web.

It is useful for finding a broad range of data, such as scientific data, government data, and data provided by news organizations.

Simply enter your search terms, and view the results.

Some datasets may be behind a paywall or require a fee for you to download them.

If so, search for the database's name in the UTS Library catalogue to see if we subscribe to it. Then, in the database, type the title of the dataset you found to view it in full. In the image above, we would search for the database 'Statista' in the Library catalogue. 

To find open data on a country or state in Google, search using the keywords: open data + the name of the country or state (for example, open data Australia).
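
The keyword pattern above can also be assembled in code, which may be handy when you want to repeat the same search for several regions. The following is a minimal Python sketch, assuming the common Google search URL form https://www.google.com/search?q=...; the function names open_data_query and google_search_url are hypothetical helpers introduced for illustration only.

```python
from urllib.parse import urlencode


def open_data_query(region: str) -> str:
    """Build the keyword string: 'open data' plus a country or state name."""
    return f"open data {region.strip()}"


def google_search_url(query: str) -> str:
    """Encode the query into a search URL (spaces become '+')."""
    # Assumption: the standard Google search endpoint with a 'q' parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


# Example: print a search link for one state.
print(google_search_url(open_data_query("New South Wales")))
```

Opening the printed link in a browser runs the same search you would otherwise type by hand.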

Google Open Data search

Google Public Data Explorer is a freely available tool for searching public datasets. It makes large, public-interest datasets easy to explore, visualise, and communicate. Select Help for more information on how to use it.

Google Public Data Explorer search


Evaluating datasets

 

Steps to verify datasets

Apply the same evaluation criteria to datasets as you would to scholarly information.

  1. Who collected it?
    • Examine the explanatory notes of the data
  2. What are the credentials of the data producer?
    • Are they an expert in the field?
  3. Who sponsored the data collection?
  4. When was the data collected?
    • Are these the most recent figures available in the field?
  5. Who was included in the data and who was excluded?
    • Is the data biased, or is it representative of all relevant groups?
  6. Do other sources provide similar findings?
    • Find one or two other sources that support the findings
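
Some of the steps above can be recorded as a simple checklist while you work. The following is a rough Python sketch, not a definitive method: the DatasetNotes record, the open_questions function, and the five-year age threshold are all assumptions made for illustration, and the example values are invented. Steps 2 and 5 (credentials and representativeness) need human judgement, so they are left out of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetNotes:
    """Hypothetical record of what you found while checking a dataset."""
    collector: Optional[str] = None        # step 1: who collected it
    sponsor: Optional[str] = None          # step 3: who sponsored the collection
    year_collected: Optional[int] = None   # step 4: when it was collected
    corroborating_sources: List[str] = field(default_factory=list)  # step 6


def open_questions(notes: DatasetNotes, current_year: int,
                   max_age_years: int = 5) -> List[str]:
    """Return the checklist items that still need attention."""
    issues = []
    if not notes.collector:
        issues.append("collector unknown")
    if not notes.sponsor:
        issues.append("sponsor unknown")
    if notes.year_collected is None:
        issues.append("collection date unknown")
    elif current_year - notes.year_collected > max_age_years:
        # Assumption: older than max_age_years counts as possibly out of date.
        issues.append("data may be out of date")
    if not notes.corroborating_sources:
        issues.append("no corroborating source found")
    return issues


# Invented example values, for illustration only.
notes = DatasetNotes(collector="Example Agency", year_collected=2016)
print(open_questions(notes, current_year=2024))
```

An empty list means every item in the sketch has been addressed; it does not by itself mean the dataset is reliable.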