Datasets for Machine Learning

Machine learning and data science projects need datasets to train and test models. Many dataset repositories are available on the Internet. Some of them are free, while others require payment.
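To make the "train and test" idea concrete, here is a minimal sketch of splitting a downloaded dataset into a training part and a testing part. The CSV content, the column names, and the helper names `load_rows` and `train_test_split` are all made up for illustration; a real project would read an actual file from one of the repositories and might use a library routine instead.

```python
import csv
import io
import random

# Hypothetical stand-in for the contents of a downloaded CSV file.
SAMPLE_CSV = """feature_a,feature_b,label
1.0,2.0,0
1.5,1.8,0
5.0,8.0,1
6.0,9.0,1
1.2,0.9,0
7.1,8.5,1
0.8,1.1,0
6.5,7.7,1
1.9,2.2,0
5.8,9.3,1
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = load_rows(SAMPLE_CSV)
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

The fixed seed keeps the split reproducible, so repeated runs train and test on the same rows.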

Here I list some popular dataset resources on the Internet for machine learning projects.




1- Kaggle Datasets Repository

Almost 17,000 datasets are freely available, covering students, marketing, business, cancer, diabetes, plants, social media, sports, and many more topics. You can download the datasets for free after logging in to the website. Kaggle is owned by Google, which is a subsidiary of Alphabet Inc.

2- UCI (University of California, Irvine) Datasets Repository


This is a very popular dataset repository from the University of California, Irvine. A wide variety of datasets is available on this website. To download them, follow the URL in the heading.

3- KDnuggets Datasets Links

KDnuggets does not host a data repository of its own. Instead, it provides links to authentic data sources.

4- Government of India Datasets


The Government of India provides data on policy implementation, health improvement, employment, and other public and state-related policies. These datasets are freely available, and you can use them in your projects.

5- Berkeley School of Information, University of California, Berkeley Datasets

This repository contains the following types of datasets:

1- United States government and demographics

2- International government and demographics

3- Health

5- Technology and APIs

6- Sports and Entertainment

7- General Aggregation Sites



6- Stanford University Datasets

This repository contains datasets mainly about social networks and the Internet.
