Academic Torrents was founded to address the needs of science in the era of big data. It is a scalable platform using BitTorrent which distributes the cost of hosting data in order to prevent the rise and fall of dataset hosting providers and the erasure of the data they host. Researchers are empowered to mirror data they are working with and share large datasets without the large costs typically associated with commercial providers.
Academic Torrents is a product of the Institute for Reproducible Research (a U.S. 501(c)(3) nonprofit).
UCI Machine Learning Repository of Datasets
Currently, nearly 588 datasets are maintained on the UC Irvine Machine Learning Repository site. View all datasets, see newest datasets, and view most popular datasets. Datasets are organized for browsing alphabetically by subject, but users can also sort using the faceting tools on the left sidebar by attribute type, data type, subject area, number of attributes, number of instances, and format type.
Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. Data.gov provides descriptions of the Federal datasets (metadata), information about how to access the datasets, and tools that leverage government datasets. The data catalogs will continue to grow as datasets are added. Federal, Executive Branch data are included in the first version of Data.gov.
Use the World Development Indicators database (restricted to current Bentley students, faculty and staff) to create your own datasets.
This database contains information on 264 countries.
Use the OECD.Stat Extracts to access their selection of datasets (available without a subscription).
The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) launched a new internet-based data service for the global user community. It brings UN statistical databases within easy reach of users through a single entry point. Users can now search and download a variety of statistical resources of the UN system.
The Data Catalog provides download access to over 7,000 indicators from World Bank datasets. The World Bank's Open Data initiative is intended to provide all users with access to World Bank data. The data catalog is a listing of available World Bank datasets, including databases, pre-formatted tables and reports.
The datasets contained on this page were compiled for World Bank research, and are provided free of cost to foster the creation of new knowledge.
The Federal Reserve Economic Data (FRED) contains 793,000 economic time series from 104 sources since 1991. Download, graph, and track economic data.
The Bureau of Labor Statistics (BLS) has data on the following: Inflation & Prices, Employment, Unemployment, Pay & Benefits, Spending & Time Use, Productivity, Workplace Injuries, International, and Employment Projections.
Tables cover topics including: the Index of Consumer Sentiment, Annual Trends in Household Financial Situation, Probability of Personal Income Increase During the Next Year, Change in Likelihood of Comfortable Retirement, Expected Change in Unemployment During the Next Year, Reasons for Opinions for Buying Conditions for Vehicles, and Expected Change in Home Values During the Next Year.
The ERIM database is data collected by the now-defunct ERIM division of A.C. Nielsen on panels of households in two midsized Midwestern cities. Information is available on the purchases of households in a number of product categories along with household demographic information.
From 1989 to 1994, Chicago Booth and Dominick's Finer Foods entered into a partnership for store-level research into shelf management and pricing. Randomized experiments were conducted in more than 25 different categories throughout all stores in this 100-store chain. As a byproduct of this research cooperation, approximately nine years of store-level data on the sales of more than 3,500 UPCs is available in this database. This data is unique for the breadth of its coverage and for the information available on retail margins.
bayesm is a software package for Bayesian analysis of many models of interest to marketers. In addition, bayesm contains a number of interesting datasets, including scanner panel data, key account level data, store level data and various types of survey data. bayesm is an R package which can be downloaded from the CRAN network of mirror sites around the world. Users running R can install bayesm automatically from within R.
The Consumer Expenditure Survey (CEX) provides information on the buying habits of American consumers, including data on their expenditures, income, and household characteristics. The survey data are collected for the Bureau of Labor Statistics by the U.S. Census Bureau. Free resource.
The Consumer Expenditure Survey (CE) collects data on expenditures, income, and demographics in the United States. The public use microdata (PUMD) files provide this information for individual respondents without any information that could identify respondents.
The MPC is one of the world's leading developers of demographic data resources. Population data include international and national harmonized data from 1960 onward, harmonized data from the Current Population Survey, the North Atlantic Population Project, the National Historical Geographic Information System, and the American Time Use Survey-X.
Data sets on education are freely available through a number of sites. The National Center for Education Statistics (NCES) collects, analyzes, and makes available data related to education in the U.S. and other nations. Below you will find links to a few of its data sets.
The 2003 National Assessment of Adult Literacy is a nationally representative assessment of English literacy among American adults age 16 and older. To access the data sets, click on "Data Files" from the left menu.
With its focus on schools and school personnel, the Schools and Staffing Survey (SASS) emphasizes teacher demand and shortage, teacher and administrator characteristics, school programs, and general conditions in schools. SASS also collects data on many other topics, including principals' and teachers' perceptions of school climate and problems in their schools; teacher compensation; district hiring practices and basic characteristics of the student population.
To access data sets, click on the text "Data Products" (text in RED, located under the title of the site "Schools and Staffing Survey"): http://nces.ed.gov/surveys/sass/dataproducts.asp
The School Survey on Crime and Safety (SSOCS) is the primary source of school-level data on crime and safety for the U.S. Department of Education, National Center for Education Statistics (NCES). The SSOCS is a nationally representative cross-sectional survey of about 3,500 public elementary and secondary schools.
Access data sets by selecting Data Sources from the blue menu box on the left side of the screen: http://nces.ed.gov/surveys/ssocs/data_products.asp
Data sets are available through the Journal of Statistics Education (JSE) data archive. The "Datasets and Stories" department of the Journal of Statistics Education provides a forum for exchanging interesting datasets and discussing ways they can be used effectively in teaching statistics.
ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
Pew Research Center makes its data available to the public for secondary analysis. Datasets exist for the following Pew Projects: Pew Research Center for the People & the Press, Pew Research Center’s Journalism Project, Pew Research Center’s Hispanic Trends Project, Pew Research Center’s Global Attitudes Project, Pew Research Center’s Internet & American Life Project, Pew Research Center’s Social & Demographic Trends, and Pew Research Center’s Religion & Public Life Project.