The first part of the workshop will cover Data Basics:
What is the difference between data and statistics?
Data:
Statistics:
That is a black-and-white description of the difference between data and statistics. In reality, there is a lot of gray area between the two, and where the line falls mostly depends on what the researcher is trying to do.
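Here is a small sketch of that distinction in Python. It assumes the common working definition that data are the individual records and statistics are summaries computed from them. The survey responses, field names, and the `summarize` helper are all made up for illustration.

```python
from statistics import mean, median

# Hypothetical survey responses (made-up values, for illustration only).
# Each dictionary is one respondent's record: this is the raw data.
responses = [
    {"respondent": 1, "age": 34, "household_size": 2},
    {"respondent": 2, "age": 51, "household_size": 4},
    {"respondent": 3, "age": 27, "household_size": 1},
    {"respondent": 4, "age": 45, "household_size": 3},
]


def summarize(records, field):
    """Reduce many individual records to a few summary numbers (statistics)."""
    values = [record[field] for record in records]
    return {"n": len(values), "mean": mean(values), "median": median(values)}


# The summary is what usually gets published in a report or table.
age_stats = summarize(responses, "age")
print(age_stats)
```

The gray area shows up as soon as someone collects summaries like `age_stats` from many surveys and analyzes them together: at that point, one researcher's statistics have become another researcher's data.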
FYI – I am mostly focusing on quantitative data in today's workshop.
I really like the types of data defined by Keller, Lancaster, & Shipp (2017):
"Designed Data are generated in the pursuit of scientific discovery. Designed data include statistically designed data collections, such as data generated from: surveys, experimental designs, registries, and intentional observational collections.
Administrative Data, also referred to as “business practice” data, are collected for the administration of an organization, program, or service processes. These data provide an opportunity for gathering information that exists due to normal economic and social activity. Examples of administrative data include Internal Revenue Service data for individuals and businesses, Social Security earnings records, patent and trademark databases, Medicare and Medicaid health utilization data, banking and other financial data, industrial production processes, such as tracking supply chains end-to-end (Pires et al. 2017), taxi trip data, and local data generated from 911 calls and Emergency Management Services (EMS) responses, property assessment and tax data, and data from health and human services, parks and recreation, libraries, and environmental services (e.g., trash and recycling, water and utilities, projects and planning, transportation, and building permits).
Opportunity Data are data generated on an ongoing basis as society moves through its daily paces. Opportunity data are derived from a variety of sources such as GPS systems and embedded sensors, social media exchanges, mobile and wearable devices, and Internet entries. Captured through a variety of methods including direct flows, Internet searches, web crawling and scraping, these data may exist in a variety of electronic and physical modalities.
Procedural Data are data derived from policies, procedures, and legal requirements; they are the rules and regulations that govern and shape our lives. These policies and procedures affect our work, our personal lives, and society. Examples of procedural data include compensation policies, the Affordable Care Act, the Department of Defense policy “Don't Ask, Don't Tell,” and Supreme Court rulings."
Sallie Keller, Vicki Lancaster & Stephanie Shipp (2017) Building Capacity for Data-Driven Governance: Creating a New Foundation for Democracy. Statistics and Public Policy, 4:1, 1-11, DOI: 10.1080/2330443X.2017.1374897.
These are questions that you need to ask your patron in advance. All of them, or only some, might apply.
What are the kinds of things I look for when I’m reviewing a new dataset?