Introductory Statistics – Chapter 1: Introduction

June 28, 2019 By Kailee Schamberger



Welcome to the Perdisco summary series for Introductory Statistics, a textbook written [author and university unclear] for college-level statistics. Perdisco Introductory Statistics is a comprehensive resource that includes videos such as this one, podcasts, a virtual tutor, [unclear] activities with [unclear] functionality and automated marking, lecture slides and e-learning resources. Find out more at [web address unclear]. Then over to you.

Hi, I'm Shawn Thompson, and welcome to the first summary in the Perdisco Introductory Statistics series. In this one we're just going to go over the basic concepts of statistics. But before we even get to that, we have to ask ourselves: why? Why study statistics? What's our motivation for doing that?

Well, a pretty simple motivation is that we're interested in something, and that something can be pretty much anything, and we just want to be able to explain it. Let's take a political example. Let's say there's a federal election coming up between two major rivals, Bill and Bob, and let's say you work on Bob's campaign. Bob is the anxious type, so even though the election is a month away, one day Bob comes up to you and asks, "Am I going to do it? Am I actually going to win the election?" Well, using statistics, you are going to help answer Bob's question.

The name that we give to the thing that we are interested in and would like to explain is the population. So what's Bob's population? Well, Bob's really interested in all the votes, because if he knows how all the votes are going to go, then he's already answered his question. So Bob's population is the set of all votes. Now, that's all votes across the entire federal election, so it's a rather big population, and Bob's going to want to know as much as he can about it. The characteristics of the population that he wants to know, we call them parameters. The idea is that Bob wants to know the parameters of his population. But if you think about it, and this is kind of the whole idea behind statistics,
Bob probably doesn't know many parameters, because the population is too big. It's too big to look at and analyze and know anything about. So what Bob's probably going to do, if he's smart, is send you out to run a survey. You might survey, say, 100 people and ask them how they're going to vote. In doing that, you're collecting what we call a sample. A sample is a subset of the population that is looked at and analyzed, so we do know things about it, and the more we know about it, the more we can try to conclude about the whole population.

In a way, that's the motivation for studying statistics. And in a way, statistics comes in two halves. On the one hand, statistics is all about data: collecting it, looking at it and analyzing it. That's called descriptive statistics. But then there's inferential statistics, which is all about drawing conclusions from the data about the whole population, which is the thing we're interested in after all.

Now, we've just used the word data, and it's a word that you've probably heard before and are probably quite familiar with. But what exactly is data? Well, in statistics, data are the observations we make when we go out and collect a sample. To be more precise, data are the observed values of the variable that we're looking at. A variable is just any characteristic of interest that can take different values. So Bob's variable would be "Who are you going to vote for?", because that will vary from person to person. When you go out and run your survey, you're going to ask a hundred people who they're going to vote for, and you're going to get a hundred answers. Those answers are your data.

So that's data, and there are a lot of different types of data. This diagram shows the main types. On the left-hand side we have categorical data, which is data that takes qualitative values, and on the right-hand side we have numerical data, which is data that takes quantitative values, which basically means numbers. Categorical data can
be measured on a nominal scale, which means that there's no natural order to the different options that the data can take, or it can be measured on an ordinal scale, which means that there is a natural order to the different options. Numerical data can be measured for a discrete variable, which means the different values are countable, or it can be measured for a continuous variable, which means that the different values exist on an ordered, continuous spectrum.

To get some practice at remembering these different types of data, let's use the Perdisco workbook and have a go at a question. Here we have a question where the Northstar Motor Corporation has handed out a survey to its customers. In the survey they asked two questions, both of which produce categorical data, and you're being asked what types of categorical data they are. So we'll submit our answers: the first question produces nominal data and the second question produces ordinal data. We submit that, and now we get personalized feedback for our answers and an explanation for the question.

So that's data. But how do we get data? How do we collect it? Well, there are two main methods of data collection available to us: observational studies and experiments. In both methods we basically get data by observing responses from the subjects of the study, but the way in which they differ is in how we treat those subjects. In an observational study we basically don't treat them; we just observe them. So an observational study could be a survey, for example. As another example, say you're studying blood pressure across the country. What you might do is go out and collect 200 people and then measure the blood pressure of all those people. The values that you write down would be your data, and that would be an observational study, because you've just been trying to observe the people; you haven't been trying to affect them or their blood pressure. But in an experiment you are
trying to affect the subjects of the study, because you're trying to test the effect of one variable on another. So you do provide controlled treatments to the subjects. If you're testing, for example, the effect of caffeine on blood pressure, then what you might do is go out and collect people again, but this time split them into two groups of 50, provide a dose of caffeine to the first group and a placebo to the second, and then measure the blood pressure of everyone and see if there's any difference between the two groups. If there is a difference, you might conclude that caffeine does affect blood pressure. That's an advantage of experiments over observational studies: experiments allow you to establish cause-and-effect relationships.

If you are going to run an observational study, the first thing you're going to have to do is actually collect a sample. If you want to survey a hundred people, then you're going to have to go and find a hundred people so you can survey them, and that has to be done correctly. You can't just go and survey your family and friends, because your family and friends probably don't represent the entire population you're trying to explain. In general, when the sample that you collect fails to represent the entire population, we call that bias. There are lots of different kinds of bias: there's non-response bias, there's undercoverage, there's non-random sampling. This talk is just a summary; if you want to learn more about those kinds of bias and how to avoid them, you can read about them in the Perdisco Introductory Statistics textbook.

Right now I'll just go over non-random sampling. That's what I was talking about when I mentioned surveying your family and friends. You aren't allowed to do that; you have to choose the members of the sample from the population in a completely random fashion. The simplest way to do that is to use simple random sampling. That means getting all the members of the population in a big long list and just
choosing members from that list on a completely random basis, using some sort of random number generator. That method is simple, but it's actually quite time-consuming, so we often use variations on simple random sampling. Common variations are systematic sampling, stratified sampling and cluster sampling. Again, this is just a summary, so if you want to learn more about those sampling techniques you can read about them in the Perdisco Introductory Statistics textbook.

So that's collecting a sample for your observational study. But what about experiments, the other main method of data collection we have? Well, yes, experiments do have their own considerations, because we aren't just observing the subjects in an experiment; we're actually treating them as well. I went over the basic nature of an experiment when I talked about testing the effect of caffeine on blood pressure, so right now I'll just go through the other main considerations that we have when we're designing an experiment.

Firstly, as with a sample, it is absolutely vital that you randomize experiments. Here that means assigning the subjects to the different groups, to receive the different treatments, in a completely random fashion. Otherwise you're going to bias your results again.

Secondly, it can be appropriate to block experiments. That's when you think there are different demographics in the population, and in the group of experimental subjects, that might affect the results of the experiment. Say you think that men and women might naturally have different blood pressures. When testing the effect of caffeine on blood pressure, you might want to split men and women up and run separate experiments on the two genders. That way the effect of gender on blood pressure won't get mixed up with the effect of caffeine on blood pressure.

And finally, it is also important to use a control group in an experiment. That's what I was talking about earlier when I mentioned giving a placebo to
a group. A placebo is a fake dose of a drug, and in general a control group is a group that receives no treatment or a trivial level of treatment. Control groups are important to make sure that the variable you think is causing an effect is the one that's actually causing it. Say, for example, you look at the group that receives a placebo and they don't have heightened blood pressure. Then if you look at the group that receives caffeine and they do have higher blood pressure, you can be pretty sure that the caffeine is what's causing it.

So that's Chapter 1, Introduction to Statistics. The key topics were statistical concepts, data, collecting data, sample design and experimental design.
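
As a rough illustration of the sample design and experimental design ideas covered in this summary (simple random sampling, random assignment to groups, and blocking), here is a minimal Python sketch. It does not come from the video or the textbook. The voter list, the subject records, the function names and the seeds are all hypothetical, and Python's standard `random` module simply stands in for "some sort of random number generator".

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n distinct members from the full list, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


def randomize_groups(subjects, seed=None):
    """Shuffle the subjects, then split them into two halves."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def blocked_groups(subjects, block_of, seed=None):
    """Randomize separately inside each block (for example, each gender)."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    return {
        label: randomize_groups(members, seed=rng.randrange(2**32))
        for label, members in blocks.items()
    }


# Hypothetical population: one entry per voter on a made-up roll.
voters = [f"voter_{i}" for i in range(10_000)]
sample = simple_random_sample(voters, 100, seed=1)

# Hypothetical experiment: 100 subjects, each tagged with a gender label.
subjects = [(f"subject_{i}", "F" if i % 2 == 0 else "M") for i in range(100)]
caffeine, placebo = randomize_groups(subjects, seed=2)
by_gender = blocked_groups(subjects, block_of=lambda s: s[1], seed=3)

print(len(sample), len(caffeine), len(placebo))
print({label: (len(a), len(b)) for label, (a, b) in by_gender.items()})
```

The seeds are fixed only so the sketch gives the same split every time it runs. In the blocked version, each gender is shuffled and halved on its own, so both the caffeine and the placebo group end up with the same mix of men and women.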