Introductory Statistics – Chapter 2: Presenting data

July 19, 2019 By Kailee Schamberger



Welcome to the video summary series for the Perdisco introductory statistics textbook. In addition to chapter summary videos such as this one, Introductory Statistics also offers podcasts, a virtual tutor, e-learning homework activities with anti-cheat and auto-grade functionality, and detailed instructor resources. Find out more at perdisco.com/introstats. For now, over to the author.

Hi, I'm Shawn Thompson, and welcome to the second summary in the Perdisco introductory statistics series. In this one, we're going to go over presenting data. In particular, we'll be going over presenting categorical data, presenting numerical data, and presenting relationships.

Now, we could be flippant and say we present data because it looks nice, and this isn't wrong; it's sort of true to an extent. When we have a bunch of raw data, that will be how the data starts off, but it isn't really any good to anyone in that form. When we say we present data, what we mean is that we provide a kind of visual or graphical summary of the data so that we can detect trends in that data.

So we'll start with categorical data, which usually starts life as a list of observed categories in a sample. Let's take an example. Let's say the city you live in has proposed to knock down a local library to make room for a highway. You've surveyed 200 people about their opinion on this proposal, and in the survey people can say they either agree, disagree, or don't care. The data from this survey will be a long list of 200 responses. Now, in this form, you can't quickly answer many questions about the data, questions like: do more people agree with the proposal or disagree with it? So different ways of presenting the data are going to help you do that.

Now, to start with, you can count the data. In particular, for each of the three possible categories, you can count how many times each one is observed in the 200 responses. Then you can present this in a table called a frequency table, and already, just by looking at
this table, we have a bit of a feeling for the data. If we want, we can also provide the relative frequency table, which shows the proportion of data values that fall in each category.

We can put numbers like these into a chart to get more graphical. A common chart is the bar chart. Now, the bar chart is a chart that visually shows the number of values that fall in each category as a vertical bar for each category, and the height of each bar represents the observed frequency of that category. So the bar chart gives the same information as the frequency table, but it's more visual, and we can see what the data is doing.

But there's also the pie chart. The pie chart is a circle divided up into slices, and each slice represents one of the categories in your data. The size of each slice represents the proportion of values that occur in each category. So this chart gives the same information as the relative frequency table, but it does it in a way that lets us quickly look at it and get a feeling for the data.

So we've just gone over presenting categorical data, and now we'll look at numerical data. All the same principles apply: we want to convert plain old raw data into graphics that help us get a better feeling for it. One major difference is that numerical variables tend to be able to assume lots more values, so we'll find that we need to group values together. For example, say you're studying dexterity, and as part of this you get 200 people to catch items tossed to them. Each person continues catching items until they drop one, and you record the number of items they successfully catch. Now, for these 200 data values, there's no limit to what the values could be. Compare this to the survey we mentioned earlier, where there were only three possible responses. To make it easier to manage the numerical data, we often group the values together into classes and then count the number of observed values in each class. This is a frequency distribution table.

Now, like with the categorical data, we can convert
these numbers into a chart, and for numerical data we call the chart a histogram. So here's the histogram for the dexterity study. It's a lot like a bar chart, but because numerical data is more structured than categorical data, we can use the histogram to tell us a lot more about the data. In particular, we can look into things like where the middle of the data is, whether or not the data is symmetric, and whether there is any skew in the data. Now, this talk is just a summary, so I won't go over looking at these things in detail here. If you want to see these aspects of a histogram in more detail, you can read all about them in the Perdisco introductory statistics textbook.

What I'll move on to now is time plots. Sometimes we collect and keep track of data over a time period to see if there are any trends over time. For example, a retail store might record sales figures each month over the course of a year, starting in July. What the store can do in this situation is put these data values into a time plot, which is a graph where the horizontal axis represents time and the vertical axis represents the data being studied. So the retail store fills in points representing the twelve data values, the twelve sales figures they recorded, at the times they recorded them. This is known as time series data, and a time plot is how you present such data.

To finish off this summary, we'll have a look at presenting relationships. Now, this is the topic that comes up when you have two variables and you want to see if the variables are related in any way. How you present the relationship depends on what sort of variables you have. You might have two numerical variables, or two categorical variables, or you could have one of each.

Now, we'll be spending most of our time talking about relationships between two numerical variables. In that situation, we use what is called a scatter plot. Now, a scatter plot is a two-dimensional graph with one axis for each variable. The values of one variable run along
the horizontal axis, and the values of the other variable run along the vertical axis. You draw the scatter plot by placing points on the graph corresponding to the data values you collect.

So let's take an example. Say you're looking at a year of 100 school students, and you want to see if there is a relationship between the score each student got in the mid-year exam and the score they got in their final exam at the end of the year. So for each of the 100 students you have two numerical data values, a mid-year score and a final exam score, and these 100 pairs correspond to 100 points on the scatter plot, like this.

So what does this tell us? Well, the scatter plot can basically tell us two things: what type of relationship exists, and how strong the relationship is. For the type of relationship, we look at the shape of the graph. These points tend to follow a basic straight line, so we would say that the relationship is linear. For the strength of the relationship, we look at how tightly gathered together the points are. The relationship here isn't mathematically perfect, but at the same time the points aren't too scattered, so we would say that the relationship is strong. If the points were scattered wildly, like this one for example, we would say that the relationship is weak. So we use a scatter plot to interpret the nature of the relationship between two variables.

To get some practice doing this, let's do a question from the Perdisco workbook. In this question, some data has been collected for the different offices of the main corporation. In particular, the number of employees at each office and the amount of funding that office receives are recorded for every office in the main corporation. This scatter plot shows the data for funding versus number of employees, and three analysts have studied the plot and offered suggestions for the relationship that exists. You're being asked which description is best. Ted offers the best description, so we'll submit that, and now we see we get personalized feedback and an
explanation for the question.

So that's the relationship between two numerical variables. What about if we have one numerical and one categorical variable? Well, this situation is quite common, because in experiments you can often provide different levels of a categorical variable and then record the level of a numerical variable. In this case, we can still use scatter plots, but they won't look exactly the same. Here's an example of a scatter plot showing the relationship between a numerical variable and a categorical variable.

And finally, what about when looking at relationships between two categorical variables? A scatter plot won't do here. Here we can use what is called a side-by-side bar chart. Let's look at an example. Let's say a survey is studying the relationship between the gender of a voter and which of three candidates that voter will select in an upcoming election. So a thousand men and a thousand women are surveyed about this, giving us two frequency tables of data, one for each gender, and each table breaks down that gender's voting preference. Now, the side-by-side bar chart is essentially two bar charts in one. The two genders are separated out, and for each gender we present bars to show the relative number of votes that each candidate got in the survey for that gender, and we can compare the bars between the two bar charts. So in this way, we can look at the relationship between gender and voter preference.

So that was presenting data. The key topics were presenting categorical data, presenting numerical data, and presenting relationships.
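
For readers who want to try the counting steps described in this summary, here is a minimal Python sketch of a frequency table, a relative frequency table, and a frequency distribution table. It is an illustration only and is not taken from the textbook: the function names, the split of the 200 survey responses, the catch counts, and the class width of 5 are all made-up assumptions.

```python
from collections import Counter


def frequency_table(responses):
    """Count how many times each category is observed."""
    return dict(Counter(responses))


def relative_frequency_table(responses):
    """Proportion of all responses that fall in each category."""
    total = len(responses)
    return {category: count / total
            for category, count in Counter(responses).items()}


def frequency_distribution(values, class_width):
    """Group whole-number values into equal-width classes and count each class.

    Each key is an inclusive (lower, upper) class, e.g. (0, 4) for width 5.
    Classes with no observed values are omitted.
    """
    counts = Counter((value // class_width) * class_width for value in values)
    return {(lower, lower + class_width - 1): counts[lower]
            for lower in sorted(counts)}


# Hypothetical survey data: 200 responses with an invented split.
responses = ["agree"] * 70 + ["disagree"] * 90 + ["don't care"] * 40

# Hypothetical dexterity data: items caught by a handful of people.
catches = [0, 1, 1, 2, 3, 3, 4, 5, 7, 8, 12, 15]

print(frequency_table(responses))
print(relative_frequency_table(responses))
print(frequency_distribution(catches, 5))
```

The heights of the bars in a bar chart would come from the first table, the slice sizes in a pie chart from the second, and the bar heights in a histogram from the third.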