# Statistical Diagrams: Histograms, Boxplots, Stem-and-Leaf Diagrams and Time Sequence Plots

Statistical diagrams, specifically histograms, boxplots, stem-and-leaf diagrams and time sequence plots, are useful in determining what type of distribution characterizes a particular set of data. These diagrams are used frequently in the “real world” of chemical engineering. In this workshop, your group will examine these four statistical plots and how they can be used to extract information from a set of data. You will use the set of cylinder-length measurements given at the end of this workshop. These measurements were made with an accurate and precise caliper set.

Directions

See the table of cylinder lengths at the end of this document.

1. Create a stem-and-leaf diagram for your data in the space below.

   (Space for the stem-and-leaf diagram: ten rows, each with a stem column and a leaf column.)

   Do any of the data look like potential outliers? If yes, remove these data from your set and recreate the stem-and-leaf diagram in the space below.

   (Space for the revised stem-and-leaf diagram: ten rows, each with a stem column and a leaf column.)

2. Now create a boxplot of your complete data set. To do this, you first need to find the sample “interquartile range,” which should contain roughly 50% of the sample data points.

   a. Find the location of the median in the sample, where $n$ is the sample size: location of median $= (n+1)/2$.
   Median location: __________

   b. Round the median location DOWN to the nearest whole number and call the result $X$.
   $X$: __________

   c.
   Find the “quartile” location, defined as $q = (X + 1)/2$ ($q$ need not be an integer).
   $q$: __________

   d. In the table on the attached page, arrange your data in order from the highest value down to the lowest value, and number the values from top to bottom, 1 to 30. Locate the median in your data.
   Median value: __________

   e. Find $q_1$ by counting up $q$ values from the bottom of the arranged data series. For example, if $q$ were 3, you would count three values up from the bottom and label that value $q_1$. If $q$ is not an integer, $q_1$ is the average of the data points in the positions just above and below the $q$ location.
   $q_1$: __________

   f. Find $q_3$ by counting down $q$ values from the top of the arranged data series and labeling the data point in that position $q_3$. As in part e, if $q$ is not an integer, let $q_3$ equal the average of the data values in the positions just above and below the $q$ location.
   $q_3$: __________

   g. Find the sample interquartile range, $\mathrm{iqr} = q_3 - q_1$. This range contains close to 50% of the data.
   iqr: __________

3. With these values, you are ready to construct your boxplot.

   a. Using the attached chart, construct a linear, horizontal scale. Make sure the scale is large enough to include all your data plus about 25% on each side, and choose a convenient increment for it.

   b. Label the points $q_1$, $q_3$ and the range iqr on this scale.

   c. Find the “inner fences”: $f_1 = q_1 - 1.5\,\mathrm{iqr}$ and $f_3 = q_3 + 1.5\,\mathrm{iqr}$.
   $f_1$: __________ $f_3$: __________
   Label these points on your scale.

   d. Find the adjacent values. $a_1$ is the data point closest to $f_1$ without being less than $f_1$; $a_3$ is the data point closest to $f_3$ without being greater than $f_3$.
   $a_1$: __________ $a_3$: __________
   Mark $a_1$ and $a_3$ on your scale with an x.

   e.
   Find the “outer fences”: $F_1 = q_1 - 2(1.5)\,\mathrm{iqr} = q_1 - 3\,\mathrm{iqr}$ and $F_3 = q_3 + 2(1.5)\,\mathrm{iqr} = q_3 + 3\,\mathrm{iqr}$.
   $F_1$: __________ $F_3$: __________
   Label these points on your scale.

   f. Draw a box from $q_1$ to $q_3$, with a line across the box at the median value as a divider. Connect $a_1$ and $a_3$ to the box with dashed lines. Mark any data points between the inner and outer fences with circles, and mark all points beyond the outer fences with asterisks.

4. Your boxplot is now complete. The position of the median line within the box indicates the general symmetry of the distribution: if the median line does not split the box evenly, the distribution is likely skewed. Comment on your median line.

   Points falling between the inner and outer fences are called mild outliers. Does your boxplot have any? If so, how many? ______________________

   Points falling beyond the outer fences are considered extreme outliers. How many extreme outliers does your boxplot contain? ______________________

5. Remove the outliers indicated by your boxplot from your data set, and create a well-designed histogram of your cylinder length data on the attached graph sheet. Does the resulting histogram show evidence of a normal distribution? Check normality with a normal probability plot.

   (Space for your answer.)

6. Make a sequence plot of the measurements, assuming they were taken in the order presented. Comment on any patterns you recognize.
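
If you want to check your hand calculations for steps 2 to 4, the procedure translates directly into a short script. The Python sketch below is one possible translation, assuming the 30 lengths from the table at the end of this document; the function names `value_at` and `boxplot_summary` are hypothetical, and the treatment of points lying exactly on a fence is a convention chosen for the sketch, not something the worksheet specifies. It deliberately reproduces this worksheet's quartile rule (round the median location down, then $q = (X+1)/2$); library routines such as Python's `statistics.quantiles` use other interpolation rules and can return slightly different $q_1$ and $q_3$.

```python
import math

# Cylinder lengths from the table at the end of this workshop, in measurement order.
LENGTHS = [
    39.88, 40.32, 40.32, 39.27, 40.35, 39.34, 39.88, 38.68, 39.29, 39.85,
    38.13, 39.72, 39.80, 40.06, 40.16, 39.78, 40.25, 40.30, 39.22, 41.94,
    39.30, 39.84, 37.86, 39.22, 39.88, 39.77, 39.67, 39.78, 40.06, 40.02,
]


def value_at(ordered, pos):
    """Value at 1-based position pos; if pos is fractional, average the two neighbours."""
    below, above = math.floor(pos), math.ceil(pos)
    return (ordered[below - 1] + ordered[above - 1]) / 2


def boxplot_summary(data):
    """Quartiles, fences, adjacent values and outliers following steps 2 to 4."""
    ascending = sorted(data)          # position 1 is the smallest value (the "bottom")
    descending = ascending[::-1]      # position 1 is the largest value (the "top")
    n = len(ascending)

    median_loc = (n + 1) / 2                   # step 2a
    x = math.floor(median_loc)                 # step 2b: round DOWN
    q = (x + 1) / 2                            # step 2c: quartile location
    median = value_at(ascending, median_loc)   # step 2d (averages the middle pair if n is even)
    q1 = value_at(ascending, q)                # step 2e: count up from the bottom
    q3 = value_at(descending, q)               # step 2f: count down from the top
    iqr = q3 - q1                              # step 2g

    f1, f3 = q1 - 1.5 * iqr, q3 + 1.5 * iqr             # step 3c: inner fences
    outer1, outer3 = q1 - 3.0 * iqr, q3 + 3.0 * iqr     # step 3e: outer fences
    a1 = min(v for v in ascending if v >= f1)           # step 3d: adjacent values
    a3 = max(v for v in ascending if v <= f3)

    # Convention of this sketch: a point exactly on an inner fence is not an outlier,
    # and a point exactly on an outer fence counts as mild.
    mild = [v for v in ascending if outer1 <= v < f1 or f3 < v <= outer3]
    extreme = [v for v in ascending if v < outer1 or v > outer3]
    return {
        "median": median, "q1": q1, "q3": q3, "iqr": iqr,
        "f1": f1, "f3": f3, "F1": outer1, "F3": outer3,
        "a1": a1, "a3": a3, "mild": mild, "extreme": extreme,
    }


if __name__ == "__main__":
    for name, value in boxplot_summary(LENGTHS).items():
        print(f"{name}: {value}")
```

Run it after finishing steps 2 to 4 by hand and compare each printed quantity (median, quartiles, fences, adjacent values and the lists of mild and extreme outliers) with your own results.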
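
Table of cylinder lengths:

| Measurement | Length |
|---|---|
| 1 | 39.88 |
| 2 | 40.32 |
| 3 | 40.32 |
| 4 | 39.27 |
| 5 | 40.35 |
| 6 | 39.34 |
| 7 | 39.88 |
| 8 | 38.68 |
| 9 | 39.29 |
| 10 | 39.85 |
| 11 | 38.13 |
| 12 | 39.72 |
| 13 | 39.80 |
| 14 | 40.06 |
| 15 | 40.16 |
| 16 | 39.78 |
| 17 | 40.25 |
| 18 | 40.30 |
| 19 | 39.22 |
| 20 | 41.94 |
| 21 | 39.30 |
| 22 | 39.84 |
| 23 | 37.86 |
| 24 | 39.22 |
| 25 | 39.88 |
| 26 | 39.77 |
| 27 | 39.67 |
| 28 | 39.78 |
| 29 | 40.06 |
| 30 | 40.02 |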
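
The table above can also be used to check your stem-and-leaf diagram from step 1. The following Python sketch is a minimal example, assuming stems are the whole-number part of each length and leaves are the tenths digit (the hundredths digit is truncated, not rounded). Splitting each stem into a low row (leaves 0 to 4) and a high row (leaves 5 to 9) is likewise an assumption, chosen because it spreads these tightly clustered data over ten rows; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Cylinder lengths from the table above, in measurement order.
LENGTHS = [
    39.88, 40.32, 40.32, 39.27, 40.35, 39.34, 39.88, 38.68, 39.29, 39.85,
    38.13, 39.72, 39.80, 40.06, 40.16, 39.78, 40.25, 40.30, 39.22, 41.94,
    39.30, 39.84, 37.86, 39.22, 39.88, 39.77, 39.67, 39.78, 40.06, 40.02,
]


def stem_and_leaf(data, split=True):
    """Return (stem label, leaves) rows, from the smallest stem to the largest.

    Stem = whole-number part; leaf = tenths digit (the hundredths digit is truncated).
    With split=True each stem gets two rows: "L" holds leaves 0-4, "H" leaves 5-9.
    """
    groups = defaultdict(list)
    for value in data:
        hundredths = round(value * 100)       # 39.88 -> 3988, avoiding float truncation errors
        stem, rest = divmod(hundredths, 100)  # 3988 -> (39, 88)
        leaf = rest // 10                     # keep only the tenths digit: 88 -> 8
        half = leaf // 5 if split else 0      # 0 = low row, 1 = high row
        groups[(stem, half)].append(leaf)

    stems = [stem for stem, _ in groups]
    halves = (0, 1) if split else (0,)
    rows = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty rows so gaps stay visible
        for half in halves:
            label = f"{stem}{'LH'[half]}" if split else str(stem)
            leaves = "".join(str(d) for d in sorted(groups.get((stem, half), [])))
            rows.append((label, leaves))
    return rows


if __name__ == "__main__":
    for label, leaves in stem_and_leaf(LENGTHS):
        print(f"{label:>4} | {leaves}")
```

With `split=False` the sketch prints one row per stem instead; because 17 of the 30 lengths share the stem 39, that version packs most of the data into a single row, which hides the shape of the distribution.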