The CREATEBOXPLOTDATA function takes a raw input dataset and generates the data needed as input to the BOXPLOT function.

CREATEBOXPLOTDATA returns five values for each input dataset: the minimum (excluding possible outliers), the lower quartile, the median, the upper quartile, and the maximum (excluding possible outliers). If neither outliers nor suspected outliers are calculated, the returned minimum and maximum are the minimum and maximum of the dataset. If outliers or suspected outliers are calculated, the returned minimum and maximum are the smallest and largest values (respectively) in the dataset that are not included in the outlier or suspected outlier data.

Examples


Copy and paste the following code to the IDL command line to create data for use in BOXPLOT.

 
; Create an array of average speeds on two different bicycles
; to use in CREATEBOXPLOTDATA
bike_mph = [ $
   [12.2, 16.2], $
   [12.1, 16.4], $
   [10.7, 16.9], $
   [11.6, 17.0], $
   [10.2, 16.5], $
   [10.9, 16.1], $
   [11.8, 17.1], $
   [10.9, 16.0], $
   [12.4, 16.8], $
   [12.9, 16.9], $
   [13.1, 17.5], $
   [13.0, 17.4]]
; Create the data and store mean and outlier values
bpd = CREATEBOXPLOTDATA(bike_mph, MEAN_VALUES=means, OUTLIER_VALUES=outliers)

; Display the data created to be used in BOXPLOT
PRINT, bpd

IDL displays:

  10.200000       16.000000
  11.250000       16.450001
  12.150000       16.900000
  12.950000       17.250000
  13.100000       17.500000
 
; Display the mean values created
PRINT, means

IDL displays:

  11.8167      16.7333
   
; Display the outlier values created
PRINT, outliers

IDL displays:

  !NULL

Syntax


result = CREATEBOXPLOTDATA(data [, IGNORE=value] [, CI_VALUES=variable] [, FINITE_INDICES=variable] [, MEAN_VALUES=variable] [, OUTLIER_VALUES=variable] [, SUSPECTED_OUTLIER_VALUES=variable])

Return Value


An M x 5 array containing data for use in BOXPLOT, where M is the number of distinct datasets. For each dataset, the five values are in the order BOXPLOT needs: minimum, lower quartile, median, upper quartile, and maximum.

Arguments


Data

The input data used to generate the results for the BOXPLOT function. The input data may be any of the following:

  • an M x N array of data, where M is the number of distinct datasets and N is the number of data values in each dataset.
  • an M-element array of pointers. Each pointer denotes one dataset.
  • an M-element list. Each list element denotes one dataset.
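
The list and pointer forms can be sketched as follows. This is a minimal illustration, not a definitive usage pattern: the variable names and values are made up, and the two datasets deliberately have different lengths, which a rectangular array cannot hold. That unequal lengths are supported is an assumption here, not something stated above.

```idl
; Two illustrative datasets of different lengths (hypothetical values)
bike_a = [12.2, 12.1, 10.7, 11.6, 10.2, 10.9, 11.8, 10.9]
bike_b = [16.2, 16.4, 16.9, 17.0, 16.5, 16.1]

; As a list: each list element denotes one dataset
bpd_from_list = CREATEBOXPLOTDATA(LIST(bike_a, bike_b))

; As an array of pointers: each pointer denotes one dataset
ptrs = [PTR_NEW(bike_a), PTR_NEW(bike_b)]
bpd_from_ptrs = CREATEBOXPLOTDATA(ptrs)

; Release the heap variables once the results have been computed
PTR_FREE, ptrs

; Inspect the type and dimensions of both results
HELP, bpd_from_list, bpd_from_ptrs
```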

Keywords


IGNORE

Set this keyword to a value to treat as bad data and to ignore when calculating the results.

CI_VALUES

Set this keyword to a named variable to return an M-element array containing the confidence interval value around the median for each box, where M is the number of input datasets. If a notch is displayed in the BOXPLOT function, these values set its boundaries.

FINITE_INDICES

Set this keyword to a named variable to return a vector containing the indices of the datasets for which valid data was returned. This is useful when your data contains NaNs or infinite values, because some datasets then cannot be used to create the five values needed for BOXPLOT.

MEAN_VALUES

Set this keyword to a named variable to return an M-element vector containing the mean values for each input dataset.

OUTLIER_VALUES

Set this keyword to a named variable to return a 2 x N array containing any outliers from each input dataset, where N here is the total number of outliers found. For each value [x, y], x represents the box location and y represents the value at that location.

SUSPECTED_OUTLIER_VALUES

Set this keyword to a named variable to return a 2 x N array containing any suspected outliers from each input dataset, where N here is the total number of suspected outliers found. For each value [x, y], x represents the box location and y represents the value at that location.
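
A call that uses every keyword might look like the sketch below. The data values are invented for illustration: -999.0 stands in for a hypothetical bad-data flag, and 25.0 is simply a value placed far from the rest of its dataset. Whether any point is reported as an outlier or a suspected outlier depends on the calculation, so no output is shown.

```idl
; Illustrative 2 x 8 array: two datasets with eight values each.
; -999.0 is a hypothetical bad-data flag for this sketch.
data = [ $
   [12.2, 16.2], $
   [12.1, 16.4], $
   [10.7, 16.9], $
   [11.6, -999.0], $
   [10.2, 16.5], $
   [10.9, 16.1], $
   [11.8, 17.1], $
   [25.0, 16.0]]

; Ignore the flag value and capture every optional output
bpd = CREATEBOXPLOTDATA(data, IGNORE=-999.0, $
   CI_VALUES=ci, FINITE_INDICES=finite, MEAN_VALUES=means, $
   OUTLIER_VALUES=outliers, SUSPECTED_OUTLIER_VALUES=suspected)

; Inspect the type and dimensions of each returned variable
HELP, bpd, ci, finite, means, outliers, suspected

; The five-value result is the input BOXPLOT expects
bp = BOXPLOT(bpd)
```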

Notes on CREATEBOXPLOTDATA Calculations


Values returned by CREATEBOXPLOTDATA are calculated using the conventions outlined below. Given an ordered dataset with n elements:

  • The position of the lower quartile (Q1) is 0.25 * (n + 1). If this position is not an integer then the weighted average of the values at the two surrounding positions is used.
  • The position of the median is 0.50 * (n + 1). If this position is not an integer then the weighted average of the values at the two surrounding positions is used.
  • The position of the upper quartile (Q3) is 0.75 * (n + 1). If this position is not an integer then the weighted average of the values at the two surrounding positions is used.
  • The Interquartile Range (IQR) = Q3 - Q1.
  • Suspected outliers are those values that fall within the following ranges: [Q1 - 3 * IQR, Q1 - 1.5 * IQR] or [Q3 + 1.5 * IQR, Q3 + 3 * IQR].
  • Outliers are those values that are either less than Q1 - 3 * IQR or greater than Q3 + 3 * IQR.
  • The Confidence Interval (CI) value is calculated as (1.57 * IQR) / sqrt(n). When this value is passed into BOXPLOT, a notch will be displayed around the median using the values of median +/- CI.
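
The conventions above can be followed by hand for a single dataset, as in the sketch below. It is a literal reading of the bullets, not the routine's source code. It assumes positions are counted from 1 and that the "weighted average" is linear interpolation between the two neighboring values; both are assumptions. Its results are therefore not guaranteed to match what CREATEBOXPLOTDATA returns, and all variable names are illustrative.

```idl
; One dataset (the first bicycle from the example above), sorted ascending
x = [12.2, 12.1, 10.7, 11.6, 10.2, 10.9, 11.8, 10.9, 12.4, 12.9, 13.1, 13.0]
x = x[SORT(x)]
n = N_ELEMENTS(x)

; Positions of Q1, median, and Q3, counted from 1 (assumption)
pos = [0.25, 0.50, 0.75] * (n + 1)
below = FLOOR(pos)           ; position just below each target
frac = pos - below           ; weight given to the position just above

; Convert to 0-based indices, clamped to the valid range
i0 = ((below - 1) > 0) < (n - 1)
i1 = (below > 0) < (n - 1)

; Weighted average of the two surrounding values
q = x[i0] + frac * (x[i1] - x[i0])    ; q = [Q1, median, Q3]
iqr = q[2] - q[0]

; Outliers lie beyond 3 * IQR from the quartiles
i_out = WHERE((x LT q[0] - 3*iqr) OR (x GT q[2] + 3*iqr), n_out)

; Suspected outliers lie between 1.5 * IQR and 3 * IQR from the quartiles
i_sus = WHERE(((x GE q[0] - 3*iqr) AND (x LE q[0] - 1.5*iqr)) OR $
   ((x GE q[2] + 1.5*iqr) AND (x LE q[2] + 3*iqr)), n_sus)

; Minimum and maximum of the values that are neither of the above
i_keep = WHERE((x GT q[0] - 1.5*iqr) AND (x LT q[2] + 1.5*iqr))
lowest = MIN(x[i_keep], MAX=highest)

; Confidence interval around the median (the notch half-width)
ci = 1.57 * iqr / SQRT(n)

PRINT, lowest, q, highest
PRINT, 'Outliers:', n_out, '  Suspected outliers:', n_sus
PRINT, 'Notch from', q[1] - ci, ' to', q[1] + ci
```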

Version History


8.2.2 Introduced

See Also


BOXPLOT