Robust Location Estimates

A location estimate is a typical or central value that best describes a given dataset. The mean and median are both examples of location estimators. However, the mean is highly sensitive to outliers and can give misleading values when even a small number of outliers are present. The median, on the other hand, is largely insensitive to outliers, but because it is not a smooth function of the data it can behave unexpectedly in certain situations. GSL offers the following alternative location estimators, which are robust to the presence of outliers.

Trimmed Mean

The trimmed mean, or truncated mean, discards a fixed number of the smallest and largest samples from the input vector before computing the mean of the remaining samples. The amount of trimming is specified by a factor \(\alpha \in [0,0.5]\). The number of samples discarded from each end of the input vector is \(\left\lfloor \alpha n \right\rfloor\), where \(n\) is the length of the input. So to discard 25% of the samples from each end, one would set \(\alpha = 0.25\). For example, with \(n = 10\) and \(\alpha = 0.25\), \(\left\lfloor 2.5 \right\rfloor = 2\) samples are discarded from each end and the mean is taken over the remaining 6.

gsl_stats_trmean_from_sorted_data(data, alpha)

This function returns the trimmed mean of data. The elements of the array must be in ascending numerical order. There are no checks to see whether the data are sorted, so the function gsl_sort() should always be used first. The trimming factor \(\alpha\) is given in alpha. If \(\alpha \ge 0.5\), then the median of the input is returned.

Gastwirth Estimator

Gastwirth’s location estimator is a weighted sum of three order statistics,

\[gastwirth = 0.3 \times Q_{\frac{1}{3}} + 0.4 \times Q_{\frac{1}{2}} + 0.3 \times Q_{\frac{2}{3}}\]

where \(Q_{\frac{1}{3}}\) is the one-third quantile, \(Q_{\frac{1}{2}}\) is the one-half quantile (i.e. median), and \(Q_{\frac{2}{3}}\) is the two-thirds quantile.

gsl_stats_gastwirth_from_sorted_data(sorted_data)

This function returns the Gastwirth location estimator of sorted_data. The elements of the array must be in ascending numerical order. There are no checks to see whether the data are sorted, so the function gsl_sort() should always be used first.