
6 BASIC STATISTICAL TOOLS

There are lies, damn lies, and statistics...
(Anon.)


6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests


6.1 Introduction

In the preceding chapters basic elements for the proper execution of analytical work, such as personnel, laboratory facilities, equipment, and reagents, were discussed. Before embarking upon the actual analytical work, however, one more tool for the quality assurance of the work must be dealt with: the statistical operations necessary to control and verify the analytical procedures (Chapter 7) as well as the resulting data (Chapter 8).

It was stated earlier that making mistakes in analytical work is unavoidable. This is the reason why a complex system of precautions to prevent errors and traps to detect them has to be set up. An important aspect of quality control is the detection of both random and systematic errors. This can be done by critically looking at the performance of the analysis as a whole and also of the instruments and operators involved in the job. For the detection itself as well as for the quantification of the errors, statistical treatment of data is indispensable.

A multitude of different statistical tools is available, some of them simple, some complicated, and often very specific for certain purposes. In analytical work, the most important common operation is the comparison of data, or sets of data, to quantify accuracy (bias) and precision. Fortunately, with a few simple and convenient statistical tools most of the information needed in regular laboratory work can be obtained: the "t-test", the "F-test", and regression analysis. Therefore, examples of these will be given in the ensuing pages.

Clearly, statistics are a tool, not an aim. Simple inspection of data, without statistical treatment, by an experienced and dedicated analyst may be just as useful as statistical figures on the desk of the disinterested. The value of statistics lies in organizing and simplifying data, to permit some objective estimate showing that an analysis is under control or that a change has occurred. Equally important is that the results of these statistical procedures are recorded and can be retrieved.

6.2 Definitions


6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias


Discussing Quality Control implies the use of several terms and concepts with a specific (and sometimes confusing) meaning. Therefore, some of the most important concepts will be defined first.

6.2.1 Error

Error is the collective noun for any departure of the result from the "true" value*. Analytical errors can be:

1. Random or unpredictable deviations between replicates, quantified with the "standard deviation".

2. Systematic or predictable regular deviation from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations).

3. Constant, unrelated to the concentration of the substance analyzed (the analyte).

4. Proportional, i.e. related to the concentration of the analyte.

* The "true" value of an attribute is by nature indeterminate and often has only a very relative meaning. Particularly in soil science, for several attributes there is no such thing as the true value, as any value obtained is method-dependent (e.g. cation exchange capacity). Obviously, this does not mean that no adequate analysis serving a purpose is possible. It does, however, emphasize the need for the establishment of standard reference methods and the importance of external QC (see Chapter 9).

6.2.2 Accuracy

The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a combination of random and systematic errors (precision and bias) and cannot be quantified directly. The test result may be a mean of several values. An accurate determination produces a "true" quantitative value, i.e. it is precise and free of bias.

6.2.3 Precision

The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion or scattering around the mean value and usually expressed in terms of standard deviation, standard error or a range (difference between the highest and the lowest result).

6.2.4 Bias

The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite but most used measure for "trueness", which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision. There are several components contributing to bias:

1. Method bias

The difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level.

2. Laboratory bias

The difference between the (mean) test result from a particular laboratory and the accepted reference value.

3. Sample bias

The difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling and weighing techniques. Whether a sample is representative of the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel).

The relationship between these concepts can be expressed in the following equation:

total error = random error (precision) + systematic error (bias)

The types of errors are illustrated in Fig. 6-1.

Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate.)

6.3 Basic Statistics


6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors


In the discussions of Chapters 7 and 8 basic statistical treatment of data will be considered. Therefore, some understanding of these statistics is essential and they will briefly be discussed here.

The basic assumption to be made is that a set of data, obtained by repeated analysis of the same analyte in the same sample under the same conditions, has a normal or Gaussian distribution. (When the distribution is skewed, statistical treatment is more complicated.) The principal parameters used are the mean (or average) and the standard deviation (see Fig. 6-2) and the principal tools the F-test, the t-test, and regression and correlation analysis.

Fig. 6-2. A Gaussian or normal distribution. The figure shows that (approx.) 68% of the data fall in the range x̄ ± s, 95% in the range x̄ ± 2s, and 99.7% in the range x̄ ± 3s.

6.3.1 Mean

The average of a set of n data xᵢ:

x̄ = Σxᵢ / n    (6.1)

6.3.2 Standard deviation

This is the most commonly used measure of the spread or dispersion of data around the mean. The standard deviation is defined as the square root of the variance (V). The variance is defined as the sum of the squared deviations from the mean, divided by n - 1. Operationally, there are several ways of calculation:

s = √[ Σ(xᵢ - x̄)² / (n - 1) ]    (6.2)

or

s = √[ (Σxᵢ² - (Σxᵢ)²/n) / (n - 1) ]    (6.3)

or

s = √[ (Σxᵢ² - n·x̄²) / (n - 1) ]    (6.4)

The calculation of the mean and the standard deviation can easily be done on a calculator but most conveniently on a PC with computer programs such as dBASE, Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use functions. (Warning: some programs use n rather than n - 1!)
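To illustrate Equations (6.1)-(6.6) and the n versus n - 1 warning, a minimal sketch in Python (NumPy assumed; the data are the pipette calibration measurements from the example in Section 6.3.4):

```python
import numpy as np

# Tenfold pipette calibration data from the example in Section 6.3.4 (mL)
data = np.array([19.941, 19.812, 19.829, 19.828, 19.742,
                 19.797, 19.937, 19.847, 19.885, 19.804])

mean = data.mean()            # Eq. (6.1)
s = data.std(ddof=1)          # Eq. (6.2); ddof=1 gives the n - 1 denominator
cv = 100 * s / mean           # Eq. (6.6): coefficient of variation in %

print(f"mean = {mean:.3f} mL, s = {s:.4f} mL, CV = {cv:.2f}%")
# mean = 19.842 mL, s = 0.0627 mL, CV = 0.32%
```

Note that `data.std()` without `ddof=1` would divide by n, the very pitfall the warning above refers to.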

6.3.3 Relative standard deviation. Coefficient of variation

Although the standard deviation of analytical data may not vary much over limited ranges of such data, it usually depends on the magnitude of such data: the larger the figures, the larger s. Therefore, for comparison of variations (e.g. precision) it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage and is then called coefficient of variation (CV). Often, however, these terms are confused.

RSD = s / x̄    (6.5)

CV = (s / x̄) × 100%    (6.6)

Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance can, of course, be calculated by squaring the standard deviation:

V = s²    (6.7)

6.3.4 Confidence limits of a measurement

The more an analysis or measurement is replicated, the closer the mean x̄ of the results will approach the "true" value μ of the analyte content (assuming absence of bias).

A single analysis of a test sample can be regarded as literally sampling the imaginary set of a multitude of results obtained for that test sample. The uncertainty of such subsampling is expressed by

μ = x̄ ± t·s/√n    (6.8)

where

μ = "true" value (mean of large set of replicates)
x̄ = mean of subsamples
t = a statistical value which depends on the number of data and the required confidence (usually 95%)
s = standard deviation of the set of subsamples
n = number of subsamples

(The term s/√n is also known as the standard error of the mean.)

The critical values for t are tabulated in Appendix 1 (they are, therefore, here referred to as t_tab). To find the applicable value, the number of degrees of freedom has to be established by: df = n - 1 (see also Section 6.4.2).

Example

For the determination of the clay content in the particle-size analysis, a semi-automated pipette installation is used with a 20 mL pipette. This volume is approximate and the operation involves the opening and closing of taps. Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and precision have to be established.

A tenfold measurement of the volume yielded the following set of data (in mL):

19.941  19.812  19.829  19.828  19.742
19.797  19.937  19.847  19.885  19.804

The mean is 19.842 mL and the standard deviation 0.0627 mL. According to Appendix 1, for n = 10, t_tab = 2.26 (df = 9) and using Eq. (6.8) this calibration yields:

pipette volume = 19.842 ± 2.26 × (0.0627/√10) = 19.84 ± 0.04 mL

(Note that the pipette has a systematic deviation from 20 mL as this is outside the found confidence interval. See also bias.)
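The same interval can be reproduced without tables; a sketch using SciPy's t-quantile function in place of Appendix 1:

```python
import numpy as np
from scipy import stats

data = np.array([19.941, 19.812, 19.829, 19.828, 19.742,
                 19.797, 19.937, 19.847, 19.885, 19.804])

n, mean, s = len(data), data.mean(), data.std(ddof=1)
t_tab = stats.t.ppf(0.975, df=n - 1)     # two-sided 95%: 2.5% in each tail

half_width = t_tab * s / np.sqrt(n)      # Eq. (6.8)
print(f"volume = {mean:.2f} ± {half_width:.2f} mL")   # 19.84 ± 0.04 mL
```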

In routine analytical work, results are usually single values obtained in batches of several test samples. No laboratory will analyze a test sample 50 times to be confident that the result is reliable. Therefore, the statistical parameters have to be obtained in another way. Most usually this is done by method validation (see Chapter 7) and/or by keeping control charts, which is basically the collection of analytical results from one or more control samples in each batch (see Chapter 8). Equation (6.8) is then reduced to

μ = x ± t·s    (6.9)

where

μ = "true" value
x = single measurement
t = applicable t_tab (Appendix 1)
s = standard deviation of the set of previous measurements.

In Appendix 1 it can be seen that if the set of replicated measurements is large (say > 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well-known expression

μ = x ± 2s    (6.10)

where s is the previously determined standard deviation of the large set of replicates (see also Fig. 6-2).

Note: This "method-s" or s of a control sample is not a constant and may vary for different test materials, analyte levels, and with analytical conditions.

Running duplicates will, according to Equation (6.8), increase the confidence of the (mean) result by a factor √2:

μ = x̄ ± t·s/√2

where

x̄ = mean of duplicates
s = known standard deviation of the large set

Similarly, triplicate analysis will increase the confidence by a factor √3, etc. Duplicates are further discussed in Section 8.3.3.

Thus, in summary, Equation (6.8) can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single determinations in routine work, determinations for which no previous data exist, certain calibrations, etc.

6.3.5 Propagation of errors


6.3.5.1 Propagation of random errors
6.3.5.2 Propagation of systematic errors


The final result of an analysis is often calculated from several measurements performed during the procedure (weighing, calibration, dilution, titration, instrument readings, moisture correction, etc.). As was indicated in Section 6.2, the total error in an analytical result is an adding-up of the sub-errors made in the various steps. For daily practice, the bias and precision of the whole method are usually the most relevant parameters (obtained from validation, Chapter 7; or from control charts, Chapter 8). However, sometimes it is useful to get an insight into the contributions of the subprocedures (which then have to be determined separately), for example if one wants to change (part of) the method.

Because the "adding-up" of errors is usually not a simple summation, this will be discussed. The main distinction to be made is between random errors (precision) and systematic errors (bias).

6.3.5.1 Propagation of random errors

In estimating the total random error from factors in a final calculation, the treatment of summation or subtraction of factors is different from that of multiplication or division.

1. Summation calculations

If the final result x is obtained from the sum (or difference) of (sub)measurements a, b, c, etc.:

x = a + b + c + ...

then the total precision is expressed by the standard deviation obtained by taking the square root of the sum of the individual variances (squares of standard deviations):

s_x = √(s_a² + s_b² + s_c² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient (such as an extra dilution), then this is included to calculate the effect on the variance concerned, e.g. (2·s_b)².

Example

The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation of the exchangeable cations:

ECEC = Exch. (Ca + Mg + Na + K + H + Al)

Standard deviations experimentally obtained for exchangeable Ca, Mg, Na, K and (H + Al) on a certain sample, e.g. a control sample, are: 0.30, 0.25, 0.15, 0.15, and 0.60 cmolc/kg respectively. The total precision is:

s_ECEC = √(0.30² + 0.25² + 0.15² + 0.15² + 0.60²) = √0.5575 = 0.75 cmolc/kg

It can be seen that the total standard deviation is larger than the highest individual standard deviation, but (much) less than their sum. It is also clear that if one wants to reduce the total standard deviation, qualitatively the best result can be expected from reducing the largest individual contribution, in this case the exchangeable acidity.
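A short sketch of this variance addition in Python (NumPy assumed) also shows how dominant the largest contribution is:

```python
import numpy as np

# Standard deviations of exchangeable Ca, Mg, Na, K and (H + Al), in cmolc/kg
s_parts = np.array([0.30, 0.25, 0.15, 0.15, 0.60])

s_total = np.sqrt(np.sum(s_parts**2))     # square root of the summed variances
print(f"s(ECEC) = {s_total:.2f} cmolc/kg")            # 0.75

# Share of each variance in the total variance, in %
print(np.round(100 * s_parts**2 / np.sum(s_parts**2), 1))
# [16.1 11.2  4.   4.  64.6] -> (H + Al) alone accounts for ~65%
```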

2. Multiplication calculations

If the final result x is obtained from multiplication (or division) of (sub)measurements according to

x = a × b / c × ...

then the total error is expressed by the standard deviation obtained by taking the square root of the sum of the squared individual relative standard deviations (RSD or CV, as a fraction or as percentage, see Eqs. 6.5 and 6.6):

RSD_x = √(RSD_a² + RSD_b² + RSD_c² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient, then this is included to calculate the effect of the RSD concerned, e.g. (2·RSD_b)².

Example

The calculation of Kjeldahl-nitrogen may be as follows:

N% = (a - b) × M × 1.4 × mcf / s

where

a = mL HCl required for titration of the sample
b = mL HCl required for titration of the blank
s = air-dry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor

Note that in addition to multiplications, this calculation contains a subtraction also (often, calculations contain both summations and multiplications).

Firstly, the standard deviation of the titration (a - b) is determined as indicated above for summation calculations. This is then transformed to an RSD using Equation (6.5) or (6.6). Then the RSDs of the other individual parameters have to be determined experimentally. The found RSDs are, for example:

distillation: 0.8%,
titration: 0.5%,
molarity: 0.2%,
sample weight: 0.2%,
mcf: 0.2%.

The total calculated precision is:

RSD = √(0.8² + 0.5² + 0.2² + 0.2² + 0.2²) = √1.01 = 1.0%

Here again, the highest RSD (that of the distillation) dominates the total precision. In practice, the precision of the Kjeldahl method is usually considerably worse (≈ 2.5%), probably mainly as a result of the heterogeneity of the sample. The present example does not take that into account. It would imply that 2.5% - 1.0% = 1.5%, or 3/5 of the total random error, is due to sample heterogeneity (or other disregarded cause). This implies that painstaking efforts to improve subprocedures such as the titration or the preparation of standard solutions may not be very rewarding. It would, however, pay to improve the homogeneity of the sample, e.g. by careful grinding and mixing in the preparatory stage.

Note. Sample heterogeneity is also represented in the moisture correction factor. However, the influence of this factor on the final result is usually very small.
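A minimal sketch of this RSD combination (plain Python; the RSD values are those assumed above):

```python
import math

# Experimentally found RSDs of the subprocedures, in %
rsd = {"distillation": 0.8, "titration": 0.5, "molarity": 0.2,
       "sample weight": 0.2, "mcf": 0.2}

rsd_total = math.sqrt(sum(v**2 for v in rsd.values()))
print(f"total RSD = {rsd_total:.1f}%")    # 1.0%
```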

6.3.5.2 Propagation of systematic errors

Systematic errors of (sub)measurements contribute directly to the total bias of the result, since the individual parameters in the calculation of the final result each carry their own bias. For instance, a systematic error in a balance will cause a systematic error in the sample weight (as well as in the moisture determination). Note that some systematic errors may cancel out, e.g. weighings by difference may not be affected by a biased balance.

The only way to detect or avoid systematic errors is by comparison (calibration) with independent standards and outside reference or control samples.

6.4 Statistical tests


6.4.1 Two-sided vs. one-sided test
6.4.2 F-test for precision
6.4.3 t-Tests for bias
6.4.4 Linear correlation and regression
6.4.5 Analysis of variance (ANOVA)


In analytical work a frequently recurring operation is the verification of performance by comparison of data. Some examples of comparisons in practice are:

- performance of two instruments,

- performance of two methods,

- performance of a procedure in different periods,

- performance of two analysts or laboratories,

- results obtained for a reference or control sample with the "true", "target" or "assigned" value of this sample.

Some of the most common and convenient statistical tools to quantify such comparisons are the F-test, the t-tests, and regression analysis.

Because the F-test and the t-tests are the most basic tests they will be discussed first. These tests examine if two sets of normally distributed data are similar or dissimilar (belong or do not belong to the same "population") by comparing their standard deviations and means respectively. This is illustrated in Fig. 6-3.

Fig. 6-3. Three possible cases when comparing two sets of data (n₁ = n₂). A. Different mean (bias), same precision; B. Same mean (no bias), different precision; C. Both mean and precision are different. (The fourth case, identical sets, has not been drawn.)

6.4.1 Two-sided vs. one-sided test

These tests for comparison, for example between methods A and B, are based on the assumption that there is no significant difference (the "null hypothesis"). In other words, when the difference is so small that a tabulated critical value of F or t is not exceeded, we can be confident (usually at the 95% level) that A and B are not different. Two fundamentally different questions can be asked concerning both the comparison of the standard deviations s₁ and s₂ with the F-test, and of the means x̄₁ and x̄₂ with the t-test:

1. are A and B different? (two-sided test)
2. is A higher (or lower) than B? (one-sided test)

This distinction has an important practical implication as statistically the probabilities for the two situations are different: the chance that A and B are only different ("it can go two ways") is twice as large as the chance that A is higher (or lower) than B ("it can go only one way"). The most common case is the two-sided (also called two-tailed) test: there are no particular reasons to expect that the means or the standard deviations of two data sets are different. An example is the routine comparison of a control chart with the previous one (see 8.3). However, when it is expected or suspected that the mean and/or the standard deviation will go only one way, e.g. after a change in an analytical procedure, the one-sided (or one-tailed) test is appropriate. In this case the probability that it goes the other way than expected is assumed to be zero and, therefore, the probability that it goes the expected way is doubled. Or, more correctly, the uncertainty in the two-way test of 5% (or the probability of 5% that the critical value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6-2), i.e. 2.5% at the end of each tail beyond 2s. If we perform the one-sided test with 5% uncertainty, we actually increase this 2.5% at the end of one tail to 5%. (Note that for the whole Gaussian curve, which is symmetrical, this is then equivalent to an uncertainty of 10% in two ways!)

This difference in probability in the tests is expressed in the use of two tables of critical values for both F and t. In fact, the one-sided table at the 95% confidence level is equivalent to the two-sided table at the 90% confidence level.

It is emphasized that the one-sided test is only appropriate when a difference in one direction is expected or aimed at. Of course it is tempting to perform this test after the results show a clear (unexpected) effect. In fact, however, a two times higher probability level is then used in retrospect. This is underscored by the observation that in this way even contradictory conclusions may arise: if in an experiment calculated values of F and t are found within the range between the two-sided and one-sided values of F_tab and t_tab, the two-sided test indicates no significant difference, whereas the one-sided test says that the result of A is significantly higher (or lower) than that of B. What actually happens is that in the first case the 2.5% boundary in the tail was just not exceeded, and then, subsequently, this 2.5% boundary is relaxed to 5%, which is then obviously more easily exceeded. This illustrates that statistical tests differ in strictness and that for proper interpretation of results in reports, the statistical techniques used, including the confidence limits or probability, should always be specified.
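The relation between the one-sided and two-sided tables can be checked directly; a sketch using SciPy's t-quantiles for df = 9, the degrees of freedom used in several examples of this chapter:

```python
from scipy import stats

df = 9
two_sided = stats.t.ppf(1 - 0.05 / 2, df)   # 5% split over both tails
one_sided = stats.t.ppf(1 - 0.05, df)       # the whole 5% in one tail

print(f"two-sided t_tab = {two_sided:.2f}")  # 2.26
print(f"one-sided t_tab = {one_sided:.2f}")  # 1.83 (= two-sided value at 90%)
```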

6.4.2 F-test for precision

Because the result of the F-test may be needed to choose between the Student's t-test and the Cochran variant (see next section), the F-test is discussed first.

The F-test (or Fisher's test) is a comparison of the spread of two sets of data to test if the sets belong to the same population, in other words if the precisions are similar or dissimilar.

The test makes use of the ratio of the two variances:

F = s₁² / s₂²    (6.11)

where the larger s² must be the numerator by convention. If the performances are not very different, then the estimates s₁ and s₂ do not differ much and their ratio (and that of their squares) should not deviate much from unity. In practice, the calculated F is compared with the applicable F value in the F-table (also called the critical value, see Appendix 2). To read the table it is necessary to know the applicable number of degrees of freedom for s₁ and s₂. These are calculated by:

df₁ = n₁ - 1
df₂ = n₂ - 1

If F_cal ≤ F_tab one can conclude with 95% confidence that there is no significant difference in precision (the "null hypothesis" that s₁ = s₂ is accepted). Thus, there is still a 5% chance that we draw the wrong conclusion. In certain cases more confidence may be needed, then a 99% confidence table can be used, which can be found in statistical textbooks.
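A small helper for this test, sketched in Python (SciPy assumed; the critical value comes from the F-distribution instead of Appendix 2):

```python
from scipy import stats

def f_test(s1, n1, s2, n2, alpha=0.05, two_sided=True):
    """F-test on two standard deviations (Eq. 6.11)."""
    if s1 < s2:                               # larger variance in the numerator
        s1, n1, s2, n2 = s2, n2, s1, n1
    f_cal = s1**2 / s2**2
    tail = alpha / 2 if two_sided else alpha  # two-sided 95% -> 97.5th percentile
    f_tab = stats.f.ppf(1 - tail, dfn=n1 - 1, dfd=n2 - 1)
    return f_cal, f_tab

# Example I below (Table 6-1): F_cal = 1.62, F_tab = 4.03
print(f_test(0.819, 10, 0.644, 10))
```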

Example I (two-sided test)

Table 6-1 gives the data sets obtained by two analysts for the cation exchange capacity (CEC) of a control sample. Using Equation (6.11) the calculated F value is 1.62. As we had no particular reason to expect that the analysts would perform differently, we use the F-table for the two-sided test and find F_tab = 4.03 (Appendix 2, df₁ = df₂ = 9). This exceeds the calculated value and the null hypothesis (no difference) is accepted. It can be concluded with 95% confidence that there is no significant difference in precision between the work of Analysts 1 and 2.

Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two analysts.

Analyst 1   Analyst 2
10.2        9.7
10.7        9.0
10.5        10.2
9.9         10.3
9.0         10.8
11.2        11.1
11.5        9.4
10.9        9.2
8.9         9.8
10.6        10.2

x̄:  10.34       9.97
s:  0.819       0.644
n:  10          10

F_cal = 1.62    F_tab = 4.03
t_cal = 1.12    t_tab = 2.10

Example 2 (one-sided test)

The determination of the calcium carbonate content with the Scheibler standard method is compared with the simple and more rapid "acid-neutralization" method using one and the same sample. The results are given in Table 6-2. Because of the nature of the rapid method we suspect it to produce a lower precision than obtained with the Scheibler method and we can, therefore, perform the one-sided F-test. The applicable F_tab = 3.07 (App. 2, df₁ = 12, df₂ = 9), which is lower than F_cal (= 18.3), and the null hypothesis (no difference) is rejected. It can be concluded (with 95% confidence) that for this one sample the precision of the rapid titration method is significantly worse than that of the Scheibler method.

Table 6-2. Contents of CaCO₃ (in mass/mass %) in a soil sample determined with the Scheibler method (A) and the rapid titration method (B).

A      B
2.5    1.7
2.4    1.9
2.5    2.3
2.6    2.3
2.5    2.8
2.5    2.5
2.4    1.6
2.6    1.9
2.7    2.6
2.4    1.7
-      2.4
-      2.2
-      2.6

x̄:  2.51     2.13
s:  0.099    0.424
n:  10       13

F_cal = 18.3    F_tab = 3.07
t_cal = 3.12    t_tab* = 2.18

(t_tab* = Cochran's "alternative" t_tab)

6.4.3 t-Tests for bias


6.4.3.1 Student's t-test
6.4.3.2 Cochran's t-test
6.4.3.3 t-Test for large data sets (n ≥ 30)
6.4.3.4 Paired t-test


Depending on the nature of two sets of data (n, s, sampling nature), the means of the sets can be compared for bias by several variants of the t-test. The following most common types will be discussed:

1. Student's t-test for comparison of two independent sets of data with very similar standard deviations;

2. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly;

3. the paired t-test for comparison of strongly dependent sets of data.

Basically, for the t-tests Equation (6.8) is used but written in a different way:

t_cal = |x̄ - μ| × √n / s    (6.12)

where

x̄ = mean of test results of a sample
μ = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample

To compare the mean of a data set with a reference value normally the "two-sided t-table of critical values" is used (Appendix 1). The applicable number of degrees of freedom here is:

df = n - 1

If a value for t calculated with Equation (6.12) does not exceed the critical value in the table, the data are taken to belong to the same population: there is no difference and the "null hypothesis" is accepted (with the applicable probability, usually 95%).

As with the F-test, when it is expected or suspected that the obtained results are higher or lower than that of the reference value, the one-sided t-test can be performed: if t_cal > t_tab, then the results are significantly higher (or lower) than the reference value.

More commonly, however, the "true" value of proper reference samples is accompanied by the associated standard deviation and number of replicates used to determine these parameters. We can then apply the more general case of comparing the means of two data sets: the "true" value in Equation (6.12) is then replaced by the mean of a second data set. As is shown in Fig. 6-3, to test if two data sets belong to the same population it is tested if the two Gauss curves sufficiently overlap. In other words, if the difference between the means x̄₁ - x̄₂ is small. This is discussed next.

Similarity or non-similarity of standard deviations

When using the t-test for two small sets of data (n₁ and/or n₂ < 30), a choice of the type of test must be made depending on the similarity (or non-similarity) of the standard deviations of the two sets. If the standard deviations are sufficiently similar they can be "pooled" and the Student t-test can be used. When the standard deviations are not sufficiently similar an alternative procedure for the t-test must be followed in which the standard deviations are not pooled. A convenient alternative is the Cochran variant of the t-test. The criterion for the choice is the passing or non-passing of the F-test (see 6.4.2), that is, whether or not the variances differ significantly. Therefore, for small data sets, the F-test should precede the t-test.

For dealing with large data sets (n₁, n₂ ≥ 30) the "normal" t-test is used (see Section 6.4.3.3 and App. 3).

6.4.3.1 Student's t-test

(To be applied to small data sets (n₁, n₂ < 30) where s₁ and s₂ are similar according to the F-test.)

When comparing two sets of data, Equation (6.12) is rewritten as:

t_cal = |x̄₁ - x̄₂| / (s_p × √(1/n₁ + 1/n₂))    (6.13)

where

x̄₁ = mean of data set 1
x̄₂ = mean of data set 2
s_p = "pooled" standard deviation of the sets
n₁ = number of data in set 1
n₂ = number of data in set 2

The pooled standard deviation s_p is calculated by:

s_p = √[ ((n₁ - 1)·s₁² + (n₂ - 1)·s₂²) / (n₁ + n₂ - 2) ]    (6.14)

where

s₁ = standard deviation of data set 1
s₂ = standard deviation of data set 2
n₁ = number of data in set 1
n₂ = number of data in set 2

To perform the t-test, the critical t_tab has to be found in the table (Appendix 1); the applicable number of degrees of freedom df is here calculated by:

df = n₁ + n₂ - 2

Example

The two data sets of Table 6-1 can be used: with Equations (6.13) and (6.14) t_cal is calculated as 1.12, which is lower than the critical value t_tab of 2.10 (App. 1, df = 18, two-sided), hence the null hypothesis (no difference) is accepted and the two data sets are assumed to belong to the same population: there is no significant difference between the mean results of the two analysts (with 95% confidence).

Note. Another illustrative way to perform this test for bias is to calculate whether the difference between the means falls within or outside the range where this difference is still not significantly large. In other words, whether this difference is less than the least significant difference (lsd). This can be derived from Equation (6.13):

lsd = t_tab × s_p × √(1/n₁ + 1/n₂)    (6.15)

In the present example of Table 6-1, the calculation yields lsd = 0.69. The measured difference between the means is 10.34 - 9.97 = 0.37, which is smaller than the lsd, indicating that there is no significant difference between the performance of the analysts.

In addition, in this approach the 95% confidence limits of the difference between the means can be calculated (cf. Equation 6.8):

confidence limits = 0.37 ± 0.69 = -0.32 and 1.06

Note that the value 0 for the difference is situated within this confidence interval, which agrees with the null hypothesis of x̄₁ = x̄₂ (no difference) having been accepted.
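A sketch of this test and the lsd in Python (SciPy assumed), working from the summary statistics of Table 6-1:

```python
import numpy as np
from scipy import stats

# Summary statistics from Table 6-1
m1, s1, n1 = 10.34, 0.819, 10
m2, s2, n2 = 9.97, 0.644, 10

# Pooled standard deviation, Eq. (6.14)
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

se = sp * np.sqrt(1 / n1 + 1 / n2)
t_cal = abs(m1 - m2) / se                      # Eq. (6.13)
t_tab = stats.t.ppf(0.975, df=n1 + n2 - 2)     # two-sided, 95%
lsd = t_tab * se                               # Eq. (6.15)

print(f"t_cal = {t_cal:.2f}, t_tab = {t_tab:.2f}, lsd = {lsd:.2f}")
# t_cal = 1.12, t_tab = 2.10, lsd = 0.69
```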

6.4.3.2 Cochran's t-test

(To be applied to small data sets (n₁, n₂ < 30) where s₁ and s₂ are different according to the F-test.)

Calculate t with:

t_cal = |x̄₁ - x̄₂| / √(s₁²/n₁ + s₂²/n₂)    (6.16)

Then determine an "alternative" critical t-value:

t_tab* = (t₁·s₁²/n₁ + t₂·s₂²/n₂) / (s₁²/n₁ + s₂²/n₂)    (6.17)

where

t₁ = t_tab at n₁ - 1 degrees of freedom
t₂ = t_tab at n₂ - 1 degrees of freedom

Now the t-test can be performed as usual: if t_cal < t_tab* then the null hypothesis that the means do not significantly differ is accepted.

Example

The two data sets of Table 6-2 can be used.

According to the F-test, the standard deviations differ significantly, so the Cochran variant must be used. Furthermore, in contrast to our expectation that the precision of the rapid test would be inferior, we have no idea about the bias and therefore the two-sided test is appropriate. The calculations yield t_cal = 3.12 and t_tab* = 2.18, meaning that t_cal exceeds t_tab*, which implies that the null hypothesis (no difference) is rejected and that the mean of the rapid analysis deviates significantly from that of the standard analysis (with 95% confidence, and for this sample only). Further investigation of the rapid method would have to include the use of more different samples; then comparison with the one-sided t-test would be justified (see 6.4.3.4, Example 1).
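The same calculation sketched in Python (SciPy assumed), again from the summary statistics of Table 6-2:

```python
import numpy as np
from scipy import stats

# Summary statistics from Table 6-2 (Scheibler vs. rapid method)
m1, s1, n1 = 2.51, 0.099, 10
m2, s2, n2 = 2.13, 0.424, 13

v1, v2 = s1**2 / n1, s2**2 / n2                # variances of the means
t_cal = abs(m1 - m2) / np.sqrt(v1 + v2)        # Eq. (6.16)

t1 = stats.t.ppf(0.975, df=n1 - 1)
t2 = stats.t.ppf(0.975, df=n2 - 1)
t_tab_star = (t1 * v1 + t2 * v2) / (v1 + v2)   # Eq. (6.17)

print(f"t_cal = {t_cal:.2f}, t_tab* = {t_tab_star:.2f}")
# t_cal = 3.12, t_tab* = 2.18
```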

6.4.3.3 t-Test for large data sets (n ≥ 30)

In the example above (6.4.3.2) the conclusion happens to have been the same if the Student's t-test with pooled standard deviations had been used. This is caused by the fact that the difference in result between the Student and Cochran variants of the t-test is largest when small sets of data are compared, and decreases with increasing number of data. Namely, with increasing number of data a better estimate of the real distribution of the population is obtained (the flatter t-distribution then converges to the standardized normal distribution). When n ≥ 30 for both sets, e.g. when comparing Control Charts (see 8.3), for all practical purposes the difference between the Student and Cochran variants is negligible. The procedure is then reduced to the "normal" t-test by simply calculating t_cal with Eq. (6.16) and comparing this with t_tab at df = n₁ + n₂ - 2. (Note in App. 1 that the two-sided t_tab is now close to 2.)

The proper choice of the t-test as discussed above is summarized in a flow diagram in Appendix 3.

6.4.3.iv Paired t-exam

When ii information sets are not independent, the paired t-test can be a improve tool for comparison than the "normal" t-test described in the previous sections. This is for instance the example when two methods are compared by the same analyst using the same sample(south). It could, in fact, also be applied to the example of Table 6-1 if the 2 analysts used the same analytical method at (most) the aforementioned time.

As stated previously, comparison of ii methods using dissimilar levels of analyte gives more validation data about the methods than using only one level. Comparison of results at each level could be done past the F and t-tests equally described above. The paired t-test, however, allows for different levels provided the concentration range is not also broad. As a rule of fist, the range of results should exist within the same magnitude. If the analysis covers a longer range, i.due east. several powers of x, regression analysis must be considered (run into Section vi.4.4). In intermediate cases, either technique may be chosen.

The nothing hypothesis is that at that place is no difference between the data sets, so the exam is to see if the mean of the differences between the data deviates significantly from zero or not (two-sided exam). If it is expected that one set is systematically higher (or lower) than the other set, then the one-sided test is appropriate.

Instance 1

The "promising" rapid single-extraction method for the conclusion of the cation exchange capacity of soils using the silverish thiourea complex (AgTU, buffered at pH 7) was compared with the traditional ammonium acetate method (NHivOAc, pH 7). Although for certain soil types the difference in results appeared insignificant, for other types differences seemed larger. Such a doubtable group were soils with ferralic (oxic) properties (i.due east. highly weathered sesquioxide-rich soils). In Table 6-3 the results often soils with these properties are grouped to test if the CEC methods give different results. The difference d within each pair and the parameters needed for the paired t-exam are given as well.

Table half dozen-3. CEC values (in cmolc/kg) obtained by the NHivOAc and AgTU methods (both at pH vii) for x soils with ferralic properties.

Sample

NH 4 OAc

AgTU

d

ane

seven.1

6.5

-0.half-dozen

2

4.half-dozen

5.vi

+1.0

3

10.6

fourteen.five

+iii.nine

four

ii.three

5.6

+3.iii

5

25.2

23.8

-1.4

6

4.four

10.4

+six.0

7

7.8

eight.4

+0.6

eight

2.7

5.5

+ii.8

9

14.3

19.ii

+4.9

x

13.half dozen

xv.0

+1.four

¯d = +2.19

tcal = 2.89

southward d = 2.395

ttab = 2.26

Using Equation (6.12) and noting that μ_d = 0 (hypothesis value of the differences, i.e. no difference), the t-value can be calculated as:

t_cal = |d̄| × √n / s_d = (2.19 × √10) / 2.395 = 2.89

where

d̄ = mean of the differences within each pair of data
s_d = standard deviation of the differences
n = number of pairs of data

The calculated t-value (= 2.89) exceeds the critical value of 1.83 (App. 1, df = n - 1 = 9, one-sided), hence the null hypothesis that the methods do not differ is rejected and it is concluded that the silver thiourea method gives significantly higher results than the ammonium acetate method when applied to such highly weathered soils.

Note. Since such data sets do not have a normal distribution, the "normal" t-test which compares means of sets cannot be used here (the means do not constitute a fair representation of the sets). For the same reason no information about the precision of the two methods can be obtained, nor can the F-test be applied. For information about precision, replicate determinations are needed.
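A sketch of this paired test on the data of Table 6-3 (SciPy assumed; the last line shows the equivalent built-in test):

```python
import numpy as np
from scipy import stats

nh4oac = np.array([7.1, 4.6, 10.6, 2.3, 25.2, 4.4, 7.8, 2.7, 14.3, 13.6])
agtu   = np.array([6.5, 5.6, 14.5, 5.6, 23.8, 10.4, 8.4, 5.5, 19.2, 15.0])

d = agtu - nh4oac                                    # differences within pairs
n = len(d)
t_cal = abs(d.mean()) * np.sqrt(n) / d.std(ddof=1)   # Eq. (6.12) with mu_d = 0
t_tab = stats.t.ppf(0.95, df=n - 1)                  # one-sided, 95%

print(f"t_cal = {t_cal:.2f}, t_tab = {t_tab:.2f}")   # 2.89 vs 1.83

# Equivalent one-sided test with SciPy's built-in paired t-test:
t, p = stats.ttest_rel(agtu, nh4oac, alternative="greater")
```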

Example 2

Table 6-4 shows the data of total-P in four plant tissue samples obtained by a laboratory L and the median values obtained by 123 laboratories in a proficiency (round-robin) test.

Table 6-4. Total-P contents (in mmol/kg) of plant tissue as determined by 123 laboratories (Median) and Laboratory L.

Sample   Median   Lab L   d
1        93.0     85.2    -7.8
2        201      224     23
3        78.9     84.5    5.6
4        175      185     10

d̄ = 7.70       t_cal = 1.21
s_d = 12.702    t_tab = 3.18

To verify the performance of the laboratory a paired t-test can be performed.

Using Eq. (6.12) and noting that μ_d = 0 (hypothesis value of the differences, i.e. no difference), the t-value can be calculated as:

t_cal = |d̄| × √n / s_d = (7.70 × √4) / 12.702 = 1.21

The calculated t-value is below the critical value of 3.18 (Appendix 1, df = n - 1 = 3, two-sided), hence the null hypothesis that the laboratory does not significantly differ from the group of laboratories is accepted, and the results of Laboratory L seem to agree with those of "the rest of the world" (this is a so-called third-line control).

half-dozen.4.4 Linear correlation and regression


6.4.4.1 Construction of calibration graph
6.4.4.2 Comparing two sets of data using many samples at different analyte levels


These also belong to the most common useful statistical tools for comparing effects and performances X and Y. Although the technique is in principle the same for both, there is a fundamental difference in concept: correlation analysis is applied to independent factors: if X increases, what will Y do (increase, decrease, or perhaps not change at all)? In regression analysis a unilateral response is assumed: changes in X result in changes in Y, but changes in Y do not result in changes in X.

For example, in analytical work, correlation analysis can be used for comparing methods or laboratories, whereas regression analysis can be used to construct calibration graphs. In practice, however, comparison of laboratories or methods is usually also done by regression analysis. The calculations can be performed on a (programmed) calculator or more conveniently on a PC using a home-made program. Even more convenient are the regression programs included in statistical packages such as Statistix, Mathcad, Eureka, Genstat, Statcal, SPSS, and others. Also, most spreadsheet programs such as Lotus 123, Excel, and Quattro-Pro have functions for this.

Laboratories or methods are in fact independent factors. However, for regression analysis one factor has to be the independent or "constant" factor (e.g. the reference method, or the factor with the smallest standard deviation). This factor is by convention designated X, whereas the other factor is then the dependent factor Y (thus, we speak of "regression of Y on X").

As was discussed in Section 6.4.3, such comparisons can often be done with the Student/Cochran or paired t-tests. However, correlation analysis is indicated:

1. When the concentration range is so wide that the errors, both random and systematic, are not independent (which is the assumption for the t-tests). This is often the case where concentration ranges of several magnitudes are involved.

2. When pairing is inappropriate for other reasons, notably a long time span between the two analyses (sample aging, change in laboratory conditions, etc.).

The principle is to establish a statistical linear relationship between two sets of corresponding data by fitting the data to a straight line by means of the "least squares" technique. Such data are, for example, analytical results of two methods applied to the same samples (correlation), or the response of an instrument to a series of standard solutions (regression).

Note: Naturally, non-linear higher-order relationships are also possible, but since these are less common in analytical work and more complex to handle mathematically, they will not be discussed here. Nevertheless, to avoid misinterpretation, always inspect the kind of relationship by plotting the data, either on paper or on the computer monitor.

The resulting line takes the general form:

y = bx + a    (6.18)

where

a = intercept of the line with the y-axis
b = slope (tangent)

In laboratory work ideally, when there is perfect positive correlation without bias, the intercept a = 0 and the slope b = 1. This is the so-called "1:1 line" passing through the origin (dashed line in Fig. 6-5).

If the intercept a ≠ 0 then there is a systematic discrepancy (bias, error) between X and Y; when b ≠ 1 there is a proportional response or difference between X and Y.

The correlation between X and Y is expressed by the correlation coefficient r, which can be calculated with the following equation:

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √[ Σ(xᵢ - x̄)² × Σ(yᵢ - ȳ)² ]    (6.19)

where

xᵢ = data X
x̄ = mean of data X
yᵢ = data Y
ȳ = mean of data Y

It can be shown that r can vary from 1 to -1:

r = 1: perfect positive linear correlation
r = 0: no linear correlation (possibly other correlation)
r = -1: perfect negative linear correlation

Often, the correlation coefficient r is expressed as r²: the coefficient of determination or coefficient of variance. The advantage of r² is that, when multiplied by 100, it indicates the percentage of variation in Y associated with variation in X. Thus, for example, when r = 0.71 about 50% (r² = 0.504) of the variation in Y is due to the variation in X.

The line parameters b and a are calculated with the following equations:

b = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²    (6.20)

and

a = ȳ - b·x̄    (6.21)

It is worth noting that r is independent of the choice of which factor is the independent factor X and which is the dependent factor Y. However, the regression parameters a and b do depend on this choice, as the regression lines will be different (except when there is ideal 1:1 correlation).

6.4.4.1 Construction of calibration graph

As an example, we take a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph are given in Table 6-5. The line itself is plotted in Fig. 6-4.

Table 6-5 is presented here to give an insight into the steps and terms involved. The calculation of the correlation coefficient r with Equation (6.19) yields a value of 0.997 (r² = 0.995). Such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning and it might then be appropriate to repeat or review the procedure. Errors may have been made (e.g. in pipetting) or the used range of the graph may not be linear. On the other hand, a high r may be misleading as it does not necessarily indicate linearity. Therefore, to verify this, the calibration graph should always be plotted, either on paper or on the computer monitor.

Using Equations (6.20) and (6.21) we obtain:

b = 0.438 / 0.70 = 0.626

and

a = 0.350 - 0.626 × 0.5 = 0.350 - 0.313 = 0.037

Thus, the equation of the calibration line is:

y = 0.626x + 0.037    (6.22)

Table 6-5. Parameters of the calibration graph in Fig. 6-4.

xᵢ     yᵢ     xᵢ-x̄   (xᵢ-x̄)²   yᵢ-ȳ    (yᵢ-ȳ)²   (xᵢ-x̄)(yᵢ-ȳ)
0.0    0.05   -0.5    0.25      -0.30   0.090     0.150
0.2    0.14   -0.3    0.09      -0.21   0.044     0.063
0.4    0.29   -0.1    0.01      -0.06   0.004     0.006
0.6    0.43   0.1     0.01      0.08    0.006     0.008
0.8    0.52   0.3     0.09      0.17    0.029     0.051
1.0    0.67   0.5     0.25      0.32    0.102     0.160

Σ: 3.0  2.10   0       0.70      0       0.2754    0.438

x̄ = 0.5     ȳ = 0.35

Fig. 6-4. Calibration graph plotted from data of Table 6-5. The dashed lines delineate the 95% confidence area of the graph. Note that the confidence is highest at the centroid of the graph.

During calculation, the maximum number of decimals is used; rounding off to the last significant figure is done at the end (see instructions for rounding off in Section 8.2).

Once the calibration graph is established, its use is simple: for each y value measured, the corresponding concentration x can be determined either by direct reading or by calculation using Equation (6.22). The use of calibration graphs is further discussed in Section 7.2.2.

Note. A treatise of the error or uncertainty in the regression line is given below.
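As an illustration, a least-squares fit of the Table 6-5 standard series in Python (NumPy assumed), including the inverse use of the graph:

```python
import numpy as np

# Standard series from Table 6-5: concentration (mg/L) vs. absorbance
x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = np.array([0.05, 0.14, 0.29, 0.43, 0.52, 0.67])

sxy = np.sum((x - x.mean()) * (y - y.mean()))
sxx = np.sum((x - x.mean())**2)

b = sxy / sxx                      # Eq. (6.20): slope
a = y.mean() - b * x.mean()        # Eq. (6.21): intercept
r = np.corrcoef(x, y)[0, 1]        # Eq. (6.19): correlation coefficient

print(f"y = {b:.3f}x + {a:.3f}, r2 = {r**2:.3f}")
# y = 0.626x + 0.037, r2 = 0.995

# Inverse use of the graph: concentration for a measured absorbance
absorbance = 0.35
print(f"x = {(absorbance - a) / b:.2f} mg/L")   # 0.50 mg/L
```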

6.4.4.2 Comparing two sets of data using many samples at different analyte levels

Although regression analysis assumes that one factor (on the x-axis) is constant, when certain conditions are met the technique can also successfully be applied to comparing two variables such as laboratories or methods. These conditions are:

- The most precise data set is plotted on the x-axis;
- At least 6, but preferably more than 10, different samples are analyzed;
- The samples should rather uniformly cover the analyte level range of interest.

To decide which laboratory or method is the most precise, multi-replicate results have to be used to calculate standard deviations (see 6.4.2). If these are not available then the standard deviations of the present sets could be compared (note that we are now not dealing with normally distributed sets of replicate results). Another convenient way is to run the regression analysis on the computer, reverse the variables and run the analysis again. Observe which variable has the lowest standard deviation (or the lowest standard error of the intercept a, both given by the computer) and then use the results of the regression analysis where this variable was plotted on the x-axis.

If the analyte level range is incomplete, one might have to resort to spiking or standard additions, with the inherent drawback that the original analyte-sample combination may not adequately be reflected.

Example

In the framework of a performance verification programme, a large number of soil samples were analyzed by two laboratories X and Y (a form of "third-line control", see Chapter 9) and the data compared by regression. (In this particular case, the paired t-test might have been considered also.) The regression line of a common attribute, the pH, is shown here as an illustration. Figure 6-5 shows the so-called "scatter plot" of 124 soil pH-H₂O determinations by the two laboratories. The correlation coefficient r is 0.97, which is very satisfactory. The slope (= 1.03) indicates that the regression line is only slightly steeper than the 1:1 ideal regression line. Very disturbing, however, is the intercept a of -1.18. This implies that laboratory Y measures the pH more than a whole unit lower than laboratory X at the low end of the pH range (the intercept -1.18 is at pH_X = 0), a difference which decreases to about 0.8 unit at the high end.

Fig. 6-5. Scatter plot of pH data of two laboratories. Drawn line: regression line; dashed line: 1:1 ideal regression line.

The t-test for significance is as follows:

For the intercept a: μ_a = 0 (null hypothesis: no bias; the ideal intercept is then zero), standard error s_a = 0.14 (calculated by the computer), and using Equation (6.12) we obtain:

t_cal = |a - 0| / s_a = 1.18 / 0.14 = 8.4

Here, t_tab = 1.98 (App. 1, two-sided, df = n - 2 = 122; n - 2 because an extra degree of freedom is lost as the data are used for both a and b), hence the laboratories have a significant mutual bias.

For the slope b: μ_b = 1 (ideal slope: the null hypothesis is no difference), standard error s_b = 0.02 (given by the computer), and again using Equation (6.12) we obtain:

t_cal = |1.03 - 1| / 0.02 = 1.5

Again, t_tab = 1.98 (App. 1, two-sided, df = 122), hence the difference between the laboratories is not significantly proportional (or: the laboratories do not have a significant difference in sensitivity). These results suggest that in spite of the good correlation, the two laboratories would have to look into the cause of the bias.
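These two significance checks are one-liners once the regression output is available; a sketch (plain Python, using the standard errors quoted above):

```python
# Regression output for Fig. 6-5 (intercept, slope and their standard errors)
a, s_a = -1.18, 0.14
b, s_b = 1.03, 0.02

t_a = abs(a - 0) / s_a    # null hypothesis: no bias (ideal intercept a = 0)
t_b = abs(b - 1) / s_b    # null hypothesis: no proportional difference (b = 1)

print(f"t_a = {t_a:.1f}, t_b = {t_b:.1f}")   # 8.4 and 1.5, vs t_tab = 1.98
```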

Note. In the present case, the scattering of the points around the regression line does not seem to change much over the whole range. This indicates that the precision of laboratory Y does not change very much over the range with respect to laboratory X. This is not always the case. In such cases, weighted regression (not discussed here) is more appropriate than the unweighted regression as used here.

Validation of a method (see Section 7.5) may reveal that precision can change significantly with the level of analyte (and with other factors such as sample matrix).

6.4.5 Analysis of variance (ANOVA)

When results of laboratories or methods are compared where more than one factor can be of influence and must be distinguished from random effects, ANOVA is a powerful statistical tool to be used. Examples of such factors are: different analysts, samples with different pre-treatments, different analyte levels, different methods within one of the laboratories. Most statistical packages for the PC can perform this analysis.

As a treatise of ANOVA is beyond the scope of the present Guidelines, for further discussion the reader is referred to statistical textbooks, some of which are given in the list of Literature.

Error or uncertainty in the regression line

The "fitting" of the calibration graph is necessary because the response points yᵢ composing the line do not fall exactly on the line. Hence, random errors are implied. This is expressed by an uncertainty about the slope and intercept b and a defining the line. A quantification can be found in the standard deviations of these parameters. Most computer programmes for regression will automatically produce figures for these. To illustrate the procedure, the example of the calibration graph in Section 6.4.4.1 is elaborated here.

A practical quantification of the uncertainty is obtained by calculating the standard deviation of the points on the line: the "residual standard deviation" or "standard error of the y-estimate", which we assumed to be constant (but which is only approximately so, see Fig. 6-4):

s_y = √[ Σ(yᵢ - ŷᵢ)² / (n - 2) ]    (6.23)

where

ŷᵢ = "fitted" y-value for each xᵢ (read from the graph or calculated with Eq. 6.22); thus, yᵢ - ŷᵢ is the (vertical) deviation of the found y-values from the line
n = number of calibration points

Note: Only the y-deviations of the points from the line are considered. It is assumed that deviations in the x-direction are negligible. This is, of course, only the case if the standards are very accurately prepared.

Now the standard deviations for the intercept a and slope b can be calculated with:

s_a = s_y × √[ Σxᵢ² / (n × Σ(xᵢ - x̄)²) ]    (6.24)

and

s_b = s_y / √Σ(xᵢ - x̄)²    (6.25)

To make this procedure clear, the parameters involved are listed in Table 6-6.

The uncertainty about the regression line is expressed by the confidence limits of a and b according to Eq. (6.9): a ± t·s_a and b ± t·s_b.

Table 6-6. Parameters for calculating errors due to the calibration graph (use also figures of Table 6-5).

xᵢ     yᵢ     ŷᵢ      yᵢ-ŷᵢ    (yᵢ-ŷᵢ)²
0.0    0.05   0.037   0.013    0.0002
0.2    0.14   0.162   -0.022   0.0005
0.4    0.29   0.287   0.003    0.0000
0.6    0.43   0.413   0.017    0.0003
0.8    0.52   0.538   -0.018   0.0003
1.0    0.67   0.663   0.007    0.0001

Σ(yᵢ-ŷᵢ)² = 0.001364

In the present example, using Eq. (6.23), we calculate:

s_y = √(0.001364 / 4) = 0.0185

and, using Eq. (6.24) and Table 6-5:

s_a = 0.0185 × √[ 2.20 / (6 × 0.70) ] ≈ 0.0132

and, using Eq. (6.25) and Table 6-5:

s_b = 0.0185 / √0.70 ≈ 0.0219

The applicable t_tab is 2.78 (App. 1, two-sided, df = n - 2 = 4), hence, using Eq. (6.9):

a = 0.037 ± 2.78 × 0.0132 = 0.037 ± 0.037

and

b = 0.626 ± 2.78 × 0.0219 = 0.626 ± 0.061

Note that if s_a is large enough, a negative value for a is possible, i.e. a negative reading for the blank or zero-standard. (For a discussion of the error in x resulting from a reading in y, which is especially relevant for reading a calibration graph, see Section 7.2.3.)

The uncertainty about the line is somewhat decreased by using more calibration points (assuming s_y has not increased): one more point reduces t_tab from 2.78 to 2.57 (see Appendix 1).
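The whole procedure of Eqs. (6.23)-(6.25) in a short Python sketch (NumPy and SciPy assumed), reproducing the confidence limits above:

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = np.array([0.05, 0.14, 0.29, 0.43, 0.52, 0.67])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()
n = len(x)

resid = y - (b * x + a)                                   # y_i - y_hat_i
s_y = np.sqrt(np.sum(resid**2) / (n - 2))                 # Eq. (6.23)
s_a = s_y * np.sqrt(np.sum(x**2) / (n * np.sum((x - x.mean())**2)))  # Eq. (6.24)
s_b = s_y / np.sqrt(np.sum((x - x.mean())**2))            # Eq. (6.25)

t_tab = stats.t.ppf(0.975, df=n - 2)                      # 2.78
print(f"a = {a:.3f} ± {t_tab * s_a:.3f}")                 # 0.037 ± 0.037
print(f"b = {b:.3f} ± {t_tab * s_b:.3f}")                 # 0.626 ± 0.061
```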

