You can watch a video tutorial here.

An important step in explanatory data analysis (EDA) is the identification of outliers. These are values in data that are extreme and far removed from the other data points. It is important to identify the outliers and, in most cases, remove them so that they do not adversely affect the results of the data analysis. If the dataset is small then it is easy to identify the outliers by sorting and looking at the data. If the dataset is large, then we can find the outliers using the InterQuartile Range (IQR).

Using the IQR, an outlier is defined as any value 1.5 times the IQR above the 3^{rd} quartile (75^{th} percentile) or 1.5 times the IQR below the 1^{st} quartile (25^{th} percentile). We will use the following functions in Excel to calculate the outliers:

1. QUARTILE(): this returns the value of a quartile of a range of numbers

a. Syntax: QUARTILE(range, quart)

i. range: the range of values to be evaluated

ii. quart: the quartile to be returned e.g. 1 for the first quartile

2. IF(): this evaluates a condition and returns a value depending on whether the condition is true or false

a. Syntax: IF(condition, true, false)

i. condition: the condition or statement to be evaluated

ii. true: the value to be returned if the condition is true

iii. false: the value to be returned if the condition is false

3. OR(): this determines if the conditions in a test are true or false

a. Syntax: OR(condition1, condition2….)

i. condition1: a logical test

### Step 1 – Find the first quartile

– Select the cell in which the value is to be displayed

– Type the formula using cell references:

= QUARTILE(Data, 1)

– Press **Enter**

### Step 2 – Find the third quartile

– Select the cell in which the value is to be displayed

– Type the formula using cell references:

= QUARTILE(Data, 3)

– Press **Enter**

### Step 3 – Find the interquartile range (IQR)

– Select the cell in which the value is to be displayed

– Type the formula using cell references:

= Third quartile – First quartile

– Press **Enter**

### Step 4 – Find the upper bound

– Select the cell in which the value is to be displayed

– Type the formula using cell references:

= Third quartile + (1.5*IQR)

– Press **Enter**

### Step 5 – Find the lower bound

– Select the cell in which the value is to be displayed

– Type the formula using cell references:

= First quartile – (1.5*IQR)

– Press **Enter**

### Step 6 – Create the formula for outlier detection

– Select the cell in which the value is to be displayed

– Type the formula using cell references:

= IF(OR(Data < $Lower bound$, Data > $Upper bound$),TRUE, FALSE)

– Add dollar signs ($) to the cell references of the Upper and Lower bounds to make them constant

– Press **Enter**

### Step 7 – Copy the formula

– Using the fill handle from the first cell, drag the formula to the remaining cells

OR

a. Select the cell with the formula and press **Ctrl+C** or choose **Copy** from the context menu (right-click)

b. Select the rest of the cells in the column and press **Ctrl+V** or choose **Paste** from the context menu (right-click)

### Step 8 – Enable the in-column filters

– Go to **Data > Sort & Filter**

– Click on the **Filter **button

### Step 9 – Filter the outliers

– Expand the in-column filter

– Select only the ‘TRUE’ values

– Click **OK**

### Step 10 – Remove the outliers

– Select the filtered data

– Press **Delete**

– Go to **Data > Sort & Filter**

– Click on the **Filter **button to remove the in-column filters

### Step 11 – Check the result

– The outliers are removed