One-hot encoding is a method of converting categorical variables into numerical form. It is a common preprocessing step, because many machine learning algorithms can only work with numerical inputs. In this article, we will learn how to do one-hot encoding in Python with `pandas`.

For example, let’s say we have the following list of expenses with categories.

| Category | Amount |
|---|---|
| Transport | 100 |
| Grocery | 300 |
| Bills | 200 |

We would like to convert this data to numerical form. Here is what the one-hot encoded version looks like:

| Amount | Transport | Grocery | Bills |
|---|---|---|---|
| 100 | 1 | 0 | 0 |
| 300 | 0 | 1 | 0 |
| 200 | 0 | 0 | 1 |

Each row now has one column per category: the value is 1 if the row belongs to that category and 0 otherwise. This encoding adds one new feature per distinct category, so a column with many categories produces many extra features, although that is not always a problem.
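To make the mechanism concrete, here is a minimal from-scratch sketch in plain Python (no pandas) that reproduces the table above. The helper `one_hot` and the variable `vocabulary` are hypothetical names chosen for illustration; the columns follow the order in which the categories first appear, matching the table.

```python
# Illustrative sketch: one-hot encoding by hand, without pandas.
categories = ["Transport", "Grocery", "Bills"]
amounts = [100, 300, 200]

# Distinct categories, in first-seen order (dict keys preserve insertion order).
vocabulary = list(dict.fromkeys(categories))


def one_hot(value, vocabulary):
    """Hypothetical helper: 1 at the position of `value`, 0 everywhere else."""
    return [1 if value == name else 0 for name in vocabulary]


# Each output row is the amount followed by the indicator columns.
rows = [[amount] + one_hot(cat, vocabulary) for cat, amount in zip(categories, amounts)]

print(["Amount"] + vocabulary)
for row in rows:
    print(row)
```

Each row contains exactly one 1, which is why the representation is called "one-hot".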

To start, let’s create our data set. The example here is a list of expenses and their respective categories.

```
import pandas as pd

# One row per expense: a categorical column and a numeric column
df = pd.DataFrame({
    "category": ["Transport", "Grocery", "Bills"],
    "amount": [100, 300, 200],
})
df.head()
```

| | category | amount |
|---|---|---|
| 0 | Transport | 100 |
| 1 | Grocery | 300 |
| 2 | Bills | 200 |

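To encode categorical variables, we can use the `get_dummies` function from `pandas`. We pass our data frame to the function, and it returns a new data frame in which each text or categorical column is replaced by one indicator column per category, named `<column>_<category>`. Numeric columns such as `amount` are left unchanged. (In pandas 2.0 and later the indicator columns hold `True`/`False` by default; pass `dtype=int` to get 0/1 integers as in the output below.)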

`pd.get_dummies(df)`

| | amount | category_Bills | category_Grocery | category_Transport |
|---|---|---|---|---|
| 0 | 100 | 0 | 0 | 1 |
| 1 | 300 | 0 | 1 | 0 |
| 2 | 200 | 1 | 0 | 0 |
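A sketch of a few optional parameters of `pd.get_dummies` that are often useful: `columns` names exactly which columns to encode, `dtype=int` requests 0/1 integers instead of booleans, and `drop_first=True` drops the first category of each encoded column, which reduces the number of extra features by one per column. The variable names `encoded` and `reduced` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Transport", "Grocery", "Bills"],
    "amount": [100, 300, 200],
})

# Encode only "category" and store the indicators as 0/1 integers.
encoded = pd.get_dummies(df, columns=["category"], dtype=int)
print(encoded)

# drop_first=True removes the first (alphabetically sorted) category, here "Bills";
# a row with 0 in every remaining indicator column is then a "Bills" row.
reduced = pd.get_dummies(df, columns=["category"], dtype=int, drop_first=True)
print(reduced.columns.tolist())  # ['amount', 'category_Grocery', 'category_Transport']
```

Dropping one column loses no information, since the dropped category is implied when all other indicators are 0.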