Photo by Nicolas Hoizey on Unsplash

Kaggle recently introduced the Tabular Playground Series: month-long competitions aimed at people new to data science, ML competitions, or both. I participated in the February challenge, which provided tabular data with both categorical and continuous features and the goal of predicting a continuous target. I finished solidly in the middle of the pack; my best score was an RMSE of 0.84599 versus the winner’s score of 0.84115. Although I missed out on what I’m sure is awesome Kaggle swag, I was rewarded with a great learning experience. …



Visualizations are a great way to quickly understand a new dataset. They make it easier to identify correlations between columns and to spot informative patterns in the data. Several visualization libraries are available to Python users, such as matplotlib, seaborn, plotly, and graphviz. Since both R and Python are commonly used in data science and analytics, you may find yourself moving between the two languages. Maybe your organization is converting projects from R to Python, or you are an R user who has joined a team that works exclusively in Python. …


Photo by Jason Rosewell on Unsplash

Data derived from speech has several applications, such as improving service at call centers and personalizing the user experience on speech-enabled devices. Detecting emotion from speech is critical to drawing meaning from this data. For this project, we’ll start with DataFlair’s implementation of the Multi-layer Perceptron (MLP) classifier, an Artificial Neural Network (ANN). The dataset comprises 768 audio clips obtained from RAVDESS. Each audio clip is of an actor portraying an emotion, with the project homing in on clips that convey calm, happiness, fear, or disgust. I will focus here on my additions to the initial project. …


Photo by Lukas Blazek on Unsplash

Over the past 7 months or so, I have been working on strengthening my data analytics skills by learning how to code in Python and exploring machine learning and deep learning methods. In addition to using online resources, I recently completed a 9-week intensive data science bootcamp. Since then, I have continued to work on personal projects to keep those skills sharp. My experience in Lean Six Sigma process optimization relied heavily on analyzing (largely numerical) data to identify root causes of process issues and find opportunities for simplification and sustainability. …

Njeri Gachago

Exploring the applications of programming in data science
