Data Analytics Projects via Google Colab


Decision tree analysis & predictions


The following is a sample project that used supervised learning, specifically a decision tree classifier, to identify which groups of people are most likely to own a laptop. The goal is to build a model that predicts the value of a target variable (in our case, whether a person owns a laptop) by learning simple decision rules inferred from the data features. Once the target audience has been identified, we can build ads aimed at those segments instead of broadcasting to every group, which makes more efficient use of budget and resources. In this example, the model predicts that a 40-year-old female banker living in the city region with a net income of $20,000 is the most likely to own a laptop.
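
To make "learning simple decision rules" concrete, here is a minimal pure-Python sketch of how a decision tree chooses its first split, using Gini impurity as the splitting criterion. This is an illustrative assumption: the notebook's exact library and criterion are not stated here, although scikit-learn's `DecisionTreeClassifier` uses Gini by default. The feature names and the four rows of data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, feature):
    """Weighted Gini impurity after splitting on every value of a categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(group) / n * gini(group) for group in groups.values())


def best_feature(rows, labels, features):
    """Pick the feature whose split leaves the lowest weighted impurity (the tree's root)."""
    return min(features, key=lambda feature: split_impurity(rows, labels, feature))


# Hypothetical survey rows and laptop-ownership labels.
rows = [
    {"gender": "female", "region": "city"},
    {"gender": "female", "region": "rural"},
    {"gender": "male", "region": "city"},
    {"gender": "male", "region": "rural"},
]
labels = ["yes", "no", "yes", "no"]
print(best_feature(rows, labels, ["gender", "region"]))  # prints region
```

A full tree repeats this choice recursively inside each branch until the groups are pure or a depth limit is reached; the path from the root to a leaf is one of the "simple decision rules".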

Link to Google Colab Files

NLTK sentiment analysis of product reviews on Amazon


For this project, sentiment analysis with the Natural Language Toolkit (NLTK) was used to determine the ratio of positive to negative product reviews on the Amazon page of a merchant selling unlocked Samsung mobile phones. This method lets us analyze bodies of text such as comments or reviews to gain more insight into buyers' opinions of a product. The advantage is that we can sweep through an entire database of product reviews with ease and conclude whether a product is worth investing in, which is useful from a business point of view. Another applicable scenario is using sentiment analysis to produce a rating derived from what buyers actually write, rather than relying on the usual 5-star rating system. In this example, the merchant is considered worth buying from, as most of the reviews are positive.
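
The positive-to-negative ratio can be sketched as follows. This assumes each review has already been scored with NLTK's VADER analyzer, whose `polarity_scores()` method returns a `compound` score between -1 and 1; the source does not say which NLTK analyzer or thresholds the notebook used. The ±0.05 cut-offs follow the convention recommended by VADER's authors, and the scores in the example are made up.

```python
# Scores would normally come from NLTK, for example:
#   import nltk
#   nltk.download("vader_lexicon")
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(review_text)["compound"]


def label_review(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def positive_to_negative_ratio(compound_scores):
    """Count positive and negative reviews and return their ratio (neutral reviews are ignored)."""
    labels = [label_review(score) for score in compound_scores]
    positives = labels.count("positive")
    negatives = labels.count("negative")
    if negatives == 0:
        # The ratio is unbounded with no negative reviews, and undefined with neither kind.
        return float("inf") if positives else float("nan")
    return positives / negatives


# Hypothetical compound scores for five reviews.
scores = [0.81, 0.64, -0.42, 0.02, 0.93]
print(positive_to_negative_ratio(scores))  # prints 3.0
```

Treating near-zero scores as neutral keeps mildly worded reviews from tipping the ratio either way.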

Link to Google Colab File

Natural language processing of novels


Introducing natural language processing (NLP) into current practices allows companies to find what is relevant amid large volumes of unstructured text and to gain valuable insights that help automate tasks and drive business decisions. In this project, NLP was used to identify the 50 most frequent word tokens in Dracula, a novel obtained from the Gutenberg.org library; "Van Helsing" came out as the most frequently used term. NLP can also make it easy to classify texts or books, for example to indicate whether they are child-friendly or contain adult content that may require some form of censorship.
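
Counting the most frequent tokens can be sketched with Python's standard library. This is a stand-in, not the notebook's code: NLTK offers equivalents such as `nltk.FreqDist` and the `stopwords` corpus, but the regular-expression tokenizer and the tiny stop-word list here are simplifying assumptions, and the sample sentence is invented. For Dracula, the input would be the plain-text file downloaded from Gutenberg.org.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration; NLTK's
# nltk.corpus.stopwords.words("english") is a much fuller one.
STOP_WORDS = {"the", "and", "a", "of", "to", "i", "in", "he", "it", "was", "is", "that"}


def top_tokens(text, n=50):
    """Return the n most common non-stop-word tokens as (token, count) pairs."""
    # Lower-case, then keep runs of letters as tokens (punctuation and digits are dropped).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(n)


sample = "The Count and the Count. Van Helsing said the Count was here."
print(top_tokens(sample, n=1))  # prints [('count', 3)]
```

Because this tokenizer splits on anything that is not a letter, a two-word name such as "Van Helsing" is counted as "van" and "helsing"; counting it as a single term would need word pairs, for example via `nltk.bigrams`.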

Link to Google Colab File

Dashboard via Google Data Studio

Data Visualization via Data Studio


Data Studio enables users to create and share engaging reports and data visualizations. The platform transforms raw data into the metrics and dimensions needed for easy-to-understand reports and dashboards, with no prior knowledge of coding or queries required. We can connect data directly from spreadsheets without building tedious pivot charts or tables, and manipulate variables easily. This example studies the underlying causes of employee turnover using an HR database of 15,000 employees. The analysis found that the main factor driving employees to leave is a lack of promotions. The two departments with the highest employee turnover were HR and accounting. Employees with 3 to 5 years of experience are more likely to leave than both new hires and those with longer experience. These longer-serving employees also tend to report lower satisfaction, yet they do not leave for another company. This identifies the key employees to target, so that the company can recognize and reward them when it is due.
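
Data Studio computes these breakdowns without code, but the underlying metric is simple. The sketch below is a plain-Python illustration, not anything Data Studio runs: it computes the turnover rate (the share of employees who left) for each group. The field names `department` and `left` (1 if the employee left, 0 otherwise) and the records are hypothetical.

```python
from collections import defaultdict


def turnover_rate_by(records, key):
    """Return {group: share of employees who left}, grouping by record[key]."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        leavers[group] += record["left"]  # 'left' is assumed to be 1 or 0
    return {group: leavers[group] / totals[group] for group in totals}


# Hypothetical HR records.
employees = [
    {"department": "hr", "left": 1},
    {"department": "hr", "left": 0},
    {"department": "accounting", "left": 1},
    {"department": "it", "left": 0},
]
print(turnover_rate_by(employees, "department"))
# prints {'hr': 0.5, 'accounting': 1.0, 'it': 0.0}
```

Grouping by a years-of-experience field instead of `department` gives the tenure breakdown discussed in the paragraph above.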

Link to Data Studio File

Art Creation Using Machine Learning

Adding color to old photos using DeOldify


Another interesting application of machine learning is bringing old black-and-white photos to life. We were able to colorize black-and-white images with DeOldify, a library created by Jason Antic that can produce hyper-realistic colorized images as well as videos. The picture on the right is an image of my late uncle, colorized with DeOldify.

Link to Data Studio File