Naive Bayes: Implementation from scratch.
There are two main schools of thought in statistics: Bayesian and Frequentist. A frequentist interprets the probability of an event as its long-run frequency over repeated observations and treats any unknown quantity as fixed. A Bayesian holds that every factor can be related to the occurrence of an event and treats any unknown probabilistically, i.e. beliefs can always be updated as new evidence arrives.
Naive Bayes comes under the Bayesian statistical approach, which means it predicts on the basis of the probability of an event given the observed evidence.
What do Naive and Bayes mean?
- Naive means that the features are assumed to be independent of each other (given the class), so every feature contributes to the inference separately.
- Bayes comes from Bayes' theorem.
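To make the "naive" assumption concrete, here is a minimal sketch. The per-feature likelihood values are made up for illustration and do not come from any dataset in this article. Under the independence assumption, the joint likelihood of all features for a class is simply the product of the individual feature likelihoods; in practice the same quantity is often computed as a sum of logarithms to avoid numerical underflow.

```python
import math

# Hypothetical likelihoods P(x_i | class) of three features for one sample.
feature_likelihoods = [0.5, 0.4, 0.1]

# Naive assumption: the features are independent given the class,
# so the joint likelihood is the product of the per-feature likelihoods.
joint_likelihood = math.prod(feature_likelihoods)

# Equivalent computation in log space (sum of logs instead of a product).
log_joint_likelihood = sum(math.log(p) for p in feature_likelihoods)

print(joint_likelihood)
print(math.exp(log_joint_likelihood))
```

Both printed values agree up to floating-point rounding.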
According to the definition of conditional probability:
P(B|A) = P(A∩B)/P(A)
The probability of B given A is equal to the probability of A intersection B (the probability of A and B occurring together) divided by the probability of A.
P(A|B) = P(B∩A)/P(B)
The probability of A given B is equal to the probability of B intersection A (the probability of B and A occurring together) divided by the probability of B.
Since P(A∩B) = P(B∩A), the first equation gives P(A∩B) = P(B|A).P(A), and substituting this into the second gives
P(A|B) = P(B|A).P(A)/P(B)
- P(A|B): Posterior probability.
- P(A): Prior probability.
- P(B|A): Likelihood.
- P(B): Evidence.
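The four terms can be tied together with a small numeric sketch. All probabilities below are hypothetical values chosen for illustration, and the helper name posterior() is not part of the class described later. The evidence P(B) is obtained from the law of total probability over A and not-A.

```python
def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / evidence_b


# Hypothetical inputs, chosen only for illustration.
p_a = 0.3              # prior P(A)
p_b_given_a = 0.8      # likelihood P(B|A)
p_b_given_not_a = 0.2  # likelihood of B when A does not hold

# Evidence P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B).
p_a_given_b = posterior(p_a, p_b_given_a, p_b)
print(round(p_b, 4), round(p_a_given_b, 4))
```

Observing B raises the probability of A from the prior of 0.3 to a posterior of roughly 0.63, because B is four times more likely when A holds.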
To avoid repeating and re-testing the same code again and again, let's create a Python class named NaiveBayesClassifier. I defined seven methods in it, and all seven take ‘self’ as their first parameter. The reason is that self represents the instance of the class, and through self we can access the attributes and methods of the class in Python.
The methods defined under class NaiveBayesClassifier() include:
- prior_probability(): Probability of hypothesis before observing the evidence.
- statistics(): Calculates the mean and variance of each column, which the Gaussian distribution needs, and converts them to NumPy arrays.
- gaussian_density(): Evaluates the Gaussian distribution, f(x) = exp(-(x - μ)² / (2σ²)) / √(2πσ²), where μ is the mean, σ is the standard deviation and σ² is the variance. The statistics() function feeds input to it.
- posterior_probability(): Calculates P(A|B), the probability of hypothesis A given the observed event B.
- fit(): This function fits our Naive Bayes model.
- predict(): This function provides predictions.
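The methods listed above could be put together roughly as follows. This is a minimal sketch rather than the exact implementation: the method names follow the list, but the bodies, the attribute names (classes, mean, var, prior, eps), the per-class grouping of the statistics, the small eps added to the variance, and the choice to work with log-probabilities are all assumptions made for illustration. In this sketch posterior_probability() skips the evidence term P(B), since it is the same for every class, and returns the most probable class.

```python
import numpy as np


class NaiveBayesClassifier:
    def prior_probability(self, y):
        """P(class): fraction of training rows that belong to each class."""
        return np.array([np.mean(y == c) for c in self.classes])

    def statistics(self, X, y):
        """Mean and variance of every column, computed per class."""
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes])
        return self.mean, self.var

    def gaussian_density(self, class_idx, x):
        """Gaussian density of each feature of x under the given class."""
        mean = self.mean[class_idx]
        # eps (an assumption of this sketch) guards against zero variance.
        var = self.var[class_idx] + self.eps
        return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

    def posterior_probability(self, x):
        """Return the class with the highest (unnormalised) log-posterior."""
        log_posteriors = []
        for i in range(len(self.classes)):
            log_prior = np.log(self.prior[i])
            # Naive assumption: sum the per-feature log-likelihoods.
            log_likelihood = np.sum(np.log(self.gaussian_density(i, x)))
            log_posteriors.append(log_prior + log_likelihood)
        return self.classes[np.argmax(log_posteriors)]

    def fit(self, X, y):
        """Store the classes, per-class statistics and priors."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        self.eps = 1e-9
        self.statistics(X, y)
        self.prior = self.prior_probability(y)
        return self

    def predict(self, X):
        """Predict a class label for every row of X."""
        X = np.asarray(X, dtype=float)
        return np.array([self.posterior_probability(x) for x in X])
```

With this sketch, training and prediction would look like NaiveBayesClassifier().fit(X_train, y_train).predict(X_test), where X_train, y_train and X_test are placeholder names for numeric feature arrays and their labels.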
Our model has an accuracy of 0.92.
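Accuracy here means the fraction of predictions that match the true labels. A minimal sketch of how such a score can be computed is shown below. The function name accuracy() and the labels in the example are hypothetical and are not the data behind the figure above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only: 3 of 4 predictions match.
example_score = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(example_score)
```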
Overall, our from-scratch model works fairly well, but there is always scope for fine-tuning to increase its accuracy.