How to Implement Logistic Regression from Scratch Using Python

Getting Started with Logistic Regression

If you’re learning Python, there’s a good chance you already know about logistic regression and have tried it on a few datasets. For most logistic regression tasks, a library implementation such as sklearn’s LogisticRegression is the practical tool. But writing the algorithm yourself and seeing how it works is an excellent way to learn about logistic regression.

It’s always a good idea to write algorithms from scratch, because it teaches you details you might not have picked up otherwise and helps you remember what you learned. The more you already know about matrix algebra and NumPy, the better off you’ll be, since this article works entirely with NumPy arrays. Let’s get started.

Importing the Libraries

First, we import the libraries we need: NumPy for the computations and matplotlib for plotting. The implementation itself uses only NumPy arrays.

import numpy as np
from numpy import log, dot, e, shape
import matplotlib.pyplot as plt

Import the dataset

For this article, we’ll use sklearn’s make_classification to generate a two-class dataset with four features, and train_test_split to hold out 10% of it for testing.


from sklearn.datasets import make_classification
X,y = make_classification(n_features = 4,n_classes=2)

from sklearn.model_selection import train_test_split
X_tr,X_te,y_tr,y_te = train_test_split(X,y,test_size=0.1)


Standardization

Standardization is the process of rescaling each feature so that it is centered on its mean and has unit standard deviation. In other words, the mean of the attribute becomes zero and its standard deviation becomes one. Some algorithms are harmed by features that have different scales, and gradient descent in particular struggles to give accurate results on unscaled data. For example, if a dataset has two features, age and salary, the much larger range of salary will most likely dominate the outcome. So it’s a good idea to bring the features onto the same scale before feeding them to the algorithm. Mathematically, standardization is

$$x' = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.

This is often necessary when attributes are from different scales.

def standardize(X_tr):
    # Scale each feature column to zero mean and unit standard deviation
    for i in range(shape(X_tr)[1]):
        X_tr[:,i] = (X_tr[:,i] - np.mean(X_tr[:,i]))/np.std(X_tr[:,i])
    return X_tr
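As a quick check of the idea, the sketch below standardizes a small made-up matrix and confirms that every column ends up with mean 0 and standard deviation 1. The age and salary values are invented for illustration, and the helper name `standardize_columns` is an assumption; it uses a vectorized form rather than the loop above.

```python
import numpy as np

def standardize_columns(X):
    """Return a copy of X with each column scaled to mean 0 and std 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Invented example: column 0 is age in years, column 1 is salary
X_demo = np.array([[25.0, 30000.0],
                   [35.0, 52000.0],
                   [45.0, 61000.0],
                   [55.0, 98000.0]])

X_scaled = standardize_columns(X_demo)
print(X_scaled.mean(axis=0))  # both entries are numerically zero
print(X_scaled.std(axis=0))   # both entries are numerically one
```

After scaling, both columns vary over a similar range, so neither dominates the weight updates simply because of its units.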

Initiating the Parameters

Data can be represented in many ways, but to do the math we need matrices. The input involves two of them: one for the features and one for the parameters, or weights. The feature matrix has dimension m x n, where m is the number of observations and n is the number of features. The weight matrix is n x 1. Here we add a bias column of ones to the feature matrix and a corresponding parameter to the weight matrix, which makes them m x (n+1) and (n+1) x 1. The bias term gives the model much of its flexibility.

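    def initialize(self,X):
        # One weight per feature, plus one for the bias term
        weights = np.zeros((shape(X)[1]+1,1))
        # Prepend a column of ones so the bias is learned like any other weight
        X = np.c_[np.ones((shape(X)[0],1)),X]
        return weights,X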


Note: The code above starts with zeros as the weight vector. You can choose any other value as well.
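To make the shapes concrete, here is a small sketch of the same idea on a made-up 3 x 2 feature matrix. The function name `add_bias_and_init` is hypothetical; it mirrors `initialize` but is written as a standalone function.

```python
import numpy as np

def add_bias_and_init(X):
    """Prepend a column of ones to X and create a zero weight vector."""
    m, n = X.shape
    weights = np.zeros((n + 1, 1))       # one weight per feature, plus the bias
    X_bias = np.c_[np.ones((m, 1)), X]   # shape (m, n + 1)
    return weights, X_bias

X_demo = np.array([[0.5, 1.2],
                   [1.5, -0.3],
                   [-0.7, 0.8]])
weights, X_bias = add_bias_and_init(X_demo)
print(X_bias.shape)      # (3, 3)
print(weights.shape)     # (3, 1)
print(X_bias @ weights)  # all zeros, because the weights start at zero
```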

The sigmoid function

For simple data with only one feature, linear regression fits the line y = ax + b, which is suited to predicting continuous values. In logistic regression the response variable is binary, which means it can be “yes” or “no” (1 or 0). Since a prediction outside the range between 0 and 1 makes no sense here, a linear function on its own is a poor fit. The sigmoid, or logistic, function keeps the result of the linear equation inside the range (0, 1).

The sigmoid function crosses the y-axis at 0.5. Most of the time, we use this point as the classification threshold: any value above 0.5 is classified as 1, and any value below it as 0. This isn’t a hard rule, though. When we need to, we can use a threshold other than 0.5.

python code:

    def sigmoid(self,z):
        sig = 1/(1+e**(-z))
        return sig

In the above expression, z is the dot product of the mxn matrix containing observations and nx1 matrix of weights.
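A short sketch of how the sigmoid behaves on a few sample scores, and how a threshold turns probabilities into class labels. The input values and the alternative threshold of 0.8 are chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
print(probs)  # increases with z; sigmoid(0) is exactly 0.5

labels_default = (probs > 0.5).astype(int)  # usual 0.5 cut-off
labels_strict = (probs > 0.8).astype(int)   # stricter, illustrative cut-off
print(labels_default)  # [0 0 0 1 1]
print(labels_strict)   # [0 0 0 0 1]
```

Raising the threshold makes the classifier more reluctant to predict class 1, which is one way to trade recall for precision.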

Cost Function

The cost function, or loss function, measures how far the predicted values are from the actual values. Linear regression uses the least squared error as its cost function. For logistic regression, however, the least squared error is not convex, so gradient descent is more likely to get stuck in a local minimum. So we use log loss as the cost function instead:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

where $h_\theta(x)$ is the sigmoid function we used earlier, applied to the linear combination of the features, and m is the number of observations.

python code:

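        # Defined inside fit(), so X, y and self come from the enclosing scope
        def cost(theta):
            z = dot(X,theta)
            # Log-loss contribution of the positive (y = 1) samples
            cost0 = y.T.dot(log(self.sigmoid(z)))
            # Log-loss contribution of the negative (y = 0) samples
            cost1 = (1-y).T.dot(log(1-self.sigmoid(z)))
            cost = -((cost1 + cost0))/len(y)
            return cost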
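To see why log loss suits classification, the sketch below computes it for two invented sets of predicted probabilities against the same labels: one mostly right, one confidently wrong. The `log_loss` helper is written for this example (it is not sklearn's function of the same name), and the small `eps` clipping is an added safeguard against taking log(0).

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])  # probabilities that agree with the labels
bad = np.array([0.1, 0.9, 0.2, 0.8])   # probabilities that contradict them

print(log_loss(y_true, good))  # small: predictions agree with the labels
print(log_loss(y_true, bad))   # much larger: confident and wrong
```

The penalty grows quickly as a prediction becomes both confident and wrong, which is the behaviour we want gradient descent to push against.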

Gradient Descent

The next step is gradient descent. Gradient descent is an optimization algorithm that finds the parameters that minimize the cost function. So, what is the gradient? It is the vector of first-order derivatives of the cost function, and it points in the direction of steepest ascent. In gradient descent we therefore move in the opposite direction of the gradient. We keep updating the weights until they converge. The update rule for gradient descent looks like this:

$$\theta := \theta - \alpha \frac{\partial J(\theta)}{\partial \theta}$$

Here, alpha is the step size that controls how quickly we approach the global minimum. If the step size is too small, it will take a long time to reach the minimum, but if it is too big, the update may overshoot the minimum.

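The derivative of the cost function with respect to the weights is:

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m} X^{T}\left(h_\theta(X) - y\right)$$

where $h_\theta(X)$ is the sigmoid applied to $X\theta$. The code below leaves out the constant factor 1/m, which only rescales the step size alpha.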

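    def fit(self,X,y,alpha=0.001,iter=100):
        params,X = self.initialize(X)
        # cost(theta) from the previous section is defined here, inside fit
        cost_list = np.zeros(iter,)
        for i in range(iter):
            # Move the weights against the gradient of the log loss
            params = params - alpha * dot(X.T, self.sigmoid(dot(X,params)) - np.reshape(y,(len(y),1)))
            cost_list[i] = cost(params)
        # Stored as self.weights, the name that predict() reads
        self.weights = params
        return cost_list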
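The effect of the step size can be seen in a small standalone sketch. It runs the same kind of update on an invented one-feature dataset and records the cost at each step. The function `gradient_descent`, the data, and the two alpha values are all assumptions made for this illustration. Unlike the update above, this sketch divides the gradient by the number of samples, so its alpha values are not directly comparable with the article's default.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha, n_iter):
    """Run batch gradient descent on log loss; return weights and cost history."""
    m = X.shape[0]
    Xb = np.c_[np.ones((m, 1)), X]   # add the bias column
    y = y.reshape(-1, 1)
    w = np.zeros((Xb.shape[1], 1))
    costs = []
    for _ in range(n_iter):
        h = sigmoid(Xb @ w)
        costs.append(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))
        w = w - alpha * (Xb.T @ (h - y)) / m   # step against the gradient
    return w, costs

# Invented data: larger x tends to mean class 1, with some overlap
X = np.array([[-2.0], [-1.5], [-0.5], [0.5], [1.0], [2.5]])
y = np.array([0, 0, 1, 0, 1, 1])

_, slow = gradient_descent(X, y, alpha=0.01, n_iter=200)
_, fast = gradient_descent(X, y, alpha=0.5, n_iter=200)
print(slow[0], slow[-1])  # cost falls, but slowly
print(fast[0], fast[-1])  # cost falls further in the same number of steps
```

Plotting the two cost histories (for example with matplotlib) is a simple way to judge whether a chosen alpha is too small or too large.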



Prediction

Everything we’ve done so far leads up to this step. Here we use the weights the model has learned to predict labels for data it hasn’t seen before.

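    def predict(self,X):
        z = dot(self.initialize(X)[1],self.weights)
        lis = []
        # Label 1 when the predicted probability exceeds 0.5, otherwise 0
        for i in self.sigmoid(z):
            if i>0.5:
                lis.append(1)
            else:
                lis.append(0)
        return lis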






F1-Score

After making predictions, we move on to the F1 score to see how well our model performs on data it hasn’t seen yet. The F1 score is a good way to evaluate classification models. Mathematically, it is the harmonic mean of precision and recall:

F1 = 2 * (Precision * Recall)/(Precision + Recall)

Precision is the number of true positives over the sum of true positives and false positives.

Precision = TP/(TP+FP)

Recall is the number of true positives over the sum of true positives and false negatives.

Recall = TP/(TP+FN)
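The harmonic mean of precision and recall can be computed directly from the TP, FP and FN counts. Below is a minimal sketch, assuming the labels are 0/1 and using invented example labels. The function name `f1_score` is chosen for this example and is written from scratch rather than imported from sklearn; returning 0.0 when there are no true positives is a convention adopted here.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Compute F1 for binary 0/1 labels from TP, FP and FN counts."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0  # convention used here when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # TP=3, FP=1, FN=1, so precision = recall = 0.75
```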


This article went over the different parts of logistic regression and showed how to implement each of them in plain Python with NumPy.
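For reference, the pieces described in this article can be assembled into one class roughly as follows. This is a sketch under a few assumptions rather than the article's exact code: the class name `LogitRegression` is made up, the synthetic data is generated with NumPy instead of `make_classification`, the gradient is averaged over the samples, and the probabilities are clipped before taking logs.

```python
import numpy as np

class LogitRegression:
    """Minimal logistic regression trained with batch gradient descent."""

    def initialize(self, X):
        weights = np.zeros((X.shape[1] + 1, 1))
        X = np.c_[np.ones((X.shape[0], 1)), X]   # bias column of ones
        return weights, X

    def sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y, alpha=0.1, n_iter=500):
        weights, X = self.initialize(X)
        y = np.reshape(y, (len(y), 1))
        cost_list = np.zeros(n_iter)
        for i in range(n_iter):
            h = self.sigmoid(X @ weights)
            h_safe = np.clip(h, 1e-12, 1 - 1e-12)   # guard against log(0)
            cost_list[i] = -np.mean(y * np.log(h_safe) + (1 - y) * np.log(1 - h_safe))
            weights = weights - alpha * (X.T @ (h - y)) / len(y)
        self.weights = weights
        return cost_list

    def predict(self, X):
        z = self.initialize(X)[1] @ self.weights
        return (self.sigmoid(z) > 0.5).astype(int).ravel()

# Synthetic two-class data: two overlapping Gaussian blobs with four features
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-1.0, scale=1.0, size=(100, 4))
X1 = rng.normal(loc=1.0, scale=1.0, size=(100, 4))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

model = LogitRegression()
costs = model.fit(X, y)
accuracy = np.mean(model.predict(X) == y)
print(costs[0], costs[-1])  # the cost should drop during training
print(accuracy)             # training accuracy on the synthetic data
```

In practice the model would be fitted on the training split and evaluated on the held-out split, as in the earlier sections; the sketch scores the training data only to stay short.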