Selva Prabhakaran

Selva is an experienced Data Scientist and leader who specializes in executing AI projects for large companies. He started machinelearningplus to make Data Science / ML / AI accessible to everyone. The website enjoys a readership of over 4 million, and his courses, lessons, and videos are loved by hundreds of thousands of students and practitioners.

Build a Custom Scikit-Learn Regression Model: Step-by-Step Guide

Creating custom regressors in scikit-learn means building your own machine learning models that follow scikit-learn’s API conventions, allowing them to work seamlessly with pipelines, grid search, and all other scikit-learn tools. Ever hit a wall where existing scikit-learn regressors just don’t fit your specific problem? Maybe you need a model that minimizes a custom loss […]
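To make those conventions concrete, here is a minimal sketch of a custom regressor; the mean-baseline model and its shrinkage parameter are illustrative, not the article's example:

    import numpy as np
    from sklearn.base import BaseEstimator, RegressorMixin
    from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

    class MeanBaselineRegressor(BaseEstimator, RegressorMixin):
        """Predicts the training mean of y, optionally shrunk toward zero."""

        def __init__(self, shrinkage=0.0):
            self.shrinkage = shrinkage       # store params unchanged in __init__

        def fit(self, X, y):
            X, y = check_X_y(X, y)           # validate inputs
            self.mean_ = (1.0 - self.shrinkage) * np.mean(y)
            return self                      # fit must return self

        def predict(self, X):
            check_is_fitted(self)            # learned attributes end in "_"
            X = check_array(X)
            return np.full(X.shape[0], self.mean_)

Because parameters are stored verbatim in __init__, learned attributes carry a trailing underscore, and fit returns self, tools like Pipeline and GridSearchCV accept this class like any built-in regressor.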

Build a Custom Scikit-Learn Regression Model: Step-by-Step Guide Read More »

Understanding Confidence Intervals: A spelled out guide to clarify misconceptions

If you’ve ever read a scientific study, survey results, or even a political poll, you’ve probably encountered confidence intervals (CIs). They’re one of the most useful—yet often misunderstood—concepts in statistics. So, what exactly are they, and why do they matter? What Is a Confidence Interval? A confidence interval is a range of values, derived from
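For a feel of the computation, a quick sketch of a 95% CI for a sample mean using the t-distribution (the data here is made up):

    import numpy as np
    from scipy import stats

    sample = np.array([4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0])
    mean = sample.mean()
    sem = stats.sem(sample)                  # standard error of the mean
    lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
    print(f"95% CI for the mean: ({lo:.3f}, {hi:.3f})")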

Understanding Confidence Intervals: A spelled out guide to clarify misconceptions Read More »

Optimizing RAG Chunk Size: Your Definitive Guide to Better Retrieval Accuracy

Optimal chunk size for RAG systems typically ranges from 128-512 tokens, with smaller chunks (128-256 tokens) excelling at precise fact-based queries while larger chunks (256-512 tokens) provide better context for complex reasoning tasks. The key is balancing retrieval precision with context retention based on your specific use case. Ever built a RAG system that gave
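As a baseline for experimenting with these ranges, a minimal sketch of fixed-size chunking with overlap; it approximates tokens with whitespace words, whereas a real pipeline would use the embedding model's tokenizer:

    def chunk_text(text, chunk_size=256, overlap=32):
        """Split text into overlapping chunks of roughly chunk_size tokens."""
        words = text.split()
        step = chunk_size - overlap
        return [" ".join(words[i:i + chunk_size])
                for i in range(0, len(words), step)]

    doc = "word " * 1000
    print(len(chunk_text(doc)), "chunks")    # 5 chunks of ~256 words each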

Optimizing RAG Chunk Size: Your Definitive Guide to Better Retrieval Accuracy Read More »

Ridge Regression as MAP Estimation – Supporting notes

This is one of the most beautiful connections in machine learning – let me break down exactly why Ridge regression is MAP estimation in disguise. Let’s look at the concept step-by-step with a concrete numerical example. These are supporting notes to the MAP explanation, where we see Ridge regression as MAP estimation in the explanation
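In one line, the connection: assuming Gaussian noise $y = Xw + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ and a zero-mean Gaussian prior $w \sim \mathcal{N}(0, \tau^2 I)$, maximizing the posterior over $w$ is equivalent to minimizing the Ridge objective:

$\hat{w}_{\text{MAP}} = \arg\max_w \, p(y \mid X, w)\, p(w) = \arg\min_w \, \lVert y - Xw \rVert^2 + \lambda \lVert w \rVert^2, \quad \lambda = \sigma^2 / \tau^2$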

Ridge Regression as MAP Estimation – Supporting notes Read More »

Maximum A Posteriori (MAP) Estimation – Clearly Explained

Maximum A Posteriori (MAP) estimation is a Bayesian method for finding the most likely parameter values given observed data and prior knowledge. Unlike maximum likelihood estimation which only considers the data, MAP combines what we observe with what we already know (or believe) about the parameters. Ever wondered how your smartphone’s autocorrect gets better over
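For reference, the defining formula: by Bayes’ rule the posterior is proportional to likelihood times prior, so

$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \, p(\theta \mid D) = \arg\max_{\theta} \, p(D \mid \theta)\, p(\theta)$

Dropping the prior $p(\theta)$ recovers plain maximum likelihood estimation.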

Maximum A Posteriori (MAP) Estimation – Clearly Explained Read More »

Relevant Segment Extraction (RSE) – Building Better Context by Assembling Contiguous Chunks for Improved RAG Performance

Relevant Segment Extraction (RSE) is a query-time post-processing technique that intelligently combines related text chunks into longer, coherent segments, providing LLMs with better context than individual chunks alone. RSE addresses the fundamental limitation of fixed-size chunking by dynamically reconstructing meaningful text segments based on relevance clustering. Ever asked a question to your RAG chatbot and
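A rough sketch of the segment-assembly idea (a simplification, not the article's exact algorithm): given per-chunk relevance scores, merge each run of contiguous above-threshold chunks into one segment:

    def assemble_segments(chunks, scores, threshold=0.5):
        """Merge contiguous relevant chunks into longer, coherent segments."""
        segments, current = [], []
        for chunk, score in zip(chunks, scores):
            if score >= threshold:
                current.append(chunk)        # extend the current run
            elif current:
                segments.append(" ".join(current))
                current = []
        if current:
            segments.append(" ".join(current))
        return segments

    chunks = ["A1", "A2", "B1", "C1", "C2", "C3"]
    scores = [0.9, 0.8, 0.2, 0.7, 0.6, 0.9]
    print(assemble_segments(chunks, scores))  # ['A1 A2', 'C1 C2 C3']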

Relevant Segment Extraction (RSE) – Building Better Context by Assembling Contiguous Chunks for Improved RAG Performance Read More »

CUDA Programming: Large-Scale Parallel Computing on GPUs from Scratch

While GPUs (Graphics Processing Units) have long been in demand for video games, the rise of Large Language Models (LLMs) has pushed that demand even higher. They provide the large-scale parallel computing capability required to train LLMs (which have billions of parameters). So what is CUDA? CUDA stands for Compute Unified Device Architecture

CUDA Programming: Large-Scale Parallel Computing on GPUs from Scratch Read More »

Mutual information vs Cross Entropy

Cross-entropy is a measure of error, while mutual information measures the shared information between two variables. Both concepts are used in information theory, but they serve different purposes and are applied in different contexts. Let’s understand both in complete detail. Cross-Entropy Cross-entropy measures the difference between two probability distributions. Specifically, it quantifies the amount of additional
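A small numeric sketch of cross-entropy for discrete distributions (illustrative numbers, natural log):

    import numpy as np

    p = np.array([0.5, 0.3, 0.2])   # true distribution
    q = np.array([0.4, 0.4, 0.2])   # model distribution
    cross_entropy = -np.sum(p * np.log(q))
    entropy = -np.sum(p * np.log(p))
    print(f"H(p, q) = {cross_entropy:.4f}, H(p) = {entropy:.4f}")
    # The gap H(p, q) - H(p) is the KL divergence D_KL(p || q).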

Mutual information vs Cross Entropy Read More »

Bayesian Optimization for Hyperparameter Tuning – Clearly explained.

Bayesian Optimization is a method for optimizing ‘expensive-to-evaluate’ functions, and it is particularly useful in hyperparameter tuning for machine learning models. Let’s understand how it works and the math behind it in detail. Overview of Bayesian Optimization Bayesian optimization for hyperparameter tuning involves the following steps. We will break it down into simple details after
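To make the loop concrete, a compact sketch of a single Bayesian-optimization step on a toy 1-D objective (an illustration, using scikit-learn's GaussianProcessRegressor with its default kernel as the surrogate and Expected Improvement as the acquisition function):

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def objective(x):                        # the expensive black box (toy here)
        return -(x - 0.7) ** 2

    X = np.array([[0.1], [0.4], [0.9]])      # points evaluated so far
    y = objective(X).ravel()

    gp = GaussianProcessRegressor().fit(X, y)          # fit the surrogate
    candidates = np.linspace(0, 1, 200).reshape(-1, 1)
    mu, sigma = gp.predict(candidates, return_std=True)

    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = candidates[np.argmax(ei)]       # next point to evaluate
    print("next point to evaluate:", x_next)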

Bayesian Optimization for Hyperparameter Tuning – Clearly explained. Read More »

Difference between Joint probability and Conditional probability

Joint probability and conditional probability are two concepts in probability theory that deal with the likelihood of events, but they are used in different contexts and measure different things. Joint Probability Definition: Joint probability is the probability of two events happening at the same time. Notation: $P(A \cap B)$ or $P(A \text{ and } B)$
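A quick worked example of the two quantities, using one card drawn from a standard 52-card deck:

    p_heart = 13 / 52                        # P(heart)
    p_king_and_heart = 1 / 52                # joint: P(king AND heart)
    p_king_given_heart = p_king_and_heart / p_heart   # conditional: P(king | heart)
    print(p_king_and_heart)                  # 0.0192...
    print(p_king_given_heart)                # 1/13 = 0.0769...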

Difference between Joint probability and Conditional probability Read More »

KL Divergence – What is it and mathematical details explained

At its core, KL (Kullback-Leibler) Divergence is a statistical measure that quantifies the dissimilarity between two probability distributions. Think of it like a mathematical ruler that tells us the “distance” or difference between two probability distributions. Remember, in data science, we’re often working with probabilities – the chances of events happening. So, if we have
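A short numeric sketch for discrete distributions (illustrative numbers; note the asymmetry, which is why KL is a divergence rather than a true distance):

    import numpy as np

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.4, 0.4, 0.2])
    kl_pq = np.sum(p * np.log(p / q))        # D_KL(p || q)
    kl_qp = np.sum(q * np.log(q / p))        # D_KL(q || p)
    print(f"D_KL(p||q) = {kl_pq:.4f}, D_KL(q||p) = {kl_qp:.4f}")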

KL Divergence – What is it and mathematical details explained Read More »

How to get records from one table that does not exist in another?

Problem You have two tables, tableA and tableB. You want to retrieve all records from tableA that do not have a matching record in tableB based on a specific column. Input Let’s start with creating two tables and populating them with data. tableA – This will be our primary table where we’ll select data from.
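A minimal sketch of the pattern using Python's built-in sqlite3 (the schema below is simplified, not the article's exact tables); the anti-join here uses NOT EXISTS, though LEFT JOIN ... IS NULL works as well:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tableA (id INTEGER, name TEXT);
        CREATE TABLE tableB (id INTEGER);
        INSERT INTO tableA VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO tableB VALUES (1), (3);
    """)
    rows = con.execute("""
        SELECT a.* FROM tableA a
        WHERE NOT EXISTS (SELECT 1 FROM tableB b WHERE b.id = a.id)
    """).fetchall()
    print(rows)   # [(2, 'b')] -- records in tableA missing from tableB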

How to get records from one table that does not exist in another? Read More »

How to access the “previous row” value in a SELECT statement in SQL?

Problem: How to access the “previous row” value in a SELECT statement in SQL?

Input:

id | sale_date  | units_sold
---|------------|-----------
1  | 2023-01-01 | 10
2  | 2023-01-02 | 15
3  | 2023-01-03 | 12
4  | 2023-01-04 | 20

Try Hands-On: HERE. Source Tables: Gist.

Desired Solution:

sale_date  | current_day_sales | previous_day_sales
-----------|-------------------|-------------------
2023-01-01 | 10                |
2023-01-02 | 15                | 10
2023-01-03 | 12                | 15
2023-01-04 | 20                | 12

Solution 1: To
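One standard approach is the LAG window function; a minimal sketch with Python's built-in sqlite3 (window functions need SQLite 3.25+), mirroring the sales table above:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sales (id INTEGER, sale_date TEXT, units_sold INTEGER);
        INSERT INTO sales VALUES
            (1, '2023-01-01', 10), (2, '2023-01-02', 15),
            (3, '2023-01-03', 12), (4, '2023-01-04', 20);
    """)
    rows = con.execute("""
        SELECT sale_date,
               units_sold                                AS current_day_sales,
               LAG(units_sold) OVER (ORDER BY sale_date) AS previous_day_sales
        FROM sales
    """).fetchall()
    for r in rows:
        print(r)   # ('2023-01-01', 10, None), ('2023-01-02', 15, 10), ...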

How to access the “previous row” value in a SELECT statement in SQL? Read More »
