How to Build a Custom Instruction Dataset for LLM Fine-Tuning
Learn how to build a custom instruction dataset for LLM fine-tuning — covering Alpaca, ShareGPT, and DPO formats, quality filtering, synthetic data generation, token...
11 min
Creating custom regressors in scikit-learn means building your own machine learning models that follow scikit-learn’s API conventions, allowing them to work seamlessly with pipelines,...
Whether you are a beginner or looking to level up your Data Science / AI / ML skills, I’ve put together a structured guide...
Cross-entropy is a measure of error, while mutual information measures the information shared between two variables. Both concepts are used in information theory, but they...
Bayesian Optimization is a method used for optimizing ‘expensive-to-evaluate’ functions, particularly useful in hyperparameter tuning for machine learning models. Let’s understand how it works...
3 min
At its core, KL (Kullback-Leibler) Divergence is a statistical measure that quantifies the dissimilarity between two probability distributions. Think of it like a mathematical...
The Probe method is a highly intuitive approach to feature selection. If a feature in the dataset contains only random numbers, it is not...
Cook’s distance measures the influence each observation exerts on the trained model. It is measured by building a...
Z score, also called the standard score, is used to scale the features in a dataset for machine learning model training. It can also...
Z score is one of the most important concepts in statistics. It is also called standard score. Typically it is used to scale the...
Let’s understand what outliers are, how to identify them using IQR and boxplots, and how to treat them if appropriate. 1. What are outliers?...
5 min
MICE Imputation, short for ‘Multiple Imputation by Chained Equations’, is an advanced missing data imputation technique that uses multiple iterations of Machine Learning model...
3 min
Spline interpolation is a special type of interpolation where a piecewise low-order polynomial, called a spline, is fitted to the data points. That is, instead...
3 min
Interpolation can be used to impute missing data. Let’s see the formula and how to implement it in Python. But, you need to be careful...
8 min
Machine Learning works on the idea of garbage in, garbage out. If you put useless junk data into the machine learning algorithm,...
6 min
Exploratory Data Analysis, simply referred to as EDA, is the step where you understand the data in detail. You understand each variable individually by...
6 min
ML modeling is the step where machine learning is used to find patterns in data and use that learned knowledge to predict an outcome....
6 min
AdaBoost is one of the earliest implementations of the boosting algorithm. It forms the base of other boosting algorithms, like gradient boosting and XGBoost....
Let’s understand how to define and formulate the machine learning problem (for predictive modeling) from a business problem. This structured approach should help you...
2 min
Let’s build your first machine learning project with Python from scratch. “But I am a complete beginner, I am not ready yet!..” – Your...