# Data Science Cheat Sheet For Beginners

## Introduction

If you are a Data Scientist, you’re well aware of the numerous SQL statements, Excel formulas, functions, and algorithms in your profession. While you have undoubtedly mastered the ones you use often, sometimes you need to jump into a project that demands different applications or unfamiliar tools in your preferred programming language.

This is a curated list of Data Science cheat sheets. These resources will make your work easier and help you become a better Data Scientist. Read on for quick references for Python, SQL, Machine Learning, seaborn and more.

## Machine Learning

Machine Learning is changing our society, and Data Scientists are propelling that transformation. Machine Learning is used in automated systems, Facebook algorithms, and search engine results. However, a significant amount of programming goes into constructing the Machine Learning models that customers interact with daily. It all starts with massive datasets and a lot of creative code.

This Machine Learning algorithms cheat sheet will be invaluable for Data Scientists who specialize in Machine Learning and for analysts who are preparing to enter this booming domain.

## Supervised Learning

Supervised learning algorithms learn patterns from previously seen, labeled data by mapping inputs to outputs, and then apply those patterns to predict outcomes for unseen data. Supervised learning models can be either regression models, which predict a continuous variable, or classification models, which predict a binary or multi-class variable.

Here we cover two types of supervised learning models:

• Linear models
• Tree-based models

### Linear models

The output of a linear model is a linear combination of its input features. In this part, we will discuss the most used linear models in machine learning:

| Algorithm | Description | Applications |
| --- | --- | --- |
| Linear Regression | An approach for modeling a linear relationship between inputs and a numeric output variable. | Stock price forecasting; housing price forecasting; customer lifetime value prediction |
| Logistic Regression | An algorithm that models a linear relationship between inputs and a categorical output (1 or 0). | Credit risk score prediction; customer churn forecasting |
| Ridge Regression | A member of the regression family that penalizes features with low predictive power by shrinking their coefficients closer to zero. It is relevant for classification and regression. | Automobile predictive maintenance; sales revenue forecasting |
| Lasso Regression | A member of the regression family that penalizes features with low predictive power by shrinking their coefficients all the way to zero. It is relevant for classification and regression. | Housing price forecasting; clinical outcome prediction using health data |
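To make the difference between plain linear regression and ridge regression concrete, here is a minimal sketch for a single input feature, written in plain Python. The function name `fit_line`, the `ridge_lambda` parameter, and the toy housing numbers are illustrative assumptions, not part of any library.

```python
def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y = intercept + slope * x by least squares.

    ridge_lambda > 0 adds an L2 penalty on the slope (ridge regression),
    which shrinks the slope toward zero; the intercept is not penalized.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + ridge_lambda)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: house size vs. price (arbitrary units)
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

ols_slope, ols_intercept = fit_line(sizes, prices)
ridge_slope, ridge_intercept = fit_line(sizes, prices, ridge_lambda=5.0)
```

With the penalty switched on, the fitted slope is smaller in magnitude than the unpenalized one, which is the shrinkage effect described in the table.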

### Tree-based models

Tree-based models use a series of “if-then” rules, arranged as decision trees, to make predictions. In this part, we will go through some of the most often used tree-based models in machine learning.

| Algorithm | Description | Applications |
| --- | --- | --- |
| Decision Tree | To create predictions, Decision Tree models apply decision rules to features. It is relevant for classification and regression. | Customer churn forecasting; disease prediction; credit score modeling |
| Random Forests | A form of ensemble learning that combines the output of several decision trees. | Credit score modeling; housing price forecasting |
| Gradient Boosting Regression | Uses boosting to create a predictive model from a group of weak learners. | Car emission forecasting; estimating ride-hailing fees |
| XGBoost | A gradient boosting algorithm that is efficient and flexible. It is relevant for both classification and regression problems. | Churn prediction; insurance claims processing |
| LightGBM Regressor | A gradient boosting framework that is intended to be more efficient than existing approaches. | Flight time prediction for airlines; using health data to predict cholesterol levels |
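The “if-then” idea behind tree-based models can be sketched with the smallest possible tree: a single split on one feature, often called a decision stump. This is a simplified illustration that picks the split by counting misclassified points; real decision tree implementations typically use impurity measures such as Gini or entropy. The names `fit_stump` and `predict`, and the churn data, are hypothetical.

```python
def fit_stump(xs, labels):
    """Find the threshold on one feature that misclassifies the fewest points.

    Rule learned: IF x > threshold THEN predict 1 ELSE predict 0.
    """
    best_threshold, best_errors = None, len(xs) + 1
    sorted_xs = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(sorted_xs, sorted_xs[1:]):
        threshold = (lo + hi) / 2
        errors = sum((x > threshold) != bool(y) for x, y in zip(xs, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, x):
    """Apply the learned if-then rule to a new value."""
    return 1 if x > threshold else 0


# Hypothetical data: months of inactivity and whether the customer churned
inactive_months = [1, 2, 3, 6, 7, 8]
churned = [0, 0, 0, 1, 1, 1]

threshold = fit_stump(inactive_months, churned)
```

A full decision tree repeats this kind of split recursively on each resulting subset, and ensembles such as random forests or gradient boosting combine many such trees.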

## Unsupervised Learning Algorithm cheat sheet

Unsupervised learning is concerned with identifying broad patterns in unlabeled data. This form of segmentation is generalizable and used for a wide range of objects. Clustering methods learn how to group similar data points together, and association algorithms find items that frequently occur together, based on predefined criteria.

### Clustering models

| Algorithm | Description | Applications |
| --- | --- | --- |
| K-Means | The most used approach; it derives K clusters based on Euclidean distances. | Recommendation systems; customer segmentation |
| Hierarchical Clustering | A bottom-up methodology in which each data point starts as its own cluster, and the nearest two clusters are repeatedly merged together. | Fraud detection; similarity-based document clustering |
| Gaussian Mixture Models | A probabilistic approach for representing normally distributed clusters in a dataset. | Recommendation systems; customer segmentation |
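As a rough illustration of how K-Means forms clusters from Euclidean distances, the sketch below alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. It assumes 2-D points and caller-supplied starting centroids; the function name `k_means` and the sample points are made up for the example.

```python
def k_means(points, centroids, iterations=10):
    """Basic K-Means for 2-D points using squared Euclidean distance."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
```

In practice the starting centroids are usually chosen randomly or with a seeding scheme, and the loop stops once assignments no longer change.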

### Association

| Algorithm | Description | Applications |
| --- | --- | --- |
| Apriori Algorithm | A rule-based technique that determines the most frequent itemsets in a given dataset using prior knowledge of frequent itemset properties. | Recommendation engines; promotion optimization |
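The “prior knowledge” Apriori relies on is that every subset of a frequent itemset must itself be frequent, so larger candidates are built only from smaller frequent ones. The following is a compact sketch of that level-wise search, not an optimized implementation; the function name `apriori`, the `min_support` count threshold, and the shopping baskets are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        size += 1
        # Join step: combine frequent itemsets into candidates one item larger
        joined = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: keep candidates whose every smaller subset is frequent
        candidates = [
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

Here the three-item candidate is pruned without being counted, because one of its two-item subsets is not frequent.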

## SQL

Data Scientists worldwide use SQL to organize data into tables and work with different datasets. SQL is often used to extract the data needed for a specific study, after which Python and its many specialized modules handle the more challenging parts of the project.

As a Data Scientist, you will utilize the following SQL commands and functions:

## Basic SQL cheat sheet

### Important keywords

| Keyword | Description |
| --- | --- |
| SELECT | States which columns to query. |
| FROM | Declares which table or view to select from. |
| WHERE | Gives a condition that rows must meet. |
| = | Compares a value to a given input. |
| LIKE | Used with the WHERE clause to match a specific pattern in a column. |
| GROUP BY | Arranges similar data into groups. |
| HAVING | Specifies that only groups whose aggregate values match the given conditions should be returned. |
| INNER JOIN | Returns all rows where a record in one table matches a record in the other table. |
| LEFT JOIN | Returns all rows from the left table, with matching rows from the right table. |
| RIGHT JOIN | Returns all rows from the right table, with matching rows from the left table. |
| FULL OUTER JOIN | Returns all rows from both tables, whether or not they have a match in the other table. |
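The difference between INNER JOIN and LEFT JOIN is easiest to see on a tiny dataset. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `club` table, its columns, and the sample rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (student TEXT);
    CREATE TABLE club (student TEXT, club_name TEXT);
    INSERT INTO class VALUES ('Alex'), ('Bea'), ('Cy');
    INSERT INTO club VALUES ('Alex', 'Chess'), ('Cy', 'Robotics');
""")

# INNER JOIN: only students that have a matching row in club
inner_rows = conn.execute("""
    SELECT class.student, club.club_name
    FROM class INNER JOIN club ON class.student = club.student
    ORDER BY class.student
""").fetchall()

# LEFT JOIN: every student, with NULL (None in Python) where no club matches
left_rows = conn.execute("""
    SELECT class.student, club.club_name
    FROM class LEFT JOIN club ON class.student = club.student
    ORDER BY class.student
""").fetchall()

conn.close()
```

The student without a club is missing from `inner_rows` but appears in `left_rows` with `None` as the club name.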

### Aggregate functions

| Function | Description |
| --- | --- |
| COUNT | Returns the number of rows in a table. |
| SUM | Adds up the values. |
| AVG | Returns the average of the values. |
| MIN | Returns the smallest value of the group. |
| MAX | Returns the largest value of the group. |
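As a quick check of what each aggregate function returns, this sketch runs all five in one query against an in-memory SQLite database through Python's `sqlite3` module. The `score` column and its values are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (student TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [("Alex", 80), ("Bea", 90), ("Cy", 70)],
)

# One row comes back, with one value per aggregate function
row_count, total, average, lowest, highest = conn.execute(
    "SELECT COUNT(*), SUM(score), AVG(score), MIN(score), MAX(score) FROM class"
).fetchone()

conn.close()
```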

### Querying data

| SQL | Description |
| --- | --- |
| `SELECT student FROM class` | Select data in column student from a table named class. |
| `SELECT * FROM class` | Select all rows and columns from table class. |
| `SELECT student FROM class WHERE student = 'Alex'` | Select data in column student from table class where student = 'Alex'. |
| `SELECT student FROM class ORDER BY student ASC` (or `DESC`) | Select data in column student from table class and order by student (ascending by default, or descending). |
| `SELECT student FROM class ORDER BY student LIMIT n OFFSET offset` | Select data in column student from table class, skip offset rows, and return the next n rows. |
| `SELECT student, aggregate(subject) FROM class GROUP BY student` | Select data in column student from table class and group rows using an aggregate function. |
| `SELECT student, aggregate(subject) FROM class GROUP BY student HAVING condition` | Select data in column student from table class, group rows using an aggregate function, and filter groups using the HAVING condition. |
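The query patterns above can be tried out with Python's built-in `sqlite3` module. This is a small sketch on an in-memory database; the `subject` and `score` columns and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (student TEXT, subject TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?)",
    [
        ("Alex", "Math", 80),
        ("Alex", "Art", 90),
        ("Bea", "Math", 60),
        ("Cy", "Math", 70),
        ("Cy", "Art", 50),
    ],
)

# WHERE: filter rows (the ? placeholder keeps the value out of the SQL string)
alex_rows = conn.execute(
    "SELECT subject FROM class WHERE student = ? ORDER BY subject", ("Alex",)
).fetchall()

# ORDER BY ... LIMIT ... OFFSET: skip the top score, then take the next two
paged_rows = conn.execute(
    "SELECT student, score FROM class ORDER BY score DESC LIMIT 2 OFFSET 1"
).fetchall()

# GROUP BY ... HAVING: keep only students whose average score is at least 70
strong_students = conn.execute(
    "SELECT student, AVG(score) FROM class "
    "GROUP BY student HAVING AVG(score) >= 70 ORDER BY student"
).fetchall()

conn.close()
```

WHERE filters individual rows before grouping, while HAVING filters the groups after the aggregate has been computed.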

### Data modification

| SQL | Description |
| --- | --- |
| `INSERT INTO class(columnlist) VALUES (list_value)` | Insert a row into table class. |
| `INSERT INTO class(columnlist) VALUES (list_value), (list_value), …` | Insert several rows into table class. |
| `INSERT INTO class(columnlist) SELECT columnlist FROM subject` | Insert rows from table subject into table class. |
| `UPDATE class SET student = new_value` | Set a new value in column student of table class for all rows. |
| `UPDATE class SET student = new_value, father_name = new_value WHERE condition` | Update the values in columns student and father_name for the rows of table class that meet the condition. |
| `DELETE FROM class` | Delete all rows from table class. |
| `DELETE FROM class WHERE condition` | Delete all rows from table class that meet a certain condition. |
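Here is a short sketch of the modification statements in action, again using Python's `sqlite3` module with an in-memory database. The sample names are invented, and the WHERE clauses matter: without them, UPDATE and DELETE apply to every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (student TEXT, father_name TEXT)")

# Insert a single row, then several rows in one statement
conn.execute("INSERT INTO class(student, father_name) VALUES ('Alex', 'Sam')")
conn.execute(
    "INSERT INTO class(student, father_name) VALUES ('Bea', 'Tom'), ('Cy', 'Raj')"
)

# Update only the rows that meet the condition
conn.execute("UPDATE class SET father_name = 'Samuel' WHERE student = 'Alex'")

# Delete only the rows that meet the condition
conn.execute("DELETE FROM class WHERE student = 'Cy'")

remaining_rows = conn.execute(
    "SELECT student, father_name FROM class ORDER BY student"
).fetchall()

conn.close()
```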

## Math

Data Science is a demanding discipline that requires solid mathematics. Depending on your field of study, you may need to use calculus, linear algebra, and statistics regularly. To progress in the discipline, Data Scientists must thoroughly understand these ideas and how they apply in various contexts.

Math cheat sheets are tools that let Data Science students and experts quickly find a certain equation or double-check their work.

Even for competent Data Scientists, many of these equations might get hazy if not used daily. This is your quick-reference basic linear algebra cheat sheet for Data Science, containing basic terminology that Data Scientists might need.

# Cheat Sheet for Linear Algebra

## Notation

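| Term | Notation |
| --- | --- |
| vector | denoted by a lowercase letter v with an arrow above it |
| scalar | any real number, e.g. 2, 1, ⅓ or π |
| matrix | A, represented by a capital letter and equal to an m × n matrix |
| m × n | m rows by n columns |
| basis vectors | represented by the letters i, j and k with a ^ (hat) over them |
| mapping | T: R^m → R^n, a transformation from m dimensions to n dimensions |
| determinant | a scalar; the area or volume spanned by the vectors that form the matrix |
| cross product | a vector perpendicular to the plane of two vectors in three dimensions; its length is the area of the parallelogram they span |
| dot product | a scalar; the product of the two vectors' lengths and the cosine of the angle between them |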
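The dot product, cross product, and determinant can each be computed in a few lines of plain Python. This is a minimal sketch with vectors stored as tuples; the function names `dot`, `cross`, and `det2` and the sample values are illustrative.

```python
def dot(u, v):
    """Dot product: sum of element-wise products, giving a scalar."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3-D vectors, giving a vector perpendicular to both."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


u = (1, 2, 3)
v = (4, 5, 6)

u_dot_v = dot(u, v)
u_cross_v = cross(u, v)
# Area of the parallelogram spanned by the vectors (2, 0) and (0, 3)
area = det2([[2, 0], [0, 3]])
```

A handy sanity check is that the cross product is perpendicular to both inputs, so its dot product with either of them is zero.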

## Data Science Resources

If you’re just starting your career in Data Science or are still studying to become a Data Scientist, you need to brush up on essential terminology and Excel functions. This cheat sheet gives important shortcuts, commands, and paste-able formulae that will save you time.

## Excel cheat sheet

| Function | Shortcut |
| --- | --- |
| Add Current Date | Ctrl+; |
| Add Current Time | Shift+Ctrl+; |
| Edit Cell Comment | Shift+F2 |
| Show Active Cell | Ctrl+Backspace |
| Add Column | Alt+I, C |
| Add Row | Alt+I, R |
| Fill Down | Ctrl+D |
| Fill Right | Ctrl+R |
| Save Workbook | Shift+Alt+F2 |
| Add Chart | Alt+F1 |
| Move to Last | Ctrl+End |

## Excel cell reference cheat sheet

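Formulas require cell references. How you define a cell reference affects how the formula is applied and how it changes when copied from one cell to another. A relative reference shifts when the formula is copied, while an absolute reference, marked with dollar signs, always points to the same cell.

| Reference type | Example |
| --- | --- |
| Relative Cell Reference | =A2+B2 |
| Absolute Cell Reference | =\$A\$1 |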

## Excel date and time cheat sheet

| Function | Syntax | Description |
| --- | --- | --- |
| DATE | DATE(year, month, day) | Returns a date given the year, month, and day. |
| DATEDIF | DATEDIF(startdate, enddate, unit) | Calculates the time between two given dates, in the given unit. |
| DAY | DAY(serial no.) | Returns the day of the month of a date (an integer between 1 and 31). |
| EDATE | EDATE(startdate, months) | Adds a period of months onto a start date. |
| EOMONTH | EOMONTH(start_date, months) | Same as EDATE, but returns the last day of the resulting month. |
| NOW | NOW() | Returns the serial no. of the current date and time. |
| TODAY | TODAY() | Returns the serial no. of the current date. |
| YEAR | YEAR(serial no.) | Returns the year of a date. |
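If you move between Excel and Python, it can help to see rough Python equivalents of these date functions. The sketch below uses only the standard `datetime` and `calendar` modules; the helper names `edate` and `eomonth` are hypothetical and only approximate the Excel behaviour (for example, they work on `date` objects rather than serial numbers).

```python
import calendar
from datetime import date


def edate(start, months):
    """Shift a date by a number of months, clamping the day to the month's length."""
    month_index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def eomonth(start, months):
    """Last day of the month that lies the given number of months from start."""
    shifted = edate(start.replace(day=1), months)
    last_day = calendar.monthrange(shifted.year, shifted.month)[1]
    return shifted.replace(day=last_day)


start = date(2024, 1, 31)                    # like DATE(2024, 1, 31)
one_month_later = edate(start, 1)            # like EDATE(start, 1)
end_of_next_month = eomonth(date(2024, 1, 15), 1)  # like EOMONTH(start, 1)
days_between = (date(2024, 3, 1) - date(2024, 1, 1)).days  # like DATEDIF(..., "d")
day_of_month = start.day                     # like DAY(start)
year_of_date = start.year                    # like YEAR(start)
```

Python's `date.today()` and `datetime.now()` play the roles of TODAY() and NOW().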

### Conclusion

The cheat sheets recommended in this article are a narrowed-down list of the best. They will support you in your projects and help you brush up on your skills.

It’s critical to keep up with innovations in this fast-changing digital industry, no matter where you are on your Data Science journey. Every aspect of your profession is prone to change and progress with time. Data analysis programming languages, tools, and procedures keep being upgraded and becoming more robust. That is one of the things that makes this profession so appealing.

Learning is a never-ending process. So, continue learning and advance professionally. Enroll in the latest online programs and webinars on big data, deep learning, Machine Learning, or Artificial Intelligence if you want to dive further into a specific field of Data Science.
