SQL Commands & Syntax Cheat Sheet (PDF Download)
Enter your email below and we’ll send you the Desktop and Mobile Wallpaper version:
Beginner SQL Syntax
This section introduces newcomers to the fundamental SQL commands and their purposes.
SELECT
The SELECT statement is used to select columns in a database. It defines the data you want to retrieve from one or more tables
column_name1,
column_name2,
column_name3
FROM
The FROM clause specifies the table from which to pull the data.
It's used in conjunction with SELECT to define the source of the data.
column_name1,
column_name2,
column_name3
FROM table_name
WHERE
Use the WHERE clause to filter the data based on specific conditions.
It helps in narrowing down the data to only those rows that meet the criteria.
FROM table_name
WHERE date >= ‘2023-01-01’
GROUP BY
The GROUP BY statement groups rows that have the same values in specified columns.
It's often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the resultset by one or more columns.
column_name,
COUNT(*)
FROM table_name
GROUP BY column_name
HAVING
The HAVING clause is used to filter groups created by the GROUP BY clause.
It's like a WHERE clause, but for groups.
column_name,
COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
ORDER BY
Use ORDER BY to sort the result set in either ascending or descending order.
It specifies the order in which the rows appear in the resultset.
column_name1,
column_name2,
column_name3,
FROM table_name
ORDER BY column_name3 DESC
Aggregation Functions
This section covers key SQL aggregation functions, which are crucial for summarizing or analyzing data in your database.
SUM(column_name)
The SUM function calculates the total sum of a numeric column.
It adds up all the values in the specified column.
FROM table_name
COUNT()
The COUNT function returns the number of rows that matches a specified criterion.
COUNT() counts all rows in a table.
FROM table_name
COUNT(DISTINCT column_name)
This variant of the COUNT function counts the number of distinct (unique) values in a column.
FROM table_name
AVG(column_name)
The AVG function returns the average value of a numeric column.
It calculates the sum of the values and divides it by the number of values.
FROM table_name
MIN(column_name)
The MIN function returns the smallest value of the selected column.
It's useful for finding the lowest number, earliest date, etc.
FROM table_name
MAX(column_name)
The MAX function returns the largest value of the selected column, which is helpful for identifying the highest number, latest date, and so on.
FROM table_name
Intermediate SQL Concepts
This section delves into intermediate level SQL commands and functions, enhancing your database querying skills.
LIKE
The LIKE operator is used to search for a specified pattern in a column.
It's often used in a WHERE clause.
FROM table_name
WHERE column_name LIKE ‘%thequery%’
AND
The AND operator combines two or more conditions in a WHERE clause. It returns rows where all the conditions are true.
FROM table_name
WHERE condition1
AND condition2
OR
The OR operator is used in a WHERE clause to combine conditions.
It returns rows where any of the conditions are true.
FROM table_name
WHERE condition1
OR condition2
CASE WHEN
The CASE WHEN statement provides if then else logic within SQL.
It's used for conditional expressions.
CASE WHEN condition THEN result
ELSE alternative END
FROM table_name
IN
The IN operator allows you to specify multiple values in a WHERE clause.
It's useful for filtering by a list of items.
FROM table_name
WHERE column_name IN (value1, value2, ...)
UNION ALL
The UNION ALL operator is used to combine the resultset of two or more SELECT statements.
It includes all duplicates.
FROM table1
UNION ALL
SELECT column_name
FROM table2
BETWEEN
The BETWEEN operator selects values within a given range.
It's used in a WHERE clause for filtering.
FROM table_name
WHERE column_name
BETWEEN value1 AND value2
CAST
The CAST function converts one data type into another.
It's used for type conversion in queries.
CAST(column_name AS data_type)
FROM table_name
COALESCE
The COALESCE function returns the first nonnull value in a list of arguments.
It's useful for handling NULL values.
COALESCE(column1, column2, ...)
FROM table_name
Advanced SQL Concepts
This section explores advanced SQL features, ideal for users looking to deepen their understanding and enhance their data manipulation skills.
CTEs (Common Table Expressions)
CTEs, or Common Table Expressions, provide a way to write cleaner and more readable SQL. They allow you to define a temporary result set which you can then reference within your SELECT, INSERT, UPDATE, or DELETE queries.
SELECT *
FROM table_name
) SELECT *
FROM cte_name
SUBQUERIES
Subqueries are nested queries within a larger query.
They're used to perform operations in a sequence and can be found within SELECT, INSERT, UPDATE, or DELETE statements, often in a WHERE or HAVING clause.
FROM table_name
WHERE column_name IN (
SELECT column_name
FROM another_table
)
WINDOW FUNCTIONS
Window Functions perform calculations across a set of table rows that are somehow related to the current row.
They provide powerful tools for data analysis, such as running totals, moving averages, and cumulative counts.
FROM table_name
Joins
Joins are fundamental SQL operations that allow you to combine rows from two or more tables based on a related column between them.
Understanding different join types is crucial for effective data manipulation and analysis.
Here, we focus on three common join types: Inner Join, Left Join, and Full Join.
1. INNER JOIN:
An INNER JOIN selects records that have matching values in both tables.
It's the most common type of join and is used to combine rows from two or more tables where the join condition is met.
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name
2. LEFT JOIN (or LEFT OUTER JOIN):
A LEFT JOIN (or LEFT OUTER JOIN) returns all records from the left table (table1), and the matched records from the right table (table2).
The result is NULL from the right side if there is no match.
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name
3. FULL JOIN (or FULL OUTER JOIN):
A FULL JOIN (or FULL OUTER JOIN) combines the results of both LEFT and RIGHT outer joins.
It returns all records when there is a match in either left or right table.
Records that don't match are still included in the result but with NULL values in place of the missing data.
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;
Understanding these joins and their differences is key to manipulating and retrieving data effectively in SQL.
Each join type serves a specific purpose and can be used in different scenarios to achieve the desired data set.
Rank Functions
Rank functions in SQL are powerful tools used for assigning ranks to rows within a partition or result set.
They are particularly useful in data analysis for sorting and comparing rows based on specific columns.
Here, we'll explore three key rank functions:
ROW_NUMBER:
ROW_NUMBER assigns a unique number to each row starting from 1 for the first row in each partition.
This function does not allow for ties; every row will have a distinct row number.
FROM table_name;
RANK:
RANK provides a rank to each row within a partition, with a unique set of rows.
In case of ties (rows having the same values), RANK will assign the same rank to all tied rows, but will skip subsequent ranks. This is similar to rankings in a golf tournament.
FROM table_name;
DENSE_RANK:
DENSE_RANK is similar to RANK, but it does not skip any ranks in case of ties.
Every distinct value in the order by clause will receive a unique rank, but tied values will all have the same rank, and the next distinct value will receive the next consecutive integer rank.
FROM table_name;
These rank functions are essential for detailed data analysis and can greatly enhance your data querying capabilities in SQL.
Understanding when and how to use these functions will allow you to extract more meaningful insights from your data sets.
Example SQL Patterns
Select Columns Filtered on Criteria
You want to retrieve specific columns from a table, but only for rows that meet certain criteria.
FROM orders
WHERE status = 'paid'
AND date BETWEEN '2023-01-01' and '2023-03-31'
AND email LIKE '%@thequery.io'
The purpose of this query is to retrieve all records from the orders table where the order status is 'paid', the order date is within the first quarter of 2023, and the customer's email is associated with the domain '@thequery.io'.
Such a query is useful in scenarios where a business needs to analyze orders based on specific criteria, such as payment status, date range, and customer email domain.
This can aid in targeted marketing campaigns, sales analysis, or customer segmentation.
Explore Column Values
You want to explore the data you’re working with to better understand it.
status,
COUNT(*) as num
FROM orders
GROUP BY status
ORDER BY num DESC
The purpose of this query is to count the number of orders for each unique status in the orders table and display these counts in descending order.
This allows for an analysis of the most to least common order statuses.
This query is particularly useful for businesses looking to analyze the distribution of order statuses, such as how many orders are in 'pending', 'paid', 'shipped', etc.
It can provide insights into the operational aspects, like which stages of the order process might be bottlenecks or which are functioning smoothly.
Common Aggregations
You want to summarize your data using aggregation functions.
COUNT(*) as num_rows,
MIN(date) as oldest_date,
AVG(revenue) as avg_rev
FROM orders
The purpose of this query is to provide an aggregated summary of the data in the orders table.
It gives the total number of orders, identifies the earliest order date, and calculates the average revenue per order.
This type of query is useful in business analytics for getting a quick overview of key metrics from the dataset.
Knowing the total number of orders, the date when the first order was placed, and the average revenue per order can be critical for understanding the performance and trends in business operations.
Research Duplicates with a Subquery
You want to research any duplicates you have in your table to learn more about what might be causing them.
FROM orders
WHERE order_id IN (
SELECT order_id
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1)
The main purpose of this query is to identify and extract all rows from the orders table that have duplicate order_ids.
This type of query is particularly useful in data cleaning and integrity checks.
For instance, in a scenario where order_id should be unique, this query helps identify where this uniqueness constraint is being violated.
It's a common practice in database management and data analysis to ensure data quality and consistency.
If/Then Logic
You would like to create a column that applies If, Then, Else logic to categorize or label a record.
*,
CASE WHEN revenue < 0
THEN 1 ELSE 0
END AS is_refund
FROM orders
The purpose of this query is to add an indicator to each order record, showing whether it's a refund (based on negative revenue).
This type of query is particularly useful in financial and sales data analysis.
For instance, a business might use this query to quickly identify and segregate refund transactions from regular sales in their orders data.
It aids in categorizing and analyzing different types of transactions for better financial insights and reporting.
Joins
You have two tables: orders and customers.
Each order in the orders table is associated with a customer, and you want to display the order information along with the customer's phone number.
o.*,
c.phone_number
FROM orders o
LEFT JOIN customers c
ON o.customer_id = c.id
The purpose of this query is to retrieve all order details along with the phone numbers of the customers who placed these orders.
This query is useful in situations where you need to generate comprehensive order reports that include customer contact information.
It's commonly used in sales analysis, customer service applications, and order processing systems to get a unified view of orders and customer details.
Unions
You have two tables, orders_2022 and orders_2023, each containing orders for the respective years.
You want to create a combined list of all orders from both years.
FROM orders_2022
UNION ALL
SELECT *
FROM orders_2023
The purpose of this query is to amalgamate all orders from 2022 and 2023 into a single comprehensive list.
This type of query is useful in scenarios where you need to analyze or report on data spanning multiple time periods or categories that are stored in separate tables.
For example, a business might use this query to assess total orders received over a two-year period, regardless of the year in which the orders were placed.
Change Data Type of Column
In the orders table, you have two columns, sale_date and order_id, which are not in the desired data types for analysis or reporting. sale_date might be stored as a string (varchar) and order_id as a different numeric type or string. You want to convert sale_date to a DATE type and order_id to an INT64 type for proper data handling.
CAST(sale_date AS DATE),
CAST(order_id AS INT64)
FROM orders
The purpose of this query is to ensure that the data types of sale_date and order_id are appropriate for further data manipulation, analysis, or reporting.
Converting data types can be crucial for accurate comparisons, calculations, and to meet schema requirements of downstream systems or processes.
This type of data type conversion is commonly required in data preparation steps.
For instance, when aggregating data from different sources, ensuring data type consistency is vital for accurate analysis.
Another use case could be preparing data for export to systems that require specific data formats.
Handle Nulls with Coalesce
In the customers table, you have two columns for phone numbers: primary_phone and mobile_phone.
Some customers might have only one of these filled, while the other is NULL.
You want to create a query that always returns a phone number for a customer, preferring the primary_phone but using mobile_phone if the primary one is not available.
COALESCE(primary_phone, mobile_phone) AS phone_number
FROM customers
The purpose of this query is to ensure that a phone number is always provided when querying the customers table, thus avoiding NULL values in the output.
It's a method to handle optional or missing data gracefully in the results.
This approach is particularly useful in data reporting or in scenarios where user contact information is required, and the data may come from multiple optional fields.
For example, a customer service report might require a list of customer phone numbers for outreach, and using COALESCE ensures that a phone number is listed even if some customers only provided a mobile number and not a primary number.
CTEs
You want to analyze orders from the orders table where the total amount, including sales tax, exceeds 100.
To do this, you first need to calculate the total amount for each order and then filter out the orders that meet this criterion.
(
SELECT *,
revenue * sales_tax AS total_amount
FROM orders
)
SELECT *
FROM rev_gt_100
WHERE total_amount > 100
The purpose of this query is to identify and retrieve information about orders where the total transaction amount, inclusive of sales tax, is more than 100.
This type of query is particularly useful for financial analysis, where you might want to identify higher-value transactions for further scrutiny, reporting, or targeted marketing campaigns.
CTEs make such queries more readable and maintainable, especially when dealing with complex calculations and multiple steps of data processing.
Window Functions
In the orders table, you want to number each order within its respective type based on the date of the order.
Additionally, you aim to calculate a running total of the revenue to date.
*,
ROW_NUMBER() OVER(PARTITION BY type ORDER BY date) AS idx,
SUM(revenue) OVER(ORDER BY date) AS running_total_revenue
FROM orders
The purpose of this query is to add two pieces of analytical information to each order:
- The order's sequence number within its type category, ordered by date.
- The cumulative revenue up to that order, providing a running total.
Such a query is useful in analyzing trends and patterns within order data.
For instance, businesses might use it to track the growth of different types of orders over time or to understand revenue accumulation trends.
Window functions like these enable sophisticated analyses without the need for complex subqueries or temporary tables.
Conclusion
We hope this SQL Cheat Sheet has equipped you with the tools and confidence to navigate through the vast world of SQL querying, whether you're just starting out or looking to refine your advanced skills.
Remember, the journey in data analysis is one of continuous learning and practice, and SQL is a fundamental skill that opens up a myriad of opportunities in the data-driven world we live in.
Stay Updated with The Query:
Don't miss out on the latest remote data analyst jobs, practice datasets, SQL lessons, and data analyst memes.
Subscribe to our newsletter, The Query!
It’s tailored specifically for data analysts like you.
It's your weekly digest of data analyst goodness to keep you informed and continuously learning.
Subscribe to The Query
Advance Your Career:
If you're seeking new opportunities to apply your SQL prowess and analytical skills, look no further than our job board.
It's a curated list of promising positions in the field of data analysis, connecting you with companies that value your expertise.
Explore exciting career possibilities and take your professional journey to the next level.
Visit Our Job Board for Data Analysts
Thank you for using our SQL Cheat Sheet!
Keep querying and keep learning!
Kyle is a seasoned data analyst with over 10 years of experience, currently serving as the Director of Analytics at his company.
He excels in financial modeling and is well-versed in SQL, Python, Excel/Google Sheets and various data visualization tools. Kyle's unique ability to connect finance and analytics allows him to drive significant business value.
In his free time, he enjoys outdoor activities and applying data analysis to real-life situations — most recently, an automated tracker he built to analyze his running data while training for a marathon.