The Art of the Start with SQL

Magnus PS
6 min readAug 27, 2021

--

In the last post we introduced the notion that data is the commodity of the future and that SQL is the key to accessing this data:

Those that can best gather and analyze this swell of data can and will hold a serious competitive advantage over those who cannot.

But before we can step through the door, before we can apply advanced tools and algorithms, we’ve first got to open the door. We’ve got to open the door between us and our data.

- The Key: An Intro to SQL

We started with ‘Why?’ and now can move on to the ‘How?’. More specifically, we’ll address three common stumbling blocks:

  1. There are 1000 and 1 articles online and a slew of different RDBMSs, how do I get setup?
  2. I’m not a tech person and have 0 programming experience, where do I start?
  3. There are all sorts of articles online for “beginner” to “intermediate” level SQL, how do I know who to listen to and what are actually the most important concepts for a beginner?

After going through these questions one-by-one, we can build atop the foundation of understanding why this sort of work is important with how to get started.

Where the last post explained why you may want to go fishing or become a fisherman, here we get you setup.

We put YOU on the fishing boat ;)

Photo by stephen momot on Unsplash

There are 1000 and 1 articles online and a slew of different RDBMSs, how do I get setup? What RDBMS should I use?

I won’t cover all the RDBMSs here.

At work, we use Postgresql as our relational database system and dbt or PopSQL to explore and transform our data. These are paid accounts though, and so not as good a place to start.

In school and in many online courses I’ve taken, we used MySQL Server for the database and MySQL Workbench as our SQL editor. I think the former is a great place to start.

To get setup with MySQL, check out this article.

Photo by Danielle MacInnes on Unsplash

I’m not a tech person and have 0 programming experience, where do I start?

This is a BIG question and so I’ll focus on the broader issue at play here: your inexperience doesn’t matter.

Let’s say you do have 0 technical know-how and 0 experience with any sort of programming or technical problem-solving.

That’s just the starting line.

Some started earlier, some started later, but everybody was in that place at some point and so you’re right where you need to be. At the beginning.

With that said, and assuming you truly do have 0 experience solving technical problems, data work is problem solving in its purest form and so before moving on to any sort of higher order data work (ie. “I want to be a data scientist”), we must first make sure that we can do the fundamentals.

Being that data preparation takes up the majority of time for any sort of data role or project, SQL is as good a place as any to start.

And from here, we choose our level of commitment, who to learn from, and how we’ll build our skills.

For online courses, I’d recommend Udemy’s Master SQL for Data Science and Datacamp. If this is your 1st SQL course, I’d recommend waiting until Udemy has a sale (for a price drop) and starting with Master SQL.

Google, StackExchange, and a Medium subscription can complement your learning as well. I’m constantly surprised how useful some of the articles on Medium / Toward Data Science are and would highly recommend a sub (if you don’t already have one).

I could list a hundred and one different resources, but it’s good, especially at the start, to keep the list of resources short. To focus, establish the fundamentals, and build from there.

Photo by Caleb Woods on Unsplash

This, naturally, leads to the question of …

What are the fundamentals? What are the most important concepts for a beginner?

The most important concepts for a beginner (in my opinion) would be to understand how to use:

  1. select and from
  2. where, order by, and limit
  3. aliasing and case statements
  4. join and group by (using aggregate functions)
  5. subqueries

Here’s a basic cheat sheet to highlight the syntax of these basic clauses and show how to write different conditions.

From an understanding of these concepts, we can build broad-ranging queries and establish a solid foundation.

Let’s explore a beginner-level query line-by-line:

K.I.S.S.

Note: once you start working with SQL you may notice that every individual and every team has different norms, including using CAPS (all uppercase) vs lowercase in their code base. Being that keywords are already highlighted, I typically leave them lowercase but you’ll notice that many courses still capitalize so ultimately it’s up to you, who you learn from, and who you work with.

Anyway …

  • `select` is used to specify the fields / columns with which we’re interested in. If we were to have specified `select *` that would have returned all columns but we were only interested in returning 4 columns.
  • `(quantity_stock — quantity_sold) as quantity_remaining` first calculates the difference between quantity_stock and quantity_sold and then assigns the result to a new field named quantity_remaining. This variable represents how much of a particular fruit remains at the end of a day so that the grocery procurement team can re-stock.
  • `from grocery_fruit` is used to specify the table from which we pull our data (grocery_fruit).
  • `where quantity_in_stock > 0` means that we filter our rows to only return fruit whose stock was greater than zero.
  • `order by 3` means that we order our returned rows by our 3rd field (quantity_remaining). These rows are then set in ascending order by default so that we see the fruit with the lowest remaining quantity.
  • `limit 10` means that we only return the 1st 10 rows.

If we define data preparation as our ability to transform and organize our data so that we can better answer the questions we’re posed, then we also might want to visit the question this query was aimed at answering: “What 10 fruits do we most need to re-stock after today’s sales?”

Typically we’d start with the question, build the query, and then optimize until our results accurately answer the question at hand.

The basic idea being to select columns of interest from the table of interest under specified conditions and return the result in the order we desire.

Tell me and I forget, teach me and I may remember, involve me and I learn.
-Benjamin Franklin

We’ve gone over how to get setup, where to start, and what the most important concepts are for a beginner.

There’s much more to cover, even at a beginner-level, but at least now we’ve put you on the boat.

From this point, the choice is yours.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Magnus PS
Magnus PS

Written by Magnus PS

I'm interested in data, health and mindset. I work for the Data Science team @ Fortegra.

No responses yet

Write a response