Intro to T-SQL for Data Science

Intro to T-SQL for Data Science

Data Science uses multitudes of scientific methods, algorithms, and processes to extract knowledge and insight from data and uses it across wide range of domains. Data Science helps to extract patterns from raw data which would have been invisible beforehand. In order to access and experiment with these data, a Database is essential. Thus, today we’ll learn about the Database as a whole and use examples and features of T-SQL to learn to find meaning out of raw data.

Check out other articles on SQL.

Transact SQL(T-SQL)

T-SQL is one of the mostly widely used SQL Derivatives. It is known as a transactional language used to define how things are to be done. T SQL is one of the easiest and most effective way to get things done. It is importantly used to create applications and also to add business login into the application the backend systems.

Tools

  • Cloud VS On-Premise:


What is Cloud?

Cloud or Cloud Computing can be defined as the on-demand availability of the ecosystem of computer resources ranging from data storage (cloud storage) to computer power without the user having to actively manage the system. These are the data centers which are available to users across the internet. We can say, Cloud is the architecture of the system where someone else owns the hardware such as AWS, Azure and Alibaba Cloud and the user can use it for their need at the time of convenience. This can also be called as Off-Premise.

On-Premises

On-premises is the scenario where the user owns the Hardware and the software systems are deployed on the very premises of the user or of the organization instead of remote facility for instance, Cloud Computing or Server Farm.

If you want to install in your own machine,

Download and Install SQL Server Express

-> SQL Server 2019 Express Edition

Editions of SQL Server

  • Express — which is free
  • Web
  • Standard
  • Enterprise
  • Developer


Tools of the Trade (IDE)

SSMS


SQL Server Management Studio (SSMS) is an application software which is widely used to configure, manage and administer different components within the Microsoft SQL Server.

VS Code

Visual Studio (VS) Code is a free software developed by Microsoft for Windows, Linux and macOS which developers and engineering write and edit code, debug, refactor and provides multiple other functionalities.

What is an IDE?

In layman’s terms, this is an environment where you can write your code and/or perform your analysis. For E.g., Microsoft SQL Server Management Studio, VS Code, Azure Data Studio.

We would highly suggest you to try out Azure Data Studio to start out T-SQL journey as it is free and to learn from the experience.

Database

Database can be defined as the collections of information which is organized such that the information could be accessed and worked upon. It is often controlled by a database management system which all together with database is known as a database system, in short form just called database.

Database Management System (DBMS)

A DBMS acts as the interface between the database and its end-users such that the users can access, manage and update the information.

Database Objects

Database objects can be understood as the objects in a database which are used in order to store or refer to the data. Tables, Views, Sequences, Indexes, Clusters and Synonyms are all examples of the database objects.

Table

A set of related data stored in our database

Excel Reference

A sheet in Excel. Just like Excel contains spreadsheets, relational databases are composed of tables.

Let us take sample data (car_data.csv from Kaggle).

What is a Column?

A set of data of a particular type, generally there is a value for each row in our table.

No alt text provided for this image



What is a Row?

A collection of fields that make up a record.


No alt text provided for this image


What is a Field?

The smallest source of data in a database

No alt text provided for this image


Logical Processing Order

The logical processing order of a SELECT statement explains the methods of how the query will be process and the final result will be achieved.

  • FROM
  • ON
  • JOIN
  • WHERE
  • GROUP BY
  • WITH CUBE or WITH ROLLUP
  • HAVING
  • SELECT
  • DISTINCT
  • ORDER BY
  • TOP


Database Objects

Working with Data:

Selecting

This is the process of retrieving one or many rows or columns from one or many tables in a database. E.g., Returning 20 records with our 11 selected columns from the Cars Table

Filtering

This is the process of excluding data based on predefined rules. E.g., Returning 12 records with our 11 selected columns from the Cars Table where the cars have a body style of type “convertible”

Ordering

This is the process of sorting a dataset based on a specified sort order. E.g., Returning 12 records with our 11 selected columns from the Cars Table where the cars have a body style of type “convertible” and ordering them from the widest based on the Wheelbase.

Grouping

Grouping data within a dataset typically over 1 or more columns. E.g., Returning All the records from the Cars dataset, grouping the data over the Make to see how many Models each Make has that is of Body_Style Convertible and what is the Average Price for each Model.

Modifying

Permanent VS Temporary data modifications. For Permanent we use the UPDATE statement, for temporary we can use what best suite our needs. E.g., We are correcting the spelling error of ‘Audi’ in the Make the top statement will only display the corrected data, where the bottom will permanently change the underlying data.

Multiple Datasets

Working with Multiple datasets

Joining

Joining 2 or more datasets together on a common “join” key. The SQL join allows us to collect two or more tables using the common identifiers. There are different types of join such as: Inner Join, Outer Join, Cross Join and Semi-join.

If you want to dive deeper into T — SQL, watch this video embedded below,

SET Operations

The SET Operations like Union, Intersect, Minus help us get meaningful outputs from the data stored in table under various special conditions.

Check out my full article at: https://www.c-sharpcorner.com/article/intro-to-t-sql-for-data-science/

要查看或添加评论,请登录

Ojash Shrestha的更多文章

  • Azure Cognitive Services

    Azure Cognitive Services

    In this article, we’ll learn about the Azure Cognitive Services, how it integrates into various other tools provided by…

    2 条评论
  • XGBoost 101

    XGBoost 101

    XGBoost 101 In the previous article, we got introduced to XGBoost and learned about various reasons for its wide…

  • XGBoost — The Undisputed GOAT!

    XGBoost — The Undisputed GOAT!

    In this article, we’ll learn about XGBoost, its background, its widely accepted usage in competitions such as Kaggle’s…

  • Amazon SageMaker

    Amazon SageMaker

    In this article, we’ll learn about Amazon SageMaker and various tools provided by it for Machine Learning purposes. The…

  • How To Connect VS Code To AWS

    How To Connect VS Code To AWS

    In this article, we’ll learn briefly about Amazon Web Services (AWS) and then go through a hands-on procedure to…

  • Benefits And Risks Of Cloud Computing

    Benefits And Risks Of Cloud Computing

    Various aspects of Cloud Computing, its major characteristics, and a comparative analysis between different cloud…

  • Major Characteristics Of Cloud Computing

    Major Characteristics Of Cloud Computing

    In this article, we’ll learn about Cloud Computing and its different characteristics. These constitute the major…

  • AWS VS Azure VS Google Cloud — Comparative Analysis Of Cloud Platforms For Machine Learning

    AWS VS Azure VS Google Cloud — Comparative Analysis Of Cloud Platforms For Machine Learning

    There are numerous cloud platforms providing services for machine learning. This article discusses the comparative…

    1 条评论
  • Properties of Modeling And Deployment Of Machine Learning Application

    Properties of Modeling And Deployment Of Machine Learning Application

    In this article, we’ll learn about the various properties of modeling and deployment of machine learning applications…

  • Using Container For Machine Learning Application

    Using Container For Machine Learning Application

    In the last article, we discussed deployment and production environment such that it consisted of two primary programs,…

社区洞察

其他会员也浏览了