Data Science

Data Science

Data science is an interdisciplinary academic field[1] that uses statistics, scientific computing, scientific methods, processes, scientific visualization, algorithms and systems to extract or extrapolate knowledge and insights from potentially noisy, structured, or unstructured data.

Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession.

Data science is "a concept to unify statistics, data analysis, informatics, and their related methods" to "understand and analyze actual phenomena" with data.[5] It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge.[6] However, data science is different from computer science and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

A data scientist is a professional who creates programming code and combines it with statistical knowledge to create insights from data.

Foundations

Data science is an interdisciplinary field[10] focused on extracting knowledge from typically large data sets and applying the knowledge and insights from that data to solve problems in a wide range of application domains. The field encompasses preparing data for analysis, formulating data science problems, analyzing data, developing data-driven solutions, and presenting findings to inform high-level decisions in a broad range of application domains. As such, it incorporates skills from computer science, statistics, information science, mathematics, data visualization, information visualization, data sonification, data integration, graphic design, complex systems, communication and business. Statistician Nathan Yau, drawing on Ben Fry, also links data science to human–computer interaction: users should be able to intuitively control and explore data. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.

Relationship to statistics[edit]

Many statisticians, including Nate Silver, have argued that data science is not a new field, but rather another name for statistics. Others argue that data science is distinct from statistics because it focuses on problems and techniques unique to digital data. Vasant Dhar writes that statistics emphasizes quantitative data and description. In contrast, data science deals with quantitative and qualitative data (e.g., from images, text, sensors, transactions, customer information, etc.) and emphasizes prediction and action. Andrew Gelman of Columbia University has described statistics as a non-essential part of data science.

Stanford professor David Donoho writes that data science is not distinguished from statistics by the size of datasets or use of computing and that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data-science program. He describes data science as an applied field growing out of traditional statistics.

要查看或添加评论,请登录

Vanshika Munshi的更多文章

  • Key Data Engineer Skills and Responsibilities

    Key Data Engineer Skills and Responsibilities

    Over time, there has been a significant transformation in the realm of data and its associated domains. Initially, the…

  • What Is Financial Planning? Definition, Meaning and Purpose

    What Is Financial Planning? Definition, Meaning and Purpose

    Financial planning is the process of taking a comprehensive look at your financial situation and building a specific…

  • What is Power BI?

    What is Power BI?

    The parts of Power BI Power BI consists of several elements that all work together, starting with these three basics: A…

  • Abinitio Graphs

    Abinitio Graphs

    Graph Concept Graph : A graph is a data flow diagram that defines the various processing stages of a task and the…

  • Abinitio Interview Questions

    Abinitio Interview Questions

    1. What is Ab Initio? Ab Initio is a robust data processing and analysis tool used for ETL (Extract, Transform, Load)…

  • Big Query

    Big Query

    BigQuery is a managed, serverless data warehouse product by Google, offering scalable analysis over large quantities of…

  • Responsibilities of Abinitio Developer

    Responsibilities of Abinitio Developer

    Job Description Project Role : Application Developer Project Role Description : Design, build and configure…

  • Abinitio Developer

    Abinitio Developer

    Responsibilities Monitor and Support existing production data pipelines developed in AB Initio Analysis of highly…

  • Data Engineer

    Data Engineer

    Data engineering is the practice of designing and building systems for collecting, storing, and analysing data at…

  • Pyspark

    Pyspark

    What is PySpark? Apache Spark is written in Scala programming language. PySpark has been released in order to support…

社区洞察

其他会员也浏览了