Dataiku

Dataiku is a platform for building, managing, and deploying data and AI projects. Dataiku is used for a variety of applications, including customer segmentation, fraud detection, customer scoring, deep learning, and natural language processing.

Dataiku aims to accelerate the democratization of data. As a machine learning platform, it is easy enough for citizen developers to use, yet robust and customizable enough to support advanced, code-driven work.

It's designed to help data professionals collaborate on tasks such as:

  • Data preparation
  • Machine learning
  • Creating AI models
  • Making predictions
  • Exploring and sharing analyses
  • Automating processing chains

Dataiku's capabilities include:

  • Data preparation: Connect to, cleanse, and prepare data at scale
  • Machine learning: Accelerate model building with Dataiku AutoML and a guided framework
  • Extensibility: Expand Dataiku's native capabilities with plugins and custom applications
  • Standardized development environment: Data scientists can operate from notebooks and IDEs, organize code, and manage virtual environments
  • Model comparisons: Data scientists and ML engineers can perform champion/challenger analysis on candidate models
  • Scalability: Teams can scale workloads with on-demand, elastic resources powered by Spark and Kubernetes
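The champion/challenger analysis mentioned in the list above can be illustrated with a minimal sketch. This is not Dataiku's API: the function names, the accuracy metric, and the `min_gain` threshold are all assumptions chosen for illustration. The idea is simply that a candidate (challenger) model replaces the current (champion) model only if it beats it by a clear margin on the same evaluation data.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def pick_winner(champion_preds, challenger_preds, labels, min_gain=0.01):
    """Compare two models on the same labelled data (hypothetical helper).

    The challenger wins only if its accuracy exceeds the champion's
    by at least min_gain; otherwise the champion is kept.
    """
    champion_score = accuracy(champion_preds, labels)
    challenger_score = accuracy(challenger_preds, labels)
    if challenger_score - champion_score >= min_gain:
        return "challenger", champion_score, challenger_score
    return "champion", champion_score, challenger_score


# Toy evaluation set: four labelled examples and each model's predictions.
labels = [1, 0, 1, 0]
champion_preds = [1, 0, 0, 1]
challenger_preds = [1, 0, 1, 1]
result = pick_winner(champion_preds, challenger_preds, labels)
```

In practice the comparison would use a held-out dataset and a metric suited to the problem (for example AUC for fraud detection), but the decision rule has the same shape.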

Main features of the Dataiku platform

Integration & Connectivity of Dataiku DSS within other infrastructures

The platform integrates with Hadoop, Spark, SQL databases, and Teradata, and is available on the AWS, Azure, and Google Cloud marketplaces.

The detection of data schemas and formats is automatic. Thus, Dataiku is able to natively recognise a numerical variable, a character string, an age, a date, or even a geographical location.
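To give a feel for what automatic type detection involves, here is a minimal sketch in plain Python. It is not how Dataiku implements detection; the function names, the list of candidate types, and the single date format are assumptions made for illustration. Each candidate type is tried in turn, and the first one that every non-empty cell satisfies is chosen.

```python
from datetime import datetime


def _parses(value, parser):
    """Return True if parser accepts the value without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Guess 'integer', 'decimal', 'date', or 'text' for a list of strings."""
    cells = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not cells:
        return "text"
    # Ordered from most to least specific; only ISO dates are assumed here.
    candidates = [
        ("integer", int),
        ("decimal", float),
        ("date", lambda s: datetime.strptime(s, "%Y-%m-%d")),
    ]
    for name, parser in candidates:
        if all(_parses(cell, parser) for cell in cells):
            return name
    return "text"


ages = infer_column_type(["34", "27", ""])
prices = infer_column_type(["3.5", "2"])
signup_dates = infer_column_type(["2024-01-15", "1999-12-31"])
cities = infer_column_type(["Paris", "75001"])
```

A real detector would also sample the data, tolerate a share of invalid cells, and recognise richer meanings such as geographical locations.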

Moreover, data storage is decoupled from processing: the data stays where it is. Data can therefore be accessed in place, without first being transferred elsewhere for processing.
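The idea of leaving data where it is can be sketched with Python's built-in sqlite3 module standing in for a remote database. The table, column names, and helper function are hypothetical, and this is not Dataiku code. The point is that the aggregation runs inside the database engine, so only the small summary travels back to the caller rather than every row.

```python
import sqlite3


def count_by_country(conn):
    """Run the aggregation inside the database and fetch only the summary."""
    query = (
        "SELECT country, COUNT(*) FROM customers "
        "GROUP BY country ORDER BY country"
    )
    return conn.execute(query).fetchall()


# An in-memory database plays the role of the external data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "ES"), ("Bo", "FR"), ("Cy", "FR")],
)
summary = count_by_country(conn)
```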

Plugins

Dataiku DSS comes with standard visual components to connect to data, process it, and train models. Dataiku also offers the flexibility to implement custom components, package them, and share them with others. These custom components are available as plugins. Each plugin consists of a graphical user interface and a backend written by the developer in R or Python.

There is a gallery of more than 100 plugins in the Dataiku Plugin Store, providing data applications in many areas such as language translation, weather, recommendation systems, data import/export and ready-to-use graphical interfaces.

Optimised data preparation

The graphical interface of Dataiku DSS accelerates data wrangling with interactive data cleansing and enrichment. Dataiku automatically suggests contextual transformations according to the type of data. For example, from a date, Dataiku suggests calculating an age. From an address, it can extract the street number and name, the postal code, or the city. There are more than 80 visual processors that can be activated with a few clicks and without code. The same interface also lets users filter, transform, and summarise the data statistically with simple clicks.
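As an illustration of the date-to-age transformation mentioned above, here is a minimal Python sketch of the underlying calculation. The function name and its parameters are assumptions for this example and do not correspond to a Dataiku processor's interface.

```python
from datetime import date


def age_on(birth_date, reference_date):
    """Age in whole years at reference_date (hypothetical helper)."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


day_before_birthday = age_on(date(1990, 6, 15), date(2024, 6, 14))
on_birthday = age_on(date(1990, 6, 15), date(2024, 6, 15))
```

Passing the reference date explicitly, rather than calling `date.today()` inside the function, keeps the result reproducible.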

Integrated development

Many languages are supported by Dataiku DSS: Python, R, Scala, PySpark, SparkR and SparkSQL, SQL, Hive, Pig and Impala. Dataiku is therefore aimed at all types of users whatever their technical background and at all levels of expertise.

Machine learning & AI

The platform includes a complete graphical interface (called Datalab) dedicated to the development of machine learning models. This interface allows the configuration of models, the visualisation of model performance and a simplified reading of the results produced by the algorithms.

Collaboration & Governance

Dataiku DSS incorporates features to optimise sharing and exchange within data teams and business teams. These include project management, chat, wiki and versioning tools.

For data governance, the platform provides a centralised catalogue of data, comments, elements and models. In addition, all user activities are shown on a dedicated dashboard, and security is supported by features such as permissions management, log management, and monitoring of data size and instance activity. Together, these features help teams meet data governance and auditing requirements.

