Journey from Databricks to Azure Databricks

Databricks is a company and a big data processing platform created by the original developers of Apache Spark. It was founded in 2013 by Ali Ghodsi, Andy Konwinski, Scott Shenker, Ion Stoica, Patrick Wendell, Reynold Xin and Matei Zaharia, and is headquartered in San Francisco, California. Databricks set out to offer an alternative to the MapReduce model and to provide a just-in-time, cloud-based platform for big data processing. It is aimed at data engineers, data scientists and business analysts, helping them work together and run analytics in a few drag-and-drop steps. This integration simplifies the process from data preparation through experimentation to the deployment of machine learning applications.

According to the company, the Databricks platform is much faster than Apache Spark. By unifying the pipeline for developing machine learning tools, Databricks is said to accelerate development and innovation and to improve security. Data processing clusters can be configured and deployed with just a few clicks, and the platform includes a variety of built-in data visualization features. In addition to building the Databricks platform, the company co-organizes massive open online courses about Spark and runs the largest conference about Spark.

On 15 November 2017, Databricks announced a partnership with Microsoft to expand the reach of its Unified Analytics Platform and address customer demand for Spark on Microsoft Azure. Databricks' Unified Analytics Platform will be offered as an integrated service within the Azure Portal as Azure Databricks. According to the announcement, there is a large base of Microsoft Azure customers looking for a high-performance analytics platform based on Spark, and Databricks is already the leading cloud platform for Spark. These organizations will be able to simplify big data and AI with Azure Databricks.

Azure Databricks core functionality includes:

  • A unified, collaborative workspace for Data Science and Data Engineering teams to work more effectively with business users;
  • A single, unified engine for all types of analytics (batch, ad hoc, machine learning and deep learning, streaming, graph); and
  • A fully managed, serverless Cloud infrastructure for isolation, cost control and auto-scaling.
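
To make the auto-scaling point above more concrete, here is a minimal sketch of what an autoscaling cluster definition might look like when written as a request payload instead of clicked together in the UI. The field names (cluster_name, spark_version, node_type_id, autoscale, autotermination_minutes) follow the Databricks Clusters REST API as I understand it and should be checked against the current documentation. The helper build_cluster_spec, the cluster name and the placeholder values are hypothetical and only for illustration.

```python
import json


def build_cluster_spec(name, min_workers, max_workers,
                       spark_version="<spark-version>",
                       node_type_id="<node-type>"):
    """Return a dict shaped like an autoscaling cluster request (illustrative)."""
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        # Placeholders: real values depend on the workspace and region.
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # The platform scales the worker count within this range.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after 30 idle minutes to control cost.
        "autotermination_minutes": 30,
    }


spec = build_cluster_spec("team-analytics", 2, 8)
print(json.dumps(spec, indent=2))
```

The sketch only builds and prints the payload; sending it to a workspace would additionally need an endpoint URL and an access token, which are left out here.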

Azure Databricks also offers full integration with the Azure cloud platform, including:

  • The Azure Portal, where users can launch Databricks with a single click;
  • Azure Active Directory, which enables single sign-on so that users can get started with Databricks immediately;
  • The most popular data sources on Azure, including Azure SQL Data Warehouse, Azure Cosmos DB, Azure Data Lake Store, and Azure Blob storage; and
  • Microsoft Power BI, so that Databricks analytics can be put into business dashboards and visualizations.
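
As a small illustration of the data source integration listed above, Spark jobs commonly address Azure Blob storage through a wasbs:// URI that names the container and the storage account. The sketch below builds such a URI. The helper wasbs_uri, the container name, the account name and the path are made up for the example, and the commented-out read assumes a Databricks notebook where a `spark` session is already defined and storage credentials have been configured.

```python
def wasbs_uri(container, account, path=""):
    """Build a wasbs:// URI for a path inside an Azure Blob storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical container, account and path, for illustration only.
uri = wasbs_uri("raw-data", "examplestorage", "/events/2017/11/")
print(uri)

# In a Databricks notebook this URI could then be passed to a reader, e.g.:
# df = spark.read.json(uri)
```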

Through this article, I would also like to thank everyone who has read, liked, clapped for or commented on my articles. That support is my sole motivation to keep writing.

Keep reading and I’ll keep writing.
