What are the advantages and disadvantages of using TensorFlow over Scikit-learn for unsupervised learning?

TensorFlow is an open-source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
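To make the graph idea concrete, here is a minimal sketch, assuming TensorFlow 2.x is installed. The function name `affine` and the toy values are illustrative only. `tf.function` traces the Python function into a graph whose nodes are operations (such as a matrix multiply) and whose edges carry tensors.

```python
import tensorflow as tf


@tf.function  # traces the Python function into a TensorFlow graph
def affine(x, w, b):
    # Two nodes in the graph: a matrix multiply followed by an add
    return tf.matmul(x, w) + b


# Toy tensors (made-up values, purely for illustration)
x = tf.constant([[1.0, 2.0]])      # shape (1, 2)
w = tf.constant([[3.0], [4.0]])    # shape (2, 1)
b = tf.constant([0.5])

y = affine(x, w, b)

# Inspect the traced graph: each operation is a node
graph = affine.get_concrete_function(x, w, b).graph
op_types = [op.type for op in graph.get_operations()]
print(y.numpy(), op_types)
```

The list of operation types should include the matrix multiply node alongside placeholders for the inputs.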

TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organisation for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language (TensorFlow is free as well). It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

TensorFlow is a powerful library that’s mostly used for deep learning, although its computational model based on directed graphs certainly allows for a wider range of use cases. Deep learning is the main area of machine learning where scikit-learn is really not that useful.

For most practical machine learning tasks, TensorFlow is overkill. Scikit-learn is a much more user-friendly library that is more than sufficient in most scenarios.

When it comes to unsupervised learning, scikit-learn implements various versions of clustering and dimensionality reduction. I would say that supervised learning is where scikit-learn really shines, but keep in mind that unsupervised learning is still an immature area of machine learning.
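As a rough illustration of how little code this takes, the sketch below combines dimensionality reduction (PCA) with clustering (k-means) in scikit-learn. The data is synthetic and the parameter choices (two components, two clusters) are assumptions made for the example, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two well-separated synthetic blobs in 5 dimensions (made-up data)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 5)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 5)),
])

# Dimensionality reduction: project the 5-D points onto 2 principal components
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the projected points into 2 clusters
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)

print(X_2d.shape, np.bincount(labels))
```

Both estimators follow the same fit/transform or fit/predict pattern, which is a large part of why scikit-learn feels simple for this kind of exploratory work.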

TensorFlow is really for deep learning applications, an area where scikit-learn is of little use. For most applications, especially for beginners, you'd want to use scikit-learn. For unsupervised learning, scikit-learn has various clustering and decomposition algorithms that are simple to use.

When we consider natural language processing, scikit-learn offers a couple of interesting functions to turn words and sentences into vectors. They are quite simple to use but powerful enough to solve specific problems such as topic detection, sentiment analysis, text clustering and many others. In particular, tokenising text, counting occurrences or tf-idf vectorising a corpus requires only a few lines of Python code. It is easy to use and fast.
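A minimal sketch of the counting and tf-idf steps mentioned above, using scikit-learn's `CountVectorizer` and `TfidfVectorizer` with their default settings. The three-sentence corpus is invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Tiny made-up corpus, purely for illustration
corpus = [
    "TensorFlow is used for deep learning",
    "scikit-learn is used for clustering",
    "clustering groups similar documents",
]

# Tokenise and count word occurrences (one row per document)
counts = CountVectorizer().fit_transform(corpus)

# Tokenise and weight terms by tf-idf (rows are L2-normalised by default)
tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(corpus)

# Mapping from each token to its column index
vocabulary = tfidf_vectorizer.vocabulary_

print(counts.shape, tfidf.shape, sorted(vocabulary))
```

The resulting sparse matrices can be passed directly to scikit-learn estimators, for instance a clustering algorithm for text clustering.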

In conclusion:

TensorFlow: More powerful, good for deep learning. Overkill for simpler tasks.

Scikit-learn: Easy to use, supports most practical tasks - also in NLP. Not the right solution for deep learning.

Ayan Kar

Data Platform Transformation | Data Analytics & Data Science | Data & AI Products | Ex Head of Data & ML Products HSBC

7 years ago

I find Scikit much easier to code against compared to TF. I find that the whole concept of having to work with sessions requires too much overhead code. For testing some simple unsupervised hypotheses, Scikit is better. However, for more serious and high-performance deep NNs, TF is great. One of the areas I want to experiment with is the multiprocessing that is built into TF by default, though this can be enabled with Scikit as well.

Marcin Pękalski

Manager of Data Science at Kambi Sports Solutions | Kaggle Master

7 years ago

You can do more than deep learning in TF, and some of its advantages are speed, TensorBoard and TensorFlow Serving.

Fabian Fürst

Founder Flexa | ex-McKinsey | HEC | HSG

7 years ago
