Representation matters: How deep learning transforms data for machine learning
Deep Learning Journey #2
Machine learning is the science of making computers learn from data to perform specific tasks, such as classification, regression, clustering, etc. However, not all data are equally useful for machine learning. Some data are too complex, noisy, or irrelevant to be directly fed into a machine learning algorithm. That's where deep learning comes in.
Using deep learning, machines can learn automatically how to represent data in a way that is suitable for machine learning. Artificial neural networks enable deep learning by processing and transforming data through layers of simple computational units. By stacking multiple layers of neural networks, deep learning can learn complex and abstract representations of data, such as features, patterns, concepts, and categories.
What are the benefits of deep learning for data representation?
Machine learning is largely dependent on how the data are represented. For example, imagine we have two groups of data points in a scatterplot that we want to separate with a line. If we plot the data in Cartesian coordinates (x and y values), there may be no straight line that can separate the two groups. However, if we plot the same data in polar coordinates (angle and radius), a simple vertical line can separate them. This shows how changing the representation of the data can make a big difference in the difficulty of the problem.
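The coordinate-change idea can be sketched in a few lines of NumPy. This is a minimal, self-contained illustration; the two concentric rings and the threshold radius of 2 are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups: an inner ring (radius ~1) and an outer ring (radius ~3).
# In Cartesian coordinates, no straight line separates them.
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)])
radii += rng.normal(0, 0.1, size=200)
x = radii * np.cos(angles)
y = radii * np.sin(angles)
labels = np.concatenate([np.zeros(100), np.ones(100)])

# Change of representation: Cartesian -> polar.
r = np.hypot(x, y)         # radius
theta = np.arctan2(y, x)   # angle

# In polar coordinates, a single vertical line (r = 2) separates the groups.
predictions = (r > 2.0).astype(float)
accuracy = (predictions == labels).mean()
print(accuracy)
```

The classifier itself is trivial; all the work is done by choosing the right representation.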
As another example, to determine whether a pregnant woman should have a cesarean delivery, we might use logistic regression, a statistical technique that models the probability of an outcome from one or more input features. It uses a very simple formula and is therefore considered one of the most basic machine learning algorithms.
The logistic regression algorithm cannot examine the patient directly. It relies instead on information provided by the doctor, such as the presence or absence of a uterine scar, blood pressure, and fetal heart rate. Each piece of information is called a feature, and the collection of features is called a representation of the data. In logistic regression, each feature is correlated with the outcome (cesarean or not), and a prediction is made based on those correlations.
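As a sketch, here is logistic regression fit by plain gradient descent on a hypothetical toy table of patient features. The numbers and labels are invented purely for illustration, not clinical data:

```python
import numpy as np

# Hypothetical toy data: each row is one patient, with three features the
# doctor reports: [uterine scar (0/1), blood pressure, fetal heart rate].
X = np.array([
    [1, 150.0, 170.0],
    [1, 140.0, 165.0],
    [0, 120.0, 140.0],
    [0, 115.0, 138.0],
    [1, 145.0, 172.0],
    [0, 118.0, 142.0],
])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = cesarean (illustrative labels only)

# Standardize features so gradient descent behaves well.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Logistic regression: p = sigmoid(X @ w + b), fit by gradient descent
# on the cross-entropy loss.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * (p - y).mean()

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print(preds)  # should match y on this tiny, cleanly separable example
```

Note that the model only ever sees the hand-chosen features; it has no say in how they were defined.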
However, logistic regression cannot affect how features are defined or extracted from the data. In the absence of a doctor's report, logistic regression would not be able to make useful predictions based on an MRI scan of the patient. This is because individual pixels in an MRI scan have little or no correlation with the outcome. MRI scans contain too much irrelevant and redundant information that obscures the important details. We need to manually design and select the features that are relevant and informative for logistic regression to work effectively.
Manual feature engineering is tedious, time-consuming, and domain-specific. An effective feature engineering process combines subject-matter expertise, a clear problem definition, exploratory data analysis, and iteration through a transformation-selection-evaluation cycle. The process is often messy: data scientists may need to go back upstream to enhance the dataset or resolve data-quality problems, and even with strong domain knowledge and intuition, hand-crafted features may not capture all the nuances and variations in the data that matter for machine learning.
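To make the transformation-selection-evaluation cycle concrete, here is a small hypothetical sketch: several candidate features are hand-crafted from raw signals and then ranked by their correlation with the label. All data here are synthetic, and the candidate features are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical raw data: 200 short signals; the label depends on each
# signal's overall variability, not on any individual raw sample.
scales = rng.uniform(0.5, 2.0, size=(200, 1))
signals = rng.normal(0.0, scales, size=(200, 50))
labels = (signals.std(axis=1) > 1.2).astype(float)

# Transformation: hand-craft candidate features from each raw signal.
candidates = {
    "mean": signals.mean(axis=1),
    "std": signals.std(axis=1),
    "max": signals.max(axis=1),
    "first_sample": signals[:, 0],  # a raw, pixel-like feature
}

# Selection/evaluation: rank candidates by correlation with the label.
for name, feat in candidates.items():
    corr = np.corrcoef(feat, labels)[0, 1]
    print(f"{name}: {corr:+.2f}")
```

The hand-crafted "std" feature correlates strongly with the label, while a raw sample correlates with almost nothing, mirroring the MRI-pixel problem above. In practice, each pass through this cycle is manual work that deep learning aims to automate.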
How deep learning can help
Deep learning can learn how to represent data directly from raw inputs such as images, text, and audio, without human intervention or hand-coded domain knowledge. It can learn multiple levels of representation, from low-level features (such as edges and colors in images) to high-level features (such as faces and objects). It can also learn how to combine and integrate different types of features (such as visual and textual features) into a unified representation. By doing so, deep learning extracts the most relevant and informative features from the data, which can improve both the performance and the efficiency of machine learning.
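A minimal sketch of this idea is a tiny two-layer network trained on XOR, a problem that is not linearly separable in its raw representation. The hidden layer learns a new representation on which a simple linear classifier can operate; all sizes and hyperparameters here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the raw input representation is not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# One hidden layer learns a new representation of X; a linear
# (logistic) classifier then operates on that representation.
W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, 8);      b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)          # learned representation
    p = sigmoid(h @ W2 + b2)          # linear classifier on top of it
    p = np.clip(p, 1e-12, 1 - 1e-12)
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    # Backpropagate the cross-entropy gradient through both layers.
    dp = p - y
    dh = np.outer(dp, W2) * (1.0 - h ** 2)
    W2 -= lr * (h.T @ dp) / 4; b2 -= lr * dp.mean()
    W1 -= lr * (X.T @ dh) / 4; b1 -= lr * dh.mean(axis=0)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The key point is that the network discovers the useful intermediate representation (the hidden activations) on its own, rather than having a human design it.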
Limitations and drawbacks
Deep learning has many advantages, but it also has limitations and drawbacks. Two of them relate directly to the representation problem. The first is overfitting: when a model becomes too complex, it starts to fit the noise in the training data rather than the underlying patterns, which leads to poor performance on new data. The second is lack of interpretability: deep learning models, especially those with many layers, are complex and difficult to interpret, which makes it hard to understand how the model arrives at its predictions and to identify errors or biases in it.
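Overfitting can be illustrated with a simple sketch: fitting polynomials of increasing degree to a few noisy points. The high-capacity model drives its training error toward zero by fitting the noise, while its error on fresh data grows. The sine curve, noise level, and degrees are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying pattern (a sine wave).
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 15)
x_test = np.sort(rng.uniform(0, 1, 100))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# A degree-3 model captures the pattern; a degree-14 model has enough
# capacity to pass through all 15 noisy training points.
results = {}
for degree in (3, 14):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))
    print(degree, results[degree])
```

The degree-14 fit achieves a far lower training error than the degree-3 fit, but its test error is much larger than its training error: it has memorized the noise.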
The future of effective data representation
Effective data representation is a key research issue in deep learning. Deep learning algorithms are a promising avenue of research into the automated extraction of complex data representations (features) at high levels of abstraction. The final representation constructed by a deep learning algorithm distills useful information from the data, which can be used as features for building classifiers, or even for data indexing and other applications that are more efficient when working with abstract representations than with high-dimensional sensory data. To learn better representations and abstractions, one can also use some supervised data when training the deep learning model.
Deep learning is also being used in the field of omics data, which includes genomics, proteomics, and metabolomics. Researchers are exploring the most common types of biological data and data representations that are used to train deep learning models, with the goal of developing predictive models for medical discoveries. In addition, deep learning is being used to enable more effective and efficient frameworks for scientific data representation and generation.
In summary, representation matters for machine learning because it determines how well and how fast machines can learn from data. Deep learning is a powerful technique that can automatically learn how to represent the data in a way that is suitable for machine learning. By using deep learning, we can avoid manual feature engineering and leverage the full potential of our data.
#deeplearningjourney