Erythemato-squamous Diseases Prediction
Anirban Chowdhury
Associate Consultant (Data Analytics & AI) @ Infosys Consulting | Weschool, Mumbai | IXL Innovation Olympics Awardee | Data Science & Analytics | PGDM 2022-2024
Background of the Data
The differential diagnosis of erythemato-squamous diseases is a real problem in dermatology. They all share the clinical features of erythema and scaling, with very few differences. The diseases in this group are psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris. Usually a biopsy is necessary for the diagnosis, but unfortunately these diseases share many histopathological features as well. Another difficulty for the differential diagnosis is that a disease may show the features of another disease in its early stage and develop its own characteristic features only in later stages. Patients were first evaluated clinically with 12 features. Afterwards, skin samples were taken for the evaluation of 22 histopathological features. The values of the histopathological features were determined by analysing the samples under a microscope. This database contains 34 attributes, 33 of which are linear-valued and one of which is nominal.
In the dataset constructed for this domain, the family history feature has the value 1 if any of these diseases has been observed in the family, and 0 otherwise. The age feature simply represents the age of the patient. Every other feature (clinical and histopathological) was given a degree in the range of 0 to 3. Here, 0 indicates that the feature was not present, 3 indicates the largest amount possible, and 1 and 2 indicate intermediate values.
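The encoding described above can be checked with a small validation helper before any modelling. This is only a minimal sketch: the function `validate_record` and the column names `family_history`, `age`, `erythema`, and `scaling` are hypothetical and may not match the names in the actual file.

```python
def validate_record(record):
    """Return a list of problems found in one patient record (empty if valid).

    Assumes `record` is a dict mapping feature name -> integer value, with
    hypothetical keys "family_history" and "age" for the two special features.
    """
    problems = []
    for name, value in record.items():
        if name == "age":
            # Age is the patient's age, so only require a non-negative integer.
            if not isinstance(value, int) or value < 0:
                problems.append(f"{name}: expected a non-negative integer, got {value!r}")
        elif name == "family_history":
            # Family history is binary: 1 if observed in the family, 0 otherwise.
            if value not in (0, 1):
                problems.append(f"{name}: expected 0 or 1, got {value!r}")
        else:
            # Every other clinical or histopathological feature is graded 0 to 3.
            if value not in (0, 1, 2, 3):
                problems.append(f"{name}: expected a degree from 0 to 3, got {value!r}")
    return problems


# Hypothetical records, for illustration only.
good = {"erythema": 2, "scaling": 3, "family_history": 0, "age": 41}
bad = {"erythema": 5, "scaling": 1, "family_history": 2, "age": 41}

print(validate_record(good))
print(validate_record(bad))
```

A check like this catches out-of-range grades early, before they reach a model as silently accepted values.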
Data Dictionary
A brief explanation of all the machine learning models that I have applied
The output I have generated via the model.
Showcasing how I can predict given specific data (Similar to my dataset)
To predict the class of a new patient using a machine learning model trained on the dermatology dataset, we need to record the same 34 attributes for the patient that the training dataset uses, in the same order and with the same encoding. Then, we can feed these features into the trained model, which will predict the class of the new patient.
For example, suppose we have a new patient with the following features:
We can use the trained machine learning model to predict the class of the new patient based on these features. For instance, if we have used a Random Forest classifier to train the model, we can use the `predict()` method of the classifier to get the predicted class for the new patient. Suppose the classifier predicts that the new patient belongs to class 1, which corresponds to psoriasis. Then we can conclude that the patient most likely has psoriasis based on the clinical and histopathological features.
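The prediction step described above might look roughly like the sketch below, assuming scikit-learn's `RandomForestClassifier`. The training data here is random placeholder data rather than the real dermatology dataset, so the predicted class carries no medical meaning. The class numbering beyond class 1 (psoriasis) is an assumption that follows the order in which the diseases are listed in this article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed class numbering: class 1 = psoriasis is stated in the article,
# the remaining codes follow the order in which the diseases are listed.
CLASS_NAMES = {
    1: "psoriasis",
    2: "seborrheic dermatitis",
    3: "lichen planus",
    4: "pityriasis rosea",
    5: "chronic dermatitis",
    6: "pityriasis rubra pilaris",
}

# Placeholder training data (NOT the real dataset): 33 graded features in 0..3
# plus an age column, giving 34 attributes per patient.
rng = np.random.default_rng(0)
n_train = 120
X_train = np.column_stack([
    rng.integers(0, 4, size=(n_train, 33)),
    rng.integers(10, 76, size=(n_train, 1)),
])
y_train = rng.integers(1, 7, size=n_train)  # random labels 1..6

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# A hypothetical new patient: one row with the same 34 attributes, same order.
new_patient = np.column_stack([
    rng.integers(0, 4, size=(1, 33)),
    np.array([[35]]),
])

predicted = int(model.predict(new_patient)[0])
probabilities = model.predict_proba(new_patient)[0]

print("Predicted class:", predicted, "-", CLASS_NAMES[predicted])
for label, p in zip(model.classes_, probabilities):
    print(f"  class {label}: {p:.2f}")
```

Looking at `predict_proba()` alongside `predict()` shows how confident the model is, which matters here because the diseases share so many features.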
Conclusion:
The attributes are a mix of clinical and histopathological attributes that describe different aspects of skin diseases. The class distribution of the dataset is imbalanced with 6 classes, namely psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris.
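One way to inspect the imbalance mentioned above is to count the samples per class and derive inverse-frequency weights. The sketch below uses a short made-up label list, not the real class distribution, and the weighting formula `n_samples / (n_classes * count)` is one common heuristic rather than the method used in this article.

```python
from collections import Counter

# Hypothetical class labels, for illustration only.
labels = [1, 1, 1, 1, 2, 2, 3]

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# Inverse-frequency weights: rarer classes receive larger weights.
weights = {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

for cls in sorted(counts):
    print(f"class {cls}: {counts[cls]} samples, weight {weights[cls]:.3f}")
```

Weights of this kind can be passed to classifiers that accept per-class weights, so that errors on rare classes count for more during training.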
Overall, this dataset can be a valuable resource for developing diagnostic tools and improving our understanding of skin diseases. The development of such models has significant implications for the early detection and prevention of erythemato-squamous diseases (ESD). With early detection, individuals can receive timely treatment, reducing the severity of the disease and the associated costs of treatment. Additionally, preventative measures such as sun protection and regular skin screenings can be implemented for high-risk individuals.
However, factors such as changes in the environment, genetics, and individual behavior can also influence the development of ESD, and these are not captured by the models.