United States and Puerto Rico Cancer Statistics, 1999-2019 Incidence

Siddhant Kondekar

On a mission of sustained self-growth and success | Marketing Strategy Enthusiast | A content writer | AMFI |Management Trainee- JB Pharmaceuticals | PGDM-Healthcare Management-Welingkar |

Published: April 30, 2023

BACKGROUND OF THE DATA

Cancer is one of the deadliest diseases of our time. It affects millions of people, and many cases end in death. The data analyzed here is the United States and Puerto Rico Cancer Statistics, 1999-2019 Incidence dataset, which reports the incidence of various types of cancer by age, sex, race, and cancer site. Because the number of people with a specific type of cancer is broken down by age group, the data can support prediction, the study of factors leading to cancer, and earlier detection, diagnosis, and treatment.

DATA DICTIONARY

YEAR: the year of diagnosis (integer)

STATE: the state or territory of residence (string)

COUNTY: the county of residence (string)

AGE_ADJUSTED_RATE: the age-adjusted incidence rate per 100,000 population (float)

AGE_ADJUSTED_CI_LOWER: the lower limit of the 95% confidence interval for the age-adjusted rate (float)

AGE_ADJUSTED_CI_UPPER: the upper limit of the 95% confidence interval for the age-adjusted rate (float)

COUNT: the number of cases (integer)

POPULATION: the population count (integer)

CRUDE_RATE: the crude incidence rate per 100,000 population (float)

CRUDE_CI_LOWER: the lower limit of the 95% confidence interval for the crude rate (float)

CRUDE_CI_UPPER: the upper limit of the 95% confidence interval for the crude rate (float)

RACE: the race/ethnicity of the patient (string)

SEX: the sex of the patient (string)

SITE: the site of cancer (string)

YEAR_ID: a unique identifier for the year (integer)

AGE_ADJUSTED_RATE_STD: the age-adjusted incidence rate standardized to the 2000 U.S. standard population (float)

CRUDE_RATE_STD: the crude incidence rate standardized to the 2000 U.S. standard population (float)

EVENT_TYPE: whether the data is incidence (I) or mortality (M) (string)
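
The relationship between COUNT, POPULATION, and CRUDE_RATE in the dictionary above can be sketched in Python. This is a minimal illustration, not the method used to produce the published figures: the function names are hypothetical, and the confidence interval uses a normal approximation to the Poisson distribution, which is an assumption (the source does not say how its intervals were computed).

```python
import math

def crude_rate(count, population, per=100_000):
    """Crude incidence rate: cases per `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population

def crude_rate_ci(count, population, z=1.96, per=100_000):
    """Approximate 95% CI, assuming case counts follow a Poisson distribution.

    The standard error of the rate is rate / sqrt(count); this normal
    approximation is only reasonable for fairly large counts.
    """
    if count <= 0:
        raise ValueError("count must be positive for this approximation")
    rate = crude_rate(count, population, per)
    half_width = z * rate / math.sqrt(count)
    return max(0.0, rate - half_width), rate + half_width

# Made-up numbers for illustration only: 50 cases among 100,000 residents.
rate = crude_rate(50, 100_000)
lower, upper = crude_rate_ci(50, 100_000)
print(f"CRUDE_RATE={rate:.1f}, CI=({lower:.1f}, {upper:.1f})")
```

Age-adjusted rates need age-specific counts and standard population weights, which are outside the scope of this sketch.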

DATA INTERPRETATION

I applied two models to interpret this data.

1. Logistic Regression - Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables. Logistic regressions can be difficult to interpret; the Intellectus Statistics tool lets you conduct the analysis and then interprets the output in plain English.

2. ARIMA - An autoregressive integrated moving average, or ARIMA, is a statistical analysis model that uses time series data to either better understand the data set or to predict future trends. A statistical model is autoregressive if it predicts future values based on past values. For example, an ARIMA model might seek to predict a stock's future prices based on its past performance or forecast a company's earnings based on past periods.

The dataset covers several cancer sites, such as Brain and Nervous System, Breast, and Cervix Uteri. It was used to build a predictive model for these cancers.

The Model includes:

Logistic regression using age: at alpha = 0.05 the model is statistically significant, with a chi-square of 748.8194 on 4 degrees of freedom. The accuracy is 0.9550.

[Figure: ROC curve for the logistic regression using age]

Logistic regression using race and gender: for this model, the data was divided by age group, the counts were broken down by sex code, and the race data for the leading cancer sites was separated out before the regression was carried out.
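
The chi-square and accuracy figures reported for the logistic regression can be illustrated with a small Python sketch. It relies on one fact: when the only predictor is a single categorical variable (for example, age group), the fitted probabilities of a logistic regression equal the observed share of positive outcomes in each group, so a likelihood-ratio chi-square can be computed without an iterative solver. Whether the chi-square reported here is a likelihood-ratio statistic, and how the binary outcome was defined, are not stated in the analysis; the function names and the toy data below are hypothetical.

```python
import math
from collections import defaultdict

def log_likelihood(outcomes, probs):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(math.log(p if y == 1 else 1.0 - p) for y, p in zip(outcomes, probs))

def categorical_logistic_summary(groups, outcomes):
    """Likelihood-ratio chi-square, df, and accuracy for one categorical predictor."""
    n_by_group = defaultdict(int)
    positives_by_group = defaultdict(int)
    for g, y in zip(groups, outcomes):
        n_by_group[g] += 1
        positives_by_group[g] += y

    # Fitted probability per group = observed share of positives in that group.
    fitted = {g: positives_by_group[g] / n_by_group[g] for g in n_by_group}
    model_probs = [fitted[g] for g in groups]

    # The null (intercept-only) model predicts the overall share for everyone.
    overall = sum(outcomes) / len(outcomes)
    null_probs = [overall] * len(outcomes)

    chi_square = 2.0 * (log_likelihood(outcomes, model_probs)
                        - log_likelihood(outcomes, null_probs))
    df = len(n_by_group) - 1

    # Classify as positive when the fitted probability is at least 0.5.
    correct = sum((p >= 0.5) == (y == 1) for y, p in zip(outcomes, model_probs))
    return chi_square, df, correct / len(outcomes)

# Toy data, not taken from the cancer dataset.
groups = ["A"] * 4 + ["B"] * 4
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
chi_square, df, accuracy = categorical_logistic_summary(groups, outcomes)
print(f"chi-square={chi_square:.4f}, df={df}, accuracy={accuracy:.2f}")
```

With five age groups this calculation gives df = 4, which would be consistent with the df reported for the age model. At alpha = 0.05 the chi-square critical value for 4 degrees of freedom is about 9.488, so a statistic of 748.8194 lies far beyond the threshold.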

A forecasting model was also built with ARIMA. The ARIMA dataset consists of the year-wise total number of cancer cases. The model was run with alpha = 0.05, and the total case data was used to forecast cancer cases for the next 5 years so that prevention measures can be planned.
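
The year-wise forecast can be sketched as follows. The ARIMA order and the software used are not stated, so this example assumes the simplest non-trivial case, ARIMA(1,1,0) with a constant, fitted by least squares on the first differences and written in plain Python. The function names and the yearly totals are made up for illustration; a real analysis would normally use a time series library such as statsmodels.

```python
def fit_arima_110(series):
    """Estimate (constant, phi) in d[t] = constant + phi * d[t-1],
    where d is the first difference of the series (the "I" step, d=1)."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0  # constant differences: no AR signal
    constant = mean_y - phi * mean_x
    return constant, phi

def forecast_arima_110(series, steps=5):
    """Forecast `steps` future values by predicting differences and re-integrating."""
    constant, phi = fit_arima_110(series)
    level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = constant + phi * last_diff  # AR(1) step on the differences
        level = level + last_diff               # undo the differencing
        forecasts.append(level)
    return forecasts

# Made-up yearly totals (in thousands of cases), for illustration only.
yearly_totals = [1500, 1520, 1545, 1560, 1590, 1605, 1630]
for offset, value in enumerate(forecast_arima_110(yearly_totals, steps=5), start=1):
    print(f"year +{offset}: {value:.1f}")
```

Differencing once removes a steady upward trend, and the autoregressive term lets each predicted yearly change depend on the previous one; other orders may fit the actual data better.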

The results showed that both models were applied successfully.

Some key findings from the report include:

1. Overall, the incidence of cancer in the United States and Puerto Rico has remained stable from 2012 to 2019.

2. The most common types of cancer in both the United States and Puerto Rico are breast, lung, prostate, and colorectal cancer.

3. The incidence of lung cancer has been declining in both the United States and Puerto Rico, likely due to a decrease in smoking rates.

4. The incidence of liver cancer has been increasing in both the United States and Puerto Rico, likely due to a rise in the prevalence of hepatitis C and non-alcoholic fatty liver disease.

5. The incidence of thyroid cancer has been increasing in both the United States and Puerto Rico, but this may be due to increased detection rather than a true increase in the number of cases.

BUSINESS AREA:

Hospitals and medical platforms can use this data to support cancer prevention, to track the total number of cancer cases, and to predict future cases in the country by age, sex, and race. The data could also be sold at a price of 10 lakh, using marketing strategies such as email and B2B outreach.

CONCLUSION:

This data is very important: with appropriate interpretation and statistical tools, it can be used for prediction and for the prevention and treatment of cancer. The models used are very effective and their accuracy is high, which lends credibility to the analysis. Both the logistic regression and ARIMA models appear reliable and precise.
