
If ChatGPT has not scared you yet, it will now!

Murtaza Haider

Professor … Columnist … Data Scientist

Published: November 7, 2023

Generative AI is learning fast. OpenAI's GPT-4 has recently added data analysis and image rendering capabilities. These will have serious implications for academia.

Here are two ways Generative AI will impact research and pedagogy.

First, rendering images frees us from copyright limitations: we can generate an image from instructions instead of licensing an existing one. I generated an image of a Canadian downtown. It showed Canadian flags blowing in opposite directions, perhaps a reflection of an electorate that can't make up its mind!

When I tried to generate a book cover, the algorithm generated the image but with spelling mistakes! It spelled Statistics with four T's.

Second, the data analysis ability is enabling and concerning at the same time. If one is not empirically inclined, one can upload a tabular data set and instruct the AI engine about the analysis one would like to perform. It's like sitting next to an expert statistician and having them analyze your data.

But if, like me, you assign data sets to students to analyze for assignments and exams, you may want to read the following carefully. ChatGPT can complete most assignments and exams we routinely hand out to students.

We will need new ways to judge competencies.

Here is an exercise that I undertook with OpenAI today. I used the data from a paper by Professor Daniel Hamermesh in which he used data from the University of Texas at Austin to determine whether an instructor's teaching evaluation score (averaged from evaluations conducted by students, a routine task at most schools) is influenced by their looks. Put simply, do good-looking professors get higher teaching evaluations from their students? The paper is available at:

https://www.sciencedirect.com/science/article/abs/pii/S0272775704001165

I've shared below my step-by-step interaction with ChatGPT after I uploaded the data set in Excel format. If a student had submitted the following output, I would have to give it a good grade. My minimalist prompts are in ITALICS.

There are some minor mistakes in the estimated regression (two regressors should have been excluded), but they do not affect the conclusion. Also, I'd like to point out how remarkably well the AI guessed the nature of the variables from their names. It made just one mistake, because allstudents was too vague a variable name to be guessed correctly.

You will notice that the AI conducts the analysis, describes the results and concludes at the end. The AI-generated conclusion is produced verbatim below:

In conclusion, after controlling for various confounders, the analysis indicates that instructors' perceived beauty does have a statistically significant influence on their teaching evaluations.

The AI engine generates the analysis using Python and reports the code for further processing. I have pasted the code at the end along with the tabular output generated by AI.

Last Word:

Teaching & Learning Statistics Will Never Be the Same

My Prompt:

The attached file contains information about teaching evaluation scores received by instructors and their attributes, plus the attributes of the courses they taught. We are particularly interested in finding an answer to the question that if we control for other confounders, is teaching evaluation of instructors influenced by their perceived beauty by the students?

Output from ChatGPT

Data Analysis

The dataset contains several columns which might be relevant to our analysis:

minority: Indicates if the instructor is a member of a minority group.
age: The age of the instructor.
gender: The gender of the instructor.
credits: If the course taught was a multi-credit course.
beauty: The perceived beauty rating of the instructor by the students.
eval: The teaching evaluation score.
division: The division of the course taught (upper, lower).
native: Indicates if the instructor is a native speaker.
tenure: Indicates if the instructor has tenure.
students: The number of students in the course.
allstudents: The total number of students taught by the instructor in all courses.
prof: An identifier for the professor.

To assess the influence of perceived beauty on teaching evaluations while controlling for other factors, we would typically conduct a multivariate regression analysis where eval is the dependent variable and beauty, along with other instructor and course attributes, are the independent variables.

Would you like to proceed with a regression analysis, or is there another specific analysis or preprocessing step you would like to perform first?

My Prompt

Let's do preprocessing first.

Output from ChatGPT

Data Analysis

The preprocessing step has revealed that:

There are no missing values in any of the columns, which means we do not need to handle missing data.
The summary statistics provide some insights into the data:
age ranges from 29 to 73 with a mean of around 48.
beauty scores range from approximately -1.45 to 1.97.
eval scores, which are our primary focus, range from 2.1 to 5.0 with a mean score of approximately 4.0.
The students and allstudents columns have a wide range, suggesting some courses or instructors have many more students than others.
The prof column is simply an identifier with 94 unique instructors represented in the dataset.

Before proceeding to regression analysis, we might consider normalizing or standardizing data if the scales are significantly different, which could affect the regression weights. However, this is not always necessary, especially if we're interpreting the beta weights in terms of standard deviations.

We should also encode categorical variables (like gender, minority, native, tenure, credits, and division) into numerical values if we intend to include them in the regression analysis since these models require numerical input.

Do you want to normalize/standardize any numerical variables and encode the categorical variables before moving on to the regression analysis?

My Prompt

Let's do encoding and standardizing and move to regression. Weight the regression model by the variable students. Also, report clustered standard errors by clustering them on the variable prof.

Output from ChatGPT

Data Analysis

The data has been standardized, and categorical variables have been encoded. We now have a transformed feature set that includes:

Standardized age, beauty, students, and allstudents.
Binary encoded variables for minority, gender, credits, division, native, and tenure.

Next, we will perform the regression analysis. To do this, we'll use the statsmodels library, which allows for specifying weights and clustering standard errors by groups. We'll weight the regression model by the students column and report clustered standard errors by clustering them on the prof variable.

Let's move on to fitting the weighted least squares regression model with clustered standard errors.

The weighted least squares regression model provides the following insights:

The model explains approximately 32.1% of the variance in teaching evaluations (R-squared: 0.321).
The beauty coefficient is positive (0.1754) and statistically significant at the p < 0.001 level, suggesting that perceived beauty is positively associated with higher teaching evaluation scores, even when controlling for other variables in the model.
Other significant predictors of teaching evaluation scores include:
students: The number of students positively influences evaluation scores (coef = 0.2088, p < 0.05).
minority_yes: Being a minority is associated with a decrease in evaluation scores (coef = -0.1986, p < 0.05).
gender_male: Male gender is associated with an increase in evaluation scores (coef = 0.2139, p < 0.05).
credits_single: Courses with single credits are associated with higher evaluation scores (coef = 0.7308, p < 0.001).
Some predictors like age, allstudents, division_upper, native_yes, and tenure_yes were not statistically significant at the p < 0.05 level, although tenure_yes was close to being significant (p = 0.053).

It's important to note that the standard errors are robust to clustering by the prof variable, which accounts for the non-independence of observations within clusters of the same instructor.
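
For readers who want to see what "weighted" and "clustered" mean here, the estimator and its cluster-robust covariance can be sketched as below. The notation is an assumption added for illustration and is not part of ChatGPT's output: X is the matrix of regressors, y the vector of eval scores, W the diagonal matrix of weights (the students column), and the subscript g picks out the rows belonging to instructor g among the G instructors identified by prof.

```latex
\[
\hat{\beta} = \left( X^{\top} W X \right)^{-1} X^{\top} W y ,
\qquad
\hat{u}_g = y_g - X_g \hat{\beta}
\]
\[
\widehat{V}\!\left(\hat{\beta}\right)
  = \left( X^{\top} W X \right)^{-1}
    \left( \sum_{g=1}^{G} X_g^{\top} W_g \, \hat{u}_g \hat{u}_g^{\top} \, W_g X_g \right)
    \left( X^{\top} W X \right)^{-1}
\]
```

Residuals are allowed to be correlated within an instructor's courses but are treated as independent across instructors. Statistical software usually multiplies this expression by a small-sample correction factor, so reported standard errors may differ slightly from the raw formula.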

In conclusion, after controlling for various confounders, the analysis indicates that instructors' perceived beauty does have a statistically significant influence on their teaching evaluations.

Automatically generated Code in Python

import pandas as pd
# Load the data to take a look at what we have
file_path = '/mnt/data/TeachingRatings.csv'
teaching_ratings_df = pd.read_csv(file_path)
# Display the first few rows of the dataframe to understand its structure

teaching_ratings_df.head()
# Check for any missing values in the dataset
missing_values = teaching_ratings_df.isnull().sum()
# Summarize the data to see the distribution and potential outliers
summary_statistics = teaching_ratings_df.describe()
missing_values, summary_statistics

import statsmodels.api as sm
# Adding a constant to the model for the intercept
X_transformed_df = sm.add_constant(X_transformed_df)
# Convert 'students' to integer for the weights
weights = teaching_ratings_df['students'].astype(int)
# Fit the weighted least squares regression model
wls_model = sm.WLS(y, X_transformed_df, weights=weights).fit(cov_type='cluster', cov_kwds={'groups': teaching_ratings_df['prof']})
# Get the summary of the regression model
model_summary = wls_model.summary()
model_summary
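
The pasted code refers to X_transformed_df and y, but the encoding and standardizing step that builds them does not appear in it. The block below is a minimal sketch of what that step might look like, not the code ChatGPT actually ran. The six rows are made-up values that only mimic the column names described earlier, and the use of the sample standard deviation and of pd.get_dummies with drop_first=True are assumptions.

```python
import pandas as pd

# Hypothetical rows mimicking the columns described above; the values are
# invented for illustration and are NOT taken from the real data set.
teaching_ratings_df = pd.DataFrame({
    "minority": ["yes", "no", "no", "no", "yes", "no"],
    "age": [36, 59, 51, 40, 31, 62],
    "gender": ["female", "male", "male", "female", "female", "male"],
    "credits": ["more", "more", "single", "more", "more", "single"],
    "beauty": [0.29, -0.74, -0.57, 1.51, 0.42, -1.10],
    "eval": [4.3, 4.5, 3.7, 4.4, 4.2, 3.9],
    "division": ["upper", "upper", "lower", "upper", "lower", "upper"],
    "native": ["yes", "yes", "yes", "no", "yes", "yes"],
    "tenure": ["yes", "yes", "yes", "no", "no", "yes"],
    "students": [24, 17, 55, 40, 42, 20],
    "allstudents": [43, 20, 55, 46, 48, 22],
    "prof": [1, 2, 3, 4, 5, 6],
})

numeric_cols = ["age", "beauty", "students", "allstudents"]
categorical_cols = ["minority", "gender", "credits", "division", "native", "tenure"]

# Dependent variable: the teaching evaluation score.
y = teaching_ratings_df["eval"]

# Standardize the numeric regressors to mean 0 and sample standard deviation 1
# (assumption: ChatGPT may have used the population standard deviation instead).
numeric_part = teaching_ratings_df[numeric_cols]
standardized = (numeric_part - numeric_part.mean()) / numeric_part.std()

# One-hot encode the categorical regressors, dropping the first level of each
# so the dummies are not collinear with the intercept added later.
dummies = pd.get_dummies(
    teaching_ratings_df[categorical_cols], drop_first=True, dtype=float
)

# Combined design matrix (without the constant, which sm.add_constant adds).
X_transformed_df = pd.concat([standardized, dummies], axis=1)
print(X_transformed_df.columns.tolist())
```

With the levels used in this sketch, the dummy columns come out as minority_yes, gender_male, credits_single, division_upper, native_yes and tenure_yes, matching the names in the regression output above. The weights in the regression are still taken from the original, unstandardized students column.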
