If ChatGPT has not scared you yet, it will now!
Generative AI is learning fast. OpenAI's GPT-4 has recently added data analysis and image rendering capabilities. These will have serious implications for academia.
Here are two ways Generative AI will impact research and pedagogy.
First, image rendering frees us from copyright limitations: we can generate exactly the image we need from written instructions. I generated an image of a Canadian downtown. It showed Canadian flags blowing in opposite directions, perhaps a reflection of an electorate that can't make up its mind!
When I tried to generate a book cover, the algorithm generated the image but with spelling mistakes! It spelled Statistics with four T's.
Second, the data analysis ability is enabling and concerning at the same time. If one is not empirically inclined, one can upload a tabular data set and instruct the AI engine about the analysis one would like to perform. It's like sitting next to an expert statistician and having them analyze your data.
But if, like me, you assign data sets to students to analyze for assignments and exams, you may want to read the following carefully. ChatGPT can complete most assignments and exams we routinely hand out to students.
We will need new ways to judge competencies.
Here is an exercise that I undertook with OpenAI today. I used the data from a paper by Professor Daniel Hamermesh in which he used data from the University of Texas at Austin to determine whether an instructor's teaching evaluation score (averaged from evaluations conducted by students, a routine task at most schools) is influenced by their looks. Put simply, do good-looking professors get higher teaching evaluations from their students? The paper is available at:
I've shared below my step-by-step interaction with ChatGPT after I uploaded the data set in Excel format. If a student had submitted the following output, I would have to give it a good grade. My minimalist prompts are in ITALICS.
There are some minor mistakes in the estimated regression (two regressors should have been excluded), but they do not affect the conclusion. I'd also like to point out how remarkably well the AI guessed the nature of the variables from their names. It made just one mistake, because allstudents was too vague a variable name to be guessed correctly.
You will notice that the AI conducts the analysis, describes the results and concludes at the end. The AI-generated conclusion is produced verbatim below:
In conclusion, after controlling for various confounders, the analysis indicates that instructors' perceived beauty does have a statistically significant influence on their teaching evaluations.
The AI engine generates the analysis using Python and reports the code for further processing. I have pasted the code at the end along with the tabular output generated by AI.
Last Word:
Teaching & Learning Statistics Will Never Be the Same
My Prompt:
The attached file contains information about teaching evaluation scores received by instructors and their attributes, plus the attributes of the courses they taught. We are particularly interested in finding an answer to the question that if we control for other confounders, is teaching evaluation of instructors influenced by their perceived beauty by the students?
Output from ChatGPT
Data Analysis
The dataset contains several columns which might be relevant to our analysis:
To assess the influence of perceived beauty on teaching evaluations while controlling for other factors, we would typically conduct a multivariate regression analysis where eval is the dependent variable and beauty, along with other instructor and course attributes, are the independent variables.
Would you like to proceed with a regression analysis, or is there another specific analysis or preprocessing step you would like to perform first?
My Prompt
Let's do preprocessing first.
Output from ChatGPT
Data Analysis
The preprocessing step has revealed that:
Before proceeding to regression analysis, we might consider normalizing or standardizing data if the scales are significantly different, which could affect the regression weights. However, this is not always necessary, especially if we're interpreting the beta weights in terms of standard deviations.
We should also encode categorical variables (like gender, minority, native, tenure, credits, and division) into numerical values if we intend to include them in the regression analysis since these models require numerical input.
Do you want to normalize/standardize any numerical variables and encode the categorical variables before moving on to the regression analysis?
My Prompt
Let's do encoding and standardizing and move to regression. Weight the regression model by the variable students. Also, report clustered standard errors by clustering them on the variable prof.
Output from ChatGPT
Data Analysis
The data has been standardized, and categorical variables have been encoded. We now have a transformed feature set that includes:
Next, we will perform the regression analysis. To do this, we'll use the statsmodels library, which allows for specifying weights and clustering standard errors by groups. We'll weight the regression model by the students column and report clustered standard errors by clustering them on the prof variable.
Let's move on to fitting the weighted least squares regression model with clustered standard errors.
The weighted least squares regression model provides the following insights:
It's important to note that the standard errors are robust to clustering by the prof variable, which accounts for the non-independence of observations within clusters of the same instructor.
In conclusion, after controlling for various confounders, the analysis indicates that instructors' perceived beauty does have a statistically significant influence on their teaching evaluations.
Automatically generated Code in Python
import pandas as pd
# Load the data to take a look at what we have
file_path = '/mnt/data/TeachingRatings.csv'
teaching_ratings_df = pd.read_csv(file_path)
# Display the first few rows of the dataframe to understand its structure
teaching_ratings_df.head()
# Check for any missing values in the dataset
missing_values = teaching_ratings_df.isnull().sum()
# Summarize the data to see the distribution and potential outliers
summary_statistics = teaching_ratings_df.describe()
missing_values, summary_statistics
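The listing here does not include the encoding and standardizing step that builds X_transformed_df and y, which the regression code relies on. Below is a minimal sketch of what such a step might look like. It is not ChatGPT's actual code: the helper name encode_and_standardize and the tiny made-up data frame are purely illustrative, and leaving eval unstandardized is my assumption. Only the column names (eval, beauty, gender, tenure) come from the transcript.

```python
import pandas as pd


def encode_and_standardize(df, numeric_cols, categorical_cols, target):
    """Return (X, y): z-scored numeric columns plus 0/1 dummy columns."""
    # Standardize numeric regressors to mean 0 and standard deviation 1
    numeric = df[numeric_cols].astype(float)
    numeric = (numeric - numeric.mean()) / numeric.std(ddof=0)

    # One-hot encode the categorical variables, dropping one level per
    # variable to avoid perfect collinearity with the intercept
    dummies = pd.get_dummies(df[categorical_cols], drop_first=True, dtype=float)

    X = pd.concat([numeric, dummies], axis=1)
    y = df[target].astype(float)  # assumption: the outcome is left on its original scale
    return X, y


# Tiny made-up data frame (NOT the Hamermesh data), for illustration only
demo_df = pd.DataFrame({
    "eval": [4.3, 3.7, 4.5, 3.9],
    "beauty": [0.2, -0.5, 1.1, -0.8],
    "gender": ["female", "male", "female", "male"],
    "tenure": ["yes", "no", "yes", "yes"],
})

X_transformed_df, y = encode_and_standardize(
    demo_df,
    numeric_cols=["beauty"],
    categorical_cols=["gender", "tenure"],
    target="eval",
)
print(X_transformed_df)
```

With the real data, the same call would list the remaining categorical variables named in the transcript (minority, native, credits, division) in categorical_cols.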
import statsmodels.api as sm
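# X_transformed_df (encoded, standardized regressors) and y (the eval column)
# are built in the encoding/standardizing step
# Adding a constant to the model for the intercept
X_transformed_df = sm.add_constant(X_transformed_df)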
# Convert 'students' to integer for the weights
weights = teaching_ratings_df['students'].astype(int)
# Fit the weighted least squares regression model
wls_model = sm.WLS(y, X_transformed_df, weights=weights).fit(cov_type='cluster', cov_kwds={'groups': teaching_ratings_df['prof']})
# Get the summary of the regression model
model_summary = wls_model.summary()
model_summary
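For readers curious about what weighting by students and clustering on prof actually compute, here is a rough NumPy sketch of weighted least squares with a cluster-robust "sandwich" covariance. It is an illustration under stated assumptions, not the statsmodels implementation: the function name wls_cluster_se and the synthetic data are hypothetical, and the sketch omits the small-sample adjustment that statsmodels applies by default, so its standard errors would differ slightly from the summary above.

```python
import numpy as np


def wls_cluster_se(X, y, weights, groups):
    """WLS coefficients and cluster-robust standard errors
    (plain sandwich estimator, no small-sample correction)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)

    # WLS estimate: beta = (X'WX)^{-1} X'Wy
    XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = XtWX_inv @ (X.T @ (w * y))
    resid = y - X @ beta

    # "Meat" of the sandwich: sum over clusters of s_g s_g', where
    # s_g adds up w_i * x_i * residual_i over the rows in cluster g
    scores = X * (w * resid)[:, None]
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        s_g = scores[groups == g].sum(axis=0)
        meat += np.outer(s_g, s_g)

    cov = XtWX_inv @ meat @ XtWX_inv
    return beta, np.sqrt(np.diag(cov))


# Synthetic data (NOT the Hamermesh data): 30 instructors, 5 courses each
rng = np.random.default_rng(0)
prof = np.repeat(np.arange(30), 5)
beauty = rng.normal(size=30)[prof]                 # one beauty score per instructor
prof_effect = rng.normal(scale=0.3, size=30)[prof]  # shared within-instructor shock
eval_score = 4.0 + 0.3 * beauty + prof_effect + rng.normal(scale=0.2, size=prof.size)
students = rng.integers(10, 100, size=prof.size)   # class sizes used as weights

X = np.column_stack([np.ones(prof.size), beauty])  # intercept + beauty
beta, se = wls_cluster_se(X, eval_score, students, prof)
print("coefficients:", beta)
print("clustered standard errors:", se)
```

The key design point is that residuals are summed within each instructor before being squared, so correlated errors across courses taught by the same person are not treated as independent pieces of evidence.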