The Fragility of Critical Thinking
Brandon Bean
GDIT Defense AI/ML Director offering true hybrid multi-cloud, modular, open-source, cloud-first AI/ML solutions to GDIT's customers. AI/ML thought leader; educator; career Human Intelligence Warrant Officer
Critical thought is an imperative in Data Science. While several facets of the domain require specialization and deep knowledge, all of them require critical thought. Asking the right question, and showing your customer, whether internal or external, how to ask the right question, are equally important. Warning up front: this article is part rant, part plea. Please, if you are a Data Science practitioner, educator, or leader, ensure that you are instilling the importance of critical thinking in your teams, classes, and subordinates.
Critical thought is an afterthought in higher education. Some students, especially digital natives, take data at face value based on an inherent trust in the results produced by technology. There is no motivation to ask the never-ending "what if" or "what else" questions required in Data Science and Data Analytics. As an adjunct at both the graduate and undergraduate levels, I have seen, semester after semester, a lack of critical thought permeate student work. A perfect exemplar of this is learning the Structured Query Language (SQL).
My teaching ranges across an array of subjects, mostly SQL, Python, Machine Learning, Data Visualization, and Data Analytics. I have developed most of the curriculum that I teach. In some cases, as an adjunct, I teach curriculum developed by others, with my own injection of critical and industry-relevant topics to prepare students for careers in their prospective fields. In this article I will opine on my findings from over 16 semesters of teaching a graduate SQL class I developed for one of the universities I teach at. I won't name the university, but if you browse my LinkedIn profile, you have a 33.3 percent chance of guessing which one it is. This article isn't about the university, its program, its curriculum, or its students writ large. While I describe a subset of students who exhibit lackluster intuition into the data they encounter, that is by no means an attestation of an epidemic or a blanket accusation that students lack the critical thinking necessary to succeed in a post-academic career. In fact, I have had several students who exhibit exceptional critical thinking skills. You can find several of them in my LinkedIn network. This article is an examination of the methods I use to elicit and assess critical thinking, and how the capability can be cultivated in individuals through thoughtful instructional design.
Bottom line--don't cancel me for having an unpopular opinion about what I see in academia.
The Setup
When I designed the SQL course (let's call it SQL 504), I intended to ensure the students were exposed to the reality of data in the wild. I remember when I went to graduate school and took SQL. I used a canned database in MySQL running on a WAMP server on my laptop. While it did the trick, it didn't prepare me for the real world of wild, messy, organic data. In fact, nothing prepared me for that aspect of analysis. And working in the Department of Defense, there is/was no shortage of dirty, messy data. SQL 504 was supposed to mimic the reality of what data analytics can and will be like when you get to your first job: a lot of identifying and interpreting database schemas, assessing data quality, tracking down poorly updated data dictionaries, and working with the data itself. All of the dirty little things you encounter as a janitor of data. That term gets a lot of snickers when used. I do not know why. Janitors see and hear everything. They are the eyes and ears of the school or company. They get all the tea. That's what Data Analysts and Data Scientists are supposed to do: elicit all the tea the data has to offer (albeit via methodical approaches).
I designed SQL 504 around a business use case. Think Recreational Equipment Incorporated (REI) meets Cannondale. The use case revolves around each student being a new member of an analytical team charged with helping the company, named Outdoor Performance Center (OPC), revitalize its boutique mountain bike segment to remain relevant in the market. Why choose something so unique for a use case? I am an avid enduro mountain biker, and the data is something that most students hopefully have never seen before. They have to rely on the data dictionary (which is left relatively opaque on purpose) and on developing domain knowledge to understand the data and the segment. Part of this design is intentional, based on the learning objective of involving domain experts (the instructor) in their work, and the other part is the tyranny of time. I only had six weeks to develop the course and all materials.
The OPC use case includes the company background, the student's mission as an analyst, and their objectives, which align to the course learning objectives. The academics in SQL 504 include topical overviews, examples, lectures, and supplemental DataCamp learning on topics such as the history of relational databases (our friend, Mr. Codd), normal forms and why they are important, building relational databases, and basic, intermediate, and advanced querying. The querying levels progress from single-table queries through window functions, and include aggregates and conditionals such as the CASE expression. Additional topics, including query optimization, query engines, and using SQL with Python and R, are also covered.
Most weeks, students have a discussion that covers the week's theme, allowing them to do additional research and share their findings and opinions while empowering their peers through Socratic engagement. They also complete weekly SQL lessons and DataCamp instruction related to the weekly topics. DataCamp was chosen as a supplement to engage learners of various modalities. In some courses, I also include weekly podcasts as the discussion material. The final project is a presentation of what they learned using SQL over the eight weeks and their recommendations to OPC leadership based on what they were able to uncover within the data. Is this realistic in the real world? No. Does it empower the students to feel like their opinions matter? Yes. In fact, the recommendation they are making is whether OPC should acquire a young boutique brand, Ord Cycles, and add it to their brand line to try to gain competitive advantage. Granted, M&A would involve legal and several others and would not rely on the opinion of a lonely analyst, but the end state is a student who sees the real potential in how the analysis they conduct affects decisions. In this case, that is the bottom line for boutique mountain bike sales for REI.
The Tech
Students use PostgreSQL (including pgAdmin) and Azure Data Studio. PostgreSQL was an easy choice. It is as close to Codd's principles as you will get, is open source, and has a lot of scalability and extensibility if students want to use it for other courses and purposes (including using the pgvector extension for Gen AI). Azure Data Studio was chosen specifically to allow students familiar with Jupyter Notebooks from their Python courses to write SQL queries in a familiar notebook format. And, I admit sheepishly, it makes grading easier for me when I am traveling since they can save the Notebooks as HTML files for submission. It also allows students to do basic transformations and visualizations on their SQL data natively within the Notebooks. It is a win-win for the students, and for me.
The Trap
By now you are probably saying, "when will this guy get to the damn point and talk about critical thinking?" Well, I am there. But first I have to tell you what the trap is. The trap is a single question I ask in Week three's assignment on basic querying. To this point in the course, the students' first exposure to critical thinking is database schema design and developing to third normal form (3NF). I choose 3NF because it is hard enough to get students to understand 3NF and transitive dependencies, let alone move beyond them. Students are given Word documents that represent the schema of each table in the database, but there is no entity-relationship diagram (ERD) to show them how the tables interact. They are required to find those constraints and dependencies on their own. Week two is usually a complete cluster as everyone designs the database differently. This is expected and understood. The students are not database administrators. They are learning SQL. I let them fail lightly to reinforce the importance of good database design and to inject the first measle of critical thought into the course.
I start week four by providing the students a database backup that they load from pgAdmin that has all of the tables and data prepared. This is the sigh of relief point for the students. Everyone is now on a common footing and has the same data. Errors from this point on are related to their queries, understanding of the database and its tables and data, and the technology.
So why do I call this section the trap? Week four's homework has a specific question, let's call it Part 3, that asks the students to query the customer table for all customers that joined OPC between 9/11/01 and 10/01/01. Why these dates? There should be a lull in customer acquisition right after 9/11/01. However, 8 customers (out of 2,600 in the 10-year period of data) joined in that timeframe.
Each part of the homework starts with a queryable question or direction, such as "How many OPC customers joined between 9/11/01 and 10/01/01?" This generates an actionable, answerable query. Then there are follow-up questions based on the query results, such as "Find all applicable customer and order information for customers that joined during this time period named 'Delacruz'." This generates an additional query that garners more questions, and so on. Depending on the question or the thought I am trying to elicit from the student, this can go several layers deep.
In this particular case there are two Delacruzes that appear, Brandon and Margo. The trap is the follow-up question: "What is unique about these two customers?"
It is an open-ended question with enough data points to infer several unique traits between the two customers. I have provided the two rows below for your own investigation. Pardon the multitude of images.
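The question-then-follow-up pattern above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the PostgreSQL setup from the course so that it runs anywhere, the `customers` table and every column except `cus_id` are hypothetical names, and the five rows are invented stand-ins, not the course data. Dates are stored as ISO strings so that `BETWEEN` compares them correctly.

```python
import sqlite3

# Hypothetical schema and invented rows, for illustration only.
# Only the column name cus_id comes from the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "cus_id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, join_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (101, "Ana", "Ruiz", "2001-09-10"),         # one day before the window
        (102, "Brandon", "Delacruz", "2001-09-12"),
        (95, "Margo", "Delacruz", "2001-09-14"),
        (103, "Lee", "Park", "2001-10-01"),         # last day; BETWEEN is inclusive
        (104, "Sam", "Ortiz", "2001-10-02"),        # one day after the window
    ],
)

WINDOW = ("2001-09-11", "2001-10-01")

# First question: how many customers joined inside the window?
count_sql = "SELECT COUNT(*) FROM customers WHERE join_date BETWEEN ? AND ?"
joined_count = conn.execute(count_sql, WINDOW).fetchone()[0]

# Follow-up: narrow the same window to one surname and look at the rows.
followup_sql = """
    SELECT cus_id, first_name, last_name, join_date
    FROM customers
    WHERE join_date BETWEEN ? AND ?
      AND last_name = 'Delacruz'
    ORDER BY join_date
"""
delacruz_rows = conn.execute(followup_sql, WINDOW).fetchall()

print(joined_count)
for row in delacruz_rows:
    print(row)
```

The follow-up in the homework also asks for order information, which would need a JOIN to an orders table that this sketch leaves out. The point is the habit: each result set becomes the input to the next question.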
The heart of the argument of critical thought and the reason behind this article is the response I get most often from students when asked what is unique between these customers: first name, last name, and customer number.
In 16 semesters, I have had maybe 10 students out of approximately 320 who have provided an answer beyond what you see above. While technically correct, that answer is not important to anyone. In fact, the expectation is that students find at least three of the following unique facts about the two Delacruz customers.
1. Margo joined two days after (+2) Brandon but has a cus_id that is lower than his. Logic says that auto-increment or unique fields usually ascend in number with newer records. What could explain this? Possibly a rule that allows for the re-use of cus_ids if records are deleted? That is unusual, but not unheard of. Cell phone numbers, for example, are re-used after a blackout period.
2. There is a leading space in Margo's address. Not a unique finding, but a data quality issue that needs to be resolved and that could be the result of a bad front-end script or a failure of our data validation checks. It may need to be addressed (hint).
3. Both are from different states. Could their joining two days apart be coincidence? Are the warehouses their orders shipped from correct? They paid their sales tax to the correct state. We would need a new JOIN to add the state names, or we just need to know the values of the lookup tables. In this case, 2 is Alabama and 18 is North Carolina. The warehouses sent from are Columbus, OH and Sacramento, CA. Why would Margo's bike ship to Alabama from Sacramento when Dallas or Columbus is closer? Do we have inventory issues?
4. Look at the time it took for each Delacruz to make an order. They both ordered 11 months or more after joining the rewards program. Why? It took Brandon almost 4 years to order. Are they linked?
5. The shipping address flag tells us they didn't order bikes for one another as gifts, as that flag indicates an order that was placed and shipped to an address other than the customer's address (gift, etc.).
6. Look at how much Margo has spent in total order value versus Brandon and how their customer appreciation codes do not match up. Why?
A lot of these can't be answered by these rows, but additional queries and additional data (additional pivots) can answer these and highlight business process issues that may need to be addressed.
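A few of the observations in the list can be turned into follow-up queries. The sketch below is hedged the same way as any toy example: it runs on Python's built-in sqlite3 rather than PostgreSQL, the `customers` and `orders` tables and every column except `cus_id` are hypothetical, and the rows are invented to mirror the patterns described, not copied from the course database. `julianday` is SQLite-specific; in PostgreSQL, subtracting two DATE values would play the same role.

```python
import sqlite3

# Hypothetical tables and invented rows; only cus_id is named in the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        cus_id INTEGER PRIMARY KEY, last_name TEXT,
        address TEXT, join_date TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, cus_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES
        (102, 'Delacruz', '12 Elm St',   '2001-09-12'),
        (95,  'Delacruz', ' 40 Oak Ave', '2001-09-14'),
        (110, 'Park',     '7 Pine Rd',   '2001-09-20');
    INSERT INTO orders VALUES
        (1, 102, '2005-08-01'),
        (2, 95,  '2002-09-01'),
        (3, 110, '2001-09-25');
""")

# Check 1: addresses with leading or trailing spaces (a data quality flag).
padded = conn.execute(
    "SELECT cus_id FROM customers WHERE address <> TRIM(address)"
).fetchall()

# Check 2: customers who joined later but hold a lower cus_id than
# someone who joined earlier (ids not ascending with join date).
out_of_order = conn.execute("""
    SELECT later.cus_id, earlier.cus_id
    FROM customers AS later
    JOIN customers AS earlier
      ON later.join_date > earlier.join_date
     AND later.cus_id < earlier.cus_id
    ORDER BY later.cus_id, earlier.cus_id
""").fetchall()

# Check 3: days between joining and the first order, slowest first.
days_to_first_order = conn.execute("""
    SELECT c.cus_id,
           CAST(julianday(MIN(o.order_date)) - julianday(c.join_date)
                AS INTEGER) AS days_waited
    FROM customers AS c
    JOIN orders AS o ON o.cus_id = c.cus_id
    GROUP BY c.cus_id, c.join_date
    ORDER BY days_waited DESC
""").fetchall()

print(padded)
print(out_of_order)
print(days_to_first_order)
```

None of these queries explains why the pattern exists. They only confirm whether it extends beyond two rows, which is the point at which a business process question becomes worth raising.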
In most cases, students that do provide some level of critical thought will get one of these unique artifacts correct. In most cases, answers stray just beyond the attributes they think are unique to include other attributes, such as the bikes they purchased (referenced by the build_id here, but an amalgam of the builds and components tables).
This one question gives me a very good indication of whether students are actually looking at the data and thinking about what they see and what their query means, or if they are just going through the motions.
The Impact
All of the above information was derived from two rows of data. Imagine what the rest of the database holds. And this is just a small training database. Imagine asking these questions of the totality of REI's data holdings.
The trap in Week three sets my baseline for each student. This allows me to watch them grow over the next four weeks as they rapidly absorb new information. In most cases, the final projects are a far cry better than the Week three showing. In some cases, they are not.
There is an ocean of data out there that lends itself to creating information, knowledge, and wisdom, but we have to ask the right questions and follow up with the "what if" and "what else." This vignette is just a glimpse at critical thought (or the lack thereof) in academia. Imagine the risk should this pervasively infiltrate industry or academia. Digital natives take technology at face value since they have interacted with it natively their entire lives. Risk does not look the same to them as it does to those who have watched technology evolve year over year at the exponential pace described by Moore's Law. The lack of critical thought could be detrimental to the future of Artificial Intelligence, Machine Learning, and Analytics. As professionals in the field, it is up to us to bear the standard for our peers, subordinates, and students. If you are an educator, leader, or influencer, continue to cultivate the capability among your cohorts and constituents. Never accept the X. Keep digging.