Why Would Someone in Europe Look at a New Hampshire Hillbilly's LinkedIn Profile?
First, I put comments on the picture as clickbait. Look carefully at the AI-generated picture. It can't count: I asked for 20 people and it gave me 15.167 people. Overall, the people do look like people, and they are at the beach playing with giant Legos that mostly look like plastic blocks. Aside from the thumbless young woman fourth from the left in the front row, most of the people seem to have five fingers on each hand, and aside from the random detached tattooed arm (the 0.167 of a person) and the guy floating in the air in a sitting position, it is a nice picture. But if you zoom in on the details, almost every person has something odd. Would you let a computer program like the one that drew this picture analyze a medical database? Would anyone trust an AI to make plastic surgery decisions after seeing the face of the woman fourth from the left in the top row?
This article started as a private email to a person who looked at my LinkedIn profile. The person was from Europe and was in charge of a clinical database.
Hi, LinkedIn said you took a look at my profile. Why would someone in Europe look at the profile of a New Hampshire hillbilly? My wild guess is that it was because I posted that, many years ago, the estimate was that 50% of all programmers had never written a working program in their lives. According to Paul Ercolino, a well-known and effective programmer on LinkedIn, things are worse now, and since I am attaching his name, I should pass along his caveat: the word "programmer" should not be used for the imposters, who are better described as "coders" or "Software Engineers."
While people fantasize about building more nuclear power plants to make electricity for humongous AI installations, a graduate student at the University of Wisconsin–Whitewater showed me the "leaderboard" for Large Language Models (LLMs) on the Google Cloud Platform for Students. The student was implementing some AI categorization models, using free test sets containing large amounts of semi-structured data from public newsgroups. The free tier of LLM models offered about 256 dimensions. Picking more dimensions without upgrading to a paid tier often led to incomplete results: doubling the dimensions often more than quadrupled the execution time, and in the free tier your time was limited. Yet the leaderboard showed that the largest LLM, with the most dimensions, gained only 0.5% in accuracy over the smallest LLM on the same problem data set, while changing the model a bit gained 10% in categorization accuracy on the same data. The smallest number of dimensions was sufficient for the task; the energy- and time-sucking extra dimensions had little impact on the final results. At least for this problem, I had no qualms about the student using the smallest LLM and trying to optimize the model instead. (A sketch of this kind of dimension experiment appears after the next paragraph.)

I think AI as currently implemented is useless for scientific data and even more useless for clinical data. Rather than AI, one needs Actual Intelligence. Computer databases may be able to help. Way back, some colleagues of mine made progress with semantic databases for organizing analytical data. They were trying to make a kinase/phosphatase database. While promising, I think the GIGO factor kept it from being very useful. (GIGO is "garbage in, garbage out"; it is hard to get accurate and precise data on protein phosphorylation.) In a semantic database there are no hallucinations such as there are in AI: the answers are exactly as accurate as the input data. Using the semantic database, patterns and conclusions that no one person has seen are assembled from individual data points. The front end of the database also allows nearly human-language queries, the limitation being that queries must use only a defined vocabulary. (A toy triple-store sketch follows the dimension experiment below.)
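Here is a minimal sketch of the kind of dimension experiment the student ran. It assumes scikit-learn, uses the public 20 Newsgroups set as a stand-in for whatever newsgroup data the class actually used, and uses TruncatedSVD components as a stand-in for an LLM's embedding dimensions:

```python
# Minimal sketch: does adding embedding dimensions buy categorization accuracy?
# Assumptions: scikit-learn is installed; 20 Newsgroups stands in for the
# class data set; TruncatedSVD components stand in for LLM dimensions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes"))

for dims in (64, 128, 256, 512):
    pipe = make_pipeline(
        TfidfVectorizer(max_features=20000),
        TruncatedSVD(n_components=dims, random_state=0),
        LogisticRegression(max_iter=1000),
    )
    score = cross_val_score(pipe, data.data, data.target, cv=3).mean()
    print(f"{dims:4d} dimensions: accuracy {score:.3f}")
```

In runs like this, the accuracy curve tends to flatten long before the compute cost does, which is the pattern the leaderboard showed.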
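And here is a toy sketch of the semantic-database idea using rdflib. The kinase/phosphatase vocabulary and facts below are hypothetical illustrations, not my colleagues' actual schema; the point is that every answer traces back to a curated fact, and a query can assemble a pattern that no single entry states outright:

```python
# Toy semantic database with rdflib. The vocabulary and facts are
# hypothetical illustrations, not the real kinase/phosphatase schema.
# No hallucinations: answers come only from the loaded triples.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/phospho/")  # hypothetical vocabulary
g = Graph()
g.add((EX.PKA, EX.phosphorylates, EX.CREB))
g.add((EX.CREB, EX.activates, EX.BDNF))
g.add((EX.PP1, EX.dephosphorylates, EX.CREB))

# "Which kinases indirectly influence BDNF?" -- a two-hop pattern that
# no single triple states on its own.
query = """
SELECT ?kinase WHERE {
    ?kinase ex:phosphorylates ?target .
    ?target ex:activates ex:BDNF .
}
"""
for row in g.query(query, initNs={"ex": EX}):
    print(row.kinase)  # -> http://example.org/phospho/PKA
```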
Over my career as a biophysicist, I have become an expert at curve fitting. The key to making sense of any data set is understanding the model being used. AI is almost model-free, so it can't tell you very much. Because a wrong model will give you wrong answers, some scientists think "model-free" analysis is better and more "objective." But model-free analysis usually tells you nothing. In biophysics, the more precise information you can add to a model, the more information you can extract from any data set. For instance, in one experiment I analyzed, the statistics of the monomer–pentamer model were equivalent to the statistics of the dimer–tetramer–octamer model. But the protein structure in no way supported 5-fold symmetry, so I got more information by using the dimer–tetramer–octamer model alone. In both cases I knew the monomer weight to within 3% and fixed that parameter.
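Here is a toy sketch, assuming numpy and scipy, of why pinning a well-known parameter tightens the rest of a fit. The exponential decay is only an illustration, not the actual self-association models I fit:

```python
# Toy illustration: fixing a well-measured parameter tightens the fit.
# A simple exponential decay stands in for the real self-association models.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
A_TRUE, K_TRUE, B_TRUE = 2.0, 0.7, 0.3
y = A_TRUE * np.exp(-K_TRUE * x) + B_TRUE + rng.normal(0, 0.05, x.size)

# All three parameters free:
def free_model(x, a, k, b):
    return a * np.exp(-k * x) + b

p_free, cov_free = curve_fit(free_model, x, y, p0=(1, 1, 0))

# Amplitude pinned to its independently measured value (like a monomer
# weight known to within 3%), leaving only k and b to fit:
def pinned_model(x, k, b):
    return A_TRUE * np.exp(-k * x) + b

p_pin, cov_pin = curve_fit(pinned_model, x, y, p0=(1, 0))

print("free fit   k =", p_free[1], "+/-", np.sqrt(cov_free[1, 1]))
print("pinned fit k =", p_pin[0], "+/-", np.sqrt(cov_pin[0, 0]))
```

The rate constant comes back with a smaller uncertainty when the amplitude is fixed, which is the whole game: every well-measured parameter you pin frees statistical power for the parameters you actually care about.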
Assigning meaning to model parts is the most common error in science. The map is not the territory. Sometimes different models explain the same territory and both are "right," as was the case with Heisenberg (infinite matrices) and Schrödinger (wave equation), even though it took John von Neumann, working through Dirac's strange delta function, to prove that the equations of the other two were really the same thing. Some of Richard Feynman's famous Feynman diagrams can be interpreted as electrons going backward in time. Feynman diagrams explain the data, but there may be other frameworks and equations possible that do not use time-traveling electrons. Hermann Minkowski proposed the four-dimensional space-time continuum to explain Einstein's special relativity theories. Current string theories use 10 or 11 dimensions. Do the mathematical dimensions greater than 3 correspond to anything real? All particles and forces are mapped to some kind of vibration of the "strings." It seems that these strings are as fanciful as the luminiferous aether theorized to exist before Einstein's relativity theory.
AI, and LLMs in particular, are known to suffer from spurious correlations. Did you know that the popularity of the name "Stevie" correlates with Lululemon's stock price with an r-value of 0.98? There is also a statistically significant correlation between the number of Nicolas Cage movies and drownings in swimming pools. An LLM can find statistically significant patterns, but some of those patterns are spurious and do not correspond to any kind of causality.
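You do not even need real data to see the effect. A minimal sketch, assuming numpy and scipy: generate pairs of completely independent random walks and count how often they correlate strongly anyway.

```python
# Independent random walks correlate strongly by accident surprisingly often;
# a trend-hunting pattern finder will happily "discover" these relationships.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
trials, n, hits = 1000, 120, 0  # e.g. ten years of monthly observations

for _ in range(trials):
    walk_a = np.cumsum(rng.normal(size=n))  # "name popularity"
    walk_b = np.cumsum(rng.normal(size=n))  # "stock price"
    r, _ = pearsonr(walk_a, walk_b)
    if abs(r) > 0.7:
        hits += 1

print(f"{hits} of {trials} unrelated random-walk pairs had |r| > 0.7")
```

There is no causality anywhere in that simulation, yet strong correlations keep turning up.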
Actual intelligence is needed to manage large data sets and extract useful information. The fantasies of AI being “smarter” than humans, especially with scientific and medical data, are simply absurd.