Journey: Defrag PCs to Defrag Analytics

My journey started in the mid-90s assembling, repairing, and optimizing desktop computers. To this day I recall the buzz surrounding the release of Windows NT 4.0 and Netscape Navigator evolving into Netscape Communicator. This was an era when my MP3 collection pushed my Cambridge Soundworks subwoofer to its limits and internal hard drives were connected via large, clunky IDE ribbon cables.

Speaking of hard drives, defragmenting their data was a bi-weekly computer maintenance ritual back then. I recall how happy customers were after I performed a tuneup on their computer, which always included a defrag. What is drive defragmentation and why was it required? Over the years I’ve thought of a wide spectrum of analogies (e.g., defragging is kinda like reorganizing your closet so the items that you use frequently are in the front and within easy reach). However, today I'll share my latest analogy:

Imagine for a moment that you are a computer and your hard drive is a three-ring binder from when you were in high school. After the first week of school, there are a few items in the binder: a syllabus, course handouts, and loose-leaf binder paper with some written notes for the different classes. However, it's mostly empty (just as a new computer’s hard drive is mostly empty in the beginning). Imagine you’re in one of your classes (let’s say Biology) and you need to jot down some notes. When you open the binder, it opens at a random position within all of the binder’s content. Your Biology teacher is speaking on an important topic and you intend to take notes; however, the binder happened to open in the algebra section. You can’t delay writing any longer, so you quickly write the notes about cellular respiration somewhere within the algebra section of the binder.

When you finish writing, you set your Dixon Ticonderoga yellow No. 2 pencil down on the desk. Imagine that in this hypothetical, you either listen or write, not both at the same time. After a few minutes, a term used during the biology lecture reminds you of a topic from chemistry class. You move from the algebra section of the binder to the chemistry section, quickly searching for the notes that you were just reminded of. Finally, you find them, but now suddenly there are more topics from the lecture that you want to write notes on, and you are in the chemistry section of the binder. In this hypothetical, you are unable to navigate back to the biology section; you immediately jot down the additional biology notes right below where you left off: the chemistry notes.

Now imagine that the above occurs in all of your classes. Your notes become fragmented in the binder. Now imagine the state of the binder near the end of the school year. The binder has gone from mostly empty to almost full. You reach the point where you cannot add any more paper to the binder. At this stage, you write notes in any blank space you can find on any piece of paper. Occasionally, there are some notes that you’ve memorized, so you erase them to free up some blank space for new content. Most times, you write down your notes in a highly fragmented way, wherever you can find space to write. It’s common for a paragraph of notes to be written across several pages.
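For readers who prefer code to binders, here is a minimal Python sketch of the same idea. It is a toy model, not how any real file system allocates space: the "disk" is just a list of slots, and the function names (`write_file`, `delete_file`, `count_fragments`, `defragment`) and the class names used as file names are made up for illustration.

```python
def write_file(disk, name, size):
    """Write `size` blocks of `name` into the first free slots (None = free)."""
    if disk.count(None) < size:
        raise ValueError("not enough free space")
    placed = 0
    for i, slot in enumerate(disk):
        if placed == size:
            break
        if slot is None:
            disk[i] = name
            placed += 1


def delete_file(disk, name):
    """Erase every block that belongs to `name`."""
    for i, slot in enumerate(disk):
        if slot == name:
            disk[i] = None


def count_fragments(disk, name):
    """Count how many separate contiguous runs hold blocks of `name`."""
    runs = 0
    previous = None
    for slot in disk:
        if slot == name and previous != name:
            runs += 1
        previous = slot
    return runs


def defragment(disk):
    """Return a new layout where each file is contiguous and free space is last."""
    names = []
    for slot in disk:
        if slot is not None and slot not in names:
            names.append(slot)
    packed = [name for name in names for _ in range(disk.count(name))]
    return packed + [None] * (len(disk) - len(packed))


# A 12-slot "binder": three classes fill it, one set of notes is erased,
# and the next set of notes has to squeeze into whatever gaps remain.
disk = [None] * 12
write_file(disk, "algebra", 3)
write_file(disk, "biology", 3)
write_file(disk, "chemistry", 3)
delete_file(disk, "biology")
write_file(disk, "history", 5)

print(count_fragments(disk, "history"))  # 2: split around the chemistry notes
tidy = defragment(disk)
print(count_fragments(tidy, "history"))  # 1: contiguous again
```

The history notes end up in two pieces because the only free space was on either side of the chemistry notes. Rewriting the layout so each file sits in one run is the whole job of a defrag.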

Can you imagine how time-consuming it would be to find specific notes in your binder near the end of the school year? This is how data became fragmented on Winchester (mechanical) hard drives run by legacy operating systems. This is the main reason why a computer slows way down when the drive is near capacity. There were various software utilities that would defrag the data to organize it and optimize read performance; eventually, such utilities would be built into the operating system. With today’s solid-state drives (SSDs) and modern operating systems, we don’t need to worry about fragmentation anymore. Okay, let us fast forward to how I defrag data today:

I organize and make sense of data. I seek meaningful patterns in the sea of letters and numbers.

My toolbox includes methods such as wrangling, data normalization (different from database normalization), imputing values, and aggregation. These days, I’m learning how to use cool tools such as RStudio and Tableau. You may read more about my journey in one of my other blogs: https://www.dhirubhai.net/pulse/my-evolution-handling-data-carlos-m-yupanqui
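To give a flavor of three of those methods, here is a small sketch in plain Python. Everything in it is an assumption for illustration: the `rows` data is invented, mean imputation is only one of many ways to fill in missing values, and min-max scaling is only one common meaning of "data normalization".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"region": "north", "sales": 10.0},
    {"region": "north", "sales": None},
    {"region": "south", "sales": 30.0},
    {"region": "south", "sales": 20.0},
]


def impute_mean(rows, key):
    """Fill missing values of `key` with the mean of the known values."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def min_max_normalize(rows, key):
    """Rescale `key` to the 0..1 range (min-max scaling)."""
    values = [r[key] for r in rows]
    low, span = min(values), max(values) - min(values)
    return [{**r, key: (r[key] - low) / span if span else 0.0} for r in rows]


def aggregate_mean(rows, group_key, value_key):
    """Average `value_key` within each group of `group_key`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r[value_key])
    return {group: mean(values) for group, values in groups.items()}


imputed = impute_mean(rows, "sales")
scaled = min_max_normalize(imputed, "sales")
by_region = aggregate_mean(scaled, "region", "sales")
print(by_region)  # {'north': 0.25, 'south': 0.75}
```

The order matters here: the missing value is filled in before scaling, so the minimum and maximum are computed over complete data.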

This is the story behind Defrag 4N4LYT1CS.

#CareerJourney #DataAnalytics


Alex Elabed

e-Learning enthusiast & Communication Studies Instructor at the College of San Mateo and Ohlone College's Rising Scholars Program

1 year ago

I love the binder analogy when referring to legacy hard drives. I thought even modern SSDs can slow down when users have them close to capacity and frequently run resource-hungry applications?
