Crashing the Student Computer Lab

Dr. Robert McKeon Aloe

Generative AI Safety / Red Teaming at Apple

Published: February 6, 2019

In my last year of graduate school at Notre Dame, I used over 1,000,000 compute hours, or just over 114 years of compute time. Only once did I inadvertently crash the engineering computing lab.
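As a quick sanity check on that figure, here is a minimal sketch of the conversion. It assumes a 365-day year and ignores leap days; the function name is only illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap days


def compute_years(compute_hours: float) -> float:
    """Convert a total of compute hours into years of continuous compute."""
    return compute_hours / HOURS_PER_YEAR


years = compute_years(1_000_000)
print(f"{years:.1f} years")  # prints "114.2 years"
```

Spread over a single calendar year, that works out to roughly 114 cores kept busy around the clock on average.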

I was using a distributed compute grid called Condor. It was installed on most computers in the computer labs across the engineering building, and it later expanded to the entire university. It would only use spare compute cycles, and it would stop if someone logged in. You would write a script to send jobs and commands to these machines, and it would dump the results in a nice little folder.

Below is from a paper about how one would set up the all-vs-all comparison for face recognition experiments. These were the types of experiments I was running.
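To give a feel for how an all-vs-all experiment turns into grid work, here is a minimal sketch of my own, not the setup from the paper. It assumes the matcher is symmetric, so each unordered pair of scans is compared once, and the scan IDs, function names, and batch size are all hypothetical.

```python
from itertools import combinations


def all_vs_all_pairs(scan_ids):
    """Every unordered pair of scans (assumes a symmetric matcher)."""
    return list(combinations(scan_ids, 2))


def split_into_jobs(pairs, pairs_per_job):
    """Chunk the pair list into batches, one batch per grid job."""
    return [pairs[i:i + pairs_per_job]
            for i in range(0, len(pairs), pairs_per_job)]


# Hypothetical scan IDs, purely for illustration.
scan_ids = ["scan_a", "scan_b", "scan_c", "scan_d"]
pairs = all_vs_all_pairs(scan_ids)
jobs = split_into_jobs(pairs, pairs_per_job=4)

print(len(pairs), "comparisons in", len(jobs), "jobs")
```

The number of pairs grows as n(n-1)/2, which is why even a modest gallery of scans quickly produces more comparisons than a single machine can handle.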

I needed to use Condor because I was using Iterative Closest Point (ICP) to do face matching. At the time, it was one of the best techniques, but it was computationally expensive, roughly O(m * n * log n * iterations), where m is the number of points in the model and n is the number of points in the comparison model. The number of iterations made a difference, and you could do some optimization, but it was still slow. However, for failure analysis, ICP was visually appealing and understandable relative to other pattern recognition techniques.
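For readers who have not seen ICP, the loop is simple: match each point to its nearest neighbor in the other set, solve for the rigid motion that best aligns the matches, apply it, and repeat. Below is a toy 2D sketch under loud assumptions: it is not the face matcher I used, it works on a handful of 2D points rather than 3D face scans, and it uses a brute-force nearest-neighbor search (m * n distance checks per iteration) instead of any spatial index. All names are illustrative.

```python
import math


def nearest(p, points):
    """Brute-force nearest neighbor of p among points."""
    return min(points, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        sx, sy, dx, dy = sx - csx, sy - csy, dx - cdx, dy - cdy
        dot += sx * dx + sy * dy
        cross += sx * dy - sy * dx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)


def icp_2d(source, target, iterations=20, tol=1e-12):
    """Align source to target; returns the moved points and the final error."""
    current = list(source)
    err = float("inf")
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]
        theta, tx, ty = best_rigid_2d(current, matches)
        c, s = math.cos(theta), math.sin(theta)
        current = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in current]
        # Mean squared distance to the matched points after this update.
        err = sum((x - mx) ** 2 + (y - my) ** 2
                  for (x, y), (mx, my) in zip(current, matches)) / len(current)
        if err < tol:
            break
    return current, err


# Toy data: an L-shaped point set, slightly rotated and shifted.
model = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (1, 2)]
c, s = math.cos(0.05), math.sin(0.05)
source = [(c * x - s * y + 0.1, s * x + c * y - 0.05) for x, y in model]
aligned, err = icp_2d(source, model)
```

The toy offset is deliberately small, so the nearest-neighbor matches are correct from the first pass. With real scans, the matches start out partly wrong, which is why the iteration count matters so much for both accuracy and run time.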

Running large experiments was almost always bottlenecked by resources, so nights and weekends were when most of the jobs got done. I also had to compete for resources with two other grad students who were using Condor almost as much as I was. After I left, Condor got really busy. Below is a utilization chart pulled from whatever I could find as an example. It happens to have me (rmckeon) as the top user, and coincidentally, the time period it covers includes Spring Break, which is when the computer lab lost!

Spring break of 2009 came (see chart above), and suddenly I was dominating all of the resources. I was thrilled until I got the email. Some poor student came to the lab over the break and tried to log in to a computer. The login just spun and spun. He tried multiple machines, but I was on all of them through Condor, twice over (each machine was dual-core, so two of my jobs ran per machine).

My jobs didn’t give up their priority as they should have, and as a result, they had rendered the labs useless to anyone else unless the machines were hard rebooted. Even after a reboot, my jobs would get pushed to those machines and take over if the user didn't log in quickly enough!

I had to kill off all of my jobs, and the professor in charge of Condor had to fix the bug. I didn’t want to dominate an entire computer lab, but it was pretty funny.

Me on Twitter

Me on Medium

Further readings of mine:

My coffee Setup

A Day in the Life of a Data Scientist

Writings Sorted by Topic

Abandon Ship: How a Startup went Under

Reflections on Professional Character

Prasiddhi M.

Senior Data Scientist at Google

6 years ago

Really fascinating! Can you give more details about the bug in Condor that didn't lower priority for your jobs?

His Excellency Raymond Toh

ICT Counsel | Autodidact @ SYNC01? Global Outreach Mechanism?

6 years ago

Cool - lol - ?? : )
