Mainframe Channel I/O
After transferring to Research in Silicon Valley, I got to wander around both IBM and customer datacenters, including bldg14 (disk engineering) and bldg15 (disk product test) across the street. At the time they were doing prescheduled, 7x24, stand-alone machine testing. They said that they had recently tried MVS, but it had a 15min mean-time-between-failure (in that environment). I offered to rewrite the I/O supervisor to make it bulletproof and never fail, allowing any amount of on-demand, concurrent testing ... greatly improving productivity. I then wrote up an (internal) research report on the work and happened to mention the MVS 15min MTBF ... bringing down the wrath of the MVS organization on my head. Informally I was told they tried to have me separated from the IBM company, and when that failed, they would make my time at IBM unpleasant in other ways (however, the joke was on them: I was already being told I had no career, no promotions, no awards, and no raises).

A few years later, when 3380s were about to ship, FE had a regression test of 57 errors that were likely to occur; in all 57 cases, MVS would fail (requiring re-IPL), and in 2/3rds of the cases there was no indication of what caused the failure. I didn't feel bad.
Bldg15 got the 1st engineering 3033 outside POK (#3 or #4?), and since testing only took a percent or two of the processor, we found a spare 3830 and a couple of strings of 3330 drives and set up a private online service (ran 3270 coax under the street and added it to the 3270 terminal switch on my desk). One Monday I got an irate call asking what I had done to the 3033 system (significant degradation; they claimed they had done nothing). Eventually I found that the 3830 controller had been replaced with an engineering 3880 controller. The 3830 had a fast horizontal-microcode processor. The 3880 had a special hardware path for data transfer, but an extremely slow processor (JIB-prime) for everything else ... significantly driving up channel busy (and radically cutting the amount of concurrent activity). They managed to mask some of the degradation before customer ship.
Trout/3090 had designed the number of channels for target throughput based on the assumption that the 3880 was the same as the 3830 (but supporting the 3380 3mbyte/sec data rate). When they found out how bad 3880 channel busy really was, they realized they had to significantly increase the number of channels to achieve the target throughput. The increase in channels required an additional TCM ... and the 3090 group semi-facetiously said that they would bill the 3880 controller group for the increase in 3090 manufacturing cost. Marketing then respun the large increase in the number of channels (to compensate for the 3880 channel-busy increase) as making the 3090 a great I/O machine.
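To make the channel-count arithmetic concrete, here is a minimal back-of-the-envelope sketch; the transfer time, control overhead, utilization ceiling, and target I/O rate are illustrative assumptions, not measured 3830/3880 figures:

```python
import math

# Illustrative model: higher per-I/O channel-busy overhead (slow control
# processor) forces more channels for the same aggregate I/O rate.
# All figures below are assumptions for illustration.

def channels_needed(target_iops, transfer_ms, overhead_ms, max_util=0.35):
    """Channels required so no channel exceeds max_util busy."""
    busy_per_io_ms = transfer_ms + overhead_ms      # channel held for both
    iops_per_channel = max_util * 1000.0 / busy_per_io_ms
    return math.ceil(target_iops / iops_per_channel)

TRANSFER_MS = 1.3   # ~4kbyte block at 3mbyte/sec (assumed workload)

# fast control-unit microcode (3830-like): small per-command overhead
print(channels_needed(3000, TRANSFER_MS, overhead_ms=0.3))   # -> 14
# slow control-unit processor (3880-like): much larger overhead
print(channels_needed(3000, TRANSFER_MS, overhead_ms=1.5))   # -> 24
```

Same target I/O rate, noticeably more channels: the extra-TCM story in miniature.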
In 1980, STL (since renamed SVL) was bursting at the seams, and they were moving 300 people from the IMS group to an offsite bldg (with dataprocessing back to the STL datacenter). They had tried "remote 3270" but found the human factors totally unacceptable. I got con'ed into doing channel-extender support, allowing channel-attached 3270 controllers to be placed at the offsite bldg (with no difference in human factors between offsite and inside STL). The hardware vendor tried to get IBM to release my support, but there was a group in POK playing with some serial stuff that got it vetoed (afraid that if it was in the market, it would make it harder to get their stuff announced).
Unintended consequences: the STL 168s had (channel-attached) 3270 controllers spread across all channels, shared with disk controllers. Moving the 3270 controllers offsite, with a super-fast channel-interface box on the 168 channel, enormously reduced channel busy (for the same amount of 3270 terminal activity) compared to the directly channel-attached 3270 controllers, increasing system throughput by 10-15% (eliminating much of the channel interference with the disk controllers). There was some discussion of configuring all the channel-attached 3270 controllers on all the STL 168 systems similarly (even though they didn't need the channel-extender capability, the 10-15% throughput gain would be welcome).
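A rough way to see where a 10-15% gain can come from is a simple wait-factor sketch of channel contention; the utilization figures are assumptions chosen for illustration, not STL measurements:

```python
# Illustrative sketch of channel interference: 3270 controller busy on
# the same channel as disk controllers inflates disk I/O time.
# The utilization figures below are assumptions for illustration.

def disk_io_ms(service_ms, channel_util):
    """Approximate disk I/O time through a busy channel, using a
    simple M/M/1-style wait factor of 1/(1 - utilization)."""
    return service_ms / (1.0 - channel_util)

DISK_SERVICE_MS = 2.0   # assumed channel busy per disk I/O

# 3270 controllers directly channel-attached: disk + terminal-control
# busy share the channel (0.30 disk + 0.08 terminal, assumed).
shared = disk_io_ms(DISK_SERVICE_MS, channel_util=0.30 + 0.08)
# 3270 traffic moved behind the fast channel-extender box: disk only.
offloaded = disk_io_ms(DISK_SERVICE_MS, channel_util=0.30)

print(f"shared channel:   {shared:.2f} ms per disk I/O")
print(f"3270s offloaded:  {offloaded:.2f} ms per disk I/O")
print(f"disk I/O speedup: {(shared / offloaded - 1) * 100:.0f}%")  # ~13%
```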
In 1988, the IBM branch asked me to help LLNL (national lab) get some serial stuff they were working with standardized ... which quickly became the fibre channel standard ("FCS", including some stuff that I had done in 1980) ... initially 1gbit/sec, full-duplex, 2gbit/sec aggregate, 200mbytes/sec. Then in 1990, the POK group got their stuff released (when it was already obsolete) with ES/9000 as ESCON (17mbytes/sec).
Then some POK engineers became involved in FCS and defined a heavyweight protocol that drastically reduced the throughput, eventually released as FICON. The most recent published numbers I can find are the z196 "Peak I/O" benchmark that got 2M IOPS using 104 FICON (running over 104 FCS). About the same time there was an FCS announced for E5-2600 blades claiming over a million IOPS (two such FCS having higher throughput than 104 FICON).
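The per-link arithmetic falls straight out of those quoted figures (nothing newly measured here, just dividing out the published numbers):

```python
# z196 "Peak I/O" benchmark: 2M IOPS over 104 FICON channels.
ficon_per_link = 2_000_000 / 104
# E5-2600-era native FCS claim: over a million IOPS on a single link.
fcs_per_link = 1_000_000

print(f"IOPS per FICON link:      {ficon_per_link:,.0f}")                 # ~19,231
print(f"native FCS vs FICON link: {fcs_per_link / ficon_per_link:.0f}x")  # ~52x

# Bandwidth note: initial FCS ran 1 gbit/sec each direction; with the
# usual ~10 encoded bits per byte, that is ~100 mbyte/sec per direction,
# hence the 2 gbit/sec aggregate, 200 mbyte/sec figure quoted above.
print(f"per direction: {1e9 / 10 / 1e6:.0f} mbyte/sec")                   # 100
```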
Author comment (2 yr): Some followup, Mainframe Channel Redrive: https://www.dhirubhai.net/pulse/mainframe-channel-redrive-lynn-wheeler/
Author comment (2 yr): Old email with some more about working with LLNL on porting their LINCS filesystem to HA/CMP ... also ADSTAR working on high-speed disks (not CKD DASD, but they would require CKD emulation in order to support POK's favorite-son operating system): https://www.garlic.com/~lynn/2019b.html#email911230 in this archived post https://www.garlic.com/~lynn/2019b.html#57 HA/CMP, HA/6000, Harrier/9333, STK Iceberg & Adstar Seastar
Author comment (2 yr): Other DISK I/O trivia: 360 CKD sort of traded off abundant channel/disk capacity for limited real storage ... a decade later I was starting to pontificate that the trade-off was starting to invert (instead of wasting an enormous amount of channel & disk resources on multi-track search to repeatedly find something, cache the location and/or even the data). I wrote a tome in the early 80s on the subject (systems had gotten 40-50 times faster while disks had only gotten 3-5 times faster), and some disk division executive took exception, assigning the division performance group to refute the claims. After a few weeks, they came back and essentially said that I had slightly understated the problem. They then respun the information into a presentation on optimizing disk configurations for throughput for SHARE https://www.share.org/ customer group meetings (16Aug1984, SHARE 63, B874). Note: no real CKD disks have been made for decades; all are simulated on industry-standard fixed-block disks.
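To make the inversion concrete, a back-of-the-envelope sketch with era-typical numbers (the figures are assumptions for illustration, not measurements):

```python
# Multi-track-search arithmetic: why caching the location/data wins
# once real storage is abundant.

ROTATION_MS = 16.7     # one revolution at 3600 RPM (3330/3350 class)
TRACKS_PER_CYL = 19    # 3330 cylinder

# A full-cylinder multi-track search (e.g. repeatedly searching a PDS
# directory or VTOC) holds device, controller, AND channel busy for
# roughly one rotation per track searched:
search_ms = ROTATION_MS * TRACKS_PER_CYL
print(f"full-cylinder search: ~{search_ms:.0f} ms of channel busy")  # ~317

# Caching the location (or the data) in real storage replaces that with
# one seek + half a rotation + read, most of it NOT holding the channel:
cached_ms = 25 + ROTATION_MS / 2 + 1   # assumed 25ms seek, ~1ms transfer
print(f"cached lookup: ~{cached_ms:.0f} ms")                          # ~34
```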
Comment (Octal Bus Driver, 2 yr): When did the presence of third parties (StorageTek, Memorex, Amdahl, etc.) show up while you were there, if it did at all?