Is SMART Really Useful?

Being in technology for a long time, I have seen my fair share of disk failures. However, I have never seen a single instance where SMART has issued sufficient warning to back up any data on a failing disk. The following is an example of this in action.

Toshiba MQ01ABD050

Here is a 2.5″ Toshiba MQ01ABD050 500GB disk drive. This unit was made in 2014, but has a very low hour count of ~8 months, with only ~5 months of the heads being loaded onto the platters, since it has been used to store offline files. This disk was working perfectly the last time it was plugged in a few weeks ago, but today within seconds of starting to transfer data, it began slowing down, then stopped entirely. A quick look at the SMART stats showed over 4000 reallocated sectors, so a full scan was initiated.

SMART Test Failure

After the couple of hours an extended test takes, the firmware managed to find a total of 16,376 bad sectors, of which 10K+ were still pending reallocation. Just after the test finished, the disk began making the usual clicking sound of the head actuator losing lock on the servo tracks. Yet SMART was still insisting that the disk was OK! In total, about 3 hours passed between first power up & the disk failing entirely. This is possibly the most sudden failure of a disk I’ve seen so far, but SMART didn’t even twig from the huge number of sector reallocations that something was amiss. I don’t believe the platters are at fault here; it’s most likely to be either a head fault or preamp failure, as I don’t think platters can catastrophically fail this quickly. I expected SMART to at least flag that the drive was in a bad state once its self-test completed, but nope.

Internals

After pulling the lid on this disk, to see if there’s any evidence of a head crashing into a platter, there’s nothing – at least on a macroscopic scale, the single platter is pristine. I’ve seen disks crash to the point where the coating has been scrubbed from the platters so thoroughly that they’ve been returned to the glass discs they started off as, with the enclosure packed full of fine black powder that used to be data layer, but there’s no indication of mechanical failure here. Electronic failure is looking very likely.

Clearly, relying on SMART to alert when a disk is about to take a dive is unwise; replacing drives after a set period is much better insurance if they are used for critical applications. Of course, keeping current backups is always a good idea, no matter the age of the drive.
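Since the overall health verdict stayed at OK while the raw counters were climbing, one option is to watch the raw values directly. The following is only a minimal sketch: it assumes the attribute table has the column layout that `smartctl -A` prints, the sample text is made up for illustration (it was not captured from the failed drive), and the function name, the watched attribute list & the `limit` parameter are all hypothetical.

```python
# Attributes whose raw counts are worth watching regardless of the
# drive's overall pass/fail verdict. The choice of names is an assumption.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def flag_smart_attributes(table_text, limit=0):
    """Return {attribute_name: raw_value} for watched attributes above limit."""
    flagged = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the header row and any blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > limit:
            flagged[name] = int(raw)
    return flagged


# Illustrative sample only, laid out like a smartctl attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       4000
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       5800
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       10000
"""

if __name__ == "__main__":
    for name, raw in flag_smart_attributes(SAMPLE).items():
        print(f"{name}: raw count {raw}")
```

A check like this would not have bought much time with a failure this sudden, but it at least avoids trusting a health flag that ignores thousands of reallocations.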

nb Tanya Louise – Gas Locker Corrosion Part 1 – Removing The Old Locker & Replacing the Deck Plate

Severe Corrosion

This is a part of the boat that hasn’t really had much TLC since we moved aboard, and finally it’s completely succumbed to corrosion, opening a rusty hole into the engine space below. I’ve already used a grinder to remove the rest of the locker – and even this had corroded to the point of failure all around the bottom just above the welds. The bulkhead forming the rear of the locker has also corroded fairly severely, so this will be getting cut out & replaced with a new piece of steel.
This was originally a 1/8″ plate, but now it’s as thin as foil in some places, with just the paint hiding the holes.

Replacement Steel

I’ve cut out as much of the corroded deck plate as possible – it’s supported underneath by many struts made of angle iron – and got the new 3mm replacement tacked in place with the MIG. I’ve not yet cut out the rotten section on the bulkhead; this will come after we’ve got the steel cut to replace it, as electrical distribution is behind this plate – I’d rather not have weather exposure to the electrical systems for long! Unfortunately more corrosion has shown itself around the edges of the old locker:

Thin Steel

Around the corner the steel has pretty much totally failed from corrosion coming from underneath – applying welding heat here has simply blown large holes in the steel as there’s nothing more than foil thickness to support anything.

Some more extensive deck replacement is going to happen to fix this issue, more to come when the steel comes in!

16-Port SATA PCIe Card – Cooling Recap

It’s been 4 months since I did a rejig of my storage server, installing a new 16-port SATA HBA to support the disk drives. I mentioned the factory fan the card came with in my previous post, and I didn’t have many hopes of it surviving long.

Heatsink

The heatsink has barely had enough time to accumulate any grime from the air & the fan has already failed!

There’s no temperature sensing or fan speed sensing on this card, so a failure here could go unnoticed, and under load without a fan the heatsink becomes hot enough to cause burns. (There are a total of 5 large ICs underneath it). This would probably cause the HBA to overheat & fail rather quickly, especially when under a high I/O load, with no warning. In my case, the bearings in the fan failed, so the familiar noise of a knackered sleeve bearing fan alerted me to problems.

Replacement Fan

A replacement 80mm Delta fan has been attached to the heatsink in place of the dead fan, and this is plugged into a motherboard fan header, allowing sensing of the fan speed. The much greater airflow over the heatsink has dramatically reduced running temperatures. The original fan probably had its bearings cooked by the heat from the card, as its airflow capability was minimal.
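With the fan on a motherboard header, its speed can be polled from software instead of waiting for bearing noise. The sketch below assumes a Linux host where the board's sensor driver exposes tachometer readings as `fan*_input` files under `/sys/class/hwmon`; the function names & the 500 RPM threshold are hypothetical, and which `fanN` corresponds to the HBA fan depends on the board.

```python
from pathlib import Path


def read_fan_speeds(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: rpm} for every fan*_input file under hwmon_root."""
    root = Path(hwmon_root)
    if not root.is_dir():
        return {}  # no hwmon tree on this machine
    speeds = {}
    for sensor in sorted(root.glob("hwmon*/fan*_input")):
        try:
            speeds[str(sensor)] = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed sensor; skip it
    return speeds


def stalled_fans(speeds, min_rpm=500):
    """Return the sensor paths reporting fewer than min_rpm."""
    return sorted(path for path, rpm in speeds.items() if rpm < min_rpm)


if __name__ == "__main__":
    for path in stalled_fans(read_fan_speeds()):
        print(f"Fan below threshold: {path}")
```

Unused fan headers typically read 0 RPM, so in practice the check would need restricting to the header the fan is actually plugged into.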

Fan Rear

Here’s the old fan removed from the heatsink. The back label, usually the place where I’d expect to find some specifications, has nothing but a red circle. This really is the cheapest crap that the manufacturer could have fitted, and considering this HBA isn’t exactly cheap, I’d expect better.

Bearings

Peeling off the back label reveals the back of the bearing housing, with the plastic retaining clip. There’s some sign of heat damage here, the oil has turned into gum, all the lighter fractions having evaporated off.

Rotor

The shaft doesn’t show any significant damage, but since the phosphor bronze bearing is softer, there is some dirt in here which is probably a mix of degraded oil & bearing material.

Stator & Bearing

There’s more gunge around the other end of the bearing & it’s been worn enough that side play can be felt with the shaft. After ~3000 hours of running, this fan is totally useless.

nb Tanya Louise Heating System – Oxide Sludge

I wrote a few weeks ago about replacing the hot water circulating pump on the boat with a new one, and mentioned that we’d been through several pumps over the years. After every replacement, autopsy of the pump has revealed the failure mode: the first pump failed due to old age & limited life of carbon brushes. The second failed due to thermal shock from an airlock in the system causing the boiler to go a bit nuts through lack of water flow. The ceramic rotor in this one just cracked.
The last pump though, was mechanically worn, the pump bearings nicely polished down just enough to cause the rotor to stick. This is caused by sediment in the system, which comes from corrosion in the various components of the system. Radiators & skin tanks are steel, engine block cast iron, back boiler stainless steel, Webasto heat exchanger aluminium, along with various bits of copper pipe & hose tying the system together.
The use of dissimilar metals in a system is not particularly advisable, but in the case of the boat, it’s unavoidable. The antifreeze in the water does have anti-corrosive additives, but we were still left with the problem of all the various oxides of iron floating around the system acting like an abrasive. To solve this problem without having to go to the trouble of doing a full system flush, we fitted a magnetic filter:

Mag Filter

This is just an empty container, with a powerful NdFeB magnet inserted into the centre. As the water flows in a spiral around the magnetic core, aided by the offset pipe connections, the magnet pulls all the magnetic oxides out of the water. It’s fitted into the circuit at the last radiator, where it’s accessible for the mandatory maintenance.

Sludge

Now the filter has been in about a month, I decided it would be a good time to see how much muck had been pulled out of the circuit. I was rather surprised to see a 1/2″ thick layer of sludge coating the magnetic core! The disgusting water in the bowl below was what drained out of the filter before the top was pulled. (The general colour of the water in the circuit isn’t this colour, I knocked some loose from the core of the filter while isolating it).

If all goes well, the level of sludge in the system will over time be reduced to a very low level, with the corrosion inhibitor helping things along. This should result in far fewer expensive pump replacements!

Project Volantis – Storage Server Rebuild

For some time now I’ve been running a large disk array to store all the essential data for my network. The current setup has 10x 4TB disks in a RAID6 array under Linux MD.

Up until now the disks have been running in external Orico 9558U3 USB3 drive bays, through a PCIe x1 USB3 controller. However, in this configuration there have been a few issues:

  • Congestion over the USB3 link. RAID rebuild speeds were severely limited to ~20MB/s in the event of a failure. General data transfer was equally slow.
  • Drive dock general reliability. The drive bays are running a USB3 – SATA controller with a port expander, so a single drive failure would cause the controller to reset all disks on its bus. Instead of losing a single disk in the array, 5 would disappear at the same time.
  • Cooling. The factory fitted fans in these bays are total crap – and very difficult to get at to change. A fan failure quickly allows the disks to heat up to temperatures that would cause failure.
  • Upgrade options difficult. These bays are pretty expensive for what they are, and adding more disks to the USB3 bus would likely strangle the bandwidth even further.
  • Disk failure difficult to locate. The USB3 interface doesn’t pass on the disk serial number to the host OS, so working out which disk has actually failed is difficult.

To remedy these issues, a proper SATA controller solution was required. Proper hardware RAID controllers are incredibly expensive, so they’re out of the question, and since I’m already using Linux MD RAID, I didn’t need a hardware controller anyway.

16-Port HBA

A quick search for suitable HBA cards showed me the IOCrest 16-port SATAIII controller, which is pretty low cost at £140. This card breaks out the SATA ports into standard SFF-8087 connectors, with 4 ports on each. Importantly the cables to convert from these server-grade connectors to standard SATA are supplied, as they’re pretty expensive on their own (£25 each).
This card gives me the option to expand the array to 16 disks eventually, although the active array will probably be kept at 14 disks with 2 hot spares, which will give a total capacity of 48TB.

SATA HBA

Here’s the card installed in the host machine, with the array running. One thing I didn’t expect was the card to be crusted with activity LEDs. There appears to be one LED for each pair of disks, plus a couple others which I would expect are activity on the backhaul link to PCIe. (I can’t be certain, as there isn’t any proper documentation anywhere for this card. It certainly didn’t come with any ;)).
I’m not too impressed with the fan that’s on the card – it’s a crap sleeve bearing type, so I’ll be keeping a close eye on this for failure & will replace with a high quality ball-bearing fan when it finally croaks. The heatsink is definitely oversized for the job; with nothing installed above the card it barely gets warm, which is definitely a good thing for life expectancy.

Update 10/02/17 – The stock fan is now dead as a doornail after only 4 months of continuous operation. Replaced with a high quality ball-bearing 80mm Delta fan to keep things running cool. As there is no speed sense line on the stock fan, the only way to tell it was failing was by the horrendous screeching noise of the failing bearings.

SCSI Controller

Above is the final HBA, installed in the PCIe x1 slot above the SATA card – a parallel SCSI U320 card that handles the tape backup drives. This card is very close to the cooling fan of the SATA card, and does make it run warmer, but not excessively warm. Unfortunately the card is too long for the other PCIe socket – it fouls on the DIMM slots.

Backup Drives

The tape drives are LTO2 300/600GB for large file backup & DDS4 20/40GB DAT for smaller stuff. These were had cheap on eBay, with a load of tapes. Newer LTO drives aren’t an option due to cost.

The main disk array is currently built as 9 disks in service with a single hot spare in case of disk failure. This gives a total size after parity of 28TB.

The disks used are Seagate ST4000DM000 Desktop HDDs, which at this point have ~15K hours on them, and show no signs of impending failure.
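The capacity figures quoted for the array follow directly from RAID6 giving up two disks' worth of space to parity, with hot spares contributing nothing until they are pulled in. A small sketch of that arithmetic (the function name is just for illustration):

```python
def raid6_usable_tb(active_disks, disk_tb):
    """Usable capacity of a RAID6 array: two disks' worth goes to parity.

    Hot spares are not counted in active_disks.
    """
    if active_disks < 4:
        raise ValueError("RAID6 needs at least 4 active disks")
    return (active_disks - 2) * disk_tb


if __name__ == "__main__":
    print(raid6_usable_tb(9, 4))   # current array: 9 active disks of 4TB
    print(raid6_usable_tb(14, 4))  # planned: 14 active disks of 4TB
```

These come out at 28TB for the current 9-disk array & 48TB for the planned 14-disk one, matching the figures in the text.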

USB3 Speeds

Here’s a screenshot with the disk array fully loaded running over USB3. The aggregate speed on the md0 device is only 21795KB/s. Extremely slow indeed.

The new HBA is structured similarly to the external USB3 bays – a PCI Express bridge glues 4 Marvell 9215 4-port SATA controllers into a single x8 card. Bus contention may become an issue with all 16 ports used, but so far, with 9 active devices, the performance increase is impressive. Adding another disk to the active array would certainly give everything a workout, as rebuilding with an extra disk will hammer reads from the existing disks & writes to the new one.

HBA Speeds

With all disks on the new controller, I’m sustaining read speeds of 180MB/s (pulling data off over the network). Write speeds are always going to be pretty pathetic with RAID6, as parity calculations have to be done. With Linux MD, this is done by the host CPU, which is currently a Core2Duo E7500 at 2.93GHz. With this setup, I get 40-60MB/s writes to the array with large files.
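To put the two throughput figures in perspective, here is a rough calculation of how long it takes to stream one 4TB disk's worth of data at each rate, which is a lower bound on a rebuild onto a replacement disk. It assumes decimal units (1KB = 1000 bytes, 4TB = 4×10^12 bytes) & a perfectly sustained rate. The 180MB/s figure is the network read speed quoted above, not a measured rebuild speed, so treat the second number as indicative only.

```python
def hours_to_stream(total_bytes, rate_bytes_per_s):
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600


DISK_BYTES = 4 * 10**12   # one 4TB disk, decimal terabytes
USB3_RATE = 21795 * 1000  # 21795KB/s aggregate seen over USB3
HBA_RATE = 180 * 10**6    # 180MB/s sustained reads on the new HBA

if __name__ == "__main__":
    print(f"USB3: {hours_to_stream(DISK_BYTES, USB3_RATE):.1f} hours")
    print(f"HBA:  {hours_to_stream(DISK_BYTES, HBA_RATE):.1f} hours")
```

That works out to roughly 51 hours over USB3 against a little over 6 hours at the HBA's read rate, which is a long window to be running without full redundancy.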

Disk Array

Since I don’t have a suitable case with built in drive bays (again, they’re expensive), I’ve had to improvise with some steel strip to hold the disks in a stack. 3 DC-DC converters provide the regulated 12v & 5v for the disks from the main unregulated 12v system supply. Both the host system & the disks run from my central battery-backed 12v system, which acts like a large UPS for this.

The SATA power splitters were custom made, the connectors are Molex 67926-0001 IDC SATA power connectors, with 18AWG cable to provide the power to 4 disks in a string.

IDT Insertion Tool

These require the use of a special tool if you value your sanity, which is a bit on the expensive side at £25+VAT, but doing it without is very difficult. You get a very well made tool for the price though, the handle is anodised aluminium & the tool head itself is a 300 series stainless steel.

eBay Chinese Chassis Power Supply S-400-12 400W 12v 33A

S-400-12 PSU

Here’s a cheap PSU from the treasure trove of junk that is eBay, rated at a rather beefy 400W of output at 12v – 33A! These industrial-type PSUs from name brands like TDK-Lambda or Puls are usually rather expensive, so I was interested to find out how much of a punishment these cheap Chinese versions will take before grenading. In my case this PSU is to be pushed into float charging a large lead acid battery bank, which when in a discharged state will try to pull as many amps from the charger as can be provided.

Rating Label

These PSUs are universal input, voltage adjustable by a switch on the other side of the PSU, below. The output voltage is also trimmable from the factory, an important thing for battery charging, as the output voltage needs to be sustained at 13.8v rather than the flat 12v from the factory.
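One consequence of trimming the output up for float charging is worth a quick sum. If the supply's limit is its 400W output power rather than a fixed current (an assumption; the post doesn't establish which applies to this unit), the available current drops as the voltage is raised:

```python
RATED_WATTS = 400.0  # rated output power of the S-400-12


def max_current(volts, rated_watts=RATED_WATTS):
    """Current available at a given output voltage for a fixed power rating."""
    return rated_watts / volts


if __name__ == "__main__":
    for volts in (12.0, 13.8):
        print(f"{volts:.1f}v -> {max_current(volts):.1f}A")
```

Under that assumption, the 33A available at 12v becomes about 29A at 13.8v, so a deeply discharged battery bank could push the supply past its power rating before reaching the nameplate current.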

Input Voltage Selector
Main Terminal Block

Mains connections & the low voltage outputs are on beefy screw terminals. The output voltage adjustment potentiometer & output indicator LED are on the left side.

Cooling Fan

The cooling fan for the unit, which pulls air through the casing instead of blowing into the casing, is a cheap sleeve bearing 60mm fan. No surprises here. I’ll probably replace this with a high-quality ball-bearing fan, to save the PSU from inevitable fan failure & overheating.

PCB Bottom

The PCB tracks are generously laid out on the high current output side, but there are some primary/secondary clearance issues in a couple of places. Lindsay Wilson over at Imajeenyus.com did a pretty thorough work-up on the fineries of these PSUs, so I’ll leave most of the in-depth stuff via a linky. There’s also a modification of this PSU for a wider voltage range, which I haven’t done in this case as the existing adjustment is plenty wide enough for battery charging duty.

Bare PCB

The PCB is laid out in the usual fashion for these PSUs, with the power path taking a U-route across the board. Mains input is lower left, with some filtering. Main diode bridge in the centre, with the voltage selection switch & then the main filter caps. Power is then switched into the transformer by the pair of large transistors on the right before being rectified & smoothed on the top left.

Main Switching Transistors

The pair of main switching devices are mounted to the casing with thermal compound & an insulating pad. To bridge the gap there’s a chunk of aluminium which also provides some extra heatsinking.

SMPS Drive IC & Base Drive Transformer

The PSU is controlled by a jelly-bean TL494 PWM controller IC. No active PFC in this cheap supply so the power factor is going to be very poor indeed.

Input Protection

Input protection & filtering is rather simple, with the usual fuse, MOV, filter capacitor & common mode choke.

Main Output Rectifiers

Beefy 30A dual diodes on the DC output side, mounted in the same fashion as the main switching transistors.

Output Current Shunt

Current measurement is done by these large wire links in the current path, selectable for different models with different output ratings.

Hot Glue Support

The output capacitors were just floating around in the breeze, with one of them already having broken the solder joints in shipping! After reflowing the pads on all the capacitors, some hot glue was flowed around them to stop any further movement.

This supply has now been in service for a couple of weeks at a constant 50% load, with the occasional hammering to recharge the battery bank after a power failure. At 13A the supply barely even gets warm, while at a load high enough to make 40A rated cable get uncomfortably warm (I didn’t manage to get a current reading, as my instruments don’t currently go high enough), the PSU was hot in the power semiconductor areas, but seemed to cope at full load perfectly well.

Boating: Drydock Time – Running Gear Replacement

Progress

Things are coming along nicely with this year’s drydock operations.

Blacking – Second Coat

She’s looking much better. The second coat of bitumen blacking is on, and we’re going to continue at a coat a day until we’re due back in the water.

Shaft Tube Damage

I’ve now removed the shaft from the stern tube to gain better access, now the full extent of the damage to the tube can be seen. There’s nothing left at all of the old bearing, which on this boat was simply a nylon bushing pressed into the end of the tube. (I knew it was crap the last time we were out, but ran out of time to get a fix done).
The stainless shaft, having lost its support bearing at some point, has been running on the inside of the steel tube, and has neatly chewed straight through it.

Prop Shaft

Here’s the prop shaft removed from the boat – possibly the longest shaft I’ve ever seen on a narrowboat at 6′ 2″. Unfortunately, the fact that it lost the bearing has also damaged the shaft itself, this will have to be replaced.

Prop Taper

Here’s the end of the shaft that would run in the end bearing, it’s badly scored & fitting a new bearing to this shaft would cause failure very quickly. The taper on the end isn’t much better, and a loose fit in the prop has done some damage there also.

Old Prop

Here’s the old prop – a 16×12 that was only fitted a few years ago. This will be replaced with a new 4-blade prop, as this one is far too small for the size of the boat & installed power. Installing a larger diameter prop isn’t possible due to clearance from the swim, so I’ll have to get a more steeply pitched prop, with 4 blades for increased contact area with the water.

nb Tanya Louise – Drive Failure

As I have posted about before, the main propulsion system onboard the boat is all hydraulic. To get the drive from the flywheel of the engine to the hydraulic pump stack, a custom drive plate was machined by Centa Transmissions over in Yorkshire, and a Centaflex A coupling was fitted to this.

Centaflex A Coupling

This coupling is a big rubber doughnut, bolted to a centre hub of steel. The steel hub is splined onto the input shaft of the hydraulic pump stack.

Pump Stack

The problem we’ve had is that to prevent the coupling from riding along the splines in operation, a pair of giant grub screws are provided in the side of the centre steel boss, that compress the splines to lock the device in place. These screws are a nightmare to get tightened down (the engineer from Centa who originally came to survey the system said we’d probably shear some tools off trying).

Because of this, the grub screws have loosened over the last 350-odd hours of running & this has had the effect of totally destroying the splines in the hub.

Spline Remains

Here’s the backside of the centre boss, with what remains of the splines, the figure-8 shaped gap on the right is where the securing grub screws deform the steel to lock the coupling into place.

No More Splines

Here’s the other side of the coupling, showing the damage. The splines have effectively been totally removed, as if I’d gone in there with a boring bar on the lathe. Luckily this part isn’t too expensive to replace, and no damage was done to the input shaft of the hydraulic pump stack (Mega ££££). Quite luckily, this damage got to the point of failure while running the engine on the mooring, so it didn’t leave us stranded somewhere without motive power.

More to come when the new coupling arrives!

Samsung ETA-U90UWE Adaptor Failure

Here’s an odd & sudden failure, the power adaptor for a Samsung device. It’s been working for months & on being plugged into the mains today the magic blue smoke escaped.

Samsung Charger

It’s one of their 2A models, for charging bigger devices like tablets.

Flash Burn

Strangely for one of these chargers, no glue is used to hold it together – just clips. This made disassembly for inspection much easier. Evidence of a rather violent component failure is visible inside the back casing.

PCB

Here’s the charger PCB removed from the casing. As is to be expected from Samsung, it’s a high quality unit, with all the features of a well designed SMPS.

PCB Reverse

However, on turning the board over, the blown component is easily visible. It’s the main SMPS controller IC, with a massive hole blown in the top. The on board fuse has also blown open, but it obviously didn’t operate fast enough to save the circuit from further damage!

 

IBM Ultrastar Failure

IBM Clear Platters

Here is a SCSI U320 Ultrastar drive with a slight issue: the magnetic coating has been scrubbed off the substrate. I’ve never seen this happen to any other drive before.

The inside of the drive is coated with the resulting dust created from this rather epic failure mode.

Tornado eCig Battery Repair

This is just a few notes on the repair of an eCig battery (1Ah Tornado).

These batteries seem to have a flaw in which they will randomly stop working, while still displaying all the normal activity of the battery.
Here is what I have found.

Control PCB

Here the battery has been partially disassembled, with the control circuitry exposed at the end of the unit. All the wiring here is fine & the electronics themselves are also OK, as the LEDs still operate as normal when the button is pushed. The 1000mAh Li-Poly cell is to the right.

Ground Wire

Here the end cap has been removed from the opposite end of the battery & the problem is found: the short wire here is the GND return for the atomiser, normally connected to the negative terminal of the battery in the tube, however here it has broken off.
This is most likely due to either the cell moving inside the tube during normal operation, weakening the solder joint, or simply a bad solder job from the factory. (This lead-free RoHS bullshit is to blame).

Repaired

Here the wire has been successfully soldered back on to the battery tab. I have also added a small dab of hot glue to hold the battery in place on the inside of the tube, & replaced the solder on the joints with real 60/40 leaded solder. £15 saved.

 

 

Hot Laminator

Top

Here is a cheap no-brand hot laminator. This pulls the paper, inside a plastic pouch, through a pair of heated rollers to seal it.

Heater

Top removed, heater assembly visible. PCB attached to the top cover holds LEDs to indicate power & ready status.

Switch
Thermostat

Here is the thermostat & thermal fuse, the thermostat switching the indicator on the front panel to tell the user when the unit is up to temperature. This has a self regulating thermostat. Thermal fuse inside the heat resistant tubing is to protect against any failure of the heater.

Motor

5 RPM motor that turns the rollers through a simple gear system.

Hair Dryer

Housing

This is a 1500W hairdryer, death caused by thermal switch failure.

Switch

This is the switch unit. Attached are two suppression capacitors & a blocking diode. Cold switch is on right.

Heating Element

Heating element unit removed from housing. Coils of Nichrome wire heat the air passing through the dryer. Fan unit is on right.

Thermal Switch

Other side of the heating element unit, here can be seen the thermal switch behind the element winding. (Black square object).

Fan Motor

The fan motor in this dryer is a low voltage DC unit, powered through a resistor formed by part of the heating element to drop the voltage to around 12-24v. Mounted on the back of the motor here is a rectifier assembly. Guide vanes are visible around the motor, to straighten the airflow from the fan blades.

Fan

5-blade fan forces air through the element at high speed. Designed to rotate at around 13,000RPM.

Western Digital 160GB 2.5″ HDD

Top Of Drive With Label

This is a Western Digital drive recently removed from my laptop when it died of a severe head crash.
Top of drive can be seen here.

Top Removed From Drive

Here the cover has been removed from the drive, showing the platter, head arm & magnet. Yellow piece top left is head parking ramp.

Head Arm of Drive

The head assembly of the drive is shown here. The head itself is on the left hand end of the arm in the plastic parking ramp. The other end of the arm holds the voice coil part of the head motor, surrounded by the magnet.

Bottom Of Drive with PCB

Bottom of drive, with controller PCB. SATA interface socket at bottom.

PCB removed from bottom of drive. Spindle motor connections & connections to the head unit can be seen on the bottom of the drive unit.

Controller PCB. Supports the cache, interface & motor controller ICs.

Closeup of the motor driver IC, this controls the speed of the spindle motor precisely to 5,400RPM. Also controls the voice coil motor controlling the position of the head arm on the platters.

Interface IC closeup. This IC receives signals from the head assembly & processes them for transmission to the SATA bus. Also holds drive firmware, controls the Motor driver IC & all other functions of the drive.

Cache Memory IC.