First, I would like to tell you about myself and the perspective from which I write this article. My world is the process industries: iron, steel, pulp, power, wood-based industries such as Oriented Strand Board (OSB) and Medium Density Fiberboard (MDF), chemical, oil and gas, food and beverage, and so on.
In short, all industries where a breakdown of critical equipment results in risk of environmental damage, personal injury, lost quality and throughput, or high maintenance costs.
I lose some of my important arguments if reliability is not important, because then the maintenance organization has no “revenue”.
If reliability is not important, the sense of urgency and importance in the work we do as a maintenance organization is not there.
International phenomena
I work in a worldwide arena and observe the same problems, or improvement opportunities, in all countries and all types of industries.
If you have worked as a reliability and maintenance professional in many industries and/or countries, you also know that this is true. If you have only worked in one plant, you may believe that your plant is unique and different from all other plants, but that is very seldom the case.
The reason why maintenance management is so similar between different types of industries and facilities lies in a couple of facts.
- Equipment does not break down; components such as gears, couplings, control valves, transducers, seals and bearings break down. The whole piece of equipment, e.g. a compressor, does not break down. These components are the same, with some variations, in all industrial plants. The environment they operate in is different, but whether an electric motor is covered by chocolate, sawdust or pulp the consequence is the same: it will overheat, and its life will be shortened dramatically. Some plants have a more aggressively corrosive atmosphere, but again the consequences of corrosion are the same.
- Reliability and maintenance management is driven by the system and processes people work in, not by the physical assets the organization maintains.
There are some differences that make implementation and execution of best reliability and maintenance practices more or less difficult.
These are more cultural differences and it is important to know and understand these.
They include but are not limited to:
- Political Systems make a difference in e.g. how profits are calculated.
- Taxation rules make a difference in how life cycle costs are calculated.
- Living standards differ between countries. In many countries with a high living standard I often find a culture of entitlement and complacency, and less desire to improve than in countries where people are eager to learn and improve their performance.
- Labor laws, working hours, employee benefits and unionization are very different between countries.
- Some industrial plants have many short and long shutdowns; others have no scheduled shutdowns. This only changes the way you plan and schedule work that requires equipment to be down. If your plant has scheduled shutdowns, you must plan work before you schedule it in order to be efficient. If your plant has no scheduled shutdowns, or if it is easy to shut down and start up again, you should focus more on planning, and then execute the planned work when the equipment can be accessed safely and at the best opportunity from a manufacturing point of view.
But the system, processes and practices used to manage reliability and maintenance are not different. Nor have they changed in the last fifty or more years.
What has changed and improved dramatically is technology including much better and more affordable computer systems and tools for condition monitoring.
We have much better and more affordable equipment for measurements and analysis of component condition such as:
- Infrared cameras.
- Wear Particle Analysis.
- Vibration Analysis.
- Acoustic Emission.
- Alignment of components.
- Stroboscopes.
- Ultrasonic methodologies.
- Etc.
Reliability Maintenance: Do the Basics Better and Better
My advice is to never forget to improve execution of the basics of maintenance. This was true 50 years ago and it is still true.
Too often we complicate things beyond what is necessary. In the field of reliability and maintenance, many tend to give new names to what in the end comes down to the basics.
TPM, RBM, VDM, QCC, RCM and many other acronyms only lead to confusion in the message you need to send to your employees in the maintenance organization. They will start talking about the “Program of the month” and lose faith in you as a leader.
Anyone who has attended a conference, including conferences on Reliability and Maintenance, has heard several speakers refer to the definition of insanity often attributed to Albert Einstein:
“To do the same thing over and over and expect different results.”
This holds true if you do the wrong thing. However, if you do the right things better and better over a long period of time, you will generate substantial results.
I know from very long experience in industry that guaranteed results will be achieved by executing the basics of reliability and maintenance (the right things) better and better forever.
The basics are perhaps not as glorious to talk about as many would like them to be, but I find them very interesting and challenging because I am still interested in people and equipment and the fantastic results that can be achieved when an organization executes these well.
Results can even be life changing for some people.
Organizations today are spending way too much time on other more complex initiatives and therefore forget where the true improvement potential lies.
The basic elements of reliability and maintenance are:
- Maintenance Prevention
- Inspect
- Prioritize
- Plan work
- Schedule work
- Execute work
If you do not Execute these things very well, you will never have time to take the next steps you know you need to take to become as reliable and low cost as you can be.
I stress the word Execute because most organizations know what they need to do.
So many strategies and improvement plans are developed, and so little Execution of the very basic elements of reliability and maintenance occurs.
These next steps are:
- Root Cause Problem Elimination
- Apply Life Cycle Cost when specifying equipment
- Design for Reliability and Maintainability in early equipment design
- Use tools such as 5S, Single-Minute Exchange of Die (SMED) and Reliability Centered Maintenance methodology (RCM) to enhance performance of work within the processes that build the whole reliability and maintenance system
A holistic overview of the reliability and maintenance management system, processes, elements and tools can be described in the models per figure 1 and figure 2.
The structures of system, processes and elements described above are what we call Current Best Practices (CBP) for reliability and maintenance.
If you do an audit, it is at the level of elements that you evaluate your organization and discover the improvement potential: the gap between where your organization is today and how good it can become.
Good advice is to focus first only on the right things to do and not to discuss how you will do these things. That comes as the next step.
The reason for this is that the first step must be to agree on the right things to do. Because they are all common sense, your organization will agree.
They might not agree on how you are going to implement these things.
As a leader you must show what your beliefs are and give your organization a direction: this is what we do here. Then you bring your organization with you to help execute your strategy.
You can say that the 245 well-described elements comprise a very well documented reliability and maintenance strategy, and if this strategy is not executed you have wasted the money and time spent developing it.
Figure 3 shows what we often find in many organizations.
So what are The Basics?
The most important and essential elements of the basics are listed above and include in more detail:
1. Maintenance prevention
With maintenance prevention we include everything you do to prevent problems from occurring.
We like to use the term “Problems” because the context of total manufacturing reliability includes the functions of Engineering, Maintenance, Operations and Storeroom (Spare Parts and Material for Equipment) support.
If we use the term “Failures” instead of problems we often focus our thoughts on equipment and maintenance issues and we mentally exclude operational and other issues such as raw material variances and changes in how a production line is operated.
Here we assume your plant is in an operational phase and not in a position to procure new equipment. Instead you have to do better with what you already have.
1.1 Cleaning is an important element.
Here we are not talking about general housekeeping but about detailed cleaning of equipment and components. When detailed cleaning is done, you cannot avoid also doing visual inspections.
When you clean you also inspect. When equipment is clean it is easier to see abnormal conditions such as loose fasteners and leaks.
Another benefit is longer life of, for example, electric motors.
It does not take much contamination on an electric motor to increase the temperature in windings and rotor by 10 °C (18 °F). A 10 °C increase in temperature will shorten electric motor life by 50%.
For the same reason you should be careful not to paint motors with more layers of paint than necessary. Another benefit of this basic element is that electric motors draw less energy the cleaner and cooler they are.
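The 50% figure above can be turned into a quick estimate. The sketch below assumes the common rule of thumb that life keeps halving for each further 10 °C rise; that extrapolation, the function names and the 20-year baseline are illustrative assumptions, not figures from this article.

```python
def relative_motor_life(temp_rise_c: float, halving_step_c: float = 10.0) -> float:
    """Fraction of baseline life left after a sustained temperature rise.

    Assumption: life halves for every 10 C (18 F) increase in winding
    temperature, extending the single 10 C example given in the text.
    """
    return 0.5 ** (temp_rise_c / halving_step_c)


def celsius_rise_to_fahrenheit(temp_rise_c: float) -> float:
    """Convert a temperature difference (not an absolute reading) from C to F."""
    return temp_rise_c * 9.0 / 5.0


if __name__ == "__main__":
    baseline_years = 20.0  # hypothetical life of a clean, cool motor
    for rise in (0, 10, 20):
        factor = relative_motor_life(rise)
        print(f"+{rise} C (+{celsius_rise_to_fahrenheit(rise):.0f} F): "
              f"{factor:.0%} of baseline, about {baseline_years * factor:.0f} years")
```

The point of the calculation is only to show how quickly a thin layer of dirt or paint compounds into lost motor life.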
1.2 Lubrication and contamination control.
Even though awareness in this area increases it is more common than not to find very poor practices.
Precision lubrication, which means the right lubricant in the right volume at the right time, is an absolute key to achieving better reliability and lower costs.
Lubricators must be trained to execute lubrication in a well-documented process that describes lubricant, volume and frequency, both in an optimally laid out route and in work orders for shutdown oil changes and lubrication that cannot be done safely while equipment is operating.
Filtration of lubricants has to be done to adequate standards e.g. down to 4 microns for many oils and central lubrication systems.
Modern tools should be used to measure that the right volume is reaching the lubricated object. To control contamination it is vital that lubricants are stored in a professional way.
Figure 4 shows a world-class storage and contamination control of lubricants.
1.3 Alignment
This is another important basic element that prevents problems from occurring. Alignment should be done with equipment at operating temperature, or with compensation for thermal growth.
Jacking bolts should be installed to make precision alignment possible. (You cannot align to 0.001 inch, or 0.0254 millimeter, with a sledgehammer.)
No more than three shims should be used, as more than that can cause a soft foot. Today most plants use laser alignment tools that make it easier to align and also to keep track of alignments that have been done.
Precision alignment does not only prevent problems with the aligned components such as sprockets, sheaves and couplings. It also prolongs life and prevents problems with bearings, mechanical seals, chains, belts etc.
Another benefit is reduced energy consumption for electric motor drives. A misaligned coupling increases temperature in both coupling and bearings significantly.
A brief and fast check of alignment can be done using a handheld basic infrared thermometer. Increases in temperature of couplings, V-belts and chains indicate misalignment.
1.4 Balancing
The balancing of components such as a pump shaft and impeller assembly, electric motors, rolls and other rotating equipment also prevents problems from occurring.
Balancing rotating equipment prolongs the life of its components. Vibration measurement should be part of quality control for any rebuild of these components.
1.5 Operating practices
Operating practices are often a forgotten part of maintenance and problem prevention. It is common that over 50% of equipment failures and breakdowns are caused by poor operating practices.
This is because operators are seldom trained in the function of the equipment they operate and what impact wrong startups and shutdowns have on components. Nor have operators been properly trained in how to inspect components.
Let me emphasize this with some examples of questions I often get from operators:
- Why can I not heat up the steam system faster after a shutdown?
- I have been told not to let cold water come in contact with the dryer cans when they are hot. Why is that?
- Why should I not try to start up electric motors too frequently?
- Why do we need to run redundant equipment equal hours?
Etc.
I know it is important that people are trained not only in “How” but also in “Why”. We call the training we do in equipment care for operators and others “Know Why” training.
Explain to the operators that a steam system must start up slowly, for example to avoid water hammer and the consequences of too rapid thermal expansion.
In a cold steam system steam will condense, and the steam traps must have time to trap the condensate and discharge it from the system.
If too much condensate builds up in the system, it can fill a pipe to form a “water plug” that travels through the system at 85 – 90 miles per hour, or about 137 – 145 kilometers per hour.
When this “plug” hits a pipe elbow it can damage the pipe. If the system provides rotating dryer cans with steam for heating, the steam inlet is through a bearing journal shaft.
If the system is heated too fast, this journal heats up and expands faster than the inner race of the bearing, and this can crack the inner race.
Cold water on a dryer can or other hot object can cause deformation and/or cracks because of uneven thermal shrinkage.
When an electric motor is started frequently, the consequence is that the windings might burn. This is because the current (I) spikes when the motor starts, and the heat generated rises with the square of the current. By Joule's law, Q = I² × R, where Q is the heat generated per unit time and R is the resistance of the windings.
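To make the square relationship concrete, the sketch below compares the heating rate at a few multiples of full-load current. The multiples are hypothetical placeholders (check the motor data sheet for the real starting current), and the winding resistance is assumed constant so that it cancels out of the ratio.

```python
def relative_heating(current_multiple: float) -> float:
    """Heating rate relative to full-load running, from Joule's law Q = I^2 * R.

    Assumption: R is constant, so only the current ratio matters.
    """
    return current_multiple ** 2


if __name__ == "__main__":
    # Hypothetical starting-current multiples, for illustration only.
    for multiple in (1, 3, 6):
        print(f"{multiple}x full-load current -> "
              f"{relative_heating(multiple):.0f}x the heating rate")
```

Each start therefore puts far more heat into the windings than the same number of seconds of normal running, which is why the motor needs time to cool between starts.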
Many plants have redundant equipment for critical steps in production, for example duplicate lubrication pumps for central lubrication.
It is necessary to operate these pumps for equal amounts of time. Mark redundant equipment A and B, and make sure operators alternate between running only the A equipment and only the B equipment.
This prevents moisture build-up in electric motors, and it prevents bearings from being destroyed by brinelling caused by vibration while the rolling elements sit in the same position. It also prevents packing material in glands from drying up and leaking when a pump is started after being idle for a long time.
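One simple way to keep run hours equal is to always start the unit with the fewest accumulated hours. The sketch below is only an illustration of that idea; the function names, the 8-hour shift and the starting hours are assumptions, not part of any plant procedure described here.

```python
from typing import Dict, List


def next_unit_to_run(run_hours: Dict[str, float]) -> str:
    """Pick the redundant unit with the fewest accumulated run hours."""
    return min(run_hours, key=run_hours.get)


def rotate(run_hours: Dict[str, float], shift_hours: float, shifts: int) -> List[str]:
    """Simulate a number of shifts, always running the least-used unit."""
    order = []
    for _ in range(shifts):
        unit = next_unit_to_run(run_hours)
        run_hours[unit] += shift_hours
        order.append(unit)
    return order


if __name__ == "__main__":
    hours = {"A": 120.0, "B": 96.0}  # hypothetical hour-meter readings
    print(rotate(hours, shift_hours=8.0, shifts=4))
    print(hours)
```

Once the hours have caught up, the selection settles into the A, B, A, B alternation described above.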
All of the above are examples of the basics of what we call Maintenance Prevention.
2. Early identification of work.
This part of the basics is very critical and if not done well, it is one of the major reasons why many maintenance organizations are reactive and very inefficient. It includes:
- Disciplined and right priorities on requested work.
- Condition Monitoring.
2.1 Disciplined and right priorities.
One of the top two reasons why maintenance work is not planned before it is scheduled and executed is that priorities are too emotional and not based on importance for the business. I have reviewed many backlogs in maintenance organizations all over the world and often find that the majority of the work in the backlog has been assigned priority 1, and some of the priority 1 work requests are over two years old! Two common reasons for this phenomenon are:
- The maintenance organization is viewed as a service provider to operations.
- The requesters of maintenance work do not trust work will be done unless they assign priority 1 to the work request.
If your maintenance organization is viewed as a service provider, you want to provide good service, and this often means that you simply obey requests from operations.
This view must change to a working relationship where the maintenance organization is viewed as an equal partner with operations.
The role of maintenance is to deliver manufacturing Equipment Reliability, and the role of operations is to deliver manufacturing Process Reliability.
If your common goal is to improve manufacturing reliability, and the roles of the partners are clearly defined and adhered to, you have laid the foundation for success.
As one of the first steps in creating this partnership, you should together agree on criteria for deciding priorities of maintenance work.
Do not make this too complicated. I have seen 19-page documents used as guidelines for assigning priorities to maintenance work, and it is obvious that this will not work.
In my opinion there are only two priorities: do the work now, or complete it by a given date. It is very simple, but it works because people understand the logic. The overall criteria for setting priorities should include risk of:
- Environmental damage or personal injury.
- High reliability cost for Quality, Time or Speed losses.
- High cost for maintenance repairs.
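The two-priority idea, do it now or complete it by a date, maps onto a very small data structure. The sketch below is a hypothetical illustration; the class name, field names and sample requests are assumptions and do not come from any particular maintenance system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class WorkRequest:
    title: str
    due: Optional[date] = None  # None means "do the work now"


def order_backlog(requests: List[WorkRequest]) -> List[WorkRequest]:
    """Put 'do now' work first, then sort the rest by required completion date."""
    return sorted(requests, key=lambda r: (r.due is not None, r.due or date.min))


if __name__ == "__main__":
    backlog = [
        WorkRequest("Repaint handrail", due=date(2025, 9, 1)),
        WorkRequest("Oil leak near hot surface"),  # injury risk: do now
        WorkRequest("Replace worn coupling", due=date(2025, 6, 15)),
    ]
    for request in order_backlog(backlog):
        when = "NOW" if request.due is None else request.due.isoformat()
        print(f"{when:>10}  {request.title}")
```

With only a date or "now" to choose from, a request cannot sit in the backlog for two years labeled priority 1.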
To get examples of priority guidelines please email [email protected]
Remember that the discussions between operations and maintenance to arrive at the agreed priority guideline are important, because this is one of the many steps you take to build the operations – maintenance reliability culture.
2.2 Condition Monitoring
I like to use the term condition monitoring because the term Predictive Maintenance excludes the very important part of basic inspections that includes See, Listen, Smell and Touch.
When I use the term Condition Monitoring here, I include all tasks you do to discover problems early, e.g.:
- Basic objective inspections.
- Basic subjective inspections.
- Vibration Analysis, Infrared measurements, Wear Particle Analysis, Ultrasonic material testing, Acoustic emission testing and other methods.
In several studies we have found that most problems are detected through basic inspections.
The example below demonstrates this. Figure 5.
The following is an example of a basic inspection of a heat exchanger:
Many Preventive Maintenance inspection programs might describe the inspection of a cooler for a hydraulic unit as “Inspect Cooler” without any further explanation.
I have used this example in numerous plants, and most mechanics and operators admit they have no clue what to look for beyond the obvious, such as leaks, looseness and perhaps the temperature of the cooled outgoing medium.
First you need to explain how the cooler in Figure 6 works.
That can be done with a simple sketch, as in the example in Figure 7.
The function and components of the blue cooler in Figure 6 are described in the diagram in Figure 7.
Most important is that the outgoing temperature of the cooled hydraulic fluid does not exceed the maximum allowed temperature, and that the system does not start to operate before the hydraulic fluid has reached its minimum temperature.
It is good if a temperature gauge can be mounted on the outgoing hydraulic fluid line and marked with lower and upper temperature limits.
A handheld infrared thermometer can also be used. If the person doing the inspection is taught how the system works, it is easy for him/her to understand that it is important to track the position of the control valve.
If the control valve is fully, or almost fully, open it is time to report this condition so that replacement or cleaning of the cooler can be planned and then scheduled before the system overheats.
Explain that operating the system above the maximum temperature will lead to a breakdown. This is because components such as packings in cylinders and valves deteriorate fast at high temperature.
This leads to internal leaks in the system, which in turn generate more heat and faster deterioration, until the system ceases to function.
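The inspection logic for the cooler can be written down as a short checklist. The sketch below is illustrative only: the temperature limits and the 90% valve-opening threshold are assumed placeholder values, and the real limits are the ones marked on the gauge for the system in question.

```python
from typing import List


def assess_cooler(outlet_temp_c: float, min_temp_c: float, max_temp_c: float,
                  valve_open_pct: float, valve_warning_pct: float = 90.0) -> List[str]:
    """Return inspection findings for a hydraulic fluid cooler.

    All limits are placeholders; use the limits marked for the actual system.
    """
    findings = []
    if outlet_temp_c > max_temp_c:
        findings.append("Outlet temperature above maximum: packings will deteriorate fast")
    elif outlet_temp_c < min_temp_c:
        findings.append("Fluid below minimum temperature: system should not start yet")
    if valve_open_pct >= valve_warning_pct:
        findings.append("Control valve almost fully open: plan cleaning or replacement of cooler")
    return findings


if __name__ == "__main__":
    # Temperature still within limits, but the valve has little capacity left.
    for finding in assess_cooler(outlet_temp_c=55, min_temp_c=30, max_temp_c=60,
                                 valve_open_pct=95):
        print(finding)
```

Note that the valve position gives the early warning: it is reported while the temperature is still inside its limits, which is what creates lead time for planning.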
The sacrificial anode is made of a short bolt with a ½ inch (12.7 mm) hole in it, in which a rod made of zinc is fitted. The zinc rod will corrode before any other material in the cooler, thus protecting the cooler material from corrosion.
In this example it is designed in such a way that no one can see whether the zinc is gone or not. It should instead be made in one piece of zinc, with a small weep hole drilled about 1.5 inches (38.1 mm) into it.
When the zinc rod is corroded to this point it will show as a small leak and replacement can be planned and then scheduled before any damage is done to the cooler.
The above are examples of basic inspections and “Know Why” training. When done this way, not only will problems be discovered early, the inspection is also meaningful and more interesting to do.
Inspections with the right method reveal latent problems at an early stage and this provides the necessary lead-time needed to plan and then schedule work before execution of corrective action to avoid breakdowns.
The link Early Discovery of a Problem – Plan Corrective action – Schedule Corrective Action – Execute Corrective action is a vital foundation for any maintenance organization.
It is often referred to as Condition Based Maintenance (CBM).
Inspections do not prevent anything at all unless the problems discovered during inspections are corrected before breakdowns occur.
Planning and Scheduling of Work
Even with good skills people cannot be more efficient than the system they work in allows them to be. To design, document, repeatedly communicate, and reinforce the execution of the system is a leadership obligation.
When work is properly planned, then scheduled and executed accordingly, employee productivity will increase significantly and reliability will increase. This will result in faster product throughput and lower costs.
It is important to understand the difference between planning and scheduling. These two elements of maintenance management are essential and are very often mixed up.
Most organizations where scheduled shutdowns of the manufacturing process are common plan and schedule shutdown work quite well, because there is a consequence if they do not.
Planning and scheduling of weekly/daily On-The-Run work is often very poor. Perhaps this is because expectations on performance are more lax than during a shutdown?
The short definitions used here are:
- Planning of work = Deciding What, How and how much Time is needed to do the work.
- Scheduling of work = Deciding When and by Whom work will be done.
Planning of work is to prepare everything needed to do the work.
E.g. scope and description of the work, safety requirements, tools, parts and material, documentation, need for scaffolding, skills required, and whether a shutdown is required or the work can be done without interfering with production.
Scheduling of work is to decide when the job shall be done, by date and time, and who will do the work.
A best practice is to plan work before it is scheduled for execution, and to first schedule the work that needs to be done and then schedule people to the work.
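The rule "plan first, then schedule" can be expressed as a small model in which a job simply cannot be scheduled until it has a plan. This is a hypothetical sketch; the class and field names are assumptions chosen to mirror the What/How/Time and When/Whom definitions above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Plan:
    """What, How and how much Time: everything prepared before the job starts."""
    scope: str
    estimated_hours: float
    parts: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    shutdown_required: bool = False


@dataclass
class WorkOrder:
    title: str
    plan: Optional[Plan] = None
    start: Optional[datetime] = None               # When
    crew: List[str] = field(default_factory=list)  # by Whom

    def schedule(self, start: datetime, crew: List[str]) -> None:
        """Refuse to schedule work that has not been planned."""
        if self.plan is None:
            raise ValueError(f"'{self.title}' must be planned before it is scheduled")
        self.start = start
        self.crew = list(crew)


if __name__ == "__main__":
    job = WorkOrder("Replace bearing, seal and impeller unit")
    job.plan = Plan(scope="Change rotating unit", estimated_hours=5.0,
                    parts=["bearing", "seal", "impeller"], tools=["rigging"])
    job.schedule(datetime(2025, 6, 16, 7, 0), ["mechanic 1", "mechanic 2"])
    print(job.start, job.crew)
```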
All work can be planned, but not all work can be scheduled.
To plan work is the easy part if you have dedicated people who are allowed to focus on planning.
Even the correction of a breakdown can in theory be planned, because you know it can, and most probably will, happen. But you cannot schedule all work, because you do not always know when the breakdown will occur.
Most breakdowns can be prevented, but not all failures can be prevented. This is because not all failures have a long enough failure-developing period.
The failure-developing period is the time that elapses from the point at which you discover a failure until the breakdown occurs.
If this time is too short the failure will develop into a breakdown before the corrective action can be planned.
This is common for electronic components. Before problems in systems with electronic equipment can be corrected troubleshooting has to be done.
Breakdowns can still be prevented with redundant components.
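The reasoning about the failure-developing period is a simple comparison of lead times. The sketch below is an illustration only; the function names and the example durations in days are assumptions, not measured values.

```python
def lead_time_margin_days(failure_developing_days: float,
                          planning_days: float,
                          scheduling_days: float) -> float:
    """Days left after planning and scheduling; negative means the breakdown comes first."""
    return failure_developing_days - (planning_days + scheduling_days)


def can_correct_before_breakdown(failure_developing_days: float,
                                 planning_days: float,
                                 scheduling_days: float) -> bool:
    """True if the failure-developing period covers planning plus scheduling."""
    return lead_time_margin_days(failure_developing_days,
                                 planning_days, scheduling_days) >= 0


if __name__ == "__main__":
    # Hypothetical cases: a slowly developing bearing fault versus an
    # electronic component that fails almost without warning.
    print(can_correct_before_breakdown(30, planning_days=2, scheduling_days=7))
    print(can_correct_before_breakdown(0.01, planning_days=2, scheduling_days=7))
```

Where the margin is negative, as for many electronic components, redundancy rather than inspection is what prevents the breakdown.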
Work Management Process
It is necessary to document and reinforce the process defining how work is managed. If this is not done you will surely end up in the “Circle of Despair” (Figure 1. Part 1).
My intention in this article is to give an overview of the very basics, not to write a complete article about planning and scheduling. The essential steps in a work management process include:
Front Line Management
Execution of the work management process has to occur with the front line organization.
It is at this level of the organization that results will be delivered or not delivered. The front line organization consists of the following functions.
In bigger organizations each of these functions is a full-time position. In smaller organizations employees have to do all or some of these functions:
Justification for planners
I have worked with many plants that have no planners because the maintenance organization said it needed them but was not able to justify the planner position(s).
I would like to offer some ideas on how we have successfully helped maintenance organizations justify more efficient planning with planners.
With or without planners, somebody always plans the work; otherwise the work could not be done. In an organization without planners the following is a typical situation:
(Working hours 07:00 – 15:30)
• 07:00 – 07:30 Crew arrives and meets with supervisor.
• 07:30 All have been assigned what to do today (e.g. “Pump 20-439 does not pump”).
• 07:30 – 08:45 Two mechanics troubleshoot and find that bearing, seal and impeller unit must be changed.
• 08:45 – 09:00 Get rigging tools.
• 09:00 – 09:15 Morning break.
• 09:15 – 10:30 Finding parts.
• 10:30 – 11:30 Arrange rigging.
• 11:30 – 12:00 Lunch break.
• 12:00 – 14:00 Disassemble bearing, seal and impeller unit.
• 14:00 – 15:30 Impeller too big. Machine down to right diameter.
• 15:30 – 17:00 Install, test and start pump.
In summary, the scope of work had to be decided by the mechanics; tools, parts, rigging etc. also had to be decided by the mechanics; and the adjustment of the impeller was also decided by the mechanics.
All of this is PLANNING.
The inefficiency in this example lies in the fact that planning was done after scheduling; it must be done the other way around to enable people to be efficient.
In the other scenario, the problem with the pump is discovered during an established inspection route a couple of weeks before it must be corrected.
A planner could then plan the job efficiently. It would take the planner about two hours to prepare all needed for work, arrange for pump impeller to be adjusted etc.
The store would stage and deliver parts in advance. The mechanics would then do the work in a safe and organized way in about five maintenance hours instead of about 20 hours including overtime as in the example above.
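The "about 20 hours" figure can be checked by adding up the unplanned day from the timeline. The sketch below assumes that both mechanics stay on the job from 07:00 to 17:00, breaks included; the slot labels are shortened and the helper names are illustrative.

```python
from datetime import datetime

# Time slots from the unplanned-day example (start, end, activity).
UNPLANNED_DAY = [
    ("07:00", "07:30", "Meet with supervisor"),
    ("07:30", "08:45", "Troubleshoot"),
    ("08:45", "09:00", "Get rigging tools"),
    ("09:00", "09:15", "Morning break"),
    ("09:15", "10:30", "Find parts"),
    ("10:30", "11:30", "Arrange rigging"),
    ("11:30", "12:00", "Lunch break"),
    ("12:00", "14:00", "Disassemble unit"),
    ("14:00", "15:30", "Machine impeller"),
    ("15:30", "17:00", "Install, test and start"),
]


def hours(start: str, end: str) -> float:
    """Length of one time slot in hours."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600


def crew_hours(slots, crew_size: int) -> float:
    """Total hours spent by the whole crew across all slots."""
    return crew_size * sum(hours(start, end) for start, end, _ in slots)


if __name__ == "__main__":
    unplanned = crew_hours(UNPLANNED_DAY, crew_size=2)  # assumption: two mechanics all day
    planned = 5.0  # maintenance hours for the planned job, as stated in the text
    print(f"Unplanned: {unplanned:.0f} h, planned: {planned:.0f} h, "
          f"difference: {unplanned - planned:.0f} h")
```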
We have done hundreds of evaluations of maintenance organizations all over the world and found that without organized inspections and planning followed by scheduling of work crafts people spend 40 – 60 % of their time on “planning activities” as given in the example above.
Number of crafts people | % of time they “plan” | Total “planning” (hours/day) | Target (hours/day) | Freed-up time (hours/day)
60 | 50 | 240 | 50 | 190
In this example the maintenance organization is very reactive and crafts people are put in a situation where they have to “plan” to get the work done.
The implementation of basic inspections will change the situation so that a planner can plan before work is scheduled by a supervisor and executed by crafts people.
The target is to get down to about 10% urgent work, where the situation described in the scenario above would still be repeated. That would free up 190 hours/day of crafts people’s time.
To be efficient in work management, this organization would need about three to four planners (24 – 32 hours/day). Net of the planners’ own time, this would free up 158 – 166 hours/day.
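The arithmetic behind the table and the 158 – 166 hours/day figure can be reproduced in a few lines. The sketch below assumes an 8-hour working day, which is what the 240 hours/day in the table implies for 60 crafts people at 50%; the function names are illustrative.

```python
HOURS_PER_DAY = 8.0  # assumption: 60 people x 50% x 8 h gives the 240 h/day in the table


def planning_hours(crafts_people: int, plan_fraction: float) -> float:
    """Hours per day that crafts people spend on 'planning activities'."""
    return crafts_people * HOURS_PER_DAY * plan_fraction


def net_freed_hours(current_hours: float, target_hours: float, planners: int) -> float:
    """Freed-up crafts hours per day, net of the hours the planners themselves work."""
    return current_hours - target_hours - planners * HOURS_PER_DAY


if __name__ == "__main__":
    current = planning_hours(60, 0.50)  # 240 hours/day
    target = 50.0                       # the table's target (60 x 8 x 10% = 48, rounded)
    print(f"Gross freed-up time: {current - target:.0f} hours/day")
    for planners in (3, 4):
        print(f"{planners} planners: net {net_freed_hours(current, target, planners):.0f} hours/day")
```

Run for your own head count and percentages, the same calculation gives a first estimate of what a planner position is worth.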
The number of planners needed is very dependent on disciplined priorities of work, access to an updated and accurate bill of materials and close cooperation with operations.
Roles of Front Line Management
Some of the most common questions I get from organizations all over the world include:
• Do we need leaders in the frontline?
• Do we need planners?
• How many planners do we need?
• How many frontline leaders do we need?
• Do we need Operations – Maintenance Coordinators?
• How should we decide the roles of planners and frontline leaders?
These are the same questions I received when I started in industry many years ago, and they are still among the first issues that need to be clarified when we help organizations improve reliability and maintenance performance.
Over all these years organizations have tried everything from combined roles, centralized planners, self-directed work teams, autonomous maintenance, no planners, no frontline leaders and so on.
All the experimental attempts I have seen over these years have reverted to the fact that leaders, planning and scheduling are absolutely necessary to provide a safe working environment and efficient work execution.
In smaller organizations of up to about eight maintenance craftspeople, the roles of planner and frontline leader are by necessity often combined, but someone still has to perform these functions. In larger organizations I know you need all of the above roles as positions to be efficient.
How many planners and frontline leaders?
Deciding how many planners and frontline leaders an organization needs is not a simple matter of a ratio of planners to craftspeople or of frontline leaders to craftspeople.
A number of factors are needed to give the right answer, including:
• How the role of a planner is defined.
• How the role of a frontline leader is defined.
• Quality and access to support systems such as a complete Bill of Materials.
• Skill level and participation in planning by crafts people.
• Implemented and disciplined use of processes for maintenance.
There are other circumstances that impact crew sizes per planner and frontline leaders such as the size of the physical area they manage.
One frontline leader managing a central workshop might handle 25 to 30 people. In a very spread-out manufacturing area, a frontline leader can handle far fewer crafts people.
The function of Maintenance – Operations Coordinator will also enable the frontline roles to become more efficient.
For examples on Role Descriptions please contact [email protected]