Archive for the ‘A/B Data’ Category

Is the new one better than the old one?

Successful commercialization of products and services is fueled by one fundamental – making the new one better than the old one.  If the new one is better, the customer experience is better, the marketing is better, the sales are better, and the profits are better.

It’s not enough to know in your heart that the new one is better; there’s got to be objective evidence that demonstrates the improvement.  The only way to get that evidence is with testing.  There are a number of testing mechanisms, but whether it’s surveys, interviews, or in-the-lab experiments, test results must be quantifiable and repeatable.

The best way I know to determine if the new one is better than the old one is to test both populations with the same test protocol done on the same test setup and measure the results (in a quantified way) using the same measurement system.  Sounds easy, but it’s not.  The biggest mistake is confusing the “same” test conditions with “almost the same” test conditions.  If the test protocol is even slightly different, there’s no way to tell whether the difference between new and old is due to the goodness of the new design or the badness of the test setup.  That type of uncertainty won’t cut it.

You can never be 100% sure that the new one is better than the old one, but that’s where statistics come in handy.  Without getting deep into the statistics, here’s how it goes.  For both populations’ test results the mean and standard deviation (spread) are calculated, and, taking into consideration the sample size, the statistical test will tell you whether the populations are different and the confidence of its discernment.

The statistical calculations (Student’s t-test) aren’t all that important; what’s important is to understand the implications of the calculations.  When there’s a small difference between new and old, the sample size must be large for the statistics to recognize a difference.  When the difference between populations is huge, a sample size of one will do nicely.  When the spread of the data within a population is large, the statistics need a large sample size or they can’t tell new from old.  But when the data is tight, the statistics see more clearly and need fewer samples to see a difference.
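To make those implications concrete, here’s a minimal sketch – assuming Python with NumPy and SciPy, and using made-up A/B measurements – of what the Student’s t-test reports for two populations:

```python
import numpy as np
from scipy import stats

# Hypothetical results from the same protocol, setup, and measurement system.
old = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 10.2])   # baseline design
new = np.array([10.9, 11.2, 10.8, 11.0, 11.1, 10.7])  # new design

# Student's t-test for two independent samples.
t_stat, p_value = stats.ttest_ind(old, new)

# A small p-value (say, < 0.05) means the populations are different with
# high confidence; a large one means the spread and sample size don't let
# the statistics tell new from old.
print(f"mean old = {old.mean():.2f}, mean new = {new.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```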

If marketing claims are based on large sample sizes, the difference between new and old is small.  (No one uses large sample sizes unless they have to, because they’re expensive.)  But if in a design review for the new product the sample size is three and the statistical confidence is 95%, new is far better than old.  If the average of new is much larger than the average of old and the sample size is large, yet the confidence is low, the statistics know there’s a lot of variability within the populations.  (A visual check should show the distributions to be more wide than tall.)
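The sample-size side of this can be made visible with a power calculation.  Here’s a sketch – assuming Python with statsmodels, and using purely illustrative effect sizes (the difference between means in units of standard deviation):

```python
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()

# Illustrative effect sizes only: huge, moderate, and small differences.
for d in (2.0, 0.5, 0.1):
    n = solver.solve_power(effect_size=d, alpha=0.05, power=0.95)
    print(f"effect size {d}: ~{n:.0f} samples per population")

# A huge difference needs a handful of samples; a small one needs
# thousands -- which is why large sample sizes hint at small differences.
```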

The measurement systems used in the experiments can also give a good indication of the difference between new and old.  If the measurement system is expensive and complicated, the difference between new and old is likely small.  As with large sample sizes, the only time to use an expensive measurement system is when it’s needed – and it’s needed when the difference between new and old is so small that only the expensive system can measure the difference accurately and repeatably (micrometers vs. meters).

If you need large sample sizes, expensive measurement systems and complicated statistical analyses, the new one isn’t all that different from the old one.  And when that’s the case, your new profits will be much like your old ones.  But if your naked eye can see the difference with a back-to-back comparison using a sample size of one, you’re on to something.

Image credit – amanda tipton

To make the right decision, use the right data.

When it’s time for a tough decision, it’s time to use data.  The idea is that data removes biases and opinions so the decision is grounded in the fundamentals.  But using the right data the right way takes a lot of discipline and care.

The most straightforward decision is a decision between two things – an either/or – and here’s how it goes.

The first step is to agree on the test protocols and measurement systems used to create the data.  To eliminate biases, this is done before any testing.  The test protocols are the actual procedural steps to run the tests and are revision-controlled documents.  The measurement systems are also fully defined.  This includes the make and model of the machine/hardware, full definition of the fixtures and supporting equipment, and a measurement protocol (the steps to do the measurements).

The next step is to create the charts and graphs used to present the data.  (Again, this is done before any testing.)  The simplest and best is the bar chart – one bar for A and one bar for B.  But for all formats, the axes are labeled (including units), the test protocol is referenced (with its document number and revision letter), and the title is created.  The title defines the type of test, the important shared elements of the tested configurations, and the important input conditions.  The title helps make sure the tested configurations are the same in the ways they should be.  And to be doubly sure they’re the same, once the graph is populated with the actual test data, a small image of each tested configuration can be added next to its bar.
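For illustration, here’s a minimal sketch of such a pre-agreed chart – assuming Python with matplotlib, and using hypothetical data, revision labels, and a made-up protocol document number (TP-101):

```python
import matplotlib.pyplot as plt

# Hypothetical A/B results; the chart format was agreed before any testing.
configs = ["Concept A (rev 3)", "Concept B (rev 2)"]
results = [142.0, 118.0]

fig, ax = plt.subplots()
ax.bar(configs, results)
ax.set_ylabel("Cycles to failure (cycles)")       # labeled axis with units
ax.set_title("Drop test per TP-101 rev C\n"       # referenced test protocol
             "Shared housing, 25 C ambient")      # shared elements and inputs
fig.savefig("ab_bar_chart.png")
```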

The configurations under test change over time, and it’s important to maintain linkage between the test data and the tested configuration.  This can be accomplished with descriptive titles and formal revision numbers for the test configurations.  When you choose design concept A over concept B but unknowingly use data from the wrong revisions, it’s still a data-driven decision – it’s just the wrong one.

But the most important problem to guard against is a mismatch between the tested configuration and the configuration used to create the cost estimate.  To increase profit, test results want to increase and costs want to decrease, and this natural pressure can create divergence between the tested and costed configurations.  Test results predict how the configuration under test will perform in the field.  The cost estimate predicts how much the costed configuration will cost.  Though there’s a strong desire to have the performance of one configuration and the cost of another, things don’t work that way.  When you launch, you’ll get the performance AND cost of the configuration you launched.  You might as well choose the configuration to launch using performance data and cost as a matched pair.

All this detail may feel like overkill, but it’s not, because the consequences of getting it wrong can decimate profitability.  Here’s why:

Profit = (price – cost) x volume.

Test results predict goodness, and goodness defines what the customer will pay (price) and how many they’ll buy (volume).  And cost is cost.  And when it comes to profit, if you make the right decision with the wrong data, the wheels fall off.
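A toy sketch of the matched-pair arithmetic – with invented prices, costs, and volumes – shows why the mismatch bites:

```python
# Profit = (price - cost) x volume, evaluated as a matched pair.
def profit(price, cost, volume):
    return (price - cost) * volume

# Hypothetical: the tested configuration vs. a cheaper costed configuration.
launched = profit(price=120.0, cost=70.0, volume=10_000)  # matched pair
wishful = profit(price=120.0, cost=55.0, volume=10_000)   # tested performance,
                                                          # other config's cost
print(f"launched configuration: ${launched:,.0f}")  # $500,000 -- what you get
print(f"mismatched prediction:  ${wishful:,.0f}")   # $650,000 you'll never see
```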

Image credit – alabaster crow photographic

Experiment With Your People Systems

It’s pretty clear that innovation is the way to go. There’s endless creation of new technologies, new materials, and new processes, so innovation can create new things to sell. And there are multiple toolsets and philosophies to get it done, but it’s difficult.

When doing new, there’s no experience, no predictions, no certainty. But innovation is no dummy and has come up with a way to overcome the uncertainty: it builds knowledge of systems through testing – build it, test it, measure it, fix it. Not easy, but doable. And what makes it all possible is the repeatable response of things like steel, motors, pumps, software, and hard drives. Push on them repeatably and their response is repeatable; stress them in a predictable way and their response is predictable; break them in a controlled way and the failure mode can be exercised.

After the good idea, innovation is about converting the idea into a hypothesis – a prediction of what will happen and why – and testing it early and often. Once there’s a coherent hypothesis that has the potential to make magic, innovation builds it in the lab, creates a measurement system to evaluate goodness, and tests it. And once the ideas work every-day-all-day and make it into production, the factory measures them relentlessly to make sure the goodness is shipped with every unit, and the data is religiously plotted with control charts.
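As a concrete illustration of that last step, here’s a minimal individuals-control-chart sketch – assuming Python with NumPy, hypothetical measurements, and sigma estimated from the average moving range:

```python
import numpy as np

# Hypothetical: establish limits from an in-control baseline run, then
# monitor production units against them (individuals chart, 3-sigma).
baseline = np.array([5.01, 4.98, 5.03, 4.99, 5.02, 5.00, 4.97, 5.02])
center = baseline.mean()
sigma = np.abs(np.diff(baseline)).mean() / 1.128  # average moving range / d2
ucl, lcl = center + 3 * sigma, center - 3 * sigma

for i, x in enumerate([5.00, 5.03, 4.98, 5.12, 5.01]):
    status = "OK" if lcl <= x <= ucl else "OUT OF CONTROL"
    print(f"unit {i}: {x:.2f} {status}")
```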

The next evolution of innovation will come from systematically improving people systems. There are some roadblocks, but they can be overcome. In reality, they already have been overcome; it’s just that no one realizes it.

People systems are more difficult because their responses are not repeatable – where steel bends repeatably for a given stress, people do not. Give a last minute deliverable to someone in a good mood, and the work gets done; give that same deliverable to the same person on a bad day, and you get a lot of yelling. And because bad moods beget bad moods, people modify each other’s behavior. And when that non-repeatable, one-person-modifying-another response scales up to the team level, business unit, company, and supply chain, you have a complex adaptive system – a system that cannot be predicted. But just as innovation of airliners and automobiles uses testing to build knowledge out of uncertainty, testing can do the same for people systems.

To start, assumptions about how people systems would respond to new input must be hardened into formal hypotheses. And for the killer hypotheses that hang together, an experiment is defined; a small target population is identified; a measurement system is created; a baseline measurement is taken; and the experiment is run. Data is then collected, statistical analyses are made, and it’s clear whether the hypothesis is validated. If validated, the solution is rolled out and the people system is improved. And in a control chart sense, the measurement system is transferred to the whole system and left to run continuously to make sure the goodness doesn’t go away. If it’s invalidated, another hypothesis is generated and the process is repeated. (It’s actually better to test multiple hypotheses in parallel.)

In the past, this approach was impossible because the measurement system did not exist. What was needed was a simple, mobile data acquisition system for “people data”, a method to automatically index the data, and a method to quickly process and display the results. The experimental methods were clear, but there was no way to capture the response to the experiments. Now there is.

People systems are governed by what people think and feel, and the stories they tell are the surrogates for their thoughts and feelings. When an experiment is conducted on a people system, the stories are the “people data” that is collected, quantified, and analyzed. The stories are the response to the experiment.

It is now possible to run an experiment where a sample population uses a smartphone and an app to collect stories (text, voice, pictures), index them, and automatically send them to a server, where software groups the stories and displays them so patterns can be seen (groups of commonly indexed stories). All this is done in real time. And, by clicking on a data point, the program brings up the story associated with that data point.

Here’s how it works. The app is loaded, people tell their stories on their phones, and a baseline is established (a baseline story pattern). Inputs or constraints are changed for the target population and new stories are collected. If the patterns change in a desirable way (statistical analysis is possible), the new inputs and constraints are rolled out. If the stories change in an undesirable way, the target population reverts to standard conditions and the next hypothesis is tested.
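One plausible way to do that statistical analysis – a sketch only, with invented story categories and counts, using SciPy’s chi-square test on the baseline and post-change patterns:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of stories per index pattern, before and after the
# change. Columns: energized, neutral, frustrated.
baseline = [40, 35, 25]
after_change = [55, 30, 15]

chi2, p_value, dof, expected = chi2_contingency([baseline, after_change])
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value says the story pattern genuinely shifted; whether the shift
# is desirable still comes from reading the stories behind the data points.
```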

Unbiased, real-time, continuous information streams to make sense of your people systems are now possible. Real-time, direct connection to your employees and your customers is a reality, and the implications are staggering.

Thank you Dave Snowden.

Product Thinking


Product costs, without product thinking, drop 2% per year. With product thinking, product costs fall by 50%, and while your competitors’ profit margins drift downward, yours are too high to track by conventional methods. And your company is known for unending increases in stock price and long term investment in all the things that secure the future.

The supply chain, without product thinking, improves 3% per year. With product thinking, the longest-lead-time processes are eliminated, the poorest-yield processes are a thing of the past, problem suppliers are gone, and your distributors associate your brand with uninterrupted supply and on-time delivery.

Product robustness, without product thinking, is the same year-on-year. With long-forgotten product thinking re-injected to simplify the product, robustness jumps to previously unattainable levels and warranty costs plummet. And your brand is known for products that simply don’t break.

Rolled throughput yield (RTY), without product thinking, is stalled at 90%. With product thinking, the product is simplified, opportunities for defects are reduced, and throughput skyrockets due to improved RTY. And your brand is known as a good value – providing good, repeatable functionality at a good price.
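The arithmetic behind that stall is simple. A quick sketch, with hypothetical step counts and first-pass yields:

```python
from math import prod

# Rolled throughput yield is the product of each step's first-pass yield.
print(prod([0.99] * 10))  # ~0.904: ten 99% steps stall RTY near 90%
print(prod([0.99] * 5))   # ~0.951: simplify to five steps and RTY jumps
```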

Lean, without product thinking, has delivered wonderful results, but the low-hanging fruit is gone and lean is moving into the back office. With product thinking, the design is changed and value-added work is eliminated along with its associated non-value-added work (which is about 8 times bigger); manufacturing monuments with their long changeover times are ripped out and sold to your competitors; work from two factories is consolidated into one; new work is taken on to fill the emptied factories; and profit per square foot triples. And your brand is known for best-in-class quality, unbeatable on-time delivery, world-class performance, and pioneering the next generation of lean.

The sales argument, without product thinking, is low price and good payment terms. With product thinking, the argument starts with product performance and ends with product reliability. The sales team is energized, and your brand is linked with solid products that just plain work.

The marketing approach, without product thinking, is stickers and new packaging. With product thinking, it’s based on competitive advantage explained in terms of head-to-head performance data and a richer feature set. And your brand stands for winning technology and killer products.

Product thinking isn’t for everyone. But for those who try – your brand will thank you.

Improving Product Robustness 101

Improving product robustness is straightforward and difficult. Here’s how to do it.

Identify specific failure modes, prioritize them, and go after the biggest ones first. Failure modes can be identified through multiple sources. Warranty data is sometimes coded by failure mode (more precisely, symptom type), so start there. The number one failure mode in this type of data is typically “no problem found”, so be ready for it. Analysis of the actual products that come back is another good way. Returned product is routed to the appropriate engineer who analyzes it and enters the failure mode into a database.

A formal design FMEA generates a list of prioritized failure modes through the risk priority number (RPN), where larger is more important. To do this, engineers are hauled into a room and a facilitator helps them come up with potential failure modes. One caution – the process can generate many failure modes, more than you can fix, so make the top five or ten go away and don’t argue the bottom fifty. It makes no sense to even talk about number eleven if you haven’t fixed the top ten.

But the best way I have found to identify failure modes (problems) that are meaningful to the customer is to ask the technical services group for their top five things to fix. They will give you the right answer because they interact daily with customers who have broken product. They won’t expect you to listen to them (you never listened before), so surprise them by fixing one or two on their list. They will be grateful you listened (they’ll likely want to buy you coffee for the rest of your career) and your customers will notice.
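For illustration, here’s a minimal sketch of the RPN prioritization – with invented failure modes and ratings. By convention, RPN is severity × occurrence × detection, each rated 1–10:

```python
# Hypothetical FMEA entries: (failure mode, severity, occurrence, detection).
fmea = [
    ("seal leak",         8, 6, 4),
    ("connector fatigue", 7, 5, 7),
    ("firmware lockup",   9, 3, 5),
    ("no problem found",  3, 9, 2),
]

# RPN = severity x occurrence x detection; larger is more important.
ranked = sorted(fmea, key=lambda row: row[1] * row[2] * row[3], reverse=True)

# Make the top handful go away; don't argue the bottom fifty.
for mode, s, o, d in ranked[:3]:
    print(f"{mode}: RPN = {s * o * d}")
```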

Once failure modes are identified, define the physics of failure – why the product breaks. This is tough work and requires focused thought and analysis. If, when you break the product, it “looks like” the ones coming back from the field, you have defined the physics of failure. This is the same thing as replicating the problem in the lab. Once that’s defined, create an automated test rig or experimental setup that breaks the product in a way that captures the physics of failure. I call this test rig a robustness surrogate because it stands in for the actual failure mode seen in the field. The robustness surrogate should break the product as fast as possible while retaining the physics of failure so you can break it and fix it many times before product launch. The robustness surrogate should be designed to break the product within minutes, not hours or days – the faster the better.

To know if product robustness is improved, the baseline (or existing) design is broken on the robustness surrogate. The new design must survive longer on the robustness surrogate than the baseline design. The result is A/B data (baseline design / new design) that is presented at the design review using a simple bar graph format I call big-bar-little-bar. Keep improving the robustness of the new design even if it outperforms the baseline design by a factor of ten – that’s not good enough for your customers.

Don’t stop improving robustness until you run out of time, and don’t stop if you meet the arbitrary MTBF specification. Customers like improved robustness, and in this case too much of a good thing is wonderful.

Using this method, I reduced warranty cost per unit by 75% over a five-year period. It worked.

It’s a tough time to be a CEO

2009 is a tough year, especially for CEOs.

CEOs have a strong desire to do what it takes to deliver shareholder value, but that’s coupled with a deep concern that tough decisions may dismantle the company in the process.

Here is the state of affairs:

Sales are down and money is tight.  There is severe pressure to cut costs, including those that are linked to sales – marketing budgets, sales budgets, travel – and those that directly impact customers – technical service, product manuals, translations, and warranty.

Pricing pressure is staggering.  Customers are exerting their buying power – since so few are buying, they want to name their price (and can).  Suppliers, especially the big ones, are using their muscle to raise prices.

Capacity utilization is ultra-low, so the bounce-back of new equipment sales is a long way off.

Everyone wants to expand into new markets to increase sales, but this is a particularly daunting task with competitors hunkering down to retain market share, cuts in sales and marketing budgets, and hobbled product development engines.

There is a desire to improve factory efficiency to cut costs (rather than to increase throughput like in 2008), but no one wants to spend money to make money – payback must be measured in milliseconds.

So what’s a CEO to do?

Six Lessons Learned from a Successful Design For Assembly Program

Six Lessons Learned DFA paper for May 2006 DFMA Forum.pdf  (8 pages)

Each company works with Design for Assembly (DFA) methods for different reasons.  Some companies want to take cost out of their products, some want to make more products in their factories, and some want to simplify the product to increase quality and reliability.  In a growing market a company wants to reduce labor content to get more products through the factory to meet demand without adding assembly workers.  And, in a growing market, a company wants to reduce the floor space required to meet demand without building another factory.  Remarkably, the goals are similar for …
