For Engineer By Engineer

  • Monday 26 April 2021

    Design Of Experiments (DOE) in Minitab

    Hi all!
    Hope everyone is safe and healthy.

    Previously I shared a post on DOE in Minitab and how to perform it, but it was kept simple, without much explanation.

    Today's post is a full-length one that covers most aspects of DOE.


    This post may feel heavy for those with no previous experience; to gain some, visit: Design of Experiments Using Minitab

    In Six Sigma, there are two implementation approaches, chosen according to the scenario: DMAIC (Define, Measure, Analyze, Improve, Control) and DMADV (Define, Measure, Analyze, Design, Verify).


    Let's say, for example, we have a process that was designed in a laboratory and scaled up in the plant to meet commercial orders. The theoretical yield is 750 kg for a 500 kg input, and after the lab-scale design was completed, the yield was standardized at 600 kg with an acceptable variation of about ±25 kg. The remaining ~150 kg (theoretical yield - standardized yield) can be considered losses, which may be attributed to limited conversion during the reaction, partial distribution of the product into the spent workup layers, and slight solubility of the product in the solvent used for isolation.

    After scale-up, the commercial yield is about 550 kg, i.e., 50 kg below the lab standard, with a variation of about ±50 kg attributed to common-cause variation.

    Now the actual scenario begins.

    Our supply chain team has forecast an order that could become a future requirement. Based on the requirement and the product cycle times, we concluded that to meet the demand we need an average output of ~590 kg/batch.
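To make the yield arithmetic above concrete, here is a small Python sketch that expresses the quoted figures as a percentage of the theoretical yield. It only restates the numbers from the example; the constant and function names are my own.

```python
# Yield figures quoted in the example (kg per batch).
THEORETICAL_YIELD = 750  # theoretical yield for a 500 kg input
LAB_STANDARD = 600       # standardized lab yield (about +/- 25 kg)
COMMERCIAL = 550         # average commercial yield after scale-up (about +/- 50 kg)


def percent_of_theoretical(actual_kg: float) -> float:
    """Return a yield as a percentage of the theoretical yield."""
    return 100.0 * actual_kg / THEORETICAL_YIELD


lab_loss = THEORETICAL_YIELD - LAB_STANDARD  # losses built into the lab design
scale_up_gap = LAB_STANDARD - COMMERCIAL     # additional shortfall after scale-up

print(f"Lab standard: {percent_of_theoretical(LAB_STANDARD):.1f}% of theoretical, loss {lab_loss} kg")
print(f"Commercial:   {percent_of_theoretical(COMMERCIAL):.1f}% of theoretical, {scale_up_gap} kg below lab")
```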

    In this scenario, the lab design and its standard yield give a maximum of 600 kg (with some allowable variation), and the target to meet the requirement is 590 kg/batch, so there is scope for improvement within the existing design. As regular practice we could implement kaizens / PDCA, but to be more precise and to show a stronger commitment towards meeting the requirement, we can run a Green Belt Six Sigma project (i.e., the DMAIC approach) to improve the output.

    In an alternate scenario, let's say we need an output of ~650 kg/batch to meet future market demand. We could improve process performance by proposing some kaizens, but the process is designed for a standard yield of ~600 kg, which is a limitation; increasing the output further requires a design change (i.e., the DMADV approach). This is where DOE is mostly used: to further refine/optimize the design or to extend it to meet the demand.

    A common understanding of DMADV vs. DMAIC is that DMAIC applies to existing processes and DMADV to new developments, but the scenario above is an exception: DMADV can also apply to an existing process when the required output is beyond what its current design can deliver.

    Let's get into our topic, performing DOE. Before that, we need to understand basics such as factorials, levels, factors and responses. I'll explain these in manufacturing terms, which will make them easy for our pharma guys to understand.
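The decision described in the last two paragraphs can be sketched as a simple rule. This is a simplified assumption for illustration, not a formal Six Sigma criterion: it compares the target with the yield the current design is standardized to deliver (600 kg, ignoring the ±25 kg variation) and reports the gap to today's plant average.

```python
LAB_STANDARD = 600  # kg/batch the current design is standardized to deliver
COMMERCIAL = 550    # kg/batch currently achieved in the plant


def suggest_approach(target_kg: float, design_yield_kg: float = LAB_STANDARD) -> str:
    """Simplified rule from the example: if the target is within what the existing
    design can deliver, improve the process (DMAIC); otherwise the design itself
    has to change (DMADV)."""
    return "DMAIC" if target_kg <= design_yield_kg else "DMADV"


for target in (590, 650):
    gap = target - COMMERCIAL  # improvement needed over the current plant average
    print(f"target {target} kg: gap {gap} kg vs. plant, approach {suggest_approach(target)}")
```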


    What is a factor?

    A factor is an input parameter within the scope of our study that may or may not have a significant impact on the output, e.g., temperature or dosing time.

    What is a response?

    The name is self-explanatory: the response is the output of the run/experiment, e.g., yield or conversion.

    What is a level?

    A level is a value (setting) of a factor at which it is studied, and the number of levels is how many such values we study. For example, if we study the impact of temperature from 0 to 5 ℃ using the two ends of the range, there are 2 levels (0 ℃ and 5 ℃). If we study it at 0 ℃, 2.5 ℃ and 5 ℃, there are 3 levels.

    What is a factorial design?

    A factorial design is a tool for studying the effects of the factors, and of the interactions between factors, on the output by running combinations of the factor levels.

    How do we calculate the number of experiments required for the study?

    For a full factorial design, the total number of experiments is the number of levels raised to the power of the number of factors, i.e., L ^ F.
    Let's say we have 3 factors (A, B, C), each at 2 levels (1, 2);
    then the number of experiments is 2^3 = 8.
    The experiments are:
    A B C
    1 2 2
    1 2 1
    1 1 2
    2 1 1
    2 2 1
    1 1 1
    2 2 2
    2 1 2
    This is the full factorial design.
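The same 8 combinations can be generated programmatically. This Python sketch uses itertools.product; it lists the runs in lexicographic order, not Minitab's standard order, and in practice the run order would be randomized anyway.

```python
from itertools import product

factors = ["A", "B", "C"]
levels = [1, 2]  # two levels per factor, as in the table above

# Every combination of levels: len(levels) ** len(factors) runs.
runs = list(product(levels, repeat=len(factors)))

print(" ".join(factors))
for run in runs:
    print(" ".join(str(level) for level in run))
print(f"{len(runs)} runs = {len(levels)}^{len(factors)}")
```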

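    What is a half factorial design?

    A half factorial (1/2 fraction) design runs only half of the full-factorial combinations. The saving in runs comes at a cost: some effects are confounded (aliased) with others, typically main effects or low-order interactions with higher-order interactions, so the design is used mainly when those higher-order interactions can be assumed negligible. For two-level designs, the number of experiments in a half fraction is 2 to the power of (number of factors F - 1), i.e., 2 ^ (F - 1). (With more than two levels, L ^ (F - 1) is a 1/L fraction, not a half.)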

    Similarly, for a 1/4 fraction, the number of experiments is 2 ^ (F - 2),
    for a 1/8 fraction, 2 ^ (F - 3),
    and for a 1/16 fraction, 2 ^ (F - 4).

    The half fraction and the 1/4, 1/8, 1/16 and 1/32 fractions are all called fractional factorial designs.
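As a quick check of these formulas, the sketch below counts runs for two-level fractions. It also enforces one constraint the formulas don't show: a two-level design with n runs can accommodate at most n - 1 factors, so not every fraction is possible for a given number of factors. The 7-factor example is hypothetical.

```python
def fractional_runs(num_factors: int, p: int) -> int:
    """Runs in a two-level 2^(F - p) design, i.e. a 1/2^p fraction of the full factorial."""
    runs = 2 ** (num_factors - p)
    # A two-level design with n runs has only n - 1 independent effect columns,
    # so it cannot hold more than n - 1 factors.
    if p < 0 or runs - 1 < num_factors:
        raise ValueError(f"a 1/2^{p} fraction cannot accommodate {num_factors} factors")
    return runs


F = 7  # hypothetical number of factors, for illustration only
for p, name in enumerate(["full", "1/2", "1/4", "1/8", "1/16"]):
    print(f"{name:>5} fraction: {fractional_runs(F, p)} runs")
```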


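    What is resolution?

    Resolution describes the degree to which a fractional factorial design confounds (aliases) effects with one another. In a resolution III design, main effects are aliased with two-factor interactions; in resolution IV, main effects are clear of two-factor interactions but two-factor interactions are aliased with each other; in resolution V, main effects and two-factor interactions are clear of each other. In Minitab DOE, the number of levels is 2 by default, unless you proceed with "General Full Factorial Design". Below is the screen depicting the resolutions: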
    Let's say, for a 2-factor experiment, the total number of experiments is 2^2 = 4.
    For a 3-factor design, the total is 2^3 = 8, and if we reduce it to a half factorial design (1/2 fraction), it becomes 2 ^ (3-1) = 4, which is a resolution III design.

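    For a given number of factors, the smaller the fraction (the fewer the runs), the lower the resolution, meaning more effects are confounded with each other. Conversely, with more factors a half fraction can reach a higher resolution: 2 ^ (4-1) is resolution IV and 2 ^ (5-1) is resolution V.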

    The lower the resolution, the higher the risk, because main effects and interactions become confounded and cannot be separated. A low-resolution design is therefore appropriate only when we are highly confident that the confounded interactions are negligible.
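To see what resolution III means in practice, this sketch builds the 4-run half fraction of a 2^3 design in coded units (-1 = low, +1 = high) using the generator C = AB, one common choice (Minitab may use its own default generators), and checks which columns coincide.

```python
from itertools import product

# Full 2^2 design in A and B, in coded units (-1 = low, +1 = high).
base = list(product([-1, 1], repeat=2))

# Half fraction of the 2^3 design: the third factor is generated as C = A*B
# (defining relation I = ABC). The generator is chosen here for illustration.
design = [(a, b, a * b) for a, b in base]
col = {name: [run[i] for run in design] for i, name in enumerate("ABC")}


def interaction(*names: str) -> list[int]:
    """Element-wise product of the named factor columns, i.e. an interaction column."""
    out = [1] * len(design)
    for name in names:
        out = [x * y for x, y in zip(out, col[name])]
    return out


# In these 4 runs every main-effect column equals a two-factor interaction column,
# so each main effect is aliased with (cannot be separated from) that interaction.
for main, pair in [("A", "BC"), ("B", "AC"), ("C", "AB")]:
    print(f"{main} aliased with {pair}: {col[main] == interaction(*pair)}")
```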

    What are replicates?

    Replicates are repeats of previously performed experiments at the same factor levels. So a question might strike your mind: "Why replicate, and what is the necessity of replication?"

    Why replicate, and what is the necessity of replication?

    Replicates let us estimate the experimental (pure) error, i.e., the run-to-run variability at identical settings. This is needed to judge whether an observed effect is real or just noise, and replication also makes the effect estimates more precise, so we can draw more reliable conclusions.
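A minimal sketch of how replicates quantify variability: the pooled within-run (pure-error) variance is computed from runs repeated at identical settings. The replicate values are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance

# Hypothetical replicate results (e.g. % conversion) for three runs, each repeated
# three times at identical factor settings; illustrative numbers only.
replicates = {
    "run 1": [71.2, 72.0, 70.5],
    "run 2": [84.9, 86.1, 85.4],
    "run 3": [63.0, 61.8, 62.7],
}

# Pooled (pure-error) variance: squared deviations within each group, summed,
# divided by the total within-group degrees of freedom.
ss_within = sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in replicates.values())
df_within = sum(len(ys) - 1 for ys in replicates.values())
pure_error_var = ss_within / df_within

for name, ys in replicates.items():
    print(f"{name}: mean {mean(ys):.2f}, sample variance {variance(ys):.3f}")
print(f"pooled pure-error variance: {pure_error_var:.3f} with {df_within} df")
```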

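    What are blocks in DOE?

    Blocking is a technique that reduces the effect (i.e., bias and variance) of nuisance factors by grouping runs into blocks that are as homogeneous as possible, e.g., runs made with the same raw-material lot or on the same day, so that block-to-block differences can be separated from the effects of the factors of interest.

    What are nuisance factors?

    Nuisance factors are those that have an impact on the response but are not of primary interest, e.g., raw-material lot, operator or day.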

    What are center points?

    As the name suggests, a center point sets each factor at the midpoint of its levels. Let's say I've taken two factors, Dose and pH, each with levels 2 and 10; the center point is then Dose = 6 and pH = 6. Center points help detect curvature in the response and give an estimate of pure error without replicating every corner point.

    Please note that in Minitab the number of replicates applies only to the factorial (corner) points, not to the center points, unless more than one block is selected, because center points are specified per block.
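Factorial designs are commonly analyzed in coded units, where the low level is -1, the high level is +1 and the center point is 0. This sketch computes the center point and its coded value for the Dose and pH example; the function names are illustrative.

```python
def center_point(low: float, high: float) -> float:
    """Midpoint of a factor's two levels."""
    return (low + high) / 2


def to_coded(x: float, low: float, high: float) -> float:
    """Convert an actual setting to coded units: low -> -1, center -> 0, high -> +1."""
    half_range = (high - low) / 2
    return (x - center_point(low, high)) / half_range


# Dose and pH example from the text: both factors run between 2 and 10.
levels = {"Dose": (2, 10), "pH": (2, 10)}
for name, (low, high) in levels.items():
    c = center_point(low, high)
    print(f"{name}: center point {c:g}, coded value {to_coded(c, low, high):g}")
```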

    I did a full-time project during my B.Tech final semester. I say "I did it" here, but we actually had a team of 5 members, and the project was "Industrial waste treatment (parameters studied: turbidity and COD) using Response Surface Methodology (Minitab)".

    As a team member, I handled the lab experiments along with the other members, and our team leader (Battula Amritha) took responsibility for performing the DOE in Minitab. To be frank, I was quite uncomfortable with DOE during the project, but after I started working at Dr. Reddy's Labs I understood its significance, and I'm grateful to our guides/mentors "Ms. Kalyani Gaddam & Dr. Shisir Kumar Behera".

    Study Example

    Let's start the show!

    Let's jump into the topic: creating a factorial design to identify the best set of parameters for the optimum output. We'll design it for a reaction where the factors are reagent mole equivalents, temperature and dosing time.

    Based on the proof of concept (POC) studies, a DOE study is proposed to fine-tune the process further, with the factor levels below:

    Reagent mole Eq.: 1 to 5,
    Temperature : 20 to 80 ℃,
    Dosing time: 2 to 10 hours.


    So there are three factors, each at 2 levels.
    The number of experiments is 2 ^ 3 = 8.

    Let's add one center point to evaluate the performance in more depth; without a center point, the behavior of the response at the midpoint of the factor ranges (e.g., curvature) can't be assessed.

    So the number of experiments is 8 (full factorial) + 1 center point = 9.

    To better understand the bias/variability, let's add replicates; I prefer 3 replicates. So the number of experiments is 8 x 3 + 1 = 25.
    [Please note that replicates do not apply to the center point here.]
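The 25-run plan above can be laid out as a run sheet with a randomized run order, loosely imitating the worksheet Minitab creates in Step 1. This is an assumption-laden sketch: the standard order comes from itertools.product rather than Minitab's ordering, the center point is placed last in standard order, and the random seed is arbitrary.

```python
import random
from itertools import product

# Factor ranges (low, high) from the POC study above.
FACTORS = {
    "Eq": (1, 5),      # reagent mole equivalents
    "Temp": (20, 80),  # temperature, deg C
    "Time": (2, 10),   # dosing time, hours
}
REPLICATES = 3     # replicates of the 8 factorial (corner) points
CENTER_POINTS = 1  # center points, not replicated


def build_design(seed: int = 1) -> list[dict]:
    """Build the 8 x 3 + 1 = 25-run design with a randomized run order.

    Column names echo the Minitab worksheet (StdOrder, RunOrder, CenterPt), but the
    standard order here is simply itertools.product order, not Minitab's.
    """
    corners = list(product(*FACTORS.values()))  # the 2^3 = 8 low/high combinations
    center = tuple((lo + hi) / 2 for lo, hi in FACTORS.values())
    # CenterPt = 1 marks a factorial (corner) point, 0 marks a center point.
    points = [(c, 1) for _ in range(REPLICATES) for c in corners]
    points += [(center, 0)] * CENTER_POINTS
    runs = [
        {"StdOrder": i + 1, "CenterPt": pt, **dict(zip(FACTORS, setting))}
        for i, (setting, pt) in enumerate(points)
    ]
    run_order = list(range(1, len(runs) + 1))
    random.Random(seed).shuffle(run_order)  # randomize the execution order
    for run, order in zip(runs, run_order):
        run["RunOrder"] = order
    return sorted(runs, key=lambda r: r["RunOrder"])


design = build_design()
print(f"{len(design)} runs")
for run in design[:5]:
    print(run)
```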

    Step - 1: Create a factorial design
    [Approach: Stat --> DOE --> Factorial --> Create Factorial Design]



    The session window then displays the factors, runs, blocks, replicates and center points we selected, and the worksheet contains the StdOrder, RunOrder, CenterPt and Blocks columns along with the selected factors.

    The worksheet is depicted below:


    Step - 2: Now it's time to perform the experiments in the randomized run order that Minitab provides in the worksheet, and enter the results against the runs as shown below:
    [I've used assumed conversion values just to complete the case study]


    Step - 3: Analyzing the factorial design
    [Approach: Stat --> DOE --> Factorial --> Analyze Factorial Design]


    Below is the output after analyzing (check the session window):



    Interpretation:
    From the ANOVA (Analysis of Variance) table, the P-values for all of the factors and the interactions between factors are reported as less than 0.05, which indicates that they have a significant impact on the conversion (i.e., the output response).

    The response can be predicted based on the regression equation,

    Conversion (%) = 51.62 - 0.097 Eq. + 0.0135 Temp. - 0.424 Time + 0.05174 Eq.*Temp.
    - 0.0069 Eq.*Time + 0.00712 Temp.*Time + 0.00191 Eq.*Temp.*Time
    + 31.04 Ct Pt

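    ** Ct Pt is the center-point indicator term (1 for a center-point run, 0 for a factorial run). It is included in the model when center points are used, and its coefficient estimates curvature: how far the response at the center point departs from what the linear factorial model predicts.

    Rationale for the P-value and 0.05:
    I'll explain the rationale in my next post, which will be about hypothesis testing.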
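To use the regression equation for prediction, it can be coded directly. This sketch uses the rounded coefficients printed above, so predictions are approximate, and it should only be applied within the studied ranges. Because Ct Pt is an indicator rather than a continuous term, the model describes curvature only at the exact center point, not at intermediate settings. At the center point (Eq. = 3, 50 ℃, 6 hours, Ct Pt = 1) it gives about 92%.

```python
def predicted_conversion(eq: float, temp: float, time: float, ct_pt: int = 0) -> float:
    """Conversion (%) from the uncoded regression equation in the session window.

    ct_pt is the center-point indicator: 1 at the center point, 0 at factorial points.
    Coefficients are the rounded printed values, so results are approximate.
    """
    return (
        51.62
        - 0.097 * eq
        + 0.0135 * temp
        - 0.424 * time
        + 0.05174 * eq * temp
        - 0.0069 * eq * time
        + 0.00712 * temp * time
        + 0.00191 * eq * temp * time
        + 31.04 * ct_pt
    )


print(f"{predicted_conversion(3, 50, 6, ct_pt=1):.2f}")
```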

    Below are the graphs (Normal & Pareto)
    Normal Plot:
    Interpretation:
    On a normal plot, effects that lie away from the reference line are significant; based on this plot, it can be concluded that the interactions have a significant impact on the output response.

    Pareto Chart:
    Interpretation:
    From the Pareto chart above, it can be concluded that the factors and their interactions have a significant impact on the response, except the interaction ABC, whose bar falls short of the dotted reference line.
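The bars in the Pareto chart are based on effect estimates. For a two-level design, an effect is the average response at the +1 setting of its column minus the average at the -1 setting. The sketch below computes all seven effects of a 2^3 design using synthetic responses generated from a known model (not this study's data), so you can see the calculation recover the effects. Minitab's Pareto chart plots standardized effects against a significance reference line, which this sketch does not compute.

```python
from itertools import product
from math import prod

FACTORS = "ABC"
# 2^3 design in coded units (-1 = low, +1 = high).
design = list(product([-1, 1], repeat=3))

# Synthetic responses from a known model y = 70 + 5A + 3B + 2AB (no noise),
# used only to demonstrate the calculation; not data from this study.
y = [70 + 5 * a + 3 * b + 2 * a * b for a, b, _ in design]


def effect(term: str) -> float:
    """Effect = mean response where the term's column is +1 minus mean where it is -1."""
    idx = [FACTORS.index(f) for f in term]
    signs = [prod(run[i] for i in idx) for run in design]
    plus = [yi for yi, s in zip(y, signs) if s > 0]
    minus = [yi for yi, s in zip(y, signs) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


terms = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
effects = {t: effect(t) for t in terms}
# Rank by absolute size, as a Pareto chart does (each coefficient = effect / 2).
for t in sorted(terms, key=lambda t: abs(effects[t]), reverse=True):
    print(f"{t:>3}: {effects[t]:+.1f}")
```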

    Step - 4: Interpretation using Cube plot
    [Approach: Stat --> DOE --> Factorial --> Cube plot]

    Cube Plot:

    Interpretation:
    From the cube plot above, the mean response at each corner of the cube, i.e., at each combination of the low and high levels of the three factors, can be seen, showing how the response varies as the factors change.

    Step - 5: Analyzing through response optimizer
    [Approach: Stat --> DOE --> Factorial --> Response optimizer]

    Enter the target value for conversion as shown below:

    Interpretation: 
    From the response optimizer above, the best solution for a target of 95% conversion has a desirability of only 0.4000, i.e., 40%.
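For reference, this is a sketch of the standard Derringer-Suich desirability functions, on which response optimizers of this kind are commonly based; I'm assuming, not quoting, Minitab's exact implementation. The bounds are hypothetical, since the optimizer settings used above are not listed.

```python
def desirability_target(y: float, low: float, target: float, high: float,
                        weight: float = 1.0) -> float:
    """Target-is-best desirability (assumes low < target < high):
    1 at the target, falling to 0 at the low/high bounds."""
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** weight
    return ((high - y) / (high - target)) ** weight


def desirability_maximize(y: float, low: float, target: float, weight: float = 1.0) -> float:
    """Larger-is-better desirability: 0 at or below `low`, 1 at or above `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** weight


# Hypothetical bounds for a conversion (%) response, for illustration only.
print(desirability_target(85.0, low=70.0, target=95.0, high=100.0))
print(desirability_maximize(92.0, low=70.0, target=95.0))
```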

    Now I've changed the target from 95 to 90; check the interpretation:

    Interpretation: From the response above, it can be concluded that 90% conversion can be achieved at Eq. = 5.0, Temperature = 80 ℃ and Time = 10 hours.


    Now let's maximize the conversion as far as possible with a high desirability.

    Interpretation:
    From the response optimizer above, it can be concluded that the maximum achievable conversion is
    92%, at Eq. = 3, Temperature = 50 ℃ and Time = 6 hours (the center point of the design).

    Note: Desirability (from 0 to 1) measures how well the predicted response meets the goal you set; the closer it is to 1, the closer the predicted response is to the goal.



    About The Author


    Hi! I am Ajay Kumar Kalva, currently serving as the CEO of this site, a tech geek by passion and a chemical process engineer by profession. I'm interested in writing articles about technology, hacking and pharma technology.
    Follow Me on Twitter AjaySpectator & Computer Innovations
