Research Activities

Multivariate Statistical Process Monitoring

Statistical methods for detecting changes in industrial processes belong to a field generally known as statistical process control (SPC) or statistical quality control (SQC). The most widely used SPC techniques are univariate: they observe a single variable at a time, along with statistics such as the mean and variance derived from that variable. Univariate methods are proven, simple, and easy to implement, which is the primary reason for their widespread popularity. However, they are not without shortcomings.

While a univariate approach may work well for monitoring a small number of process variables, current data acquisition hardware allows a large number of variables to be measured easily, and applying univariate methods to such multivariable systems becomes difficult, if not impossible. This simplified approach to process monitoring requires an operator to continuously watch perhaps dozens of separate univariate charts, which substantially reduces the operator's ability to make accurate assessments of the state of the process.

Multivariable statistical process monitoring (MSPM) techniques have been developed to reduce the burden of manually monitoring a large number of process variables as well as to provide more robust monitoring methods. MSPM methods are typically concerned with the directionality of data in a multivariable space, as opposed to univariate methods, which only monitor the magnitude and variation of single variables. MSPM techniques reduce the amount of raw data presented to an operator and provide a concise set of statistics that describe the process behavior. A balance between MSPM and univariate SPC tools is therefore required to extract the most useful information from a process.

Data-driven MSPM methods capable of handling correlated data include Principal Component Analysis (PCA), Projection to Latent Structures (PLS), and Canonical Variate and Subspace State Space modeling. Data-driven techniques depend on data collected from a real process to formulate a model that describes the variability of that process; this approach is referred to as system identification. Conversely, model-based techniques depend on detailed physical models of the system of interest. With either approach, the model may be used to predict future values of the monitored process variables. Many systems of interest in chemical engineering exhibit non-linear behavior, so linear methods yield meaningful results only insofar as they closely approximate the non-linear behavior of the system. Once a good model of the process is obtained, one-step-ahead predictions of the process variables may be used to monitor the process. Assuming that the model accurately describes the normal operating conditions (NOC) of the process, one may examine the difference between the one-step-ahead model prediction and the actual value once it becomes available; this difference is referred to as the model residual. Residuals may be tested for statistical significance, and significant deviations suggest the presence of abnormal conditions in the process.
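As a rough illustration of the PCA-based monitoring idea, the sketch below fits a PCA model to data collected under normal operating conditions and computes the two statistics commonly paired with it, Hotelling's T^2 and the squared prediction error (SPE, or Q). The data, the number of retained components, and the absence of proper control limits are all simplifications for illustration.

```python
# Minimal sketch of PCA-based MSPM (hypothetical data; a real application
# would derive control limits, e.g. from F and chi-squared distributions).
import numpy as np

rng = np.random.default_rng(0)

# Training data collected under normal operating conditions (NOC):
# 500 samples of 6 correlated process variables.
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))

# Center/scale, then fit PCA via SVD and keep k principal components.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
k = 3
P = Vt[:k].T                       # loadings (6 x k)
lam = (s[:k] ** 2) / (len(X) - 1)  # variances of the retained scores

def monitor(x):
    """Return (T2, SPE) statistics for one new observation x."""
    xs = (x - mu) / sigma
    t = P.T @ xs                   # scores in the model plane
    T2 = np.sum(t**2 / lam)        # Hotelling's T^2 (variation in-plane)
    resid = xs - P @ t             # residual off the model plane
    SPE = resid @ resid            # squared prediction error (Q statistic)
    return T2, SPE

print(monitor(X[0]))
```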

In addition to measuring the magnitude of the residuals, one may also examine trends in the residuals. Sensor errors in the form of a bias change, drift, or excessive noise may be diagnosed by examining the residuals over time. Gross sensor failures are easy to recognize using univariate methods and redundant sensor measurements. Slow drifts or failures in one or more sensors, however, are exceedingly difficult to detect via univariate techniques. Periodically performing sensor audits, that is, examining the residuals for statistically significant changes, can greatly enhance the credibility of the data as well as significantly reduce the number of false alarms.
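One standard way to flag slow drifts of this kind is a cumulative sum (CUSUM) test on the residuals. The sketch below is a minimal version with arbitrary slack and threshold parameters, not the specific audit procedure described above; in practice these parameters would be tuned to the residual variance and the desired false-alarm rate.

```python
# Minimal sketch of a sensor audit via a two-sided CUSUM on model residuals
# (slack k and threshold h are illustrative choices).
import numpy as np

def cusum(residuals, k=0.5, h=5.0):
    """Return the first index at which a persistent shift is flagged, else None."""
    s_hi = s_lo = 0.0
    for i, r in enumerate(residuals):
        s_hi = max(0.0, s_hi + r - k)   # accumulates positive drift
        s_lo = max(0.0, s_lo - r - k)   # accumulates negative drift
        if s_hi > h or s_lo > h:
            return i
    return None

rng = np.random.default_rng(1)
r = rng.normal(0, 1, 300)
r[150:] += 0.8                           # slow bias change at sample 150
print(cusum(r))                          # flags shortly after the drift begins
```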

Cellular Automata

The modeling of many physical systems is typically performed using a set of assumptions that simplify model development and numerical computation. One common assumption is that the physical space of the model is uniform, so that spatial inhomogeneities may be neglected. This so-called mean field approach is quite useful, for example, in the modeling of some continuous stirred tank reactors (CSTRs), where the physical space may in fact be homogeneous and the analytical solution of the ordinary differential equations (ODEs) provides an accurate model.
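As a concrete instance of the mean-field approach, the sketch below integrates the single ODE material balance for a first-order reaction A -> B in a well-mixed CSTR; all parameter values are invented for illustration.

```python
# Minimal mean-field CSTR sketch: one ODE for a first-order reaction
# A -> B in a well-mixed tank (parameter values are illustrative).
import numpy as np
from scipy.integrate import solve_ivp

q, V, k, Ca_in = 1.0, 10.0, 0.3, 2.0   # flow, volume, rate const., feed conc.

def balance(t, y):
    Ca = y[0]
    # Accumulation = inflow - outflow - consumption by reaction
    return [(q / V) * (Ca_in - Ca) - k * Ca]

sol = solve_ivp(balance, (0.0, 50.0), [0.0], t_eval=np.linspace(0, 50, 11))
print(sol.y[0])   # Ca(t) relaxing to its steady state q*Ca_in/(q + k*V) = 0.5
```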

However, when the system is not homogeneous, this assumption often yields a model that does not accurately represent the system; this occurs, for example, in CSTRs with a highly viscous medium, where spatial heterogeneities exist in species concentrations, temperature, and other quantities. Of course, the application of partial differential equations (PDEs) to model spatial inhomogeneities such as diffusion and hydrodynamic turbulence may produce accurate models, although their solution requires advanced numerical techniques such as orthogonal collocation or finite element methods. Additionally, the numerical techniques for solving the PDEs are often computationally expensive and do not account for localized stochastic phenomena.

Cellular automata are an attractive alternative to PDEs for modeling complex systems with inhomogeneous physical spaces. A cellular automaton lattice is composed of discrete cells whose states are functions of the previous state of the cell and its neighbors. Rules are used to update each cell by scanning the values of the cells in the neighborhood. The von Neumann neighborhood (named after John von Neumann, who is considered the pioneer of CA) consists of the four nearest neighboring cells. Traditionally, the directions North, East, South, and West are given to the respective neighboring cells, with North as the top cell. The Moore neighborhood adds the four second-nearest neighbors NE, SE, SW, and NW.
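The sketch below performs one synchronous update of a two-dimensional lattice using the von Neumann neighborhood with periodic boundaries. The particular rule (a cell turns on when at least two of its four neighbors are on) is an arbitrary example, not a rule taken from this work.

```python
# Minimal sketch of one synchronous cellular-automaton update on a 2-D
# lattice with periodic boundaries, using the von Neumann neighborhood.
import numpy as np

def step(grid):
    n = np.roll(grid, 1, axis=0)   # North neighbor of each cell
    s = np.roll(grid, -1, axis=0)  # South
    w = np.roll(grid, 1, axis=1)   # West
    e = np.roll(grid, -1, axis=1)  # East
    # Illustrative rule: a cell becomes 1 if >= 2 of its 4 neighbors are 1.
    return ((n + e + s + w) >= 2).astype(grid.dtype)

rng = np.random.default_rng(2)
grid = (rng.random((8, 8)) < 0.4).astype(np.int8)
print(step(grid))
```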

Multivariable Batch Process Modeling, Monitoring, and Fault Diagnosis

Batch and semibatch reactors are frequently used in the pharmaceutical production industry. The characterization of successful batch production implies that a prescribed sequence of process events was executed under quality constraints. Batches vary as a result of disturbances, a lack of on-line quality measurements, acceptable recipe changes in normal operation, and varying batch run lengths. The traditional techniques used in process monitoring apply a combination of mathematical models (Kalman filters) and knowledge-based models (process evaluators based upon statistically framed models) with varied success in an on-line framework.

The use of nonlinear models allows one to determine product quality during a particular run, and, if sampling the product throughout the course of the reaction is feasible, the quality assessment may indicate how to change the process to regain control. This requires a general differential model relating the values and trends of the manipulated inputs to the product quality variables, so that when the monitored inputs fall outside normal behavior, the product quality variables (if they could be measured) would confirm the observation of abnormal behavior.

The specific purpose of this research is to investigate the merits of utilizing a differential technique and hidden Markov models (HMMs) in monitoring and analyzing the operation of a modeled penicillin fermentor. Differential examination is a statistical tool in which a set of mathematical basis functions approximates the differential behavior contained within replicates of sequential observations. This has an important link to principal components analysis (PCA), and its use enhances control chart performance in future data reconstruction. In conjunction with hierarchical PCA (HPCA), in which the complete data trajectory is not required for classifying the overall status of the current batch run, differential techniques allow for effective process characterization with a few user-friendly monitoring charts. Partial Least Squares (PLS) in conjunction with an extended Kalman filter (EKF) framework allows final product quality estimates to be made while accounting for optimal control moves. Applying differential estimation techniques to the PLS/EKF framework will further improve final product quality prediction.
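As a minimal illustration of the PLS portion of this framework, the sketch below fits a PLS regression from hypothetical unfolded batch trajectory data to end-of-batch quality variables. The data shapes and component count are invented, and the EKF and differential-estimation layers are omitted.

```python
# Minimal sketch of a PLS quality-prediction model (hypothetical data; in
# batch monitoring, X would hold unfolded trajectory measurements and
# Y the end-of-batch quality variables).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 20))             # 60 batches x 20 trajectory features
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(60, 2))

pls = PLSRegression(n_components=3).fit(X, Y)
print(pls.predict(X[:1]))                 # predicted quality for one batch
```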

Dynamic time warping (DTW) is a fault diagnosis tool that is also used to align similar events within batch trajectories; the more flexible hidden Markov modeling will be implemented as well. Hidden Markov modeling is a stochastic tool in which a series of observations is approximated as a sequence of chain events linked by transition probabilities. Optimal curve registration will be applied to illustrate the added benefit of applying PDA to align batch processes with multiple phases. This system of tools can be used to form a supervisory framework that would operate on-line for fault detection, diagnosis, and overall supervisory control of a batch system.
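The sketch below shows the core DTW dynamic-programming recursion used to align two batch trajectories that contain the same events on different time bases; the profiles are synthetic stand-ins for real batch data.

```python
# Minimal sketch of dynamic time warping between two 1-D batch
# trajectories via the classic dynamic-programming recursion.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

t = np.linspace(0, 1, 50)
ref = np.sin(2 * np.pi * t)            # reference batch profile
new = np.sin(2 * np.pi * t**1.3)       # same events, shifted in time
print(dtw_distance(ref, new))          # small despite the misalignment
```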

Knowledge-Based Systems

Knowledge-based systems (KBS) or expert systems are computer systems designed to emulate the decision-making capabilities and knowledge of a human expert in a specific field. A KBS consists of a knowledge base, decision rules, and an inference engine. The knowledge base is comprised of a set of data or facts pertaining to a specific process. Decision or production rules are typically of the IF/THEN variety and operate on data acquired from the process and on the knowledge base.
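A minimal sketch of how these three pieces interact is given below: a set of facts (the knowledge base), IF/THEN production rules, and a forward-chaining loop standing in for the inference engine. The facts and rules are invented purely for illustration.

```python
# Minimal sketch of forward chaining over IF/THEN production rules
# (the facts and rules are hypothetical examples).
facts = {"temperature_high", "pressure_rising"}

# Each rule: (frozenset of antecedent facts, consequent fact)
rules = [
    (frozenset({"temperature_high", "pressure_rising"}), "relief_valve_check"),
    (frozenset({"relief_valve_check"}), "alert_operator"),
]

# Fire rules repeatedly until no new facts are inferred.
changed = True
while changed:
    changed = False
    for antecedents, consequent in rules:
        if antecedents <= facts and consequent not in facts:
            facts.add(consequent)
            changed = True

print(facts)   # now includes the inferred consequents
```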

KBS are useful for solving problems that would otherwise require a human expert or that are repetitive in nature. Applications in chemical engineering include diagnosis, monitoring, alarm handling, prediction, and process control. The advantage over human experts is access to very large databases and fast execution: while a human expert may need to search volumes of printed text for a piece of information, a KBS can search electronic databases quickly.

Knowledge-based systems are developed using languages capable of symbolic processing, such as LISP and PROLOG, or more specialized KBS development software such as G2. The advantage of these tools over languages commonly used in engineering, such as FORTRAN, is the ease of programming knowledge qualitatively. Additionally, code written by one programmer may be easily understood by another without significant knowledge of the language. FORTRAN or C code, by contrast, is written in a style unlike that used by humans to communicate, making it very difficult for one person to understand code written by another programmer. The main disadvantage of software for qualitative reasoning is that it is significantly slower than software designed for scientific computing. While there are a large number of available software packages for advanced numerical analysis, the great majority are not written in a way that is compatible with software for knowledge-based systems.

Nonlinear Empirical Dynamic Modeling

The focus of this work is developing practical nonlinear system identification tools for application in process monitoring and control. The technique should require little prior knowledge and perform well for multi-step-ahead prediction. Currently we are focusing on extensions of so-called subspace linear state-space identification techniques. One such subspace technique is canonical variate analysis (CVA). As an extension to nonlinear modeling, we have proposed using a nonlinear CVA to find optimal transformations of past regressors for predicting future outputs.
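For the linear case, the core CVA step can be sketched as follows: stack past and future windows of the data, then take an SVD of the scaled cross-covariance to obtain the directions of the past that best predict the future. The scalar AR(2) series and window lengths below are illustrative stand-ins; a real application would use multivariable input/output data.

```python
# Minimal sketch of the linear CVA step: directions of the past that are
# maximally correlated with the future, via SVD of the scaled
# cross-covariance (illustrative lags and data).
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(size=1000)
for i in range(2, 1000):                  # simple AR(2) series as stand-in data
    y[i] += 1.2 * y[i - 1] - 0.5 * y[i - 2]

p = f = 5                                 # past and future window lengths
H = np.array([y[i : i + p + f] for i in range(len(y) - p - f)])
past, fut = H[:, :p], H[:, p:]

Spp = np.cov(past, rowvar=False)
Sff = np.cov(fut, rowvar=False)
Spf = (past - past.mean(0)).T @ (fut - fut.mean(0)) / (len(H) - 1)

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

U, s, Vt = np.linalg.svd(inv_sqrt(Spp) @ Spf @ inv_sqrt(Sff))
J = U.T @ inv_sqrt(Spp)                   # canonical transformation of the past
print(s[:3])                              # leading canonical correlations
```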

This technique has been successfully applied to nonlinear systems, including a CSTR with output multiplicities. Currently we are investigating input design. Pseudo-random binary noise (PRBN) sequences are not adequate for extracting sufficiently rich information for modeling, especially in systems exhibiting output multiplicities. Each model is analyzed by comparing the actual steady-state behavior of the system to that of the model.
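To illustrate why binary signals can fall short, the sketch below contrasts a PRBN-style two-level input with a multi-level random sequence, one common alternative in input design. The particular levels, hold time, and sequence length are arbitrary choices, not a prescription from this work.

```python
# Minimal sketch contrasting a binary (PRBN-style) input with a multi-level
# random input; a binary signal visits only two input levels, which can
# miss steady-state branches in systems with output multiplicities.
import numpy as np

rng = np.random.default_rng(5)

def held_sequence(levels, n_switches, hold):
    """Random level per interval, each held for `hold` samples."""
    picks = rng.choice(levels, size=n_switches)
    return np.repeat(picks, hold)

binary = held_sequence([-1.0, 1.0], 50, 10)                    # two levels only
multilevel = held_sequence([-1.0, -0.5, 0.0, 0.5, 1.0], 50, 10)
print(np.unique(binary), np.unique(multilevel))
```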

Metabolic Pathways Model of Human Liver

The liver has a critical role in the human body, being responsible for fuel management, nitrogen excretion, regulation of water distribution between the blood and tissues, and detoxification of foreign substances. An understanding of the mechanisms involved would therefore be of interest to both the engineering and medical communities. In this study, we consider a metabolic pathways description of the chemical reactions inside the liver involved primarily in energy metabolism, and aim to build an understanding of the observed phenomena with a bottom-up approach.

We first chart the metabolic pathways to describe 54 metabolites involved in 47 reactions. Next, assuming that the reactions are elementary, we associate a rate term with each metabolite and write material balance equations. Then we test the ability of the resulting model to explain observed phenomena in different physiological states.
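A toy version of this construction, for a two-step pathway A -> B -> C with elementary mass-action kinetics, is sketched below. The rate constants and initial concentrations are invented, and the real model involves 54 metabolites and 47 reactions rather than this three-species stand-in.

```python
# Minimal sketch of mass-action material balances for a toy pathway
# A -> B -> C (all rate constants and initial concentrations illustrative).
import numpy as np
from scipy.integrate import solve_ivp

k1, k2 = 0.8, 0.3   # elementary rate constants

def balances(t, c):
    A, B, C = c
    r1, r2 = k1 * A, k2 * B            # elementary (first-order) rates
    return [-r1, r1 - r2, r2]          # dA/dt, dB/dt, dC/dt

sol = solve_ivp(balances, (0, 20), [1.0, 0.0, 0.0],
                t_eval=np.linspace(0, 20, 5))
print(sol.y)                           # concentrations of A, B, C over time
```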

The concentration of metabolites in different organs fluctuates continually as a result of changing internal and external effects. Hence a regulatory mechanism in the liver acts to coordinate supply and demand from moment to moment. Accordingly, different configurations of pathway activity result. For example, when there is a large demand for glucose in the body, glycogenolysis and gluconeogenesis are stimulated. When the body is at rest and the glucose supply is large, the pathways of glycolysis and fatty acid synthesis are active. Our model is intended to behave in a similar fashion in response to changing physiological conditions.