Program LND for log-normal decomposition of particle size distributions

Miroslaw Jonasz

Introduction

This program decomposes a frequency particle size distribution into a sum of log-normal functions (components). It does this by scanning a particle size distribution data set with a window of variable width, looking for log-normal components of the distribution at each possible location of the window. Once the components are found and their coefficients are determined, the user can select the best set of coefficients manually from the program output file or use the companion LNDFST program for automated selection and listing of the coefficients. The latter option is especially advantageous when processing a large set of size distributions. The companion program LNDVIEW makes it easy to select a set of components, as well as to view and print a graph of the log-normal components of a size distribution.

The original version of this program was used in research described by Jonasz and Fournier (1996). See also Jonasz and Fournier (1999) for an erratum and additional results.

This program is intended to be used in the Windows environment. A screen resolution of 1024 x 768 is expected.

The LND program can be modified to allow decomposition of a size distribution or a similar data set into components other than log-normal. Please contact MJC if you need such modifications.

Quick start

The shipment disk/package contains particle size distribution data files in the Data directory, so that you can run and experiment with the LND program (in the LND directory) right away in both the batch and single-file modes of operation. These data files may include:
The package also contains the Results directory with LND files (results files) corresponding to these data files. If you received a zipped package, the default directory structure is created automatically and the files are copied to the correct folders when the package is unzipped (provided that the default directory structure encoded in the package does not conflict with the directory structure existing on your computer). Please note that this directory structure is set up merely for convenience and is not a prerequisite for running the LND program:
To run the LND program in the batch mode, locate the "Run" group in the main form of the program and
To run the LND program in the single-file mode, click the "Single file" button in the "Run" group and select a PSD file.

User interface

This section is intended only for a brief discussion of the user interface itself. Please refer to the remaining part of the document for terms that are not explained in this section.

"Scan window parameters" group

This parameter group defines the data scan window in terms of the number of data points, where each point is defined by the particle diameter D and FD (the frequency size distribution). The two parameters of this group define the range of start widths of the scan window that the LND algorithm uses to find log-normal components of the size distribution.

"Initial start width" edit

"Final start width"

"View" group

"Input" check box

"Results" check box

"Run" group

This parameter group enables one to select the processing mode: batch or single file.

Batch mode
Results are saved automatically (if possible) in this mode. The names of the results files (LND files) are auto-generated by the program. In the batch mode, two options are available: "Use all files" and "Select files". These options are selected by clicking the "Use all files" or "Select files" radio button, respectively.

"Use all files"

"Select files"

The single-file mode

Error log

If log-normal components cannot be generated (bad data file, no components found, or any other reason), the relevant data files are listed in the error log window along with the potential error source in the following format:

input file name > error message

No LND file is created for an input data file that caused errors.

"Processing file" info group

This info group displays:
Miscellaneous items

"Help" button

"About" button

Log-normal decomposition (LND)

The LND algorithm

The program decomposes a frequency particle size distribution FD(D), represented by a data array [D, FD(D)], into a sum of log-normal components, each represented by a parabola in the log-log scale:

$$\log FD(D) = b_0 + b_1 \log D + b_2 (\log D)^2 \qquad (1)$$

The coefficients b0, b1, and b2 of each component are determined through an iterative application of the least-squares procedure. A detailed description of the fitting algorithm can be found in Jonasz and Fournier (1996).

The frequency size distribution, FD(D) [cm⁻³ µm⁻¹], is defined as follows:

$$FD(D) = \frac{dN}{dD} \qquad (2)$$

where dN is the number concentration of particles with "diameters" in a range of D to D + dD. FD is the derivative of the cumulative size distribution [cm⁻³] and describes, as a function of the particle size, the number concentration of particles per unit size range.

In short, the fit is performed in two steps as follows:
Apart from the average fit, a set of log-normal components is thus produced for each start width of the scan window. This width is represented by the minpts parameter in the program code. A log-normal component is considered to exist if the coefficient b2 at the second power of log D is negative (the parabola has a maximum). Beginning with version 2.02, this condition applies only to the fits of the log-normal components, not to the average fit, where b2 can assume any value.

The minimum width of a detectable log-normal component, i.e. the initial start width of the scan window, is set in the "Initial start width" edit. The default initial minimum width is 4 data points, and the maximum initial width is always set to the minimum width plus 8.

For each value of the minimum scan window width, a set of components may be discovered by the program. Such a process is termed a scan series. A scan series produces components with the "characteristic width" ranging from that corresponding to the start width of the scan window for that series to that corresponding to the final start window width (by default, the start width plus 8 points). Thus, for example, a scan series for the start window width of 5 data points may contain components with widths of 5 to 13 points. After the scan series is finished, the start window width is incremented and another scan series begins, with the scan window width ranging from the new initial start width to the number of data points. The last scan series begins with the start window width set to the final start width.

Several components may be found during a scan series. The program selects the component that produced the lowest approximation error to the data within its window (whose position and width may be different for each component found during the scan series). Once a component is found, it is subtracted from the current version of the input data (i.e. the particle size distribution), and the difference becomes the new version of the size distribution, presumed to contain other log-normal components whose presence was "obscured" by the dominant component that has just been identified. This new version of the size distribution is then processed in the same way as the previous version. Only those points of the difference are included in the new version of the particle size distribution whose FD value is greater than the old data point value times a factor named fract in the program code (with a value of 1E-10).

The initial start width of the scan window determines the characteristic width of the log-normal components that can be detected. For example, if the size distribution contains a component with a size scale of 4 data points at your size grid, this component will most likely go undetected in scans beginning with a window width > 4. The absolute meaningful minimum scan window width is 3 points, i.e. the number of points required to fit a parabola (a log-normal function in the log-log scale) to data. Too small a value of the initial scan window width might result in fitting the "noise". Too large a value might miss some small-width components.

Following the completion of a scan series, the initial start window width is incremented and a new scan series begins. For all values of the initial scan window width from the range defined in the "Scan window parameters" group, one obtains a set of scan series, each with a characteristic error of the fit by the sum of the components found during that scan series. The fit that yields the lowest error can now be selected. Such a selection can be performed manually or by a program. If you wish to obtain such a program (LNDFST), which does that in a batch mode, please contact MJC at the address listed at the end of this text.

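A single scan-and-subtract step of the kind described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code of the LND program: base-10 logarithms, the mean squared log-residual as the window error, and the reading of the fract rule as "difference > fract * old value" are all assumptions, and the function names fit_parabola, best_component, and subtract_component are hypothetical.

```python
import numpy as np


def fit_parabola(log_d, log_fd):
    """Least-squares fit of log FD = b0 + b1*log D + b2*(log D)**2."""
    b2, b1, b0 = np.polyfit(log_d, log_fd, 2)  # polyfit lists the highest power first
    return b0, b1, b2


def best_component(d, fd, start_width=4, max_width=None):
    """Scan all windows of start_width..max_width points and return
    (error, (b0, b1, b2), (lo, hi)) for the lowest-error window whose
    parabola has a maximum (b2 < 0), or None if no such window exists."""
    log_d = np.log10(np.asarray(d, dtype=float))
    log_fd = np.log10(np.asarray(fd, dtype=float))  # assumes all FD values > 0
    n = len(log_d)
    max_width = n if max_width is None else min(max_width, n)
    best = None
    for width in range(start_width, max_width + 1):
        for lo in range(n - width + 1):
            hi = lo + width
            x, y = log_d[lo:hi], log_fd[lo:hi]
            b0, b1, b2 = fit_parabola(x, y)
            if b2 >= 0:  # no maximum: not a log-normal component
                continue
            # Assumed error measure: mean squared residual in the log-log scale
            err = float(np.mean((y - (b0 + b1 * x + b2 * x**2)) ** 2))
            if best is None or err < best[0]:
                best = (err, (b0, b1, b2), (lo, hi))
    return best


def subtract_component(d, fd, coeffs, fract=1e-10):
    """Subtract a component and keep only the points where the difference
    exceeds fract times the old FD value (an assumed reading of the rule)."""
    d = np.asarray(d, dtype=float)
    fd = np.asarray(fd, dtype=float)
    b0, b1, b2 = coeffs
    log_d = np.log10(d)
    component = 10.0 ** (b0 + b1 * log_d + b2 * log_d**2)
    diff = fd - component
    keep = diff > fract * fd
    return d[keep], diff[keep]
```

Repeating best_component and subtract_component on the shrinking data set until no component is found would give one scan series for a given start width; a full run would then repeat this for each start width in the configured range.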
Particle size scale and the component width

It should be stressed that the scan window width settings may affect the decomposition of a size distribution into log-normal components. The default values of these settings were arrived at by trial and error during the analysis of size distributions from a database that we compiled. That database contained mostly size distributions specified at a size grid equidistant in the log-size scale. This size grid of the particle diameter, D, can be defined as follows:

$$D_i = D_0 \, a^i, \qquad i = 0, 1, 2, \ldots$$

where the constant a is on the order of 2^(1/3) (approximately 1.26). In this case, we have, for example:

D0 = 2 micrometers (µm)
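The first few diameters of such a grid can be tabulated with a short sketch, assuming the grid has the form D_i = D0 * a**i with a = 2**(1/3) and D0 = 2 µm (the function name size_grid is hypothetical):

```python
def size_grid(d0, a, n):
    """Return the first n diameters of a grid equidistant in log D."""
    return [d0 * a**i for i in range(n)]


grid = size_grid(d0=2.0, a=2 ** (1 / 3), n=7)
print([round(d, 2) for d in grid])
# prints [2.0, 2.52, 3.17, 4.0, 5.04, 6.35, 8.0]
```

With this value of a, the diameter doubles every three grid points, so the default start width of 4 data points covers one doubling of the diameter.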
This particle size scale has been frequently used in environmental sciences, such as oceanography (for example, Sheldon et al. 1972), where the particle size range typically spans several decades. Measuring the window width in points rather than in length units (for example, micrometers) was chosen for simplicity, which is generally justified by the presentation conventions of marine size distributions. If your data are denser than the size grid just discussed implies, you need to select a minimum window width that reflects your diameter data spacing. For example, if the size increment is 0.1 µm, 12 data points span a size range of 1.2 µm. Thus, if the expected minimum component width is about 1 µm, then the initial start window width in data points should be about 8 instead of 4 (the default value).

Statistical significance of components

The LND program performs a statistical test (the Fisher test; see, for example, Hudson 1964) of the significance of the various components within each scan group. The Fisher test examines the equivalence of variances, here represented by the approximation errors, expressed by the sum of squares of residuals:

where N is the data count, and FDapprox,i is the sum of log-normal components. This is an approximation of the "correct" expression for the error:

that is implied by the way the fit is calculated (by using the log-log transform of the original data set). Approximation (3) is valid for small values of the error, in which we are interested: if the error is large, we are not going to consider the fit anyway. The approximation is used for calculation speed and also to avoid a potential singularity problem with taking the logarithm of FDapprox when the latter evaluates to a machine zero.

The significance of the components versus each other and versus the average fit is found for a scan series as follows. At first, the components are ordered according to their ability to remove the approximation error as follows:
The significant components are chosen by successive applications of the Fisher test to approximations of the log-transformed data by cumulative subsets of components within a scan series: first to the approximation by component 1 vs. the approximation by components 1 and 2, then to the approximation by components 1 and 2 vs. the approximation by components 1, 2, and 3, and so on. Finally, the significance of the best selection of components is tested against the average fit by using the Fisher test again. These tests are performed by using the data and components expressed in the log-log scale. The significance of a set of components for a scan series is not evaluated against the significance of components for another scan series.

Sample results

Fig. 1 shows a sample log-normal decomposition of a frequency particle size distribution.
Fig. 1. An example of the decomposition of a frequency particle size distribution into a set of log-normal components for data from file Kraatl8601.psd (the Northwestern Atlantic, data kindly provided by Dr. K. Kranck and Dr. T. Milligan, Bedford Institute of Oceanography, Canada).

The file system

PSD (input) files

The LND program works in the single-file and batch modes. In the single-file mode, the user supplies the name of the PSD data file, selects the diameter and PSD columns in that data file and the manner in which the input data are read, and supplies the name of the results file. In the batch mode, the user only selects the files to be processed or elects to process all files in a directory, and the program decides how to obtain the data from these files: empty lines, and lines containing text that cannot be converted to columns of numbers, are ignored. This is also the default setting for the single-file mode.

The input data must be available as space-, tab-, or comma-delimited text files (PSD files; extensions "psd" and "txt" in the single-file mode, extension "psd" in the batch mode). Each PSD file is expected to contain frequency particle size distribution data. The program does not check whether the data provided in a PSD file are indeed those of such a distribution.

The structure of a PSD file is as follows:

D FD

where D is the particle diameter and FD is the frequency particle size distribution. Several size distributions can be included in the same data file, all sharing the same particle diameter grid (the 1st column of the file). The units are irrelevant; however, for proper viewing with a companion spreadsheet program (LNDVIEW), the units should be as follows:

particle "diameter", D, unit = µm
particle size distribution, FD, unit = cm⁻³ µm⁻¹

Note that 1 µm = 10⁻⁶ m.

A sample PSD file (JONATL7821.PSD):

[beginning of file]

LND (results) files

The output (i.e. LND) results can be stored to text files with the following structure:

Start section

Components' parameters' section

where the first index in the two-dimensional arrays denotes the log-normal parameter number (0, 1, and 2), and:
The components' parameters' section is repeated for each scan series. Note that if the average fit is, according to the Fisher test, statistically equivalent to a fit by components, then the "simpler" average fit is listed in the components' parameters' section.

End section

where
In the single-file mode, the name of a results file is set by the user, although the program proposes a default file name according to a rule applicable to the batch mode. In the batch mode, the names of the results files are generated by the program as follows:
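Judging from the example given next, the auto-generated name appears to combine the base name of the input file, an underscore, the number of the FD column (counting the diameter column as column 1), and the "lnd" extension. A minimal sketch under that assumption (the helper name lnd_results_name is hypothetical and is not part of the LND program):

```python
from pathlib import PureWindowsPath


def lnd_results_name(psd_path, fd_column):
    """Build a results file name from a PSD path and a 1-based FD column number."""
    path = PureWindowsPath(psd_path)
    # Keep the directory, replace "<name>.psd" with "<name>_<column>.lnd"
    return str(path.with_name(f"{path.stem}_{fd_column}.lnd"))
```

PureWindowsPath is used so that the Windows-style path is parsed the same way on any operating system.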
For example, if the original file name is c:\lnd\data\jonbal7821.psd and the FD data are listed in the 2nd column of that file, then the LND results are stored in c:\lnd\data\jonbal7821_2.lnd

References

Hudson D. 1964. Statistics for physicists. Geneva.

Jonasz M., Fournier G. 1996. Approximation of the size distribution of marine particles by a sum of log-normal functions. Limnol. Oceanogr. 41: 744-754.

Jonasz M., Fournier G. F. 1999. Approximation of the size distribution of marine particles by a sum of log-normal functions (Errata: Corrections and additional results). Limnol. Oceanogr. 44: 1358-1358.

Sheldon R. W., Prakash A., Sutcliffe W. H. Jr. 1972. The size distribution of particles in the ocean. Limnol. Oceanogr. 17: 327-340.

Contact info for comments and questions

Please direct your comments and questions regarding this software, as well as questions on other MJC Optical Technology software and services, to:

Dr. Miroslaw Jonasz

Disclaimer

The information contained in this document is believed to be accurate. However, neither the author nor MJC Optical Technology guarantees the accuracy or completeness of this information, and neither the author nor MJC Optical Technology assumes responsibility for any omissions or errors, or for damages which may result from using or misusing this information.

Copyright 2000 MJC Optical Technology. All rights reserved.