Extended Histopathology in Immunotoxicity Testing: Interlaboratory Validation Studies

D. R. Germolec*,1, A. Nyska†, M. Kashon‡, C. F. Kuper§, C. Portier, C. Kommineni||, K. A. Johnson||| and M. I. Luster||||

* Laboratory of Molecular Toxicology/National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina (RTP, NC); † Laboratory of Experimental Pathology, National Institute of Environmental Health Sciences, RTP, NC; ‡ Biostatistics Branch, National Institute for Occupational Safety and Health, Morgantown, West Virginia; § TNO Nutrition and Food Research, Zeist, The Netherlands; Laboratory of Computational Biology and Risk Assessment, National Institute of Environmental Health Sciences, RTP, NC; || Pathology and Physiology Research Branch, National Institute for Occupational Safety and Health, Morgantown, West Virginia; ||| Toxicology and Environmental Research and Consulting, The Dow Chemical Company, Midland, Michigan; and |||| Toxicology and Molecular Biology Branch, National Institute for Occupational Safety and Health, Morgantown, West Virginia

Received September 22, 2003; accepted November 28, 2003


    ABSTRACT
 
There has been considerable interest in the use of expanded histopathology as a primary screen for immunotoxicity assessment. To determine the utility of a semiquantitative histopathology approach for examining specific structural and architectural changes in lymphoid tissues, a validation effort was initiated. This study addresses the interlaboratory reproducibility of extended histopathology, using tissues from studies of ten test chemicals and both negative and positive controls from the National Toxicology Program's immunotoxicology testing program. We examined the consistency among experienced toxicologic pathologists, who had varied expertise in immunohistopathology, in identifying lesions in immune tissues, and the sensitivity of the individual and combined histopathological endpoints to detect chemical effects and dose response. Factor analysis was used to estimate the association of each pathologist with a so-called "common factor," and analysis-of-variance methods were used to evaluate biases. Agreement between pathologists was highest in the thymus, in particular when evaluating cortical cellularity; good in spleen follicular cellularity and in spleen and lymph node germinal center development; and poorest in spleen red-pulp changes. In addition, the ability to identify histopathological change in lymphoid tissues was dependent upon the experience/training that the individual pathologist possessed in examining lymphoid tissue and the apparent severity of the specific lesion.

Key Words: immunology; pathology; spleen; thymus; lymph node; histopathology; immunopathology; risk assessment.


    INTRODUCTION
 
The identification of chemicals that have the potential to cause injury to the immune system is of considerable public health significance, as alterations in immune function can lead to increased incidence of hypersensitivity disorders, autoimmune or infectious diseases, or neoplasias. Experimental animal data collected over the past 20 years, using standardized testing panels, have provided a database from which the sensitivity and predictability of a variety of immune function and host resistance tests commonly used for the screening of chemicals for immunotoxicity have been evaluated (Luster et al., 1988, 1992a, 1993; Vos and van Loveren, 1987). This has allowed the development of screening panels that incorporate a tiered evaluation and the assessment of multiple endpoints, including the measurement of an immune response following antigen stimulation (i.e., a functional test). These tiered screening panels have been the basis for several risk assessment guidelines, and a number of regulatory agencies in the United States, European Union, and Japan have established or are developing requirements or guidelines for the assessment of potential adverse effects on the immune system (Japanese Pharmaceutical Manufacturer’s Association [JPMA, 1999]; International Programme on Chemical Safety [IPCS, 1996]; Organisation for Economic Co-operation and Development [OECD, 1995]; U.S. Environmental Protection Agency [USEPA], 1998; U.S. Food and Drug Administration [FDA], 2002). In several of the already established guidelines, enhanced/expanded histopathologic evaluations of the spleen, thymus, lymph nodes (one covering route of administration and one the distal nodes), gut-associated lymphoid tissue (GALT), and bone marrow are included in the primary screen for potentially immunotoxic agents.
This was based partly on the results of interlaboratory studies in rats with potent immunotoxic agents: azathioprine, cyclosporine A, and hexachlorobenzene (International Collaborative Immunotoxicity Study [ICICIS] Group Investigators, 1998; Richter-Reichhelm et al., 1995). These studies demonstrated that enhanced/expanded histopathology of lymphoid organs could flag the three agents as immunotoxic at doses below the maximum tolerated dose (MTD). A similar conclusion was reached in a study with a number of pesticides (Vos et al., 1983).

Other studies suggested that inclusion of a functional test, in addition to pathology endpoints, might be more sensitive at detecting potential immunotoxicants than was histopathology alone (reviewed in van Loveren et al., 1996). The use of enhanced/expanded histopathologic evaluation as a primary screen would be advantageous for two reasons. First, potential immunotoxicity could be assessed during routine toxicology studies, such as the 28-day rodent study, without the need for additional animals. Second, no specific expertise in immune-function testing or equipment would be required. For screening tests to be meaningful, however, it is important to identify both the limitations of the test and its concordance (i.e., how accurately the test predicts the effect of concern). The latter usually involves more than a simple qualitative answer, and quantitative issues may need to be considered (i.e., sensitivity differences between the screening test and the effect of concern). In addition to sensitivity, potential inter- and intralaboratory variability needs to be considered.

The predictive value of various laboratory tests and their potential correlation with morphological change has been investigated for a number of other target organs/systems. The validation process for these studies has required the availability of relatively large databases of tested compounds, where the experimental procedures were uniformly applied. One example of such an effort is the comparison of serum enzyme levels that measure hepatic and renal function, microsomal enzymes, organ weight, and histopathological evaluation in the liver (Amacher et al., 1998; Travlos et al., 1996). Waters et al. (2003) recently described a large-scale initiative to investigate the correlation between altered gene expression, as evaluated using microarray techniques, and specific parameters from standard toxicology studies, including a thorough histopathological evaluation.

To help validate the enhanced/expanded histopathology, including grading of lymphoid organs in mice, a workgroup was formed consisting of four pathologists from academia, government, and industry. In the ICICIS study (1998), it was noted that training of the pathologists in the structured histopathology assessment scheme greatly added to the sensitivity of the histopathology. Thus, for all pathologists to have the same understanding of the evaluation criteria, the individual with the most expertise in immunopathology presented the details of standardization of grading of the tissues. Once the evaluation criteria were agreed upon, histopathology slides from 10 separate chemical studies were selected from those that had been conducted as part of the National Toxicology Program (NTP) testing efforts. We analyzed the histopathology results from these studies in order to evaluate the consistency among pathologists and the sensitivity of the individual and combined histological endpoints for hazard identification and dose-response assessment.


    MATERIALS AND METHODS
 
Experimental chemicals and positive controls.
All chemicals selected for these studies were evaluated as part of the NTP immunotoxicology-testing program. Chemicals included compounds that were known to inhibit specific components of the immune system, were known to have no effects on immune function, or had been shown to be immunostimulatory. In all studies three dose levels of the test chemical and one vehicle-control group were included. The high dose was selected from 14- or 90-day, dose-ranging toxicity studies and was based upon an estimate that no overt toxicity would be produced. Although routine toxicological assessment in rodent models often tests compounds at the maximum tolerated dose (MTD), these high doses have been shown to produce a neuroendocrine stress response that does not occur at lower doses (Brown et al., 1988; Clement, 1985; Kunimatsu et al., 1996). Such nonspecific stress responses commonly lead to erroneous identification of the test compound as immunotoxic (Pruett et al., 1993, 1999, 2000).

Within the NTP, preliminary dose-range studies are routinely conducted prior to immunotoxicity studies, and, for the reasons stated above, the highest dose is set slightly below the MTD and at doses where body weight changes would not be ≥10%. Histopathological examination and data analyses also included tissue from the positive controls, run as reference compounds for the immunotoxicity studies. Positive control chemicals, which included cyclophosphamide, methotrexate, and sodium arsenite, were examined at only one dose level and were run in conjunction with 6 of the 10 chemicals in the data set; cyclophosphamide was used in four experiments. Detailed information on each of the chemicals and doses used in these studies can be found in Supplementary Appendix A, available online at the journal's Web site (www.toxsci.oupjournals.org). Additional details on the duration, route of exposure, and other study parameters can be found in the following references: Burns et al., 1994; Cao et al., 1990; Karrow et al., 2000a,b; NTP 1988a,b, 1989; Phillips et al., 1997; Sikorski et al., 1989.

Tissue preparation and histological examination.
All tissues represented archived samples from previous NTP immunotoxicity studies using female B6C3F1 mice. Tissues were collected at the termination of each study under GLP guidelines, according to standard operating procedures developed under an NIEHS contract. All animals were weighed and then humanely euthanized, using carbon dioxide inhalation. Thymus, spleen, and the complete chain of the superior mesenteric lymph nodes were collected and fixed in 10% neutral-buffered formalin. The thymus and spleen were weighed prior to fixation. One middle cross-section from the spleen, both lobes of the thymus, and the mesenteric lymph nodes were embedded in paraffin, and 5-6 micron sections were prepared and stained with hematoxylin and eosin (H&E) for histopathological evaluation. For each pathologist, a standardized slide set was generated for each of the chemicals.

The pathologists did not know the identity of the test chemical at the time of evaluation. However, for each slide set, the pathologist received data identifying positive control, negative control, and chemical concentrations, as well as organ weights. As discussed above, a working group was convened for establishment and standardization of the endpoints to be evaluated. All of the participating individuals were toxicologic pathologists with extensive experience in laboratory animal studies. However, only one pathologist was considered to have specific expertise in immunopathology and this individual provided training in the evaluation of tissues for the study. Following subsequent microscopic evaluation of the lymphoid organs from treatment groups of two randomly selected compounds, telephone discussions among the pathologists ensured that every participant understood the usage of the criteria and provided an opportunity for necessary changes to be made.

A semiquantitative assessment was used to estimate the histopathological changes within different anatomical compartments of the lymphoid tissues. The diagnostic terms for identifying and evaluating the histopathologic changes were those recommended by Kuper et al. (2000; 2002). The grading scheme consisted of ordinal categories ranging from "0" (no effect) to "4" (severe effect) and an indicator as to whether the effect was increased or decreased relative to normal tissue. Histopathological evaluations took into consideration changes in cell density or change in the anatomical compartment size. The pathologists were also instructed to add comments that were not quantifiable but considered important for proper histopathological assessment, such as "focal increased cellularity of outer thymic cortex" and "increased tingible body macrophages in the thymic cortex." Remarks were made on the quality of the sections (i.e., plane of sectioning that influenced the size of lymphoid follicles, staining quality, thickness of section, and quantity of tissue present in section), bleeding that may have influenced the morphology of the red pulp of the spleen or other compartments, and suggested usage of immunohistochemical staining for better characterization of the changes. Four compartments were evaluated in the lymph node: grade of cellularity in the follicles, paracortical areas, medullary cords, and sinuses. Five compartments were evaluated in the spleen: cellularity of periarteriolar lymphoid sheaths (PALS), lymphoid follicles, marginal zone, red pulp, and the total number of germinal centers in each section. Three compartments were evaluated in the thymus: cortex cellularity, medullary cellularity, and the cortico-medullary ratio. Following the microscopic examination, the coded data were transferred to an electronic format, and a formal quality control was conducted on the data entry of the entire set of findings.

Statistical Methodology: Analytic approach.
Although efforts were made to ensure that all pathologists had the same level of understanding in the use of the expanded histopathologic nomenclature, complete agreement was not expected because of the subjective nature of the evaluation. This incomplete agreement, or variation among pathologists, can occur for several reasons. First, given the complex nature of pathological assessment, each pathologist is not going to weigh every facet of the tissue in exactly the same manner. The grading assigned to a given tissue section by a given pathologist will ultimately reflect that pathologist’s individual criteria, which are influenced by experience and biases. Furthermore, there may be slight variation between the actual sets of tissue sections evaluated by each pathologist, and a certain amount of random error, or unexplained variation, will contribute to the incomplete agreement. Conceptually, agreement can be evaluated by examining the associations, biases, or tendencies to grade specimens systematically higher or lower among the pathologists. Given these multiple components of agreement, a single index of agreement cannot fully describe the data and statistical modeling has been advocated (Agresti, 1992; Uebersax, 1992). Factor analysis methods (Dunn, 1989) were used to estimate the associations and analysis-of-variance methods (Uebersax, 1992) to evaluate biases. Factor analysis was used to estimate the association of each pathologist with a so-called latent factor. The latent factor can be thought of as representing the underlying trait that is being estimated, i.e., the pathological trait of a given tissue specimen, while each grade assignment can be viewed as an imperfect representation of the underlying pathology that is subject to the pathologist's biases and random error. A "common factor" analytical approach was used to assess the association among the pathologists with the underlying latent factor.
This common factor is ultimately derived from the matrix of correlation coefficients among all of the pathologists. The factor loadings, or output, generated from the common factor approach, are essentially the correlation coefficients of each pathologist with the underlying common factor. Factor loadings that are positive and high (1 is the maximum value) are indicative of high levels of agreement. The square of these factor loadings represents the proportion of variance in each pathologist's ratings that is accounted for by the underlying common factor. The higher these values, the more in agreement the pathologists are with the underlying common factor. These types of agreement statistics have been used successfully in other disciplines, most notably psychology and sociology, to predict changes in dependent variables using multiple explanatory variables (Hair et al., 1992). Factor analysis has also been used to model relationships between immune-function endpoints and host resistance following exposure to the prototypical immunosuppressant, dexamethasone (Keil et al., 1999, 2001).
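
As an illustration of how loadings on a single common factor can be derived from the correlation matrix among raters, the following is a minimal Python sketch of an iterated principal-factor extraction. It is not the SAS procedure used in the study; the function name `one_factor_loadings`, the simulated ratings, and the "true" loadings are all hypothetical, and the starting communalities (squared multiple correlations) are an assumed default.

```python
import numpy as np

def one_factor_loadings(ratings, n_iter=200, tol=1e-8):
    """Iterated principal-factor extraction of one common factor.

    ratings: array (n_specimens, n_raters) of signed grades.
    Returns one loading per rater (correlation with the common factor).
    """
    R = np.corrcoef(ratings, rowvar=False)        # rater-by-rater correlations
    # Assumed starting communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)      # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        new_h2 = lam ** 2                         # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    if lam.sum() < 0:                             # eigenvector sign is arbitrary
        lam = -lam
    return lam

# Hypothetical data: four raters scoring the same 2000 specimens.
rng = np.random.default_rng(0)
true_loadings = np.array([0.6, 0.95, 0.7, 0.5])   # illustrative values only
latent = rng.standard_normal(2000)                # underlying pathology
noise = rng.standard_normal((2000, 4))            # rater-specific error
ratings = latent[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings**2)

loadings = one_factor_loadings(ratings)
variance_explained = loadings ** 2                # proportion of each rater's variance
```

In this simulation the second rater is constructed to track the latent trait most closely, so that rater should receive the largest loading, mirroring the way a rater who agrees most often with the others ends up most strongly correlated with the common factor.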

The agreement among pathologists on each tissue parameter was assessed by calculating the correlation coefficient between each pathologist and the common factor, as a function of the specific tissue parameter that was scored and as a function of the dose of the chemical that was applied. These analyses were all performed without regard to the specific test chemical. However, for some analyses, the positive controls were evaluated independently to assess agreement when effects would likely be the most severe.

To assess bias among the pathologists, we utilized an analysis-of-variance model that provided estimates of the mean grades of each tissue compartment as a function of the individual pathologist and the specific dose of each of the chemical compounds. These estimates allow assessment of particular biases, shown by a pathologist for a given tissue compartment, that would not be reflected in the above analysis examining associations. For example, the factor analytic approach would indicate very high agreement among pathologists, even if one pathologist consistently rated a given histological specimen one unit higher than another. Using an analysis-of-variance approach, this bias would be apparent in the plots of the mean grades. Given the large amount of data in this experiment and the high statistical power to detect small differences among pathologists and doses, a classic hypothesis-testing paradigm was not used; a more descriptive analysis of the results was applied. For these analyses, data from all test chemicals were analyzed together; however, each chemical was included in the model as a random effect to assess variability in scoring across different chemicals (Littell et al., 1996).
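
The distinction between association and bias can be shown with a small numerical sketch. The grades, dose codes, and the helper `mean_by_dose` below are hypothetical, and plain group means stand in for the mixed-model estimates described above.

```python
import numpy as np

# Hypothetical signed grades (-4..+4) for six animals, two per dose group.
doses = np.array([0, 0, 1, 1, 2, 2])
rater_a = np.array([0.0, 0.0, -1.0, -1.0, -2.0, -3.0])
rater_b = rater_a + 1.0            # same ordering, always one grade higher

def mean_by_dose(grades, doses):
    """Mean grade for each dose group (a stand-in for model-based estimates)."""
    return {int(d): float(grades[doses == d].mean()) for d in np.unique(doses)}

# Association: correlation ignores a constant offset between raters.
r = float(np.corrcoef(rater_a, rater_b)[0, 1])

# Bias: the offset is visible in the mean grades at every dose.
means_a = mean_by_dose(rater_a, doses)
means_b = mean_by_dose(rater_b, doses)
bias = {d: means_b[d] - means_a[d] for d in means_a}
```

Here the two raters are perfectly correlated, yet the second rater's mean grade sits one unit above the first at every dose, which is the kind of systematic difference that only the comparison of means reveals.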

Data.
The pathologists' ratings were on a scale of 0 to 4, as an ordinal index of the severity of the lesion, with an additional categorical designation for either an increase or a decrease in grading. These data were converted for each compound to a scale in which those ratings with a decreased grade designation were assigned a negative value and those with an increased grade designation were assigned a positive value. Thus, the data regarding pathological ratings are on a scale ranging from -4 to +4, with 0 indicating no effect. The high dose and the control groups were examined first for histopathological lesions. If there was no effect observed in any of the animals receiving the high dose, a value of zero was assigned to animals in the low- and medium-dose groups. There were several instances of missing data where either the tissue section was unreadable or had been lost. Missing values are represented as blanks in the data set.
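
The coding rules above can be sketched as follows. The function names, the use of `None` for a blank (missing) value, and the decision to ignore missing high-dose readings when applying the zero-fill rule are assumptions made for illustration, not details given in the study.

```python
from typing import List, Optional, Tuple

def signed_grade(severity: Optional[int], direction: Optional[str]) -> Optional[int]:
    """Convert a 0-4 severity plus a direction into the -4..+4 scale.

    None stands for a missing (unreadable or lost) section.
    """
    if severity is None:
        return None
    if not 0 <= severity <= 4:
        raise ValueError("severity must be between 0 and 4")
    if severity == 0:
        return 0                        # no effect, direction is irrelevant
    if direction == "decreased":
        return -severity
    if direction == "increased":
        return severity
    raise ValueError("a nonzero severity needs a direction")

def fill_lower_doses(
    high_dose_grades: List[Optional[int]], n_low: int, n_mid: int
) -> Optional[Tuple[List[int], List[int]]]:
    """Assign zeros to low- and mid-dose animals when no high-dose animal
    shows an effect; return None when the lower doses must be read.

    Assumption: missing high-dose values are ignored when applying the rule.
    """
    observed = [g for g in high_dose_grades if g is not None]
    if observed and all(g == 0 for g in observed):
        return [0] * n_low, [0] * n_mid
    return None
```

For example, `signed_grade(3, "decreased")` yields -3, and `fill_lower_doses([0, 0, None], 2, 2)` yields two lists of zeros, whereas any nonzero high-dose grade returns `None` to signal that the lower-dose slides need to be graded.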

Computational methods.
The data were analyzed using SAS/STAT software, Version 8.2, of the SAS System for Windows (SAS Institute, Cary, NC). Factor analysis was performed utilizing Proc Factor, with the iterated principal factor method (prinit) and the number of factors set to 1. Correlations among the pathologists with the common factor scores for both tissue compartment and dose were calculated using Proc Corr. Mixed model analyses of variance were performed on each tissue parameter using Proc Mixed. Random effects in the mixed model included animal, since each animal was evaluated by each pathologist and for each test chemical. Nonparametric tests for trend were performed for each pathologist, using the Jonckheere-Terpstra option in Proc Freq, to assess whether the individual tissue parameters showed dose-responsive effects of the compounds.
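
For readers unfamiliar with the Jonckheere-Terpstra trend test, the sketch below computes the statistic and a normal-approximation z score in Python. It is an illustration only: the grades are hypothetical, ties are counted as one half, and the variance is the simple no-ties formula, so with heavily tied ordinal grades a tie-corrected or exact test would be preferable.

```python
import math

def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra statistic for groups ordered by increasing dose.

    groups: list of lists of grades, lowest dose first.
    Returns (J, z). J counts pairs (x from a lower dose, y from a higher
    dose) with x < y, counting ties as 0.5. z uses the no-ties variance,
    which is only approximate when ties are present.
    """
    J = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        J += 1.0
                    elif x == y:
                        J += 0.5
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    mean = (N * N - sum(n * n for n in sizes)) / 4.0
    var = (N * N * (2 * N + 3) - sum(n * n * (2 * n + 3) for n in sizes)) / 72.0
    z = (J - mean) / math.sqrt(var)
    return J, z

# Hypothetical signed grades for one compartment and one pathologist:
# control, low, mid, and high dose, three animals each.
grades_by_dose = [[0, 0, 0], [0, -1, 0], [-1, -1, -2], [-2, -3, -2]]
J, z = jonckheere_terpstra(grades_by_dose)
```

A strongly negative z, as in this constructed example, indicates that grades decrease with increasing dose; a strongly positive z would indicate an increasing trend.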


    RESULTS
 
The correlations (i.e., factor loadings) obtained from analyzing the positive control, experimental, and combined data sets showed considerable differences between the data sets and individual pathologists (Table 1). The strong correlation observed with the positive control data, relative to the test-chemical data set, suggests that the pathologists have better agreement when the histopathological lesions are more severe, as occurs in the positive control. This would also imply that the converse is true. That is, poorer agreement occurred when the nature of the lesion was less severe, as in the test-chemical data. When the data from both the positive controls and chemically exposed animals are combined, the associations among the pathologists are only slightly decreased, as compared to the positive control group only. There was one pathologist's scoring (Pathologist #2) with which the other three pathologists most often agreed, whereas the other three pathologists' gradings had less agreement with each other. Thus, this pathologist's correlation with the common factor was much higher. As this individual is a recognized expert in immunopathology, it was assumed that this person was able to detect more subtle changes than the other pathologists, or, stated another way, pathologists with less experience may have more difficulty in discerning subtle lesions.


TABLE 1 Overall Evaluation of the Level of Agreement Between Each Pathologist and the Common Factor

 
Figure 1 presents the correlation with the common factor as a function of the histopathological compartments examined, and provides a measure of the relative agreement for each pathologist based upon the compartment being evaluated. When the entire data set is included (Fig. 1A), good agreement is obtained for all compartments of the thymus. Within the spleen, measures of follicle cellularity and germinal centers provided good agreement, while splenic red pulp demonstrated the poorest agreement among the compartments evaluated. Similarly, germinal center development demonstrated the best agreement in the evaluation of the lymph nodes.



FIG. 1. Correlation coefficients between each pathologist and the common factor are shown for each tissue compartment measured when all data including the positive controls are utilized (A), when only data from the positive controls are utilized (B), and when only data from the test chemicals are utilized (C). An asterisk (*) represents an absence of estimates on the compartment from a given pathologist.

 
The relative level of agreement between the pathologists, based on the correlations of each compartment examined with the common factor, showed similar trends to the level of agreement observed in the composite analysis. Specifically, agreement was higher between pathologists when examining the tissue from positive-control animals (Fig. 1B) and lower when examining the tissue from animals exposed to the test chemicals (Fig. 1C). However, it was also apparent from the analyses that the level of agreement was not only dependent upon the severity of the lesion, but also the specific compartment measured. For example, the level of agreement when scoring medullary cellularity in the thymus followed a similar trend to the composite data sets (i.e., high in the positive control and low in the chemical data sets). However, the level of agreement when scoring germinal centers in the spleen was similar in both the positive control and test chemical data sets.

The analysis examining agreement as a function of chemical dose is shown in Figure 2. There is a general trend toward greater agreement for increasing doses of chemicals examined. Analogous to observations made when examining positive controls and test chemicals, this implies that the level of agreement increases with the severity of the lesion and that subtle changes are more difficult to detect. As demonstrated for the relative level of agreement between individual pathologists, when agreement was examined as a function of dose, pathologist #2 had the highest level of agreement with the common factor.



FIG. 2. Correlation coefficients between each pathologist and the common factor are shown for each dose level. The data represented are from all positive control and test chemicals used in the study.

 
The means of the individual pathologists' scores for each parameter and dose, including the positive controls, are shown in Figure 3. The scores were fairly consistent among pathologists across the compartments examined, relative to each other, with one individual grading slightly higher than the others. This had little effect on the overall interpretation of the data. These results were also used to provide dose-response trends for the particular parameters. While significant dose-response trends could not be detected for many of the compartments examined, significant trends were found for the lymph node medullary cord cellularity, lymph node paracortical cellularity, thymus cortical cellularity, spleen follicle cellularity, and spleen periarteriolar lymphoid sheaths. For some compartments, such as the splenic red pulp, marked effects were observed, but the trends were inconsistent, with the direction of the effects differing among pathologists. When only the positive control data were considered, the pathologists identified lesions in 7 of the 12 parameters examined: splenic red pulp, spleen germinal centers, spleen follicle cellularity, lymph node paracortical area cellularity, and all parameters in the thymus.



FIG. 3. Mean ratings (±standard error) are shown for each compartment and each dose combination. The data represented are from all positive controls and test chemicals used in the study.

 

    DISCUSSION
 
An interlaboratory study, to assess the consistency among toxicologic pathologists with different levels of experience in extended histopathology, was undertaken to evaluate a common set of lymphoid tissues for pathological lesions. This study specifically addresses the level of agreement obtained when performing expanded histopathology in immunotoxicology studies. To address the different facets of agreement, factor-analysis methods were used to estimate associations while analysis of variance methods were used to evaluate biases. The major conclusion that can be drawn from these validation studies is that the ability to identify more subtle histopathological change in lymphoid organs is dependent, in addition to the apparent severity of the specific lesion, upon the experience/training of the pathologist in immunohistopathology and the specific lymphoid tissue compartment that is examined.

Within the thymus, changes in cortical cellularity were readily detected and provided the highest degree of agreement. This would indicate that alterations in this endpoint are readily discernible or, as suggested by De Waal et al. (1997), that cells in the cortex are generally the most sensitive to injury. The thymus is a major generative organ of the immune system and orchestrates the development of the T-lymphocyte repertoire. The thymic lobules are divided into 2 zones: a peripheral, lymphocyte-rich cortex and a central, less densely populated medulla. Changes in thymus histopathology and architecture are considered to be of particular relevance for immunotoxicity screening (Schuurman et al., 1992). After administration of immunosuppressive agents, depletion of lymphocytes or reduction in cellularity can occur in a diffuse manner or be limited to either the cortex or medulla. Wachsmuth (1983) has shown that the immunosuppressive effects of a number of different pharmaceutical agents are evident on histopathological examination of the thymus, and that histological findings correlate well with thymus weight and peripheral lymphocytic counts in both the rat and dog. Altered thymic cellularity may be reported for specific compartments and, together with remarks on the presence of cell necrosis or increased numbers of macrophages with tingible bodies, can provide some indication as to the mechanism of altered cellularity (Harleman, 2000). Decreases in cellularity may be correlated with reductions in thymus weight, which has been shown to be a predictive indicator of immunotoxicity (Luster et al., 1992b).

The spleen is generally composed of white and red pulp and, although it consists of discrete morphological structures, the analyses presented here suggest that group morphologic assessment is difficult. The white pulp is located around a central arteriole and comprises the periarteriolar lymphoid sheaths (T-cell area), adjacent follicles (B-cell area), and marginal zone. Nonimmune functions, such as extramedullary hematopoiesis in the splenic red pulp, may complicate the evaluation of this organ. Lymph nodes are organized structures, divided into capsule, cortex (B-cell zone, composed of follicles and germinal centers), paracortex (T-cell zone), and medullary sinuses and cords. Within the spleen, measures of follicle cellularity and germinal centers provided the best agreement. As observed in the spleen, the histopathological ratings for germinal center development had the strongest agreement for all parameters examined in lymph-node tissues. This is not surprising, as germinal centers are distinct, highly active structures, formed as a direct result of immune activation and characterized by extensive lymphocyte proliferation and differentiation. An immunologically activated lymph node will show complex changes involving several of its anatomic subunits that may make it difficult to evaluate. The relatively poor agreement reached with the spleen and lymph-node assessment, compared with the thymus, may be related to the fact that these organs undergo subtle changes in the different zones (Greaves, 1990) and would suggest the need for a thorough analysis and awareness of the various functions and interactions of each zone.

From a histological standpoint, assessment of the mammalian immune system is neither routine nor simple. It is composed of multiple organs and tissues, some of which are also responsible for hematopoiesis (bone marrow and spleen), others for lymphocyte maturation (thymus), and others that generate responses to antigen (lymph nodes). In addition, there are specialized tissues located throughout the body that are responsible for responding to antigens or pathogens locally (e.g., skin-, lung-, and gut-associated lymphoid tissues). General and parameter-specific differences were observed and ranged from good agreement to poor agreement in detecting or not detecting pathological changes. Thus, ratings for the thymus cortical cellularity were highly consistent; pathology occurred with a significant number of the test chemicals and positive controls and occurred in a dose-dependent fashion. In contrast, significant pathological changes were seen in spleen red pulp, but the pathologists did not consistently agree on the severity of the lesion or even the direction of the change. There was good agreement among the pathologists when examining spleen follicular cellularity for the positive-control data set, but not in the test-chemical data set, suggesting that subtle changes in this compartment may be difficult to read. There was also generally good agreement among the pathologists in the lymph node sinus, lymph node paracortical area, thymus cortico-medullary ratio, thymus medullary cellularity, spleen periarteriolar lymphoid sheaths, and spleen germinal centers. However, there was a lack of pathological change detected in both the experimental and positive-control groups for these endpoints, suggesting that the chemical agents did not target these tissue compartments, or that changes in these parameters are more difficult to discern. The spleen marginal zone and lymph node medullary-cord cellularity also appear to fall into this category, with only one individual reporting histological change in each parameter. The lymph node follicular germinal center-development measure appears not to be a very sensitive indicator, because it was not affected by any chemical exposure and showed poor agreement among the pathologists in the positive-control data set. While not assessed in the statistical evaluations conducted for this study, this type of semiquantitative examination of histological parameters should also include a careful histological description of the types of lesions, including focal alterations, cell necrosis, granulomata, etc.
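
As a rough illustration of what agreement on a semiquantitative severity scale means in practice, the sketch below computes two simple pairwise measures: the fraction of slides given an identical grade, and the fraction graded within one step of each other. It is not the factor-analysis or analysis-of-variance approach used in this study. The grades, rater labels, and function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical severity grades (0 = no change ... 4 = marked) assigned by
# three raters to the same six slides. Illustrative values only; these are
# NOT data from this study.
scores = {
    "pathologist_A": [0, 1, 2, 3, 0, 2],
    "pathologist_B": [0, 1, 2, 2, 0, 3],
    "pathologist_C": [0, 0, 2, 3, 1, 2],
}


def exact_agreement(a, b):
    """Fraction of slides given the identical grade by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def within_one_grade(a, b):
    """Fraction of slides on which the two grades differ by at most one step."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def pairwise(ratings, metric):
    """Apply an agreement metric to every unordered pair of raters."""
    return {
        (r1, r2): metric(ratings[r1], ratings[r2])
        for r1, r2 in combinations(sorted(ratings), 2)
    }


exact = pairwise(scores, exact_agreement)
near = pairwise(scores, within_one_grade)

for pair in exact:
    print(pair, f"exact={exact[pair]:.2f}", f"within one grade={near[pair]:.2f}")
```

In this made-up example the raters rarely differ by more than one grade even where exact agreement is modest, which is one way that subtle changes can lower apparent consistency without implying disagreement about whether a lesion is present.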

Summary
In summary, the ability to identify histopathological changes in lymphoid tissues was highly dependent on the severity of the specific lesion and the tissue compartment measured in these studies. Overall, histopathological changes were most frequently and most consistently reported in the thymus cortex and medulla, and in the spleen and lymph node follicles (cellularity and germinal center development). Dose-response trends were not readily apparent in any tissue when all compartments in that tissue were considered. Similar to the ICICIS studies (ICICIS Group Investigators, 1998), specific training in expanded immunohistopathology was an important factor in the ability to detect subtle lesions in immune tissues. The correlations between expanded immunohistopathology and lymphoid organ weights, and between traditional immune function tests and host resistance assays, are addressed in an accompanying study (Germolec et al., in preparation).


    ACKNOWLEDGMENTS
 
The authors would like to extend their warmest appreciation to Dr. Robert Maronpot for his encouragement and facilitation of the study, as well as his thoughtful comments on the manuscript. We would also like to thank Drs. Michael Andrew and Russell Helms for their assistance with the statistical analysis.


    NOTES
 

1 To whom correspondence should be addressed at the Laboratory of Molecular Toxicology, National Institute of Environmental Health Sciences, 111 Alexander Drive, P.O. Box 12233, Research Triangle Park, NC 27709. Fax: (919) 541-0870. E-mail: germolec@niehs.nih.gov


    REFERENCES
 
Agresti, A. (1992). Modeling patterns of agreement and disagreement. Stat. Methods Med. Res. 1, 201–218.

Amacher, D. E., Schomaker, S. J., and Burkhardt, J. E. (1998). The relationship among microsomal enzyme induction, liver weight, and histological change in rat toxicology studies. Food Chem. Toxicol. 36, 831–839.

Brown, L. D., Wilson, D. E., and Yarbrough, J. D. (1988). Alterations in the hepatic glucocorticoid response to mirex treatment. Toxicol. Appl. Pharmacol. 92, 203–213.

Burns, L. A., Bradley, S. G., White, K. L., Jr., McCay, J. A., Fuchs, B. A., Stern, M., Brown, R. D., Musgrove, D. L., Holsapple, M. P., Luster, M. I., and Munson, A. E. (1994). Immunotoxicity of Mono-nitrotoluenes in female B6C3F1 mice: I. Para-nitrotoluene. Drug Chem. Toxicol. 17, 317–358.

Cao, W., Sikorski, E. E., Fuchs, B. A., Stern, M. L., Luster, M. I., and Munson, A. E. (1990). The B lymphocyte is the immune cell target for 2'3'-dideoxyadenosine. Toxicol. Appl. Pharmacol. 105, 492–502.

Clement, J. G. (1985). Hormonal consequences of organophosphate poisoning. Fundam. Appl. Toxicol. 5, S61–77.

De Waal, E. J., Schuurman, H. J., van Loveren, H., and Vos, J. G. (1997). Differential effects of 2,3,7,8-tetrachlorodibenzo-p-dioxin, bis(tri-n-butyltin)oxide and cyclosporin on thymus histophysiology. Crit. Rev. Toxicol. 27, 381–430.

Dunn, G. (1989). Design and Analysis of Reliability Studies: The Statistical Evaluation of Measurement Errors. Oxford University Press, New York.

Greaves, P. (1990). Haemopoietic and lymphatic systems. In Histopathology of Preclinical Toxicity Studies. (P. Greaves, Ed.), pp. 77–142. Elsevier, Amsterdam.

Hair, J. F., Anderson, R. E., Tatham, R. L., and Black, W. C. (1992). Multivariate Data Analysis, Macmillan, New York.

Harleman, J. H. (2000). Approaches to the identification and recording of findings in the lymphoreticular organs indicative for immunotoxicity in regulatory type toxicity studies. Toxicology 142, 213–219.

ICICIS Group Investigators. (1998). Report of validation study of assessment of direct immunotoxicity in the rat. Toxicology 125, 183–201.

IPCS Environmental Health Criteria 180 (1996). Principles and Methods for Assessing Direct Immunotoxicity Associated with Exposure to Chemicals, WHO, Geneva.

JPMA (1999). International Trends in Immunotoxicity Studies of Medical Products. JPMA Drug Evaluation Committee Fundamental Research Group. Data 92.

Karrow, N. A., McCay, J. A., Brown, R., Musgrove, D., Munson, A. E., and White, K. L., Jr. (2000a). Oxymetholone modulates cell-mediated immunity in male B6C3F1 mice. Drug Chem. Toxicol. 23, 621–644.

Karrow, N. A., McCay, J. A., Brown, R. D., Musgrove, D. L., Pettit, D. A., Munson, A. E., Germolec, D. R., and White, K. L., Jr. (2000b). Thalidomide stimulates splenic IgM antibody response and cytotoxic T lymphocyte activity and alters leukocyte subpopulation numbers in female B6C3F1 mice. Toxicol. Appl. Pharmacol. 165, 237–244.

Keil, D. R., Luebke, R. W., Ensley, M., Gerard, P. D., and Pruett, S. B. (1999). Evaluation of multivariate statistical methods for analysis and modeling of immunotoxicology data. Toxicol. Sci. 51, 245–258.

Keil, D., Luebke, R. W., and Pruett, S. B. (2001). Quantifying the relationship between multiple immunological parameters and host resistance: Probing the limits of reductionism. J. Immunol. 167, 4543–4552.

Kunimatsu, T., Kamita, Y., Isobe, N., and Kawasaki, H. (1996). Immunotoxicological insignificance of fenitrothion in mice and rats. Fundam. Appl. Toxicol. 33, 246–253.

Kuper, C. F., de Heer, E., van Loveren, H., and Vos, J. G. (2002). Immune system. In Handbook of Toxicologic Pathology, 2nd ed. (W. M. Haschek, C. G. Rousseaux, and A. A. Wallig, Eds.), pp. 585–647. Academic Press, San Diego.

Kuper, C. F., Harleman, J. H., Richter-Reichhelm, H. B., and Vos, J. G. (2000). Histopathologic approaches to detect changes indicative of immunotoxicity. Toxicol. Pathol. 28, 454–466.

Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996). SAS System for Mixed Models. SAS Institute, Cary, NC.

Luster, M. I., Munson, A. E., Thomas, P. T., Holsapple, M. P., Fenters, J. D., White, K. L., Jr., Lauer, L. D., Germolec, D. R., Rosenthal, G. J., and Dean, J. H. (1988). Development of a testing battery to assess chemical-induced immunotoxicity: The National Toxicology Program's guidelines for immunotoxicity evaluation in mice. Fundam. Appl. Toxicol. 10, 2–19.

Luster, M. I., Pait, D. G., Portier, C., Rosenthal, G. J., Germolec, D. R., Comment, C. E., Munson, A. E., White, K., and Pollock, P. (1992a). Qualitative and quantitative experimental models to aid in risk assessment for immunotoxicology. Toxicol. Lett. 64–65, 71–78.

Luster, M. I., Portier, C., Pait, D. G., Rosenthal, G. J., Germolec, D. R., Corsini, E., Blaylock, B. L., Pollock, P., Kouchi, Y., Craig, W., White, K. L., Munson, A. E., and Comment, C. E. (1993). Risk assessment in immunotoxicology: II. Relationships between immune and host resistance tests. Fundam. Appl. Toxicol. 21, 71–82.

Luster, M. I., Portier, C., Pait, D. G., White, K. L., Jr., Gennings, C., Munson, A. E., and Rosenthal, G. J. (1992b). Risk assessment in immunotoxicology: I. Sensitivity and predictability of immune tests. Fundam. Appl. Toxicol. 18, 200–210.

National Toxicology Program (NTP) (1988a). Immunotoxicity of 2,4-diaminotoluene (DAT). Final gavage report in female B6C3F1 mice. Report of NTP study number IMM87034.

National Toxicology Program (NTP) (1988b). The Immunotoxicity of Aldicarb Oxime in Female B6C3F1 Mice. Report of NTP study number IMM89025.

National Toxicology Program (NTP) (1989). Immunotoxicity of Ribavirin in Female C57Bl/6 Mice. Report of NTP Study number IMM90010.

OECD (1995). OECD Guideline for the Testing of Chemicals 407: Repeated Dose 28-Day Oral Toxicity Study in Rodents.

Phillips, K. E., McCay, J. A., Brown, R. D., Musgrove, D. L., Meade, B. J., Butterworth, L. F., Wilson, S., White, K. L., Jr., and Munson, A. E. (1997). Immunotoxicity of 2'3'-dideoxyinosine in female B6C3F1 mice. Drug Chem. Toxicol. 20, 189–228.

Pruett, S. B., Collier, S., Wu, W. J., and Fan, R. (1999). Quantitative relationships between the suppression of selected immunological parameters and the area under the corticosterone concentration-vs.-time curve in B6C3F1 mice subjected to exogenous corticosterone or to restraint stress. Toxicol. Sci. 49, 272–280.

Pruett, S. B., Ensley, D. K., and Crittenden, P. L. (1993). The role of chemical-induced stress responses in immunosuppression: A review of quantitative associations and cause-effect relationships between chemical-induced stress responses and immunosuppression. J. Toxicol. Environ. Health 39, 163–192.

Pruett, S. B., Fan, R., Myers, L. P., Wu, W. J., and Collier, S. (2000). Quantitative analysis of the neuroendocrine-immune axis: Linear modeling of the effects of exogenous corticosterone and restraint stress on lymphocyte subpopulations in the spleen and thymus in female B6C3F1 mice. Brain Behav. Immun. 14, 270–287.

Richter-Reichhelm, H.-B., Dalmsbrook, C. A., Descores, G., Emmendorfer, A. C., Ernst, H. U., Harleman, J. H., Hildebrand, B., Kuttler, K., Ruhl-Fehlert, C. I., Schilling, U., et al. (1995). Validation of a modified 28-day rat study to evidence effects of test compounds on the immune system. Regul. Toxicol. Pharmacol. 22, 54–56.

Schuurman, H. J., van Loveren, H., Rozing, J., and Vos, J. G. (1992). Chemicals trophic for the thymus: Risk for immunodeficiency and autoimmunity. Int. J. Immunopharmacol. 14, 369–375.

Sikorski, E. E., McCay, J. A., White, K. L., Jr., Bradley, S. G., and Munson, A. E. (1989). Immunotoxicity of the semiconductor gallium arsenide in female B6C3F1 mice. Fundam. Appl. Toxicol. 13, 843–858.

Travlos, G. S., Morris, R. W., Elwell, M. R., Duke, A., Rosenblum, S., and Thompson, M. B. (1996). Frequency and relationships of clinical chemistry and liver and kidney histopathology findings in 13-week toxicity studies in rats. Toxicology 107, 17–29.

Uebersax, J. S. (1992). Modeling approaches for the analysis of observer agreement. Invest. Radiol. 27, 738–743.

United States Environmental Protection Agency (EPA) (1998). Health Effects Test Guidelines. OPPTS 870.7800 Immunotoxicity.

U.S. Department of Health and Human Services, Food and Drug Administration (FDA) (2002). Guidance for Industry: Immunotoxicology Evaluation of Investigational New Drugs. Food and Drug Administration (FDA) Center for Drug Evaluation and Research (CDER).

van Loveren, H., Vos, J. G., and De Waal, E. J. (1996). Testing immunotoxicity of chemicals as a guide for testing approaches for pharmaceuticals. Drug Info. J. 30, 275–279.

Vos, J. G., and Krajnc, E. I. (1983). Immunotoxicity of pesticides. Dev. Toxicol. Environ. Sci. 11, 229–40.

Vos, J. G., and van Loveren, H. (1987). Immunotoxicity testing in the rat. In Advances In Modern Environmental Toxicology (E. J. Burger, R. G. Tardiff, and J. A. Bellanti, Eds.), Vol. 13, pp. 167–180. Princeton Scientific Publishing, Princeton, NJ.

Wachsmuth (1983). Evaluating immunopathological effects of new drugs. In Immunotoxicology (G. G. Gibson, R. Hubbard, and D. V. Parge, Eds.), pp. 237–250. Academic Press, London.

Waters, M., Boorman, G., Bushel, P., Cunningham, M., Irwin, R., Merrick, A., Olden, K., Paules, R., Selkirk, J., Stasiewicz, S., Weis, B., Van Houten, B., Walker, N., and Tennant, R. (2003). Systems toxicology and the chemical effects in biological systems (CEBS) knowledge base. EHP Toxicogenomics 111, 15–28.