A key challenge in using diet-induced NASH mice for preclinical research is the selection and quantification of disease endpoints. In humans, underlying metabolic syndrome is a typical precursor to the later onset of NASH. For both the underlying and liver stages of disease progression, some hallmarks of the human condition are recapitulated in the mouse, while others are not. Even among conserved symptoms, some manifest differently in the mouse despite similar physiological underpinnings. Deciding which endpoints to measure and how to interrogate them requires careful consideration. New trends in sampling techniques will also be presented.
Endpoint selection aims
When selecting disease endpoints to measure in the mouse, one should consider:
- Translational relevance to the human condition, especially with regard to markers measured during clinical trials.
- Economic and practical limitations of the researcher's budget, skillset, and supporting analytical infrastructure.
- Enabling noninvasive, nonterminal sampling, so disease progression/resolution can be measured serially and longitudinally, and terminal values can be compared to baseline for individuals rather than between groups.
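The third aim, comparing each animal's terminal value to its own baseline, can be sketched as below. This is a minimal illustration only: the animal IDs and ALT values are hypothetical, and `change_from_baseline` is a helper name introduced here, not part of any established tool.

```python
from statistics import mean


def change_from_baseline(baseline, terminal):
    """Return per-animal change (terminal minus baseline), keyed by animal ID.

    Only animals present at both time points are compared.
    """
    shared = sorted(set(baseline) & set(terminal))
    return {animal: terminal[animal] - baseline[animal] for animal in shared}


# Hypothetical serum ALT values (U/L) for three animals.
baseline_alt = {"m01": 120.0, "m02": 95.0, "m03": 150.0}
terminal_alt = {"m01": 80.0, "m02": 90.0, "m03": 100.0}

deltas = change_from_baseline(baseline_alt, terminal_alt)
mean_delta = mean(deltas.values())
```

Each animal serves as its own control, so between-animal variation at baseline does not mask a treatment effect.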
Standards for the Evaluation of NASH
Liver biopsy is the gold standard for baseline and post-treatment evaluation of human patients in Phase 2/3 NASH trials. The FDA's draft guidance on NASH trial design recommends the following efficacy endpoints:
- Resolution of steatohepatitis on overall histopathological reading and no worsening of liver fibrosis, or
- Improvement in liver fibrosis greater than or equal to one stage and no worsening of steatohepatitis, or
- Both resolution of steatohepatitis and improvement in fibrosis.
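The three endpoints listed above reduce to simple boolean logic on two histological readouts. The sketch below is one possible encoding, not an official algorithm: the function name, argument names, and the sign convention for `fibrosis_change` (post-treatment stage minus baseline stage, so negative values mean improvement) are all assumptions made for illustration.

```python
def efficacy_endpoints_met(nash_resolved, nash_worsened, fibrosis_change):
    """List which of the three histological endpoints a subject meets.

    fibrosis_change is post-treatment stage minus baseline stage
    (hypothetical convention: negative = improvement).
    """
    if nash_resolved and nash_worsened:
        raise ValueError("NASH cannot be both resolved and worsened")

    fibrosis_improved = fibrosis_change <= -1
    fibrosis_worsened = fibrosis_change >= 1

    met = []
    if nash_resolved and not fibrosis_worsened:
        met.append("nash_resolution_without_fibrosis_worsening")
    if fibrosis_improved and not nash_worsened:
        met.append("fibrosis_improvement_without_nash_worsening")
    if nash_resolved and fibrosis_improved:
        met.append("nash_resolution_and_fibrosis_improvement")
    return met
```

For example, a subject whose steatohepatitis is unchanged but whose fibrosis drops from stage 3 to stage 2 (`fibrosis_change=-1`) meets only the second endpoint.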
Liver biopsy is necessarily an invasive surgical procedure that carries a high burden of technical difficulty, cost, and risk, and its scoring is inherently subjective. To generate a more robust data set, patient serum is also obtained and used to measure surrogate markers of liver injury and impairment. Additional analytical chemistry interrogates metabolites that are salient to NASH.
Two Phase 3 drug candidates have recently completed their interim (18-month) analyses. Intercept Pharmaceuticals' Ocaliva (REGENERATE) generated positive data [3] and is continuing with the long-term safety arm of the trial. Genfit's elafibranor (RESOLVE-IT) did not achieve key objectives and was discontinued [4]. Regardless of outcome, the trial endpoints and patient enrollment strategies provide a useful context to discuss NASH study design for the mouse.
Direct liver interrogation of NASH
Human endpoints: Ocaliva improved biopsy-confirmed liver fibrosis by ≥1 stage in a greater number of patients than placebo. Ocaliva did not worsen patient NASH composite score. Elafibranor did not achieve significant improvement in either metric.
Human endpoints explained: Liver biopsies were obtained at baseline screening and at 18 months. A NAFLD activity score [5] (NAS) and a fibrosis staging score were determined by pathologist evaluations. The NAS is a composite score ranging from 0 to 8, comprising: steatosis grade (0-3) + lobular inflammation degree (0-3) + hepatocellular ballooning grade (0-2). Fibrosis staging ranges from 0 to 4; at stage 3, "bridging" fibrosis is observed to connect lobules and portal areas.
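The NAS arithmetic described above can be written as a small function. This is a sketch for illustration; the function and argument names are hypothetical, and the component grades themselves still come from a pathologist's reading.

```python
def nafld_activity_score(steatosis, lobular_inflammation, ballooning):
    """Sum the three NAS components after checking each is within its range.

    Ranges follow the text: steatosis 0-3, lobular inflammation 0-3,
    hepatocellular ballooning 0-2, giving a composite of 0-8.
    """
    components = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    for name, (value, upper) in components.items():
        if not isinstance(value, int) or not 0 <= value <= upper:
            raise ValueError(f"{name} must be an integer from 0 to {upper}")
    return steatosis + lobular_inflammation + ballooning
```

For instance, steatosis grade 2, lobular inflammation 1, and ballooning 1 give a NAS of 4.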
Mouse endpoint equivalents: Survival biopsy of the mouse liver is possible [6], but it requires advanced technical expertise and carries a risk of animal infection or death. If survival biopsies are not feasible, a cohort of mice can be designated for terminal biopsy collection to serve as a representative baseline group. With this approach, however, experimental groups cannot be sorted on biopsy results, no longitudinal histological data are generated for individual animals, and sub-responders cannot be excluded.
A mouse NAS (0-8) has been adapted from the human scoring system, and the same parameters are interrogated. A key limitation is that hepatocyte ballooning rarely exceeds a score of 1. Fibrosis staging can also be evaluated, but stage 3 (bridging) typically requires very long induction periods when induced by purely dietary means, and thus may not be a practical endpoint to aim for.
Recent trends: To overcome the qualitative and subjective nature of a pathologist's review, quantitative data can be derived from stained and immunostained liver tissue. This is also useful for describing NASH fibrosis in shorter timeframes. It is increasingly common for histology providers to offer high content imaging and algorithm analysis of endpoints including but not limited to:
- % area fibrosis (collagen deposition using PicroSirius Red or Masson's Trichrome stain)
- % area steatosis (lipid droplets using hematoxylin & eosin stain); this can also be delineated into micro- and macro-vesicular steatosis levels
- Inflammation (% area using galectin-3 immunostain, immune cell density using H&E stain)
- Hepatic stellate cell activation (cell density using α-smooth muscle actin stain)
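The "% area" readouts in the list above share one underlying calculation: the fraction of tissue pixels whose stain signal exceeds a cutoff. The sketch below shows that calculation on a toy image held as nested lists. It assumes a stain-intensity channel has already been extracted and a tissue mask already drawn; the threshold, the pixel values, and the function name are hypothetical, and commercial image-analysis algorithms are considerably more involved.

```python
def percent_positive_area(intensity, tissue_mask, threshold):
    """Percent of tissue pixels whose stain intensity is at or above threshold.

    intensity and tissue_mask are equally shaped 2D lists; pixels outside
    the tissue mask (background, lumen) are excluded from the denominator.
    """
    tissue_pixels = 0
    positive_pixels = 0
    for intensity_row, mask_row in zip(intensity, tissue_mask):
        for value, in_tissue in zip(intensity_row, mask_row):
            if in_tissue:
                tissue_pixels += 1
                if value >= threshold:
                    positive_pixels += 1
    if tissue_pixels == 0:
        raise ValueError("tissue mask contains no tissue pixels")
    return 100.0 * positive_pixels / tissue_pixels


# Hypothetical 2 x 3 stain-intensity image; the last column is background.
stain = [[0.9, 0.2, 0.1],
         [0.8, 0.7, 0.0]]
mask = [[True, True, False],
        [True, True, False]]
fibrosis_pct = percent_positive_area(stain, mask, threshold=0.5)
```

Because the result is a continuous percentage rather than an ordinal stage, it can register changes that fall within a single pathologist-assigned stage.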
NASH surrogate and biomarker interrogation of liver diseases
Human endpoints: Ocaliva robustly decreased levels of the circulating biomarkers alanine and aspartate aminotransferase (ALT, AST) and γ-glutamyl transferase (GGT). In a Phase 2b trial (GOLDEN-505) [8], elafibranor had a moderate effect on ALT but robustly improved GGT and alkaline phosphatase (ALP).
Human endpoints explained: Liver function tests quantify hepatic enzymes that are released into circulation more abundantly in cases of liver injury or dysfunction. ALT and AST are gold standard biomarkers for hepatocellular diseases including hepatitis, toxicity, and cirrhosis; GGT and ALP describe cholestasis and oxidative stress. Sampling for these is noninvasive, can be done with great frequency, and routinely serves as a surrogate for liver biopsy in Phase 2 trials.
Mouse endpoint equivalents: All four biomarkers are routinely used to monitor NASH and other liver diseases in mice. ALT is extensively used for sorting animals by disease severity at baseline. If mouse blood is not required for additional analyses, these biomarkers might be sampled as frequently as every two weeks, although a four-week sampling interval is more common.
An animal's fasted/fed state and the route of blood sampling may influence biomarker levels. Downstream analytical methodologies may stipulate the use of specific anticoagulants that can also impact values. Best practices should include standardization of sample collection time of day, route, and collection vessels used throughout a study.
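Sorting animals by baseline ALT, as described above, is often done so that treatment groups start with similar disease severity. One simple way to do this is a serpentine (snake) allocation over the ranked values, sketched below. This is an illustrative assumption, not a method prescribed by the source; the animal IDs and ALT values are hypothetical, and real studies may add a randomization step.

```python
def serpentine_allocate(baseline_values, n_groups):
    """Assign animals to groups in a snake order over ranked baseline values.

    Animals are ranked from highest to lowest baseline value, then dealt
    out as 0, 1, ..., n-1, n-1, ..., 1, 0, and so on, which keeps the
    group means close to one another.
    """
    ranked = sorted(baseline_values, key=baseline_values.get, reverse=True)
    groups = [[] for _ in range(n_groups)]
    for index, animal in enumerate(ranked):
        block, position = divmod(index, n_groups)
        # Reverse the dealing direction on every other pass.
        group = position if block % 2 == 0 else n_groups - 1 - position
        groups[group].append(animal)
    return groups


# Hypothetical baseline serum ALT (U/L).
baseline_alt = {"m01": 210, "m02": 180, "m03": 150,
                "m04": 140, "m05": 120, "m06": 90}
groups = serpentine_allocate(baseline_alt, n_groups=2)
```

Animals with ALT values below a chosen cutoff could be excluded as sub-responders before allocation; that cutoff would be study-specific.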
Recent trends: ALT, AST, ALP, and GGT are biomarkers for liver dysfunction, but not NASH per se. In the clinic, the lack of widely accepted noninvasive tests specifically developed to quantify NASH is a barrier to patient diagnosis, but numerous candidate assays are emerging. These include NIS4, a multianalyte assay that quantifies the microRNA miR-34a (steatosis/inflammation marker), alpha-2 macroglobulin and the glycoprotein YKL-40 (fibrosis markers), and HbA1c (metabolic marker) [9]. These biomarkers merit validation in different rodent models of NASH as noninvasive tools to quantify disease progression.
Serum chemistry and metabolic comorbidities
Human endpoints: Ocaliva trial recruits were required to present with at least one comorbidity of NASH, including obesity or type 2 diabetes. Elafibranor recruits were evaluated at baseline and post-treatment for blood glucose, serum triglycerides, and insulin resistance.
Human endpoints explained: NASH is commonly preceded by metabolic syndrome, defined as the presence of at least three of the following five factors: abdominal obesity, high serum triglycerides, low HDL cholesterol, high blood pressure, and hyperglycemia/insulin resistance. NASH drugs may work along a mechanistic axis that acts systemically on fat or glucose metabolism, and thus have beneficial secondary effects on metabolic syndrome.
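The "at least three of five" definition above is a simple count. The sketch below takes each factor as an already-determined yes/no finding, because the numeric cutoffs for each factor are not given in the text; the key names and function name are hypothetical.

```python
# The five factors named in the text (key names are illustrative).
CRITERIA = (
    "abdominal_obesity",
    "high_triglycerides",
    "low_hdl_cholesterol",
    "high_blood_pressure",
    "hyperglycemia_or_insulin_resistance",
)


def has_metabolic_syndrome(findings):
    """Return True when at least three of the five factors are present.

    findings maps each name in CRITERIA to True or False.
    """
    missing = [name for name in CRITERIA if name not in findings]
    if missing:
        raise KeyError(f"missing findings: {missing}")
    return sum(bool(findings[name]) for name in CRITERIA) >= 3
```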
Mouse endpoint equivalents: Routine measurement of body weight (weekly or daily) is best practice for most mouse studies, and is especially practical for high-fat NASH diet models. Body weight is convenient for sorting animals at baseline and for weeding out sub-responders. If infrastructure permits, the relative proportions of fat to lean mass can be measured using dual-energy X-ray absorptiometry. For improved resolution, depot-specific fat masses can be quantified using magnetic resonance imaging. Hepatomegaly (liver as % of body weight) is also an easily scored terminal endpoint.
For many mouse dietary NASH models, hypertriglyceridemia and hyperglycemia may be absent or only weakly recapitulated [10,11]. The former is not crucial, as fat accumulation within the liver is a more pressing focus and a proven model prerequisite. Regarding the latter: while fasted blood glucose may appear normal in diet-induced NASH mice, a glucose tolerance test may confirm that these animals are in fact insulin resistant. Glucose clearance rates and HOMA-IR are practical efficacy endpoints that may also be used for sorting purposes in addition to disease monitoring.
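Both readouts mentioned above reduce to short calculations. The sketch below uses the standard HOMA-IR formula (fasting glucose in mmol/L × fasting insulin in µU/mL ÷ 22.5) and a trapezoidal area under the curve for glucose tolerance data. Note that the 22.5 constant was derived from human data, so in mice the index is best treated as a relative measure; the function names and example values are hypothetical.

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uu_ml):
    """HOMA-IR = glucose (mmol/L) x insulin (microU/mL) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_uu_ml / 22.5


def glucose_auc(times_min, glucose):
    """Trapezoidal area under a glucose-versus-time curve.

    times_min must be strictly increasing; the result has units of
    (glucose unit) x minutes.
    """
    if len(times_min) != len(glucose) or len(times_min) < 2:
        raise ValueError("need at least two matched time/glucose points")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        area += dt * (glucose[i] + glucose[i - 1]) / 2
    return area


# Hypothetical values: fasting sample, then a 120-minute tolerance test.
index = homa_ir(5.0, 9.0)
auc = glucose_auc([0, 15, 30, 60, 120], [8, 20, 18, 14, 9])
```

A larger AUC indicates slower glucose clearance, so the same function can be applied at baseline for sorting and again post-treatment for efficacy.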
Sampling strategy summary
The table below summarizes various potential endpoints for preclinical NASH studies.
Technique | Invasive? | Useful for sorting? | Utility for longitudinal sampling? | Popularity | Comments |
Body weight | No | Yes; routine | Routine, weekly | High | Routine best practice for rodent studies; no cost |
Serum ALT | No | Yes; routine | Routine, every 4 weeks | High | Most commonly used surrogate for direct liver interrogation; affordable |
Serum AST, ALP, GGT | No | Yes | Common, every 4 weeks | Mid | AST is more commonly analyzed than GGT or ALP; AST is usually not affected to the same degree as ALT |
Serum glucose | No | May be useful to disqualify outliers | Poor (not reliable) | Mid | Many diet-based models do not induce appreciable hyperglycemia; easy and affordable |
Serum triglycerides | No | Not reliable | Poor (not reliable) | Mid | Many diet-based models do not elevate, and may actually decrease, serum triglycerides |
Glucose tolerance test/ HOMA-IR | No | Yes | Fair, every 4 weeks | Mid | Time consuming; requires technical expertise; measuring insulin for HOMA-IR adds cost |
% body fat composition (DXA) | No | Yes | Good, every 4 weeks | Low | Requires specialized instrumentation and training |
Body composition (MRI) | No | Yes | Good, every 4 weeks | Low | Requires specialized instrumentation and training |
Hepatomegaly (% liver weight) | Yes; terminal | No; may be useful as a baseline endpoint for non-longitudinal sampling strategy | No | High | Routinely calculated when livers are harvested for other analyses |
Survival liver biopsy | Yes | Yes | No, repeat sampling is strongly discouraged | Mid | Most direct way to interrogate liver for sorting animals and obtaining longitudinal baseline data; requires specialized training; can provide numerous post-hoc histology readouts |
Terminal liver biopsy | Yes; terminal | No; may be useful as a baseline endpoint for non-longitudinal sampling strategy | No | High | Most direct way to interrogate liver; requires larger animal cohorts so the study will be adequately powered, as sub-responders will be sampled; can provide numerous post-hoc histology readouts |
Liver stiffness (shear wave elastography, SWE) | No | Low | Good, every 4-8 weeks | Low | Relatively new technique that does not yet have the track record to support widespread use |