
Segmentation algorithm can be used for detecting hepatic fibrosis in SD rat



Liver fibrosis is an early stage of liver cirrhosis. As a reversible lesion that precedes cirrhosis, liver failure, and liver cancer, it has been a target for drug discovery. Many antifibrotic candidates have shown promising results in experimental animal models; however, owing to adverse clinical reactions, most antifibrotic agents remain at the preclinical stage. Therefore, rodent models have been used to examine the histopathological differences between control and treatment groups to evaluate the efficacy of antifibrotic agents in non-clinical research. In addition, with improvements in digital image analysis incorporating artificial intelligence (AI), a few researchers have developed automated quantification of fibrosis. However, the performance of multiple deep learning algorithms for the optimal quantification of hepatic fibrosis has not been evaluated. Here, we investigated three localization algorithms, Mask R-CNN, DeepLabV3+, and SSD, for detecting hepatic fibrosis.


The three algorithms were trained on 5750 images with 7503 annotations, and model performance was evaluated on large-scale images and compared with that on the training images. The precision values were comparable among the algorithms. However, there was a gap in recall, which led to a difference in model accuracy. Mask R-CNN achieved the highest recall (0.93) and produced the predictions closest to the annotations for detecting hepatic fibrosis. DeepLabV3+ also performed well; however, it was limited by confusion between hepatic fibrosis and inflammatory cells or connective tissue. The trained SSD showed the lowest performance and was limited in predicting hepatic fibrosis compared with the other algorithms because of its low recall (0.75).


We suggest that segmentation algorithms are the more useful choice when implementing AI to predict hepatic fibrosis in non-clinical studies.


Liver fibrosis is an abnormal repair reaction to chronic liver injury, characterized by the excessive production and accumulation of extracellular matrix (ECM) in the liver. It is caused by chronic hepatitis B (CHB), chronic hepatitis C (CHC), alcoholic fatty liver disease (AFLD), and other conditions [1,2,3,4]. Liver fibrosis begins with pro-inflammatory reactions, and the normal structure and physiological function of the liver tissue are gradually destroyed. Scar tissue then replaces the liver parenchyma, and the disease progresses to more severe consequences, such as liver cirrhosis, liver failure, or liver cancer, eventually leading to patient death [5]. However, liver fibrosis is reversible at this early stage, before cirrhosis develops; therefore, it is a top priority in treating liver conditions. Treatment aims to reduce or reverse hepatic fibrosis by reducing inflammation, protecting the liver, preventing the proliferation and activation of hepatic stellate cells (HSCs), and restraining ECM production and deposition [6, 7]. Many antifibrotic candidate drugs have shown reliable results in experimental animal models; however, they have shown limited effects in the clinical phase owing to the complicated pathological mechanisms of liver fibrosis. Adverse reactions induced by large doses are one of the leading causes of failure. Most drugs targeting liver fibrosis caused by various factors in chronic liver diseases are still in the preclinical stage of development [8].

Rodent models are commonly used to evaluate the efficacy of antifibrotic therapeutics in non-clinical research, in which histopathological differences between the control and treatment groups are examined. Accurate quantification of liver fibrosis is pivotal for assessing the efficacy of novel antifibrotic candidates. Conventionally, semi-quantitative histological evaluation has been the method of choice for liver fibrosis assessment [9, 10] and is still regarded as the gold standard. In the last two decades, there has been significant progress in digital image analysis (DIA) of biopsy specimens. Researchers have focused on developing automated methods that quantify fibrosis as the collagen proportionate area (CPA), the ratio of the fibrotic area to the total area of liver tissue examined [11,12,13,14]. Furthermore, recent studies have started to adopt deep learning methods to score hepatic fibrosis in rodent models [15,16,17]. These methods have shown reliable correlations with pathologist scoring systems, even at the whole-slide image (WSI) level [17]. However, before a specific algorithm is implemented for pathological use, it is crucial to evaluate the effectiveness of different artificial intelligence (AI) algorithms and determine the most appropriate one.
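The area ratio behind CPA can be illustrated with a few lines of Python. This is a minimal sketch rather than the pipeline of any cited study: the function name `collagen_proportionate_area` and the representation of the tissue and fibrosis regions as nested lists of 0/1 pixels are assumptions made for the example.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # hypothetical binary mask: 1 = pixel belongs to the class


def collagen_proportionate_area(fibrosis: Mask, tissue: Mask) -> float:
    """Return fibrotic area / tissue area, counting fibrosis only inside tissue."""
    tissue_px = 0
    fibrosis_px = 0
    for fib_row, tis_row in zip(fibrosis, tissue):
        for f, t in zip(fib_row, tis_row):
            if t:
                tissue_px += 1
                if f:
                    fibrosis_px += 1
    if tissue_px == 0:
        raise ValueError("tissue mask is empty")
    return fibrosis_px / tissue_px


# Toy 4 x 4 example: 12 tissue pixels, 3 of them fibrotic.
tissue = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
]
fibrosis = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 1],  # the last pixel lies outside the tissue and is ignored
    [0, 0, 0, 0],
]
cpa = collagen_proportionate_area(fibrosis, tissue)
print(f"CPA = {cpa:.2%}")
```

Restricting the count to pixels inside the tissue mask keeps background and slide glass out of the denominator, which is one plausible reading of "total area of liver tissue examined".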

As previous studies have shown, localizing and separating the lesion of interest on the slide is essential for quantifying and visualizing abnormalities [17, 18]. However, each of those studies used a single algorithm and did not compare it with other types of detection algorithm. Therefore, in this study, we examined the performance of three localization algorithms for detecting hepatic fibrosis: SSD [19], an object-detection algorithm, and two segmentation algorithms, Mask R-CNN [20] and DeepLabV3+ [21]. We considered Mask R-CNN and DeepLabV3+ because fibrosis has an atypical, polygonal morphology; we therefore assumed that segmentation algorithms would recognize the lesion more efficiently. In contrast, SSD was selected because it detects objects in an image quickly, in real time. Pathologists can quickly determine under a microscope whether a slide has fibrotic lesions. Therefore, if SSD showed accuracy comparable to that of the segmentation algorithms, it could be more valuable than the others. To identify a suitable deep learning algorithm for detecting liver fibrosis, we evaluated each model using precision and recall computed from predictions on large-scale images rather than on the images used for training.


Algorithm training

All losses during training were calculated and recorded as the total loss. Although the loss components differed according to the algorithm, the loss values decreased steeply and stabilized during the early phase of learning (Fig. 1). These loss values indicate that the algorithms learned successfully from the training dataset. After model training, the mean intersection over union (mIoU) of each algorithm was calculated on the test dataset. The mIoU of the two segmentation algorithms, computed against the ground-truth annotations, was 0.76, and that of SSD was 0.82.
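For readers unfamiliar with the metric, the area-based IoU used for segmentation output can be sketched as follows. The helper names `mask_iou` and `mean_iou`, the nested-list masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; they do not reproduce the evaluation code of the packages used in this study.

```python
from typing import Iterable, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # hypothetical binary mask: 1 = fibrosis pixel


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return inter / union if union else 1.0


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (prediction, ground truth) pairs."""
    scores = [mask_iou(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)


truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
shifted = [[0, 1, 1],
           [0, 1, 1],
           [0, 0, 0]]  # same shape, shifted one pixel to the right

iou_shifted = mask_iou(shifted, truth)            # 2 shared pixels / 6 in the union
miou = mean_iou([(shifted, truth), (truth, truth)])
print(round(iou_shifted, 3), round(miou, 3))
```

Because the score depends on pixel overlap, a prediction that is merely displaced from the annotation is penalized even when its shape is correct.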

Fig. 1 Total loss of each algorithm model recorded at every epoch during training

Model accuracy

With the trained weights (Fig. 1), the two segmentation algorithms (Mask R-CNN and DeepLabV3+) predicted hepatic fibrosis closer to the ground-truth labels than the object detection algorithm SSD did; in particular, the trained Mask R-CNN produced the predictions closest to the ground-truth annotations (Fig. 2).

We also calculated the precision, recall, F1 score, and accuracy based on the ground-truth labels to mathematically evaluate the model’s performance.

Fig. 2 Predictions of each trained algorithm on 2688 × 2688 pixel images. Yellow arrows point to regions of ground-truth labels. Mask R-CNN detected hepatic fibrosis most similarly to the ground-truth labels

The two segmentation models performed better than the SSD object detection model. The precision values were similar across the algorithms; however, the recall values differed (Table 1). Mask R-CNN showed the highest values for all performance parameters for detecting hepatic fibrosis, including recall. SSD performed worse than the two segmentation algorithms and had the lowest recall. This indicates that SSD is limited in detecting hepatic fibrosis in agreement with the ground-truth annotations.

Table 1 Precision, recall, F1 score, and accuracy were calculated from a large-scale image prediction test


To identify a suitable deep learning algorithm for detecting hepatic fibrosis in non-clinical studies, we investigated three localization algorithms: SSD, Mask R-CNN, and DeepLabV3+. The three algorithms were trained on a total of 5750 images with 7503 annotations, and the total loss of each model was monitored during training. Loss reflects the estimation error of a model when it is applied to real data; therefore, the smaller the loss, the better the model. The two segmentation algorithms, Mask R-CNN and DeepLabV3+, showed smaller ranges of loss values than the object detection algorithm SSD; thus, training was more successful for the former two than for SSD. After training, the mean intersection over union (mIoU) on the test dataset was calculated by comparing the annotations with the predictions made using the trained weights of each algorithm. The method of calculating the intersection over union (IoU) differed between the two segmentation algorithms and SSD. A segmentation algorithm compares the annotation with the prediction on the basis of area, whereas SSD uses the prediction rate relative to the number of labels. Therefore, the IoU values of the segmentation algorithms vary with the overlap between the prediction of the trained model and the ground-truth label. In the case of SSD, the IoU took one of three values (1, 0.5, and 0.33), corresponding to prediction rates of 100%, 50%, and 33% of the predicted hepatic fibrosis, respectively, relative to the number of ground-truth labels. Consequently, the SSD value tends to be overestimated relative to the IoU values of the segmentation algorithms, which are calculated from the area and region shared by the ground truth and the prediction. Thus, the mIoU values of the three algorithms are of limited use for comparing their performance.

To overcome this limitation and confirm performance on large-scale images, we evaluated the precision, recall, F1 score, and accuracy of each trained algorithm on predictions for 2688 × 2688 pixel images. The trained Mask R-CNN outperformed the other algorithms in predicting hepatic fibrosis, although it mispredicted some inflammatory cells and connective tissue as hepatic fibrosis (Fig. 2b). The trained DeepLabV3+ mispredicted inflammatory cells and connective tissue as hepatic fibrosis more often than the trained Mask R-CNN did (Fig. 2c). The trained SSD showed the lowest performance in detecting hepatic fibrosis compared with the two segmentation algorithms, and blood vessels were not excluded from its detections (Fig. 2d).

The parameters related to model accuracy also demonstrated the higher performance of the segmentation algorithms compared with the object detection algorithm. A high recall value indicates that the hepatic fibrosis detected by the trained algorithm is close to the ground truth. The trained Mask R-CNN performed well on the test images. This tendency is reflected in the prediction results in Fig. 2, where Mask R-CNN distinguished inflammatory lesions and connective tissue from hepatic fibrosis better than the other segmentation algorithm, DeepLabV3+. Therefore, the Mask R-CNN predictions of hepatic fibrosis in the test images were the closest to the ground-truth labels and showed the highest accuracy of all the models. The other segmentation algorithm used in this study, DeepLabV3+, performed comparably to Mask R-CNN. However, its recall was limited by the frequent misprediction of inflammatory cells and connective tissue as hepatic fibrosis.

In contrast, the trained SSD, an object detection model, showed the lowest values related to model accuracy compared with the segmentation algorithms, especially regarding recall and the ability to predict hepatic fibrosis in agreement with the ground-truth annotations. This result appears in Fig. 2d as empty detection results at the yellow arrows, indicating that the trained SSD did not predict hepatic fibrosis as well as the other algorithms. Indeed, a bounding-box-based detection algorithm may be suitable for detecting an object that fills its bounding box, such as an automobile, but not for atypical, elongated objects, such as hepatic fibrosis. Therefore, SSD may not be a suitable algorithm for detecting hepatic fibrosis.
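The geometric point about bounding boxes can be made concrete with a toy calculation. The sketch below is an illustration only, not part of the study: the helper names `bounding_box` and `box_fill_ratio` and the two synthetic 8 × 8 masks (a compact square and a one-pixel-wide diagonal streak standing in for a fibrotic septum) are assumptions.

```python
def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of the nonzero pixels, inclusive."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask is empty")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def box_fill_ratio(mask):
    """Fraction of the tight bounding box that the object actually occupies."""
    r0, c0, r1, c1 = bounding_box(mask)
    box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
    object_area = sum(1 for row in mask for v in row if v)
    return object_area / box_area


SIZE = 8
# Compact 4 x 4 square: fills its bounding box completely.
compact = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(SIZE)]
           for r in range(SIZE)]
# Thin diagonal streak: its bounding box spans the whole 8 x 8 image.
streak = [[1 if r == c else 0 for c in range(SIZE)] for r in range(SIZE)]

compact_ratio = box_fill_ratio(compact)
streak_ratio = box_fill_ratio(streak)
print(compact_ratio, streak_ratio)
```

In this toy case the streak covers only one eighth of its box, so a box-level prediction mostly describes non-lesion tissue, whereas a pixel mask can follow the streak itself.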

A previous study by Ramot et al. [16] demonstrated automated quantification of liver fibrosis in mice using a segmentation algorithm, U-net, with two magnifications (10× and 40×) of picrosirius red-stained slide images. The F1 score of the study (0.8775) was similar to the value observed in this study (0.87 for Mask R-CNN), although the staining method and trained algorithm were different [16]. This result supports our previous study [17], which showed the possibility of applying a Mask R-CNN to quantify hepatic fibrosis at the WSI level. In addition, the results from this study showed again that the implementation of Mask R-CNN could successfully quantify hepatic fibrosis using H&E staining, a general staining method for tissue analysis, instead of specific staining, such as Sirius red or Masson’s trichrome staining.


In this study, the Mask R-CNN outperformed the others in detecting hepatic fibrosis, especially regarding the recall value; therefore, it showed the closest prediction results among the algorithms. The other segmentation algorithm, DeepLabV3+, showed comparable accuracy to the Mask R-CNN; however, it showed a lower prediction rate for detecting hepatic fibrosis than the Mask R-CNN. SSD showed the lowest accuracy and ability to predict hepatic fibrosis compared with the segmentation algorithms. Therefore, we suggest that segmentation algorithms can help to implement artificial intelligence algorithms to predict toxicological lesions in non-clinical studies.


Animal experiments

Hepatic fibrosis was induced in Sprague-Dawley (SD) rats by repeated intraperitoneal (IP) injection of N-nitrosodimethylamine (NDMA) for four weeks. Details of the animal experiments have been described previously [17]. Briefly, 1 mL/kg NDMA (10 mg/10 mL) was administered to 6- to 7-week-old SD rats via IP injection three times a week for four weeks (12 injections in total). After chemical administration, the animals were euthanized using isoflurane, and their livers were collected in 10% formaldehyde. Hematoxylin and eosin (H&E) staining was then performed on paraffin-embedded sections of the left lateral and median lobes of the liver, and the stained sections were used for digital archiving.

Data preparation

Data preparation for training on hepatic fibrosis was conducted as described in previous studies [17, 18]. Briefly, whole-slide images (WSIs) of liver sections at 10× magnification were cropped into 448 × 448 pixel tile images. We annotated all fibrotic lesions in the tile images using VGG Image Annotator (Visual Geometry Group, Oxford University, UK), and an accredited toxicological pathologist confirmed the annotations before algorithm training was initiated. The annotation information was saved in a JSON file. A total of 500 image tiles were obtained from 12 WSIs, and labeling the lesions in these tiles yielded 663 annotations. The train_test_split function of the scikit-learn package was used to split the annotated image tiles into training, validation, and test datasets at a ratio of 7:2:1 (350, 100, and 50 tiles, respectively). The training dataset was then augmented 16-fold using image-augmentation techniques (reversal, rotation, and brightness adjustment), giving 350 × 16 = 5600 training images. In total, 5600, 100, and 50 images were used for training, validation, and testing, with 7296, 140, and 67 annotations, respectively.
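The dataset sizes quoted above follow from simple arithmetic, which the sketch below reproduces. It is a standard-library stand-in for the scikit-learn split, not the authors' script: the function `split_7_2_1`, the tile names, and the random seed are hypothetical, and only the 7:2:1 ratio, the 500 tiles, and the 16-fold augmentation of the training set come from the text.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle and split items into training, validation, and test lists at 7:2:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) * 2 // 10
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train_set, val_set, test_set


tile_ids = [f"tile_{i:03d}" for i in range(500)]  # hypothetical tile names
train_set, val_set, test_set = split_7_2_1(tile_ids)

AUGMENT_FACTOR = 16  # each training tile yields 16 training images
n_train_aug = len(train_set) * AUGMENT_FACTOR
n_total = n_train_aug + len(val_set) + len(test_set)

print(len(train_set), len(val_set), len(test_set))  # 350 100 50
print(n_train_aug, n_total)                         # 5600 5750
```

Only the training split is multiplied, so the validation and test sets keep their original, unaugmented tiles.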

Training of hepatic fibrosis and metrics for model performance

Model training

TensorFlow 2.1.0, Keras 2.4.3 backend, and PyTorch were used for algorithm training. We applied three open-source packages (Mask R-CNN: torchvision [22], DeepLabV3+: jfzhang95 pytorch-deeplab-xception package [23], SSD: amdegroot ssd.pytorch package [24]) to train the hepatic fibrosis models, and all package requirements were met. Training computations ran on an NVIDIA RTX 3090 GPU with 24 GB of memory. The hyperparameters were set separately for each algorithm because the requirements differ between algorithms; Mask R-CNN used the same settings as in previous studies [17, 18]. Details are presented in Table 2. Every loss value calculated during training was recorded and saved.

Table 2 Hyperparameters used in Mask R-CNN, DeepLabV3+, and SSD

Metrics for model performance

To evaluate the performance of each trained model, we compared the precision, recall, F1 score, and accuracy calculated from predictions of hepatic fibrosis on 60 large-scale images (2688 × 2688 pixels). First, the ground truth of the test images was annotated using the same procedure used to prepare the training data. Then, each large-scale image was divided into 448 × 448 pixel tiles, and true positives (TP), false positives (FP), and false negatives (FN) were counted according to the presence or absence of lesion detection in each tile compared with the ground-truth labels. A schematic diagram of the calculation of precision, recall, F1 score, and accuracy on the large-scale test images is shown in Fig. 3. Precision, recall, accuracy, and F1 score are defined as follows (a–d):

  1. (a)
    $$\text{Precision}=\frac{TP}{TP+FP}$$


  2. (b)
    $$\text{Recall}=\frac{TP}{TP+FN}$$


  3. (c)


  4. (d)
    $$\text{F1 score}=\frac{2*Precision*Recall}{Precision+Recall}$$
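The tile-level scoring can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the names `tile_origins`, `TileCounts`, and `count_tiles` and the example presence flags are hypothetical, and the `accuracy` function uses the conventional definition that also counts true-negative tiles, which may differ from the definition used in this study.

```python
from dataclasses import dataclass

IMAGE_SIZE = 2688  # edge of a large-scale test image, in pixels
TILE_SIZE = 448    # edge of one tile, in pixels


def tile_origins(image_size=IMAGE_SIZE, tile_size=TILE_SIZE):
    """Top-left (x, y) corner of every non-overlapping tile in a square image."""
    steps = range(0, image_size, tile_size)
    return [(x, y) for y in steps for x in steps]


@dataclass
class TileCounts:
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0


def count_tiles(truth_flags, pred_flags):
    """Compare per-tile lesion presence in the ground truth and the prediction."""
    counts = TileCounts()
    for truth, pred in zip(truth_flags, pred_flags):
        if truth and pred:
            counts.tp += 1
        elif pred:
            counts.fp += 1
        elif truth:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def precision(c):
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c):
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c):
    # Assumption: conventional accuracy, which also counts true-negative tiles.
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical presence flags for the 36 tiles of one 2688 x 2688 image.
n_tiles = len(tile_origins())
truth_flags = [i < 10 for i in range(n_tiles)]                 # lesions in tiles 0-9
pred_flags = [i < 9 or i in (10, 11) for i in range(n_tiles)]  # misses tile 9, adds tiles 10 and 11

counts = count_tiles(truth_flags, pred_flags)
print(counts)
print(f"precision={precision(counts):.3f} recall={recall(counts):.3f} "
      f"F1={f1_score(counts):.3f} accuracy={accuracy(counts):.3f}")
```

Scoring presence per tile makes the three model types comparable, because a box from SSD and a mask from a segmentation model both reduce to the same yes/no decision for each tile.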


Fig. 3 Process for calculating the performance parameters on large-scale images. True and false outcomes are determined at the level of 448 × 448 pixel tiles by comparing the ground-truth annotations with the predictions made using the trained weights of each model

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available because they are currently under copyright registration, but they are available from the corresponding author upon reasonable request.



Abbreviations

AFLD: Alcoholic fatty liver disease

CHB: Chronic hepatitis B

CHC: Chronic hepatitis C

CPA: Collagen proportionate area

DIA: Digital image analysis

ECM: Extracellular matrix

H&E: Hematoxylin and eosin

HSCs: Hepatic stellate cells

IoU: Intersection over union

mIoU: Mean intersection over union




Whole slide level


  1. Poynard T, Bedossa P, Opolon P. Natural history of liver fibrosis progression in patients with chronic hepatitis C. The OBSVIRC, METAVIR, CLINIVIR, and DOSVIRC groups. Lancet. 1997;349(9055):825–32.


  2. Benhamou Y, Bochet M, Di Martino V, Charlotte F, Azria F, Coutellier A, et al. Liver fibrosis progression in human immunodeficiency virus and hepatitis C virus coinfected patients. The Multivirc Group. Hepatology. 1999;30(4):1054–8.


  3. Pinzani M, Macias-Barragan J. Update on the pathophysiology of liver fibrosis. Expert Rev Gastroenterol Hepatol. 2010;4(4):459–72.


  4. Povero D, Busletta C, Novo E, di Bonzo LV, Cannito S, Paternostro C, et al. Liver fibrosis: a dynamic and potentially reversible process. Histol Histopathol. 2010;25(8):1075–91.


  5. Zoubek ME, Trautwein C, Strnad P. Reversal of liver fibrosis from fiction to reality. Best Pract Res Clin Gastroenterol. 2017;31:129–41.


  6. Campana L, Iredale JP. Regression of liver fibrosis. Semin Liver Dis. 2017;37(1):1–10.


  7. Roehlen N, Crouchet E, Baumert TF. Liver fibrosis: mechanistic concepts and therapeutic perspectives. Cells. 2020;3(4):875.


  8. Tan Z, Sun H, Xue T, Gan C, Liu H, Xie Y, et al. Liver fibrosis: therapeutic targets and advances in Drug Therapy. Front Cell Dev Biol. 2021;9:730176.


  9. Brunt EM, Janney CG, Di Bisceglie AM, Neuschwander-Tetri BA, Bacon BR. Nonalcoholic steatohepatitis: a proposal for grading and staging the histological lesions. Am J Gastroenterol. 1999;94(9):2467–74.


  10. Ishak K, Baptista A, Bianchi L, Callea F, De Groote J, Gudat F, et al. Histological grading and staging of chronic hepatitis. J Hepatol. 1995;22(6):696–9.


  11. Gawrieh S, Sethunath D, Cummings OW, Kleiner DE, Vuppalanchi R, Chalasani N, et al. Automated quantification and architectural pattern detection of hepatic fibrosis in NAFLD. Ann Diagn Pathol. 2020;47:151518.


  12. Kleiner DE, Brunt EM, Van Natta M, Behling C, Contos MJ, Cummings OW, et al. Nonalcoholic Steatohepatitis Clinical Research Network. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology. 2005;41(6):1313–21.


  13. Caballero T, Pérez-Milena A, Masseroli M, O’Valle F, Salmerón FJ, Del Moral RM, et al. Liver fibrosis assessment with semi-quantitative indexes and image analysis quantification in sustained-responder and non-responder interferon-treated patients with chronic hepatitis C. J Hepatol. 2001;34(5):740–7.


  14. Calvaruso V, Burroughs AK, Standish R, Manousou P, Grillo F, Leandro G, et al. Computer-assisted image analysis of liver collagen: relationship to Ishak scoring and hepatic venous pressure gradient. Hepatology. 2009;49(4):1236–44.


  15. Heinemann F, Birk G, Stierstorfer B. Deep learning enables pathologist-like scoring of NASH models. Sci Rep. 2019;9(1):18454.


  16. Ramot Y, Deshpande A, Morello V, Michieli P, Shlomov T, Nyska A. Microscope-based automated quantification of liver fibrosis in mice using a deep learning algorithm. Toxicol Pathol. 2021;49(5):1126–33.


  17. Hwang JH, Kim HJ, Park H, Lee BS, Son HY, Kim YB, et al. Implementation and practice of Deep Learning-Based Instance Segmentation Algorithm for quantification of hepatic fibrosis at whole Slide Level in Sprague-Dawley rats. Toxicol Pathol. 2022;50:186–96.


  18. Baek EB, Hwang JH, Park H, Lee BS, Son HY, Kim YB, et al. Artificial intelligence-assisted image analysis of acetaminophen-induced acute hepatic injury in Sprague-Dawley rats. Diagnostics (Basel). 2022;12(6):1478.


  19. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC. SSD: single shot multibox detector. In: European conference on computer vision, Amsterdam, The Netherlands. 2016.

  20. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: 2017 IEEE international conference on computer vision (ICCV), Venice, Italy. 2017;2980–8.

  21. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In: European conference on computer vision (ECCV), Munich, Germany. 2018;801–18.

  22. TorchVision maintainers and contributors. TorchVision: PyTorch’s Computer Vision library. GitHub repository. 2016. Accessed 14 Aug 2022.

  23. Zhang J. pytorch-deeplab-xception package. GitHub repository. 2019. Accessed 14 Aug 2022.

  24. DeGroot M. Amdegroot SSD.pytorch package. GitHub repository. 2019. Accessed 14 Aug 2022.



We thank Ga-Hyun Kim and Ji-soo Yang for annotating hepatic fibrosis in all image data.


The authors disclose the receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a Grant (20183MFDS411) from the Ministry of Food and Drug Safety in 2022.

Author information




Conceptualization, JWC; methodology, JWC and JHH; software, SYJ and J-SP; animal study, HP and B-SL; data preparation, MYL and K-JH; validation and analysis, J-HH and JL; writing, original draft preparation, J-HH; writing, review and editing, J-HH and J-WC; supervision, Y-BK, JL, and JWC. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jae-Woo Cho.

Ethics declarations

Ethics approval and consent to participate

All animal experiments were approved by the Assessment and Accreditation of Laboratory Animal Care International (AAALAC) and the Institutional Animal Care and Use Committee (IACUC). Approval ID: IAC-21-01-0248. Approval date: May 26, 2021.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Hwang, JH., Lim, M., Han, G. et al. Segmentation algorithm can be used for detecting hepatic fibrosis in SD rat. Lab Anim Res 39, 16 (2023).
