Osamu KOMORI

(小森理)

Profile Information

Affiliation: Professor, Department of Science and Technology, Faculty of Science and Technology, Seikei University
The Institute of Statistical Mathematics

Degree: Ph.D. (Statistical Science)

J-GLOBAL ID: 201301069266445882
researchmap Member ID: B000232905

External link: http://www.ci.seikei.ac.jp/komori/

Research Areas

Informatics / Statistical science / Data science

Research History

Apr, 2018 - Present

Seikei University
Oct, 2015 - Mar, 2018

University of Fukui
Apr, 2010 - Sep, 2015

The Institute of Statistical Mathematics

Education

Apr, 2007 - Mar, 2010

The Graduate University for Advanced Studies
Apr, 2005 - Mar, 2007

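Mathematical Sciences Course, School of Fundamental Science and Technology, Graduate School of Science and Technology, Keio University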
Apr, 2001 - Mar, 2005

Department of Mathematics, Faculty of Science and Technology, Keio University

Committee Memberships

Oct, 2017 - Present

Japanese Journal of Statistics and Data Science, Associate Editor, Japan Statistical Society
Apr, 2015 - Present

Editorial Committee Member, Japanese Journal of Biometrics, The Biometric Society of Japan

Awards

Mar, 2016

Encouragement Award, The Biometric Society of Japan

小森理

Papers

Prediction of Occurrence of Cerebral Infarction After Successful Mechanical Thrombectomy for Ischemic Stroke in the Anterior Circulation by Arterial Spin Labeling

Masamune Kidoguchi, Ayumi Akazawa, Osamu Komori, Makoto Isozaki, Yoshifumi Higashino, Satoshi Kawajiri, Shinsuke Yamada, Toshiaki Kodera, Hidetaka Arishima, Tetsuya Tsujikawa, Hirohiko Kimura, Kenichiro Kikuta

Clinical Neuroradiology, Jun 6, 2023

Purpose: The overall goal of our study is to create a modified Alberta Stroke Program Early Computed Tomography Score (ASPECTS), determined by the findings on arterial spin labeling (ASL) imaging, to predict the prognosis of patients with acute ischemic stroke after successful mechanical thrombectomy (MT). As a preliminary step, we examined predictive factors, including the value of cerebral blood flow (CBF) measured by ASL, for the occurrence of cerebral infarction in the regions of interest (ROIs) used in the ASPECTS after successful MT. Methods: Of the 92 consecutive patients with acute ischemic stroke treated with MT at our institution between April 2013 and April 2021, a total of 26 patients who arrived within 8 h after stroke onset and underwent MT resulting in a thrombolysis in cerebral infarction score of 2B or 3 were analyzed. Magnetic resonance imaging, including diffusion-weighted imaging (DWI) and ASL, was performed on arrival and on the day after MT. The asymmetry index (AI) of CBF by ASL (ASL-CBF) before MT was calculated for 11 regions of interest using the DWI-Alberta Stroke Program Early CT Score. Results: Occurrence of infarction after successful MT for ischemic stroke in the anterior circulation can be expected when the formula 0.3211 × history of atrial fibrillation + 0.0096 × the AI of ASL-CBF before MT (%) + 0.0012 × the time from onset to reperfusion (min) yields a value below 1.0, or when the AI of ASL-CBF before MT is below 61.5%. Conclusion: The AI of ASL-CBF before MT, or a combination of a history of atrial fibrillation, the AI of ASL-CBF before MT, and the time from onset to reperfusion, can be used to predict the occurrence of infarction in patients who arrive within 8 h after stroke onset and in whom reperfusion with MT is successful.
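
The decision rule stated in the Results of this abstract is plain arithmetic. The sketch below only restates the two reported thresholds; the function name `infarction_rule` and the 1/0 coding of atrial fibrillation history are assumptions for illustration, and it is not a validated clinical tool.

```python
def infarction_rule(af_history: bool, ai_asl_cbf_pct: float, onset_to_reperfusion_min: float):
    """Restate the two thresholds quoted in the abstract (hypothetical helper).

    ASSUMPTION: history of atrial fibrillation is coded 1 (present) or 0 (absent).
    Returns (linear score, whether infarction is expected under either rule).
    """
    score = (0.3211 * int(af_history)
             + 0.0096 * ai_asl_cbf_pct            # AI of ASL-CBF before MT, in %
             + 0.0012 * onset_to_reperfusion_min)  # onset-to-reperfusion time, in min
    return score, (score < 1.0 or ai_asl_cbf_pct < 61.5)


print(infarction_rule(False, 70.0, 180.0))  # score below 1.0, so the first rule applies
```
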
Statistical learning for species distribution models in ecological studies

Osamu Komori, Yusuke Saigusa, Shinto Eguchi

Japanese Journal of Statistics and Data Science, 6(2) 803-826, May 18, 2023
Regularization Learning of Trace Element Contamination Stemmed from Tailings Dam-Break

Bulent Tutmez, Osamu Komori

Pollution, 9(3) 1082-1097, May, 2023 Peer-reviewed
Visual Evoked Potential Can Predict Deterioration of Visual Function After Direct Clipping of Paraclinoid Aneurysm With Anterior Clinoidectomy

Satoshi Kawajiri, Makoto Isozaki, Osamu Komori, Shinsuke Yamada, Yoshifumi Higashino, Takahiro Yamauchi, Ayumi Akazawa, Masamune Kidoguchi, Munetaka Yomo, Toshiaki Kodera, Hidetaka Arishima, Kousuke Awara, Masaru Inatani, Kenichiro Kikuta

Neurosurgery, 92(6) 1276-1286, Feb 10, 2023

BACKGROUND: The role of visual evoked potential (VEP) in direct clipping of paraclinoid internal carotid artery (ICA) aneurysms remains uncertain. OBJECTIVE: To examine whether intraoperative neuromonitoring with VEP can predict deterioration of visual function after direct clipping of paraclinoid ICA aneurysms with anterior clinoidectomy. METHODS: Among 274 consecutive patients with unruptured cerebral aneurysms, we enrolled 25 patients with paraclinoid ICA aneurysms treated by direct clipping after anterior clinoidectomy with intraoperative VEP neuromonitoring. We evaluated visual acuity loss (VAL) and visual field loss (VFL) before surgery, 1 month after surgery, and at the final follow-up. RESULTS: The VAL at 1 month after surgery (VAL1M) and the VAL at the final follow-up (Final VAL) were significantly related to a reduction rate of VEP amplitude at the end of surgery (RedEnd%) of more than 76.5% and to a maximal reduction rate of VEP amplitude during surgery (MaxRed%) of more than 66.7% to 70%. The VFL at 1 month after surgery (VFL1M) and the VFL at the final follow-up (Final VFL) were significantly related to a MaxRed% of more than 60.7%. CONCLUSION: VAL1M, Final VAL, VFL1M, and Final VFL could be significantly predicted by RedEnd% and MaxRed% in direct clipping of Al-Rodhan group Ia, Ib, and II paraclinoid ICA aneurysms with anterior clinoidectomy.
Effectiveness of high implantation of SAPIEN 3 in preventing pacemaker implantation: A propensity score analysis

Takayuki Onishi, Osamu Komori, Tomo Ando, Motoki Fukutomi, Tetsuya Tobaru

Archives of Cardiovascular Diseases, 116(2) 79-87, Feb, 2023
Visualizing nature based on biodiversity big data: current status and prospects (生物多様性ビッグデータに基づいたネイチャーの可視化：その現状と展望)

Yasuhiro Kubota, Buntarou Kusumoto, Takayuki Shiono, Shogo Ikari, Keiichi Fukaya, Nao Takashina, Yuya Yoshikawa, Yutaro Shigeto, Masashi Shimbo, Akikazu Takeuchi, Yusuke Saigusa, Osamu Komori

Japanese Journal of Biometrics, 43(2) 145-188, 2023
Factors affecting global neurocognitive status and frontal executive functions in the early stage after surgical clipping of unruptured anterior circulation aneurysms with respect to keyhole clipping and conventional clipping

Yoshifumi Higashino, Makoto Isozaki, Kenzo Tsunetoshi, Osamu Komori, Yoshinori Shibaike, Satoshi Kawajiri, Shinsuke Yamada, Ayumi Akazawa, Masamune Kidoguchi, Toshiaki Kodera, Hidetaka Arishima, Takuro Inoue, Takanori Fukushima, Kenichiro Kikuta

Acta Neurochirurgica, 164(8) 2219-2228, Jun 22, 2022
Prostate-specific antigen nomogram to predict advanced prostate cancer using area under the receiver operating characteristic curve boosting

Takeshi Hashimoto, Osamu Komori, Jun Nakashima, Takeshi Kashima, Yuri Yamaguchi, Naoya Satake, Yoshihiro Nakagami, Toshihide Shishido, Kazunori Namiki, Yoshio Ohno

Urologic Oncology: Seminars and Original Investigations, 40(4) 162.e9-162.e16, Apr, 2022
Generalized quasi-linear mixed-effects model

Yusuke Saigusa, Shinto Eguchi, Osamu Komori

Statistical Methods in Medical Research, 31(7) 1280-1291, Mar 14, 2022

The generalized linear mixed model (GLMM) is one of the most common methods for analyzing longitudinal and clustered data in the biological sciences. However, model complexity and misspecification can become issues when the GLMM is applied. To address these issues, we extend the standard GLMM to a nonlinear mixed-effects model based on quasi-linear modeling. An estimation algorithm for the proposed model is obtained by extending penalized quasi-likelihood and restricted maximum likelihood, both of which are well established for GLMM inference. We also formulate the conditional AIC for the proposed model. The proposed model should provide a more flexible fit than the GLMM when there is a nonlinear relation between fixed and random effects; otherwise, it reduces to the GLMM. The performance of the proposed model under model misspecification is evaluated in several simulation studies. In an analysis of respiratory illness data from a randomized controlled trial, we observe that the proposed model can capture heterogeneity; that is, it can detect a patient subgroup with specific clinical characteristics in which the treatment is effective.
A Unified Formulation of k-Means, Fuzzy c-Means and Gaussian Mixture Model by the Kolmogorov–Nagumo Average

Osamu Komori, Shinto Eguchi

Entropy, 23(5) 518, Apr 24, 2021

Clustering is a major class of unsupervised learning algorithms and is widely applied in data mining and statistical data analysis. Typical examples include k-means, fuzzy c-means, and Gaussian mixture models, which are categorized as hard, soft, and model-based clustering, respectively. We propose a new clustering method, called Pareto clustering, based on the Kolmogorov–Nagumo average defined by the survival function of the Pareto distribution. The proposed algorithm encompasses all of the aforementioned clustering methods, as well as maximum-entropy clustering. We introduce a probabilistic framework for the proposed method and discuss the underlying distribution that ensures consistency. We build a minorize-maximization algorithm to estimate the parameters in Pareto clustering. We compare its performance with existing methods in simulation studies and benchmark dataset analyses to demonstrate its practical utility.
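
The Kolmogorov–Nagumo (quasi-arithmetic) average of one point's distances d_k to K centers has the form φ⁻¹(Σ_k w_k φ(d_k)). The sketch below assumes a generalized-Pareto-type survival function φ(d) = (1 + βd)^(-1/β), which tends to exp(-d) as β → 0, plus a temperature τ; this parametrization and the function names are illustrative assumptions, not necessarily those of the paper. It shows how one averaging rule can move between a hard minimum (k-means-like, τ → 0), a log-sum-exp soft minimum (β = 0, the form behind Gaussian-mixture and maximum-entropy clustering), and, for large distances, a power mean with exponent -1/β, the form that appears in the fuzzy c-means objective.

```python
import numpy as np


def pareto_generator(beta):
    """Return (phi, phi_inv) for the decreasing generator phi(d) = (1 + beta*d)**(-1/beta).

    ASSUMPTION: illustrative generalized-Pareto survival function; beta = 0 is the
    exponential limit phi(d) = exp(-d). The paper's exact parametrization may differ.
    """
    if beta == 0:
        return (lambda d: np.exp(-d)), (lambda u: -np.log(u))
    phi = lambda d: (1.0 + beta * d) ** (-1.0 / beta)
    phi_inv = lambda u: (u ** (-beta) - 1.0) / beta
    return phi, phi_inv


def kn_average(dists, weights, beta=0.0, tau=1.0):
    """Kolmogorov-Nagumo average tau * phi^{-1}(sum_k w_k * phi(d_k / tau)).

    dists: distances from one point to each of the K centers; weights sum to 1.
    """
    phi, phi_inv = pareto_generator(beta)
    d = np.asarray(dists, dtype=float) / tau
    w = np.asarray(weights, dtype=float)
    return tau * phi_inv(np.sum(w * phi(d)))


d, w = [1.0, 2.0, 3.0], [1 / 3] * 3
print(kn_average(d, w, beta=0.0, tau=1.0))   # soft minimum, strictly between 1 and 3
print(kn_average(d, w, beta=0.0, tau=0.01))  # close to min(d) = 1 (hard assignment)
print(kn_average(d, w, beta=0.5, tau=1.0))   # Pareto-type generator
```

A clustering loss in this spirit would sum `kn_average` over data points and minimize it over the centers; that optimization step (a minorize-maximization algorithm in the paper) is not shown.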
Sampling bias correction in species distribution models by quasi-linear Poisson point process

Osamu Komori, Shinto Eguchi, Yusuke Saigusa, Buntarou Kusumoto, Yasuhiro Kubota

Ecological Informatics, 55, 2020 Peer-reviewed
The significance of micro-lymphatic invasion and pathological Gleason score in prostate cancer patients with pathologically organ-confined disease and negative surgical margins after robot-assisted radical prostatectomy

Takeshi Hashimoto, Jun Nakashima, Rie Inoue, Osamu Komori, Yuri Yamaguchi, Takeshi Kashima, Naoya Satake, Yoshihiro Nakagami, Kazunori Namiki, Toshitaka Nagao, Yoshio Ohno

International Journal of Clinical Oncology, 25(2) 377-383, Oct 31, 2019
Robust kernel canonical correlation analysis to detect gene-gene co-associations: A case study in genetics

Md. Ashad Alam, Osamu Komori, Hong-Wen Deng, Vince D. Calhoun, Yu-Ping Wang

Journal of Bioinformatics and Computational Biology, 17(04) 1950028, Aug, 2019

The kernel canonical correlation analysis based U-statistic (KCCU) is used to detect nonlinear gene–gene co-associations. However, estimating the variance of the KCCU is computationally intensive. In addition, kernel canonical correlation analysis (kernel CCA) is not robust to contaminated data. Using a robust kernel mean element and a robust kernel (cross-)covariance operator potentially enables a robust kernel CCA, which is studied in this paper. We first propose an influence function-based estimator for the variance of the KCCU. We then present a nonparametric robust KCCU, which is designed to deal with contaminated data. The robust KCCU is less sensitive to noise than the KCCU. We investigate the proposed method using both synthesized data and real data from the Mind Clinical Imaging Consortium (MCIC). We show through simulation studies that the power of the proposed methods is a monotonically increasing function of sample size, and that the robust test statistics bring incremental gains in power. To demonstrate the advantage of the robust kernel CCA, we study MCIC data on 22,442 candidate schizophrenia genes for gene–gene co-associations. We select 768 genes with strong evidence, shedding light on gene–gene interaction networks for schizophrenia. Through gene ontology enrichment analysis, pathway analysis, gene–gene network analysis, and other studies, we show that the proposed robust methods can find undiscovered genes in addition to significant gene pairs, and that they outperform several current approaches.
Development of novel diagnostic system for pancreatic cancer, including early stages, measuring mRNA of whole blood cells

Yoshio Sakai, Masao Honda, Shigeyuki Matsui, Osamu Komori, Toshinori Murayama, Tadami Fujiwara, Masaaki Mizuno, Yasuhito Imai, Kenichi Yoshimura, Alessandro Nasti, Takashi Wada, Noriho Iida, Masaaki Kitahara, Rika Horii, Tamai Toshikatsu, Masashi Nishikawa, Hirofumi Okafuji, Eishiro Mizukoshi, Tatsuya Yamashita, Taro Yamashita, Kuniaki Arai, Kazuya Kitamura, Kazunori Kawaguchi, Hajime Takatori, Tetsuro Shimakami, Takeshi Terashima, Tomoyuki Hayashi, Kouki Nio, Shuichi Kaneko

Cancer Science, 110(4) 1364-1388, Mar 27, 2019

Pancreatic ductal adenocarcinoma (PDAC) is the most life‐threatening disease among all digestive system malignancies. We developed a blood mRNA PDAC screening system using real‐time detection PCR to detect the expression of 56 genes in order to discriminate PDAC from noncancer subjects. We undertook a clinical study to assess the performance of the developed system. We collected whole blood RNA from 53 PDAC patients, 102 noncancer subjects, 22 patients with chronic pancreatitis, and 23 patients with intraductal papillary mucinous neoplasms in a per protocol analysis. The sensitivity of the system for PDAC diagnosis was 73.6% (95% confidence interval, 59.7%‐84.7%). The specificity for noncancer volunteers, chronic pancreatitis, and patients with intraductal papillary mucinous neoplasms was 64.7% (54.6%‐73.9%), 63.6% (40.7%‐82.8%), and 47.8% (26.8%‐69.4%), respectively. Importantly, the sensitivity of this system for both stage I and stage II PDAC was 78.6% (57.1%‐100%), suggesting that detection of PDAC by the system is not dependent on the stage of PDAC. These results indicate that the screening system, relying on assessment of changes in mRNA expression in blood cells, is a viable alternative screening strategy for PDAC.
An Optimal Semiparametric Method for Two‐group Classification

Seungchul Baek, Osamu Komori, Yanyuan Ma

Scandinavian Journal of Statistics, 45(3) 806-846, Apr 19, 2018

In classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for the two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of the two groups. However, in a typical case‐control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t‐statistic for two‐group classification. Biometrics 2015, 71: 404–416) proposed the generalized t‐statistic to obtain a linear discriminant function that allows for heterogeneity of the case group. Their procedure has an optimality property within the class under consideration. We study the problem further and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in that no further improvement is possible in constructing the linear discriminant function more efficiently. We conduct simulation studies and real data examples to illustrate the finite sample performance and the gain it produces in comparison with existing methods.
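
For reference, the classical baseline at the start of this abstract can be computed directly: with pooled covariance S and mean difference δ, the direction w = S⁻¹δ maximizes the squared standardized difference (wᵀδ)²/(wᵀSw), by the Cauchy–Schwarz inequality. The sketch below covers only this classical linear discriminant (function names are hypothetical), not the generalized t-statistic or the semiparametric estimator proposed in the paper.

```python
import numpy as np


def fisher_direction(x0, x1):
    """Classical linear discriminant direction w = S^{-1} (mean1 - mean0).

    x0, x1: arrays of shape (n_group, p); S is the pooled sample covariance.
    Returns (w, S, delta) so the criterion can be evaluated for other directions.
    """
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    n0, n1 = len(x0), len(x1)
    s = ((n0 - 1) * np.cov(x0, rowvar=False)
         + (n1 - 1) * np.cov(x1, rowvar=False)) / (n0 + n1 - 2)
    delta = m1 - m0
    return np.linalg.solve(s, delta), s, delta


def standardized_difference(w, s, delta):
    """Squared standardized mean difference (w'delta)^2 / (w'Sw) of the score w'x."""
    return float((w @ delta) ** 2 / (w @ s @ w))


rng = np.random.default_rng(0)
controls = rng.normal(size=(100, 3))
cases = rng.normal(size=(80, 3)) + np.array([1.0, 0.5, 0.0])
w, s, delta = fisher_direction(controls, cases)
print(standardized_difference(w, s, delta))  # equals delta' S^{-1} delta
```
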
Information geometry associated with generalized means

Shinto Eguchi, Osamu Komori, A. Ohara

Information geometry and its applications (IGAIA IV, Liblice, Czech Republic, 2016) (Nihat Ay et al. eds.), Springer, 279-295, 2018 Peer-reviewed
Robust bias correction model for estimation of global stock status in fishery

O. Komori, S. Eguchi, Y. Saigusa, H. Okamura, M. Ichinokawa

Ecosphere, 8(12) e02038, Dec, 2017 Peer-reviewed
Quasi-linear score for capturing heterogeneous structure in biomarkers

K. Omae, O. Komori, S. Eguchi

BMC Bioinformatics, 18(308), Jul, 2017 Peer-reviewed
Robust bias correction model for estimation of global trend in marine populations

O. Komori, S. Eguchi, Y. Saigusa, H. Okamura, M. Ichinokawa

Ecosphere, 8(12), 2017 Peer-reviewed

In modeling biological and ecological processes from data, it is essential to deal properly with data selection bias in order to obtain reliable and reasonable predictions. To incorporate the mechanism of selection bias into a statistical analysis, a propensity score (PS) is widely employed as an inverse probability weight to obtain a consistent estimate for a binary response variable of interest. However, the estimation performance often becomes unstable because of mis-estimation of the PS. To obtain a consistent estimate and to stabilize estimation performance, we propose a new regression model that incorporates the PS as an explanatory variable. Moreover, we show that the proposed model has the property of double robustness, which enables us to obtain a consistent estimate of the response without suffering from selection bias if either the PS model or the proposed model is correctly specified. The robust bias correction model also accommodates heterogeneity of data distributions based on an asymmetric logistic model, which in turn improves model fitting and prediction accuracy. Including the PS in our regression model enables us to consistently estimate the global fish stock status even if the available stock-status information is biased.
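
To illustrate the inverse-probability-weighting step that this abstract starts from, the simulation below assumes a known propensity score and a logistic selection mechanism; all distributions are made up for illustration. It compares the naive mean of the observed binary responses with a normalized IPW estimate. It does not implement the paper's bias correction model, which instead uses the PS as an explanatory variable.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = rng.binomial(1, sigmoid(x))        # binary response; E[y] = 0.5 by symmetry
ps = sigmoid(1.5 * x)                  # ASSUMED known propensity of being observed
r = rng.binomial(1, ps).astype(bool)   # selection indicator: high x is over-sampled

naive = y[r].mean()                    # ignores selection, so it is biased upward
w = 1.0 / ps[r]                        # inverse-probability weights
ipw = np.sum(w * y[r]) / np.sum(w)     # normalized (Hajek) IPW estimate of E[y]
print(naive, ipw)
```

When the PS is estimated rather than known, the extreme weights 1/PS are what make this estimator unstable, which is the motivation stated in the abstract.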
Reproducible detection of disease-associated markers from gene expression data

K. Omae, O. Komori, S. Eguchi

BMC Medical Genomics, 9(53), Aug, 2016 Peer-reviewed
Clinical and microarray analysis of breast cancers of all subtypes from two prospective preoperative chemotherapy studies

H S Okuma, F Koizumi, A Hirakawa, M Nakatochi, O Komori, J Hashimoto, M Kodaira, M Yunokawa, H Yamamoto, K Yonemori, C Shimizu, Y Fujiwara, K Tamura

British Journal of Cancer, 115(4) 411-419, Jul 14, 2016
Effect of lifespan and age on reproductive performance of the tardigrade Acutuncus antarcticus: minimal reproductive senescence

Megumu Tsujimoto, Osamu Komori, Satoshi Imura

Hydrobiologia, 772(1) 93-102, Feb 8, 2016
An asymmetric logistic regression model for ecological data

Osamu Komori, Shinto Eguchi, Shiro Ikeda, Hiroshi Okamura, Momoko Ichinokawa, Shinichiro Nakayama

Methods in Ecology and Evolution, 7(2) 249-260, Feb 1, 2016 Peer-reviewed

Binary data are common in ecological and environmental studies; however, because of the various uncertainties and complexities present in data sets, the standard generalized linear model with a binomial error distribution often shows insufficient predictive performance when analysing binary and proportional data. To address this difficulty, we propose an asymmetric logistic regression model that uses a new parameter to account for data complexity. We observe that this parameter controls the model's asymmetry and is important for adjusting the weights associated with the observed data to improve model fitting. This model includes the ordinary logistic regression model as a special case. It is easily implemented through a slight modification of glm or glmer in the statistical software R. Simulation studies suggest that our new approach outperforms a traditional approach in terms of both predictive accuracy and variable selection. In a case study involving fisheries data, we found that the annual catch amount had a greater impact on stock status prediction, and the improved predictive capability was supported by a smaller AIC compared with a generalized linear model. In summary, our method can enhance the applicability of the generalized linear model to various ecological problems with a slight modification, and it significantly improves model fitting and model selection.
Preoperative predictive factors and further risk stratification of biochemical recurrence in clinically localized high-risk prostate cancer

Riu Hamada, Jun Nakashima, Makoto Ohori, Yoshio Ohno, Osamu Komori, Kunihiro Yoshioka, Masaaki Tachibana

International Journal of Clinical Oncology, 21(3) 595-600, Nov 19, 2015
Binary classification with pseudo exponential model and its application for multi task learning

Takashi Takenouchi, Osamu Komori, Shinto Eguchi

Entropy, 17 5673-5694, Aug, 2015 Peer-reviewed
Generalized t-statistics for two-group classification

Osamu Komori, Shinto Eguchi, John Copas

Biometrics, 71 404-416, Jun, 2015 Peer-reviewed
A novel boosting algorithm for multi-task learning based on the Itakura-Saito divergence

Takashi Takenouchi, Osamu Komori, Shinto Eguchi

BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING (MAXENT 2014), 1641 230-237, 2015 Peer-reviewed

In this paper, we propose a novel multi-task learning algorithm based on an ensemble learning method. We consider a specific multi-task setting for binary classification problems, in which features are shared among all tasks and all tasks are targets of performance improvement. We focus on a situation in which the structure shared among datasets is represented by the divergence between the underlying distributions associated with the multiple tasks. We discuss properties of the proposed method and investigate its validity through numerical experiments.
Path Connectedness on a Space of Probability Density Functions

Shinto Eguchi, Osamu Komori

GEOMETRIC SCIENCE OF INFORMATION, GSI 2015, 9389 615-624, 2015 Peer-reviewed

We introduce a class of paths, or one-parameter models, connecting two arbitrary probability density functions (pdfs). The class is derived by employing the Kolmogorov-Nagumo average between the two pdfs. There is a wide variety of such paths on the space of pdfs, since the Kolmogorov-Nagumo average is applicable to any convex and strictly increasing function. Information-geometric insight is provided to help understand the probabilistic properties of statistical methods associated with this path connectedness. The one-parameter model is extended to a multidimensional model, on which statistical inference is characterized by sufficient statistics.
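
A minimal sketch of such a path on a finite sample space, assuming the form p_t = φ((1 − t)φ⁻¹(p₀) + tφ⁻¹(p₁) − κ_t), where φ is strictly increasing and κ_t is a normalizing constant. With φ = exp this gives the exponential (geometric) path, and with φ the identity it gives the mixture path. The function name and the bisection used to find κ_t are assumptions made for illustration.

```python
import numpy as np


def kn_path(p0, p1, t, phi, psi, tol=1e-12):
    """Point t on the path p_t = phi((1-t)*psi(p0) + t*psi(p1) - kappa_t), with psi = phi^{-1}.

    kappa_t is found by bisection so that p_t sums to one; phi must be strictly
    increasing, so the total mass is decreasing in kappa.
    """
    a = (1 - t) * psi(np.asarray(p0, dtype=float)) + t * psi(np.asarray(p1, dtype=float))
    total = lambda k: np.sum(phi(a - k))
    lo, hi = -1.0, 1.0
    while total(lo) < 1.0:   # widen the bracket until total(lo) >= 1
        lo *= 2
    while total(hi) > 1.0:   # ... and total(hi) <= 1
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return phi(a - 0.5 * (lo + hi))


p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.1, 0.3, 0.6])
print(kn_path(p0, p1, 0.5, np.exp, np.log))                  # normalized sqrt(p0 * p1)
print(kn_path(p0, p1, 0.5, lambda u: u, lambda u: u))        # (p0 + p1) / 2
```

With φ = exp the log-densities must be finite, so this sketch assumes strictly positive p₀ and p₁.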
Maximum power entropy method for ecological data analysis

Osamu Komori, Shinto Eguchi

BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING (MAXENT 2014), 1641 337-344, 2015 Peer-reviewed

In ecology, predictive models of the geographical distribution of species are widely used to capture spatial diversity. Recently, Maxent, a method based on the Gibbs distribution, has frequently been employed to estimate the distribution of a target species across sites with reasonable accuracy, using environmental features such as temperature, precipitation, and elevation. It requires only presence data, which is a major advantage when absence data are unavailable or unreliable. It also incorporates our limited knowledge about the target distribution into the model by requiring the expected values of the environmental features to equal their empirical averages. Moreover, the inhabiting probability of species is easily visualized in the statistical software R with the aid of geographical coordinate information from the Global Biodiversity Information Facility (GBIF). However, the maximum entropy distribution in Maxent is derived from the Boltzmann-Gibbs-Shannon entropy, which makes parameter estimation unstable when the data contain outliers. To overcome this weakness, and to better understand the relation among the total number of species, the Boltzmann-Gibbs-Shannon entropy, and Simpson's index, we propose a maximum power entropy method based on the beta-divergence, which is a special case of the U-divergence. It includes the Boltzmann-Gibbs-Shannon entropy as a special case, so it may estimate the target distribution better when the value of the power index beta is chosen appropriately. We demonstrate the performance of the proposed method through simulation studies and publicly available real data.
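
The classical Maxent step described in this abstract (a Gibbs distribution over sites whose feature expectations match the empirical averages at presence sites) can be sketched as gradient ascent on the presence-only log-likelihood, whose stationarity condition is exactly that moment constraint. This sketch assumes a finite set of candidate sites with standardized features and uses made-up simulated data and hypothetical function names; it implements only the Boltzmann-Gibbs-Shannon version, not the proposed maximum power entropy (beta) method.

```python
import numpy as np


def gibbs(features, lam):
    """Gibbs distribution over sites: p_j proportional to exp(lam . f_j)."""
    z = features @ lam
    p = np.exp(z - z.max())  # subtract the max for numerical stability
    return p / p.sum()


def fit_maxent(features, presence_idx, lr=0.5, n_iter=2000):
    """Maximize the mean log-likelihood of presence sites under the Gibbs model.

    The gradient is (empirical feature mean at presences) - (model feature mean),
    so at convergence the model reproduces the empirical feature averages.
    """
    f = np.asarray(features, dtype=float)
    target = f[presence_idx].mean(axis=0)
    lam = np.zeros(f.shape[1])
    for _ in range(n_iter):
        lam += lr * (target - gibbs(f, lam) @ f)
    return lam


rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 2))                  # 500 sites, 2 standardized features
q = np.exp(feats @ np.array([0.8, -0.5]))          # simulated "true" suitability
presence = rng.choice(500, size=100, p=q / q.sum())
lam = fit_maxent(feats, presence)
print(lam)
```
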
Individualized Prostate-specific Antigen Threshold Values to Avoid Overdiagnosis of Prostate Cancer and Reduce Unnecessary Biopsy in Elderly Men

K. Kanao, O. Komori, J. Nakashima, T. Ohigashi, E. Kikuchi, A. Miyajima, K. Nakagawa, S. Eguchi, M. Oya

Japanese Journal of Clinical Oncology, 44(9) 852-859, Jul 16, 2014
Spontaneous Clustering via Minimum Gamma-Divergence

Akifumi Notsu, Osamu Komori, Shinto Eguchi

NEURAL COMPUTATION, 26(2) 421-448, Feb, 2014 Peer-reviewed

We propose a new method for clustering based on local minimization of the gamma-divergence, which we call spontaneous clustering. The greatest advantage of the proposed method is that it automatically detects a number of clusters that adequately reflects the data structure. In contrast, existing methods such as K-means, fuzzy c-means, or model-based clustering require the number of clusters to be prescribed. We detect all the local minimum points of the gamma-divergence, by which we define the cluster centers. A necessary and sufficient condition for the gamma-divergence to have local minimum points is also derived in a simple setting. Applications to simulated and real data are presented to compare the proposed method with existing ones.
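
Under an isotropic Gaussian model with a fixed scale σ (an assumption made for this sketch), minimizing the gamma-divergence over the location μ amounts to maximizing Σ_i exp(−γ‖x_i − μ‖²/(2σ²)), whose stationary points satisfy a weighted-mean fixed point. Iterating that fixed point from every data point and keeping the distinct limits gives the local minimizers as cluster centers, without fixing the number of clusters in advance. Parameter values and the function name are hypothetical.

```python
import numpy as np


def gamma_cluster_centers(x, gamma=0.5, sigma=1.0, n_iter=500, tol=1e-10, merge_tol=1e-3):
    """Local minimizers of the gamma-divergence for an isotropic Gaussian location model.

    Fixed point: mu = sum_i w_i x_i / sum_i w_i, w_i = exp(-gamma * ||x_i - mu||^2 / (2 sigma^2)).
    Every data point is used as a starting value; distinct limits are kept as centers.
    """
    x = np.asarray(x, dtype=float)
    centers = []
    for start in x:
        mu = start.copy()
        for _ in range(n_iter):
            w = np.exp(-gamma * np.sum((x - mu) ** 2, axis=1) / (2 * sigma ** 2))
            new_mu = w @ x / w.sum()
            converged = np.linalg.norm(new_mu - mu) < tol
            mu = new_mu
            if converged:
                break
        # keep mu only if it is not a copy of an already found center
        if all(np.linalg.norm(mu - c) > merge_tol for c in centers):
            centers.append(mu)
    return np.array(centers)


grid = np.array([(i, j) for i in (-1, -0.5, 0, 0.5, 1) for j in (-1, -0.5, 0, 0.5, 1)])
data = np.vstack([grid, grid + 10.0])  # two well-separated groups of points
print(gamma_cluster_centers(data))     # two centers, near (0, 0) and (10, 10)
```

The effective bandwidth here is σ/√γ, so larger γ (or smaller σ) tends to produce more local minimizers and hence more clusters.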
Robust Independent Component Analysis via Minimum gamma-Divergence Estimation

Pengwen Chen, Hung Hung, Osamu Komori, Su-Yun Huang, Shinto Eguchi

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 7(4) 614-624, Aug, 2013 Peer-reviewed

Independent component analysis (ICA) has been shown to be useful in many applications. However, most ICA methods are sensitive to data contamination. In this article, we introduce a general minimum U-divergence framework for ICA, which covers some standard ICA methods as special cases. Within the U-family, we further focus on the gamma-divergence because of its desirable property of super robustness against outliers, which gives the proposed method, gamma-ICA. Statistical properties and technical conditions for the recovery consistency of gamma-ICA are studied. In the limiting case, it improves the recovery condition of MLE-ICA known in the literature by giving a necessary and sufficient condition. Since the parameter of interest in gamma-ICA is an orthogonal matrix, a geometrical algorithm based on gradient flows on the special orthogonal group is introduced. Furthermore, a data-driven selection of the gamma value, which is critical to the performance of gamma-ICA, is developed. The performance, especially the robustness, of gamma-ICA is demonstrated through experimental studies using simulated data and image data.
Multiple suboptimal solutions for prediction rules in gene expression data

Osamu Komori, Mari Pritchard, Shinto Eguchi

Computational and Mathematical Methods in Medicine, 2013 14, 2013 Peer-reviewed

This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression data. We focus on pattern recognition to extract informative features embedded in the data for predicting phenotypes. It has been pointed out that severe difficulties arise from the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to single out only the informative genes with high performance from all the observed genes. We investigate why this difficulty persists even though many analysis methods and learning algorithms are actively being proposed in statistical machine learning. We focus on the mutual coherence, that is, the absolute value of the Pearson correlation between two genes, and describe the distributions of this correlation for the selected set of genes and for the total set. We show that the problem of finding informative genes in high-dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence.
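
The quantity at the center of this argument is easy to compute. The sketch below (function name hypothetical, data simulated) returns the absolute pairwise Pearson correlations between genes and their maximum. With many more genes than subjects, even independent genes reach large maximal correlations by chance, which is one way to see why distinct gene sets can perform almost equally well.

```python
import numpy as np


def coherence_summary(expr):
    """expr: array of shape (n_subjects, n_genes).

    Returns the absolute Pearson correlations for all gene pairs (upper triangle)
    and their maximum, i.e. the mutual coherence of the gene set.
    """
    r = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)
    iu = np.triu_indices_from(r, k=1)
    abs_r = np.abs(r[iu])
    return abs_r, abs_r.max()


rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 500))  # 20 subjects, 500 independent simulated genes
abs_r, mu = coherence_summary(expr)
print(len(abs_r), mu)              # many pairs; the maximum is far from 0
```
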
An extension of the Receiver Operating Characteristic curve and AUC-optimal classification

T. Takenouchi, O. Komori, S. Eguchi

Neural Computation, 24 2789-2824, Jun, 2012 Peer-reviewed
Boosting Learning Algorithm for Pattern Recognition and Beyond

Osamu Komori, Shinto Eguchi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E94D(10) 1863-1869, Oct, 2011 Peer-reviewed

This paper discusses recent developments in pattern recognition, focusing on the boosting approach in machine learning. Statistical properties such as Bayes risk consistency for several loss functions are discussed in a probabilistic framework. A number of loss functions have been proposed for different purposes and targets. A unified derivation is given by a generator function U, which naturally defines the entropy, divergence, and loss function. The class of U-loss functions is associated with boosting algorithms for loss minimization, including AdaBoost and LogitBoost, a twin pair generated from the Kullback-Leibler divergence, as well as boosting for the (partial) area under the ROC curve. We extend boosting to unsupervised learning, typically density estimation, by employing the U-loss function. Finally, future perspectives in machine learning are discussed.
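
As one concrete member of this class, AdaBoost minimizes the exponential loss mean(exp(−yF(x))) by greedy stagewise updates. The sketch below uses decision stumps as weak learners (an illustrative choice) and is not the general U-boost algorithm of the paper; the names `fit_stump`, `adaboost`, and `score` are hypothetical.

```python
import numpy as np


def stump_predict(x, j, thr, sign):
    """Predict `sign` where feature j exceeds thr, and -sign otherwise."""
    return np.where(x[:, j] > thr, sign, -sign)


def fit_stump(x, y, w):
    """Exhaustive search for the stump with the smallest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(x.shape[1]):
        for thr in np.unique(x[:, j]):
            for sign in (1, -1):
                err = w[stump_predict(x, j, thr, sign) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(x, y, n_rounds=30):
    """AdaBoost with labels y in {-1, +1}: stagewise minimization of the exponential loss."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(x, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # avoid log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(x, j, thr, sign))  # exponential reweighting
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble


def score(ensemble, x):
    """Ensemble score F(x); its sign is the predicted label."""
    return sum((a * stump_predict(x, j, t, s) for a, j, t, s in ensemble), np.zeros(len(x)))


x = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where((x[:, 0] >= 3) & (x[:, 0] <= 6), 1, -1)  # an interval: no single stump fits it
ens = adaboost(x, y, n_rounds=20)
print(np.mean(np.sign(score(ens, x)) == y))
```

The training exponential loss cannot increase from one round to the next, since each round multiplies it by 2√(err(1 − err)) ≤ 1.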
Projective power entropy and maximum Tsallis entropy distributions

S. Eguchi, O. Komori, S. Kato

Entropy, 13 1746-1764, 2011 Peer-reviewed
A boosting method for maximizing the partial area under the ROC curve

Osamu Komori, Shinto Eguchi

BMC BIOINFORMATICS, 11 314, Jun, 2010 Peer-reviewed

Background: The receiver operating characteristic (ROC) curve is a fundamental tool for assessing the discriminant performance not only of a single marker but also of a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures its intrinsic ability to discriminate between controls and cases. Recently, the partial AUC (pAUC) has attracted more attention than the AUC, because it allows focusing on a range of the false positive rate suited to the clinical situation. However, existing pAUC-based methods handle only a few markers and do not consider nonlinear combinations of markers. Results: We have developed a new statistical method that focuses on the pAUC, based on a boosting technique. The markers are combined component-wise to maximize the pAUC in the boosting algorithm, using natural cubic splines or decision stumps (single-level decision trees) according to whether the marker values are continuous or discrete. We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with those of other existing methods and demonstrate its utility using real data sets. As a result, we obtain much better discrimination performance in terms of the pAUC in both simulation studies and real data analysis. Conclusions: The proposed method addresses how to combine markers after a pAUC-based filtering procedure in a high-dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC, from marker selection to marker combination, for discrimination problems. The method can capture not only linear but also nonlinear associations between the outcome variable and the markers; nonlinearity is known to be necessary in general for maximizing the pAUC.
The method also emphasizes the accuracy of classification performance as well as the interpretability of the association, by offering simple and smooth score plots for each marker.
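
The empirical pAUC over false positive rates in [0, α] can be computed by walking down the controls from the highest score and accumulating the true positive rate over each FPR step of width 1/n₀, with case-control ties counted as one half. This is a minimal sketch with a hypothetical function name; it computes the criterion only, not the boosting procedure that maximizes it.

```python
import numpy as np


def partial_auc(controls, cases, alpha):
    """Empirical area under the ROC curve for false positive rates in [0, alpha].

    Controls are visited from the highest score down; each advances the FPR by
    1/n0, and the curve height over that step is the fraction of cases scoring
    above it (case-control ties count 1/2). Ties among controls are not treated specially.
    """
    s0 = np.sort(np.asarray(controls, dtype=float))[::-1]
    s1 = np.asarray(cases, dtype=float)
    n0 = len(s0)
    area, remaining = 0.0, float(alpha)
    for c in s0:
        if remaining <= 1e-15:
            break
        width = min(1.0 / n0, remaining)
        tpr = (np.sum(s1 > c) + 0.5 * np.sum(s1 == c)) / len(s1)
        area += width * tpr
        remaining -= width
    return area


controls = [0.1, 0.4, 0.35, 0.8]
cases = [0.9, 0.65, 0.5, 0.3]
print(partial_auc(controls, cases, 0.5))  # pAUC for FPR in [0, 0.5]
print(partial_auc(controls, cases, 1.0))  # alpha = 1 gives the full AUC
```

Because the maximum possible pAUC is α rather than 1, some authors standardize it before comparing different ranges of the false positive rate.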

Misc.

Duality of Maximum Entropy and Minimum Divergence

S. Eguchi, O. Komori, A. Ohara

Entropy, 16 3552-3572, 2014
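Predictive analysis using genomic and proteomic data: new statistical methods based on machine learning (ゲノム・プロテオミクスデータを用いた予測解析: 機械学習による新しい統計的手法)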

小森理, 江口真透

Journal of the Japan Statistical Society (Japanese Issue), 38(J2) 199-212, 2009 Peer-reviewed, Invited

Research Projects

Biodiversity prediction with Dynamic β-Maxent and its implementation in applications

Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science, Apr, 2022 - Mar, 2027

小森理, 江口真透, 久保田康裕
Use of deep learning and development of statistical prediction result evaluation methods for the acceleration of personalized medicine

Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science, Apr, 2023 - Mar, 2026
Research on statistical and machine learning methods supporting adaptive clinical research for personalized medicine

Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science, Apr, 2021 - Mar, 2025

松井茂之, 山田誠, 星野崇宏, 三分一史和, 小森理
Innovative Developments of Theories and Methodologies for Large Complex Data

Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science, Apr, 2020 - Mar, 2025
Developments and applications of statistical prediction method

Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science, Apr, 2018 - Mar, 2023

Industrial Property Rights

Japanese Patent No. 5852759: Detection of pancreatic cancer by gene expression analysis

金子周一, 酒井佳夫, 小村卓也, 松井茂之, 小森理, 丹野博, 宮崎義孝, 辰巳勇
