This paper presents a novel algorithm for optimizing the Gaussian kernel in pattern classification tasks, where well-separated samples in the kernel feature space are desirable. We optimize the Gaussian kernel parameters by maximizing a classical class separability criterion, and solve the resulting problem with a quasi-Newton algorithm that exploits a recently proposed decomposition of the objective criterion. The proposed method is evaluated on five data sets with two kernel-based learning algorithms. The experimental results indicate that it achieves the best overall classification performance compared with three competing solutions. In particular, the proposed method provides a valuable kernel optimization solution in the severe small sample size scenario.
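
As an illustration of the general idea only, the following Python sketch maximizes one common trace-ratio separability measure, tr(S_B)/tr(S_W) evaluated in the Gaussian kernel feature space, over a single kernel width using a one-dimensional quasi-Newton (secant) update with a backtracking line search. The specific criterion, the finite-difference gradient, the toy XOR-like data, and all function names are assumptions made for this sketch. It does not use the decomposition mentioned above and should not be read as the proposed algorithm.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def separability(X, y, log_sigma):
    """Assumed criterion: tr(S_B) / tr(S_W) in the Gaussian kernel feature space.

    Both traces are computed from kernel values only:
      tr(S_W) = sum_i k(x_i, x_i) - sum_c (1/n_c) sum_{i,j in c} k(x_i, x_j)
      tr(S_B) = sum_c (1/n_c) sum_{i,j in c} k(x_i, x_j) - (1/n) sum_{i,j} k(x_i, x_j)
    """
    sigma2 = math.exp(2.0 * log_sigma)
    n = len(X)
    counts, within, total = {}, {}, 0.0
    for i in range(n):
        counts[y[i]] = counts.get(y[i], 0) + 1
        for j in range(n):
            k = math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma2))
            total += k
            if y[i] == y[j]:
                within[y[i]] = within.get(y[i], 0.0) + k
    a = sum(within[c] / counts[c] for c in counts)
    tr_sb = a - total / n
    tr_sw = n - a  # k(x, x) = 1 for the Gaussian kernel
    return tr_sb / (tr_sw + 1e-12)  # small constant guards against division by zero


def optimize_log_sigma(X, y, log_sigma0=0.0, iters=50, h=1e-4, tol=1e-6):
    """Minimize -separability over log(sigma) with a 1-D quasi-Newton iteration."""
    f = lambda t: -separability(X, y, t)
    grad = lambda t: (f(t + h) - f(t - h)) / (2.0 * h)  # central difference
    t, b = log_sigma0, 1.0  # b is the running curvature (Hessian) estimate
    g = grad(t)
    for _ in range(iters):
        if abs(g) < tol:
            break
        step = max(-1.0, min(1.0, -g / b))  # quasi-Newton step, clipped for safety
        f_t = f(t)
        # Backtracking (Armijo) line search: shrink the step until f decreases enough.
        while f(t + step) > f_t + 1e-4 * g * step:
            step *= 0.5
            if abs(step) < 1e-10:
                return t
        t_new = t + step
        g_new = grad(t_new)
        curvature = (g_new - g) / step  # secant update, the 1-D analogue of BFGS
        if curvature > 1e-8:  # keep the estimate positive (curvature condition)
            b = curvature
        t, g = t_new, g_new
    return t


# Toy XOR-like data: two tight clusters per class, not linearly separable.
X = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (1.0, 1.0), (0.95, 1.0), (1.0, 0.95),
     (0.0, 1.0), (0.05, 1.0), (0.0, 0.95), (1.0, 0.0), (0.95, 0.0), (1.0, 0.05)]
y = [0] * 6 + [1] * 6

theta = optimize_log_sigma(X, y)
print("sigma:", math.exp(theta), "criterion:", separability(X, y, theta))
```

The search is carried out over log(sigma) so that the kernel width stays positive without an explicit constraint. On data of this kind the criterion is small for both very small and very large widths, which is why a local search started from sigma = 1 can be expected to settle at an intermediate value.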