Geometry &amp; Recognition

KMeans Algorithm

Image Recognition 2008. 7. 19. 21:03

KMeans 알고리즘은 주어진 데이터들을 사전에 정해진 수만큼의 클러스터로 자동 분류하는 간단한 클러스터링 알고리즘의 일종이다. 이 알고리즘의 주된 아이디어는 각 클러스터의 중심을 어떻게 하면 잘 잡는가이다. 좋은 선택은 각각의 중심이 가능한 서로 멀리 떨어지게 잡아야 한다. 일단 클러스터 중심이 설정되면 각각의 데이터에 대해서 가장 가까이에 있는 클러스터에 할당시키면 된다. 다음으로 클러스터의 중심을 그 클러스터에 속한 데이터의 무게중심으로 재설정하고, 다시 각각의 데이터를 클러스터에 할당하는 작업을 반복한다. 이 과정은 클러스터의 중심이 더 이상 이동하지 않을 때까지 반복 시행하면 된다. 결과적으로 이 알고리즘은 클러스터의 중심에서 그 클러스터에 할당된 데이터와의 거리의 제곱을 그 클러스터의 모든 점과 전체 클러스터에 대해서 합한 값을 최소화시키는 클러스터 중심과 데이터의 분할을 찾는 것이 된다.

1. k개의 클러스터의 중심을 설정한다.
2. 설정된 중심을 가지고, 각 데이터를 분할한다.
3. 분할된 데이터를 이용하여서 클러스터의 중심을 재설정한다.
4. 중심이 변화가 없을 때까지 2,3 과정을 반복한다.

비용 함수의 관점에서 보면 KMeans 클러스터링은 다음 함수를

$$ \text{cost}=\sum_{k \in classes} \sum_{i \in class_k} \big| \text{data}_k(i) - \text{mean}_k \big|^2 $$

을 최소화시키는 분할을 하는 것이다. 그러나, KMeans는 항상 최적의 비용 함숫값을 보장하지는 않는다. 그리고 알고리즘의 결과는 초기값의 설정에 의존하여서 다른 결과를 낼 수가 있다. 보다 큰 단점은 클러스터의 수를 사전에 정해주어야 한다는 사실인데, 잘못된 클러스터 개수의 설정은 성능에 크게 영향을 미친다. 자동의 클러스터의 수를 설정하면서 클러스터링을 할 수 있는 간단한 것으로는 isodata 알고리즘이 있다.

typedef double * vector_t ;
double kmeans_label(
             const vector_t *obs,             // observations;
             const int n_obs,                   /* in # of vectors */            
             const vector_t *mean,           // mean vectors;
             const int n_mean,                /* # of mean vectors */
             int* label,                           // cluster labeling;
             const int veclen)                 // dimension of feature;
{
    double sqerr=0;
    for (int i = 0; i < n_obs; i++) {
        vector_t c = obs[i];
        /* Get an estimate of best distance (min_dist) and (min_idx) */
        int min_idx = label[i];
        vector_t m = mean[min_idx];
        double min_sqdist = 0 ;
        for (int l = 0; l < veclen; l++) {
            double t = m[l] - c[l]; min_sqdist += t * t;
        }
        for (int j = 0; j < n_mean; j++) {
            vector_t m = mean[j];
            // save time by checking:: dist < min_sqdist=previous min_val;
            double sqdist = 0;
            for (l = 0; (l < veclen) && (sqdist < min_sqdist); l++) {
                double t = m[l] - c[l]; sqdist += t * t;
            }
            if (sqdist < min_sqdist) {
                min_sqdist = sqdist; min_idx = j;
            }
        }
        label[i] = min_idx;
        sqerr += min_sqdist;
    }
    return sqerr ;
};

int kmeans_update(
              const vector_t *obs, //observations;
              const int n_obs,       //# of observations;     
              vector_t *mean,       // mean vectors;
              const int n_mean,    // # of mean vectors;
              int *label,               // cluster labelling;
              const int veclen      // dimension of features;
              ) {
    std::vector<int> cnt(n_mean);
    for (int i = 0; i < n_mean; i++)
        for (int l = 0; l < veclen; l++) mean[i][l] = 0.0;
        
    for (i = 0; i < n_obs; i++) {
        ASSERT((0 <= label[i]) && (label[i] < n_mean));  
        vector_t m = mean[label[i]];
        cnt[label[i]]++;  
        vector_t c = obs[i];
        if (c == NULL)
            TRACE("No observations for %u, but expected up through %u", i, n_obs-1);
        for (int l = 0; l < veclen; l++) m[l] += c[l];
    }
    for (i = 0; i < n_mean; i++) {
        int j = cnt[i];
        if (j != 0) for (int l = 0; l < veclen; l++) mean[i][l] /= (double) j;
        else {
            TRACE("Empty cluster %u\n", i);
            return FALSE;
        }
    }
    return TRUE;
}

double kmeans(
       const vector_t *obs,     // observations;
       const int n_obs,          /* # of observations */
       vector_t *mean,           /* initial set of means */
       const int n_mean,        /* # of means (should be k_mean?) */
       int *label,                   /* cluster labeling*/
       const int veclen,         /* vector length of means and corpus */    
       const double min_conv_ratio,
       const int max_iter
    ) {
    double p_sqerr = DBL_MAX;
    int ret;
    double sqerr = kmeans_label( obs, n_obs,mean, n_mean, &label[0], veclen);
    double conv_ratio = (p_sqerr - sqerr) / p_sqerr;
    for (int i = 0; (i < max_iter) && (conv_ratio > min_conv_ratio); i++) {
        TRACE("kmeans iter [%u] %e ...\n", i, conv_ratio);
        ret = kmeans_update(obs, n_obs, mean, n_mean, &label[0], veclen );
        if (ret != TRUE)
            return (double)ret;
        p_sqerr = sqerr;
        sqerr = kmeans_label(  obs, n_obs,mean, n_mean, &label[0], veclen);
        conv_ratio = (p_sqerr - sqerr) / p_sqerr;
    }
    TRACE("kmeans n_iter %u sqerr %e conv_ratio %e\n", i, sqerr, conv_ratio);   
    return sqerr;
}

'Image Recognition' 카테고리의 다른 글

Retinex Algorithm (2)	2008.07.26
RANSAC: Circle Fit (0)	2008.07.21
Robust Line Fitting (0)	2008.07.08
EM: Binarization (0)	2008.07.01
EM Algorithm: Line Fitting (0)	2008.06.29

Posted by helloktk

,

Adaptive Binarization

Image Recognition/Fundamental 2008. 7. 14. 11:32

이미지를 이진화시키기 위해서 여러 알고리즘이 사용된다. 그중 이미지 전체에 대해 하나의 임계값으로 이진화시키는 전역 이진화 알고리즘은 간단하고 빠르기 때문에 많이 이용이 된다. 그러나 이미지를 형성할 때 조명 조건이 균일하지 않은 경우에는 전역 이진화는 원하는 결과를 얻기가 힘들다. 이런 경우에는 각각의 픽셀 주위의 그레이 값을 참조하여 임계치를 결정하는 국소적 이진화 방법을 사용한다. 국소적 이진화에서 임계값을 추출하는 간단한 방법은 윈도 내의 평균값을 이용하면 된다. 좀 더 개선된 알고리즘은 평균값($m(x, y)$)을 참조하되, 편차($\sigma(x, y)$)를 한번 더 고려해 주는 것이다. 이렇게 하여 잡은 국소적 임계값은 다음과 같이 표현된다:

$$T_{(x, y)} = m_{(x, y)} [1+ \text{factor}(\sigma_{(x, y)}-128)]$$

여기서 $128$은 그레이 값이 가질 수 있는 최대 편차를 의미한다. 편차가 $128$이면 단순 평균값으로 취한다는 의미가 된다. 그 외의 경우는 표준편차와 128의 차이(항상 음수다)에 비례하는 값으로 윈도 평균값을 offset 한 값을 임계치로 잡는다. $\text{factor}$는 일반적으로 정해지지 않고, 실험적으로 $[0.2, 0.5]$ 사이의 값이 취해진다. (문서처럼 배경이 흰색인 경우는 $\text{factor} > 0$이지만, 검정 배경에 흰색 글씨를 처리하는 경우는 음수의 값을 취하는 것이 맞다)

국소적인 이진화 알고리즘은 매 픽셀마다 윈도를 잡아서 계산해야 하므로 연산 비용이 많이 든다. 충분한 메모리를 갖춘 시스템의 경우에는 적분 이미지(integral image)를 이용하면 윈도 연산에 소요되는 비용을 대폭 줄일 수 있다..

국소적 이진화 알고리즘에서 윈도 크기와 $\text{factor}$를 결정하는 기준은 무엇일까? 이것은 해결하고자 하는 문제의 특성, 예를 들면 스캔된 문서를 이진화시키는 경우에는 윈도에 충분한 글자가 들어 있어야 한다... 등에 많이 의존한다.

void make_int_img12(BYTE *gray, int width, int height, *int intimage, int *intsqimg);

void make_int_img12(BYTE *gray, int width, int height, *int intimage, int *intsqimg) {
    // first row accumulation;
    intimage[0] = gray[0];
    for (int x = 1; x < width; ++x) {
        int a = gray[x] ;
        intimage[x] = intimage[x - 1] + a;
        intsqimg[x] = intsqimg[x - 1] + a * a;
    }
    for (int y = 1, pos = y * width; y < height; ++y) {
        int linesum = 0, linesqsum = 0 ;
        for (int x = 0; x < width; ++x, ++pos) {
            int a = gray[pos];
            linesum   += a;
            linesqsum += a * a;
            intimage[pos] = intimage[pos - width] + linesum ;
            intsqimg[pos] = intsqimg[pos - width] + linesqsum;
        }
    }
};

#define integral_image(x, y) (intimage[(y) * width + (x)])
#define integral_sqimg(x, y) (intsqimg[(y) * width + (x)])
//
void adap_binariztion(BYTE *gray, int width, int height, 
                      int w       /*window size = 15*/,
                      double k    /*factor           = 0.2*/,
                      BYTE *bimage) {
    int whalf = w >> 1; //half of adaptive window;
    int diff, sqdiff;
    // make integral image && square integral image; 
    // if image is sufficiently large, use int64 or floating point number;
    std::vector<int> intimage(width * height) ;
    std::vector<int> intsqimg(width * height) ;

    //make integral image and its square integral image;
    make_int_img12(gray, width, height, &intimage[0], &intsqimg[0]);  
    //algorithm main;
    for (int j = 0, pos = 0; j < height; j++) {
        for (int i = 0; i < width; i++, pos++) {
            // clip windows 
            int xmin = max(0, i - whalf);
            int ymin = max(0, j - whalf);
            int xmax = min(width - 1, i + whalf);
            int ymax = min(height - 1, j + whalf);
            int area = (xmax - xmin + 1) * (ymax - ymin + 1);
            // calculate window mean and std deviation;
            if (!xmin && !ymin) {     // origin
                diff   = integral_image(xmax, ymax);
                sqdiff = integral_sqimg(xmax, ymax);
            } else if (!xmin && ymin) { // first column
                diff   = integral_image(xmax, ymax) - integral_image(xmax, ymin - 1);
                sqdiff = integral_sqimg(xmax, ymax) - integral_sqimg(xmax, ymin - 1);
            } else if (xmin && !ymin){ // first row
                diff   = integral_image(xmax, ymax) - integral_image(xmin - 1, ymax);
                sqdiff = integral_sqimg(xmax, ymax) - integral_sqimg(xmin - 1, ymax);
            } else{ // rest of the image
                int diagsum    = integral_image(xmax, ymax) + integral_image(xmin - 1, ymin - 1);
                int idiagsum   = integral_image(xmax, ymin - 1) + integral_image(xmin - 1, ymax);
                diff           = diagsum - idiagsum;
                int sqdiagsum  = integral_sqimg(xmax, ymax) + integral_sqimg(xmin - 1, ymin - 1);
                int sqidiagsum = integral_sqimg(xmax, ymin - 1) + integral_sqimg(xmin - 1, ymax);
                sqdiff         = sqdiagsum - sqidiagsum;
            }
            // threshold = window_mean *( 1 + factor * (std_dev/128.-1));
            // 128 = max_allowed_std_deviation in the gray image;
            double mean = double(diff) / area;
            double std  = sqrt((sqdiff - double(diff) * diff / area) / (area - 1));
            double threshold = mean * (1.0 + k * ((std / 128.0) - 1.));
            if (gray[pos] < threshold) bimage[pos] = 0;
            else                       bimage[pos] = 255;
        }
    }   
};

'Image Recognition > Fundamental' 카테고리의 다른 글

Fant's Resampling (0)	2008.12.17
Bright Preserving Histogram Equalization with Maximum Entropy (0)	2008.07.31
Histogram Equalization (0)	2008.06.22
FFT2D (0)	2008.06.10
Otsu Algorithm (6)	2008.05.30

Posted by helloktk

,

Robust Line Fitting

Image Recognition 2008. 7. 8. 23:50

이미지에서 관찰된 점집합이 $\{(x_i, y_i)| i = 1, 2,\dots, N\}$이 있다. 이 점집합을 직선 $y = a + bx$ 로 피팅을 하고 싶을 때, 보통 최소자승법을 이용하는데, 원리는 직선의 방정식이 예측한 $y$값과 실제 관찰한 $y$값의 차이의 제곱(=square deviation)을 최소화시키는 직선의 기울기 $a$와 절편 $b$를 찾는 것이다:

$$\chi^2(a, b) = \sum_i |y_i - (b x_i +a) |^2 $$

데이터를 얻는 측정 과정에서 측정값 $y_i$는 랜덤 노이즈를 포함하게 되고, 이 노이즈는 참값 $y(x)$ 근방에서 정규분포를 한다고 가정을 할 수 있다. 만약 모든 측정의 노이즈가 동일한 표준편차 $\sigma$를 갖게 된다면, $N$개의 관측 데이터가 나타날 확률(likelihood)은 (개별 측정은 IID 조건을 만족한다고 전제)

$$P = \prod_{i} e^{ -\frac{ | y_i - (bx_i + a)|^2 }{2\sigma^2 } }$$

의 형태가 된다. 따라서 최소자승법은 이 likelihood를 최대화시키는 파라미터를 찾는 방법이 된다. 최소자승법은 피팅 파라미터를 주어진 데이터를 이용해서 표현할 수 있는 장점은 있지만, outliers에 대해서 매우 취약하다 (아래의 결과 그림을 참조). 이는 적은 수의 outlier도 χ2에 큰 기여를 할 수 있기 때문이다. 따라서 피팅을 좀 더 robust 하게 만들기 위해서는 outliers가 likelihood에 기여하는 정도를 줄여야 한다. 이를 위해서는 likelihood의 지수 인자를 큰 에러에서 덜 민감하게 반응하는 꼴로 바뀌어야 한다. 이를 만족하는 가장 간단한 것 방법 중 하나가 square-deviation 대신에 absolute-deviation을 이용하는 것이다:

$$\text{absolute deviation} = \sum _i | y_i - (bx_i + a)| .$$

그러나 이 식을 사용하는 경우에는 최소자승법과 다르게 기울기 $a$와 절편 $b$를 주어진 데이터 $\{(x_i, y_i)\}$로 표현할 수 없고, 반복적인 방법을 이용해서 찾아야 한다.

수열 $\{c_i\}$에 대해 합 $\sum_{i} |c_i - a|$은 $a$가 수열의 median 값일 때 최솟값을 갖는다는 사실을 이용하면 (증명: 극값을 구하기 위해서 $a$에 대해서 미분하면, $0=(\sum_{c_i > a} 1)-(\sum_{c_i < a} 1)$: 합은 $a$가 $c_i$ 보다 큰 경우와 작은 경우로 분리. 따라서 0이 되기 위해서는 작은 경우와 큰 경우의 수가 같아야 한다. 고로, $a = \text{median}\{c_i\}$ q.e.d.). 고정된 절편 $b$에 대해서 absolute deviation을 최소로 만드는 기울기 $a$는

$$a= \text{median} \{ y_i - b x_i\}$$

임을 알 수 있다. 그리고 absolute deviation 식을 절편 $b$에 대해서 미분해서

$$0 = \sum_i \text{sign} \left( y_i - (bx_i +a) \right)$$

을 얻는데, 위에서 구한 기울기 $a$를 대입한 후 bracketing and bisection 방법을 이용하면 절편 $b$를 얻을 수 있다(불연속 함수이므로 일반적으로 근을 구하는 방법을 사용하는 것은 위험하다). 아래의 코드는 이를 구현한 것이다.

double FitLine_LS(std::vector<double>& x, std::vector<double>& y, double *a, double *b);

// 최소자승법을 이용한 직선 추정:
// return (sigma[dy] / sigma[x]);
double FitLine_LS(std::vector<double>& x, std::vector<double>& y, double& a, double& b) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = x.size(); i-->0;) {
        sx  += x[i];        sy  += y[i];
        sxx += x[i] * x[i]; sxy += x[i] * y[i];
    };
    const int n = x.size();
    double det = n * sxx - sx * sx;
    if (det == 0.) return -1;                   // vertical line;
    a = (sxx * sy - sx * sxy) / det;
    b = (n * sxy - sx * sy) / det;
    double chi2 = 0;
    for (int i = x.size(); i-->0;) {
        double t = y[i] - (*a + *b * x[i]);
        chi2 += t * t;
    }
    det /= n;         //det -> var(x) * n;
    // chi2 = var(dy) * n;
    // (dy vs x의 편차비)
    return  sqrt(chi2 / det);
}

// qsort-comparator;
static int comp(const void *a, const void * b) {
    double d = *((double *)a) - *((double *)b);
    return d > 0 ? 1 : d < 0 ? -1 : 0;
}
// 기울기(bb)가 주어진 경우에 y-절편(median = aa)값을 추정하고, 이 때 AD값을 얻는다.
double RhoFunc(std::vector<double>&x, std::vector<double>& y,
               double bb, double& aa, double& abdev) {
    std::vector<double> h(x.size());
    for (int i = x.size(); i-->0;) h[i] = y[i] - bb * x[i];
    qsort(&h[0], h.size(), sizeof(double), comp);
    // median;
    const int med = h.size()/2;
    aa = (h.size() & 1) ? h[med] : (h[med] + h[med-1])/2;

    double sum = 0;
    abdev = 0;
    for (int i = x.size(); i-->0;) {
        double d = y[i] - (bb * x[i] + aa);
        abdev += fabs(d);
        // sign-함수의 원점에서 모호함을 없애기 위해서 증폭을 시킴;
        if (y[i] != 0.) d /= fabs(y[i]);
        if (fabs(d) > DBL_EPSILON) // sign 함수의 모호함에서 벗어나면,
            sum += (d >= 0 ? x[i] : -x[i]);
    }
    return sum; // return sum{xi * sign(yi - (b * xi + a))}
};
// y = a + b * x ;
// Least Absolute Deviation:
double FitLine_MAD (std::vector<double>& x, std::vector<double>& y,
                    double& a, double& b) {
    // least square esimates for (aa, bb);
    double aa, bb, abdev;
    double sigb = FitLine_LS(x, y, aa, bb);   // estimate: y=aa + bb*x;
    double b1 = bb;
    double f1 = RhoFunc(x, y, b1, aa, abdev);
    /* bracket 3-sigma away in the downhill direction;*/
    double b2 = fabs(3 * sigb);
    b2 = bb + (f1 < 0 ? -b2 : b2);
    double f2 = RhoFunc(x, y, b2, aa, abdev);

    // if conditional added to take care of the case of a
    // line input into this function. It is needed to avoid an
    // infinite loop when (b1 == b2) (within floating point error)
    if (fabs(b2 - b1) > (sigb + 0.005)) {
        // bracketing;
        while ((f1 * f2) > 0) {
            bb = 2 * b2 - b1;
            b1 = b2; b2 = bb; 
            f1 = f2; f2 = RhoFunc(x, y, b2, aa, abdev) ;
        }
    }
    // refine until the error is a negligible number of std;
    sigb *= 0.01;
    while (fabs(b2 - b1)> sigb) {
        // bisection;
        bb = (b1 + b2) / 2.;
        if ((bb == b1) || (bb == b2)) break ;
        double f = RhoFunc(x, y, bb, aa, abdev) ;
        if ((f * f1) >= 0) {
            f1 = f; b1 = bb;
        } else {
            f2 = f; b2 = bb;
        }
    }
    a = aa; b = bb; 
    return (abdev/x.size());
}

// 붉은 선--> 최소자승법에 의한 피팅.: outlier에 매우 취약한 구조.
// 파란 선--> least absolute deviation을 이용한 피팅: outlier에 매우 robust 하다.

'Image Recognition' 카테고리의 다른 글

RANSAC: Circle Fit (0)	2008.07.21
KMeans Algorithm (0)	2008.07.19
EM: Binarization (0)	2008.07.01
EM Algorithm: Line Fitting (0)	2008.06.29
Gaussian Mixture Model (2)	2008.06.07

Posted by helloktk

,

EM: Binarization

Image Recognition 2008. 7. 1. 21:04

이미지의 히스토그램을 이용하여 전경과 배경을 분리하는 이진화는 가우시안 mixture model과 EM 알고리즘을 적용하기에 좋은 예다. 히스토그램에는 전경에 해당하는 픽셀 분포와 배경에 해당하는 픽셀 분포가 혼합되어 있다. 이를 두 가우시안의 혼합으로 모델링하고 EM 알고리즘을 사용해서 mixing parameter(π_a), 각 클래스의 평균(μ_a) 과 표준편차(σ_a)를 추정한다. N개의 Gaussian mixture일 때,

Mixing parameter가 π_a (a=1, 2,..., nclass)일 때 특정 픽셀 값(=xi)이 클래스 a 소속일 posterior 확률은

로 쓸 수 있다. posterior 정보를 이용하면 mixing parameter, 평균 그리고 분산은 다음 식으로 주어진다. H[i]=Hi는 이미지의 히스토그램을 나타내고, bin 인덱스 i는 픽셀 값 xi를 나타낸다:

log-likelihood:

// mixing 클래스를 기술하는 클래스;
struct mixclass {
    double prob ;               // mixing parameter;
    double mean ;               // mean
    double var ;                // variance;
};

// N(mean, var);

double gauss1d(double x, double mean, double var)

{

double a = 1 / sqrt(2*M_PI * var);
double b = 0.5*(x-mean)*(x-mean)/var;
return a * exp(-b);
};

// posterior; Pr(Zi = c | xi, Theta);
// 주어진 관측값 x이 클래스 cid에 속할 posterior;
double classprob(double x, int nclass, mixclass* mclass, int cid)

{

double marginal = 0;

for (int c = 0; c < nclass; c++) {

marginal += mclass[c].prob * gauss1d(x, mclass[c].mean, mclass[c].var) ;

};

// Bayes 공식 = prior * PDF;

return mclass[cid].prob * gauss1d(x, mclass[cid].mean, mclass[cid].var) / marginal;

}

// posterior (class_prob[i][c]) table 만들기;

void update_class_prob(int nbins, double * hist, int nclass, mixclass* mclass, double ** class_prob)

{

for (int i = 0; i < nbins; i++) {

for (int c = 0; c < nclass; c++) {

class_prob[i][c] = classprob(double(i), nclass, mclass, c);

}

};

// E-step; pi[c] = mixture parameter for class c;

// posterior를 이용해서 특정클래스의 mixing 정도를 계산;==> next prior;

void update_prob(int nbins, double * hist, int nclass, mixclass* mclass, double ** class_prob)

{

double ntot = 0;

for (int i = 0; i < nbins; i++) ntot += hist[i];

for (int c = 0; c < nclass; c++) {

double s = 0;

for (int i = 0; i < nbins; i++) s += hist[i] * class_prob[i][c];

mclass[c].prob = s / ntot;

}

};

// mu[c]; 클래스의 평균;

void update_mean(int nbins, double * hist, int nclass, mixclass* mclass, double ** class_prob)

{

double ntot = 0;

for (int i=0; i<nbins; i++) ntot += hist[i];

for (int c = 0; c < nclass; c++) {

double sx = 0.0;

for (int i = 0; i < nbins; i++) sx += hist[i] * i * class_prob[i][c];

mclass[c].mean = sx / (ntot * mclass[c].prob);

}

};

// var[c]; 클래스의 분산;

void update_var(int nbins, double * hist, int nclass, mixclass* mclass, double ** class_prob)

{

double ntot = 0;

for (int i = 0; i < nbins; i++) ntot += hist[i];

for (int c = 0; c < nclass; c++) {

double m= mclass[c].mean ;

double sxx = 0;

for (int i = 0; i < nbins; i++) sxx += hist[i] * SQR(i - m) * class_prob[i][c];

mclass[c].var = sxx / (ntot * mclass[c].prob);

}

};

// M-step;

void update_parameters(int nbins, double * hist, int nclass, mixclass* mclass, double ** class_prob)

{

// mixture 파라미터를 갱신;

update_prob(nbins, hist, nclass, mclass, class_prob);

// 각 클래스의 평균을 갱신;

update_mean(nbins, hist, nclass, mclass, class_prob);

// 각 클래스의 분산을 갱신;

update_var(nbins, hist, nclass, mclass, class_prob);

};

// initialization;

void init_em(int nbins, double * hist, int nclass, mixclass* mclass)

{

srand(unsigned(time(0)));

double mean1 = 0, var1 = 0, ntot = 0;

for (int k = 0; k < nbins; k++) ntot += hist[k];

for (int i = 0; i < nbins; i++) mean1 += hist[i] * i;

mean1 /= ntot;

for (int i = 0; i < nbins; i++) var1 += hist[i] * SQR(i - mean1);

var1 /= ntot;

for (int c = 0; c < nclass; c++) {

mclass[c].prob = 1.0 / nclass; //same mixing parameter;

mclass[c].mean = rand() % nbins; // random mean;

mclass[c].var = var1; // same standard deviation;

}

};

// calculate log-likelihood;

double mixLLK(int nclass, mixclass* mclass)

{

double llk = 0;

for (int i = 0; i < nbins; i++) {

double s = 0 ;

for (int c = 0; c < nclass; c++)

s += mclass[c].prob * gauss1d(double(i), mclass[c].mean, mclass[c].var);

llk+= log(s);

}

return llk;

};

// check termination condition;

bool check_tol(double llk, double llk_p, double eps)

{

return (fabs(llk - llk_p) / fabs(llk)) > eps;

};

// 입력은 이미지의 히스토그램;
double em(int nbins/*=256*/, double hist[/*256*/],
    int nclass/*=2*/, mixclass mclass[/*=2*/], double eps/*=1.e-10*/) {
    double llk = 0, prev_llk = 0;
    // allocate memory buffers for the posterior information;
    double ** class_prob = (double**)malloc(sizeof(double*) * nbins);
    class_prob[0] = (double*)malloc(sizeof(double) * nbins * nclass) ;
    for (int i = 1; i < nbins; i++) class_prob[i] = class_prob[i - 1] + nclass;

    // initialization of algorithm;
    init_em(nbins, hist, nclass, mclass);
    //
    do {
        prev_llk = llk;
        // E-step ;
        update_class_prob(nbins, hist, nclass, mclass, class_prob);
        // M-step;
        update_parameters(nbins, hist, nclass, mclass, class_prob);
        llk = mixLLK(nclass, mclass);
        // TRACE("mean1=%f, mean2=%f\n", mclass[0].mean, mclass[1].mean);
        TRACE("log-likelihood=%e\n", llk);
    } while (!check_tol(llk, prev_llk, eps));
    // clean ;
    free(class_prob[0]);
    free(class_prob) ;
    return llk;
};

적색 : 히스토그램
청색, 녹색 : posterior(membership);
Otsu 알고리즘을 쓰는 경우에 100에서 threshold 값이 결정되고 EM은 110 정도임.
Otsu Threshold source code: kipl.tistory.com/17

'Image Recognition' 카테고리의 다른 글

KMeans Algorithm (0)	2008.07.19
Robust Line Fitting (0)	2008.07.08
EM Algorithm: Line Fitting (0)	2008.06.29
Gaussian Mixture Model (2)	2008.06.07
Rasterizing Voronoi Diagram (0)	2008.05.26

Posted by helloktk

,

EM Algorithm: Line Fitting

Image Recognition 2008. 6. 29. 15:30

아래 그림을 보면 우리는 데이터를 연결하는 두 개의 직선을 생각할 수 있을 것이다. 그럼 두 개의 직선을 어떻게 얻을 것인가? 물론, ICA(independent component analysis)를 이용하는 것이 한 가지 방법이 될 것이다. 여기서는 EM 알고리즘을 이용하여서 두 개의 직선을 기술하는 기울기와 $y$-절편의 값을 구하는 방법을 알아보자.

두 직선을 각각 $y=a_1 x + b_1$, $y = a_2 x + b_2$라고 하면, $(a_1, b_1)$, $(a_2, b_2)$를 구하는 문제이다. 만약 각각의 data가 에러가 수반되는 측정에 의해서 얻어졌다고 하자. 에러 분포가 정규분포를 따른다면(여기서는 두 개의 모델 모두 갖은 표준편차를 갖는다고 가정했다) $i$-번째의 데이터가 각각의 직선 모델 1과 2에 속할 확률은 (posterior with equal prior) Bayes 공식에 의해서

$$ w_1[i] = \frac{ e^{ - \frac{ r_1^2[i]}{2\sigma^2} } } {e^{ - \frac{ r_1^2[i]}{2\sigma^2} } + e^{ - \frac{ r_2^2[i]}{2\sigma^2} } }, \quad w_1[i] = \frac{ e^{ - \frac{ r_2^2[i]}{2\sigma^2} } } {e^{ - \frac{ r_1^2[i]}{2\sigma^2} } + e^{ - \frac{ r_2^2[i]}{2\sigma^2} } }, \quad i = 0,1,2,... $$

로 주어진다. 여기서 $r_1(i)$와 $r_2(i)$는 residual error이다:

$$r_1[i] = a_1 x[i] + b_1 - y[i],\quad r_2[i] = a_2 x[i] + b_2 - y[i], \quad i=0,1,2,...$$

(*이 값 대신에 직선까지 거리=$\frac{|a_k x + b_k - y|}{\sqrt{1+ a_k^2}},~ k=1,2)$로 대체해도 된다)

이제 각각의 데이터에 대해서 posterior를 구하였으므로(E-step) 이 값을 가중치로 하여서 직선의 방정식을 다시 갱신한다. 즉, 각각의 data 점들에 대한 $w_1(i)$를 가중치로 하여서 다시 직선 모델 1의 파라미터를 재계산하고, 동일한 과정을 $w_2(i)$를 가지고 직선 모델 2를 계산하여서 $(a_1, b_1)$, $(a_2, b_2)$를 재계산한다(M-step). 이 갱신된 파라미터를 이용하여서 다시 가중치를 구하는 작업과, 직선의 파라미터를 구하는 작업을 일정하게 수렴할 때까지 반복을 하는 과정을 수행한다.

아래의 그림은 3번 만에 원하는 결과를 얻는 것을 보여준다. 직선의 파라미터에 대한 초기값과 residual error의 표준편차 파라미터에 대한 적절한 값의 선택이 중요하다.

// 코드의 일부...
std::vector<CPoint> data ;                             // data,
std::vector<double> w1(data.size()), w2(data.size());  // weights.
double a1, b1, a2, b2 ;                                // line params;
double sigma ;
// E-step;
void calcWeights() {
    for (int i = data.size(); i-- > 0;) {
        double  x = data[i].x, y = data[i].y ;
        double r1 = a1 * x + b1 - y ;
        double r2 = a2 * x + b2 - y ;
        double n1 = SQR(r1) / SQR(sigma) / 2;
        double n2 = SQR(r2) / SQR(sigma) / 2;
        double p1 = exp( - n1);
        double p2 = exp( - n2);
        w1[i] = p1 / (p1 + p2);
        w2[i] = p2 / (p1 + p2);
    }
};
//  M-step
void estimModels() {
    double s1xx = 0, s1x = 0, s1 = 0, s1y = 0, s1xy = 0;
    double s2xx = 0, s2x = 0, s2 = 0, s2y = 0, s2xy = 0;
    for (int i = data.size(); i-- > 0;) {
        double  x = data[i].x,
                y = data[i].y;
            s1xx += w1[i] * SQR(x);
            s1xy += w1[i] * x * y;
            s1x  += w1[i] * x;
            s1y  += w1[i] * y;
            s1   += w1[i];
            //
            s2xx += w2[i] * SQR(x);
            s2xy += w2[i] * x * y;
            s2x  += w2[i] * x;
            s2y  += w2[i] * y;
            s2   += w2[i];
    };
    double det1 = s1xx * s1 - SQR(s1x);
    double det2 = s2xx * s2 - SQR(s2x);
    a1 = (s1 * s1xy  - s1x * s1y ) / det1;
    b1 = (s1xx * s1y - s1x * s1xy) / det1;
    a2 = (s2 * s2xy  - s2x * s2y ) / det2;
    b2 = (s2xx * s2y - s2x * s2xy) / det2;
}

'Image Recognition' 카테고리의 다른 글

Robust Line Fitting (0)	2008.07.08
EM: Binarization (0)	2008.07.01
Gaussian Mixture Model (2)	2008.06.07
Rasterizing Voronoi Diagram (0)	2008.05.26
RANSAC Algorithm (0)	2008.05.24

Posted by helloktk

,

Geometry & Recognition

KMeans Algorithm

'Image Recognition' 카테고리의 다른 글

Adaptive Binarization

'Image Recognition > Fundamental' 카테고리의 다른 글

Robust Line Fitting

'Image Recognition' 카테고리의 다른 글

EM: Binarization

'Image Recognition' 카테고리의 다른 글

EM Algorithm: Line Fitting

'Image Recognition' 카테고리의 다른 글

카테고리

태그목록

최근에 올라온 글

최근에 달린 댓글

글 보관함

달력

링크

티스토리툴바

« 2024/05 »
일	월	화	수	목	금	토
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31