
Selecting Representatives and Being Represented in Unsupervised Learning

- AP & LLE

Xiangliang Zhang

King Abdullah University of Science and Technology

CNCC, Oct 25, 2018

Hangzhou, China

Outline


• Affinity Propagation (AP)[Frey and Dueck, Science, 2007]

• Locally Linear Embedding (LLE)[Roweis and Saul, Science, 2000]

Affinity Propagation


[Frey and Dueck, Science 2007]

Affinity Propagation


[Frey and Dueck, NIPS 2005]

We describe a new method that, for the first time to our knowledge, combines the advantages of model-based clustering and affinity-based clustering.

[Figure: mixture-model view of clustering, showing components and mixing coefficients]

Clustering: group the similar points together

K-means: minimize

$$\varepsilon = \sum_{m=1}^{k} \sum_{x_i \in C_m} (x_i - \mu_m)^2$$

where $\mu_m = \frac{1}{|C_m|} \sum_{x_i \in C_m} x_i$ (the center is the cluster mean)

K-medoids / K-medians: minimize the same objective, where

$$\mu_m \in \{x_i \mid x_i \in C_m\}$$

i.e., the center must be one of the cluster's own data points

Inspired by greedy k-medoids
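The greedy k-medoids idea can be sketched as alternating assignment and medoid-update steps. This is a minimal illustration, not code from the talk; the function name and parameters are our own choices:

```python
import numpy as np

def k_medoids(X, k, iters=20, seed=0):
    """Greedy k-medoids sketch: centers are restricted to actual data points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D = ((X[:, None] - X[None, :]) ** 2).sum(-1)   # pairwise squared distances
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)   # assign each point to nearest medoid
        for m in range(k):
            members = np.where(labels == m)[0]
            if len(members):
                # new medoid: the member minimizing total distance to its cluster
                medoids[m] = members[np.argmin(D[np.ix_(members, members)].sum(0))]
    return medoids, labels
```

Unlike k-means, the update step searches over cluster members rather than averaging them, which is what makes the "select a representative" view natural here.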

Data to cluster:

The likelihood of x_i belonging to the cluster with center s_i

The Bayesian prior probability that s_i is a cluster center

The responsibility of the k-th component for generating x_i

Assign x_i to center s_i

Choose a new center

Understanding the process: a message sent from x_i to each center/exemplar expresses x_i's preference to be with that exemplar

Hard decision for cluster centers/exemplars

Introduce: “availabilities”, which is sent from exemplars to other points, and provides soft evidence of the preference for each exemplar to be available as a center for each point

The method presented in NIPS‘05


• Responsibilities are computed using likelihoods and availabilities

• Availabilities are computed using responsibilities, recursively

Affinities

Interpretation by Factor Graph


c_i is the index of the exemplar for x_i

Objective function is

Constraints:

Should not be empty with a single exemplar

An exemplar must select itself as exemplar
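In the Science '07 formulation, the objective and the exemplar-consistency constraint on the slide can be written as:

```latex
% Net similarity to maximize, where c_i is the exemplar index chosen by point i
S(\mathbf{c}) = \sum_{i=1}^{N} s(i, c_i) + \sum_{k=1}^{N} \delta_k(\mathbf{c}),
\qquad
\delta_k(\mathbf{c}) =
\begin{cases}
-\infty & \text{if } c_k \neq k \text{ but } \exists\, i : c_i = k,\\
0 & \text{otherwise,}
\end{cases}
```

so any configuration in which a point names k as its exemplar while k does not name itself is given infinitely bad score.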

Input and Output of AP in Science07


Preference (prior)

AP: a message passing algorithm


Iterations of Message passing in AP


Summary AP

Xiangliang Zhang, KAUST CS340: Data Mining

Extensive study of AP


Outline


• Affinity Propagation (AP)[Frey and Dueck, Science, 2007]

• Locally Linear Embedding (LLE)[Roweis and Saul, Science, 2000]

Locally Linear Embedding (LLE)


[Roweis and Saul, Science, 2000]

Saul and Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. JMLR 2003

LLE – motivations


Inspired by MDS


Multidimensional Scaling (MDS): find an embedding of objects in a low-dimensional space that preserves pairwise distances

Given: pairwise similarities. To find: the embedding

Can we eliminate the need to estimate pairwise distances between widely separated data points?
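For reference, the classical MDS recipe mentioned above can be sketched via double-centering and an eigendecomposition. A minimal illustration, not from the slides:

```python
import numpy as np

def classical_mds(D, d=2):
    """Classical MDS sketch: D is an (n, n) matrix of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]          # top-d eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
```

Note that every entry of D is used, which is exactly the global dependence on widely separated pairs that LLE avoids.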

LLE – general idea

• Locally, on a fine enough scale, everything looks linear

• Represent each object as a linear combination of its neighbors

• Assumption: the same linear representation will hold in the low-dimensional space

• Find a low-dimensional embedding which minimizes the reconstruction loss


LLE – matrix representation

1. Select the k nearest neighbors of each point

2. Reconstruct x_i from its k nearest neighbors


Find W by minimizing

$$\varepsilon(W) = \sum_{i} \Big\| x_i - \sum_{j} w_{ij} x_j \Big\|^2$$

LLE – matrix representation

3. Need to solve the system Y = WY

Find the embedding vectors Y_i by minimizing:

$$\varepsilon(Y) = \sum_{i} \Big\| Y_i - \sum_{j} w_{ij} Y_j \Big\|^2 = \sum_{ij} M_{ij}\, (Y_i \cdot Y_j)$$

where $M = (I - W)^T (I - W)$,

s.t. $\sum_{i=1}^{N} Y_i = 0$ (centered on the origin) and $\frac{1}{N}\sum_{i=1}^{N} Y_i Y_i^T = I$ (with unit covariance)

LLE – algorithm summary

1. Find k nearest neighbors in X space

2. Solve for reconstruction weights W

3. Compute embedding coordinates Y using weights W:

• Create a sparse matrix M = (I − W)^T (I − W)

• Set Y to be the eigenvectors corresponding to the bottom d non-zero eigenvalues of M


$$w = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1} \mathbf{1}}, \qquad \text{where } C_{jk} = (x - \eta_j)^T (x - \eta_k) \text{ and } \eta_j \text{ is one of } x\text{'s } K \text{ nearest neighbors}$$

Continuing, SNE

SNE allows "many-to-one" mappings, in which a single ambiguous object really belongs in several disparate locations in the low-dimensional space, while LLE makes a one-to-one mapping

pj|i is the asymmetric probability that i would pick j as its neighbor

Gaussian Neighborhood in original space

qj|i is the induced probability that point i picks point j as its neighbor

Gaussian Neighborhood in low-dim space
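The two neighborhood distributions described above have the standard SNE form:

```latex
p_{j|i} = \frac{\exp\!\big(-\|x_i - x_j\|^2 / 2\sigma_i^2\big)}
               {\sum_{k \neq i} \exp\!\big(-\|x_i - x_k\|^2 / 2\sigma_i^2\big)},
\qquad
q_{j|i} = \frac{\exp\!\big(-\|y_i - y_j\|^2\big)}
               {\sum_{k \neq i} \exp\!\big(-\|y_i - y_k\|^2\big)},
```

and SNE minimizes the sum of Kullback–Leibler divergences $\sum_i \mathrm{KL}(P_i \,\|\, Q_i)$.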

Continuing, t-SNE

• uses a Student-t distribution (heavier tails) rather than a Gaussian to compute the similarity between two points in the low-dimensional space

• a symmetrized version of the SNE cost function with simpler gradients
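Concretely, these two changes give the t-SNE quantities:

```latex
q_{ij} = \frac{\big(1 + \|y_i - y_j\|^2\big)^{-1}}
              {\sum_{k \neq l} \big(1 + \|y_k - y_l\|^2\big)^{-1}},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N},
\qquad
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
```

where the heavy-tailed $q_{ij}$ lets moderately distant pairs sit far apart in the embedding, alleviating crowding.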


LLE - Follow up work

LLE output strongly depends on the selection of k

Jing Chen and Yang Liu. Locally linear embedding: a survey. Artificial Intelligence Review (2011)


[Chen and Liu, 2011]

[Ting and Jordan, 2018]

Thank you for your attention!

Lab of Machine Intelligence and kNowledge Engineering (MINE): http://mine.kaust.edu.sa/