Jimmy Blog

Welcome!

PageRank背后的数学

求解pagerank的值,实际上就是求解马尔科夫链的收敛值。

数学抽象

图的顶点值,能抽象成一个向量I。初始值为\(I_0\)。迭代过程就是I不断的与一个矩阵G相乘,最终I的值不变,则达到收敛。

Google Matrix

Google Matrix 就是上面所说的矩阵G,那么G如何获得呢?

矩阵A

A matrix element \(A{i,j}\) is filled with 1 if node j has a link to node i, and 0 otherwise; this is the adjacency matrix of links. 图中的边j->i,则\(A{i,j} = 1\)

矩阵S

A related matrix S corresponding to the transitions in a Markov chain of given network is constructed from A by dividing the elements of column "j" by a number of\(kj\) where \(kj\) is the total number of outgoing links from node j to all other nodes. The columns having zero matrix elements, corresponding to dangling nodes, are replaced by a constant value 1/N. Such a procedure adds a link from every sink, dangling state a to every other node.

S对应矩阵A中的元素,若A的某一列的元素非0,则S相应的列的元素的值,为A中列的对应元素除以当前列中非零元素的个数;若A的某列都为0,则S对应的列的值为\(1/N\),其中N为矩阵的维度。例如:

则S为:;

矩阵G

Then the final Google matrix G can be expressed via S as:

$$G = \alpha S + (1-a) \frac{1}{N}$$

By the construction the sum of all non-negative elements inside each matrix column is equal to unity. The numerical coefficient \alpha is known as a damping factor.

Usually S is a sparse matrix and for modern directed networks it has only about ten nonzero elements in a line or column, thus only about 10N multiplications are needed to multiply a vector by matrix G.

幂法(power iteration)求解

之前其实就解释过了,简单来说,就是I不断的与一个矩阵G相乘,最终I的值不变,则达到收敛。

幂乘法可取吗?

I经过多次与G相乘最终收敛,根据结合律,可以写成 \( I = G^n{I_0} \)。 实际上是G自己的n次方收敛了。采用“幂乘法”先把收敛的\(G^n\) 求出来,看似矩阵相乘的次数变少了,只需要G平方\(log(n) + 1 \)次,实际上这是错误的。G每次平方,复杂度都是向量I乘以G的n倍,n是G的维度。迭代收敛次数往往是小于n的,所以,直接用向量乘矩阵会快一些。

例如,G的维度是1000,I乘上G,迭代500次即可收敛(即 \( G^{500}\)收敛),而计算G的n次方收敛需要7次,才能达到\( G^{512}\),收敛,而一次G的平方耗费的时间是I与G相乘的1000倍,复杂度远远超过预期。所以,“幂乘法”是不可取的。

参考

综述
Google Matrix
幂法wiki
幂法计算
特征值与特征向量

comments powered by Disqus