This work points out four problems with the previous (pairwise) approach:
First, the objective of learning is formalized as minimizing errors in the classification of document pairs,
rather than minimizing errors in the ranking of documents. Second, the training process is computationally costly, as
the number of document pairs can be very large. Third, the assumption that the document pairs are generated i.i.d. is too strong. Fourth, the number of generated document pairs varies widely from query to query, which results in a trained model biased toward queries with more document pairs.
The list-wise procedure can be summarized as follows:
the training data consists of a set of queries q(i); each query has an associated list of documents d(i), and the judgment y(i)j represents the relevance degree of d(i)j to q(i).
Each query-document pair is represented by a feature vector x(i)j, and the ranking function produces a list of scores z(i). The objective of learning is formalized as minimizing the total loss with respect to the training data.
To define the loss function, they first convert a list of scores into a probability distribution,
by calculating the permutation probabilities of the list.
But since this involves all n! permutations, to simplify computation they calculate the top one probability of each document instead.
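The two quantities can be sketched as follows, taking φ = exp as the paper does later (the function names are mine):

```python
import math
from itertools import permutations

def permutation_probability(scores, perm):
    """P(perm) = prod_j phi(s_perm[j]) / sum_{k>=j} phi(s_perm[k]), with phi = exp.

    perm is a tuple of document indices, in ranked order.
    """
    p = 1.0
    for j in range(len(perm)):
        num = math.exp(scores[perm[j]])
        den = sum(math.exp(scores[perm[k]]) for k in range(j, len(perm)))
        p *= num / den
    return p

def top_one_probability(scores, j):
    """P(j) = exp(s_j) / sum_k exp(s_k): probability that document j is ranked first."""
    return math.exp(scores[j]) / sum(math.exp(s) for s in scores)
```

The top one probability of document j equals the sum of the permutation probabilities of all permutations that place j first, which is why it can replace the full distribution over n! permutations.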
With top one probabilities, given two lists of scores, any metric between the two resulting distributions can serve as the list-wise loss function; the paper uses cross entropy.
Again for simplicity, they instantiate the top one probability with the exponential function, which turns it into a softmax over the scores.
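With the exponential form, the list-wise loss is the cross entropy between the softmax of the judgments y and the softmax of the model scores z. A minimal sketch (function names are mine):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(y, z):
    """Cross entropy L(y, z) = -sum_j P_y(j) * log P_z(j), with top one probabilities."""
    py, pz = softmax(y), softmax(z)
    return -sum(a * math.log(b) for a, b in zip(py, pz))
```

The loss is smallest when the score list induces the same top one distribution as the judgments, e.g. when z equals y.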
And finally, they use a neural network as the ranking model and a gradient method to minimize the loss;
the algorithm is summarized as follows:
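The training loop can be sketched with a linear scoring function (the simplest neural network); the gradient below comes from differentiating the cross entropy composed with the softmax, but the toy setup and names are my own, not the paper's notation:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def train_listwise(data, dim, lr=0.1, epochs=200):
    """Gradient descent on the list-wise cross entropy loss.

    data: list of (X, y) per query, where X is a list of feature vectors
    and y the corresponding relevance judgments.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for X, y in data:
            # score each document with the linear model f(x) = w . x
            z = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
            py, pz = softmax(y), softmax(z)
            # gradient of the cross entropy w.r.t. w: sum_j (P_z(j) - P_y(j)) * x_j
            grad = [0.0] * dim
            for j, x in enumerate(X):
                for k in range(dim):
                    grad[k] += (pz[j] - py[j]) * x[k]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w
```

On a toy query where the more relevant document has the larger feature value, the learned weight becomes positive, so the model scores agree with the judgments.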