pubjilo.blogg.se

Proximal gradient algorithm to recover sparse vector code

While describing an optimization algorithm for a relaxed version of this problem, I think I found a simple analytical solution for the original problem.

Notice that none of the rows of $\tilde A$ should have more than one non-zero entry. If $s$ is greater than the number of rows of $\tilde A$, then each row of $\tilde A$ will have at most one non-zero entry, and the optimal objective function value will be $0$. Otherwise, the optimal $\tilde A$ will have $s$ nonzero rows. Which rows of $\tilde A$ should be nonzero? The rows that correspond to the $s$ largest (in magnitude) components of $Ax$. Those components of $Ax$ will be neutralized, and the optimal objective function value will be the sum of the squares of the remaining components of $Ax$.

Below is my original solution using the proximal gradient method. You can see at the end where I start to recognize the simpler solution.

A common technique is to penalize the $\ell_1$-norm of $\tilde A$ to encourage $\tilde A$ to be sparse. This approach has the benefit that the $\ell_1$-norm is convex, so the resulting optimization problem is a convex problem. In more detail, you can solve the optimization problem in which the $\ell_1$-norm of $\tilde A$ is added to the objective as a penalty term. It can be shown that the proximal operator of $g$ (with parameter $t > 0$) simply "shrinks" each component of the input vector $a$ towards the origin by a distance $\lambda t$, stopping if we hit the origin. With the above facts, you're now ready to implement the proximal gradient method to solve problem (2).

Here are some further thoughts which I believe are rather important, but which I'll need to think through more carefully to check whether I'm making an error. I think we will run into a problem in that some columns of $X$ are highly correlated with each other. In particular, the columns in the first block of $X$ are all scalar multiples of each other, and likewise for the second block of columns of $X$, etc. When columns of $X$ are highly correlated with each other, the $\ell_1$-norm tends not to promote sparsity as much as we would like. That's because the $\ell_1$-norm can't measure a difference between a solution where just one entry of $a_i$ is non-zero and a solution where that entry of $a_i$ is "spread out" so that now all entries of $a_i$ are non-zero.
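The shrinkage step and the proximal gradient iteration described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not spelled out in the text: the smooth part of the objective is taken to be $\frac{1}{2}\|Xa - b\|_2^2$ with a hypothetical right-hand side `b`, the penalty is $g(a) = \lambda \|a\|_1$ (which is what shrinking by $\lambda t$ implies), and the step size is fixed at $t = 1/\|X\|_2^2$. The function names are made up for this sketch.

```python
import numpy as np

def soft_threshold(a, tau):
    """Shrink each entry of a toward 0 by tau, stopping at 0.

    This is the proximal operator of tau * ||.||_1.
    """
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

def proximal_gradient(X, b, lam, n_iter=500):
    """Proximal gradient sketch for the ASSUMED objective
    0.5 * ||X a - b||_2^2 + lam * ||a||_1.
    """
    # Fixed step t = 1/L, where L = ||X||_2^2 (squared spectral norm)
    # is a Lipschitz constant of the gradient of the smooth term.
    t = 1.0 / np.linalg.norm(X, 2) ** 2
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ a - b)                   # gradient of the smooth term
        a = soft_threshold(a - t * grad, lam * t)  # prox step, shrink by lam * t
    return a
```

When $X$ is the identity, the iteration reaches its fixed point after one step, and that fixed point is just `soft_threshold(b, lam)`, which makes a convenient sanity check.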
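The simple analytical solution can also be checked numerically. The sketch below takes the vector $y = Ax$ and the budget $s$, picks the $s$ components of largest magnitude (the ones that get neutralized), and returns the sum of squares of what is left. The function name and the return convention are hypothetical, and the code assumes the objective is the squared Euclidean norm of the residual, as the description of the optimal value suggests.

```python
import numpy as np

def closed_form_value(y, s):
    """Return (indices of neutralized components, optimal objective value).

    y : the vector A x
    s : number of nonzero entries allowed (hypothetical parameter name)
    """
    y = np.asarray(y, dtype=float)
    if s >= y.size:
        # Enough budget to neutralize every component: objective is 0.
        return np.arange(y.size), 0.0
    # Indices of the s largest-magnitude components of y.
    chosen = np.sort(np.argsort(-np.abs(y))[:s])
    keep = np.ones(y.size, dtype=bool)
    keep[chosen] = False
    # The remaining components contribute their squares to the objective.
    return chosen, float(np.sum(y[keep] ** 2))

rows, value = closed_form_value([3.0, -5.0, 1.0, 2.0], s=2)
print(rows, value)  # rows 0 and 1 are neutralized; value is 1 + 4 = 5.0
```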