Sparse optimization has found interesting applications in many areas such as machine learning, signal processing, compressive sensing, medical imaging, finance, etc. Sparse optimization involves minimizing the 1-norm or l1-like functions, which are non-differentiable. In order to apply the classic methods such as gradient descent and quasi Newton, a traditional trick is to consider smoothed approximates of the 1-norm such as the Huber-norm, (1+eps)-norm, and the sum of sqrt(x_i^2 + eps), where eps>0 is a small parameter. They cause a (slight) loss in solution sparsity and often require tuning eps. This talk introduces an alternative 'smoothing' approach that does not directly generate a smooth function but produces an unconstrained dual problem whose objective is differentiable and enjoys a 'restricted' strongly convex property. Not only can one apply a rich set of classic techniques such as gradient descent, line search, and quasi-Newton methods to this dual problem, global linear convergence is guaranteed. In particular, this gives the first global linear convergence result among the gradient-based algorithms for sparse optimization, including the recent algorithms based on operator splitting (ISTA/FISTA), variable spitting (Bregman/ADMM), and/or coordinate descent, which converge sublinearly in the global sense. In addition, our "smoothing" approach produces exactly sparse solutions with recovery guarantees. Numerical examples are presented on problems in compressive sensing, medical imaging, feature selection, video decomposition, and distributed LASSO with big data.