Optimization

  • Maximum Likelihood Estimation (MLE): maximize the likelihood to find the best x
  • MSE = MLE under a Gaussian noise assumption (a short derivation follows below)
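
To see the second point, assume observations \(y_i=f_x(t_i)+\epsilon_i\) with i.i.d. noise \(\epsilon_i\sim\mathcal{N}(0,\sigma^2)\) (this notation is an illustrative assumption, not from the notes). Then

\[ \log L(x)=\sum_i\log\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_i-f_x(t_i))^2}{2\sigma^2}\right)=-\frac{1}{2\sigma^2}\sum_i\left(y_i-f_x(t_i)\right)^2+\text{const}, \]

so maximizing the likelihood is equivalent to minimizing the sum of squared residuals, i.e. the MSE.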

Numerical methods

x ← x0                           % initialization
while not converged:
    h ← descending_direction(x)  % determine the direction
    α ← descending_step(x, h)    % determine the step
    x ← x + αh                   % update the parameters

The remaining task is to choose the best direction \(h\) and step size \(\alpha\).

Steepest descent method

Determine the direction from a first-order approximation; this is the gradient descent method.

  • Taylor expansion at \(x_0\): \(F(x_0+\delta x)\approx F(x_0)+J_F\delta x\)
  • When \(J_F\delta x<0\), the objective function decreases; choosing \(\delta x\propto-J_F^T\) (the negative gradient) makes this inner product as negative as possible, giving the steepest descent direction

  • Advantages
    • easy to implement
    • performs well far from the minimum
  • Disadvantages
    • converges slowly near the minimum
    • wastes computation on many small steps
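
As a concrete illustration, here is a minimal gradient descent sketch in Python; the quadratic test function, the fixed step size alpha, and the stopping rule are illustrative assumptions, not part of the notes.

import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    # Minimize F by stepping along the descending direction h = -J_F^T
    x = x0.astype(float)
    for _ in range(max_iter):
        h = -grad(x)                 # steepest descent direction
        if np.linalg.norm(h) < tol:  # converged: gradient is (almost) zero
            break
        x = x + alpha * h            # fixed step size, for simplicity
    return x

# Example: F(x) = ||x - c||^2 with gradient 2(x - c); the minimum is at x = c
c = np.array([1.0, -2.0])
print(gradient_descent(lambda x: 2 * (x - c), np.zeros(2)))  # ≈ [1, -2]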

Newton's method

  • Use a second-order Taylor expansion
\[ F(x_k+\delta x)\approx F(x_k)+J_F\delta x+\frac{1}{2}\delta x^TH_F\delta x \]
  • Find the \(\delta x\) that minimizes this quadratic model; setting its gradient to zero gives
\[ H_F\delta x+J_F^T=0 \]
  • The optimal direction (the Newton step): \(\delta x=-H_F^{-1}J_F^T\)

  • Advantages
    • fast convergence near the minimum
  • Disadvantages
    • computing (and inverting) the Hessian is expensive
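
A minimal sketch of the Newton iteration, assuming the gradient and Hessian are available as functions; the test function is an illustrative choice.

import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    # Newton's method: solve H_F dx = -J_F^T at every step
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), -g)  # Newton step, no explicit inverse
        x = x + dx
    return x

# Example: F(x) = x_0^4 + x_1^2, minimum at the origin
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton(grad, hess, np.array([1.0, 1.0])))  # ≈ [0, 0]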

Gauss-Newton method

  • Useful for solving nonlinear least squares, \(F(x)=\frac{1}{2}||R(x)||^2\)
  • Approximates the Hessian by \(H_F\approx J_R^TJ_R\), giving the step \(\delta x=-(J_R^TJ_R)^{-1}J_R^TR\) (see the sketch after this list)
  • Advantages
    • no need to compute the Hessian
    • fast convergence
  • Disadvantages
    • if \(J_R^TJ_R\) is singular, the algorithm becomes unstable
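
A minimal Gauss-Newton sketch for a nonlinear least-squares fit; the exponential model, data, and starting point are illustrative assumptions.

import numpy as np

def gauss_newton(residual, jac, x0, tol=1e-10, max_iter=50):
    # Gauss-Newton: approximate H_F by J_R^T J_R and solve for the step
    x = x0.astype(float)
    for _ in range(max_iter):
        r, J = residual(x), jac(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)  # unstable if J^T J is singular
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example: fit y = a * exp(b * t) to noiseless data; unknowns x = [a, b]
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
residual = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t),              # dr/da
                                 x[0] * t * np.exp(x[1] * t)])  # dr/db
print(gauss_newton(residual, jac, np.array([1.0, -1.0])))  # ≈ [2.0, -1.5]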

When to use which method

  1. Large-scale problems: prefer variants of steepest descent (e.g. stochastic gradient descent, conjugate gradient) or first-order methods such as Adam.
  2. Small- to medium-scale convex problems: if the Hessian is easy to compute and positive definite, Newton's method (or quasi-Newton methods such as BFGS) is more efficient.
  3. Nonlinear least squares: Gauss-Newton and its refinements (e.g. Levenberg-Marquardt) are the standard choice.

Robust estimation

Outliers

  • Inlier: obeys the model assumption
  • Outlier: differs significantly from the assumption

Outliers make MSE fail: the loss grows with the squared residual, so a few large outliers can dominate it. More robust alternatives include the L1 and Huber losses; a small comparison follows below.
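
For intuition, here is a comparison of the three losses on the same residuals; the Huber threshold delta = 1.0 is an illustrative choice.

import numpy as np

def huber(r, delta=1.0):
    # Quadratic for |r| <= delta, linear beyond: outliers get less weight
    small = np.abs(r) <= delta
    return np.where(small, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

r = np.array([0.1, 0.5, 5.0])  # the last residual is an outlier
print(0.5 * r**2)              # squared loss: the outlier contributes 12.5
print(np.abs(r))               # L1 loss: grows only linearly
print(huber(r))                # Huber: 0.005, 0.125, 4.5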

RANSAC

RANSAC (Random Sample Consensus) is a powerful method for handling outliers.

Key ideas

  • Inliers are distributed consistently with a common model, while outliers scatter widely
  • Minimal sets of sampled data points vote for candidate models

"Better to build an approximately correct model from a small amount of reliable data than a completely wrong model from all of the data."

Input: data point set P, model M, parameters
Output: best model parameters, inlier set

1. Initialization:
   best_model = None
   best_inliers = ∅
   best_score = 0
   iterations = 0
   max_iterations = N

2. while iterations < max_iterations:
   a. Random sampling: draw a minimal sample set S from P
      (the fewest points needed to fit M)
      - line fitting: 2 points
      - plane fitting: 3 points
      - homography: 4 points

   b. Model estimation: estimate model parameters M_temp from S

   c. Consensus set: for each point p_i in P
      compute the error e_i = distance(p_i, M_temp)
      if e_i < threshold t → add p_i to the inlier set C_temp

   d. Model evaluation:
      if |C_temp| > |best_inliers|:
          best_inliers = C_temp
          best_model = M_temp
          (optional) update the iteration count N

   e. iterations += 1

3. Re-estimate the model from best_inliers (e.g. least squares)
4. Return the final model and the inlier set
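
Following the steps above, a minimal RANSAC sketch for 2D line fitting; the threshold t, the iteration count, and the toy data are illustrative assumptions, and the final least-squares refit (step 3) is omitted for brevity.

import numpy as np

def ransac_line(points, t=0.1, max_iterations=200, seed=0):
    # Fit a line n·p + c = 0 (with ||n|| = 1) to 2D points using RANSAC
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.array([], dtype=int)
    for _ in range(max_iterations):
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]  # minimal sample: 2 points
        d = p2 - p1
        n = np.array([-d[1], d[0]])           # normal of the line through p1, p2
        if np.linalg.norm(n) == 0:
            continue                          # degenerate sample, draw again
        n = n / np.linalg.norm(n)
        c = -n @ p1
        e = np.abs(points @ n + c)            # point-to-line distances
        inliers = np.flatnonzero(e < t)       # consensus set C_temp
        if len(inliers) > len(best_inliers):  # keep the model with the most votes
            best_model, best_inliers = (n, c), inliers
    return best_model, best_inliers

# Example: 30 points on y = x plus two gross outliers
xs = np.linspace(0.0, 1.0, 30)
pts = np.vstack([np.column_stack([xs, xs]), [[0.2, 5.0], [0.8, -3.0]]])
model, inliers = ransac_line(pts)
print(len(inliers))  # 30: both outliers are rejected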

Overfitting and underfitting

  • Ill-posed problems: the solution is not unique.
    • E.g. \(\min_x||Ax-b||^2\) when #equations < #variables

Solving ill-posed problems:

  • L2 regularization (ridge): \(\min_x||Ax-b||^2+\lambda||x||_2^2\), which makes the solution unique for any \(\lambda>0\); a minimal sketch follows below

  • L1 regularization (lasso): \(\min_x||Ax-b||^2+\lambda||x||_1\), which additionally encourages sparse solutions
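
A small sketch of how L2 regularization restores uniqueness in an underdetermined system; the random data and lambda = 0.1 are illustrative assumptions.

import numpy as np

# Underdetermined system: 3 equations, 5 unknowns, so min ||Ax - b||^2 is ill-posed
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

# Ridge: minimize ||Ax - b||^2 + lam * ||x||^2
# Closed form x = (A^T A + lam I)^{-1} A^T b, unique for any lam > 0
lam = 0.1
x = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)
print(x)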
