Optimization

  • Maximum Likelihood Estimation (MLE): maximize the likelihood to find the best x
  • MSE = MLE under a Gaussian noise assumption (a short derivation follows below)
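
To see the second point, assume observations \(y_i=f_x(t_i)+\epsilon_i\) with i.i.d. noise \(\epsilon_i\sim\mathcal{N}(0,\sigma^2)\) (this notation is an illustrative assumption, not from the notes). Then

\[ \log L(x)=\sum_i\log\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_i-f_x(t_i))^2}{2\sigma^2}\right)=-\frac{1}{2\sigma^2}\sum_i\left(y_i-f_x(t_i)\right)^2+\text{const}, \]

so maximizing the likelihood is equivalent to minimizing the sum of squared residuals, i.e. the MSE.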

Numerical methods

x ← x0                           % initialization
while not converged:
    h ← descending_direction(x)  % determine the direction
    α ← descending_step(x, h)    % determine the step
    x ← x + αh                   % update the parameters

The remaining task is to choose the best direction \(h\) and step size \(\alpha\).

Steepest descent method

Determine the direction from a first-order approximation; this is the gradient descent method.

  • Taylor expansion at \(x_0\): \(F(x_0+\delta x)\approx F(x_0)+J_F\delta x\)
  • When \(J_F\delta x<0\), the objective function decreases; choosing \(\delta x\propto-J_F^T\) (the negative gradient) makes this inner product as negative as possible, giving the steepest descent direction

  • Advantages
    • easy to implement
    • performs well far from the minimum
  • Disadvantages
    • converges slowly near the minimum
    • wastes computation on many small steps
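
As a concrete illustration, here is a minimal gradient descent sketch in Python; the quadratic test function, the fixed step size alpha, and the stopping rule are illustrative assumptions, not part of the notes.

import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    # Minimize F by stepping along the descending direction h = -J_F^T
    x = x0.astype(float)
    for _ in range(max_iter):
        h = -grad(x)                 # steepest descent direction
        if np.linalg.norm(h) < tol:  # converged: gradient is (almost) zero
            break
        x = x + alpha * h            # fixed step size, for simplicity
    return x

# Example: F(x) = ||x - c||^2 with gradient 2(x - c); the minimum is at x = c
c = np.array([1.0, -2.0])
print(gradient_descent(lambda x: 2 * (x - c), np.zeros(2)))  # ≈ [1, -2]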

Newton's method

  • Use a second-order Taylor expansion
\[ F(x_k+\delta x)\approx F(x_k)+J_F\delta x+\frac{1}{2}\delta x^TH_F\delta x \]
  • Find the \(\delta x\) that minimizes this quadratic model; setting its gradient to zero gives
\[ H_F\delta x+J_F^T=0 \]
  • The optimal direction (the Newton step): \(\delta x=-H_F^{-1}J_F^T\)

  • Advantages
    • fast convergence near the minimum
  • Disadvantages
    • computing (and inverting) the Hessian is expensive
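
A minimal sketch of the Newton iteration, assuming the gradient and Hessian are available as functions; the test function is an illustrative choice.

import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    # Newton's method: solve H_F dx = -J_F^T at every step
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), -g)  # Newton step, no explicit inverse
        x = x + dx
    return x

# Example: F(x) = x_0^4 + x_1^2, minimum at the origin
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton(grad, hess, np.array([1.0, 1.0])))  # ≈ [0, 0]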

Gauss-Newton method

  • Useful for solving nonlinear least squares, \(F(x)=\frac{1}{2}||R(x)||^2\)
  • Approximates the Hessian by \(H_F\approx J_R^TJ_R\), giving the step \(\delta x=-(J_R^TJ_R)^{-1}J_R^TR\) (see the sketch after this list)
  • Advantages
    • no need to compute the Hessian
    • fast convergence
  • Disadvantages
    • if \(J_R^TJ_R\) is singular, the algorithm becomes unstable
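
A minimal Gauss-Newton sketch for a nonlinear least-squares fit; the exponential model, data, and starting point are illustrative assumptions.

import numpy as np

def gauss_newton(residual, jac, x0, tol=1e-10, max_iter=50):
    # Gauss-Newton: approximate H_F by J_R^T J_R and solve for the step
    x = x0.astype(float)
    for _ in range(max_iter):
        r, J = residual(x), jac(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)  # unstable if J^T J is singular
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example: fit y = a * exp(b * t) to noiseless data; unknowns x = [a, b]
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
residual = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t),              # dr/da
                                 x[0] * t * np.exp(x[1] * t)])  # dr/db
print(gauss_newton(residual, jac, np.array([1.0, -1.0])))  # ≈ [2.0, -1.5]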

When to use which method

  1. Large-scale problems: prefer variants of steepest descent (e.g. stochastic gradient descent, conjugate gradient) or first-order methods such as Adam.
  2. Small- to medium-scale convex problems: if the Hessian is easy to compute and positive definite, Newton's method (or quasi-Newton methods such as BFGS) is more efficient.
  3. Nonlinear least squares: Gauss-Newton and its refinements (e.g. Levenberg-Marquardt) are the standard choice.

Robust estimation

Outliers

  • Inlier: obeys the model assumption
  • Outlier: differs significantly from the assumption

Outliers make MSE fail: the loss grows with the squared residual, so a few large outliers can dominate it. More robust alternatives include the L1 and Huber losses; a small comparison follows below.
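
For intuition, here is a comparison of the three losses on the same residuals; the Huber threshold delta = 1.0 is an illustrative choice.

import numpy as np

def huber(r, delta=1.0):
    # Quadratic for |r| <= delta, linear beyond: outliers get less weight
    small = np.abs(r) <= delta
    return np.where(small, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

r = np.array([0.1, 0.5, 5.0])  # the last residual is an outlier
print(0.5 * r**2)              # squared loss: the outlier contributes 12.5
print(np.abs(r))               # L1 loss: grows only linearly
print(huber(r))                # Huber: 0.005, 0.125, 4.5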

RANSAC

RANSAC (Random Sample Consensus) is a powerful method for handling outliers.

Key ideas

  • Inliers are distributed consistently with a common model, while outliers scatter widely
  • Minimal sets of sampled data points vote for candidate models

"Better to build an approximately correct model from a small amount of reliable data than a completely wrong model from all of the data."

Input: data point set P, model M, parameters
Output: best model parameters, inlier set

1. Initialization:
   best_model = None
   best_inliers = ∅
   best_score = 0
   iterations = 0
   max_iterations = N

2. while iterations < max_iterations:
   a. Random sampling: draw a minimal sample set S from P
      (the fewest points needed to fit M)
      - line fitting: 2 points
      - plane fitting: 3 points
      - homography: 4 points

   b. Model estimation: estimate model parameters M_temp from S

   c. Consensus set: for each point p_i in P
      compute the error e_i = distance(p_i, M_temp)
      if e_i < threshold t → add p_i to the inlier set C_temp

   d. Model evaluation:
      if |C_temp| > |best_inliers|:
          best_inliers = C_temp
          best_model = M_temp
          (optional) update the iteration count N

   e. iterations += 1

3. Re-estimate the model from best_inliers (e.g. least squares)
4. Return the final model and the inlier set
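
Following the steps above, a minimal RANSAC sketch for 2D line fitting; the threshold t, the iteration count, and the toy data are illustrative assumptions, and the final least-squares refit (step 3) is omitted for brevity.

import numpy as np

def ransac_line(points, t=0.1, max_iterations=200, seed=0):
    # Fit a line n·p + c = 0 (with ||n|| = 1) to 2D points using RANSAC
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.array([], dtype=int)
    for _ in range(max_iterations):
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]  # minimal sample: 2 points
        d = p2 - p1
        n = np.array([-d[1], d[0]])           # normal of the line through p1, p2
        if np.linalg.norm(n) == 0:
            continue                          # degenerate sample, draw again
        n = n / np.linalg.norm(n)
        c = -n @ p1
        e = np.abs(points @ n + c)            # point-to-line distances
        inliers = np.flatnonzero(e < t)       # consensus set C_temp
        if len(inliers) > len(best_inliers):  # keep the model with the most votes
            best_model, best_inliers = (n, c), inliers
    return best_model, best_inliers

# Example: 30 points on y = x plus two gross outliers
xs = np.linspace(0.0, 1.0, 30)
pts = np.vstack([np.column_stack([xs, xs]), [[0.2, 5.0], [0.8, -3.0]]])
model, inliers = ransac_line(pts)
print(len(inliers))  # 30: both outliers are rejected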

Overfitting and underfitting

  • Ill-posed problems: the solution is not unique.
    • E.g. \(\min_x||Ax-b||^2\) when #equations < #variables

Solving ill-posed problems:

  • L2 regularization (ridge): \(\min_x||Ax-b||^2+\lambda||x||_2^2\), which makes the solution unique for any \(\lambda>0\); a minimal sketch follows below

  • L1 regularization (lasso): \(\min_x||Ax-b||^2+\lambda||x||_1\), which additionally encourages sparse solutions
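
A small sketch of how L2 regularization restores uniqueness in an underdetermined system; the random data and lambda = 0.1 are illustrative assumptions.

import numpy as np

# Underdetermined system: 3 equations, 5 unknowns, so min ||Ax - b||^2 is ill-posed
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

# Ridge: minimize ||Ax - b||^2 + lam * ||x||^2
# Closed form x = (A^T A + lam I)^{-1} A^T b, unique for any lam > 0
lam = 0.1
x = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)
print(x)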
