Optimization
- Maximum Likelihood Estimation (MLE): maximize the likelihood to find the best \(x\)
- MSE = MLE under a Gaussian noise assumption: minimizing the mean squared error is equivalent to maximizing the likelihood when the noise is Gaussian
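As a quick check of this equivalence, assume observations \(b_i=f_i(x)+\epsilon_i\) with i.i.d. Gaussian noise \(\epsilon_i\sim\mathcal{N}(0,\sigma^2)\); then maximizing the likelihood reduces to minimizing the squared error:
\[
\hat{x}=\arg\max_x\prod_i\exp\!\left(-\frac{(b_i-f_i(x))^2}{2\sigma^2}\right)=\arg\min_x\sum_i\left(b_i-f_i(x)\right)^2
\]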
Numerical methods
```
x ← x0                             % initialization
while not converged:
    h ← descending_direction(x)    % determine the direction
    α ← descending_step(x, h)      % determine the step
    x ← x + αh                     % update the parameters
```
The task is now to find the best direction \(h\) and step size \(\alpha\).
Steepest descent method
Determine the direction from a first-order approximation; this is the gradient descent method.
- Taylor expansion at \(x_0\): \(F(x_0+\delta x)\approx F(x_0)+J_F\delta x\)
- When \(J_F\delta x<0\), the objective function decreases; the steepest choice is the negative gradient, \(h=-J_F^T\)
- Advantages:
  - Easy to implement
  - Performs well when far from the minimum
- Disadvantages:
  - Converges slowly when near the minimum
  - Wastes a lot of computation
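A sketch of how the generic loop above instantiates for gradient descent; the quadratic test function, fixed step size, and tolerance are illustrative assumptions, not part of the notes:

```python
import numpy as np

def gradient_descent(grad_F, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Steepest descent with a fixed step size alpha."""
    x = x0.copy()
    for _ in range(max_iter):
        h = -grad_F(x)               # direction: the negative gradient
        if np.linalg.norm(h) < tol:  # converged: gradient is (nearly) zero
            break
        x = x + alpha * h            # update the parameters
    return x

# Toy objective F(x) = ||x - c||^2, minimized at c
c = np.array([1.0, -2.0])
grad_F = lambda x: 2 * (x - c)
print(gradient_descent(grad_F, x0=np.zeros(2)))  # ≈ [1, -2]
```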
Newton's method
- Do a second-order Taylor expansion: \(F(x_k+\delta x)\approx F(x_k)+J_F\delta x+\frac{1}{2}\delta x^TH_F\delta x\)
- Find \(\delta x\) to minimize \(F(x_k+\delta x)\)
- The optimal direction (Newton step), obtained by setting the gradient of the quadratic model to zero: \(\delta x=-H_F^{-1}J_F^T\)
- Advantages:
  - Fast convergence near the minimum
- Disadvantages:
  - Computing (and inverting) the Hessian requires a lot of computation
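A minimal sketch of the Newton update, solving \(H_F\,\delta x=-J_F^T\) rather than explicitly inverting the Hessian; the quadratic test problem is an illustrative assumption (on a quadratic, Newton converges in a single step):

```python
import numpy as np

def newton(grad_F, hess_F, x0, tol=1e-10, max_iter=50):
    """Newton's method: dx solves H dx = -g at each step."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad_F(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess_F(x), -g)  # Newton step: -H^{-1} J^T
        x = x + dx
    return x

# Toy quadratic F(x) = 0.5 x^T A x - b^T x, minimizer x* = A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(newton(lambda x: A @ x - b, lambda x: A, x0=np.zeros(2)))
```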
Gauss-Newton method
- Useful for solving nonlinear least squares \(\min_x\|R(x)\|^2\): linearizing the residual gives the Hessian approximation \(H_F\approx 2J_R^TJ_R\) and the update \(\delta x=-(J_R^TJ_R)^{-1}J_R^TR\)
- Advantages:
  - No need to compute the Hessian
  - Fast convergence
- Disadvantages:
  - If \(J_R^TJ_R\) is singular (or near-singular), the algorithm becomes unstable
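A sketch of Gauss-Newton on a small nonlinear least-squares problem; the exponential-decay model, data, and starting point are illustrative assumptions:

```python
import numpy as np

def gauss_newton(residual, jac, x0, tol=1e-10, max_iter=50):
    """Minimize ||R(x)||^2 with the Hessian approximation J_R^T J_R."""
    x = x0.copy()
    for _ in range(max_iter):
        R, J = residual(x), jac(x)
        # Normal equations (J^T J) dx = -J^T R; unstable if J^T J is singular
        dx = np.linalg.solve(J.T @ J, -J.T @ R)
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Toy problem: fit y = a * exp(-k t), parameters x = (a, k)
t = np.linspace(0.0, 4.0, 30)
y = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.default_rng(0).standard_normal(t.size)
residual = lambda x: x[0] * np.exp(-x[1] * t) - y
def jac(x):
    e = np.exp(-x[1] * t)
    return np.stack([e, -x[0] * t * e], axis=1)  # columns: dR/da, dR/dk
print(gauss_newton(residual, jac, x0=np.array([1.0, 1.0])))  # ≈ (2.0, 1.5)
```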
When to use which method
- Large-scale problems: prefer variants of steepest descent (e.g., stochastic gradient descent, conjugate gradient) or other first-order methods such as Adam.
- Small- to medium-scale convex optimization: if the Hessian is easy to compute and positive definite, Newton's method (or quasi-Newton methods such as BFGS) is more efficient.
- Nonlinear least squares: Gauss-Newton and its refinements (e.g., Levenberg-Marquardt) are the standard choice.
Robust estimation
Outliers
- Inlier: obeys the model assumption
- Outlier: differs significantly from the assumption
Outliers make MSE fail: MSE is proportional to the squared residual, so a few large outliers dominate the loss. We can use other loss functions such as L1 or the Huber loss instead.
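A small sketch comparing the squared loss with the Huber loss; the threshold `delta` and the residual values are illustrative choices:

```python
import numpy as np

def huber(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond: outliers grow slowly."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

r = np.array([0.1, 0.5, 1.0, 10.0])  # the last residual is an outlier
print(r**2)      # squared loss: the outlier contributes 100.0
print(huber(r))  # Huber loss: the outlier contributes only 9.5
```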
RANSAC
RANSAC (Random Sample Consensus): a powerful method for handling outliers
Key ideas
- Inliers follow a consistent distribution, while outliers differ a lot
- Use randomly sampled minimal sets of data points to vote for candidate models
"Better to build an approximately correct model from a small amount of reliable data than a completely wrong model from all the data."
Input: data point set P, model M, parameters (error threshold t, iteration count N)
Output: best model parameters and the inlier set
1. Initialization:
   best_model = None
   best_inliers = ∅
   best_score = 0
   iterations = 0
   max_iterations = N
2. while iterations < max_iterations:
   a. Random sampling: draw a minimal sample set S from P (the fewest points needed to fit M)
      - Line fitting: 2 points
      - Plane fitting: 3 points
      - Homography: 4 points
   b. Model estimation: estimate the model parameters M_temp from S
   c. Consensus set identification: for each point p_i in P,
      compute the error e_i = distance(p_i, M_temp);
      if e_i < threshold t → add p_i to the inlier set C_temp
   d. Model evaluation:
      if |C_temp| > |best_inliers|:
         best_inliers = C_temp
         best_model = M_temp
         (optional) update the iteration count N
   e. iterations += 1
3. Re-estimate the model from best_inliers (e.g., by least squares)
4. Return the final model and the inlier set
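A runnable sketch of the procedure above for 2D line fitting; the data generation, threshold, and iteration count are illustrative assumptions:

```python
import numpy as np

def ransac_line(points, t=0.1, max_iterations=200, seed=0):
    """Fit y = m x + c by RANSAC (minimal sample for a line: 2 points)."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    for _ in range(max_iterations):
        # a. random minimal sample: two points define a line
        (x1, y1), (x2, y2) = points[rng.choice(len(points), 2, replace=False)]
        if x1 == x2:
            continue  # degenerate sample, skip
        # b. estimate the candidate model M_temp
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # c. consensus set: perpendicular distance below threshold t
        e = np.abs(points[:, 1] - m * points[:, 0] - c) / np.sqrt(1 + m**2)
        inliers = e < t
        # d. keep the model with the largest consensus set
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (m, c), inliers
    # 3. re-estimate from best_inliers by least squares
    m, c = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return (m, c), best_inliers

# Toy data: y = 2x + 1 with small noise, plus 20% gross outliers
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2 * x + 1 + 0.03 * rng.standard_normal(100)
y[:20] += rng.uniform(5, 20, 20)
model, inliers = ransac_line(np.column_stack([x, y]))
print(model)  # ≈ (2.0, 1.0)
```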
Overfitting and underfitting
- Ill-posed problems: the solution is not unique.
- E.g. \(\min_x\|Ax-b\|^2\) when #equations < #variables: the system is underdetermined, so infinitely many \(x\) attain the minimum
Solving ill-posed problems: add a regularization term.
- L2 regularization (ridge): \(\min_x\|Ax-b\|^2+\lambda\|x\|_2^2\)
- L1 regularization (LASSO): \(\min_x\|Ax-b\|^2+\lambda\|x\|_1\), which also promotes sparse solutions
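A minimal sketch of the L2-regularized problem solved in closed form via the normal equations; the matrix sizes and \(\lambda\) are illustrative:

```python
import numpy as np

def ridge(A, b, lam=1e-2):
    """Solve min_x ||Ax - b||^2 + lam * ||x||_2^2 in closed form."""
    n = A.shape[1]
    # (A^T A + lam I) is positive definite for lam > 0, so the solution
    # is unique even when A has fewer rows than columns (ill-posed case).
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Underdetermined toy problem: 3 equations, 5 unknowns
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
x = ridge(A, b)
print(np.linalg.norm(A @ x - b))  # small residual, unique solution
```

Unlike L2, the L1-regularized problem has no closed-form solution; it is typically solved with iterative methods such as proximal gradient (ISTA) or coordinate descent.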