numpy - Outliers treatment in nonlinear Least squares (scipy) for a 5PL curve

Question

I need to fit a 5PL curve to some data points I have. 5PL (five-parameter logistic) is an asymmetric logistic function often used in bioassay analysis. Its formula is as follows:

F(x) = D+(A-D)/((1+(x/C)^B)^E)
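As a minimal sketch, the formula above can be written as a vectorized NumPy function. The names `five_pl` and `midpoint_x` are illustrative, not from the question, and the code assumes strictly positive concentrations x and C, since (x/C)^B is undefined for negative bases when B is not an integer. One detail worth keeping in mind is that the response sits exactly halfway between A and D at x = C only when E = 1. For general E, solving (1+(x/C)^B)^E = 2 gives x = C(2^(1/E) - 1)^(1/B), which `midpoint_x` computes.

```python
import numpy as np


def five_pl(x, A, B, C, D, E):
    """5PL model: F(x) = D + (A - D) / (1 + (x / C)**B)**E.

    Assumes x > 0 and C > 0 (concentrations).
    """
    x = np.asarray(x, dtype=float)
    return D + (A - D) / (1.0 + (x / C) ** B) ** E


def midpoint_x(B, C, E):
    """Concentration at which F(x) = (A + D) / 2.

    Solves (1 + (x / C)**B)**E = 2; the result equals C only when E == 1.
    """
    return C * (2.0 ** (1.0 / E) - 1.0) ** (1.0 / B)


# With E = 1 the curve passes through the mid response exactly at x = C.
print(five_pl(2.0, 0.0, 1.5, 2.0, 10.0, 1.0))  # prints 5.0
# With E != 1 the mid response is reached at a concentration other than C.
print(midpoint_x(1.5, 2.0, 2.0))
```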

I was able to obtain a fit using SciPy in Python (duh). First, I use knowledge of my data to determine the starting parameters for the function:

A is the lower asymptote, so I guess it with min(y).
B is the Hill slope, so I guess it with the slope of the line between the first and last points.
C is the inflection point (the analyte concentration that gives half of the maximum response), so I guess it by finding the concentration whose response is nearest to the mid response.
D is the upper asymptote, so I guess it with max(y).
E is the asymmetry factor, so I start with no asymmetry (E=1).
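The heuristics above can be sketched as follows. This is a literal transcription of the listed rules, assuming x holds positive concentrations with at least two distinct values; `initial_guess` is a hypothetical helper name. Note that the slope between the first and last points is in response-per-concentration units, so it may need rescaling before it behaves like a Hill slope.

```python
import numpy as np


def initial_guess(x, y):
    """Starting values (A, B, C, D, E) for a 5PL fit, following the rules above.

    Assumes x > 0 (concentrations) and at least two distinct x values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)  # sort so "first" and "last" refer to concentration
    x, y = x[order], y[order]

    A0 = y.min()                              # lower asymptote
    D0 = y.max()                              # upper asymptote
    B0 = (y[-1] - y[0]) / (x[-1] - x[0])      # slope of line from first to last point
    mid = 0.5 * (A0 + D0)                     # mid response
    C0 = x[np.argmin(np.abs(y - mid))]        # concentration whose response is nearest mid
    E0 = 1.0                                  # no asymmetry to start with
    return np.array([A0, B0, C0, D0, E0])


p0 = initial_guess([1, 2, 4, 8, 16], [1, 2, 5, 8, 9])
print(p0)
```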

With these starting values I then call res = least_squares(residuals, p0, bounds=bnd, args=(x, y)), where residuals is the function that computes the residuals between my data and the 5PL function, p0 contains my initial parameters, bnd holds the bounds of the problem, and args are the extra arguments passed to residuals (my data).
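A self-contained sketch of that call follows. Because the question's data are not available, it uses synthetic data generated from assumed parameter values (A=100, B=1.5, C=10, D=2000, E=0.8) plus Gaussian noise with SD 30; the starting point and the bounds in `bnd` are likewise illustrative assumptions. `bnd` is passed as a `(lower, upper)` pair, a form `least_squares` accepts.

```python
import numpy as np
from scipy.optimize import least_squares


def five_pl(x, A, B, C, D, E):
    return D + (A - D) / (1.0 + (x / C) ** B) ** E


def residuals(p, x, y):
    """Model minus data; least_squares minimizes 0.5 * sum(residuals**2)."""
    return five_pl(x, *p) - y


# Synthetic data (assumed "true" parameters, not the question's data).
rng = np.random.default_rng(0)
x = np.logspace(-1, 3, 15)
true_p = [100.0, 1.5, 10.0, 2000.0, 0.8]
y = five_pl(x, *true_p) + rng.normal(0.0, 30.0, x.size)

p0 = [y.min(), 1.0, np.median(x), y.max(), 1.0]  # rough starting point
# Illustrative bounds: keep B, C and E positive so (x / C)**B stays defined.
bnd = ([-np.inf, 0.1, 1e-3, -np.inf, 0.1],
       [np.inf, 10.0, 1e4, np.inf, 10.0])

res = least_squares(residuals, p0, bounds=bnd, args=(x, y))
print(res.x)
```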

The result is acceptable, but I suspect my measurements contain strong outliers and I would like a more robust fit. I found that you can do this by adding a robust loss function to the nonlinear least-squares problem (as explained here).

The call then becomes res_loss = least_squares(residuals, p0, bounds=bnd, loss='soft_l1', f_scale=1000, args=(x, y)), where loss='soft_l1' selects the loss function and f_scale sets the soft threshold between inlier and outlier residuals.
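To make the role of f_scale concrete: the SciPy documentation defines soft_l1 as rho(z) = 2*((1 + z)**0.5 - 1) and evaluates it as f_scale**2 * rho(f**2 / f_scale**2), with the total cost being half the sum over residuals. The sketch below reimplements that per-residual term with NumPy (`soft_l1_cost` is a hypothetical helper name). Residuals much smaller than f_scale are penalized almost exactly like ordinary least squares (0.5*f**2), while residuals much larger than f_scale are penalized roughly linearly (about f_scale*|f|). So f_scale is a soft crossover measured in the same units as the residuals, not a hard cutoff.

```python
import numpy as np


def soft_l1_cost(f, f_scale):
    """Per-residual cost term for least_squares(loss='soft_l1').

    cost_i = 0.5 * f_scale**2 * rho(f_i**2 / f_scale**2),
    rho(z) = 2 * (sqrt(1 + z) - 1).
    """
    z = (np.asarray(f, dtype=float) / f_scale) ** 2
    rho = 2.0 * (np.sqrt(1.0 + z) - 1.0)
    return 0.5 * f_scale ** 2 * rho


f_scale = 1000.0  # value used in the question
for f in [10.0, 100.0, 1000.0, 10000.0, 100000.0]:
    plain = 0.5 * f ** 2               # ordinary least-squares term
    robust = soft_l1_cost(f, f_scale)  # soft_l1 term
    print(f"residual={f:>9.0f}  plain={plain:.3g}  soft_l1={robust:.3g}")
```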

In every example I could find, people generate data from the curve they want to fit and add noise to it. They can then set f_scale equal to the noise level they introduced.
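The kind of synthetic experiment described above can be sketched like this. Data are drawn from an assumed 5PL curve with Gaussian noise of known SD (30 here), and a few outliers (+800 at three arbitrary indices) are injected. The data are then fitted once with the default linear loss and once with loss='soft_l1', with f_scale set to the known noise SD. All numbers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares


def five_pl(x, A, B, C, D, E):
    return D + (A - D) / (1.0 + (x / C) ** B) ** E


def residuals(p, x, y):
    return five_pl(x, *p) - y


rng = np.random.default_rng(1)
noise_sd = 30.0  # known only because we simulate it
x = np.logspace(-1, 3, 20)
true_p = [100.0, 1.5, 10.0, 2000.0, 0.8]
y = five_pl(x, *true_p) + rng.normal(0.0, noise_sd, x.size)
outlier_idx = [5, 11, 16]
y[outlier_idx] += 800.0  # inject gross outliers

p0 = [y.min(), 1.0, np.median(x), y.max(), 1.0]
bnd = ([-np.inf, 0.1, 1e-3, -np.inf, 0.1],
       [np.inf, 10.0, 1e4, np.inf, 10.0])

res_lin = least_squares(residuals, p0, bounds=bnd, args=(x, y))
res_rob = least_squares(residuals, p0, bounds=bnd, loss="soft_l1",
                        f_scale=noise_sd, args=(x, y))
print("true    :", true_p)
print("linear  :", np.round(res_lin.x, 2))
print("soft_l1 :", np.round(res_rob.x, 2))
```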

That is nice and all, but how should f_scale be chosen when there is no prior knowledge of how large the noise or the outliers are? Is there a way to determine it automatically for each dataset from the spread of the data?
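One common heuristic, borrowed from robust M-estimation and not built into least_squares, is to measure the spread of the residuals rather than of the raw y values (which is dominated by the signal itself). It uses the median absolute deviation (MAD), which a minority of outliers barely affects, and iterates: fit, set f_scale from the MAD of the residuals, refit. The sketch below assumes roughly Gaussian inlier noise (hence the 1.4826 consistency factor) and uses illustrative synthetic data; `mad_scale` and `fit_auto_f_scale` are hypothetical names.

```python
import numpy as np
from scipy.optimize import least_squares


def five_pl(x, A, B, C, D, E):
    return D + (A - D) / (1.0 + (x / C) ** B) ** E


def residuals(p, x, y):
    return five_pl(x, *p) - y


def mad_scale(r):
    """1.4826 * MAD: estimates the SD of Gaussian noise, robust to outliers."""
    r = np.asarray(r, dtype=float)
    return 1.4826 * np.median(np.abs(r - np.median(r)))


def fit_auto_f_scale(x, y, p0, bounds, n_iter=5):
    """Alternate between estimating f_scale from residuals and refitting."""
    res = least_squares(residuals, p0, bounds=bounds, args=(x, y))  # plain LS first
    f_scale = None
    for _ in range(n_iter):
        # f_scale must be positive; guard against a degenerate zero MAD.
        f_scale = max(mad_scale(res.fun), 1e-12)
        # Restart from p0 each time so a poor early fit cannot trap later ones.
        res = least_squares(residuals, p0, bounds=bounds, loss="soft_l1",
                            f_scale=f_scale, args=(x, y))
    return res, f_scale


# Illustrative synthetic data: noise SD 30 plus three +800 outliers.
rng = np.random.default_rng(1)
x = np.logspace(-1, 3, 20)
y = five_pl(x, 100.0, 1.5, 10.0, 2000.0, 0.8) + rng.normal(0.0, 30.0, x.size)
y[[5, 11, 16]] += 800.0

p0 = [y.min(), 1.0, np.median(x), y.max(), 1.0]
bnd = ([-np.inf, 0.1, 1e-3, -np.inf, 0.1],
       [np.inf, 10.0, 1e4, np.inf, 10.0])
res, f_scale = fit_auto_f_scale(x, y, p0, bnd)
print("estimated f_scale:", f_scale)
print("parameters:", np.round(res.x, 2))
```

Because the inlier residuals end up with a spread of about f_scale, points several f_scale away from the curve fall into the down-weighted, roughly linear part of the loss.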

If my problem were linear, I would use the SD of the data at each X to build a weight matrix and solve a weighted least-squares problem. Is there a similar method for nonlinear problems?
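The nonlinear analogue follows the same idea: divide each residual by the standard deviation of its point, so that least_squares minimizes sum(((F(x_i) - y_i) / s_i)**2). This is the quantity scipy.optimize.curve_fit minimizes when given a 1-D sigma. The sketch below assumes triplicate measurements at each concentration (synthetic, with noise that grows with the signal) and uses the per-concentration sample SD as s_i; `weighted_residuals` is a hypothetical name. With only a few replicates per X, the sample SD is itself very noisy, so smoothing or flooring it may be worthwhile in practice.

```python
import numpy as np
from scipy.optimize import least_squares


def five_pl(x, A, B, C, D, E):
    return D + (A - D) / (1.0 + (x / C) ** B) ** E


def weighted_residuals(p, x, y, sigma):
    """Residuals in units of each point's SD (nonlinear weighted least squares)."""
    return (five_pl(x, *p) - y) / sigma


# Synthetic triplicates; the noise SD grows with the signal (an assumption).
rng = np.random.default_rng(2)
levels = np.logspace(-1, 3, 10)
x = np.repeat(levels, 3)
true_p = [100.0, 1.5, 10.0, 2000.0, 0.8]
y_true = five_pl(x, *true_p)
y = y_true + rng.normal(0.0, 5.0 + 0.05 * y_true)

# Sample SD of the replicates at each concentration, mapped back to every point.
sd_per_level = np.array([y[x == lv].std(ddof=1) for lv in levels])
sigma = np.repeat(sd_per_level, 3)

p0 = [y.min(), 1.0, np.median(x), y.max(), 1.0]
bnd = ([-np.inf, 0.1, 1e-3, -np.inf, 0.1],
       [np.inf, 10.0, 1e4, np.inf, 10.0])
res = least_squares(weighted_residuals, p0, bounds=bnd, args=(x, y, sigma))
print(np.round(res.x, 2))
```

Once residuals are scaled this way they are dimensionless and, for inliers, roughly standard normal. Combining the weighting with loss='soft_l1' therefore makes an f_scale of around 1 (or a little larger) a natural, dataset-independent starting choice, though that is a heuristic rather than a rule.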

Thanks in advance
