Your call to curve_fit is incorrect. From the documentation:
xdata : An M-length sequence or an (k,M)-shaped array for functions with k predictors.
The independent variable where the data is measured.
ydata : M-length sequence
The dependent data — nominally f(xdata, ...)
In this case your independent variables xdata are the columns A to D, i.e. table[['A', 'B', 'C', 'D']], and your dependent variable ydata is table['Z_real'].
Also note that xdata should be a (k, M) array, where k is the number of predictor variables (i.e. columns) and M is the number of observations (i.e. rows). You should therefore transpose your input dataframe so that it is (4, M) rather than (M, 4), i.e. table[['A', 'B', 'C', 'D']].T.
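A quick way to check the orientation is to print the shapes before and after transposing. The sketch below uses a small made-up table (5 rows of placeholder values) that only borrows the column names from the question; it is not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's table: 5 observations, 4 predictors
# plus the dependent column. The values are placeholders.
table = pd.DataFrame(np.arange(25.0).reshape(5, 5),
                     columns=['A', 'B', 'C', 'D', 'Z_real'])

xdata = table[['A', 'B', 'C', 'D']]
print(xdata.shape)    # (5, 4): one row per observation, i.e. (M, k)
print(xdata.T.shape)  # (4, 5): one row per predictor, i.e. (k, M)
```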
The whole call to curve_fit might look something like this:
curve_fit(func, table[['A', 'B', 'C', 'D']].T, table['Z_real'])
Here's a complete example showing multiple linear regression:
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
X = np.random.randn(100, 4) # independent variables
m = np.random.randn(4) # known coefficients
y = X.dot(m) # dependent variable
df = pd.DataFrame(np.hstack((X, y[:, None])),
                  columns=['A', 'B', 'C', 'D', 'Z_real'])
def func(X, *params):
    return np.hstack(params).dot(X)
popt, pcov = curve_fit(func, df[['A', 'B', 'C', 'D']].T, df['Z_real'],
                       p0=np.random.randn(4))
print(np.allclose(popt, m))
# True
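If your model is not a plain dot product, the same (k, M) layout lets you unpack the predictors by row inside the function. The following is only a sketch: the model form, the parameter names a, b, c and the synthetic data are all assumptions made for illustration, and plain NumPy arrays are used in place of a dataframe.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 200))         # already (k, M): 4 predictors, 200 observations
true_params = np.array([1.5, -2.0, 0.5])  # made-up coefficients

def model(X, a, b, c):
    # Each row of X is one predictor, so unpacking yields four length-M arrays.
    A, B, C, D = X
    return a * A + b * B + c * C * D      # hypothetical model, linear in a, b, c

z = model(X, *true_params)                # noise-free dependent variable

popt, pcov = curve_fit(model, X, z, p0=np.ones(3))
print(popt)
```

With a dataframe you would pass table[['A', 'B', 'C', 'D']].T.to_numpy() as xdata so that the row unpacking works the same way.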