I'm in the early stages of learning SkFlow/TensorFlow, so I'll lay out my understanding of what I'm trying to do, incorrect as it may be.
Let's imagine I'm trying to build a model to predict if a car will fail an emissions test.
My training and testing CSV might look something like this:
make, fuel, year, mileage, days since service, passed test
vw, diesel, 2015, 10000, 20, 0
honda, petrol, 2008, 1000000, 234, 1
So the pass/fail column would be my y, the others my x.
So far, with Baltimore's help in my previous SO question, I'm able to process the Iris dataset from a CSV file. That dataset is all numbers, however.
This example on the TensorFlow website shows a model built from census data, using both categorical and continuous features. I'm trying to use SkFlow, as I understand it simplifies the process.
Anyway, on to my code:
from numpy import genfromtxt
import tensorflow as tf

x_train = genfromtxt('/Users/ben/Desktop/data.csv', dtype=None, delimiter=',', usecols=(0, 1, 2, 3, 4))
y_train = genfromtxt('/Users/ben/Desktop/data.csv', dtype='int', delimiter=',', usecols=(5,))
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=1)]
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
hidden_units=[10, 20, 10],
n_classes=2,
model_dir="./tmp/model1")
# Fit model. Add your train data here
classifier.fit(x=x_train, y=y_train, steps=2000)
So my CSV data is reading fine into my x_train and y_train objects. The CSV has no headers, but could have them if required.
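Looking at the census example again, I suspect genfromtxt with dtype=None isn't quite what I want here, since it returns a single structured array of mixed types. My guess is I should read each column separately (pandas seems like the easy route) and feed them in via an input_fn, roughly the way the tutorial does. Here's a rough sketch of my understanding — the column names are just my own labels for the headerless CSV, and I'm not sure I've got the SparseTensor construction right:

import pandas as pd
import tensorflow as tf

# My own labels for the headerless CSV columns
COLUMNS = ["make", "fuel", "year", "mileage", "days_since_service", "passed"]
df = pd.read_csv('/Users/ben/Desktop/data.csv', names=COLUMNS)

def input_fn(df):
    # Continuous columns become dense constant Tensors
    continuous_cols = {k: tf.constant(df[k].values, dtype=tf.float32)
                       for k in ["year", "mileage", "days_since_service"]}
    # Categorical columns become SparseTensors of strings
    # (shape= here; I believe newer TF versions call it dense_shape=)
    categorical_cols = {k: tf.SparseTensor(
                            indices=[[i, 0] for i in range(df[k].size)],
                            values=df[k].values,
                            shape=[df[k].size, 1])
                        for k in ["make", "fuel"]}
    feature_cols = dict(continuous_cols)
    feature_cols.update(categorical_cols)
    label = tf.constant(df["passed"].values)
    return feature_cols, label

# If that's right, I'd then fit with the input_fn instead of x/y arrays:
# classifier.fit(input_fn=lambda: input_fn(df), steps=2000)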
I believe I need to define which columns hold which kind of data, something like this:
make = tf.contrib.layers.sparse_column_with_hash_bucket("make", hash_bucket_size=1000)
fuel = tf.contrib.layers.sparse_column_with_keys(column_name="fuel", keys=["diesel", "petrol"])
How do I build the feature_columns object that gets passed into the classifier?
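My best guess, piecing things together from the census example, is something like the sketch below. I believe the sparse columns can't be handed to DNNClassifier directly and need to be wrapped in something dense like embedding_column (the dimension values are arbitrary guesses on my part), but I'd appreciate confirmation:

import tensorflow as tf

# Categorical columns: hashed for make, explicit keys for fuel
make = tf.contrib.layers.sparse_column_with_hash_bucket("make", hash_bucket_size=1000)
fuel = tf.contrib.layers.sparse_column_with_keys(column_name="fuel", keys=["diesel", "petrol"])

# Continuous columns (names must match the keys fed in by the input_fn)
year = tf.contrib.layers.real_valued_column("year")
mileage = tf.contrib.layers.real_valued_column("mileage")
days_since_service = tf.contrib.layers.real_valued_column("days_since_service")

# My understanding: DNNClassifier needs dense inputs, so the sparse
# columns get wrapped in embeddings (dimensions are guesses)
feature_columns = [
    tf.contrib.layers.embedding_column(make, dimension=8),
    tf.contrib.layers.embedding_column(fuel, dimension=2),
    year,
    mileage,
    days_since_service,
]

classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=2,
                                            model_dir="./tmp/model1")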