ML: Embedding a trained model in a website (Titanic pt.4)
Once we are satisfied with our trained model, we might want to embed it somewhere, perhaps to show it off online. We are going to embed our Titanic model in a website with a simple form so that visitors can get an estimate of their chances of surviving a virtual Titanic trip.
This article is part of the Titanic series - a short series on basic ML concepts based on the famous Titanic Kaggle challenge.
tl;dr
We are embedding a trained model in a website - see the full source code for all the details.
Deployed website - https://would-you-survive-titanic.herokuapp.com
the basics
We will use Flask to build a simple website. I will assume you know how to use, run and deploy it, so I won't describe that part in detail.
the model
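The main task is getting our trained model ready for reuse. We will use the pickle module from the Python standard library to serialize it:

```python
import pickle

# pickle.dump expects an open binary file object, not a file name.
with open("classifier.model", "wb") as f:
    pickle.dump(classifier, f)
```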
To reuse the model in our web app we will simply load it and use it:
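```python
# Open the pickled file in binary mode and restore the classifier.
with open("classifier.model", "rb") as f:
    model = pickle.load(f)
...
probability = model.predict_proba(form_data)
```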
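As a quick sanity check, the save/load round trip can be sketched with any picklable object. The dictionary below is only a stand-in for the trained classifier, and the temporary directory is an assumption made so the sketch leaves no files behind.

```python
import os
import pickle
import tempfile

# Stand-in for the trained classifier (hypothetical; any picklable object works).
stand_in_model = {"weights": [0.4, -1.2, 0.7], "bias": 0.1}

# Work in a temporary directory that is removed when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "classifier.model")

    # Pickle needs a file opened in binary write mode ...
    with open(path, "wb") as f:
        pickle.dump(stand_in_model, f)

    # ... and in binary read mode when loading it back.
    with open(path, "rb") as f:
        restored_model = pickle.load(f)

# The restored object should be equal to the one we saved.
round_trip_ok = restored_model == stand_in_model
```

Keep in mind that unpickling can execute arbitrary code, so only load model files you created yourself or otherwise trust.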
The predict_proba method returns a NumPy array with one row per input sample. In our case there is a single row holding two values: the probability of class 0 (not surviving) and the probability of class 1 (surviving).
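To make the shape concrete, here is a small sketch that uses a plain nested list as a stand-in for the returned array. The probability values are made up for illustration and do not come from the real model.

```python
# Hypothetical result of model.predict_proba(...) for a single passenger,
# written as a nested list: one row per sample, one column per class.
probabilities = [[0.35, 0.65]]

first_passenger = probabilities[0]   # the only row
p_not_survived = first_passenger[0]  # probability of class 0
p_survived = first_passenger[1]      # probability of class 1

# With a real NumPy array the same value is available as probabilities[0, 1].
```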
the form
We will use a simple form to collect user data; I chose the WTForms library for this. Defining a form with it is straightforward, e.g. this is the definition of a required Sex field:
```python
from wtforms import SelectField, validators

# Declared as a class attribute of the form class.
sex = SelectField(
    "Sex:",
    choices = [
        ("1", "Male"),
        ("0", "Female")
    ],
    validators = [
        validators.InputRequired()
    ]
)
```
See the full form definition here.
the data model
I like to pass data around in well-formed objects when appropriate, so I defined a simple model to represent our form data:
```python
class PassengerData(object):
    def __init__(
        self,
        sex: int,
        title: str,
        age: float,
        Pclass: int,
        ticket_strategy: int,
        SibSp: int,
        ParCh: int,
        embarked: str
    ):
        self.sex = sex
        self.title = title
        self.age = age
        self.Pclass = Pclass
        self.ticket_strategy = ticket_strategy
        self.SibSp = SibSp
        self.ParCh = ParCh
        self.embarked = embarked
```
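Form submissions arrive as strings, so something has to convert them into the typed fields above. The sketch below shows one way a `build_from_form_data` helper could look. The form key names are assumptions (they simply reuse the attribute names), the function body is not taken from the actual source code, and the data class is rewritten as a dataclass only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class PassengerData:
    """Same fields as the class above, written as a dataclass for brevity."""
    sex: int
    title: str
    age: float
    Pclass: int
    ticket_strategy: int
    SibSp: int
    ParCh: int
    embarked: str


def build_from_form_data(form_data: dict) -> PassengerData:
    """Convert raw form strings into typed fields (key names are assumed)."""
    return PassengerData(
        sex=int(form_data["sex"]),
        title=form_data["title"],
        age=float(form_data["age"]),
        Pclass=int(form_data["Pclass"]),
        ticket_strategy=int(form_data["ticket_strategy"]),
        SibSp=int(form_data["SibSp"]),
        ParCh=int(form_data["ParCh"]),
        embarked=form_data["embarked"],
    )


# Every submitted value is a string, so numbers are converted explicitly.
passenger = build_from_form_data({
    "sex": "1", "title": "Mr", "age": "32.5", "Pclass": "3",
    "ticket_strategy": "0", "SibSp": "1", "ParCh": "0", "embarked": "S",
})
```

Doing the conversion in one place means the preprocessing code can compare against numbers (for example `Pclass == 1`) without worrying about string values.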
the prediction
The most important part is preparing the data and making the prediction. The model I used is slightly more complex than the one from the previous part, but the main idea remains the same - we need to feed user-submitted data to the model, and therefore we need to preprocess it first. Here is the method I used to do so:
```python
import numpy as np

def __preprocess(passenger_data: PassengerData) -> np.ndarray:
    age = passenger_data.age
    sex = passenger_data.sex
    embarked = passenger_data.embarked
    Pclass = passenger_data.Pclass
    title = passenger_data.title

    # Set up Fare value based on ticket buying strategy (values are hardcoded here and come from data exploration).
    s = passenger_data.ticket_strategy
    if Pclass == 1:
        if s == 0:
            fare = 30
        elif s == 1:
            fare = 84
        else:
            fare = 428
    elif Pclass == 2:
        if s == 0:
            fare = 13
        elif s == 1:
            fare = 20
        else:
            fare = 53
    elif Pclass == 3:
        if s == 0:
            fare = 8
        elif s == 1:
            fare = 14
        else:
            fare = 55

    # One-hot encode the categorical inputs; the key order defines the column order.
    x = {
        "Fare": fare,
        "AgeCategory_Infant": int(age <= 5),
        "AgeCategory_Child": int(age > 5 and age <= 12),
        "AgeCategory_Teenager": int(age > 12 and age <= 18),
        "AgeCategory_YoungAdult": int(age > 18 and age <= 35),
        "AgeCategory_Adult": int(age > 35 and age <= 60),
        "AgeCategory_Senior": int(age > 60 and age <= 100),
        "Sex_female": int(sex == 0),
        "Sex_male": int(sex == 1),
        "Embarked_C": int(embarked == "C"),
        "Embarked_Q": int(embarked == "Q"),
        "Embarked_S": int(embarked == "S"),
        "Pclass_1": int(Pclass == 1),
        "Pclass_2": int(Pclass == 2),
        "Pclass_3": int(Pclass == 3),
        "FamilySize": passenger_data.SibSp + passenger_data.ParCh + 1,
        "Title_Mr": int(title == "Mr"),
        "Title_Mrs": int(title == "Mrs"),
        "Title_Miss": int(title == "Miss"),
        "Title_Master": int(title == "Master")
    }

    # Shape (1, n_features): a single sample for predict_proba.
    return np.array(list(x.values())).reshape(1, -1)
```
The basic idea is to mirror the data processing that was applied when the model was trained, so that the model receives the same features it learned from and can predict the outcome accurately.
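One detail worth guarding: an array built from a dictionary's values follows the dictionary's insertion order, so that order has to equal the column order the model was trained on. Below is a minimal sketch of making the order explicit. `TRAINING_COLUMNS` and `to_feature_row` are hypothetical names, and the column list is shortened to four entries for illustration.

```python
# Hypothetical column order recorded when the model was trained.
TRAINING_COLUMNS = ["Fare", "Sex_female", "Sex_male", "FamilySize"]


def to_feature_row(features: dict) -> list:
    """Order feature values by the training columns, failing loudly on a mismatch."""
    missing = [name for name in TRAINING_COLUMNS if name not in features]
    extra = [name for name in features if name not in TRAINING_COLUMNS]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")
    return [features[name] for name in TRAINING_COLUMNS]


# The dictionary is deliberately built in a different order than the columns.
row = to_feature_row({"FamilySize": 2, "Sex_male": 1, "Sex_female": 0, "Fare": 8})
```

A check like this turns a silently wrong prediction into an immediate error if the training pipeline gains, loses or reorders a column.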
And then the prediction itself - here we load the model and feed it the preprocessed data. It returns the probability of class 1 (survived):
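```python
import os
import pickle

# Like __preprocess, this is a static method of the Predictor class.
def predict(passenger_data: PassengerData):
    # Resolve the model path relative to this file, not the working directory.
    dirname = os.path.dirname(__file__)
    filename = os.path.join(dirname, './model.pkl')

    with open(filename, "rb") as f:
        model = pickle.load(f)

    # Row 0 is our single passenger, column 1 is the probability of surviving.
    probability = model.predict_proba(Predictor.__preprocess(passenger_data))[0, 1]

    return probability
```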
gluing it together
We will build a simple Flask app that uses the form and the predictor to produce the result. A template and some color-coding make it look presentable.
```python
from form.titanic_form import TitanicForm
from flask import Flask, render_template, request
from modules.prediction.predictor import Predictor
from builder.passenger_data_builder import PassengerDataBuilder

app = Flask(__name__)

# Home page
@app.route("/", methods=["GET", "POST"])
def home():
    form = TitanicForm(request.form)
    prediction = None
    status = None

    if request.method == "POST" and form.validate():
        passengerData = PassengerDataBuilder.build_from_form_data(request.form)
        prediction = int(Predictor.predict(passengerData) * 100)

        if prediction > 70:
            status = "green"
        elif prediction > 40:
            status = "yellow"
        else:
            status = "red"

    return render_template(
        "index.html",
        form = form,
        prediction = prediction,
        status = status
    )

if __name__ == "__main__":
    app.run(host = "0.0.0.0", port = 8888, debug = True)
```
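The color-coding thresholds are easy to get subtly wrong at the boundaries, so it can help to pull them into a small function that is testable on its own. This is only a sketch: `status_for` is a hypothetical helper name, while the thresholds are the ones used in the route above.

```python
def status_for(prediction: int) -> str:
    """Map a survival percentage (0-100) to the color used in the template."""
    if prediction > 70:
        return "green"
    if prediction > 40:
        return "yellow"
    return "red"


# Boundary values fall into the lower band because the comparisons are strict.
statuses = {p: status_for(p) for p in (100, 71, 70, 41, 40, 0)}
```

With this helper the route would only need `status = status_for(prediction)`.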
done
The app is ready to be run or deployed wherever you want. I host mine on Heroku simply because it’s free and convenient.