ML: Embedding a trained model in a website (Titanic pt.4)

Once we are satisfied with our trained model, we might want to embed it somewhere, perhaps to show it off online. We are going to embed our Titanic model in a website with a simple form so that visitors can get an estimate of their chances of surviving a virtual Titanic trip.

This article is part of the Titanic series, a short series on basic ML concepts based on the famous Titanic Kaggle challenge.

tl;dr

We are embedding a trained model in a website. See the full source code for all the details.

Deployed website - https://would-you-survive-titanic.herokuapp.com

the basics

We will use Flask to build a simple website. I will assume you already know how to use, run, and deploy it, so I will not cover that part in detail.

the model

The main task is getting our trained model ready. We will use the pickle module from the Python standard library to save it to a file:

import pickle

# pickle.dump expects an open binary file object, not a file name
with open("classifier.model", "wb") as f:
    pickle.dump(classifier, f)

To reuse the model in our web app we will simply load it and use it:

with open("classifier.model", "rb") as f:
    model = pickle.load(f)
...
probability = model.predict_proba(form_data)

The predict_proba method returns a two-dimensional NumPy array with one row per input sample. In our case there is a single row, which holds the probability of class 0 (not surviving) and of class 1 (surviving).
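
To make the shape concrete, here is a minimal sketch that uses a plain nested list as a stand-in for what predict_proba would return for a single passenger, assuming a scikit-learn style classifier. The numbers are invented for illustration and do not come from the real model.

```python
# Stand-in for the result of model.predict_proba(X) with a single input row.
# Shape is (n_samples, n_classes): one row per passenger, one column per class.
# The values below are invented for illustration only.
proba = [[0.27, 0.73]]

first_passenger = proba[0]            # the only row
p_not_survived = first_passenger[0]   # column 0 -> class 0 (did not survive)
p_survived = first_passenger[1]       # column 1 -> class 1 (survived)

print(f"Chance of survival: {p_survived:.0%}")
```

With a real NumPy array the same value can be read in one step as `proba[0, 1]`.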

the form

We will use a simple form to collect user data; I chose the WTForms library for this. Defining a form with it is straightforward. For example, this is the definition of a required Sex field:

from wtforms import SelectField, validators

# Field declared inside the form class
sex = SelectField(
    "Sex:",
    choices=[
        ("1", "Male"),
        ("0", "Female"),
    ],
    validators=[
        validators.InputRequired(),
    ],
)

See the full form definition here.

the data model

I like to pass data around in well-formed objects when appropriate, so I defined a simple model to represent our form data:

class PassengerData(object):
    def __init__(
        self,
        sex: int,
        title: str,
        age: float,
        Pclass: int,
        ticket_strategy: int,
        SibSp: int,
        ParCh: int,
        embarked: str
    ):
        self.sex = sex
        self.title = title
        self.age = age
        self.Pclass = Pclass
        self.ticket_strategy = ticket_strategy                                                  
        self.SibSp = SibSp
        self.ParCh = ParCh
        self.embarked = embarked
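
The same container can be written more compactly with the standard library's dataclasses module. The sketch below is an alternative, not the code used in the app. The `build_from_form` helper and the form keys it reads are hypothetical, shown only to illustrate converting submitted strings into typed values.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PassengerData:
    """Typed container for one submitted form (same fields as above)."""
    sex: int
    title: str
    age: float
    Pclass: int
    ticket_strategy: int
    SibSp: int
    ParCh: int
    embarked: str


def build_from_form(form: Mapping[str, str]) -> PassengerData:
    """Hypothetical helper: HTML forms submit strings, so cast each field.

    The keys are assumed to match the field names; the real form may
    name them differently.
    """
    return PassengerData(
        sex=int(form["sex"]),
        title=form["title"],
        age=float(form["age"]),
        Pclass=int(form["Pclass"]),
        ticket_strategy=int(form["ticket_strategy"]),
        SibSp=int(form["SibSp"]),
        ParCh=int(form["ParCh"]),
        embarked=form["embarked"],
    )


example = build_from_form({
    "sex": "0", "title": "Miss", "age": "29", "Pclass": "2",
    "ticket_strategy": "1", "SibSp": "0", "ParCh": "1", "embarked": "S",
})
print(example)
```

Marking the dataclass as frozen makes instances read-only, which suits a value that is built once per request and then only passed along.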

the prediction

The most important part is preparing the data and making the prediction. The model I used is slightly more complex than the one from the previous part, but the main idea remains the same: we need to feed user-submitted data to the model, and therefore we need to preprocess it first. Here is the method I used to do so:

import numpy as np

# Method of the Predictor class
def __preprocess(passenger_data: PassengerData) -> np.ndarray:
    age = passenger_data.age
    sex = passenger_data.sex
    embarked = passenger_data.embarked
    Pclass = passenger_data.Pclass
    title = passenger_data.title

    # Set the Fare value from the ticket-buying strategy
    # (values are hardcoded here and come from data exploration).
    s = passenger_data.ticket_strategy
    if Pclass == 1:
        if s == 0:
            fare = 30
        elif s == 1:
            fare = 84
        else:
            fare = 428
    elif Pclass == 2:
        if s == 0:
            fare = 13
        elif s == 1:
            fare = 20
        else:
            fare = 53
    elif Pclass == 3:
        if s == 0:
            fare = 8
        elif s == 1:
            fare = 14
        else:
            fare = 55
    else:
        # Without this branch, fare would be undefined for an unknown class.
        raise ValueError(f"Unexpected Pclass: {Pclass}")

    # One-hot encode the inputs. The values are later taken in insertion
    # order, so the keys must follow the column order used in training.
    x = {
        "Fare": fare,
        "AgeCategory_Infant": int(age <= 5),
        "AgeCategory_Child": int(5 < age <= 12),
        "AgeCategory_Teenager": int(12 < age <= 18),
        "AgeCategory_YoungAdult": int(18 < age <= 35),
        "AgeCategory_Adult": int(35 < age <= 60),
        "AgeCategory_Senior": int(60 < age <= 100),
        "Sex_female": int(sex == 0),
        "Sex_male": int(sex == 1),
        "Embarked_C": int(embarked == "C"),
        "Embarked_Q": int(embarked == "Q"),
        "Embarked_S": int(embarked == "S"),
        "Pclass_1": int(Pclass == 1),
        "Pclass_2": int(Pclass == 2),
        "Pclass_3": int(Pclass == 3),
        "FamilySize": passenger_data.SibSp + passenger_data.ParCh + 1,
        "Title_Mr": int(title == "Mr"),
        "Title_Mrs": int(title == "Mrs"),
        "Title_Miss": int(title == "Miss"),
        "Title_Master": int(title == "Master"),
    }

    # A single sample becomes a 2-D array of shape (1, n_features).
    return np.array(list(x.values())).reshape(1, -1)
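
The nested fare branches can also be expressed as a lookup table, which keeps the hardcoded values in one place. This is a sketch of an alternative, not the app's code. The names `FARES` and `fare_for` are made up for illustration, and treating any strategy other than 0 or 1 as the third option mirrors the `else` branches above.

```python
# Fare per (Pclass, ticket_strategy); values copied from the branches above.
FARES = {
    (1, 0): 30, (1, 1): 84, (1, 2): 428,
    (2, 0): 13, (2, 1): 20, (2, 2): 53,
    (3, 0): 8,  (3, 1): 14, (3, 2): 55,
}


def fare_for(pclass: int, strategy: int) -> int:
    """Return the hardcoded fare for a class and ticket-buying strategy."""
    if pclass not in (1, 2, 3):
        raise ValueError(f"Unknown Pclass: {pclass!r}")
    # Any strategy other than 0 or 1 falls into the last bucket,
    # matching the `else` branches in the original code.
    bucket = strategy if strategy in (0, 1) else 2
    return FARES[(pclass, bucket)]


print(fare_for(1, 1))
print(fare_for(3, 7))
```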

The basic idea is to mirror the preprocessing that was applied when the model was trained, so that the model receives the same features, in the same order, that it was fitted on.
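
One way to guard against a mismatch is to build the feature row from an explicit column list instead of relying on dictionary insertion order. The sketch below is only an illustration: `FEATURE_COLUMNS` is a shortened, assumed column list (the real one would be taken from the training data), and `to_row` is a hypothetical helper.

```python
# Assumed column order; in practice this would be saved from the training
# data (for example alongside the pickled model). Shortened for brevity.
FEATURE_COLUMNS = ["Fare", "Sex_female", "Sex_male", "FamilySize"]


def to_row(features: dict) -> list:
    """Order a feature dict by FEATURE_COLUMNS, failing on any mismatch."""
    missing = [c for c in FEATURE_COLUMNS if c not in features]
    extra = [k for k in features if k not in FEATURE_COLUMNS]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return [features[c] for c in FEATURE_COLUMNS]


# Keys deliberately given in a different order than FEATURE_COLUMNS.
row = to_row({"Sex_male": 0, "FamilySize": 2, "Fare": 20, "Sex_female": 1})
print(row)
```

The resulting list can then be wrapped the same way as in the method above, for example with `np.array(row).reshape(1, -1)`.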

Then comes the prediction itself. Here we load the model and feed it the preprocessed data. The method returns the probability of class 1 (survived):

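import os
import pickle

# Method of the Predictor class
def predict(passenger_data: PassengerData) -> float:
    # Load the pickled model stored next to this file.
    dirname = os.path.dirname(__file__)
    filename = os.path.join(dirname, "model.pkl")
    with open(filename, "rb") as f:
        model = pickle.load(f)

    # Row 0 is our single passenger; column 1 is the probability of class 1 (survived).
    probability = model.predict_proba(Predictor.__preprocess(passenger_data))[0, 1]

    return probability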
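
Note that this method unpickles the model on every call. For a small model that is fine, but one option is to load it once and reuse it. The sketch below shows the idea with `functools.lru_cache`. `load_model` is a hypothetical helper, and a plain dict is pickled as a stand-in for the real classifier so that the example runs on its own.

```python
import os
import pickle
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def load_model(path: str):
    """Unpickle the object at `path`; repeated calls reuse the cached result."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo: pickle a stand-in object (a dict, NOT a real classifier) to a temp file.
tmp_dir = tempfile.mkdtemp()
model_path = os.path.join(tmp_dir, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump({"name": "stand-in model"}, f)

first = load_model(model_path)
second = load_model(model_path)  # served from the cache, no file access
print(first is second)
```

Because the cache is keyed on the path, a changed file on disk is not picked up until the process restarts, which is usually acceptable for a model that ships with the app.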

gluing it together

We will build a simple Flask app that uses the form and the predictor to produce the result. A template and some color-coding make it look presentable.

from form.titanic_form import TitanicForm
from flask import Flask, render_template, request
from modules.prediction.predictor import Predictor
from builder.passenger_data_builder import PassengerDataBuilder

app = Flask(__name__)

# Home page
@app.route("/", methods=["GET", "POST"])
def home():
    form = TitanicForm(request.form)
    prediction = None
    status = None

    if request.method == "POST" and form.validate():
        passengerData = PassengerDataBuilder.build_from_form_data(request.form)
        prediction = int(Predictor.predict(passengerData) * 100)

        if prediction > 70:
            status = "green"
        elif prediction > 40:
            status = "yellow"
        else:
            status = "red"

    return render_template(
        "index.html",
        form = form,
        prediction = prediction,
        status = status
    )

if __name__ == "__main__":
    app.run(host = "0.0.0.0", port = 8888, debug = True)
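
The color-coding thresholds can be pulled out into a small pure function, which makes them easy to test without starting Flask. This is a sketch only: `status_for` is a hypothetical name, and the thresholds are copied from the view above.

```python
def status_for(probability: float) -> tuple:
    """Map a survival probability (0.0 to 1.0) to (percent, color).

    int() truncates rather than rounds, so 0.709 becomes 70,
    exactly as in the view above.
    """
    percent = int(probability * 100)
    if percent > 70:
        color = "green"
    elif percent > 40:
        color = "yellow"
    else:
        color = "red"
    return percent, color


for p in (0.92, 0.709, 0.40, 0.05):
    print(p, status_for(p))
```

Note that the comparisons are strict, so a value of exactly 70 is still "yellow" and exactly 40 is still "red".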

done

The app is ready to be run or deployed wherever you want. I host mine on Heroku simply because it is free and convenient.

See the full source code for all the details.

Deployed website - https://would-you-survive-titanic.herokuapp.com
