Predictive Analytics

Online advertising is a rapidly growing industry. Business models change very fast, posing many challenges to the engineering and machine learning communities. And since technology solutions impact the business directly, it is very important to improve on factors such as scalability and optimization performance.

Komli has developed a very robust real-time ad serving platform which performs better than* the most widely used system available in the market. An important component of the platform is the decisioning engine, which chooses the best ad for each impression request in order to optimize revenue. The engine uses a prediction system to predict the click-through rate (CTR) or conversion rate (CVR) of a campaign based on publisher, advertiser and user level attributes. The model is trained on historical data for these attributes and their performance. This article focuses on one of the experiments performed for feature selection before building the model.

Feature Selection: As in any machine learning task, one of the most important steps before training the model is feature selection. Many features could be used for ad optimization. These features are simply the variables that you think affect the CTR/CVR and hence would help in prediction; let’s call them predictor variables. Since the data required by a statistical model grows exponentially with the number of variables (if performance is not to degrade), the best strategy is to base the prediction on a subset of the available variables. The following is the list of predictor variables that we considered for our optimization problem; this subset matters the most in prediction.

Sr. no.   Variable
1         Country
2         City
3         DMA
4         Creative height
5         Creative width
6         Publisher Id
7         Site Id
8         Section Id or ad slot Id
9         Advertiser Id
10        Creative Id
11        Hour of the day
12        Referer URL
13        Publisher URL
14        Placement Id/fold position
15        Different user attributes obtained from a cookie
16        Page context, content category
17        Source ad network
18        Day of the week (0-6)
19        4 hour (which 4 hour slot it is, 0-5)
20        8 hour (which 8 hour slot it is, 0-2)
21        Language
22        Social (high, low, average)
23        Demographic segments like gender, age, income etc.
24        Lifestyle (equivalent to interest)
25        Landing page URL
26        Root domain Id of the publisher
27        Root domain Id of the landing page
28        Resolution of the user’s screen
29        External events – proximity to these events
30        Frequency of the user on the publisher
31        Frequency of the user on the creative
32        Frequency of the user on the ad network
33        Recency – length of time between exposures to this creative
34        Device of the user (mobile, laptop etc.)
35        Browser of the user
36        Recent search history of the user
37        Type of the page the ad is on (can it be pop-up etc.)
38        Whether the user is a member of the site
39        Publisher category
40        Metadata for the creative – product category, type of the creative etc.
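Several variables in the list (hour of the day, day of the week, and the 4 hour and 8 hour slots) can be derived from the impression timestamp alone. The following is a minimal sketch of one way to do that. The function name `time_features` and the dictionary keys are hypothetical, and the choice of Monday as day 0 is an assumption, since the list only gives the range 0-6.

```python
from datetime import datetime


def time_features(ts):
    """Derive time-based predictor variables from an impression timestamp."""
    hour = ts.hour
    return {
        "hour_of_day": hour,          # 0-23
        "day_of_week": ts.weekday(),  # 0-6, Monday = 0 (assumed convention)
        "slot_4h": hour // 4,         # which 4 hour slot, 0-5
        "slot_8h": hour // 8,         # which 8 hour slot, 0-2
    }


# Hypothetical impression served on a Wednesday at 17:30.
example = time_features(datetime(2024, 1, 3, 17, 30))
```

The coarser slots carry no information beyond the hour itself; they only trade resolution for more observations per value, which matters when data per attribute value is scarce.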

A number of feature selection methods are available: correlation analysis, PCA (for dimensionality reduction), and feature ranking methods based on scores such as mutual information, information gain and gain ratio. Our existing analysis of the importance of predictor variables and their correlation gave us an idea of which variables are correlated with each other and which are not. This article does not focus on that discussion.

We will mainly discuss the feature ranking we performed and its results. We used mutual information as the scoring function to rank the variables. The mutual information of two discrete random variables X and Y can be defined as [http://en.wikipedia.org/wiki/Mutual_information]:

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p_1(x)\, p_2(y)}$$

where p(x,y) is the joint probability distribution function of X and Y, and p1(x) and p2(y) are the marginal probability distribution functions of X and Y respectively.
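As an illustration of this definition, the sketch below estimates the mutual information of two discrete variables from paired observations, using relative frequencies as the probabilities. It is not the code used in our experiments; the function name `mutual_information` is hypothetical, and the natural logarithm (result in nats) is an assumed choice of base.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in nats from paired samples of two discrete variables."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")

    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # counts of each x value
    count_y = Counter(ys)         # counts of each y value

    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p1(x) * p2(y)) written in terms of raw counts
        ratio = c * n / (count_x[x] * count_y[y])
        mi += p_xy * math.log(ratio)
    return mi
```

Pairs that never occur together contribute nothing to the sum, so only the observed (x, y) combinations are visited. Two independent variables give a score of 0, and a variable compared with itself gives its own entropy.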

The graph below shows the relative mutual information scores for a small sample of attributes. Most of the attributes are identifiers of physical entities such as creative and publisher. The attribute CTR is derived from the click performance of CPA campaigns, and frequency is derived from the number of exposures of a creative to a user.

[Figure: Multi Score, relative mutual information scores for a sample of attributes]

This simple experiment gives us an idea of which variables are important and which are not. The results here are only for a sample; our overall results confirm that variables with a high mutual information value do get high importance in prediction.
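The ranking step itself can be sketched as follows: score each candidate variable by its mutual information with the click outcome, then sort by score. The rows, the field names (`creative_id`, `browser`, `clicked`) and the function `rank_features` are made up for illustration and are not our data or production code.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in nats from paired samples of two discrete variables."""
    n = len(xs)
    joint, count_x, count_y = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


def rank_features(rows, feature_names, target):
    """Return (feature, score) pairs sorted by mutual information with target."""
    target_values = [row[target] for row in rows]
    scores = {
        name: mutual_information([row[name] for row in rows], target_values)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy impressions: the creative decides the click, the browser does not.
rows = [
    {"creative_id": "c1", "browser": "a", "clicked": 1},
    {"creative_id": "c1", "browser": "b", "clicked": 1},
    {"creative_id": "c2", "browser": "a", "clicked": 0},
    {"creative_id": "c2", "browser": "b", "clicked": 0},
]
ranking = rank_features(rows, ["browser", "creative_id"], "clicked")
```

In this toy sample `creative_id` ranks first and `browser` scores zero. On real data, scores estimated from a finite sample tend to be inflated for attributes with many distinct values, such as identifiers, so the sample size per value should be kept in mind when reading a ranking.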

The next steps could include performing bucket tests (A/B testing) to gauge the performance of different models built on different subsets of variables.
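One possible shape for such a bucket test is sketched below: each user is assigned to a bucket deterministically by hashing the user id, each bucket would be served by a model built on a different subset of variables, and the click-through rate is then compared per bucket. The function names, the salt string and the event format are all assumptions for illustration, not a description of our platform.

```python
import hashlib


def assign_bucket(user_id, num_buckets=2, salt="feature-subset-test"):
    """Map a user id to a stable bucket index in [0, num_buckets)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def ctr_by_bucket(events, num_buckets=2):
    """Compute CTR per bucket from (user_id, clicked) pairs, clicked in {0, 1}.

    Buckets with no impressions get None instead of a rate.
    """
    impressions = [0] * num_buckets
    clicks = [0] * num_buckets
    for user_id, clicked in events:
        bucket = assign_bucket(user_id, num_buckets)
        impressions[bucket] += 1
        clicks[bucket] += clicked
    return [c / i if i else None for c, i in zip(clicks, impressions)]
```

Hashing with a salt keeps a user in the same bucket for the whole test without storing any assignment, and changing the salt reshuffles users for the next test. A difference in CTR between buckets would still need a significance check before drawing conclusions.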

*as per our benchmarking experiments.