A Simple Application of the Apriori Association Algorithm


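We have logs of users' browsing behavior, and we want to find each user's most habitual and most distinctive browsing pattern, much like the famous "beer and diapers" example.
In practice: suppose we know that a given guid's most habitual browsing path is tag2->tag4->tag6. Then, whenever we detect that the user has just browsed tag2->tag4, we can push articles and ads related to tag6!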
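That lookup can be sketched in a few lines. This is a minimal illustration, assuming a mined pattern is stored as an ordered list of tags and that "detecting tag2->tag4" means those tags occur in that order somewhere in the user's recent history (other tags may sit in between); `contains_in_order` and `tag_to_push` are hypothetical names, not from the original.

```python
from typing import Optional, Sequence


def contains_in_order(history: Sequence[str], prefix: Sequence[str]) -> bool:
    """True if every tag of `prefix` occurs in `history` in the same order."""
    it = iter(history)
    # `tag in it` consumes the iterator up to and including the match,
    # so each later prefix tag must be found after the earlier ones.
    return all(tag in it for tag in prefix)


def tag_to_push(pattern: Sequence[str], history: Sequence[str]) -> Optional[str]:
    """Hypothetical helper: if the history contains all but the last tag of
    `pattern` in order, return that last tag as the one to recommend."""
    if len(pattern) < 2:
        return None
    *prefix, target = pattern
    return target if contains_in_order(history, prefix) else None


pattern = ["tag2", "tag4", "tag6"]
print(tag_to_push(pattern, ["tag1", "tag2", "tag3", "tag4"]))  # tag6
print(tag_to_push(pattern, ["tag4", "tag2"]))                  # None
```

Requiring only in-order occurrence (rather than strictly adjacent tags) is a design choice; a stricter variant would compare the tail of the history with the prefix directly.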
Wikipedia article on the Apriori algorithm:
A borrowed figure (if the original author sees this, please contact me and I will change it right away):
[Figure]

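Recent browsing records for one guid:
Each 4-digit number is a tag ID. In this example, the tags this user browses most heavily are 0100 and 0200, which correspond to "Basic Services" and "Other". Both contain many sub-tags, so each day's browsing pattern ends up containing only a few distinct tags.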

2015-09-02 20:38:58,938 0308
2015-09-02 20:39:03,216 0308
2015-09-02 20:39:15,483 0308
2015-09-02 20:39:20,980 0308
2015-09-02 20:40:00,360 0308
2015-09-02 20:40:10,305 0308
2015-09-02 20:40:37,217 0308
2015-09-02 20:59:43,822 0308
2015-09-02 21:00:22,120 0308
2015-09-02 21:00:27,719 0308
2015-09-02 21:01:50,543 0308
2015-09-02 21:06:02,677 0308
2015-09-02 21:06:50,745 0308
2015-09-02 21:07:05,906 0308
2015-09-02 21:11:06,785 0308
2015-09-02 21:11:24,363 0308
2015-09-02 21:12:14,722 0308
2015-09-02 21:34:14,251 0902
2015-09-02 21:35:05,775 0902
Sept 2: 0308->0902

2015-09-03 16:38:06,849 0200
2015-09-03 16:38:18,972 0200
2015-09-03 16:41:40,659 0200
2015-09-03 16:42:02,409 0308
2015-09-03 17:06:28,722 0308
2015-09-03 17:56:40,167 0200
2015-09-03 17:57:36,087 0308
2015-09-03 17:58:00,721 0308
2015-09-03 17:58:18,369 0308
2015-09-03 17:58:20,009 0308
2015-09-03 17:58:36,888 0200
2015-09-03 17:59:02,279 0308
2015-09-03 17:59:30,335 0308
2015-09-03 17:59:43,389 0308
2015-09-03 18:03:59,930 0308
2015-09-03 18:04:36,429 0308
2015-09-03 18:06:49,141 0308
2015-09-03 18:08:04,562 0308
2015-09-03 18:10:36,167 0308
2015-09-03 18:10:43,686 0308
2015-09-03 18:10:54,361 0308
2015-09-03 18:11:36,896 0308
2015-09-03 18:11:51,096 0308
2015-09-03 18:12:14,380 0100
2015-09-03 18:28:55,344 0308
2015-09-03 18:37:45,778 0308
2015-09-03 18:53:40,161 0308
2015-09-03 19:06:26,674 0308
2015-09-03 19:13:46,643 0308
2015-09-03 19:14:54,566 0308
2015-09-03 19:14:57,850 0308
2015-09-03 19:15:00,270 0308
2015-09-03 19:15:04,488 0308
2015-09-03 19:48:51,459 0100
2015-09-03 19:49:30,384 0100
2015-09-03 19:57:12,397 0308
2015-09-03 20:07:24,415 0308
2015-09-03 20:07:55,528 0308
2015-09-03 20:31:12,867 0308
2015-09-03 20:31:41,817 0200
Sept 3: 0200->0308

2015-09-04 21:24:31,505 0200
2015-09-04 21:27:54,477 0200
2015-09-04 21:28:27,640 0200
2015-09-04 21:28:39,372 0200
2015-09-04 21:30:56,856 0100
2015-09-04 21:31:36,348 0200
2015-09-04 21:32:48,291 0200
2015-09-04 21:41:39,930 0200
2015-09-04 21:51:43,956 0200
2015-09-04 22:02:27,459 0200
Sept 4: 0200->0100

2015-09-06 20:00:09,345 0100
2015-09-06 20:01:01,266 0100
2015-09-06 20:01:23,361 0100
2015-09-06 20:01:39,573 0100
2015-09-06 20:01:59,884 0100
2015-09-06 20:02:02,774 0100
2015-09-06 20:02:51,668 0100
2015-09-06 20:03:55,547 0100
2015-09-06 20:04:29,131 0100
2015-09-06 20:04:54,032 0100
2015-09-06 20:05:12,680 0100
2015-09-06 20:05:50,725 0100
2015-09-06 20:06:13,606 0100
2015-09-06 20:06:18,025 0904
2015-09-06 20:06:26,665 0904
2015-09-06 21:54:16,613 0902
2015-09-06 21:54:50,259 0200
2015-09-06 22:02:48,431 0701
2015-09-06 22:03:06,127 0701
2015-09-06 22:04:40,222 0701
2015-09-06 22:13:31,744 0701
2015-09-06 22:13:40,333 0701
Sept 6: 0100->0902->0904->0701->0200

2015-09-07 23:38:32,675 0100
2015-09-07 23:39:49,303 0902
2015-09-07 23:42:15,449 0902
2015-09-07 23:42:30,667 0902
2015-09-07 23:45:40,068 0701
2015-09-07 23:47:42,519 0701
Sept 7: 0100->0902->0701

2015-09-08 16:39:53,401 0302
2015-09-08 16:43:35,643 0701
2015-09-08 16:43:50,197 0701
2015-09-08 16:47:54,928 0701
2015-09-08 16:48:47,664 0902
2015-09-08 16:50:22,667 0701
2015-09-08 16:50:48,685 0701
2015-09-08 16:51:42,379 0701
2015-09-08 16:52:47,955 0701
2015-09-08 16:54:34,203 0701
2015-09-08 16:56:01,374 0701
2015-09-08 16:57:42,857 0701
2015-09-08 16:59:48,018 0701
2015-09-08 17:01:41,205 0200
2015-09-08 17:01:43,923 0902
2015-09-08 17:02:18,098 0200
2015-09-08 17:05:44,430 0200
2015-09-08 17:06:54,183 0200
2015-09-08 17:08:00,875 0200
2015-09-08 17:09:52,728 0200
2015-09-08 17:11:24,505 0200
2015-09-08 17:12:19,421 0200
2015-09-08 17:12:45,215 0402
2015-09-08 17:13:07,015 0402
2015-09-08 17:13:40,922 0402
2015-09-08 17:13:44,646 0402
2015-09-08 17:13:51,880 0402
Sept 8: 0302->0701->0902->0200->0402
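Turning raw records into per-day tag sequences can be automated. Below is a minimal sketch, assuming each record has the form `YYYY-MM-DD HH:MM:SS,mmm TAG` as shown above and that a day's pattern is its distinct tags in order of first appearance. That rule is an assumption: on some days it yields a slightly different sequence from the hand-written summaries. `daily_sequences` is a hypothetical name.

```python
import re

# One record: "YYYY-MM-DD HH:MM:SS,mmm TAG", where TAG is a 4-digit tag ID.
RECORD = re.compile(r"(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2},\d{3} (\d{4})")


def daily_sequences(log_text: str) -> dict[str, list[str]]:
    """Group records by date; keep each tag once, in order of first appearance."""
    days: dict[str, list[str]] = {}
    for date, tag in RECORD.findall(log_text):
        seq = days.setdefault(date, [])
        if tag not in seq:
            seq.append(tag)
    return days


sample = """
2015-09-07 23:38:32,675 0100
2015-09-07 23:39:49,303 0902
2015-09-07 23:42:15,449 0902
2015-09-07 23:42:30,667 0902
2015-09-07 23:45:40,068 0701
2015-09-07 23:47:42,519 0701
"""
for date, seq in daily_sequences(sample).items():
    print(date, "->".join(seq))  # 2015-09-07 0100->0902->0701
```

Because the regex does not depend on line breaks, the same function also works if the records are run together on one line.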

Simplified, one row per day:

Day    Tag browsing sequence
Day 1  0308->0902
Day 2  0200->0308
Day 3  0200->0100
Day 4  0100->0902->0904->0701->0200
Day 5  0100->0902->0701
Day 6  0302->0701->0902->0200->0402
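For the counting that follows, each day can be treated as one transaction: the set of tags browsed that day, ignoring order. The sketch below copies the six daily sequences summarized under the logs (Sept 2 through Sept 8) into a list we call `DAYS` (our name) and counts, for each tag, how many days contain it. That count is the numerator of the tag's support.

```python
from collections import Counter

# The six daily sequences summarized under the logs (Sept 2 .. Sept 8).
DAYS = [
    "0308->0902",
    "0200->0308",
    "0200->0100",
    "0100->0902->0904->0701->0200",
    "0100->0902->0701",
    "0302->0701->0902->0200->0402",
]

# Each day becomes a transaction: the *set* of tags seen that day.
transactions = [set(day.split("->")) for day in DAYS]

# Count how many days (transactions) contain each tag.
counts = Counter(tag for t in transactions for tag in t)

for tag in sorted(counts):
    print(f"{tag}: {counts[tag]}/{len(transactions)}")
```

For example, this prints `0701: 3/6` and `0902: 4/6`.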

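Scan to compute each tag's support, i.e. the fraction of the 6 days whose record contains the tag:
0308: 2/6
0902: 4/6
0200: 4 (denominator omitted from here on)
0100: 3
0904: 1
0701: 3
0302: 1
0402: 1

Keep tags with a count greater than 1:
0308: 2
0902: 4
0200: 4
0100: 3
0701: 3

Join (all candidate pairs of the kept tags), then scan to count the days that contain both tags:
0308,0902: 1
0308,0200: 1
0308,0100: 0
0308,0701: 0
0902,0200: 2
0902,0100: 2
0902,0701: 3
0200,0100: 2
0200,0701: 2
0100,0701: 2

Scan (keep pairs with a count greater than 1):
0902,0200: 2
0902,0100: 2
0902,0701: 3
0200,0100: 2
0200,0701: 2
0100,0701: 2

Join (candidate triples whose sub-pairs are all frequent), then scan:
0902,0200,0100: 1
0902,0200,0701: 2
0902,0100,0701: 2
0200,0100,0701: 1
Keeping counts greater than 1 leaves 0902,0200,0701 and 0902,0100,0701. The only possible 4-tag candidate, 0902,0200,0100,0701, is pruned because its subset 0902,0200,0100 is not frequent, so the search stops here. Both frequent triples contain the pair 0902,0701, which is also the most frequent pair (3 of the 6 days); their third members, 0100 and 0200, are the broad tags described at the start.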
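The whole level-wise procedure (join, prune, scan, filter) fits in one short function. This is a minimal textbook-style Apriori sketch, not the author's code; `min_count=2` encodes the "greater than 1" threshold, and itemsets are frozensets, so the order of tags within a day is ignored.

```python
from itertools import combinations


def apriori(transactions: list[frozenset[str]], min_count: int) -> dict[frozenset[str], int]:
    """Return every itemset contained in at least `min_count` transactions,
    mapped to its count, using the level-wise Apriori search."""

    def count(candidates: set[frozenset[str]]) -> dict[frozenset[str], int]:
        # Scan: count the transactions that contain each candidate itemset.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: every single item is a candidate.
    items = {frozenset([item]) for t in transactions for item in t}
    frequent = {s: n for s, n in count(items).items() if n >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        frequent = {s: n for s, n in count(candidates).items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


days = [
    "0308->0902",
    "0200->0308",
    "0200->0100",
    "0100->0902->0904->0701->0200",
    "0100->0902->0701",
    "0302->0701->0902->0200->0402",
]
transactions = [frozenset(d.split("->")) for d in days]

# Print by size, then by descending count, then alphabetically.
for itemset, n in sorted(apriori(transactions, min_count=2).items(),
                         key=lambda kv: (len(kv[0]), -kv[1], sorted(kv[0]))):
    print(",".join(sorted(itemset)), n)
```

With `min_count=2` this yields five frequent tags, six frequent pairs and two frequent triples, and the most frequent pair is 0701,0902 with a count of 3. The join here simply unions pairs of frequent itemsets and relies on the prune step. The classic formulation joins only itemsets that share a sorted (k-2)-prefix, which generates fewer candidates but gives the same result.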

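0100 is "Basic Services", a broad catch-all tag, whereas 0902 is far more telling, so the pattern we finally keep is:
0902->0701
From now on, whenever this user views a page tagged 0902, we push articles and ads related to 0701; with luck, that will make the user dislike the ads a little less. =~=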
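Apriori itself only yields unordered itemsets; reading one as the rule 0902->0701 is a further step. A standard way to score such a rule, which the original does not compute, is its confidence: among the days that contain 0902, the fraction that also contain 0701. Here is a minimal sketch over the same six daily tag sets; the helper name `confidence` is ours.

```python
DAYS = [
    "0308->0902",
    "0200->0308",
    "0200->0100",
    "0100->0902->0904->0701->0200",
    "0100->0902->0701",
    "0302->0701->0902->0200->0402",
]
transactions = [set(d.split("->")) for d in DAYS]


def confidence(transactions: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        return 0.0  # the antecedent never occurs, so the rule never fires
    both = sum(1 for t in with_a if consequent <= t)
    return both / len(with_a)


print(confidence(transactions, {"0902"}, {"0701"}))  # 0.75
```

Here 0902 appears on 4 days and 0701 appears on 3 of those, so the confidence is 3/4 = 0.75.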
