Kafka ordered consumption

https://www.cnblogs.com/hopelee/p/7285340.html    Kafka and pykafka

https://blog.csdn.net/bigtree_3721/article/details/80953197     discussion of ordered consumption

https://blog.csdn.net/clean_fish/article/details/90632988  producing messages with pykafka

https://www.jianshu.com/p/453c6e7ff81c   RabbitMQ design, for reference

https://blog.csdn.net/chaiyu2002/article/details/89472416   how Kafka offsets are set

Ordered consumption requires a single partition: Kafka only guarantees message order within one partition, so with multiple partitions only per-partition (or per-key) order is preserved. See the sketch below.
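A minimal sketch of in-order production with pykafka (linked above), assuming a broker at "host:9092" and a hypothetical topic "ordered_topic" created with a single partition; the synchronous producer blocks on every produce() call, so the client cannot reorder messages through async batching:

from pykafka import KafkaClient

client = KafkaClient(hosts="host:9092")    # placeholder broker address
topic = client.topics[b"ordered_topic"]    # hypothetical single-partition topic

# Synchronous producer: each produce() blocks until the broker acknowledges it,
# so messages land in the partition exactly in the order they were produced.
with topic.get_sync_producer() as producer:
    for i in range(10):
        producer.produce(("event-%d" % i).encode("utf-8"))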

https://blog.csdn.net/qq_24975309/article/details/82026022 entropy weight method
https://github.com/paulbrodersen/entropy_based_binning/blob/master/entropy_based_binning.py entropy-based binning
https://github.com/lisette-espin/pychimerge ChiMerge algorithm (see the sketch after this list)
https://blog.csdn.net/pzw_0612/article/details/45280411 notes on scipy
http://www.cnblogs.com/hdu-zsk/p/6293721.html significance testing
https://www.cnblogs.com/think-and-do/p/6509239.html t, chi-squared, and F distributions
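ChiMerge decides whether two adjacent bins may be merged with a chi-square test of independence on their class counts. A hedged sketch using scipy.stats (the counts are made-up illustration data; correction=False matches the plain ChiMerge statistic, since scipy's default applies Yates' correction to 2x2 tables):

import numpy as np
from scipy.stats import chi2_contingency

# Rows: two adjacent candidate bins; columns: class counts within each bin.
observed = np.array([[10, 2],
                     [8, 3]])

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print("chi2 =", chi2, "p =", p_value)

# A high p-value means the class distributions of the two bins do not differ
# significantly, so ChiMerge would merge them.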

MITIE

http://www.crownpku.com/2017/07/27/%E7%94%A8Rasa_NLU%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84%E4%B8%AD%E6%96%87NLU%E7%B3%BB%E7%BB%9F.html   building your own Chinese NLU system with Rasa NLU, for reference

https://blog.csdn.net/qq_32166627/article/details/68942216   training a word2vec model (see the sketch below)
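A minimal word2vec training sketch with gensim (an assumption, not necessarily the library used in the post above); corpus.txt is a hypothetical file with one whitespace-tokenised sentence per line, so Chinese text would first need segmentation, e.g. with jieba. Parameter names follow gensim 4.x, where "size" became "vector_size":

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# corpus.txt: one tokenised sentence per line (hypothetical file).
sentences = LineSentence("corpus.txt")

model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)
model.save("word2vec.model")

# Query the trained vectors; the query word must exist in the vocabulary.
print(model.wv.most_similar("hadoop", topn=5))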


logstash filebeat

wget https://download.elastic.co/logstash/logstash/logstash-2.3.2.tar.gz

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
/etc/yum.repos.d/elastic.repo:
[elastic-5.x]
name=Elastic repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
sudo yum install filebeat
sudo chkconfig --add filebeat

filebeat.publish_async: false
filebeat.spool_size: 8192
filebeat.idle_timeout: 5s
max_procs: 1
queue_size: 1000

filebeat.prospectors:
- input_type: log
  paths:
    - /var/www/html/hdp/hadoop-2.6.5/logs/hdfs-audit.log
  #tail_files: true
  harvester_buffer_size: 8192

output.kafka:
  enabled: true
  hosts: ["host:9092"]
  topic: "hdfs_audit_log_sandbox"
  client_id: "ansible-m1"
  worker: 10
  max_retries: 3
  bulk_max_size: 8192
  channel_buffer_size: 512
  timeout: 10
  broker_timeout: 3s
  keep_alive: 0
  compression: none
  max_message_bytes: 1000000
  required_acks: 0
  flush_interval: 1

logging.metrics.period: 10s

processors:
- include_fields:
    fields: ["message", "beat.hostname"]


filebeat.sh -e -c ./filebeat.yml > /dev/null 2>&1 &
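A quick way to verify that Filebeat is actually shipping the audit log to Kafka, assuming pykafka (linked at the top) and the broker/topic from output.kafka above:

from pykafka import KafkaClient

client = KafkaClient(hosts="host:9092")
topic = client.topics[b"hdfs_audit_log_sandbox"]

# Stop after 5 s of silence instead of blocking forever.
consumer = topic.get_simple_consumer(consumer_timeout_ms=5000)
for message in consumer:
    if message is not None:
        print(message.offset, message.value.decode("utf-8", errors="replace"))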