A look at Storm Trident's state
Published: 2019-06-19


This article takes a look at state in Storm Trident.

StateType

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/StateType.java

```java
public enum StateType {
    NON_TRANSACTIONAL,
    TRANSACTIONAL,
    OPAQUE
}
```
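  • StateType has three values: NON_TRANSACTIONAL (non-transactional), TRANSACTIONAL (transactional), and OPAQUE (opaque transactional)
  • Correspondingly, there are three kinds of spouts: non-transactional, transactional, and opaque transactional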
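To make the pairing between state types and spout types concrete, here is a small sketch that encodes the rules from the Javadoc of Storm's State interface: repeat transactional state is idempotent for transactional spouts, and opaque transactional state is idempotent for opaque or transactional spouts. StateSpoutCompat, SpoutKind, isIdempotent and compatibleSpouts are hypothetical names made up for this example; the StateType enum here only mirrors the constants of Storm's enum and is not the Storm class.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper (not part of Storm) encoding the compatibility rules
// stated in the Javadoc of org.apache.storm.trident.state.State.
class StateSpoutCompat {
    // Mirrors the constants of Storm's StateType enum.
    enum StateType { NON_TRANSACTIONAL, TRANSACTIONAL, OPAQUE }

    // Hypothetical enum for the three spout kinds; Storm has no such enum.
    enum SpoutKind { NON_TRANSACTIONAL, TRANSACTIONAL, OPAQUE }

    // True if replayed batches from this spout cannot corrupt this kind of state.
    static boolean isIdempotent(StateType state, SpoutKind spout) {
        switch (state) {
            case TRANSACTIONAL:
                // "repeat transactional is idempotent for transactional spouts"
                return spout == SpoutKind.TRANSACTIONAL;
            case OPAQUE:
                // "opaque transactional is idempotent for opaque or transactional spouts"
                return spout == SpoutKind.TRANSACTIONAL || spout == SpoutKind.OPAQUE;
            default:
                // non-transactional state ignores commits, so replays are never safe
                return false;
        }
    }

    // Collects the spout kinds that give exactly-once results for a state type.
    static Set<SpoutKind> compatibleSpouts(StateType state) {
        Set<SpoutKind> result = EnumSet.noneOf(SpoutKind.class);
        for (SpoutKind s : SpoutKind.values()) {
            if (isIdempotent(state, s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (StateType t : StateType.values()) {
            System.out.println(t + " -> " + compatibleSpouts(t));
        }
    }
}
```

Running main prints `NON_TRANSACTIONAL -> []`, `TRANSACTIONAL -> [TRANSACTIONAL]` and `OPAQUE -> [TRANSACTIONAL, OPAQUE]`, which is why opaque state is the safer default when the spout type may vary.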

State

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/State.java

```java
/**
 * There's 3 different kinds of state:
 *
 * 1. non-transactional: ignores commits, updates are permanent. no rollback. a cassandra incrementing state would be like this
 * 2. repeat-transactional: idempotent as long as all batches for a txid are identical
 * 3. opaque-transactional: the most general kind of state. updates are always done based on the previous version of the value if
 * the current commit = latest stored commit Idempotent even if the batch for a txid can change.
 *
 * repeat transactional is idempotent for transactional spouts
 * opaque transactional is idempotent for opaque or transactional spouts
 *
 * Trident should log warnings when state is idempotent but updates will not be idempotent because of spout
 */
// retrieving is encapsulated in Retrieval interface
public interface State {
    void beginCommit(Long txid); // can be null for things like partitionPersist occuring off a DRPC stream

    void commit(Long txid);
}
```
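  • non-transactional: commits are ignored, updates are permanent, and there is no rollback; an incrementing state in Cassandra would be of this type. It provides at-most-once or at-least-once semantics
  • repeat-transactional (transactional for short): requires that a batch always carries the same txid whether or not it is replayed, and that its tuples never change; each tuple belongs to exactly one batch, and batches never overlap. When updating state, a replay that encounters the same txid can simply be skipped. It needs less state in the database but is less fault-tolerant; it guarantees exactly-once semantics
  • opaque-transactional (opaque for short) is the more widely used type and is more fault-tolerant than transactional. It does not require a tuple to always stay in the same batch/txid: a tuple may fail in one batch and later succeed in another, yet each tuple is still processed successfully exactly once, in exactly one batch. OpaqueTridentKafkaSpout implements this type and can tolerate the loss of Kafka nodes. When updating state, a replay that encounters the same txid must reapply the update on top of prevValue and overwrite the current value. It needs more space in the database to store state but is more fault-tolerant; it guarantees exactly-once semantics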
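The three update rules can be compared on a single counter. The following is a simplified model, not Storm code: Batch, ReplaySimulation and its methods are hypothetical, and each method keeps only the bookkeeping its state type would persist (nothing, a (txid, value) pair, or a (currTxid, curr, prev) triple). The scenario replays txid 1 once with identical content, as a transactional spout would.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one counter under the three state types (not Storm code).
class ReplaySimulation {
    // One attempt at processing a batch: its txid and how many tuples it counted.
    static final class Batch {
        final long txid;
        final long tuples;

        Batch(long txid, long tuples) {
            this.txid = txid;
            this.tuples = tuples;
        }
    }

    // Non-transactional: no txid is stored, so every attempt is added again.
    static long nonTransactional(List<Batch> attempts) {
        long value = 0;
        for (Batch b : attempts) {
            value += b.tuples;
        }
        return value;
    }

    // Transactional: store (txid, value) and skip an attempt whose txid is already stored.
    static long transactional(List<Batch> attempts) {
        Long storedTxid = null;
        long value = 0;
        for (Batch b : attempts) {
            if (storedTxid != null && storedTxid == b.txid) {
                continue; // same txid already applied: skip
            }
            value += b.tuples;
            storedTxid = b.txid;
        }
        return value;
    }

    // Opaque: store (currTxid, curr, prev); a repeated txid is recomputed from prev.
    static long opaque(List<Batch> attempts) {
        Long currTxid = null;
        long curr = 0;
        long prev = 0;
        for (Batch b : attempts) {
            if (currTxid == null || currTxid != b.txid) {
                prev = curr; // new batch: the previous batch's value becomes prev
            }
            curr = prev + b.tuples; // on a replay prev is unchanged, so curr is overwritten
            currTxid = b.txid;
        }
        return curr;
    }

    public static void main(String[] args) {
        // txid 1 is written, fails before commit and is replayed unchanged; then txid 2 runs.
        List<Batch> attempts = Arrays.asList(new Batch(1, 3), new Batch(1, 3), new Batch(2, 2));
        System.out.println("non-transactional: " + nonTransactional(attempts)); // 8
        System.out.println("transactional:     " + transactional(attempts));    // 5
        System.out.println("opaque:            " + opaque(attempts));           // 5
    }
}
```

The non-transactional counter ends at 8 because txid 1 is counted twice; the other two both end at the correct 5. The difference between transactional and opaque only shows up when a replay carries different tuples.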

MapState

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/MapState.java

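```java
import java.util.List;
import org.apache.storm.trident.state.ValueUpdater;

public interface MapState<T> extends ReadOnlyMapState<T> {
    List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters);

    void multiPut(List<List<Object>> keys, List<T> vals);
}
```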
  • MapState extends the ReadOnlyMapState interface, which in turn extends the State interface
  • Below we look at several implementations of MapState
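In MapState every key is a List<Object>, typically the values of the groupBy fields for one group, and a whole batch of keys is passed at once. The sketch below is a hypothetical in-memory store (InMemoryBackingMap is a made-up name) that mirrors the multiGet/multiPut shape of IBackingMap without implementing Storm's interface. It relies on java.util.List's value-based equals and hashCode, so a freshly built key with the same elements finds the same entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a backing store. Method names mirror
// Storm's IBackingMap (multiGet/multiPut), but this class does not implement it.
class InMemoryBackingMap<T> {
    // List<Object> has value-based equals/hashCode, so a key such as ["storm", 1]
    // built from the group-by fields works directly as a HashMap key.
    private final Map<List<Object>, T> store = new HashMap<>();

    List<T> multiGet(List<List<Object>> keys) {
        List<T> ret = new ArrayList<>(keys.size());
        for (List<Object> key : keys) {
            ret.add(store.get(key)); // null when the key has never been written
        }
        return ret;
    }

    void multiPut(List<List<Object>> keys, List<T> vals) {
        for (int i = 0; i < keys.size(); i++) {
            store.put(new ArrayList<>(keys.get(i)), vals.get(i)); // defensive copy of the key
        }
    }

    public static void main(String[] args) {
        InMemoryBackingMap<Long> map = new InMemoryBackingMap<>();
        List<List<Object>> keys = Arrays.asList(
                Arrays.<Object>asList("storm", 1),
                Arrays.<Object>asList("trident", 2));
        map.multiPut(keys, Arrays.asList(10L, 20L));
        // A newly built key with equal elements finds the same entry: prints [10]
        System.out.println(map.multiGet(Arrays.asList(Arrays.<Object>asList("storm", 1))));
    }
}
```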

NonTransactionalMap

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/NonTransactionalMap.java

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.trident.state.ValueUpdater;

public class NonTransactionalMap<T> implements MapState<T> {
    IBackingMap<T> _backing;

    protected NonTransactionalMap(IBackingMap<T> backing) {
        _backing = backing;
    }

    public static <T> MapState<T> build(IBackingMap<T> backing) {
        return new NonTransactionalMap<T>(backing);
    }

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        return _backing.multiGet(keys);
    }

    @Override
    public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) {
        List<T> curr = _backing.multiGet(keys);
        List<T> ret = new ArrayList<T>(curr.size());
        for (int i = 0; i < curr.size(); i++) {
            T currVal = curr.get(i);
            ValueUpdater<T> updater = updaters.get(i);
            ret.add(updater.update(currVal));
        }
        _backing.multiPut(keys, ret);
        return ret;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        _backing.multiPut(keys, vals);
    }

    @Override
    public void beginCommit(Long txid) {
    }

    @Override
    public void commit(Long txid) {
    }
}
```
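  • NonTransactionalMap wraps an IBackingMap; its beginCommit and commit methods do nothing
  • multiUpdate reads the current values with multiGet, applies each ValueUpdater to build List ret, and writes the results back with IBackingMap's multiPut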
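The read-modify-write that NonTransactionalMap.multiUpdate performs can be sketched without Storm on the classpath. Here NonTxSketch and its addAll helper are hypothetical, ValueUpdater only mirrors the shape of Storm's interface, and a HashMap stands in for the IBackingMap. Because nothing records which txid produced a value, feeding the same batch twice simply adds it twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of NonTransactionalMap.multiUpdate; not the Storm class.
class NonTxSketch {
    // Same shape as Storm's ValueUpdater<T>, redeclared so the sketch has no dependencies.
    interface ValueUpdater<T> {
        T update(T stored);
    }

    // Stands in for the IBackingMap; each key holds the group-by values of one group.
    private final Map<List<Object>, Long> backing = new HashMap<>();

    // multiGet, apply one updater per key, then multiPut the results; no txid is stored.
    List<Long> multiUpdate(List<List<Object>> keys, List<ValueUpdater<Long>> updaters) {
        List<Long> ret = new ArrayList<>(keys.size());
        for (int i = 0; i < keys.size(); i++) {
            Long curr = backing.get(keys.get(i)); // read current value (null if absent)
            ret.add(updaters.get(i).update(curr));
        }
        for (int i = 0; i < keys.size(); i++) {
            backing.put(keys.get(i), ret.get(i)); // write all results back
        }
        return ret;
    }

    // Hypothetical helper: add delta to every key, treating a missing value as 0.
    List<Long> addAll(List<List<Object>> keys, long delta) {
        List<ValueUpdater<Long>> updaters = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            updaters.add(stored -> (stored == null ? 0L : stored) + delta);
        }
        return multiUpdate(keys, updaters);
    }

    public static void main(String[] args) {
        NonTxSketch state = new NonTxSketch();
        List<List<Object>> keys = Arrays.asList(Arrays.<Object>asList("storm"), Arrays.<Object>asList("trident"));
        System.out.println(state.addAll(keys, 1)); // [1, 1]
        System.out.println(state.addAll(keys, 1)); // [2, 2]: a replayed batch is counted again
    }
}
```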

TransactionalMap

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/TransactionalMap.java

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.trident.state.TransactionalValue;
import org.apache.storm.trident.state.ValueUpdater;

public class TransactionalMap<T> implements MapState<T> {
    CachedBatchReadsMap<TransactionalValue> _backing;
    Long _currTx;

    protected TransactionalMap(IBackingMap<TransactionalValue> backing) {
        _backing = new CachedBatchReadsMap(backing);
    }

    public static <T> MapState<T> build(IBackingMap<TransactionalValue> backing) {
        return new TransactionalMap<T>(backing);
    }

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        List<CachedBatchReadsMap.RetVal<TransactionalValue>> vals = _backing.multiGet(keys);
        List<T> ret = new ArrayList<T>(vals.size());
        for (CachedBatchReadsMap.RetVal<TransactionalValue> retval : vals) {
            TransactionalValue v = retval.val;
            if (v != null) {
                ret.add((T) v.getVal());
            } else {
                ret.add(null);
            }
        }
        return ret;
    }

    @Override
    public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) {
        List<CachedBatchReadsMap.RetVal<TransactionalValue>> curr = _backing.multiGet(keys);
        List<TransactionalValue> newVals = new ArrayList<TransactionalValue>(curr.size());
        List<List<Object>> newKeys = new ArrayList();
        List<T> ret = new ArrayList<T>();
        for (int i = 0; i < curr.size(); i++) {
            CachedBatchReadsMap.RetVal<TransactionalValue> retval = curr.get(i);
            TransactionalValue<T> val = retval.val;
            ValueUpdater<T> updater = updaters.get(i);
            TransactionalValue<T> newVal;
            boolean changed = false;
            if (val == null) {
                newVal = new TransactionalValue<T>(_currTx, updater.update(null));
                changed = true;
            } else {
                if (_currTx != null && _currTx.equals(val.getTxid()) && !retval.cached) {
                    newVal = val;
                } else {
                    newVal = new TransactionalValue<T>(_currTx, updater.update(val.getVal()));
                    changed = true;
                }
            }
            ret.add(newVal.getVal());
            if (changed) {
                newVals.add(newVal);
                newKeys.add(keys.get(i));
            }
        }
        if (!newKeys.isEmpty()) {
            _backing.multiPut(newKeys, newVals);
        }
        return ret;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        List<TransactionalValue> newVals = new ArrayList<TransactionalValue>(vals.size());
        for (T val : vals) {
            newVals.add(new TransactionalValue<T>(_currTx, val));
        }
        _backing.multiPut(keys, newVals);
    }

    @Override
    public void beginCommit(Long txid) {
        _currTx = txid;
        _backing.reset();
    }

    @Override
    public void commit(Long txid) {
        _currTx = null;
        _backing.reset();
    }
}
```
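  • TransactionalMap wraps the backing map in a CachedBatchReadsMap whose value type is TransactionalValue. beginCommit sets the current txid and resets _backing; commit clears the txid and resets _backing again
  • In multiUpdate, if the stored value's txid equals _currTx and !retval.cached holds (i.e. the value was not written by this map during the current batch, so it comes from an earlier attempt of the same txid), the update is skipped and newVal = val is used
  • multiPut wraps each value in a TransactionalValue tagged with _currTx and passes them to CachedBatchReadsMap.multiPut(List<List<Object>> keys, List<T> vals), which writes them to the backing map and then caches them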
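The role of retval.cached in TransactionalMap is easiest to see with a single key. In the sketch below (hypothetical class TxMapSketch, String keys instead of List<Object>, a long counter instead of a generic T), every write made during a batch is also recorded in a per-batch cache, mirroring what CachedBatchReadsMap does. A second update within the same batch therefore sees a cached value and is applied, while a value left in the store by an earlier, failed attempt of the same txid is not cached and is skipped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of TransactionalMap.multiUpdate, one key at a time.
class TxMapSketch {
    // Mirrors the (txid, value) pair kept by Storm's TransactionalValue.
    static final class TxValue {
        final Long txid;
        final long val;

        TxValue(Long txid, long val) {
            this.txid = txid;
            this.val = val;
        }
    }

    private final Map<String, TxValue> backing = new HashMap<>(); // durable store
    private final Map<String, TxValue> cache = new HashMap<>();   // values written in this batch
    private Long currTx;

    void beginCommit(long txid) {
        currTx = txid;
        cache.clear(); // plays the role of _backing.reset()
    }

    void commit() {
        currTx = null;
        cache.clear();
    }

    // Adds delta to key, following the decision rule of TransactionalMap.multiUpdate.
    long add(String key, long delta) {
        boolean cached = cache.containsKey(key);
        TxValue val = cached ? cache.get(key) : backing.get(key);
        TxValue newVal;
        if (val == null) {
            newVal = new TxValue(currTx, delta);
        } else if (currTx != null && currTx.equals(val.txid) && !cached) {
            return val.val; // written by an earlier attempt of this txid: skip
        } else {
            newVal = new TxValue(currTx, val.val + delta);
        }
        backing.put(key, newVal); // write through ...
        cache.put(key, newVal);   // ... and remember it for the rest of this batch
        return newVal.val;
    }

    public static void main(String[] args) {
        TxMapSketch state = new TxMapSketch();
        state.beginCommit(1);
        state.add("storm", 3);
        System.out.println(state.add("storm", 2)); // 5: cached in this batch, so applied
        // the worker fails before commit; txid 1 is replayed with identical tuples
        state.beginCommit(1);
        System.out.println(state.add("storm", 3)); // 5: stored txid is 1 and not cached, skipped
        System.out.println(state.add("storm", 2)); // 5: still skipped
        state.commit();
        state.beginCommit(2);
        System.out.println(state.add("storm", 1)); // 6
        state.commit();
    }
}
```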

OpaqueMap

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/OpaqueMap.java

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.trident.state.OpaqueValue;
import org.apache.storm.trident.state.ValueUpdater;

public class OpaqueMap<T> implements MapState<T> {
    CachedBatchReadsMap<OpaqueValue> _backing;
    Long _currTx;

    protected OpaqueMap(IBackingMap<OpaqueValue> backing) {
        _backing = new CachedBatchReadsMap(backing);
    }

    public static <T> MapState<T> build(IBackingMap<OpaqueValue> backing) {
        return new OpaqueMap<T>(backing);
    }

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        List<CachedBatchReadsMap.RetVal<OpaqueValue>> curr = _backing.multiGet(keys);
        List<T> ret = new ArrayList<T>(curr.size());
        for (CachedBatchReadsMap.RetVal<OpaqueValue> retval : curr) {
            OpaqueValue val = retval.val;
            if (val != null) {
                if (retval.cached) {
                    ret.add((T) val.getCurr());
                } else {
                    ret.add((T) val.get(_currTx));
                }
            } else {
                ret.add(null);
            }
        }
        return ret;
    }

    @Override
    public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) {
        List<CachedBatchReadsMap.RetVal<OpaqueValue>> curr = _backing.multiGet(keys);
        List<OpaqueValue> newVals = new ArrayList<OpaqueValue>(curr.size());
        List<T> ret = new ArrayList<T>();
        for (int i = 0; i < curr.size(); i++) {
            CachedBatchReadsMap.RetVal<OpaqueValue> retval = curr.get(i);
            OpaqueValue<T> val = retval.val;
            ValueUpdater<T> updater = updaters.get(i);
            T prev;
            if (val == null) {
                prev = null;
            } else {
                if (retval.cached) {
                    prev = val.getCurr();
                } else {
                    prev = val.get(_currTx);
                }
            }
            T newVal = updater.update(prev);
            ret.add(newVal);
            OpaqueValue<T> newOpaqueVal;
            if (val == null) {
                newOpaqueVal = new OpaqueValue<T>(_currTx, newVal);
            } else {
                newOpaqueVal = val.update(_currTx, newVal);
            }
            newVals.add(newOpaqueVal);
        }
        _backing.multiPut(keys, newVals);
        return ret;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        List<ValueUpdater> updaters = new ArrayList<ValueUpdater>(vals.size());
        for (T val : vals) {
            updaters.add(new ReplaceUpdater<T>(val));
        }
        multiUpdate(keys, updaters);
    }

    @Override
    public void beginCommit(Long txid) {
        _currTx = txid;
        _backing.reset();
    }

    @Override
    public void commit(Long txid) {
        _currTx = null;
        _backing.reset();
    }

    static class ReplaceUpdater<T> implements ValueUpdater<T> {
        T _t;

        public ReplaceUpdater(T t) {
            _t = t;
        }

        @Override
        public T update(Object stored) {
            return _t;
        }
    }
}
```
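  • OpaqueMap likewise wraps the backing map in a CachedBatchReadsMap, here with OpaqueValue as the value type; beginCommit sets the current txid and resets _backing, and commit clears the txid and resets _backing again
  • Unlike TransactionalMap, its multiPut wraps each value in a ReplaceUpdater and delegates to multiUpdate, so the new value unconditionally overwrites the old one
  • multiUpdate also differs from TransactionalMap's: it never skips. It computes newVal by applying the updater to a prev value, which is val.getCurr() for a value cached in the current batch and val.get(_currTx) otherwise (the stored prev if the stored txid equals _currTx, i.e. a replay, and the stored curr if the stored txid is older)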
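The (currTxid, curr, prev) bookkeeping behind OpaqueMap can be modelled in a few lines. The sketch below is a hypothetical simplification (OpaqueSketch is a made-up name; it uses primitive long txids although Storm also accepts a null txid, a long counter, and no CachedBatchReadsMap cache); its get and update methods follow the logic of Storm's OpaqueValue.get and OpaqueValue.update. The scenario is the one opaque spouts allow: the replay of txid 1 contains fewer tuples than the first attempt because one tuple has moved to txid 2.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of OpaqueValue and OpaqueMap.multiUpdate.
class OpaqueSketch {
    static final class OpaqueValue {
        final long currTxid;
        final long curr;
        final long prev;

        OpaqueValue(long currTxid, long curr, long prev) {
            this.currTxid = currTxid;
            this.curr = curr;
            this.prev = prev;
        }

        // Base value for batch txid: curr if written by an older batch, prev if by this txid.
        long get(long txid) {
            if (currTxid < txid) {
                return curr;
            } else if (currTxid == txid) {
                return prev;
            }
            throw new IllegalStateException("batch " + txid + " is behind state's batch " + currTxid);
        }

        // Triple after writing newVal for batch txid; prev is the value the update built on.
        OpaqueValue update(long txid, long newVal) {
            return new OpaqueValue(txid, newVal, get(txid));
        }
    }

    private final Map<String, OpaqueValue> store = new HashMap<>();

    // Adds delta to key for batch txid and returns the new current value.
    long add(String key, long txid, long delta) {
        OpaqueValue val = store.get(key);
        long prev = (val == null) ? 0 : val.get(txid);
        long newVal = prev + delta;
        store.put(key, val == null ? new OpaqueValue(txid, newVal, 0) : val.update(txid, newVal));
        return newVal;
    }

    public static void main(String[] args) {
        OpaqueSketch state = new OpaqueSketch();
        System.out.println(state.add("storm", 1, 3)); // 3: first attempt of txid 1
        System.out.println(state.add("storm", 1, 2)); // 2: replay of txid 1 with one tuple fewer
        System.out.println(state.add("storm", 2, 1)); // 3: the moved tuple, counted once in txid 2
    }
}
```

With the transactional skip rule, the replay would have been ignored, leaving 3 after txid 1 and 4 after txid 2, so the moved tuple would be counted twice.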

Summary

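  • Trident applies state updates strictly in batch order: for example, the state update for the batch with txid 3 is not applied until the update for txid 2 has succeeded
  • State comes in three types, non-transactional, transactional, and opaque transactional, matching the three spout types
    • non-transactional cannot guarantee exactly once; it may give at-least-once or at-most-once semantics. Its state computation is shown by NonTransactionalMap, whose beginCommit and commit do nothing
    • transactional guarantees exactly once but is strict: a replayed batch must keep the same txid and the same tuples, so it is less fault-tolerant. Its state computation is relatively simple (see TransactionalMap): if the stored value already carries the same txid, skip the update
    • opaque transactional also guarantees exactly once. It allows a tuple that failed in one batch to be processed in another, so it is more fault-tolerant, but its state must additionally store the prev value (see OpaqueMap): if the stored value carries the same txid, recompute from prev and overwrite the current value
  • Trident already encapsulates the exactly-once state computation: to use it, pass the appropriate StateFactory to persistentAggregate. Factories that support several StateTypes can expose a StateType option, so that different arguments produce state with different transactional semantics. You can also write your own state factory by implementing StateFactory, define custom stateQuery lookups by extending BaseQueryFunction, and implement custom updates by extending BaseStateUpdater and passing it to partitionPersist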
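The ordering guarantee in the first point can be pictured with a small coordinator: batches may finish processing in any order, but their state commits are released strictly by txid. OrderedCommitter and its methods are hypothetical names made up for this sketch; it is not Trident's actual coordinator code, and the comment only marks where a real system would call State.beginCommit and State.commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch (not Trident code): commits are released strictly in txid order,
// no matter in which order batches finish processing.
class OrderedCommitter {
    private long nextTxid;                                     // next txid allowed to commit
    private final TreeSet<Long> finished = new TreeSet<>();    // processed, waiting to commit
    private final List<Long> commitOrder = new ArrayList<>();  // recorded for inspection

    OrderedCommitter(long firstTxid) {
        this.nextTxid = firstTxid;
    }

    // Called when a batch finishes processing; commits every batch that is now unblocked.
    void batchFinished(long txid) {
        finished.add(txid);
        while (!finished.isEmpty() && finished.first() == nextTxid) {
            finished.pollFirst();
            commitOrder.add(nextTxid); // a real system would run state.beginCommit/commit here
            nextTxid++;
        }
    }

    List<Long> commitOrder() {
        return commitOrder;
    }

    public static void main(String[] args) {
        OrderedCommitter c = new OrderedCommitter(1);
        c.batchFinished(3); // txid 3 finishes first but must wait for 1 and 2
        c.batchFinished(1);
        c.batchFinished(2);
        System.out.println(c.commitOrder()); // [1, 2, 3]
    }
}
```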


