
SolrCloud Read and Write Side Fault Tolerance

Programming Diary 2024-11-21 11:00:01


SolrCloud supports elasticity, high availability, and fault tolerance in reads and writes. In practice, this means that you can always make requests to a large cluster: reads return results whenever possible, even if some nodes are down, and writes are acknowledged only if they are durable, i.e., you won't lose data.

Read Side Fault Tolerance

In a SolrCloud cluster, each individual node load balances read requests across all the replicas in the collection. You still need a load balancer on the 'outside' that talks to the cluster, or a smart client that understands how to read and interact with Solr's metadata in ZooKeeper and only requires the ZooKeeper ensemble's address to start discovering which nodes it should send requests to. (Solr provides a smart Java client in SolrJ called CloudSolrClient.)

Even if some nodes in the cluster are offline or unreachable, a Solr node will be able to correctly respond to a search request as long as it can communicate with at least one replica of every shard, or one replica of every relevant shard if the user limited the search via the 'shards' or '_route_' parameters. The more replicas there are of every shard, the more likely it is that the Solr cluster will be able to handle search requests in the event of node failures.
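The availability condition above can be expressed as a small check. The following is a minimal sketch, not Solr code: the function name, the `shard_replicas` mapping, and the node names are all hypothetical and exist only to illustrate the rule "at least one reachable replica per relevant shard".

```python
from typing import Dict, Iterable, List, Optional, Set


def can_serve_search(
    shard_replicas: Dict[str, List[str]],
    reachable_nodes: Set[str],
    requested_shards: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if every relevant shard has at least one reachable replica.

    requested_shards models a search limited to some shards; None means
    the search covers every shard in the collection.
    """
    if requested_shards is not None:
        relevant = list(requested_shards)
    else:
        relevant = list(shard_replicas)
    for shard in relevant:
        replicas = shard_replicas.get(shard, [])
        if not any(node in reachable_nodes for node in replicas):
            return False
    return True


# Hypothetical layout: two shards with two replicas each.
layout = {"shard1": ["nodeA", "nodeB"], "shard2": ["nodeC", "nodeD"]}

print(can_serve_search(layout, {"nodeA", "nodeC"}))              # True
print(can_serve_search(layout, {"nodeA", "nodeB"}))              # False
print(can_serve_search(layout, {"nodeA", "nodeB"}, ["shard1"]))  # True
```

With more replicas per shard, more combinations of node failures still leave one reachable replica in each list, which is why adding replicas improves read availability.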

`zkConnected`

A Solr node will return the results of a search request as long as it can communicate with at least one replica of every shard that it knows about, even if it cannot communicate with ZooKeeper at the time it receives the request. This is normally the preferred behavior from a fault tolerance standpoint, but it may result in stale or incorrect results if there have been major changes to the collection structure that the node has not been informed of via ZooKeeper (i.e., shards may have been added, removed, or split into sub-shards).

A zkConnected header is included in every search response indicating if the node that processed the request was connected with ZooKeeper at the time:

Solr Response with zkConnected

{
  "responseHeader": {
    "status": 0,
    "zkConnected": true,
    "QTime": 20,
    "params": {
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 107,
    "start": 0,
    "docs": [ ... ]
  }
}
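A client that cares about possibly stale results can inspect this header. The snippet below is a minimal sketch that assumes the response body is JSON shaped like the example above; the helper name `zk_connected` and the sample body are illustrative, not part of any Solr client library.

```python
import json


def zk_connected(response_body: str) -> bool:
    """Return True only if the response header reports zkConnected as true."""
    header = json.loads(response_body).get("responseHeader", {})
    return header.get("zkConnected") is True


# Sample body modeled on the example response (docs left empty here).
sample = """
{
  "responseHeader": {"status": 0, "zkConnected": false, "QTime": 20},
  "response": {"numFound": 107, "start": 0, "docs": []}
}
"""

if not zk_connected(sample):
    # The node answered without a ZooKeeper connection, so its view of the
    # collection structure may be out of date.
    print("warning: results may be stale")
```

Treating a missing header the same as `false` is a conservative choice in this sketch; a real client may prefer to handle that case separately.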

`shards.tolerant`

In the event that one or more shards queried are completely unavailable, then Solr's default behavior is to fail the request. However, there are many use-cases where partial results are acceptable and so Solr provides a boolean shards.tolerant parameter (default 'false'). If shards.tolerant=true then partial results may be returned. If the returned response does not contain results from all the appropriate shards then the response header contains a special flag called 'partialResults'. The client can specify 'shards.info' along with the 'shards.tolerant' parameter to retrieve more fine-grained details.

Example response with partialResults flag set to 'true':

Solr Response with partialResults

{
  "responseHeader": {
    "status": 0,
    "zkConnected": true,
    "partialResults": true,
    "QTime": 20,
    "params": {
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 77,
    "start": 0,
    "docs": [ ... ]
  }
}
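As a rough illustration of how a client might opt in to partial results, the sketch below builds a query URL with `shards.tolerant` and `shards.info` and checks the `partialResults` flag in a parsed response. The base URL, collection name, and `/select` path are assumptions made for the example, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tolerant_query(base_url: str, q: str) -> str:
    """Build a search URL that accepts partial results and asks for shard details."""
    params = {"q": q, "shards.tolerant": "true", "shards.info": "true"}
    return f"{base_url}/select?{urlencode(params)}"


def is_partial(response: dict) -> bool:
    """Return True if the response header carries partialResults=true."""
    return response.get("responseHeader", {}).get("partialResults") is True


# Hypothetical host and collection name.
url = build_tolerant_query("http://localhost:8983/solr/mycollection", "*:*")
print(parse_qs(urlsplit(url).query))

# Parsed response shaped like the example above.
response = {
    "responseHeader": {"status": 0, "zkConnected": True, "partialResults": True},
    "response": {"numFound": 77, "start": 0, "docs": []},
}
if is_partial(response):
    print("some shards did not contribute; numFound may undercount")
```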

Write Side Fault Tolerance

SolrCloud is designed to replicate documents to ensure redundancy for your data, and to enable you to send update requests to any node in the cluster. That node will determine if it hosts the leader for the appropriate shard, and if not, it will forward the request to the leader, which will then forward it to all existing replicas, using versioning to make sure every replica has the most up-to-date version. If the leader goes down, another replica can take its place. This architecture enables you to be certain that your data can be recovered in the event of a disaster, even if you are using Near Real Time Searching.
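The forwarding path described above can be pictured with a toy model. This is only a sketch of the idea (receive anywhere, route to the leader, fan out to replicas, let versions decide which copy wins); the `Replica` and `Shard` classes and the integer version counter are invented for illustration and do not mirror Solr's internal classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Replica:
    name: str
    # doc id -> (version, document)
    docs: Dict[str, Tuple[int, dict]] = field(default_factory=dict)

    def apply(self, doc_id: str, version: int, doc: dict) -> None:
        """Keep the update only if it is newer than what is already stored."""
        current = self.docs.get(doc_id)
        if current is None or version > current[0]:
            self.docs[doc_id] = (version, doc)


@dataclass
class Shard:
    leader: Replica
    followers: List[Replica]
    next_version: int = 1

    def update(self, received_by: str, doc_id: str, doc: dict) -> List[str]:
        """Route an update and return the list of nodes it passed through."""
        hops = [received_by]
        if received_by != self.leader.name:
            hops.append(self.leader.name)  # forwarded to the leader
        version = self.next_version        # the leader assigns the version
        self.next_version += 1
        self.leader.apply(doc_id, version, doc)
        for replica in self.followers:     # the leader fans out to replicas
            replica.apply(doc_id, version, doc)
            hops.append(replica.name)
        return hops


shard = Shard(Replica("leader"), [Replica("r2"), Replica("r3")])
print(shard.update("r2", "doc1", {"title": "hello"}))

# A late, older version of the same document is ignored.
shard.followers[0].apply("doc1", 0, {"title": "stale"})
print(shard.followers[0].docs["doc1"])
```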

Recovery

A Transaction Log is created for each node so that every change to content or organization is noted. The log is used to determine which content in the node should be included in a replica. When a new replica is created, it refers to the Leader and the Transaction Log to know which content to include. If it fails, it retries.

Because the Transaction Log is a record of updates, it makes indexing more robust: if indexing is interrupted, the uncommitted updates can be redone from the log.
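To make the replay idea concrete, here is a deliberately simplified toy model. It assumes a log that survives a crash while uncommitted in-memory state does not; the `ToyCore` class and its methods are hypothetical, and the model truncates the log on commit, which is a simplification rather than a description of how Solr manages its log files.

```python
class ToyCore:
    """Toy index with a durable update log and volatile uncommitted state."""

    def __init__(self) -> None:
        self.committed = {}  # documents that survived a commit
        self.pending = {}    # uncommitted updates held in memory
        self.tlog = []       # durable record of uncommitted updates

    def add(self, doc_id: str, doc: dict) -> None:
        self.tlog.append((doc_id, doc))  # record the update first
        self.pending[doc_id] = doc

    def commit(self) -> None:
        self.committed.update(self.pending)
        self.pending.clear()
        self.tlog.clear()                # simplification: truncate on commit

    def crash(self) -> None:
        self.pending.clear()             # in-memory state is lost

    def recover(self) -> None:
        for doc_id, doc in self.tlog:    # redo the uncommitted updates in order
            self.pending[doc_id] = doc


core = ToyCore()
core.add("1", {"title": "first"})
core.commit()
core.add("2", {"title": "second"})  # not yet committed
core.crash()
core.recover()
core.commit()
print(sorted(core.committed))
```

Without the `recover()` step, document "2" would be lost after the crash, which is the gap the log is there to close.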

If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a sync process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.

If an update fails because cores are reloading schemas and some have finished but others have not, the leader tells the nodes that the update failed and starts the recovery procedure.

Achieved Replication Factor

When using a replication factor greater than one, an update request may succeed on the shard leader but fail on one or more of the replicas. For instance, consider a collection with one shard and a replication factor of three. In this case, you have a shard leader and two additional replicas. If an update request succeeds on the leader but fails on both replicas, for whatever reason, the update request is still considered successful from the perspective of the client. The replicas that missed the update will sync with the leader when they recover.

Behind the scenes, this means that Solr has accepted updates that are only on one of the nodes (the current leader). Solr supports the optional min_rf parameter on update requests, which causes the server to return the achieved replication factor for an update request in the response. For the example scenario described above, if the client application included min_rf >= 1, then Solr would return rf=1 in the Solr response header because the request only succeeded on the leader. The update request will still be accepted, as the min_rf parameter only tells Solr that the client application wishes to know what the achieved replication factor was for the update request. In other words, min_rf does not mean Solr will enforce a minimum replication factor, as Solr does not support rolling back updates that succeed on a subset of replicas.

On the client side, if the achieved replication factor is less than the acceptable level, then the client application can take additional measures to handle the degraded state. For instance, a client application may want to keep a log of which update requests were sent while the state of the collection was degraded and then resend the updates once the problem has been resolved. In short, min_rf is an optional mechanism for a client application to be warned that an update request was accepted while the collection is in a degraded state.
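The client-side handling described above might look roughly like the following. This is a sketch under assumptions: it takes the achieved replication factor from an `rf` entry in the response header, as the text describes, and the `DegradedUpdateLog` class, its fields, and the update identifiers are hypothetical.

```python
from typing import List, Optional


def achieved_rf(response: dict) -> Optional[int]:
    """Read the achieved replication factor from a parsed update response."""
    return response.get("responseHeader", {}).get("rf")


class DegradedUpdateLog:
    """Remember updates that were accepted below the acceptable replication factor."""

    def __init__(self, acceptable_rf: int) -> None:
        self.acceptable_rf = acceptable_rf
        self.to_resend: List[str] = []

    def record(self, update_id: str, response: dict) -> bool:
        """Log the update if it was degraded; return True when it was logged."""
        rf = achieved_rf(response)
        # A missing rf is treated as "unknown" here and is not logged.
        degraded = rf is not None and rf < self.acceptable_rf
        if degraded:
            self.to_resend.append(update_id)
        return degraded


log = DegradedUpdateLog(acceptable_rf=2)
log.record("update-1", {"responseHeader": {"status": 0, "rf": 3}})
log.record("update-2", {"responseHeader": {"status": 0, "rf": 1}})
print(log.to_resend)  # ['update-2']
```

Once the collection is healthy again, the application can resend the updates listed in `to_resend`; how it identifies and stores those updates is left open here.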

Reposted from: https://my.oschina.net/u/172871/blog/854106
