Database sharding

DB sharding in YHD

There are two solutions when DB becoming bottleneck in yihaodian.

  • Scale up Upgrade Oracle DB, adding more CPU , Disk and memory to incrase I/O performance. This is for short term only, high cost.
  • Scale out Divide the order table to multiple DBs, which is support horizontal extension, for long term purpose.

Orgional Oracle is replaced by multiple MySQL DB, supporintg one master and multiple slaves, supporitng segratation of read and write. Leveraging MySQL built-in Master-slave replication (SLA<1 second)

sharding dimensions

  • DB Field chosing, it should chose the filed that lead to least SQL and code change, to make the access fall in one database, instead of multiple DBs, which result in high I/O and significant logic change.
  • Here is one practice – Get all SQL – Pick up top fields appear in where clause. – List break down from three categories
    1. Single ID, i.e. userID=?
    2. Multiple ID. i.e. userID in (?,?,?)
    3. Not show

|Field| Single ID | Multiple ID | Not show| |:—| —:| —:| —:| |userID | 120 | 40| 330| |orderID | 60 | 80| 360| |shopID | 15 | 0| 485| It’s obviously we should chose userID for sharding. Hold on, this is just static analysis, we should conduct *dynamic** study as well, so list most executed SQLs, e.g. top 15 SQL (account to 85% of SQL calls), if we conduct sharding by user ID, 85% of those SQL will fall in single DB and 13% fall in multiple DB, and only 2% will scan all DB, so the performance is must better than sharding on other ID fields.

sharding strategy

There are two type of strategies

  1. By value range, e.g. user ID 1-9999 to DB1, and 10000-20000 to DB2. For this option,
  2. By value mod, e.g. userID mod n, when reminder is 0, go to DB1, reminder is 1, to DB2, etc.

Pros and Cons:

Criteria By Range By Mod
number of DBs initially only require small amount of DBs, can increasse by business requests initially number based on mod number, normally a big number
Accessibility initially only few DBs, perforamce cost is small, single DB performance query is poor initially big number of DBs, query acorss DBs may consume many resources, better for query on single DB
DBs adjustment easy, just add new DB, and impact is limit when split existing DB not easy, change mod value may result in DB migration across DBs
Data hotspot there are data hotspot issues no data hotspot issues

In practice, for the sake of simplicity, mod sharding is often used. To manage further sharding, and for smooth data migration, normally new DBs are added by folds, e.g. intially 4 DBs, furhter split will be 8 DBs, then 16 DBs. This is becuase only half of data in existing DB will be migrated to new DB, while the rest half will be remain unchanged. However, there are some super IDs, e.g. one big shop with massive records than normal, if we shard DB by user ID, there will one DB will many records than others. For this case, we need to provide separate DB for those super IDs.

sharding numbers

Firslty, that’s depends on the ability of single DB, e.g. normally one MySQL DB can support upto 50mio records, and Oracle can support 100mio. Normally multiple DBs may leads to certain perforamnce issues, when data query across multiple DBs, if there are multithreading call, it will cost precious thread resource, while it’s single thread, the wating time will be unacceptable. Normally, the initial sharding is 4-8 DBs.

Router transparency

To certain extent, DB sharding means change of DBSChema, which inevitable result in application, however, this is irrelavent to business logic, so the DB sharding should be transparent to business logic code, therefore, DB sharding should be handled at DAL (Data Access Layer) or DDAL (Distributed Data Access Layer).

  1. For access to single DB, e.g. query by certain user id, DAL will automatically route to that DB, even further split by mod, still no applicaiton logic code change impacted.
  2. For simple across DB query, DAL in charge to aggregate results from every DB query, still transparent to upper application logic.
  3. For query involves multiple DBs with aggretation functions, e.g. groupBy, order by, min, max, avg. It’s recommended DAL consolidate request from single DB, while upper layers do further processing. That’s becuase if rely on DAL, it would be too complex, and such case is relatively rare case, so leave it to upper layer.

Oracle Sharding

It’s required in Web 2.0 and high availability technologies

Shardingis an application-managed scaling technique using many (hundreds /thousands of) independent databases

  • Data is split into multiple databases (shards)
  • Each database holds a subset (either range or hash) of the data
  • Split the shards as data volume or access grows
  • Shards are replicated for availability and scalability

Sharding is the dominant approach for scaling massive websites

  • Application code dispatches request to a specific database based on key value
  • Queries are constrained -simple queries on shard-key
  • Data isdenormalizedto avoid cross-shard operations (no joins)
  • Each database holds all the data
  • Request dispatched to a specific database based on read/write,key value
  • Updates go to one database, changes are replicated to the other databases. The other databases are available for reads
  • Provides read scalability
  • Can be combined with horizontal sharding so that each shard is replicated to a different degree
  • Main benefit is that you do not need to reshard

Downsides of DB replica

  • Only async log shipping which can lose data in case of failure
  • Slaves can return inconsistent data
  • Statement based replication has correctness issues & row-based replication is immature
  • Replication is slow (high overhead on each reader, slaves are single-threaded)
  • No support for failover between master (primary) & slaves (backup)
  • Does not handle failure conditions such as missing or damaged logs
  • Storage engine and replication state may become inconsistent after a crash
  • Bringing a failed master back requires copying the database

–End–

2021

How to user fire extinguisher

Summary As you know, staff and your safety is paramount. So what if emergency take place, such as fire in office, how to help yourself and your colleagues by...

Deep dive into Kubernetes Client API

Summary To talk to K8s for getting data, there are few approaches. While K8s’ official Java library is the most widely used one. This blog will look into thi...

Whitelabel Error Page

Summary Whitelabel Error Page is the default error page in Spring Boot web app. It provide a more user-friently error page whenever there are any issues when...

Google maps no photos reviews

Summary I found a weird problem of the app Google Maps of my Oppo Android phone. That’s when you search a place in Google map, say “Central Park”, ideally th...

Debts in a nutshell

A debt security represents a debt owed by the issuer to an investor. Here, the investor acts as a lender to the issuer which may be a government, organisatio...

Back to Top ↑

2020

Debug Stuck IntelliJ

What happened to a debug job hanging in IntelliJ (IDEAS) IDE? You may find when you try to debug a class in Intellij but it stuck there and never proceed, e....

Awesome Kotlin

Difference with Scala Kotlin takes the best of Java and Scala, the response times are similar as working with Java natively, which is a considerable advantag...

JVM热身

此文是作者英文原文的翻译文章,英文原文在:http://todzhang.com/posts/2018-06-10-jvm-warm-up/

Mock in kotlin

Argument Matching & Answers For example, you have mocked DOC with call(arg: Int): Intfunction. You want to return 1 if argument is greater than 5 and -1 ...

Curl

Linux Curl command

AOP

The concept of join points as matched by pointcut expressions is central to AOP, and Spring uses the AspectJ pointcut expression language by default.

Micrometer notes

As a general rule it should be possible to use the name as a pivot. Dimensions allow a particular named metric to be sliced to drill down and reason about th...

Awesome SSL certificates and HTTPS

What’s TLS TLS (Transport Layer Security) and its predecessor, SSL (Secure Sockets Layer), are security protocols designed to secure the communication betwee...

JVM warm up by Escape Analysis

Why JVM need warm up I don’t know how and why you get to this blog. But I know the key words in your mind are “warm” for JVM. As the name “warm up” suggested...

Java Concurrent

This blog is about noteworthy pivot points about Java Concurrent Framework Back to Java old days there were wait()/notify() which is error prone, while fr...

Back to Top ↑

2019

Conversations with God

Feelings is the language of the soul. If you want to know what’s true for you about something, look to how your’re feeling about.

Kafka In Spring

Enable Kafka listener annotated endpoints that are created under the covers by a AbstractListenerContainerFactory. To be used on Configuration classes as fol...

Mifid

FX Spot is not covered by the regulation, as it is not considered to be a financial instrument by ESMA, the European Union (EU) regulator. As FX is considere...

Foreign Exchange

currency pairs Direct ccy: means USD is part of currency pair Cross ccy: means ccy wihtout USD, so except NDF, the deal will be split to legs, both with...

Back to Top ↑

2018

Guice

A new type of Juice Put simply, Guice alleviates the need for factories and the use of new in your Java code. Think of Guice’s @Inject as the new new. You wi...

YAML

Key points All YAML files (regardless of their association with Ansible or not) can optionally begin with — and end with …. This is part of the YAML format a...

Sudo in a Nutshell

Sudo in a Nutshell Sudo (su “do”) allows a system administrator to give certain users (or groups of users) the ability to run some (or all) commands as root...

Zoo-keeper

ZK Motto the motto “ZooKeeper: Because Coordinating Distributed Systems is a Zoo.”

Cucumber

Acceptance testing vs unit test It’s sometimes said that unit tests ensure you build the thing right, whereas acceptance tests ensure you build the right thi...

akka framework of scala

philosophy The actor model adopts the philosophy that everything is an actor. This is similar to the everything is an object philosophy used by some object-o...

Apache Camel

Camel’s message model In Camel, there are two abstractions for modeling messages, both of which we’ll cover in this section. org.apache.camel.Message—The ...

JXM

Exporting your beans to JMX The core class in Spring’s JMX framework is the MBeanExporter. This class is responsible for taking your Spring beans and registe...

Solace MQ

Solace PubSub+ It is a message broker that lets you establish event-driven interactions between applications and microservices across hybrid cloud environmen...

Core Java

Annotation retention policy What is Retention policy in java annotations?

Apigee

App deployment, configuration management and orchestration - all from one system. Ansible is powerful IT automation that you can learn quickly.

Ansible

Ansible: What Is It Good For? Ansible is often described as a configuration management tool, and is typically mentioned in the same breath as Chef, Puppet, a...

flexbox

How Flexbox works — explained with big, colorful, animated gifs

KDB

KDB However kdb+ evaluates expressions right-to-left. There are no precedence rules. The reason commonly given for this behaviour is that it is a much simple...

Agile and SCRUM

Key concept In Scrum, a team is cross functional, meaning everyone is needed to take a feature from idea to implementation.

Strategy-Of-Openshift-Releases

Release & Testing Strategy There are various methods for safely releasing changes to Production. Each team must select what is appropriate for their own ...

NodeJs Notes

commands to read files var lineReader = require(‘readline’).createInterface({ input: require(‘fs’).createReadStream(‘C:\dev\node\input\git_reset_files.tx...

CORS :Cross-Origin Resource Sharing

Cross-Origin Request Sharing - CORS (A.K.A. Cross-Domain AJAX request) is an issue that most web developers might encounter, according to Same-Origin-Policy,...

ngrx

Why @Effects? In a simple ngrx/store project without ngrx/effects there is really no good place to put your async calls. Suppose a user clicks on a button or...

iOS programming

View A view is also a responder (UIView is a subclass of UIResponder). This means that a view is subject to user interactions, such as taps and swipes. Thus,...

Back to Top ↑

2017

cloud computering

openshift vs openstack The shoft and direct answer is `OpenShift Origin can run on top of OpenStack. They are complementary projects that work well together....

cloud computering

Concepts Cloud computing is the on-demand demand delivery of compute database storage applications and other IT resources through a cloud services platform v...

Redux

whats @Effects You can almost think of your Effects as special kinds of reducer functions that are meant to be a place for you to put your async calls in suc...

reactive programing

The second advantage to a lazy subscription is that the observable doesn’t hold onto data by default. In the previous example, each event generated by the in...

Container

The Docker project was responsible for popularizing container development in Linux systems. The original project defined a command and service (both named do...

promise vs observiable

The drawback of using Promises is that they’re unable to handle data sources that produce more than one value, like mouse movements or sequences of bytes in ...

TypeScript noteworthy notes

Async Await keywords Async Await Support in TypeScript Async - Await has been supported by TypeScript since version 1.7. Asynchronous functions are prefixed ...

JDK source

interface RandomAccess Marker interface used by List implementations to indicate that they support fast (generally constant time) random access. The primary ...

SSH SFTP

Secure FTP SFTP over FTP is the equivalant of HTTPS over HTTP, the security version

AWS Tips

After establishing a SSH session, you can install a default web server by executing sudo yum install httpd -y. To start the web server, type sudo service htt...

Oracle

ORA-12899: Value Too Large for Column

Java Security Notes

Java Security well-behaved: programs should be prevent from consuming too much system resources

R Language

s<-read.csv("C:/Users/xxx/dev/R/IRS/SHH_SCHISHG.csv") # aggregate s2<-table(s$Original.CP) s3<-as.data.frame(s2) # extract by Frequency ordered s3...

SSH and Cryptography

SFTP versus FTPS SS: Secure Shell An increasing number of our customers are looking to move away from standard FTP for transferring data, so we are ofte...

Eclipse notes

How do I remove a plug-in? Run Help > About Eclipse > Installation Details, select the software you no longer want and click Uninstall. (On Macintosh i...

Maven-Notes

Maven philosophy “It is important to note that in the pom.xml file you specify the what and not the how. The pom.xml file can also serve as a documentatio...

Java New IO

Notes JDK 1.0 introduced rudimentary I/O facilities for accessing the file system (to create a directory, remove a file, or perform another task), accessi...

IT-Architect

SOA SOA is a set of design principles for building a suite of interoperable, flexible and reusable services based architecture. top-down and bottom-up a...

Algorithm

This page is about key points about Algorithm

Java-Tricky-Tech-Questions.md

What is the difference between Serializable and Externalizable in Java? In earlier version of Java, reflection was very slow, and so serializaing large ob...

Compare-In-Java

Concepts If you implement Comparable interface and override compareTo() method it must be consistent with equals() method i.e. for equal object by equals(...

Java Collections Misc

Difference between equals and deepEquals of Arrays in Java Arrays.equals() method does not compare recursively if an array contains another array on oth...

HashMap in JDK

Hashmap in JDK Some note worth points about hashmap Lookup process Step# 1: Quickly determine the bucket number in which this element may resid...

Java 8 Tips

This blog is listing key new features introduced in Java 8

IntelliJ Tips

Shortcuts Expand/collapse method body in code editor Cmd + +/- to expand and collapse a method body Show java doc Ctrl+J: To show JavaDoc

Back to Top ↑

2016

Java GC notes

verbose:gc verbose:gc prints right after each gc collection and prints details about each generation memory details. Here is blog on how to read verbose gc

Hash Code Misc

contract of hashCode : Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consis...

Angulary Misc

Dependency Injection Angular doesn’t automatically know how you want to create instances of your services or the injector to create your service. You must co...

Java new features

JDK Versions JDK 1.5 in 2005 JDK 1.6 in 2006 JDK 1.7 in 2011 JDK 1.8 in 2014 Sun之前风光无限,但是在2010年1月27号被Oracle收购。 在被Oracle收购后对外承诺要回到每2年一个realse的节奏。但是20...

Simpler chronicle of CI(Continuous Integration) “乱弹系列”之持续集成工具

引言 有句话说有人的地方就有江湖,同样,有江湖的地方就有恩怨。在软件行业历史长河(虽然相对于其他行业来说,软件行业的历史实在太短了,但是确是充满了智慧的碰撞也是十分的精彩)中有一些恩怨情愁,分分合合的小故事,比如类似的有,从一套代码发展出来后面由于合同到期就分道扬镳,然后各自发展成独门产品的Sybase DB和微...

浅谈软件单元测试中的“断言” (assert),从石器时代进步到黄金时代。

大家都知道,在软件测试特别是在单元测试时,必用的一个功能就是“断言”(Assert),可能有些人觉得不就一个Assert语句,没啥花头,也有很多人用起来也是懵懵懂懂,认为只要是Assert开头的方法,拿过来就用。一个偶然的机会跟人聊到此功能,觉得还是有必要在此整理一下如何使用以及对“断言”的理解。希望可以帮助大家...

Kubernetes 与 Docker Swarm的对比

Kubernetes 和Docker Swarm 可能是使用最广泛的工具,用于在集群环境中部署容器。但是这两个工具还是有很大的差别。

Mac tips

how to show full path in Finder window Open and run following command in terminal window defaults write com.apple.finder _FXShowPosixPathInTitle -bool true; ...

http methods

RFC origion http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.2)

Spark-vs-Storm

The stark difference among Spark and Storm. Although both are claimed to process the streaming data in real time. But Spark processes it as micro-batches; wh...

微服务

可以想像一下,之前的传统应用系统,像是一个大办公室里面,有各个部门,销售部,采购部,财务部。办一件事情效率比较高。但是也有一些弊端,首先,各部门都在一个房间里。

kibana, view layer of elasticsearch

What’s Kibana kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on...

kibana, view layer of elasticsearch

What’s Kibana kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on...

iConnect

UI HTML5, AngularJS, BootStrap, REST API, JSON Backend Hadoop core (HDFS), Hive, HBase, MapReduce, Oozie, Pig, Solr

Data Structure

Binary Tree A binary tree is a tree in which no node can have more than two children. A property of a binary tree that is sometimes important is that th...

Something about authentication

It’s annoying to keep on repeating typing same login and password when you access multiple systems within office or for systems in external Internet. There a...

SQL

Differences between not in, not exists , and left join with null

Github page commands notes

404 error for customized domain (such as godday) 404 There is not a GitHub Pages site here. Go to github master branch for gitpages site, manually add CN...

RenMinBi International

RQFII RQFII stands for Renminbi Qualified Foreign Institutional Investor. RQFII was introduced in 2011 to allow qualified foreign institutional investors to ...

Linux Tips

Get permission denied error when sudo su (or hyphen in sudo command) bash: /home/YOURNAME/.bashrc: Permission denied That’s because you didn’t add “-“ hyphen...

Load Balancing

Concepts LVS means Linux Virtual Server, which is one Linux built-in component.

Python

(‘—–Unexpected error:’, <type ‘exceptions.TypeError’>) datetime.datetime.now()

Microservices vs. SOA

Microservice Services are organized around capabilities, e.g., user interface front-end, recommendation, logistics, billing, etc. Services are small in ...

Java Class Loader

Codecache The maximum size of the code cache is set via the -XX:ReservedCodeCacheSize=N flag (where N is the default just mentioned for the particular com...

Back to Top ↑

2007

Back to Top ↑