Decentralized distributed database system
fully symmetric structure, no central server
Web scheme:
only retrieve matching records from the local database and return the results
every retrieval has to run against the massive data set on the local server
database scheme:
the database saves the index content of all servers
records with a high hit rate are cached, reducing the retrieval time
server load analysis:
server load assumption:
100 nodes, 100 people using each node at the same time, each node has 10000 records
Web server: 100 threads search the local database server at the same time
database server: receives 100 query requests at a time; each request searches an index of one million entries (in the worst case); the buffer mechanism can slightly reduce this burden
data update operation:
update all databases at the same time, or update only the local database and let the servers synchronize with each other
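The figures for this first scheme follow directly from the stated assumptions (100 nodes, 100 users and 10000 records per node). The sketch below works them out and shows a cached lookup. It is only an illustration: `full_index_size`, `concurrent_local_queries`, `toy_index` and `lookup` are hypothetical names, and `functools.lru_cache` stands in for whatever buffer mechanism the real database would use.

```python
from functools import lru_cache

NODES = 100
USERS_PER_NODE = 100
RECORDS_PER_NODE = 10_000


def full_index_size(nodes: int = NODES, records_per_node: int = RECORDS_PER_NODE) -> int:
    """Index entries each node holds when it stores the index of all servers."""
    return nodes * records_per_node


def concurrent_local_queries(users_per_node: int = USERS_PER_NODE) -> int:
    """Every user query is answered by the local database only."""
    return users_per_node


# Toy stand-in for the replicated index: record key -> node that owns it.
# (Assumption: 1000 entries are enough to illustrate the lookup.)
toy_index = {f"rec-{i}": i % NODES for i in range(1_000)}


@lru_cache(maxsize=256)
def lookup(key: str):
    """Return the owning node for a key; repeated keys are served from the cache."""
    return toy_index.get(key)
```

With the default assumptions, `full_index_size()` gives the one million entries mentioned above, and `lookup.cache_info()` reports how many requests were answered from the cache.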
scheme 2 (database saves local index and a small amount of buffer)
each university is a node
all nodes form a fully symmetric structure; there is no central server in the network
Web scheme:
when a request is received, multiple threads search the other servers at the same time (server pressure problem?)
database scheme:
database saves local data
the database also saves a certain amount of buffered data
server load analysis:
server load assumption:
100 nodes, 100 people using each node at the same time
then each web server starts 10000 threads to search the other database servers (Oops!)
each database server will receive 10000 query requests (Oops!)
the learning process can only slightly reduce the query requests and the web server search threads
data update operation:
only update local
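The fan-out search of scheme 2 can be sketched as a scatter-gather over the peer nodes. This is a minimal illustration, assuming each peer is just an in-memory list of strings; `search_node`, `fan_out_search` and the two load helpers are hypothetical names, not part of any real system. Counting only the other 99 nodes gives 9900 threads and requests, which the notes round to 10000.

```python
from concurrent.futures import ThreadPoolExecutor


def search_node(records, keyword):
    """Search one node's local records (stand-in for a remote database query)."""
    return [r for r in records if keyword in r]


def fan_out_search(nodes, keyword, max_workers=8):
    """Query every node in parallel and merge the results in node order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(search_node, records, keyword)
                   for records in nodes.values()]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


def search_threads_per_web_server(nodes=100, users_per_node=100):
    """Threads one web server starts if every user query contacts every other node."""
    return users_per_node * (nodes - 1)


def requests_per_db_server(nodes=100, users_per_node=100):
    """Queries one database server receives from the users of all other nodes."""
    return (nodes - 1) * users_per_node
```

Both load figures grow with the product of node count and users per node, which is why this scheme puts so much pressure on every server.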
scheme 3 (central server scheme 1)
each university node has the same structure and connects to the same central server
Web scheme:
each query is sent to the central server, which performs the search and returns the results
database scheme:
the central database saves all index information
each node only needs a small database for local users and other local information
server load analysis:
server load assumption:
100 nodes, 100 people using each node at the same time, each node holds 10000 records
Web server: starts 100 processes that query the central database at the same time
database server (central): receives 10000 query requests at the same time and returns large result sets
database server (node): only a small amount of work
data update operation:
only update the central server
scheme 4 (central service)
each university is a node; all nodes have the same structure and connect to the same central server
Web scheme:
each query is sent to the central server, which forwards it to the node database; the result is then returned from the node database
database scheme:
the central server saves the classification information of each node and, according to the category of the page request, forwards the query to the corresponding server
server load analysis:
server load assumption:
100 nodes, 100 people using each node at the same time, 10000 records and 100 categories in each node
Web server: 100 processes query the central database at the same time
database server (central): receives 10000 requests at the same time and forwards them
database server (node): receives query requests from the central server; in the worst case, each node receives 10000 query requests
data update operation:
only update the local server
update the central server when the classification changes
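The forwarding behaviour of scheme 4 can be sketched as a small router that stores only the classification table. This is an illustration under stated assumptions: each category is owned by exactly one node, and `CentralRouter`, `Node` and their methods are hypothetical names introduced here.

```python
class Node:
    """A university node that keeps its own records, grouped by category."""

    def __init__(self, name):
        self.name = name
        self.records = {}   # category -> list of records
        self.received = 0   # queries forwarded to this node

    def add_record(self, category, record):
        """Ordinary data updates touch only the local server."""
        self.records.setdefault(category, []).append(record)

    def search(self, category, keyword):
        self.received += 1
        return [r for r in self.records.get(category, []) if keyword in r]


class CentralRouter:
    """Central server: stores category -> node, never the records themselves."""

    def __init__(self):
        self.category_to_node = {}
        self.forwarded = 0  # queries the central server has forwarded

    def update_classification(self, category, node):
        """Called only when a node's classification changes."""
        self.category_to_node[category] = node

    def query(self, category, keyword):
        """Forward the query to the node that owns the category."""
        node = self.category_to_node.get(category)
        if node is None:
            return []
        self.forwarded += 1
        return node.search(category, keyword)
```

If all 10000 concurrent queries ask for categories owned by the same node, that node's `received` counter reaches 10000, which is the worst case noted above.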
A distributed database is logically a unified whole but is physically stored on different nodes. An application can access databases distributed across different geographical locations through a network connection. It is distributed in the sense that the data in the database is not stored at the same site, or more precisely, not on the storage devices of the same computer; this is what distinguishes it from a centralized database. From the user's point of view, a distributed database system is logically the same as a centralized one: users can run global applications at any site, as if the data were stored on a single computer and managed by a single database management system (DBMS), and they notice no difference.
The distributed database system was developed on the basis of the centralized database system and is the product of combining computer technology with network technology. It suits organizations whose units are scattered: each department can store its commonly used data locally and use it locally, which improves response speed and reduces communication cost. Compared with a centralized database system, a distributed database system is scalable, and adding an appropriate amount of data redundancy improves its reliability. In a centralized database, one of the design goals is to reduce redundancy as much as possible, because redundant data wastes storage space and easily leads to inconsistency between copies; to keep the data consistent, the system has to pay a maintenance cost. There, redundancy is reduced through data sharing. In a distributed database, by contrast, we deliberately add redundant data and store multiple copies of the same data at different sites, for two reasons: (1) to improve the reliability and availability of the system: when one site fails, the system can carry out the same operation at another site, so a single failure does not paralyze the whole system; (2) to improve performance: the system can operate on the nearest copy of the data, which reduces communication cost and improves the performance of the whole system.
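The two reasons for keeping multiple copies can be shown in one small routine: pick the nearest copy, and fall back to another site when a site is down. This is a minimal sketch, assuming a numeric `distance` per site is already known; `Replica` and `pick_replica` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    distance: float  # hypothetical communication cost from the client to this site
    up: bool = True  # whether the site is currently reachable


def pick_replica(replicas):
    """Return the nearest replica that is up, or None if every copy is down."""
    live = [r for r in replicas if r.up]
    if not live:
        return None
    return min(live, key=lambda r: r.distance)
```

With a single copy, `pick_replica` returns `None` as soon as that site fails; with several copies it simply moves to the next nearest site.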
There are two kinds of distributed database systems. The first is physically distributed but logically centralized; it is only suitable for relatively small units or departments with a single purpose. The second is distributed both physically and logically and is called a federated distributed database system. Because each member database system of the federation is relatively autonomous, this kind of system can accommodate databases with different uses and large differences, which makes it better suited to large-scale database integration.

TiDB community (AskTUG)
A distributed database is a logically unified database composed of multiple physically dispersed database units connected by a computer network. Each connected database unit is called a site or node. A distributed database is managed by a unified database management system, called a distributed database management system. Its basic characteristics are physical distribution, logical integrity, and site autonomy.
horizontal elastic scaling
TiDB scales out horizontally by simply adding new nodes, expanding throughput or storage on demand and coping with high-concurrency, massive-data scenarios
distributed transactions
TiDB 100% supports standard ACID transactions
true financial-grade high availability
compared with the traditional master-slave (M-S) replication scheme, the Raft-based majority election protocol provides a financial-grade, 100% strong data consistency guarantee, and can recover from failures automatically, without manual intervention, as long as a majority of the replicas is not lost
one-stop HTAP solution
TiDB is a typical OLTP row-store database that also has powerful OLAP performance. Together with TiSpark, it provides a one-stop HTAP solution that handles OLTP and OLAP at the same time, without the traditional, tedious ETL process
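The condition "as long as a majority of the replicas is not lost" in the high-availability item is plain majority-quorum arithmetic. The sketch below is general quorum math, not a TiDB API; the function names are hypothetical and the replica counts are only examples.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still survives."""
    return replicas - majority(replicas)


def can_make_progress(replicas: int, failed: int) -> bool:
    """True if the surviving replicas still form a majority."""
    return replicas - failed >= majority(replicas)
```

Three replicas tolerate one failure and five tolerate two; going from three to four replicas does not increase the number of tolerated failures, which is why odd replica counts are the usual choice.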
The localization of data access is an important part of distributed database design. This paper introduces the main features and key technologies of distributed database systems, focusing on the fragmentation of relations and access to distributed data.
A distributed database system (DDBS) consists of a distributed database management system (DDBMS) and a distributed database (DDB). In a distributed database system, an application can operate on the database transparently, even though the data is stored in different local databases, managed by different DBMSs, run on different machines, supported by different operating systems, and connected by different communication networks.
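Transparent access over distributed data can be illustrated with horizontal fragmentation: rows of one relation are split across sites, and a global query reads them without naming a site. This is a minimal in-memory sketch, assuming rows are dictionaries and each row belongs to exactly one site; `fragment_by_site` and `global_query` are hypothetical names.

```python
def fragment_by_site(rows, site_of):
    """Horizontally fragment a relation: each row goes to the site chosen by site_of."""
    fragments = {}
    for row in rows:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments


def global_query(fragments, predicate):
    """Transparent access: the caller filters rows without saying where they live."""
    return [row
            for site in sorted(fragments)
            for row in fragments[site]
            if predicate(row)]
```

The union of all fragments reconstructs the original relation, so a global query returns the same rows it would return against a centralized copy, only in a different order.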

Incremental data subscription and consumption: user operations on the database (DML, DCL, DDL, etc.) generate incremental data, and downstream applications can process this data by listening for it. The typical representative is Canal, which is built on the MySQL binlog. There is also middleware for incremental data subscription and consumption on Oracle. (Canal, Erosa)
Database synchronization middleware handles synchronization between databases and can provide cross-datacenter (or same-datacenter) synchronization, remote disaster recovery backup, streaming, and so on. It can involve a variety of databases, and the processed data can be stored in a variety of forms. (Otter, JingoBus, DRC)
Data migration (synchronization) between databases: synchronization between databases of the same kind is relatively simple, for example MySQL master-slave synchronization, which can be configured at the database layer, but synchronization across different databases is more complex, for example Oracle -> MySQL. Data migration generally has three steps: full migration, which copies the data of the original database to the new database (new data keeps being generated while this runs); incremental synchronization, which keeps synchronizing the newly generated data for a period of time so that the two databases stay in sync; and cutover, which stops writes to the original database and switches to the new one. "Cross-database" can be broadened to mean cross-data-source: sources such as HDFS, HBase, and FTP can also be synchronized with each other. (Yugong, DataX)
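The three migration steps can be sketched with in-memory stand-ins. This is not how Yugong or DataX are implemented; it is a minimal illustration in which a Python list plays the role of the change log (such as a binlog), the target is a plain dictionary, and `Source`, `full_copy`, `incremental_sync` and `cut_over` are hypothetical names.

```python
class Source:
    """Original database: a dict of rows plus an append-only change log."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.log = []          # (key, value) changes, stand-in for a binlog
        self.writable = True

    def write(self, key, value):
        if not self.writable:
            raise RuntimeError("source is read-only after cutover")
        self.rows[key] = value
        self.log.append((key, value))


def full_copy(source, target):
    """Step 1: copy all existing rows; return the log position to resume from."""
    position = len(source.log)
    target.update(source.rows)
    return position


def incremental_sync(source, target, position):
    """Step 2: replay changes logged since position; return the new position."""
    for key, value in source.log[position:]:
        target[key] = value
    return len(source.log)


def cut_over(source, target, position):
    """Step 3: stop writes on the source, drain the remaining changes, then switch."""
    source.writable = False
    return incremental_sync(source, target, position)
```

Recording the log position before the full copy means a change may be applied twice, which is harmless here because replaying a write is idempotent.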
