Distributed Shared User DB Discussion


Mouton
12-09-2003, 11:17 AM
Was wondering if anyone (dark?) had thought about how a distributed shared user db module would work..?

i.e., not a single db with which all servers sync, but a module that propagates local changes to all servers.

I think using a single db is a little too dangerous... creates a single point of failure...

Any thoughts on how a distributed system should work?
I didn't delve into the how and what of all the user and group functions in modules, but I'd guess there's one for each user or group modification that happens.

I thought of using transaction objects, which would contain the actual modification, the origin and destination servers, the delivery status for each destination server and maybe a priority value and some other params...
Each transaction would then be pooled, and the pool would be propagated to all destination servers at regular intervals, depending on the priority value of the transaction...

So basically, each function in the user and group module would create transaction objects, save them to disk, and a timer would propagate those to everyone.
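
A minimal sketch of that transaction pool, in Python for illustration only. The class names, the "lower number = sooner" priority convention and the `send` callback are all assumptions, and saving to disk is left out to keep it short.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """One user/group modification queued for delivery (hypothetical layout)."""
    change: str                      # the actual modification, e.g. "adduser bob"
    origin: str                      # server that made the change
    priority: int = 5                # lower number = propagated sooner (assumed)
    delivered: dict = field(default_factory=dict)  # destination -> delivered yet?


class TransactionPool:
    """Holds pending transactions and hands them out in priority order."""

    def __init__(self, destinations):
        self.destinations = list(destinations)
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def add(self, change, origin, priority=5):
        # Every server except the origin is a destination.
        status = {d: False for d in self.destinations if d != origin}
        txn = Transaction(change, origin, priority, status)
        heapq.heappush(self._heap, (priority, next(self._counter), txn))
        return txn

    def flush(self, send):
        """Called by the timer: try every pending transaction once.

        `send(destination, txn)` returns True on success. Transactions that
        still have undelivered destinations go back into the pool.
        Returns the number of transactions still pending.
        """
        retry = []
        while self._heap:
            priority, seq, txn = heapq.heappop(self._heap)
            for dest, done in txn.delivered.items():
                if not done:
                    txn.delivered[dest] = send(dest, txn)
            if not all(txn.delivered.values()):
                retry.append((priority, seq, txn))
        for item in retry:
            heapq.heappush(self._heap, item)
        return len(retry)
```

Keeping the delivery status per destination means a transaction stays in the pool until every server has it, so a server that was unreachable simply gets it on a later timer tick.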

Another way to do it would be to give each server a list of all the other servers, and at regular intervals, they would 'pull' all transactions since the last communication from 'alive' servers. This would help deal with dead servers; when they come back up, each server would pull all the new transactions from them since the last communication, so nothing would be lost.
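
The pull variant could look roughly like this. Again a Python sketch with made-up names (`Server`, `pull_from`, `last_seen`); real servers would talk over the network rather than call each other directly.

```python
class Server:
    """A site that keeps its own transaction log and pulls from its peers."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []          # local changes in order; list index = sequence number
        self.applied = []      # (peer name, change) pairs received from others
        self.last_seen = {}    # peer name -> how many of its entries we already pulled

    def record(self, change):
        """Log a local user/group modification."""
        self.log.append(change)

    def since(self, position):
        """Return this server's log entries from `position` onwards."""
        return self.log[position:]

    def pull_from(self, peers):
        """Timer callback: fetch new entries from every peer that is alive."""
        for peer in peers:
            if peer is self or not peer.alive:
                continue  # dead peers are simply retried on the next tick
            position = self.last_seen.get(peer.name, 0)
            new_entries = peer.since(position)
            self.applied.extend((peer.name, c) for c in new_entries)
            self.last_seen[peer.name] = position + len(new_entries)
```

Since each server remembers its own position in every peer's log, a peer that was down loses nothing: the next pull after it comes back picks up exactly where the last one stopped.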

How does that sound? Does it fit the users & groups module architecture?

Any suggestions / comments about a distributed system like that are welcome.

WarC
12-09-2003, 01:45 PM
Couldn't resist :)

My limited experience with this topic is that the following setup works very well. Servers do go down for one reason or another, and a transaction log is nice to have then. Distributed systems are among the hardest things to design when you try to reduce redundancy and transactions.



Originally posted by Mouton

Another way to do it would be to give each server a list of all the other servers, and at regular interval, they would 'pull' all transactions since last communication from 'alive' servers. This would help deal with dead servers; when they come back up, each server would pull all the new transactions from it since last communication, so nothing would be lost.

Mouton
12-09-2003, 01:53 PM
Originally posted by WarC
Couldnt resist :)

LIAR! I had to force u to get a reply!! ;)

Originally posted by WarC
distributed systems are one of the hardest part to design to reduce redundancy and transactions.

Always up for some new challenge... Didn't play much with distributed app in the past. I thought it would be a good application of what i learned in that course at university...

darkone
12-09-2003, 06:31 PM
I believe that one database server is enough. The database, however, may and should have several backup routes.

Mouton
12-09-2003, 11:30 PM
Then what should happen when the db is down?

darkone
12-10-2003, 01:04 AM
If the database is down, the sites are down (well, a site may be up, but users can't do much), just like with any www site or Windows network. The database is supposed to be on a stable computer, on a link which has very high availability.

neoxed
12-10-2003, 01:07 AM
Well, the average sitering will just host the DB server on one of the servers in their sitering... I doubt anyone would dedicate a box entirely to the DB. So chances are it will have some downtime.

I like the idea of a distributed shared user DB, much more fault tolerant and reliable. ;)

peep
12-10-2003, 03:02 AM
The average sitering, eh... Well, I know there'll be rings running a dedicated box for the db. The db would need good uptime and be stable and fast to access, right? So running it on a dedicated box wouldn't be so weird, would it...

neoxed
12-10-2003, 03:23 AM
Well... I for one do not use a dedicated box for the DB. I ported the MSS (MultiSiteSync) script from gl to io in TCL to use on sites for the time being, until a *real* shared/distributed userdb is ready.

My sitesync script keeps all sites in sync without a central DB, which I find better than relying on a central server. :p

SomeoneWhoCares_2
12-10-2003, 07:10 AM
:confused: :eek: :eek: any chance u will share it? :eek: :eek: :confused:
I wanted to do something like that, but if it's already done... :cool: :D

dasOp
12-10-2003, 07:27 AM
I've considered this specific topic for gl a lot of times before, and with io user modules it's actually a possibility. One of the major problems with the distributed approach is how to resolve collisions and ordering, i.e. does the order of updates matter? This is central, because if it does, you need numbered transactions. And with each server numbering its own transactions, synchronization becomes a real issue and prevents lazy updates, which are essential for efficiency.

So how important is it for a user's account to be totally updated on all servers all the time? Is there an acceptable error margin here?

For situations where complete accuracy is required I've thought of two approaches. Both build on transactions and ordering (and thus, numbering) them.

The first is to basically emulate token ring or its more advanced version, FDDI. To simplify greatly, every node is connected in a ring. In this ring you pass a token around and around. When you have the token and it is empty, you may pass data with it. When it's not, you simply forward it.
Despite the relative simplicity of token ring, it works and it's quite speedy. The disadvantage is when the ring breaks or the token is lost. I won't delve more into it as I think a ring-based solution will be too complicated.

The second, and I think viable, solution is to combine the push and pull approaches.
At any given time, one server in the ring is the update server. Each change is sent to the update server, which packages it and numbers it. That change is then either pushed to the other servers or pulled from the update server. If/when the update server goes down, an election is forced based on some criteria and the winning server takes over as update server. One of the criteria could be the server with the highest update number.
Disadvantage? For an election to be forced, the servers need to be aware of each other.
For efficiency, each site prolly needs its own local db. No problem, since MySQL or any free db would work well.
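
To make the update-server idea concrete, a rough Python sketch. `Node`, `Ring`, `submit` and `catch_up` are hypothetical names, failure detection is reduced to an `alive` flag, and breaking ties by server name is an assumption.

```python
class Node:
    """One site with its local copy of the numbered updates."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = {}  # update number -> change

    @property
    def highest(self):
        """Highest update number this node has seen (0 if none)."""
        return max(self.log, default=0)


class Ring:
    """All servers, which must know about each other for elections to work."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.leader = self.nodes[0]  # current update server

    def elect(self):
        """Pick the live node with the highest update number (name breaks ties)."""
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("no live servers")
        self.leader = max(live, key=lambda n: (n.highest, n.name))
        return self.leader

    def submit(self, change):
        """Send a change to the update server, which numbers it and pushes it."""
        if not self.leader.alive:
            self.elect()
        number = self.leader.highest + 1
        for node in self.nodes:
            if node.alive:
                node.log[number] = change  # push; dead nodes pull later
        return number

    def catch_up(self, node):
        """Pull: copy every update the node is missing from the update server."""
        for number, change in self.leader.log.items():
            node.log.setdefault(number, change)
```

Electing the node with the highest update number means the new update server continues the numbering where the old one stopped, so numbers are not reused as long as at least one live node saw the latest update.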

These are just some ideas, please dissect them and flame them. :)

Mouton
12-10-2003, 08:02 AM
I don't really think exact ordering is that important for io.
Some basic ordering using timestamps / a priority value should be enough.
The only case where I see it mattering is, for example, when u create a new user: u don't want the +credits transaction to take place before the +user transaction!
But then, u can almost completely avoid those errors by putting a priority on each transaction, so the transactions with higher priority are processed before those with lower priority.
Another case would be when u deluser someone... U don't really want other operations on that account to take place after the deluser... But then, it doesn't really matter either... And anyway, with a higher priority on the deluser transaction, it wouldn't happen.
I have thought about the election process too, but since I didn't find a situation where ordering was important, I removed that from a possible implementation.
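
A small Python sketch of ordering by priority and then timestamp. The priority table, the `Txn` fields and the rule that operations on unknown users are dropped are assumptions for the example, not anything io actually does.

```python
from dataclasses import dataclass

# Assumed priority values: lower number = processed first.
PRIORITY = {"adduser": 0, "deluser": 1, "credits": 5}


@dataclass
class Txn:
    timestamp: float
    op: str
    user: str
    amount: int = 0


def apply_batch(users, batch):
    """Apply a batch ordered by priority, then by timestamp.

    `users` maps user name -> credits. Operations on unknown users are
    dropped, which is an assumption of this sketch.
    """
    ordered = sorted(batch, key=lambda t: (PRIORITY[t.op], t.timestamp))
    for txn in ordered:
        if txn.op == "adduser":
            users.setdefault(txn.user, 0)
        elif txn.op == "deluser":
            users.pop(txn.user, None)
        elif txn.op == "credits" and txn.user in users:
            users[txn.user] += txn.amount
    return users
```

With this ordering, a +credits that arrives in the same batch as its +user is still applied after the user exists, and anything else queued for an account in the same batch as its deluser is ignored. It only holds within one batch; transactions arriving in different batches are not reordered.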

Thx for the comments / thoughts sharing btw! :)

dasOp
12-10-2003, 08:20 AM
Another point is exactly what does one want to share?
For instance, if all sites in a ring do different sections, is sharing stats really wanted?
And if you have several sites sharing sections, does one want to merge those specific sections on each site etc?

Just sharing adduser/deluser, site change and credits probably goes a long way.

Mouton
12-10-2003, 10:24 AM
Ppl probably want shared credits for quota or trial or such.
But I'm not sure what the best way to achieve that easily would be.

Anyway, I doubt one could implement only half of the module's functions and leave the rest as they are... It would be more complicated to deal with stats locally and credits in the shared db than to do both in the shared db...

A good guess would probably be to force ppl to use the same sections on all sites, so adding stats would be easy. I don't see that as a big drawback personally.
More uniformity can't be bad...