Large databases

Jared Rhine jared@wordzoo.com
Tue, 15 Mar 2005 11:22:47 -0800


  Sean> Lastly Jared was talking about how Debian is great where he
  Sean> works, the trials and tribulations of 20 terabytes of data,
  Sean> postgres v. oracle, etc.

  Alvin> what are the problems with the 20TB of data ( db ) ?

Clarifying: I was not discussing how to run a 20TB (actually 60TB)
database on Debian.  Instead, I was saying that my current position
has humbled me somewhat, because I can't offer any realistic
lean/mean/open-source way to replace our current 60TB Oracle database.

I also said I have yet to find any reference to anyone anywhere
running a transactional database this large on an open platform.

Usually, I can offer ways to distribute, redesign, or otherwise
simplify a multi-million dollar "big iron" installation.  But instead,
I'm mostly shutting up and saying, "Well, Oracle+Sun+SAN is a pretty
good solution here".

The boxes still go down, and we can blame every vendor in the mix
above for some downtime, but I know of no migration path that would
cut costs while maintaining performance on this transactional
system.

My best plan right now is to investigate and benchmark MaxDB to
determine how high up it can scale as a possible Oracle replacement.
MAYBE I can convince them to save an Oracle license or two if MaxDB is
as solid as it appears on paper.

I surprised Sean by mentioning that I believe MySQL's corporate plan
is to integrate MySQL DB and MaxDB in the v5/v6 timeframe and
eventually deprecate one of the code bases.  Those committed to MySQL
should have MaxDB on their radar and track developments there.  Oracle
shops should definitely have MaxDB on their radar because of its
Oracle compatibility hacks.

  Alvin> - db backups/recovery/txn logging/corruptions

Near as I can figure, these are all difficult issues regardless of the
platform.  They are doubly hard when using open-source tools.  I
believe there would be benefits to separating the storage from the
database itself, so one can build a snapshotable, recoverable,
backed-up, highly-available storage server, and then export that over
gigabit ethernet (maybe over iSCSI) to a separate database server.
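
As a rough sanity check on the gigabit link, the arithmetic can be
sketched as below.  This is a back-of-envelope estimate, not a
measurement: the 20TB and 60TB sizes come from this thread, while the
function name, the decimal-terabyte convention, and the "efficiency"
factor are assumptions made for illustration.

```python
def transfer_hours(size_tb: float, link_gbps: float = 1.0,
                   efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_tb terabytes over a network link.

    Assumptions (illustrative only): 1 TB = 1e12 bytes, the link is
    dedicated to this transfer, and `efficiency` is the fraction of
    the nominal line rate actually achieved (1.0 = ideal line rate).
    """
    if size_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link > 0, 0 < efficiency <= 1")
    bits = size_tb * 1e12 * 8                        # payload in bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # usable bits per second
    return seconds / 3600


if __name__ == "__main__":
    for tb in (20, 60):
        print(f"{tb}TB over gigabit at line rate: {transfer_hours(tb):.1f} hours")
```

At ideal line rate, 20TB works out to roughly 44 hours and 60TB to
roughly 133 hours, so a full copy or restore over a single gigabit
link takes days rather than hours.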

But scaling 20TB of storage up to "enterprise" requirements still
strikes me as painful and risky on pure open source.  LVM only
recently got snapshots, for instance, and there are still reported
problems with them.

-- jared@wordzoo.com

"Truth is a great flirt." -- Franz Liszt