Lot of
enterprises are becoming more innovative in terms of using the open source
technologies on the pay-as-you-go platforms making it convenient for their
organization to optimize their IT costs. In this the database stacks are
playing majority of the role, I would name few DB stacks PostGreSQL, MariaDB,
MySQL, MongoDB etc. With this you’ll be paying only for the computing capacity which means
for the underlying Infrastructure or for the maintenance of the hardware.
- Why enterprises are becoming more Nimble & leaning toward open source DB stacks?
Without going
in-depth in to more technical concepts I would try to convince you guys here,
making it easy for you to gain the high level knowledge.
- Easy to install &
configure
- Availability of documents
in public domain
- Connectivity in both Native
/ JDBC and ODBC
- Rich client side command
lines
- Easy to maintain the DB instances
with small team
- What are the pain areas using the open source DB stack like PostGreSQL ?
But there are
certain pain areas with respect to using this open source DB stacks, INSERT,
UPDATE, DELETE DML statements creates bloats in these databases and there would
be dead tuples occupying the storage. When DELETE operation is being done the
jobs dead tuples would be created utilizing much storage causing the space
unavailability for newly INSERTED & UPDATED record set.
In order to
clean up the dead tuples and bloats there needs to be a VACUUM activity which
needs to be done for each table on the database schema. Think that there is a
Datawarehouse schema which is having the
capacity of 5 TB storage, having 1000s of tables with larger data set. It is
time killing task to hit the VACUUM command manually on each of the table.
Well, there is not only a vacuum task however, there are so many other
pre-vacuum and post vacuum activities those needs to be done. Perhaps with my
own experience the entire maintenance activity would take 7 to 8 hours per week
on each database instance.
Think if you
have 16 databases instances for variety of applications including the Build,
Test and Prod environments. How much time your team is going to spend on this
every week ? surprising right !!
- What precaution models we have to build to avoid those pain areas ?
Well, nevertheless
to say with the right strategic process and governance set, with the extremely
high technology mindset. Such above scenario could be tackled by automating
those. While doing the vacuum activity several steps are being carried.
- Extract all the tables
names with bloat ratio before the Vacuum
- Extract all the tables
names with Dead Tuples
- List the tables those needs
to be Vacuumed
- Run the Vacuum on all the
tables
- Extract all the tables
names with bloat ratio after the Vacuum
- Extract all the tables
names with after Dead Tuples
- What is our suggestion ?
Incorporate everything in a single intelligent program that runs
on the WAN network to Vacuum the N-Number of PostGreSQL RDS instances. Since
most of the larger enterprises are the combination of both OnPremise and Cloud
by creating a Hybrid cloud economics, I’m bringing in here the
concept of WAN. You can also ask me “Why do you want your program to run on the
WAN network & not directly on the cloud or EC2 instance ?” Well, the answer
is very simple, it is based on the organization’s application and network
architectures.
No comments:
Post a Comment