Databases from Inside

10+ things you should do before building a custom Access database

2008-03-04T17:23:00.000-08:00

Whether you take on freelance work or you support your organization’s IT needs by developing custom database solutions, you must build an efficient, easy-to-use database if you plan to stay employed. Like most things, there’s a right and a wrong way. You might not get credit for doing things right, but you’ll certainly hear about it if you do things wrong.

The road to the right database starts well before you start building tables. There are a number of things you can do before you build a database to make sure that the development process goes smoothly and that your custom database fits the organization’s purpose and goals. The following tips are aimed at Access users, but most of them apply to just about any custom database.

#1: Make nice

You’ll get nowhere without the support and guidance of two specific groups of people:

Those who update the data. These people know what’s needed to get the job done.
Those who use the information. These people know the goals for the database and the business at large.

In a small company, one person might fill both positions, although they have different needs. However, that person’s experiences with the data are valid. It’s your job to find solutions that satisfy everyone, within reason.

#2: Bend but don’t break

Being just a developer won’t get the job done. Sometimes, you must be a diplomat. I recommend that you practice the art of persuasion: “Let me show you something…” will serve you better than “That can’t be done.” This may require you to think fast on your feet. Of course, “Let me work up an example” can always buy a little time.

#3: Actually review their specs

Sometimes, you get lucky and someone in-house supplies a list of specifications. If that happens, it’s information worth keeping, so don’t be too eager to trash the list. Working with those original specs will save you some time and might keep you from stepping on someone’s toes — never a good idea if you can help it.

#4: Compare the specs to the working environment

Most Access databases have just a few users, but Access can handle numerous users. You probably won’t build an interactive intranet database the same way you’d build the solution for a single user. Access seldom fails to meet the demands if you develop for multiple users from the beginning.

#5: How many keys are there to the front door?

Keeping hackers out of your intranet or Web-based database is much more complex than using Access’ workgroup security. In fact, if you need this article and you’ve taken on a Web-based database project, you might have bitten off more than you can chew — good luck! Access is certainly up to the challenge, but the truth is, most developers aren’t. That’s why IT professionals scoff at Access. The sad truth is, many developers don’t understand the Web. If you’re one of them, don’t take on a Web project hoping to learn on the job. You and your client will pay too high a price.

#6: Do the work

Sit down with the people who do the work and learn the process:

Review all paper forms in the current process.
How much data — both records and fields — will the database store?
How much searching and sorting will the users require?
Where does the data come from? Will the system need to accommodate foreign data?
Will the system export data to foreign formats?
Review the current reports and analysis. Talk with the people who use them, for insight.

In short, follow the data from beginning to end. There’s no substitute for knowing the data and the current motivations that push that data from collection to final form.

#7: Re-evaluate

Once you’re familiar with the specs and environment, you might have to shoot yourself in the foot. Access just might not be the best solution for your client. A more powerful system, such as SQL Server 2005 Express Edition, might be a better choice. Or Access might be just part of the solution. For instance, InfoPath’s XML-based forms or .NET forms might be more efficient than Access forms, especially if you’re publishing data to an intranet or to the Internet. Certainly, Excel’s analytical tools are superior to those Access provides. Don’t try to stuff the entire works into an Access-or-bust solution.

#8: Recommend the best route — not the easiest one

Don’t be afraid to suggest a major overhaul if you’re upgrading a legacy database. Neither the latest and greatest version of Access nor more expensive hardware will resolve performance issues that stem from bad design.

#9: Improve the process

Work with the end users to improve the manual process if there’s room for improvement, and there usually is. It’s a mistake to computerize the existing workflow until it’s the best it can be. Software alone won’t improve a bad routine — it’ll just change the problems.

#10: Define and redefine

Once you’ve gathered all your facts, compose a mission statement for the application. This might require one to several paragraphs. I’m not talking about a new set of specifications. Rather, give your client a realistic review of their needs versus reality. You’re simply restating the database’s purposes, but with the benefit of your insight into the workflow and organization’s needs.

#11: How’s that for quick response?

Once you believe you have a good feel for the client’s needs and the database’s purposes, create a series of mock-up forms to show the client. You’ll get a few oohs and ahhhs, but listen to the souls brave enough to say, “But wait…” Their insights may be valid and could save you some trouble down the road. On the other hand, this is where #2 can come in handy. Sometimes, people just can’t conceive of doing something any way but the way they know.

You can use graphics software to draw and print the forms or use Access, which is a great rapid application development (RAD) tool. And you can really impress your clients by actually using their data. Sometimes, a quick run at normalizing the data can help the mock-up process. It’s not strictly necessary, but it may show you some holes you might otherwise miss.
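A quick normalization pass of this kind can be sketched in a few lines of Python before any tables exist in Access. This is only an illustration: the sample rows, the field names and the normalize function are all made up, and in practice you would feed it the client's own export.

```python
# Hypothetical flat export: one row per order, customer details repeated.
flat_rows = [
    {"customer": "Acme Ltd", "phone": "555-0100", "order_date": "2008-01-15", "total": 120.00},
    {"customer": "Acme Ltd", "phone": "555-0100", "order_date": "2008-02-03", "total": 75.50},
    {"customer": "Birch & Co", "phone": "555-0199", "order_date": "2008-02-10", "total": 310.25},
]


def normalize(rows):
    """Split the repeated customer details into their own table."""
    customer_ids = {}  # (name, phone) -> customer_id
    orders = []
    for row in rows:
        key = (row["customer"], row["phone"])
        # Reuse the id if this customer was seen before, otherwise assign the next one.
        customer_id = customer_ids.setdefault(key, len(customer_ids) + 1)
        orders.append({
            "customer_id": customer_id,
            "order_date": row["order_date"],
            "total": row["total"],
        })
    customers = [
        {"customer_id": cid, "name": name, "phone": phone}
        for (name, phone), cid in customer_ids.items()
    ]
    return customers, orders


customers, orders = normalize(flat_rows)
```

If the same customer name turns up with two different phone numbers, it will come out as two customer rows, which is the kind of hole worth raising with the client while you are still at the mock-up stage.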

How to Backup SQL Database - 4 Top Methods

2008-02-24T11:05:00.000-08:00

If you have installed a blog, forum or content management system (CMS) such as Joomla on your server, you will have installed a MySQL database. One day, without warning, your server may crash and you’ll lose all your files. This is why it is important to create a backup of your database and files.

If your web site only contains HTML files, it’s very easy to restore them because they are stored on your computer. You simply need to upload them to the server.

Methods of backing up your SQL database

1. Web hosting providers - Most web hosting providers back up all files on their servers. My server gets backed up daily, weekly and monthly. Check with your web host to see how often they do backups.

2. Backup software for your forum, blog or CMS

Most CMS web sites that use databases have their own backup software. For example:

a) WordPress has a plugin (http://www.ilfilosofo.com/blog/wp-db-backup/) where you can select how you want the backup to be delivered:

* Save to server : this will create a file in /wp-content/backup-*/ for you to retrieve later
* Download to your computer : this will send the backup file to your browser to be downloaded
* Email : this will email the backup file to the address you specify

b) Joomla has an extension called JoomlaPack (http://www.joomlapack.net/). It’s a component that creates a backup of your whole site (files and database) in the form of a single archive. In order to help you restore this, it also adds an installer. All you have to do to restore your backup is follow the regular Joomla! installation procedure: unpack the archive, upload files, point your browser to the installation script, follow the installation screens and you’re ready.

c) Simple Machines Forum (SMF) allows you to create a backup of the forum database within the admin panel.

3. cPanel backup

Use the Backup Wizard to download a zipped copy of your entire site or parts of it onto your computer.

The following are backed up and included in a zip file:

* Home Directory
* MySQL Databases
* Email forwarders configuration
* Email filters configuration

To access the Backup Wizard:

Log in to www.domain.com/cpanel
Go to Files
Choose Backup Wizard
Save the files to your local drive

4. phpMyAdmin

Have you ever lost all your computer files because you didn’t back them up?

It’s a sickening feeling when you can’t restore thousands of files collected over several years of work. The same thing can happen to your forum, blog or CMS web site that uses a MySQL database to store information. One of the easiest and most foolproof backup methods is to use phpMyAdmin.

How to use phpMyAdmin

Go to cPanel->MySQL Databases->phpMyAdmin and choose your database.
Click on the Export link.
Choose Select All to select all the tables.
Select “SQL” for the output format.
Check “Save as file”.
Select “gzipped” and hit Go to download the backup file.
Save it to a folder labeled “backupFeb08” so you know when you created the last backup.

Create a backup on the first of every month.
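If your host gives you shell access, the monthly export can also be scripted rather than clicked through. The following is a rough sketch, not a replacement for the phpMyAdmin steps: it assumes the mysqldump command-line tool is installed, that the password lives in a ~/.my.cnf option file, and that the function names, folder and file naming scheme (modeled on “backupFeb08”) are my own choices.

```python
import datetime
import gzip
import shutil
import subprocess
from pathlib import Path


def backup_path(database, folder, when):
    """Dated file name in the spirit of "backupFeb08", e.g. blog-Feb08.sql.gz."""
    return Path(folder) / f"{database}-{when.strftime('%b%y')}.sql.gz"


def dump_command(database, user, host="localhost"):
    """Argument list for mysqldump; the password is expected in ~/.my.cnf."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        database,
    ]


def run_backup(database, user, folder="backups", when=None):
    """Stream mysqldump output into a gzipped file and return its path."""
    when = when or datetime.date.today()
    target = backup_path(database, folder, when)
    target.parent.mkdir(parents=True, exist_ok=True)
    command = dump_command(database, user)
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc, \
            gzip.open(target, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.returncode != 0:
        target.unlink()  # do not keep a partial dump around
        raise RuntimeError("mysqldump failed")
    return target
```

Calling run_backup("blog", "me") from a monthly cron job would leave a file such as backups/blog-Feb08.sql.gz on the server; you would still want to copy it somewhere off the server afterwards.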

How to restore a backup of a MySQL database

In phpMyAdmin, choose your database and click the SQL tab
On the “SQL” page, uncheck “show this query here again”
Browse to the backup of the database on your computer
Click Go.
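The same restore can be done from a script when the dump is too large to upload through the browser. This is a sketch under assumptions: the mysql command-line client is installed, credentials come from a ~/.my.cnf option file, the target database already exists, and the function names are hypothetical.

```python
import gzip
import shutil
import subprocess


def restore_command(database, user, host="localhost"):
    """Argument list for the mysql client; the password is expected in ~/.my.cnf."""
    return ["mysql", f"--host={host}", f"--user={user}", database]


def restore_backup(backup_file, database, user):
    """Feed a gzipped SQL dump to the mysql client on standard input."""
    command = restore_command(database, user)
    with gzip.open(backup_file, "rb") as dump, \
            subprocess.Popen(command, stdin=subprocess.PIPE) as proc:
        # Decompress on the fly and stream the SQL statements to mysql.
        shutil.copyfileobj(dump, proc.stdin)
    if proc.returncode != 0:
        raise RuntimeError("mysql restore failed")
```

A call such as restore_backup("backupFeb08/blog.sql.gz", "blog", "me") replays every statement in the dump, so anything in the database that the dump recreates will be overwritten.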

When not to normalize your database

2008-02-24T11:02:00.000-08:00

Database normalization is a formal process of designing your database to eliminate redundant data, utilize space efficiently and reduce update errors. Anyone who has ever taken a database class has it drummed into their heads that a normalized database is the only way to go. This is true for the most part. However, there are certain scenarios where the benefits of database normalization are outweighed by its costs. Two of these scenarios are described below.

Immutable Data and Append-Only Scenarios

Pat Helland, an enterprise architect at Microsoft who just rejoined the company after a two-year stint at Amazon, has a blog post entitled “Normalization is for Sissies” where he presents his slides from an internal Microsoft gathering on database topics. In his presentation, Pat argues that database normalization is unnecessary in situations where we are storing immutable data such as financial transactions or a particular day's price list.

When Multiple Joins are Needed to Produce a Commonly Accessed View

The biggest problem with normalization is that you end up with multiple tables representing what is conceptually a single item. For example, consider this normalized set of tables which represent a user profile on a typical social networking site.

user table
user_id	first_name	last_name	sex	hometown	relationship_status	interested_in	religious_views	political_views
12345	John	Doe	Male	Atlanta, GA	married	women	(null)	(null)

user_affiliations table
user_id (foreign_key)	affiliation_id (foreign key)
12345	42
12345	598

affiliations table
affiliation_id	description	member_count
42	Microsoft	18,656
598	Georgia Tech	23,488

user_phone_numbers table
user_id (foreign_key)	phone_number	phone_type
12345	425-555-1203	Home
12345	425-555-6161	Work
12345	206-555-0932	Cell

user_screen_names table
user_id (foreign_key)	screen_name	im_service
12345	geeknproud@example.com	AIM
12345	voip4life@example.org	Skype

user_work_history table
user_id (foreign_key)	company_affiliation_id (foreign key)	company_name	job_title
12345	42	Microsoft	Program Manager
12345	78	i2 Technologies	Quality Assurance Engineer

This is the kind of information you see on the average profile on Facebook. With the above design, it takes six SQL Join operations to access and display the information about a single user. This makes rendering the profile page a fairly database-intensive operation, which is compounded by the fact that profile pages are the most popular pages on social networking sites.
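To get a feel for what the joins cost, here is a small sketch using Python's built-in sqlite3 module and only a subset of the tables above (user, user_affiliations, affiliations and user_phone_numbers). The column types, the query and the load_profile function are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE affiliations (affiliation_id INTEGER PRIMARY KEY, description TEXT, member_count INTEGER);
CREATE TABLE user_affiliations (user_id INTEGER, affiliation_id INTEGER);
CREATE TABLE user_phone_numbers (user_id INTEGER, phone_number TEXT, phone_type TEXT);

INSERT INTO user VALUES (12345, 'John', 'Doe');
INSERT INTO affiliations VALUES (42, 'Microsoft', 18656), (598, 'Georgia Tech', 23488);
INSERT INTO user_affiliations VALUES (12345, 42), (12345, 598);
INSERT INTO user_phone_numbers VALUES
    (12345, '425-555-1203', 'Home'),
    (12345, '425-555-6161', 'Work'),
    (12345, '206-555-0932', 'Cell');
""")

# Three joins for just four of the tables. Inner joins also drop users who
# have no phone number or no affiliation; a real query would need LEFT JOINs.
PROFILE_SQL = """
SELECT u.first_name, u.last_name, a.description, p.phone_type, p.phone_number
FROM user AS u
JOIN user_affiliations AS ua ON ua.user_id = u.user_id
JOIN affiliations AS a ON a.affiliation_id = ua.affiliation_id
JOIN user_phone_numbers AS p ON p.user_id = u.user_id
WHERE u.user_id = ?
"""


def load_profile(conn, user_id):
    """Run the joined query and collapse the repeated rows into one profile."""
    rows = conn.execute(PROFILE_SQL, (user_id,)).fetchall()
    if not rows:
        return None
    return {
        "name": f"{rows[0][0]} {rows[0][1]}",
        "affiliations": sorted({row[2] for row in rows}),
        "phones": {row[3]: row[4] for row in rows},
    }


profile = load_profile(conn, 12345)
```

Note that the joined result has one row for every combination of affiliation and phone number, so the application has to collapse the duplicates after the database has already done the work of producing them.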

The simplest way to fix this problem is to denormalize the database. Instead of having tables for the user’s affiliations, phone numbers, IM addresses and so on, we can just place them in the user table as columns. The drawback with this approach is that there is now more wasted space (e.g. lots of college students will have null for their work_phone) and perhaps some redundant information (e.g. if we copy over the description of each affiliation into an affiliation_name column for each user to prevent having to do a join with the affiliations table). However, given the very low cost of storage versus the improved performance of querying a single table and not having to deal with SQL statements that operate across six tables for every operation, this is a small price to pay.
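A denormalized version might look something like the sketch below, again using Python's sqlite3 module. The exact column layout (one column per phone type, affiliation_1_name and affiliation_2_name) and the second sample user are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name

conn.execute("""
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        home_phone TEXT,
        work_phone TEXT,
        cell_phone TEXT,
        affiliation_1_name TEXT,
        affiliation_2_name TEXT
    )
""")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (12345, "John", "Doe", "425-555-1203", "425-555-6161",
         "206-555-0932", "Microsoft", "Georgia Tech"),
        # Hypothetical student: the unused columns are simply NULL.
        (67890, "Jane", "Roe", None, None, "404-555-0147", "Georgia Tech", None),
    ],
)


def load_profile(conn, user_id):
    """One single-table lookup by primary key, no joins."""
    row = conn.execute("SELECT * FROM user WHERE user_id = ?", (user_id,)).fetchone()
    return dict(row) if row else None


profile = load_profile(conn, 12345)
student = load_profile(conn, 67890)
```

The whole profile now comes back from one primary-key lookup, at the price of NULL columns for the student and a fixed ceiling on how many affiliations a user can have.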

As Joe Gregorio mentions in his blog post about the emergence of megadata, a lot of the large Web companies such as Google, eBay and Amazon are heavily into denormalizing their databases as well as eschewing transactions when updating these databases to improve their scalability.

Maybe normalization is for sissies…

UPDATE: Someone pointed out in the comments that denormalizing the affiliations table into the user table would mean the member_count would have to be updated in thousands of users' rows when a new member was added to the group. This is obviously not the intent of denormalization for performance reasons, since it replaces a bad problem with a worse one. Since an affiliation is a distinct concept from a user, it makes sense for it to have its own table. Replicating the names of the groups a user is affiliated with in the user table is a good performance optimization, although it does mean that the name has to be fixed up in thousands of rows if it ever changes. Since this is likely to happen very rarely, this is probably acceptable, especially if we schedule renames to be done by a cron job during off-peak hours. On the other hand, replicating the member count is just asking for trouble.
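The rename fix-up could be as simple as the following sketch, which a cron job might run during off-peak hours. It assumes a simplified user table that keeps both the affiliation_id and a replicated affiliation_name; the schema, the second user and the rename_affiliation function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE affiliations (affiliation_id INTEGER PRIMARY KEY, description TEXT, member_count INTEGER);
CREATE TABLE user (user_id INTEGER PRIMARY KEY, first_name TEXT,
                   affiliation_id INTEGER, affiliation_name TEXT);

INSERT INTO affiliations VALUES (598, 'Georgia Tech', 23488);
INSERT INTO user VALUES (12345, 'John', 598, 'Georgia Tech'),
                        (12346, 'Jane', 598, 'Georgia Tech');
""")


def rename_affiliation(conn, affiliation_id, new_name):
    """Rename a group and fix up every replicated copy in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE affiliations SET description = ? WHERE affiliation_id = ?",
            (new_name, affiliation_id),
        )
        cursor = conn.execute(
            "UPDATE user SET affiliation_name = ? WHERE affiliation_id = ?",
            (new_name, affiliation_id),
        )
    return cursor.rowcount  # number of user rows that had to be touched


updated = rename_affiliation(conn, 598, "Georgia Institute of Technology")
```

The member_count stays only in the affiliations table, so adding a member is still a single-row update.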

UPDATE 2: Lots of great comments here and on reddit indicate that I should have put more context around this post. Database denormalization is the kind of performance optimization that should be carried out as a last resort, after trying things like creating database indexes, using SQL views and implementing application-specific in-memory caching. However, if you hit massive scale and are dealing with millions of queries a day across hundreds of millions to billions of records, or have decided to go with database partitioning/sharding, then you will likely end up resorting to denormalization. A real-world example of this is the Flickr database back-end, whose details are described in Tim O'Reilly's Database War Stories #3: Flickr, which contains the following quotes:

tags are an interesting one. lots of the 'web 2.0' feature set doesn't fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundereds of millions of tags. you can cache stuff that's slow to generate, but if it's so expensive to generate that you can't ever regenerate that view without pegging a whole database server then it's not going to work (or you need dedicated servers to generate those views - some of our data views are calculated offline by dedicated processing clusters which save the results into mysql).

federating data also means denormalization is necessary - if we cut up data by user, where do we store data which relates to two users (such as a comment by one user on another user's photo). if we want to fetch it in the context of both user's, then we need to store it in both shards, or scan every shard for one of the views (which doesn't scale). we store alot of data twice, but then theres the issue of it going out of sync. we can avoid this to some extent with two-step transactions (open transaction 1, write commands, open transaction 2, write commands, commit 1st transaction if all is well, commit 2nd transaction if 1st commited) but there still a chance for failure when a box goes down during the 1st commit.

we need new tools to check data consistency across multiple shards, move data around shards and so on - a lot of the flickr code infrastructure deals with ensuring data is consistent and well balanced and finding and repairing it when it's not.

The consistency problem raised in the quote is also important to consider. Denormalization means that you are now likely to deal with data inconsistencies, because you are storing redundant copies of data and, for a variety of reasons, may not be able to update all copies of a column value simultaneously when it changes. Having tools in your infrastructure to support fixing up data of this sort then becomes very important.
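A fix-up tool of this sort does not have to be elaborate. The sketch below, using Python's sqlite3 module, looks for replicated affiliation names that no longer match the affiliations table and repairs them. The single-database schema and the function names are assumptions for illustration; a sharded setup like Flickr's would need to run a check like this across every shard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE affiliations (affiliation_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE user (user_id INTEGER PRIMARY KEY, affiliation_id INTEGER, affiliation_name TEXT);

INSERT INTO affiliations VALUES (598, 'Georgia Tech');
-- User 12345 carries a stale copy of the name; user 12346 is in sync.
INSERT INTO user VALUES (12345, 598, 'GA Tech'), (12346, 598, 'Georgia Tech');
""")


def find_stale_names(conn):
    """Return (user_id, stored_name, current_name) for every out-of-sync copy."""
    return conn.execute("""
        SELECT u.user_id, u.affiliation_name, a.description
        FROM user AS u
        JOIN affiliations AS a ON a.affiliation_id = u.affiliation_id
        WHERE u.affiliation_name IS NOT a.description
        ORDER BY u.user_id
    """).fetchall()


def repair_stale_names(conn):
    """Overwrite each stale copy with the current name; return how many were fixed."""
    stale = find_stale_names(conn)
    with conn:  # apply all fixes in one transaction
        conn.executemany(
            "UPDATE user SET affiliation_name = ? WHERE user_id = ?",
            [(current, user_id) for user_id, _stored, current in stale],
        )
    return len(stale)


stale_before = find_stale_names(conn)
fixed = repair_stale_names(conn)
stale_after = find_stale_names(conn)
```

Running the check on a schedule and logging what it finds also tells you how often your copies actually drift, which is useful when deciding whether a given piece of denormalization is worth keeping.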