First a clarification:
a very large user's table say expecting to be around 1,000,000 records
10 years ago there was this joke that Access was no proper database because you could not deal with 1 million rows in a table. At this time, Foxpro did handle this amount without sweating.
Today, one million records in a table is nt worth mentioning. One thing it clearly is not, though, and this is very large. It is not even large. It is non-trivial anymore (i.e. better know what a database is and read up what an index is), because at this size table scans will have nasty side effects. But you do notneed to do any of the professional stuff that you need to make to handle very large tables, like partitioning and replication. I would personally say that a very large table propably starts at around 1000 times this data (i.e. one billion rows), and large at possibly 10 times.
Now, for a SMALL table that contains about one million rows, there is no technical need NOT to use an identity field, especially as the data is not heavy written. Users may get created, but not by hundreds of thousands an hour, so even if you get replicating, you can handle all inserts on one location (identity fields pretty much say goodbye if you get into replication).
i want to know what would be the best datatype or format to use as a primary key
I normally use a GUID for this type of data. It is easily generated, it is unique, it is realtively small (16 bytes ONLY) and has enough room for the users to seriously grow. Not taking a GUID, in the given case, I would assign it an int identity field - with the int being 32 bits signed you ahve enough room to grow a little more than just one million entries.
And also i want to know what format of a primary key would give the best performance eg. a series sequential numbers like 1000001,1000002,100003, etc... or some random numbers?
Empirical evidence (done to get a feasibility on the speed of the GUID as primary data type) suggests that random inserts (not updates) are lower by a 10:1 factor compared to linear inserts of identity fields (and only about 10% slower for read operations), due to the necessity to frequenty rebalance large parts of the corresponding index. Common sense, in most parts, will tell you that this is totally irrelevant for a user table. You will not insert hundreds of thousands of users per hour (after the initial load, maybe, if you transfer an old system). I would suggest that if your user table is like all other user tables I have seen so far, you will proapbly read it 1000 times more than you insert into it. As such, you have to stand the question whether you really care about insertion speed, or whether you basically fall into the typical developer trap of premature optimization, and try to optimize an item that is just irrelevant in the light of the complete application. Especially in the area of user management my stomach says that this will be totally irrelevant for the application performance.
One thing yo have to be aware of when working with random inserts is the possibility of doubles. A random generator may generate the same number twice, which you have to handle. You can only ignore it if your number space is large enough that you basically are more lkely to win in the lottery than to get two doubles (like with the GUID, which IIRC has more numbers available than there are atoms in the universe or something like this). Too many double values and your app starts looking pretty bad, especially if you ignore the item and let the error propagate to the end user (which is likey unless you make mass tests with real amounts of data).
What I would NOT take, if I had any choice, is a primary key that is a CHAR (or VarChar) field. Numbers are not strings. This is really the slowest of all possible solutions, takes up the most space and in general will make people have a good laugh in this case.
Thomas Tomiczek
PowerNodes ApS
(Microsoft MVP C#/.NET)
---
Building a Website? Try the PowerNodes CMS - http://www.powernodes.com/