using …

Pure C#

25 Nov 2008 için Arşiv

SQL Server 2008 Books Online (October 2008) Programming Reference

Yazan: esersahin 25/11/2008

Yazı kategorisi: SQL, Sql Server, Transact-SQL | » yorum bırak;

SQL Server 2008 Tutorials

Yazan: esersahin 25/11/2008

http://msdn.microsoft.com/en-us/library/ms167593.aspx

Other versions are also available for the following:

Tutorials help in learning new functionality in SQL Server. In SQL Server Books Online, tutorials have been integrated into the content that is associated with each component technology. For example, tutorials that describe how to design and implement Integration Services packages are under “Designing and Implementing Packages” in the Integration Services section of Books Online.

To help you quickly locate the various tutorial sections that are associated with each component technology, this topic lists the SQL Server tutorials that are currently available.

To open an interactive version of this topic

  • From the Start menu, point to All Programs, point to the Microsoft SQL Server program group, point to Documentation and Tutorials, point to Tutorials, and click SQL Server Tutorials.

Analysis Services icon Analysis Services—Multidimensional Data
Designing and Implementing: Tutorials (Analysis Services – Multidimensional Data)
Use the tutorials in this section to learn how to develop multidimensional data solutions.

Data mining icon Analysis Services—Data Mining
Designing and Implementing: Tutorials (Analysis Services – Data Mining)
Use the tutorials in this section to learn how to develop data mining solutions.

Creating and Querying Data Mining Models with DMX: Tutorials (Analysis Services – Data Mining)
Use the tutorials in this section to learn how to create queries against data mining models to predict trends, retrieve patterns in the data, and measure the accuracy of the mining models.

Integration Services icon Integration Services
Designing and Implementing Packages Tutorials
Use the tutorials in this section to learn how to create packages in an Integration Services project and guide you through the use of the Integration Services tools.

Tutorial: Deploying Packages
Use this tutorial to learn how to use Integration Services tools to install packages and their dependencies to a different computer.

Replication icon Replication
Designing and Implementing: Tutorials (Replication)
Use the tutorials in this section to learn how to use SQL Server Management Studio to set up and run replication topologies.

Reporting Services icon Reporting Services
Tutorial: How to Locate and Start Reporting Services Tools
Use this tutorial to learn how to use the tools to configure a report server, manage report server content and operations, and create and publish reports and report models.

Tutorials (Reporting Services)
Use the tutorials in this section to learn basic report design skills, one step at a time.

Designing Ad Hoc Reports: Tutorials
Use the tutorials in this section to learn necessary skill for creating reports based on models

Designing Report Models: Tutorials
Use the tutorials in this section to learn necessary skills for creating models that you can use in Report Builder or Report Model Query Designer.

Developer’s Guide: Tutorials (Reporting Services)
Use the tutorials in this section to learn about the many different ways to develop Reporting Services solutions.

Service Broker icon Service Broker
Service Broker Tutorials
Use the tutorials in this section to learn how to create conversations to exchange messages between databases.

Yazı kategorisi: SQL, Sql Server, Transact-SQL | » yorum bırak;

SQL Server 2008 Books Online

Yazan: esersahin 25/11/2008

http://msdn.microsoft.com/en-us/library/ms130214.aspx

Other versions are also available for the following:

Welcome to Microsoft SQL Server Books Online. This set of documentation helps you understand SQL Server, and how to implement data management and business intelligence projects. SQL Server includes several data management and analysis technologies. To learn more about these technologies, click the following links.

Database Engine icon Database Engine The Database Engine is the core service for storing, processing and securing data. The Database Engine provides controlled access and rapid transaction processing to meet the requirements of the most demanding data consuming applications within your enterprise. The Database Engine also provides rich support for sustaining high availability.
Analysis Services icon Analysis Services – Multidimensional Data Analysis Services supports OLAP by allowing you to design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases.
Data mining icon Analysis Services – Data Mining Analysis Services enables you to design, create, and visualize data mining models. These mining models can be constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
Integration Services icon Integration Services Integration Services is a platform for building high performance data integration solutions, including packages that provide extract, transform, and load (ETL) processing for data warehousing.
Replication icon Replication Replication is a set of technologies for copying and distributing data and database objects from one database to another, and then synchronizing between databases to maintain consistency. By using replication, you can distribute data to different locations and to remote or mobile users by means of local and wide area networks, dial-up connections, wireless connections, and the Internet.
Reporting Services icon Reporting Services Reporting Services delivers enterprise, Web-enabled reporting functionality so you can create reports that draw content from a variety of data sources, publish reports in various formats, and centrally manage security and subscriptions.
Service Broker icon SQL Server Service Broker Service Broker helps developers build scalable, secure database applications. This new Database Engine technology provides a message-based communication platform that enables independent application components to perform as a functioning whole. Service Broker includes infrastructure for asynchronous programming that can be used for applications within a single database or a single instance, and also for distributed applications.

Yazı kategorisi: SQL, Sql Server, Transact-SQL | » yorum bırak;

Transact-SQL Reference

Yazan: esersahin 25/11/2008

http://msdn.microsoft.com/en-us/library/aa299742(SQL.80).aspx

Transact-SQL Overview
Transact-SQL Syntax Conventions
New and Enhanced Features in Transact-SQL
+ (Add)
+ (Positive)
+ (String Concatenation)
- (Negative)
- (Subtract)
* (Multiply)
/ (Divide)
% (Modulo)
% (Wildcard – Character(s) to Match)
& (Bitwise AND)
| (Bitwise OR)
^ (Bitwise Exclusive OR)
~ (Bitwise NOT)
= (Equals)
> (Greater Than)
< (Less Than)
>= (Greater Than or Equal To)
<= (Less Than or Equal To)
<> (Not Equal To)
!< (Not Less Than)
!= (Not Equal To)
!> (Not Greater Than)
– (Comment)
/*…*/ (Comment)
[ ] (Wildcard – Character(s) to Match)
[^] (Wildcard – Character(s) Not to Match)
_ (Wildcard – Match One Character)
@@CONNECTIONS
@@CPU_BUSY
@@CURSOR_ROWS
@@DATEFIRST
@@DBTS
@@ERROR
@@FETCH_STATUS
@@IDENTITY
@@IDLE
@@IO_BUSY
@@LANGID
@@LANGUAGE
@@LOCK_TIMEOUT
@@MAX_CONNECTIONS
@@MAX_PRECISION
@@NESTLEVEL
@@OPTIONS
@@PACK_RECEIVED
@@PACK_SENT
@@PACKET_ERRORS
@@PROCID
@@REMSERVER
@@ROWCOUNT
@@SERVERNAME
@@SERVICENAME
@@SPID
@@TEXTSIZE
@@TIMETICKS
@@TOTAL_ERRORS
@@TOTAL_READ
@@TOTAL_WRITE
@@TRANCOUNT
@@VERSION
ABS
ACOS
ALL
ALTER DATABASE
ALTER FUNCTION
ALTER PROCEDURE
ALTER TABLE
ALTER TRIGGER
ALTER VIEW
AND
ANY
APP_NAME
ASCII
ASIN
ATAN
ATN2
AVG
BACKUP
BEGIN…END
BEGIN DISTRIBUTED TRANSACTION
BEGIN TRANSACTION
BETWEEN
binary and varbinary
BINARY_CHECKSUM
bit
BREAK
BULK INSERT
CASE
CAST and CONVERT
CEILING
char and varchar
CHAR
CHARINDEX
CHECKPOINT
CHECKSUM
CHECKSUM_AGG
CLOSE
COALESCE
COLLATE
COLLATIONPROPERTY
COL_LENGTH
COL_NAME
COLUMNPROPERTY
COMMIT TRANSACTION
COMMIT WORK
Constants
CONTAINS
CONTAINSTABLE
CONTINUE
Control-of-Flow Language
COS
COT
COUNT
COUNT_BIG
CREATE DATABASE
CREATE DEFAULT
CREATE FUNCTION
CREATE INDEX
CREATE PROCEDURE
CREATE RULE
CREATE SCHEMA
CREATE STATISTICS
CREATE TABLE
CREATE TRIGGER
CREATE VIEW
CURRENT_TIMESTAMP
CURRENT_USER
cursor
CURSOR_STATUS
Cursors
DATABASEPROPERTY
DATABASEPROPERTYEX
Data Types
DATALENGTH
DATEADD
DATEDIFF
DATENAME
DATEPART
datetime and smalldatetime
DAY
DB_ID
DB_NAME
DBCC
DEALLOCATE
decimal and numeric
DECLARE @local_variable
DECLARE CURSOR
DEGREES
DELETE
DENY
DIFFERENCE
DROP DATABASE
DROP DEFAULT
DROP FUNCTION
DROP INDEX
DROP PROCEDURE
DROP RULE
DROP STATISTICS
DROP TABLE
DROP TRIGGER
DROP VIEW
DUMP
ELSE (IF…ELSE)
END (BEGIN…END)
EXECUTE
EXISTS
EXP
Expressions
FETCH
FILE_ID
FILE_NAME
FILEGROUP_ID
FILEGROUP_NAME
FILEGROUPPROPERTY
FILEPROPERTY
float and real
FLOOR
fn_get_sql
fn_helpcollations
fn_listextendedproperty
fn_servershareddrives
fn_trace_geteventinfo
fn_trace_getfilterinfo
fn_trace_getinfo
fn_trace_gettable
fn_virtualfilestats
fn_virtualservernodes
FORMATMESSAGE
FREETEXT
FREETEXTTABLE
FROM
FULLTEXTCATALOGPROPERTY
FULLTEXTSERVICEPROPERTY
Functions
GETANSINULL
GETDATE
GETUTCDATE
GO
GOTO
GRANT
GROUP BY
GROUPING
HAS_DBACCESS
HAVING
HOST_ID
HOST_NAME
IDENT_CURRENT
IDENT_INCR
IDENT_SEED
IDENTITY (Property)
IDENTITY (Function)
IF…ELSE
image
IN
INDEXKEY_PROPERTY
INDEXPROPERTY
INDEX_COL
Information Schema Views
INSERT
int, bigint, smallint, and tinyint
IS_MEMBER
IS_SRVROLEMEMBER
ISDATE
IS [NOT] NULL
ISNULL
ISNUMERIC
KILL
LEFT
LEN
LIKE
LOAD
LOG
LOG10
LOWER
LTRIM
MAX
MIN
money and smallmoney
MONTH
NCHAR
nchar and nvarchar
NEWID
Northwind Sample Database
NOT
ntext, text, and image
NULLIF
numeric
OBJECT_ID
OBJECT_NAME
OBJECTPROPERTY
OPEN
OPENDATASOURCE
OPENQUERY
OPENROWSET
OPENXML
Operators
OR
ORDER BY
PARSENAME
PATINDEX
PERMISSIONS
PI
POWER
Predicate
PRINT
pubs Sample Database
QUOTENAME
RADIANS
RAISERROR
RAND
READTEXT
real
RECONFIGURE
REPLACE
REPLICATE
Reserved Keywords
RESTORE
RESTORE FILELISTONLY
RESTORE HEADERONLY
RESTORE LABELONLY
RESTORE VERIFYONLY
RETURN
REVERSE
REVOKE
RIGHT
ROLLBACK TRANSACTION
ROLLBACK WORK
ROUND
ROWCOUNT_BIG
RTRIM
SAVE TRANSACTION
SCOPE_IDENTITY
Search Condition
SELECT @local_variable
SELECT
SERVERPROPERTY
SESSION_USER
SESSIONPROPERTY
SET @local_variable
SET
SETUSER
SHUTDOWN
SIGN
SIN
smalldatetime
smallint
smallmoney
SOME | ANY
SOUNDEX
SPACE
sql_variant
SQL_VARIANT_PROPERTY
SQUARE
SQRT
STATS_DATE
STDEV
STDEVP
STR
STUFF
SUBSTRING
SUM
SUSER_ID
SUSER_NAME
SUSER_SID
SUSER_SNAME
System Stored Procedures
System Tables
SYSTEM_USER
table
TAN
text
TEXTPTR
TEXTVALID
timestamp
tinyint
Trace Flags
Transactions
TRIGGER_NESTLEVEL
TRUNCATE TABLE
TYPEPROPERTY
UNICODE
UNION
uniqueidentifier
UPDATE
UPDATE STATISTICS
UPDATETEXT
UPPER
USE
USER
USER_ID
USER_NAME
VAR
varbinary
varchar
VARP
WAITFOR
WHERE
WHILE
WRITETEXT
YEAR

Yazı kategorisi: SQL, Sql Server, Transact-SQL | » yorum bırak;

Random Sampling in T-SQL

Yazan: esersahin 25/11/2008

http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx

Brian Connolly

Whether you’ve got a gargantuan data warehouse, a huge transaction database, or a smaller workgroup database or data mart, it’s not uncommon to want to “sample” your data. Although selecting a random sample of rows isn’t a natural SQL operation, over the years, SQL Server professionals have developed a variety of techniques to sample a table. In this feature article, Brian Connolly presents these techniques, discusses their advantages and disadvantages, and performs a statistical evaluation of the population samples that each technique produces.

Imagine a company with a data warehouse that contains tens of millions of records on individuals. The analysts want to use it to do statistical analysis or data mining. Because it can be time-consuming to work with such a large number of records, analysts often select a small, random subset of records, work with the sample to build a model or draw some conclusions, and then test the model against the full population. Alternatively, an analyst could use OLAP, data mining, or some other type of BI tool to study the full customer population and identify some pattern or patterns that seem to represent a business opportunity. The smart analyst would then test these conclusions against a small random subset of “real” customers to ensure that his or her conclusions were valid.

So random sampling is important. But there’s a conceptual hurdle to random sampling within SQL: Since SQL is a set-oriented language, the only subset operations are those based on column criteria or join operations. There’s no notion of a “random sample” of rows. There are three techniques that SQL Server developers use to surmount this problem:

• Add a column to the table, fill it with values from the Rand() function, and select an ordered subset based on that new column.

• The same as the first technique, except that the Rand() column is seeded with a row primary key.

• Use the T-SQL NewID() function as the random operator.

The RandomPopulation table

We’ll use the following table to implement and compare each of these techniques:

create table RandomPopulation
( rowid  int not null,
  pure_random float,
  seeded_random float NULL,
  newid_random uniqueidentifier NULL,
  pure_random_order int not null,
  seeded_random_order int not null,
  newid_random_order int not null)

This table represents the population to be sampled. Rowid is the primary key. If the table has 100,000 rows, a 1 percent random sample should return a set of 1000 random Rowids. Pure_random, seeded_random, and newid_random are the columns that I use to employ the three techniques listed earlier. For example, pure_random will have values obtained from successive calls to the Rand() function.

Pure_random_order will contain a rowid value. If there are 100,000 rows in the table, rowids will be between 0 and 99,999. The table row with the smallest pure_random number will have a pure_random_order value of 0. The row with the largest pure_random_number will have a pure_random_order value of 99,999. The following query returns a sample of RandomPopulation with @samplecount rows:

  select rowid
  into #tpure
  from RandomPopulation
  where pure_random_order < @rowsample

Seeded_random_order and newid_random_order are used in the same way to order the seeded_random and newid_random columns, and select samples into temporary tables #tseeded and #tnewid. The following sections show how the three techniques listed earlier are used to populate the random values in the table and determine their relative order.

Technique 1: Use a Rand() column

The fundamental obstacle to using the Rand() function is that Rand() is only evaluated once per query. The only way to assign a different random number to every row in the table is to make a separate assignment for each row, either when creating the row or updating the table one row at a time:

while (@row < @rowcount)
  begin
    insert into RandomPopulation
      (rowid,pure_random, pure_random_order,
       seeded_random_order, newid_random_order)
    values(@row,rand(),-1,-1,-1)
    select @row = @row+1
end

Once pure_random is assigned, pure_random_order can be assigned to reflect the row order according to the Rand() function:

select IDENTITY(int, 0,1)  as row_order,*
into #tpureorder
from
  /* need the derived table to force ordering
  need top 100 percent to allow an order by */
  (select top 100 percent  rowid,
     pure_random as random
     from RandomPopulation
     order by pure_random asc) x
update RandomPopulation
  set pure_random_order = row_order
from #tpureorder
where #tpureorder.rowid = RandomPopulation.rowid

Techniques 2 and 3: Seeded random and NewID()

The following code shows how the other two techniques can be used. If Rand is seeded with a row-dependent value, it’s evaluated not once per query, like Rand(), but rather once for each row. NewID() is also evaluated once per row, and the unique identifiers that it returns can be ordered. Both of these techniques allow assignment of “random” columns in a single query:

  update RandomPopulation
  set
     seeded_random =
      rand( (( (rowid+1)) % 1000000) *
          (datepart(ms, GETDATE()) +1) ),
     newid_random = newid()

The seed chosen for Rand is a function of the row primary key and also the current millisecond. Rowid is included in the seed function because it forces Rand to be executed once per row.

Getdate() is evaluated once per query. The reason it’s included in the seed function is so that when multiple tests of these techniques are executed, a variety of seeds are used. Note that if I run this test two times, both runs will have the same rowids, so in the absence of Getdate(), or another “random” function in the seed (such as @@idle), both Rand calls would return the same values. The additions and the modulo operation in the seed function guarantee non-zero numbers within the int datatype range.

The use of seeded random is problematic. Rand has been designed (and tested) to give a stream of statistically independent, uniformly distributed numbers on successive calls after it’s seeded. However, the “seeded random” technique used earlier essentially takes the first random number from a different seed for each row. So while Rand(X), Rand(), Rand() … Rand() would be expected to yield a valid stream of random numbers, there’s no reason to expect Rand(N1), Rand(N2) … Rand(NNN) to be random. For this reason, and also because the tests that follow demonstrate how bad it can be, I don’t recommend you use this technique.

NewID() has the same advantage as seeded random: It’s evaluated once per row, so “random” values can be assigned to each row in a table in a single query. However, it’s evaluated by the operating system, not T-SQL. In addition, while unique identifiers can be ordered, the comparison functions aren’t documented. Unlike Rand(), NewID() has not been designed for use as a statistically valid random number generator.

Once seeded_random and newid_random are assigned, seeded_random_order and newid_random_order can be assigned using the same algorithm that was used to assign pure_random_order earlier.

Random sampling experiment design

You can use any of the three techniques to select a random sample of rows from a table. Rand() is expensive because it requires the use of INSERTs, cursors, or single-row UPDATEs. Seeded Rand() and NewID() are easier to use, but it’s important to test the statistical validity of the samples they produce.

There are many tests of random number generators. The Chi-squared test divides the output into “buckets” and ensures that on average the number of items in each bucket falls within the expected count for each bucket. For example, a random 1 percent sample of a RandomPopulation table with 100,000 rows will return 1000 rowids that are scattered between 0 and 99,999. Divide the space into 40 “buckets” from 0-2499, 2500-4999 … 97500-99999. The random 1 percent sample should fill each bucket with about 25 rowids from RandomPopulation. Naturally, some buckets will have more and some will have less. The Chi-square test evaluates this deviation to see whether there’s evidence of non-randomness. [See www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html for one of many tutorials on Chi-square. The site also has an associated Web Chi calculator you can use.–Ed.]

Given k buckets, the Chi-square statistic that measures independence is as shown in Figure 1.

I use the following table to compute the Chi-square statistic for each of the three sampling techniques:

create table chi_squared_hist
(  range_low int,
   range_high int,
   expected_rows int,
   pure_rows int null,
   seeded_rows int null,
   newid_rows int null
)

The ranges and expected counts are dependent on the RandomPopulation row count and the number of buckets:

  select @buckettop = @rowcount -1
  while (@buckettop > 0)
  begin
     insert into chi_squared_hist
    (range_low, range_high, expected_rows)
     values
        (@buckettop - @rowcount/@bucketcount+1,
          @buckettop,
          (@rowcount/@bucketcount)*@sampleprop)
     select
      @buckettop = @buckettop -
        @rowcount/@bucketcount
  end

I compute the counts within each rowid range for each sampling technique as follows:

update chi_squared_hist
  set
     pure_rows = row_count_hist.pure,
     seeded_rows = row_count_hist.seeded,
     newid_rows = row_count_hist.newid
  from
  (
     select range_low, range_high,expected_rows,
        count(all #tpure.rowid) as pure,
        count(all #tseeded.rowid) as seeded,
        count(all #tnewid.rowid) as newid
     from chi_squared_hist h
     inner join
     RandomPopulation rp
        on rp.rowid  between
        h.range_low and h.range_high
     left outer join #tpure
        on rp.rowid = #tpure.rowid
     left outer join #tseeded
        on rp.rowid = #tseeded.rowid
     left outer join #tnewid
        on rp.rowid = #tnewid.rowid
     group by range_low, range_high, expected_rows
  ) row_count_hist
  where
    row_count_hist.range_low =
    chi_squared_hist.range_low

Using the counts in each bucket, the Chi-square statistic is computed like so:

  update up_test_random_selection_results
  set
    pure_chi_sq = sum_diff.pure_chi_sq,
    seeded_chi_sq = sum_diff.seeded_chi_sq,
    newid_chi_sq = sum_diff.newid_chi_sq
  from
  (
    select
      pure_chi_sq =
       sum( power
        (expected_rows - pure_rows,2)
        /1.0/expected_rows),
      seeded_chi_sq =
       sum( power
        (expected_rows - seeded_rows,2)
        /1.0/expected_rows),
      newid_chi_sq =
       sum( power
        (expected_rows - newid_rows,2)
        /1.0/expected_rows)
    from chi_squared_hist
  ) sum_diff
  where run_number = @run_number

There are other tests for randomness. Most of them test for serial independence, which means that there’s no correlation between successive numbers, or that there’s no correlation between arbitrary groups of numbers that have a fixed lag between them. However, in the sampling case, the order of the rowids that are selected from RandomPopulation isn’t important. Only the coverage is important, and this is why the Chi-square test is used to measure the effectiveness of the T-SQL sampling techniques.

The following stored procedure (see the Download) performs all of the preceding steps N times, where N is less than the runs parameter. Rowcount is the RandomPopulation size, sampleprop is the proportion of RandomPopulation to be sampled, and bucket count is the number of buckets to be used for the Chi-square analysis.

create procedure up_test_random_selection
   @runs int,
   @rowcount int,
   @sampleprop float,
   @bucketcount int

Test results

I called the up_test_random_selection procedure, specifying 30 runs, 100,000 RandomPopulation rows, .01 for a 1 percent sample, and 40 buckets. Consequently, each 2,500 interval range should have about 25 rowids. Since 40 buckets were used, there are 39 degrees of freedom. We can say that the row sample for each technique is independent and uniformly distributed at the 90 percent confidence level if the Chi-square statistic for each technique is Chi-Sq[.90, 39]. The threshold level of this statistic from a Chi-square function table is 50.660. As long as the Chi-square total for each technique is less than this level, there’s no reason to suspect that the sample isn’t random.

The pure random technique is clearly the best method for obtaining a random sample, consistently yielding a Chi-square statistic less than the threshold value (see Figure 2). The average Chi-square statistic for the NewID() method is slightly higher than the average for the pure random technique, and it has considerably more variance (94.2 vs. 57.9). It exceeded the threshold value on three of the 30 runs. However, we’re working with a 90 percent confidence threshold, so this doesn’t necessarily prove that its samples are non-random.

Seeded random is either very good (almost too good) or extremely bad. In run 12, its Chi-square statistic is 538.7, more than 10 times the threshold. It’s clearly not random, as the histogram in Figure 3 shows.

This shows the observed counts for the first quarter of the buckets in run 12. Three of the 10 buckets are completely empty for the seeded random technique, and a similar pattern holds true for the other 30 buckets in run 12. Buckets that aren’t empty have far more observed rows than the expected value of 25.

Conclusions and recommendations

The seeded random technique should never be used. It gives results that are dangerous and misleading, because it can give seemingly random results in one run, and completely non-random results in the next.

Pure random is the best technique, provided that you can afford the overhead of row-at-a-time operations to set up the table for sampling, but if this overhead is too great, consider using NewID(). Bear in mind, though, that because NewID() is assigned by the operating system, you should always test its behavior on the target system–perhaps using the code included in the accompanying Download as a starting point.

Download 403BRIAN.SQL

To find out more about SQL Server Professional and Pinnacle Publishing, visit their Web site at http://www.pinpub.com/

Note: This is not a Microsoft Corporation Web site. Microsoft is not responsible for its content.

This article is reproduced from the March 2004 issue of SQL Server Professional. Copyright 2004, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.

Yazı kategorisi: Random Number, SQL, Sql Server, Transact-SQL | » yorum bırak;

Narayana Vyas Kondreddi

Yazan: esersahin 25/11/2008

http://vyaskn.tripod.com/code.htm

Hi! Thank you for visiting my website! Let me quickly tell you about this site.

Top two reasons for building this site:

  • Want to have an online presence.
  • Want to share whatever I know (Microsoft technologies, especially SQL Server), with you all!

I launched this site in 1997 as a personal website, and it has come a long way since then. Lately, I’ve been getting a great response from my visitors, and that forced me to track traffic, and see what’s going on! To my surprise, traffic is growing day by day, and this site is receiving more than 500,000 page views per month.

Now a brief introduction of myself. My name is Narayana Vyas Kondreddi (Just call me ‘Vyas’) and I am 30. I come from a small village called Veeravasaram, which is located in the West Godavari district of the state of Andhra Pradesh, India. I’m happily married to Priya. I come from a wonderful, loving and caring family. Visit ‘My picture album’ page to meet my family members.

I am an SQL Server database administrator (DBA) , currently working for Barclays Capital, one of the biggest investment banks in the UK. Prior to this, I was with Aditi, Microsoft, Compaq, and Tesco.com, the world’s biggest online grocer.

Coming to my hobbies and interests, I like writing code. I like spending time online. I love to travel and want to visit all the beautiful places on earth. I’m interested in photography and love to capture the nature’s beauty in my small camera! I enjoy driving, let it be a bike or a car. I like listening to soft, melodious music, as well as these latest noisy albums. Last, but not the least, I am a great Cricket fan, but suffering from football fever after moving to UK :-) I also enjoy watching Formula 1 car racing every fortnight, during the season.

This is it for now. Hope you enjoy this site. I’ll appreciate your comments and suggestions. Feel free to drop a line or Sign my guestbook!
Thank you
Vyas

Latest SQL Server Articles:

Keep a check on your IDENTITY columns in SQL Server New
This article shows you how to keep an eye on your IDENTITY columns and find out before they run out of values, and fail with an arithmetic overflow error.

Scripting permissions of a database user or role in SQL Server 2005 New
This article shows you how to script or copy the permissions of an existing database user or role, using the new SQL Server 2005 security catalog views.

Making SQL Server 2005 SSIS packages portable New
This articles shows you how to use the new XML package confuguration files to make SSIS packages portable. It also provides you with downloadable templates for different XML configuration requirements.

Review of AdventNet SwisSQL Data Migration Tool 3.0 New
A review of an exciting new database migration product from SwisSQL.

SQL Server 2005 Replication Error : The Distributor has not been installed correctly New
This short article introduces you to a little bug in SQL Server 2005 (Yukon) replication. Though it is not a critical defect, it makes one of the SQL server replication jobs fail.

Review of Adventnet SwisSQL Sybase to SQL Server Migration Tool 2.1 New
A review of AdventNet SwisSQL Sybase to SQL Server Migration tool, that automates the migration of database objects from Sybase Adaptive Server Enterprise to Microsoft SQL Server.

Yazı kategorisi: Developer, SQL, Sql Server, Transact-SQL | » yorum bırak;

SQL SERVER – Random Number Generator Script – SQL Query

Yazan: esersahin 25/11/2008

http://blog.sqlauthority.com/2007/04/29/sql-server-random-number-generator-script-sql-query/

Random Number Generator

There are many methods to generate random number in SQL Server.

Method 1 : Generate Random Numbers (Int) between Rang

---- Create the variables for the random number generation
DECLARE @Random INT;
DECLARE @Upper INT;
DECLARE @Lower INT

---- This will create a random number between 1 and 999
SET @Lower ---- The lowest random number
SET @Upper 999 ---- The highest random number
SELECT @Random ROUND(((@Upper @Lower -1) * RAND() + @Lower), 0)
SELECT @Random

Method 2 : Generate Random Float Numbers

SELECT RAND( (DATEPART(mmGETDATE()) * 100000 )
                + (DATEPART(ssGETDATE()) * 1000 )
                + DATEPART(msGETDATE()) )

Method 3 : Random Numbers Quick Scripts

---- random float from 0 up to 20 - [0, 20)
SELECT 20*RAND()
    -- random float from 10 up to 30 - [10, 30)
SELECT 10 + (30-10)*RAND()
            --random integer BETWEEN 0 
        AND 20 [0, 20]
SELECT CONVERT(INT, (20+1)*RAND())
            ----random integer BETWEEN 10 
        AND 30 [10, 30]
SELECT 10 CONVERT(INT, (30-10+1)*RAND())

Method 4 : Random Numbers (Float, Int) Tables Based with Time

DECLARE @t TABLErandnum float )
DECLARE @cnt INTSET @cnt 0
WHILE  @cnt <=10000
    BEGIN
    SET @cnt @cnt 1
    INSERT INTO @t
        SELECT RAND( (DATEPART(mmGETDATE()) * 100000 )
                    + (DATEPART(ssGETDATE()) * 1000 )
                    + DATEPART(msGETDATE()) )
    END
SELECT randnumCOUNT(*)
    FROM @t
    GROUP BY randnum

Method 5 : Random number on a per row basis

    ---- The distribution is pretty good however there are the occasional peaks.
    ---- If you want to change the range of values just change the 1000 to the maximum value you want.
    ---- Use this as the source of a report server report and chart the results to see the distribution
SELECT randomNumberCOUNT(1countOfRandomNumber
    FROM (
    SELECT ABS(CAST(NEWID() AS binary(6)) %1000) + 1 randomNumber
        FROM sysobjectssample
    GROUP BY randomNumber
    ORDER BY randomNumber

Reference : Pinal Dave (http://www.SQLAuthority.com)

Yazı kategorisi: Random Number, SQL, Sql Server, Sql Server Join, Stored Procedure | » yorum bırak;

Using TOP and ORDER BY with UNION / UNION ALL

Yazan: esersahin 25/11/2008

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/05/18/using-top-and-order-by-with-union-union-all.aspx

Got asked this from a friend, I have a column treeID that has values 1 – 5 and for each value I require 20 random rows, so 20 for 1, 20 for 2 etc… Is there a query that will do this other than calling the stored procedure 5 times?

First thought is to use 5 queries and a UNION ALL, however the syntax does not support it ->

select top 20 somedata

from #blah

order by newid()

union all

select top 20 somedata = somedata + 1

from #blah

order by newid()

Gives this error ->

Msg 156, Level 15, State 1, Line 4

Incorrect syntax near the keyword ‘union’.

To get round this problem we can use derived tables, think of these as a self-contained table, a bit like a temporary table within the query.

select somedata

from (

    select top 20 somedata

    from #blah

    order by newid() ) as d

union all

select *

from (

    select top 20 somedata = somedata + 1

    from #blah

    order by newid() ) as d

The NEWID() gives a different GUID per row, unlike CURRENT_TIMESTAMP for instance which is evaluated once regardless of how many rows (you always get the same value), NEWID() evaluates for every row.

You can probably do something with a Common Table Expression as well, any takers?

Congrats to Wanderer (see comments) on the first to give me a CTE example, nice!…

WITH Top20Blah (SomeDataOut) as (

    select top 20 somedata

    from #blah

    order by newid() )

select 1, *

from Top20Blah

union all

select 2, *

from Top20Blah

union all

select 3, *

from Top20Blah

union all

select 4, *

from Top20Blah

union all

select 5, *

from Top20Blah

It just gets better, here is a CROSS APPLY example (see comments below) from Adam Machanic, I never even thought about CROSS APPLY – perfect use for it.

select p.somedata, x.treeid

from (   select 1

         union all

         select 2

         union all

         select 3

         union all

         select 4

         union all

         select 5

        ) x (treeid)

    cross apply (

           select top 20 somedata

           from #blah

           where treeid = x.treeid

           order by checksum(newid())

        ) p

 

Got another I’ve just worked out myself…

select *

from (

    select somedata,

           treeid,

           row_number() over ( partition by treeid order by newid() ) as rownumber

    from #blah

    ) as r

where r.rownumber <= 20

order by treeid

Yazı kategorisi: ORDER BY, SQL, Sql Server, TOP, Transact-SQL, UNION, UNION ALL, newid() | » yorum bırak;