Sunday, February 19, 2012

*unsigned* 32-bit column type?

I need to stored lots of unsigned 32-bit values. (I'm actually storing IPv4 addresses). These values will frequently exceed 2^31 but never exceed 2^32

My options are:

Use bigint. This is undesirable since this wastes 32-bits of space for every value.
Use int and use the negative value range to get full 32-bits worth of data. In C, it is easy and fast to cast a signed -1 to an unsigned 2^32-1. In SQL, it will be more expensive: Must cast to 64-bit and if val < 0 then val = 2^32 + val.

Is there a better alternative?

Is there a fast (binary) way to cast a value > 2^31 to a negative signed value?

DECLARE @.value BIGINT
SET @.value = 3000000000

SELECT CONVERT(INT, @.value) -- causes error
SELECT CAST(@.value AS INT) -- causes errorIf it's IP addresses, why don't you break them into parts and store each part in TINYINT? For any sub-mask value you would take only 4 bytes.|||The quick and dirty answer is to do a C union to make both the UINT 32 and INT 32 values easily accessible.

-PatP|||If it's IP addresses, why don't you break them into parts and store each part in TINYINT? For any sub-mask value you would take only 4 bytes.

The quick and dirty answer is to do a C union to make both the UINT 32 and INT 32 values easily accessible.

Thanks guys. I'll stick with the signed 32-bit column and some ugly code to convert between unsigned 32-bit values.

Using 4 columns makes for uglier joins and queries. And Pat, I don't quite follow your approach.

CREATE FUNCTION parseIPv4(@.ipString VARCHAR(30))
RETURNS INT AS
BEGIN
DECLARE @.dotIndex1 INT
SET @.dotIndex1 = CHARINDEX('.', @.ipString)
DECLARE @.dotIndex2 INT
SET @.dotIndex2 = CHARINDEX('.', @.ipString, @.dotIndex1 + 1)
DECLARE @.dotIndex3 INT
SET @.dotIndex3 = CHARINDEX('.', @.ipString, @.dotIndex2 + 1)

IF @.dotIndex3 < 1 RETURN 0

DECLARE @.section1 BIGINT
SET @.section1 = CAST(SUBSTRING(@.ipString, 1, @.dotIndex1 - 1) AS BIGINT)
DECLARE @.section2 BIGINT
SET @.section2 = CAST(SUBSTRING(@.ipString, @.dotIndex1 + 1, @.dotIndex2 - @.dotIndex1 - 1) AS BIGINT)
DECLARE @.section3 BIGINT
SET @.section3 = CAST(SUBSTRING(@.ipString, @.dotIndex2 + 1, @.dotIndex3 - @.dotIndex2 - 1) AS BIGINT)
DECLARE @.section4 BIGINT
SET @.section4 = CAST(RIGHT(@.ipString, LEN(@.ipString) - @.dotIndex3) AS BIGINT)

DECLARE @.result BIGINT
SET @.result = (@.section1 * 0x1000000) | (@.section2 * 0x10000) | (@.section3 * 0x100) | (@.section4)

DECLARE @.t INT

-- If the value can fit in a signed 32-bit value
IF @.result < 2147483648 RETURN @.result

-- The value can't fit in a signed 32-bit value. Use negative space.
-- This is what would happen if the value was binary cast from unsigned to signed.
RETURN CAST(@.result - 4294967296 AS INT)
END|||What I meant was:// ptp 200408021738 demo for dbforums.com

#include <stdio.h>

void main()

{
union bar {
unsigned int u;
int s;
} foo;

foo.u = 4294967293;
printf("%d\n", foo.s); // outputs "-3"
}
-PatP|||What I meant was:

Ah, I see what you are saying. In C you can also just do a binary cast:

unsigned int value = 0xFFFFFFFF;
int signedValue = (int) value; // will be -1

In a T-SQL stored proc, I don't see an equivalent. I'm stuck using if statements and subtraction:

DECLARE @.value BIGINT
DECLARE @.signedValue INT

SET @.value = 4294967295

IF @.value < 2147483648 SET @.signedValue = @.value
ELSE SET @.signedValue = @.value - 4294967296

SELECT @.signedValue -- Will be -1

This works but I had a nagging feeling that there was a cleaner solution... Maybe not.|||Uglier joins and queries? Have you tried it?

begin tran
go
create function fn_RogerWilco(
@.section1 bigint ,
@.section2 bigint ,
@.section3 bigint ,
@.section4 bigint ) returns bigint
as begin
declare @.result bigint
set @.result = (@.section1 * 0x1000000) | (@.section2 * 0x10000) | (@.section3 * 0x100) | (@.section4)
return @.result
end
go
create table RogerWilco (
section1 tinyint not null,
section2 tinyint not null,
section3 tinyint not null,
section4 tinyint not null,
IPAddress as dbo.fn_RogerWilco(section1, section2, section3, section4) )
insert RogerWilco (section1, section2, section3, section4) values (192,168,1,1)
select * from RogerWilco
rollback tran|||In C you can also just do a binary cast:

unsigned int value = 0xFFFFFFFF;
int signedValue = (int) value; // will be -1

I think that you are making this orders of magnitude more difficult than it needs to be. Any flavor of four-byte int reaches the SQL Server as a signed int... It is the only way that they come. If you send any four-byte int down the wire, you'll get back the same bit pattern regardless of how you reference it within your client code. If you use either %d or %u to format it, you'll format the strings as signed or unsigned, regardless of how the int values are declared. The only reason that signed/unsigned makes any difference at all is if you want to do math on the server and have it match math done on the client, if you don't care about that, then nothing else matters!

-PatP|||i prefer the rdjabarov solution myself
i worked on a ip storage question where they had stored it as varchar...(yuck, first timers)
you can however write a pretty complex little function to break apart an ip address using substring and case.
either way i would be very interested to see the storage in a single column with a simple way to use the decimal storage inplace and in the context of an ip address not a decimal.

"The suspense is terrible. I hope it will last." Willy Wonka|||I'm confused. You see value in the database engine being able to address the octets of an IP address? So would a BINARY(4) be what you want, or four separate TINYINT columns, or something different?

I guess I'd never given real thought to allowing the database engine to manipulate the IP addresses directly. I know that it can be done, I'm just not sure that I can see the value in doing it.

-PatP|||I'm confused...I'm just not sure that I can see the value in doing it.Me neither. But maybe he knows something we don't? Do IP addresses actually mean something other than the fact that they belong to class A, B, or C and are unique for their sub-nets? Does converting them to signed/unsigned actually add value?|||Answering the question as given, you could use numeric (11, 0). In the database, it would not look like an IP address as such. Is your application going to decode the data?|||free at last, free at last...thank god almighty...

well you get the idea...

ahem...

anyone want to suggest that an ip address is NOT a number....|||Me neither. But maybe he knows something we don't? Do IP addresses actually mean something other than the fact that they belong to class A, B, or C and are unique for their sub-nets? Does converting them to signed/unsigned actually add value?There are ways you can manipulate IP addresses to obtain useful information if you know the network topology, such as knowing for any address A.B.C.D in your network, its router will be A.B.C.1 or similar operations. You might also know that for a device of class X, there will be a device of class Y at the IP address of X + 2 (for example in a cluster configuration).

I was thinking that RogerWilco wanted to ensure that the addresses computed would match between the client and the server. For that usage, since the server must use signed ints, it would help to be able to have signed ints available on the client too, but that is the only reason I can imagine.

Oh well, maybe RogerWilco will weigh in with some details that my feeble brain can't find at the moment!

-PatP|||anyone want to suggest that an ip address is NOT a number....What else is an IP address but a number? Over the years, folks have put some kind of funky spins on how it is used, but if you go back to Metcalfe's original writings you'll see that it really truly is just a number!

-PatP|||What else is an IP address but a number? Over the years, folks have put some kind of funky spins on how it is used, but if you go back to Metcalfe's original writings you'll see that it really truly is just a number!

-PatP

Do you do math on it?|||How else do you figure out which node is which in a cluster without doing math using the IP address? There might be a way to do it using some really arcane bit mapping dreck, but I'll stick with simple math myself.

-PatP|||What's the formula for it, Pat?|||In our configurations (which aren't standard), node N is the cluster IP + N. Granted that it isn't pretty, but it works.

-PatP|||Well, then why making it so hard when all you need to do is to increment a specific part? Or I missed your answer and it means to add to the signed representation of the IP address?|||Nah, you missed a bunch of stuff riddled throughout this thread. Brett was just freaking because I see an IP address as a UINT32 instead of an abstract data type. He couldn't imagine that I might ever do math on an IP address. Not only do I do simple arithmetic, but I do bitmasks and other fancy tricks too!

-PatP|||For what purpose?

How dynamic does this stuff have to be?|||I still don't see the formula...What do you add N to? UINT of the IP or a specific part of IP?|||If you think of the IP address as a 32 bit unsigned integer, that is what you manipulate. In nearly every case, that means that you'll only manipulate the low order octet. Very few clusters have more than four machines, and they almost never cross a subnet boundary (although in larger "server farms" they certainly could).

-PatP|||OK, I got it. So in regards to what I posted way up there, RogerWilco will have to have a translation back from signed to a 4-part number, huh?|||Most of the C routines that use IP addresses store them as an int or an unsigned int. The bit pattern itself is all that really matters, and ADO does the translation for me.

I suspect that we've made a mountain out of a mole hill. Unless RogerWilco has problems, I think we ought to let this thread die gracefully.

-PatP|||Yes, an IP address is just a number. But:

- There is frequently a need to convert the value in the database to/from the dotted notation that humans are accustomed to. When I do a query, I want to see the IP results in dotted notation format or I want to specify an IP in the WHERE clause in dotted notation. Obviously, a client app can do this kind of thing easily but I frequently need to write one-off SQL statements and run directly in Query Analyzer.
- I'd like to avoid strings since they are an inefficient use of space and are subject to various formatting errors.
- Frequently I want to join two tables together by IP, ORDER BY IP, or GROUP by IP. This is definitely possible but bulkier when the IP value is split across multiple columns: WHERE A.IP1 = B.IP1 AND A.IP2 = B.IP2 AND ...

I really never do anything with the numeric value of an IP other than equality tests. It would be nice to use the numeric value for "range" purposes, as in from 192.168.1.0 to 192.168.1.255 but that doesn't work quite right with signed values.

Thanks for all the answers, everyone! I don't think I've ever gotten this much of a response!

Here are the two TSQL functions I'm using to convert between dotted notation and 32-bit numeric:

CREATE FUNCTION parseIPv4(@.ipString VARCHAR(30))
RETURNS INT AS
BEGIN
DECLARE @.dotIndex1 INT
SET @.dotIndex1 = CHARINDEX('.', @.ipString)
DECLARE @.dotIndex2 INT
SET @.dotIndex2 = CHARINDEX('.', @.ipString, @.dotIndex1 + 1)
DECLARE @.dotIndex3 INT
SET @.dotIndex3 = CHARINDEX('.', @.ipString, @.dotIndex2 + 1)

IF @.dotIndex3 < 1 RETURN 0

DECLARE @.section1 BIGINT
SET @.section1 = CAST(SUBSTRING(@.ipString, 1, @.dotIndex1 - 1) AS BIGINT)
DECLARE @.section2 BIGINT
SET @.section2 = CAST(SUBSTRING(@.ipString, @.dotIndex1 + 1, @.dotIndex2 - @.dotIndex1 - 1) AS BIGINT)
DECLARE @.section3 BIGINT
SET @.section3 = CAST(SUBSTRING(@.ipString, @.dotIndex2 + 1, @.dotIndex3 - @.dotIndex2 - 1) AS BIGINT)
DECLARE @.section4 BIGINT
SET @.section4 = CAST(RIGHT(@.ipString, LEN(@.ipString) - @.dotIndex3) AS BIGINT)

DECLARE @.result BIGINT
SET @.result = (@.section1 * 0x1000000) | (@.section2 * 0x10000) | (@.section3 * 0x100) | (@.section4)

DECLARE @.t INT

IF @.result < 2147483648 RETURN @.result

RETURN CAST(@.result - 4294967296 AS INT)
END

CREATE FUNCTION formatIPv4(@.ipValue INT)
RETURNS VARCHAR(30) AS
BEGIN
DECLARE @.unsignedValue BIGINT
SET @.unsignedValue = @.ipValue
IF @.unsignedValue < 0 SET @.unsignedValue = @.unsignedValue + 4294967296

RETURN
CAST((@.unsignedValue / 0x1000000 & 0xFF) AS VARCHAR(4)) + '.'
+ CAST((@.unsignedValue / 0x0010000 & 0xFF) AS VARCHAR(4)) + '.'
+ CAST((@.unsignedValue / 0x0000100 & 0xFF) AS VARCHAR(4)) + '.'
+ CAST((@.unsignedValue / 0x0000001 & 0xFF) AS VARCHAR(4))
END|||I'd probably fall back to using BINARY(4) then, just because it is more efficient to convert and allows range comparisons. You could use something like:DROP FUNCTION dbo.f_b2s
GO
CREATE FUNCTION dbo.f_b2s(@.ipColumn BINARY(4)) RETURNS VARCHAR(15) AS
BEGIN
RETURN Convert(VARCHAR(3), Ascii(@.ipColumn)) + '.'
+ Convert(VARCHAR(3), Ascii(SubString(@.ipColumn, 2, 1))) + '.'
+ Convert(VARCHAR(3), Ascii(SubString(@.ipColumn, 3, 1))) + '.'
+ Convert(VARCHAR(3), Ascii(SubString(@.ipColumn, 4, 1)))
END
GO
DROP FUNCTION dbo.f_s2b
GO
CREATE FUNCTION dbo.f_s2b(@.pcDotted VARCHAR(20)) RETURNS BINARY(4) AS
BEGIN
DECLARE @.bWork VARBINARY(4)
, @.i1 INT
, @.i2 INT
, @.i3 INT

SET @.i1 = CharIndex('.', @.pcDotted, 1)
SET @.i2 = CharIndex('.', @.pcDotted, 1 + @.i1)
SET @.i3 = CharIndex('.', @.pcDotted, 1 + @.i2)

RETURN
CASE
WHEN @.pcDotted LIKE '%[^.0-9]%' THEN NULL
WHEN @.i3 < @.i2 OR @.i2 < @.i1 OR 0 = @.i1 THEN NULL
ELSE
CAST(Char(Convert(int, SubString(@.pcDotted, 1
, @.i1 - 1))) AS BINARY(1))
+ CAST(Char(Convert(int, SubString(@.pcDotted, @.i1 + 1
, @.i2 - @.i1 - 1))) AS BINARY(1))
+ CAST(Char(Convert(int, SubString(@.pcDotted, @.i2 + 1
, @.i3 - @.i2 - 1))) AS BINARY(1))
+ CAST(Char(Convert(int, SubString(@.pcDotted, @.i3 + 1
, 20))) AS BINARY(1))

END
END
GO-PatP|||free at last, free at last...thank god almighty...

well you get the idea...

ahem...

anyone want to suggest that an ip address is NOT a number....

OK i will byte.
I will suggest that an ip address is not a number..

it is a collection of 4 8bit sections (octets) that make up a network address. so each octet is a "number". because the ip address is not valuable without the subnet mask. 1921681100 as a number means something specific.
192.168.1.100 could be the 192 network or the 192.168 network or the 192.168.1 network etc ...
another problem is that with using numeric (11,0) each octet can contain a decimal equivalent of 1 through 255 so the periods (.) are necessary to decoding the address.

NOW technically if you broke each octet up into it's binary equivalent then each section would be an exact 8 character number consisting of ones and zeros.

rdjabarov,, becuase of cidr and vlsm the old classfull based system of ip addresses is a bit out of date. Classless addressing(Supernetting and subnetting) kind of make the classfull system a bit old fashioned and not the most effective.

My solution in this problem (considering all of the hassles) has always been to have 4 tinyint cols and display accordingly. not the most elegant solution, but it has always been the simplest and i am always one for bringing the app logic to the db

there , i think that about covers it.|||Ah brings tears to a data modelers heart

very nice...

And le the denormalization begin

And, which one are you?

http://www.techwise.com/techwise/index.cfm?ID=F66B888A-1D5D-4FAB-BE0BB5361EDC93BA|||Ah brings tears to a data modelers heart

very nice...

And le the denormalization begin

And, which one are you?

http://www.techwise.com/techwise/index.cfm?ID=F66B888A-1D5D-4FAB-BE0BB5361EDC93BA

front row third from the right.

i'm not in that picture.

as far as normalization goes, they are not rules but guidelines.
i am very dedicated to making my dbs as "normal" as possible but sometimes denormalizing can solve some certain problems but at the same time cause more**.

i realize that breaking up the ip address into 4 individual columns violates the entity rules but dammit it's so much easier than all the other work that has to be performed to treat the ip correctly.

**little known fact is that i invented the 2nd normal form. it was 1965 and i was bored with 1nf (i mean how many tables can you build??) so i tried to spice it up. i thought "Hey lets make more tables, it'l be cool" but shortly after i came out with it [i had ordered all the t shirts and the beer coozies and those arent cheap] codd and date come out with 3nf and steal all of my thunder. i had to eat it on the promotional 2nf pocket protectors and pens i even had my catchy saying and everything

"repeating groups, repeating groups, repeating groups, repeating groups"
catchy huh
"i'll admit it's not as good as the key the whole key and blah blah blah" but dammit those bastards ruined me... :D|||Hey, I'll buy some of your pocket protectors! They'll make great Christmas presents for all of my dba friends and co-workers!

-PatP

No comments:

Post a Comment