"Why is it necessary to classify data according to its type in computer programming?" [General Programming]


From: Pascal J. Bourguignon on 8 Jul 2010 04:47

Matt <30days(a)net.net> writes:

> On Wed, 7 Jul 2010 21:06:02 -0500, osmium wrote:
>
>>Fred Nurk wrote:
>>
>><Presumed question contained in the subject line. Questioner has a very
>>busy life.>
>>
>>A computer has no notion of context or meaning. A human seeing an integer,
>>a number containing a decimal point, and a number given in scientific
>>notation immediately and automatically classifies them and would add them in
>>a way appropriate to the form used for the number. If conversion from one
>>type to another is needed, the human can even see the need and make the
>>necessary conversion. A computer must be *explicitly* told the form.
>
> The type is implicit in the operator or function used.
>
> When else does it matter what type it is?
>
> I see no need (apart from ease of memory management for the compiler,
> which should be transparent to, not a constraint on, the user) to care
> what type of variable the characters "1.23" are assigned to. I may
> want to use that string as a number or as text in the same program.
> Why do I need to be burdened with type conversion as a separate step
> in my program?

Again, it depends on the operations you want to apply on these data.

I agree with you that for user problems, 1.23 or "1.23" could make no
difference (and indeed, there are several languages where such
conversions are done automatically).

The problem comes when you want to benefit from the floating point
operations that have been implemented in the hardware to make them
quick. Then you need the number in floating point representation, and
as soon as you operate on such numbers you get other floating point
numbers that may be surprising. E.g. 1.23 / 5 = 0.24600001

This can be explained only knowing what a floating point is, how the
floating point operations work, and why this is a correct result.
These are considerations about the representation type used here.

Indeed the users would not care, but to handle this in general, you
need more complex data structures even for simple numbers. The fact is
there is no naked number. If I tell you 37.9, that doesn't mean
anything. We need some context to interpret this number. If I add
the context that it's the temperature of a patient expressed in Celsius,
you may be worried, and the computer could know that we don't want a
computation about temperature to return 37.900001.

--
__Pascal Bourguignon__ http://www.informatimago.com/

From: tm on 8 Jul 2010 08:43

On 8 Jul., 08:50, Matt <30d...(a)net.net> wrote:
> On Wed, 7 Jul 2010 21:06:02 -0500, osmium wrote:
> >Fred Nurk wrote:
>
> ><Presumed question contained in the subject line. Questioner has a very
> >busy life.>
>
> >A computer has no notion of context or meaning. A human seeing an integer,
> >a number containing a decimal point, and a number given in scientific
> >notation immediately and automatically classifies them and would add them in
> >a way appropriate to the form used for the number. If conversion from one
> >type to another is needed, the human can even see the need and make the
> >necessary conversion. A computer must be *explicitly* told the form.
>
> The type is implicit in the operator or function used.
>
> When else does it matter what type it is?
>
> I see no need (apart from ease of memory management for the compiler,
> which should be transparent to, not a constraint on, the user) to care
> what type of variable the characters "1.23" are assigned to. I may
> want to use that string as a number or as text in the same program.
> Why do I need to be burdened with type conversion as a separate step
> in my program?

A type is a concept which adds a meaning to some data. Data can
be represented in different ways. E.g. the number 128 can be
represented as integer, floating point or string. You might not
care and you might want some automatic conversion but distinct
representations are a fact which cannot be discussed away.
Performance considerations led to a hardwired floating point
representation and memory considerations led to a packed
representation for strings. I wrote a little C program to show the
different representations:
====================================================================
#include <stdio.h>
#include <string.h>

int main (void)
{
  int int32 = 128;
  float float32 = 128.0f;
  char char32[4] = "128";  /* three digits plus the terminating '\0' */

  /* The union lets the bits of each representation be read as an int. */
  union {
    int int32value;
    float float32value;
    char char32value[4];
  } hardcast;

  /* sizeof yields a size_t, which is printed with %zu. */
  printf("int int32 = %d, sizeof(int32) = %zu, bits in int32: 0x%08x\n",
         int32, sizeof(int32), (unsigned) int32);
  hardcast.float32value = float32;
  printf("float float32 = %0.0f, sizeof(float32) = %zu, bits in float32: 0x%08x\n",
         float32, sizeof(float32), (unsigned) hardcast.int32value);
  memcpy(hardcast.char32value, char32, 4);
  printf("char char32[4] = %s, sizeof(char32) = %zu, bits in char32: 0x%08x\n",
         char32, sizeof(char32), (unsigned) hardcast.int32value);
  return 0;
}
====================================================================

This program writes:

int int32 = 128, sizeof(int32) = 4, bits in int32: 0x00000080
float float32 = 128, sizeof(float32) = 4, bits in float32: 0x43000000
char char32[4] = 128, sizeof(char32) = 4, bits in char32: 0x00383231

As you can see the bit patterns 0x00000080, 0x43000000 and 0x00383231
can all represent 128. The different representations all have their
advantages and drawbacks. Arithmetic computations with character arrays
are much slower than with the integer representation. The three
representations do not describe the same possible values. E.g.
strings can also contain letters and floating point numbers can
represent rational numbers which integers cannot represent (there
are also integers which cannot be represented as floating point
of the same size).

You wanted type conversions to take place automatically behind the
scenes without your intervention. That's okay, but you must keep in
mind that you have to pay a price for this:

- The first thing is performance. Conversions cost time and you
can expect that a lot of unnecessary conversions will occur
(because the run-time system will not always guess right).
- Data loss can happen. Conversions between floats and strings
lose accuracy since floats use binary representation and
strings use decimal coding.
- Some operations may behave in an unplanned manner. E.g. instead
of concatenating two decimal integer strings they might be
added or vice versa (when both use the same operator symbol).
- It can happen that a function runs across a value that makes
no sense (such as dividing an integer by a string). It is
possible to define some "logic" even for such a strange case
but it is probably much wiser to consider this a program
error. With automatic type conversions it is harder to find
errors since the cause of an error and the place where it pops
up can be a long way away from each other.

More info about these problems can be found here:

http://seed7.sourceforge.net/faq.htm#static_type_checking

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.

From: Malcolm McLean on 8 Jul 2010 10:54

Many high-level languages carry type information about with them.

Whilst in some ways this makes programming easier, in many ways it
makes it harder. Many variables must inherently be integers, or
scalars, or strings. Explicit typing acts as a sort of documentation of
this, and attempts to, as you say, divide an integer by a string will
be caught at compile time.

From: tm on 9 Jul 2010 03:39

On 9 Jul., 03:20, p...(a)informatimago.com (Pascal J. Bourguignon)
wrote:
> Malcolm McLean <malcolm.mcle...(a)btinternet.com> writes:
> > Many high-level languages carry type information about with them.
>
> > Whilst in some ways this makes programming easier, in many ways it
> > makes it harder. Many variables must inherently be integers, or
> > scalars, or strings. Explicit typing acts as a sort of documentation of
> > this, and attempts to, as you say, divide an integer by a string will
> > be caught at compile time.
>
> But many other variables have no inherent type, and could be as well
> any number or of any other kind.
> See my factorial example in another answer!

When you use a dynamically typed solution for the 'fact' function
the following can happen:

- fact(15/4) might be wrong since for x < 1 you return 1, but
some people might prefer x instead (This might be a problem,
but let's assume that your definition of 'fact' is the right one).

- fact("abcd") will, in the optimal case, give you a run-time
error and in the not so optimal case some implied automatic type
conversion rule (or a "clever" * or - function) will give you a
wrong result which later leads to severe problems.

Why not make it explicit that you intend to reuse a function for
several types and specify exactly for which types you will do that?

In Seed7 you do exactly that: Specify a template which defines the
function 'fact' for the given type 'aType' (Note that the Seed7
template below is just a function with a type parameter that contains
definitions in its body). After the template is defined it is
instantiated explicitly. This explicit instantiation is done on
purpose to improve the readability of the program. Please don't
praise implicit template instantiations since they save just one
line per instantiation.

====================================================================
$ include "seed7_05.s7i";
  include "float.s7i";
  include "bigint.s7i";
  include "bigrat.s7i";
  include "complex.s7i";

const proc: FACT_DECL (in type: aType) is func
  begin

    const func aType: fact (in var aType: argument) is func
      result
        var aType: result is aType conv 1;
      begin
        if argument >= (aType conv 1) then
          result := argument * fact(argument - aType conv 1);
        end if;
      end func;

  end func;

FACT_DECL(integer);
FACT_DECL(float);
FACT_DECL(bigInteger);
FACT_DECL(bigRational);
FACT_DECL(complex);

const proc: main is func
  begin
    noop;
  end func;
====================================================================

When you try this program you will get the compile time error:

fact3.sd7(14):51: Match for {argument >= {complex conv 1 } } failed
if argument >= (aType conv 1) then

It tells you that >= is not defined for the type 'complex'.

*** SURPRISE, SURPRISE ***

Mathematicians will tell you >= cannot be defined reasonably for
'complex' values.

You get this error at compile time for free without heavy testing.

Now you can define >= for 'complex' or decide for something else.

Greetings Thomas Mertes


From: Paul N on 9 Jul 2010 08:33

On 9 July, 13:14, blm...(a)myrealbox.com <blm...(a)myrealbox.com> wrote:
> In article <87y6dnexi1....(a)kuiper.lan.informatimago.com>,
> Pascal J. Bourguignon <p...(a)informatimago.com> wrote:
>
> > Paul N <gw7...(a)aol.com> writes:
>
> > > Incidentally, I would be a bit surprised if a language actually
> > > provided a type for real numbers.
>
> > This is not impossible, using the right representation. Eg. continued
> > fractions. There are libraries, that you could integrate like gmp in
> > a given language.
>
> Are there libraries that allow representing *all* real numbers,
> including the irrationals? I'm trying to imagine how that could
> work .... ?

As I understand it, for the majority of real numbers, it would take an
infinite amount of space just to store one number. Which is why I
expressed my doubts. On the other hand, if it's a number that you are
interested in, it's probably one of the tiny minority that *can* be
represented in a finite space.

Of course, any computer using a fixed size for numbers can only store
a tiny minority of the integers and a tiny minority of the floating
point numbers, so we're used to getting half measures anyway.
