Common Concepts of Programming Part 2 – Data Types

Introduction

In the prior article, I introduced you to some common concepts and terms in programming. In this article, I’m going to expand on that a bit further. You’ll learn about data types and their effect on functions and methods.

What is a Data Type?

Simply put, a data type is an indicator of what sort of values are used in a particular situation. The simplest way to examine this is to consider variables. In the last article, we learned that a variable is a placeholder for some piece (or pieces) of data. It can be any sort of value we wish to store. But we must understand that there are different types of values. This shouldn’t be entirely surprising. Your first name and last name may be similar types of values, but your birthday is NOT.

What I mean is that if you wanted to store your name into some piece of memory on the computer, your first name and last name would consist of similar types of data: characters. You can’t add your name, or subtract your name. You can really only check if your name matches a certain value, or see if your name contains certain characters. However, there are mathematical functions that can be used with your birthday. We can see how many years have passed since you were born, how many days it is before your next birthday, and more.

Data types give us a way to store and process these pieces of data by forcing them to maintain a standard and structure. In that way, we can know what operations we can perform with that data, and just as important, we can know which operations we CAN’T.
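As a rough sketch of that idea, here is how the name and birthday example might look in Python. Python is used only for illustration, and the variable names and sample values are made up. The name supports matching and searching, while the birthday supports date arithmetic.

```python
from datetime import date

# Hypothetical sample values, chosen only for illustration.
first_name = "Ada"
birthday = date(1990, 6, 15)
today = date(2024, 3, 1)  # fixed so the results are repeatable

# Character data: we can compare it or search inside it.
name_matches = first_name == "Ada"
name_contains_d = "d" in first_name

# Date data: we can do arithmetic with it.
had_birthday_this_year = (today.month, today.day) >= (birthday.month, birthday.day)
age_in_years = today.year - birthday.year - (0 if had_birthday_this_year else 1)

next_birthday = date(today.year, birthday.month, birthday.day)
if next_birthday < today:
    next_birthday = date(today.year + 1, birthday.month, birthday.day)
days_until_birthday = (next_birthday - today).days

print(name_matches, name_contains_d, age_in_years, days_until_birthday)
```

Notice that there is no sensible way to ask for "days until" a name, and no reason to search a date for the letter d. The type tells us which questions make sense.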

Basic Data Types

There are a few common categories of values, and those common categories can then be broken down further into the specific data types that programming languages can handle. It’s important to note that although the general concept of data types applies throughout programming, some languages will differentiate data types further than others. We’ll talk about that more in a bit, but let’s start with the general categories.

Numeric
Fairly straightforward in concept, numeric data types are exactly what they sound like: those that hold numbers. Whether it’s a whole number (1, 2, 3) or a decimal value (0.1, 0.2, 0.3), numeric fields are important because they serve as the foundation of performing computations, and that’s a primary purpose of computer programs.
Character
Character data refers to everything from alphabetic characters (a – z) to punctuation (?,!) to numbers… wait, numbers, too? Yes. Technically, a number can be part of a character data set. In this capacity, you don’t generally use them for mathematical calculations. Your phone number doesn’t play much of a part in a mathematical formula; it doesn’t have to do with alphabetic order, and it doesn’t translate into the position of your house. Adding 1 to a phone number doesn’t give you any indication of who that number belongs to or where that person can be found; not directly, anyway. So a number as a character is more about recording a value, not about performing a calculation.
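A small Python sketch can make the phone number point concrete. The number below is made up, and the leading zero is an extra detail added here for illustration: it shows one practical reason to record such values as characters.

```python
# A made-up phone number, stored as character data.
phone_number = "0555-0100"

# Useful character operations: matching and searching.
starts_with_zero = phone_number.startswith("0")
digits_only = phone_number.replace("-", "")

# Arithmetic on character data is refused outright.
try:
    phone_number + 1
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

# Stored as a number instead, the leading zero is lost.
as_number = int(digits_only)

print(digits_only, as_number, arithmetic_allowed)
```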
Date and Time
Date and time are generally treated as another explicit category of data type, separate from numeric or character data. This is because there are aspects of both numeric and character data combined in these forms of data. You can use dates for calculation, but they can also be about recording a discrete value and not about specific mathematical functions.
Boolean /ˈbo͞olēən/
Boolean values are both utterly mundane and very important. They’re mundane in that a boolean holds one of two values, and that’s it. Their importance is that the two values are TRUE and FALSE. This might not seem like such a big deal, but realistically, they are the basis of every single logical step of a program. Comparing data and determining if it meets certain criteria ultimately always results in a form of a TRUE / FALSE value, and as a result, a boolean value is essentially the indicator of which way to proceed.

Interestingly, you can also look at booleans as a representation of ON and OFF, 1 and 0, YES and NO, and any other two-way choice. In fact, the entire idea of how computers work is a form of a boolean. Power flowing through a circuit means it is on; this also means that the presence of power is 1, indicates a value that is TRUE, and a value that is YES. If there is no power flowing, it is FALSE, OFF, 0 and NO.

Even more interesting, next time you are in your kitchen or bathroom, take a look at some of your appliances (small ones like blenders, hair dryers, etc). You may notice that the power switch has a 0 and 1 on it, to indicate Off and On. In some cases, the power switch will have a symbol that looks like a 0 with a 1 running through it; those tend to be found on switches that are push buttons, whereas flip switches tend to have the separate symbols.

Now you have some idea of why they do that. Your appliance is on or off. This will make for a great example later on with our programming.
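To make the appliance picture a little more concrete, here is a minimal Python sketch. The blender and temperature values are invented for illustration; the point is that a comparison produces a boolean, and a boolean decides which way the program goes.

```python
# Hypothetical appliance state: True means on, False means off.
blender_is_on = False

# Comparisons always produce a boolean.
water_temperature = 72
is_hot_enough = water_temperature >= 100

# A boolean decides which way the program proceeds.
if blender_is_on:
    status = "ON"
else:
    status = "OFF"

# Pressing a push button flips the value.
blender_is_on = not blender_is_on

# In Python, True and False also behave like 1 and 0.
print(status, is_hot_enough, blender_is_on, int(True), int(False))
```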

Specific Data Types

Now that we have a starting idea of how data is defined, we can talk about the specific data types that exist. We’ll start by talking about the data types associated with numbers.

Numeric Data Types

The numeric data types, as described above, cover the range of data used in mathematical calculations. That means everything from simple addition and subtraction all the way to complex formulas like those used in calculus. Don’t worry, we’re not going to make you do all sorts of crazy math problems here. In fact, the whole point of using the computer is to reduce that load to a manageable format.

Integer (Int or Int32)
Integers are whole numbers. That is straightforward enough. The limit to the use of integers is that they can only be up to a certain size. That size is a total of 4,294,967,296 values. What that means is that there is a limit to the largest (and smallest) numbers you can hold. Integers, as with all numeric fields in programming, are either signed (meaning they can be positive or negative) or unsigned (meaning only zero and positive numbers are possible). If you’re using an unsigned integer, that means you have a range from 0 to 4,294,967,295. If you’re using a signed integer, you can hold from -2,147,483,648 to 2,147,483,647.

An integer is created as an INT, and an unsigned integer is created as a UINT.
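Those ranges are not arbitrary: they fall out of the number of binary digits (bits) used for storage, which is 32 for an Int32. The following Python sketch shows the arithmetic. The helper name integer_range is made up for this example, and Python's own whole numbers have no such limit, so this only calculates the limits rather than enforcing them.

```python
def integer_range(bits, signed):
    """Return (count, smallest, largest) for a whole number stored in `bits` bits."""
    count = 2 ** bits
    if signed:
        # Half of the values go to the negatives, half to zero and up.
        return count, -(count // 2), count // 2 - 1
    return count, 0, count - 1

int_count, int_min, int_max = integer_range(32, signed=True)
_, uint_min, uint_max = integer_range(32, signed=False)

print(f"{int_count:,} values")
print(f"signed:   {int_min:,} to {int_max:,}")
print(f"unsigned: {uint_min:,} to {uint_max:,}")
```

Calling the same helper with 16 or 64 bits gives the limits for the 16-bit and 64-bit whole number types.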
Long (Long or Int64)
A long is a second form of a whole number. The difference is that a long is actually able to handle a much larger range of numbers. The range of numbers allows for up to 18,446,744,073,709,551,616 values. So unsigned long values can go from 0 to 18,446,744,073,709,551,615; signed long values can go from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Variables of these types are created as LONG and ULONG.
Short (Short or Int16)
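A short is a whole number with a range of 65,536 values. Those values are divided up similarly to the INT and LONG formats: signed short values go from -32,768 to 32,767, and unsigned short values go from 0 to 65,535. Variables of these types can be declared as SHORT and USHORT.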
Byte (Byte and SByte)
Not quite as commonly used in everyday calculations, the byte is a holdover, to some extent, from the early days of having to manually program hardware. A byte is a block of 8 binary bits. That means a range of values from 0 (represented as 00000000 in binary) to 255 (11111111 in binary). Binary is a separate lesson to work through, so for now it’s enough just to be aware that the byte is a data type that exists. One difference between this and the other forms above is that a byte is unsigned by default, and to get a signed version, you need to indicate SBYTE; this is as opposed to all the other types, which are signed in their default form with an explicit override for unsigned.
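Without getting into the full binary lesson, a few lines of Python can show the two ends of the byte range written out as 8 binary digits. This is only a sketch; Python's built-in bytes type is used here as a stand-in for an unsigned byte.

```python
# Smallest and largest unsigned byte values, shown as 8 binary digits.
lowest = format(0, "08b")
highest = format(255, "08b")

# Reading the binary text back gives the original number.
highest_value = int("11111111", 2)

# Python's own bytes type refuses anything outside 0 to 255.
try:
    bytes([256])
    fits_in_a_byte = True
except ValueError:
    fits_in_a_byte = False

print(lowest, highest, highest_value, fits_in_a_byte)
```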
Decimal
Of course, we don’t only deal with whole numbers when it comes to mathematics in programming. Otherwise we’d never do anything with monetary values because, honestly, that would be a mess to keep organized. The decimal is a standard format for numbers that have, well, decimals. A decimal number has a range of -79228162514264337593543950335 to 79228162514264337593543950335. But wait, where’s the decimal in that?

Well, the decimal concept gets a bit trickier, because let’s remember that now instead of a fixed increment going up or down (whole numbers are fixed increments going up 1 whole unit or down 1 whole unit) we now have to consider all of the decimal places that appear before OR after a number, so the sizing can be a bit crazy. The fact is that you typically see an ability with a decimal number to go out to 28 or 29 places (before or after a decimal point, or combined before AND after). That means that a value like 0.0000000000000000000000000001 (28 places after the decimal) will actually print as it is. If you have one more decimal place before that 1, the 1 will drop off the end of the number. Likewise, if you add digits BEFORE the decimal, the number of places AFTER the decimal will also be truncated (cut off).

0.0000000000000000000000543210
2.0000000000000000000000543210
22.000000000000000000000054321
220.00000000000000000000005432
2200.0000000000000000000000543
22000.000000000000000000000054
220000.00000000000000000000005

Line 1 and Line 2 have the same number of places, so even changing the ones place from 0 to 2 doesn’t do anything. However, adding a second place in front of the decimal means that a place at the end of the number is truncated. Keep in mind that truncation is NOT rounding. It is literally the place where the number just drops off.
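The listing above can be approximated in Python as a rough sketch. One loud assumption: Python's decimal module, set to 29 significant digits with ROUND_DOWN (which simply drops digits), is standing in for the DECIMAL type here. It is not the same type, and the settings were chosen to mirror the truncation described in the text.

```python
from decimal import Context, Decimal, ROUND_DOWN

# Assumption: 29 significant digits, with extra digits simply dropped,
# chosen to mirror the behaviour described in the text.
context = Context(prec=29, rounding=ROUND_DOWN)

# The digits 543210 placed in the last six of 28 decimal places.
tiny = Decimal("543210E-28")

results = []
for whole_part in (0, 2, 22, 220, 2200, 22000, 220000):
    total = context.add(Decimal(whole_part), tiny)
    results.append(format(total, "f"))  # "f" avoids scientific notation

for line in results:
    print(line)
```

Each extra digit in front of the decimal point pushes one digit off the far end, which is the pattern the listing shows.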

While it is possible to have longer numbers using scientific notation, for the purposes of programs we’d write, this is more than sufficient, so we will not worry about those specialty cases. If you’re writing a recipe program, a checkbook program, or even a video game, the chances of needing numbers more precise than this are slim at best.

Frankly, if you have a recipe calling for a measurement more precise than 0.0000000000000000000000000001 teaspoons of an ingredient, I don’t think I’m going to be following your recipes.
Single and Double
As is the case with the whole numbers, there are a few variations of these decimal numbers as well. The SINGLE and DOUBLE values are versions of a decimal that store numbers with lower limits on the number of places. A single (sometimes called a float) can hold 6 to 9 significant digits, and a double holds 15 to 17.
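To see that difference in precision, here is a small Python sketch. Python's built-in float is a double, so the made-up helper to_single squeezes a value through 4-byte storage using the standard struct module to imitate a single.

```python
import struct

def to_single(value):
    """Round-trip a number through 4-byte (single precision) storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

one_third_double = 1 / 3            # Python floats are doubles (8 bytes)
one_third_single = to_single(1 / 3)

print(f"double: {one_third_double:.20f}")
print(f"single: {one_third_single:.20f}")

# The two agree for roughly the first 7 digits, then drift apart.
difference = abs(one_third_double - one_third_single)
print(f"difference: {difference:.2e}")
```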

So the question is probably brewing in your head: why all these versions of numbers? The answer is that in earlier days of programming, computer memory and processing power were far more limited than they are today.

The processing power of machines at the advent of computing would seem almost laughable now. In fact, the average hand-held calculator holds as much processing power as some of the earliest computers. If you have a smart phone, you probably are holding more processing power in your hand than what the government had available in some of its best machines in the early 1950s.

Concerns of storage and memory space and processing power made it critical that values were only made large enough to hold what was really necessary. And it still makes sense to do that today. Why get a 3 ring binder capable of holding 1000 sheets of information if the only thing you intend to store is a piece of paper with your first, middle and last name?

While we don’t need to be quite so strict today in limiting use of memory as was needed 50 or more years ago, it still does slow down and use up the resources of a computer each time you declare a variable. While it might not be significant in one case, thousands of variables could begin to tax a computer over time.

So these different size data types still exist so we can program for what we will need, reasonably, in the future. And if we find that we do need to expand a variable’s size at some later point, it’s not a horrible ordeal in most cases to do that.
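For a sense of scale, the sketch below lists how many bytes each size typically occupies, using the fixed-size format codes of Python's struct module. The labels borrow the type names used in this article; the mapping from label to format code is my assumption for illustration.

```python
import struct

# Standard sizes in bytes, using struct's fixed-size format codes.
# The labels borrow the type names used in this article.
sizes_in_bytes = {
    "BYTE": struct.calcsize("<B"),
    "SHORT": struct.calcsize("<h"),
    "INT": struct.calcsize("<i"),
    "LONG": struct.calcsize("<q"),
    "SINGLE": struct.calcsize("<f"),
    "DOUBLE": struct.calcsize("<d"),
}

# Memory needed to hold one million values of each type.
for name, size in sizes_in_bytes.items():
    print(f"{name:<6} {size} bytes each, {size * 1_000_000:,} bytes per million")
```

One value makes little difference, but across a million values the choice between a SHORT and a LONG is the difference between roughly 2 and 8 megabytes.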

For most of the examples we’ll look at for the foreseeable future, we’ll use Int and Decimal values. If there is a special case where we need to use something else, we can review the reasons why at that time.

Character Data Types

As was the case with the numeric data types, there are a few different character data types to consider as well. For our purposes, we’re just going to look at two types for now.

Char
You might hear people pronounce this like the word describing the burned / black portion of barbecued meats. While people will probably overlook that, the truth is that the word is actually pronounced like “care”. It’s actually short for Character, and it is exactly what it sounds like – a single character value. The uses of single character fields might seem minimal, but in actuality, you can use them for quite a few purposes. For now, though, we’ll look at the more common data type.
String
The string is actually a more commonly used character-based data type in many languages. It is a block of character data, and can be used to hold up to 2 billion characters. Yes, you read that right, 2 billion characters is the maximum theoretical limit. I say theoretical, because it’s highly unlikely you’ll actually allocate that much, and frankly, it’d be bad to do so. 2 billion characters is a LOT of data, and having to load and process that much in a single value would be taxing to nearly any computer system. Even if you’re writing a novel that’s about 350 pages, you’re not going to get to those sorts of numbers. A typical page in a standard hard-cover book is about 300 to 350 words. At an average of 5 characters per word, plus one for a space, that translates into 1,800 to 2,100 characters. Even at the high end of 2,100 characters on every page, that means 735,000 characters in total for the book. That doesn’t even come close to 1% of our maximum limit for a string.
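The back-of-the-envelope figures can be checked with a few lines of Python. The constants are the high-end estimates from the paragraph above, and the limit is the rough 2 billion figure rather than an exact value for any particular language.

```python
WORDS_PER_PAGE = 350          # upper estimate from the text
CHARS_PER_WORD = 5 + 1        # five letters plus a space
PAGES = 350
STRING_LIMIT = 2_000_000_000  # the rough 2 billion figure quoted above

chars_per_page = WORDS_PER_PAGE * CHARS_PER_WORD
chars_per_book = chars_per_page * PAGES
share_of_limit = chars_per_book / STRING_LIMIT

print(f"{chars_per_page:,} characters per page")
print(f"{chars_per_book:,} characters per book")
print(f"{share_of_limit:.4%} of the limit")
```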

So a string is a pretty versatile form of data in its capacity. Strings can be used for storage, can be searched, and can be compared against other strings. That allows us to do the vast majority of what we need to write most programs. We will use this data type a LOT.
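Here is a short Python sketch of those three uses: storing, searching, and comparing. The sample text is made up. Note that Python has no separate Char type, so a single character is simply a string of length one; other languages keep the two apart.

```python
# Made-up sample text for illustration.
title = "Common Concepts of Programming"

# Storage: a string knows how many characters it holds.
length = len(title)

# Searching: check for, and locate, a piece of text.
has_concepts = "Concepts" in title
position = title.find("Programming")

# Comparison: exact, and ignoring upper/lower case.
same_exact = title == "common concepts of programming"
same_ignoring_case = title.lower() == "common concepts of programming"

# A single character is just a string of length one in Python.
first_char = title[0]

print(length, has_concepts, position, same_exact, same_ignoring_case, first_char)
```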

Data Types and Functions or Methods

We’re just going to introduce this concept briefly for now, because there is a lot more we can and will learn about when it comes to data types. But it’s important to understand that data types have importance beyond use for variables. They also have significance in use with Functions or Methods.

When it comes to producing code, we tend to rely on functions and methods, as described in the prior article, to perform multiple tasks or steps that need to be repeatable. In other words, a task to calculate someone’s age based on a birthday is something we don’t want to have to write the code for manually EVERY time we need to use it; for one thing, programs that are meant to handle multiple people, a growing list of them, would never be manageable this way. For another, doing anything like that would defeat the purpose of using a computer at all: automation of process is a big factor in why computers are so valuable.

If we’re going to use functions or methods, it is likely that we’ll need to get feedback or results from those functions and methods at least some of the time. Data types are a part of that process, because they indicate what type of data will come back from a particular function or method. For example, a method that performs a calculation of the age of a person based on their date of birth could just spit out a number to a screen. But what if we need that number for some other process later on?

The function could be set up to return the value to us, and we could decide what to do with it later. Initially, many methods are set up with a return type of VOID. Void means there is no return value. We perform some process and then move on. But if we had our age calculating function, we would set it up to take in a date, and return an integer indicating how old the person is; perhaps in years, perhaps in days; maybe it would go deeper depending on the time of birth.

But the fact is, we can set up our functions or methods to indicate what type of data will come back from a set of tasks. When we write code like this, we declare the return type of the method or function to use one of the aforementioned data types. How we do that is something we’ll cover soon. For now, it’s enough to know that these pieces fit together.
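As a preview only, here is what the age example might look like in Python. The function names and dates are hypothetical. Python marks the return type after the arrow (-> int), and a function that returns nothing is marked -> None, which plays the role VOID plays in other languages. Python treats these markings as hints rather than strict rules, so take this as a sketch of the idea, not of how every language enforces it.

```python
from datetime import date

def age_in_years(birthday: date, today: date) -> int:
    """Take in dates and RETURN a whole number of years."""
    years = today.year - birthday.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years

def show_age(birthday: date, today: date) -> None:
    """Print the age and return nothing (Python's counterpart to VOID)."""
    print(f"Age: {age_in_years(birthday, today)}")

# Hypothetical dates, fixed so the result is repeatable.
age = age_in_years(date(1990, 6, 15), date(2024, 3, 1))
years_until_40 = 40 - age   # the returned value feeds a later step
show_age(date(1990, 6, 15), date(2024, 3, 1))
```

Because age_in_years hands its result back instead of only printing it, the program can keep using that number, here to work out the years remaining until a 40th birthday.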

Conclusion

This is a good place for us to stop; there is a lot to absorb before we get into writing actual programs, so it’s important that we have groundwork laid out before we just start typing stuff up on our screens. The next article will discuss the actual anatomy of a program in more depth, and we’ll look at the structure of blocks of code and talk about how the terms of part 1 and the data types in part 2 fit into that picture. Once we’ve done that, we can begin to make some serious effort at writing a simple program, and know what we are doing, and why.