When we start learning about programming, it’s very common to think of Arrays and Array in Programming, probably the Adam & Eve of Data structure, as the most important one.
Maybe it is.
Oh well, maybe it isn’t…
Sure, you’d have a tough time looking for a software code which doesn’t rely on an array. It is most fundamental and commonly used data structure across all programming languages.
Mostly, the programming languages have built-in support for simple arrays. And by that I mean, no more of linking external framework, library or package into languages to make arrays work.
It is simply made available to us.
But, it’s natural to think: what makes array so popular? What makes it tagged as a data structure with an omnipresent superpower?
Every superhero has an origin, its powers, and weakness, just like our Array.
Note : For Python folks, a built-in array is known as a list. But all the points discussed here works in the same fashion until mentioned otherwise
Starting with big but the most obvious question:
What is an Array?
I’ll give you a short and a long answer to this question.
An array can be defined as an ordered collection of items indexed by contiguous integers.
Let us begin to break the technical jargon in the long answer:
Arrays closely relate to a way how books store information. Let me give you a small example.
In the above diagram, I’ve listed our Nameless book foreword and following four chapters of the book. Now imagine, if I ask you to summarize the story of this Nameless book.
How would you proceed?
An obvious logical and sane way would be to start from foreword all the way to chapter four. Nobody (I hope) looks at the chapter titles in the index page and goes, “Hmm, that looks like an interesting chapter title, let me start with chapter 3 first”.
Here, the order is an extremely important factor in not only organizing but making sense of information/data we’re reading. Just like book index makes our search highly efficient and helps us jump to any book section, arrays come with their own index numbers.
Decoding The Structure Of An Array:
Let’s say we want to store the score of top 5 players in some video game. This can be accomplished by declaring 5 different variables like below:
var score1= 56 ; var score2 = 50 ; var score3 = 34 ; var score4 = 24 var score5 = 12;
Or we can group the score information and use an array. Arrays contain multiple individual elements wrapped inside one big container. So this container is given a name, the individual elements inside the array are not.
Scores = [56, 50, 34, 24, 12]
//Scores in an array container holding individual elements which have no identify(name) of their own
This looks great. We don’t have to declare so many variables. But, how do we access them now? With individual variables, all we had to do was use the name given to them. But, elements inside array has no name.
Accessing Elements Of An Array:
Remember, our book example where we spoke about index and order. In an array, each element inside the array has an index. The index is nothing but a number. But, the index is not random. It follows an order or a sequence.
The index starts at zero and goes up one at a time. But there are few programming languages like Lua, Cobol where array index starts at 1. But for clarity, all our discussion in this series will assume an array to have a zero-based index.
Hence, array gets a reputation for being an ordered collection of elements/items.
//Books in an array container holding individual elements which have no identify of their own
Books = [‘Foreword’, ‘Chapter 1’, ‘Chapter 2’, ‘Chapter 3’, ‘Chapter 4’]
//Accessing elements of an array with index numbers
Books = 'Foreword'
Books = 'Chapter 1'
Books = 'Chapter 2'
Books = 'Chapter 3'
Creating & Initializing an Array:
Arrays can be declared in different ways in different programming languages.
Let’s see how arrays are represented in JAVA. It has two steps:
Step 1: Creating/Declaring An Array:
In JAVA, an array can hold similar data types elements. It means no grouping of types like int or float together.(Only Python supports different datatype array)
An array declaration has two major components:1) Type 2) Name
type arr-name; OR type arr-name;
The type component of array declaration tells us what type of element/data will be stored in it. Data types could be primitive ones(int,char, float,long, double) or user-defined data type(objects).
int intArray; OR int intArray;
So, far, we’ve just created an array variable named intArray(you can pick any name, but sensible one) with a rule saying this array will strictly hold only integers(int datatype).
But array will store real data and for that, we need to assign it some memory. This gets us to step 2.
Step 2: Instantiating an Array with new keyword:
array_name = new type [size];>
- type specifies what kind of data to be used. Integers in our case.
- size specifies numbers of items/elements/data our array will hold
- The magic keyword new allocates the memory we need to store the data.
int intArray; //declaring array with type and name
intArray = new int; // allocating memory to array with new
int intArray = new int; // Or combining both in one statement
Done! We’ve successfully declared and instantiated an array.
Do you guys remember the part where we spoke about the size of elements? If we say array must hold 5 elements, we gave it a size of 5.
But, the world we live in a world is surrounded with questions like What If.
Let’s talk about Resizable arrays:
Also fondly known as Mutable(changeable) array or Dynamic array
// Simple Fixed Size Array In JAVA
String  fixedArr = new String;
fixedArr = "Cannot";
fixedArr = "Grow";
But for resizable version, JAVA offers a couple of classes, best one is known as ArrayList. Like I mentioned at the beginning of this article, simple arrays are built in into languages & there’s is no need for any import.
A resizable array is something not present in the language of JAVA. Here, we will need to link to an external framework and instantiate it. So instead of creating an array, we create an ArrayList object.
//we need to import this as JAVA doesn't provide inbuilt support
// create an array list
ArrayList resizable_arr = new ArrayList();
// add elements to the array list
ArrayLists can be created totally empty, or with an initial capacity or even some initial values.After creation of this ArrayList, we can use methods to add and remove elements from it. Something we can’t do with a standard array.
Important decisions to take when adding elements:
Adding new elements in the array at different position involves a different kind of heavy-lifting work. Generally, adding a new element at the end of an array is an easy-peasy, faster and requires less work. All we have to do it go one number higher to the last index number on the array.
But, what if we want to add a new item to a very specific position?
Let’s say we want to add the value 100 at position two and not at the end of the array. This will require elements to be shuffled around, reorganizing and renumbering the indexes.
These things will be automatically taken care by programming languages.
But just because somethings are happening under the hood doesn’t mean we turn a blind eye towards the fact that this does have a performance impact. Larger the size of the array, then more items will need to be shuffled around and re-indexed.
The rearrangement of data is implemented differently from language to language. If the array size isn’t very big, some languages will attempt to reshuffle them in place. But if the array size is too big, a brand new array will be created and all the old contents will be copied to new one organizing things as they go.
Different terms used for adding items to arrays:
Like I mentioned before, — generally if not mentioned otherwise — array methods/functions in programming languages add newest elements to the end of the existing array.
Different programming languages use variations of words like add, push or append for method names.
However, when we don’t want to add a new item not at the end, but at some specific position, we may need a new word.
In Java, the method is still add but we also have to provide an index for what position should our array accommodate the newest element.
Similarly, to remove elements from the array, we use a variant of words like remove, delete or pop in different languages.
Okay, I know what you might be thinking. This is an information overload.Where am I going with all these information?
Believe me, if this information had relevance only to arrays, I wouldn’t have written these wall of text. But this is the beginning of an understanding of the things we will encounter in any data structure we study.
Different Operations in Data Structure
In any data structures, there are 5 major aspects we will often come across:
- Access An Element
- Insert An Element
- Delete An Element
- Find An Element
- Sort An Element
While studying any data structure, these are five fundamental behaviors we might need to figure out. And these behaviors will roughly cover 99% of everything need to know about any structure. But not all data structure support all five behavior. For example, many data structures don’t support searching behavior.
Coming back to arrays, we’ve already seen all behavior of arrays except finding & sorting so far.
Earlier, when we spoke about the order of arrays, we had concluded that order of information is extremely important here. But, the indexes of the array were ordered data like zero, one, two and so on.
Not the contents of the array.
But it isn’t unusual to want the option to sort these above integers numerically in ascending or descending order or bunch of strings alphabetically.
Sorting will just involve a reshuffling of contents in the array, not the structure or the value of indexes.
Sorting of an array is generally made available to us built-in by programming languages. Often we need to call a method like array.sort() or could be a utility function where we pass our array as a parameter.
Either way, it’s not more than one line of code. But, just because something is inbuilt and we can’t see what’s happening in the background, doesn’t mean it won’t affect our performance.
Sometimes, we need to think beyond one line of code. How much data are we asking our language to sort? It could be dozens, thousands and tens of thousands of data. And how often are we calling this sort function?
Some programming languages may look at how big your array size is and internally switch between various different sorting algorithms like quicksort, heap sort etc. Or some may prefer to create a new sorted copy of the original array.
This is not a place where I am going to list of pros or cons of any of these techniques. But, whenever, you sort an array or any data structure in fact, you should always be remembering our golden rule.
Sorting is always computationally intensive. We need it for our use-case but always think how much data we have and how often we will call this sort.
Such questions lead us into choosing different data structures based on their sorting efficiency.
This gets me to the final behavior of our data structure.
How to find search/find An Element In An Array:
So, how are we supposed to find an element in an array? Let’s say I want to know if integer 3 exists or the index at which it resides, how do we go about looking for it?
One obvious way is to go through each item in the array and see it for ourselves if element 3 exists in the array.We can use something like this below pseudo code
set i to 0
while i < array.length
if array[i] == 3
add 1 to i
- Initial index value(i) is set to 0 as we are using zero-based indexing.
- Followed by looping across the array as we need to check each and every element to see if 3 exists in our array. You can use for, while, for each or do while loops to accomplish this task as long as it doesn’t go beyond array length looping
- Inside the loop, we check whether the element matches the one we are looking for. If we get it, we return true and that’s a happy ending to the story.
- But, if not, we will increment or add 1 to our index(var i in our case) and keep going around the loop
- Till we either find our element(i.e we get output like true) or we get to the end of the loop where we will get an output of false.
If we want a fairytale ending to this search, the best case scenario we can hope for that element we are searching for resides at index number 0. We find our element at first round. End of the story.
Dealing with Worst-Case Scenario:
But, as a programmer, we always deal in a worst-case scenario. Not everyone gets a fairytale ending.
In our worst-case scenario, there is a possibility that element which we were searching wasn’t part of the array. We will still have to look at each and every element. 8 elements in our case, not a big number.
But, what if, we are dealing with tens of thousands of numbers and we looped through all of them and couldn’t find our number?
What we did above with looping through each and every element is known as linear/sequential search. I, personally like to call it highly inelegant brute-force method. We go through everything, one at a time.
For our initial 8 elements search, this method is a huge success. I agree.But when it comes to searching elements in a pool of tens of thousands of elements, this is extremely slow.
To determine the worst case of any operation, we use something called as Big-O notation/ Big-O complexity.
Big-O notation/ Big-O complexity:
Big O will always look at the worst-case scenario. Our linear search worst case scenario is that the element we are looking for does not exist in the array or it might be the last element of the array. In both cases, we have to check everything till the very last for a concrete output.
This linear search would be considered a big-O of n, or order n complexity. Also called linear time algorithm. Meaning if we’ve 10 items(n =10), we’ve to search or loop through all ten items.
This is not an ideal way to go about. Extremely slow. Something we can’t afford in our real-life setting. We need an alternative or something which will lower our burden.
Read a detailed explanation about Big – O here: The Big O notation
Sorted Data To The Rescue:
What if our data is ordered? Remember we spoke about sorting in an earlier section. If our numbers are in ascending or descending order, there are much better options for searching than a linear search. We can use binary search.(divide the array, look in which part the no we’re searching lies)
But, still, another problem crops up.
Our searching of element got easier but our sorting is computationally intensive. If the data is in tens of thousands in number, it could be extremely demanding. Just to improve our searching performance, we can’t add an additional overhead of sorting to our operation. It will lead to degradation of our performance. Something we just can’t allow to happen.
So, what do we do now?
Here’s a thing with data structures. You just cannot have any data structure that is a superhero in all the operations it performs. Or is consistently good in all the real-world situations.
Array, a beast in itself, can perform all the operations one expects from a data structure.
Just not efficiently.
This lead to the creation of multiple data structures. If you read the introductory article of the series, this is something I had mentioned earlier. It is simply impossible to rank or label data structures. Every data structure, like our superheroes, has their own strength and weakness. Something, we aim to cover in great depth by answering all Wh questions regarding them by looking at different real-world examples.