Removing duplicates in an unsorted Linked List

I started reviewing Cracking the Coding Interview by Gayle Laakmann, since my last coding interview doesn’t come as good as I expected. The book is very well written and the problems contained there come from the very basic to some advanced stuff, and it represent a good choice if you want to refresh your rusty algorithms knowledge from college.

Another rusty skill I’m in urgent need to update is my blogging, so I thought reading and reviewing that book would be an excellent opportunity to also retake my long forgotten English blog. What I will  try to do is post here my very first attempts of  resolution for some problems contained in the book, share my code with the community and -hopefully- get some feedback from the readers. These are not the responses contained in the book (I believe in some cases they might coincide) but my personal (and maybe naive) approach for solutions in the Java Programming Language.

I also hope to don’t get in some copyright trouble posting the problems contained in the book. If that’s the case Gayle and you’re reading this, please let me know xD.

Well, saying that, what I’m trying to do in this post is to remove duplicates from an unsorted Linked List. The first step is to define the classes that will represent that data structure, so I used two simple classes to represent the Head of the List and another one for every Node in it, using generics to make them flexible. This might seem overkill -specially the Head class- but I’m more comfortable with that approach,  having a clear separation between the List and the elements contained within it.

So, here’s the Node:

package coding.problems.linkedlists;

/**
 * A node in a Linked List.
 *
 * @author Carlos G. Gavidia (cgavidia@acm.org)
 *
 * @param <E>
 *            Data type of the node's data.
 */
public class LinkedListNode<E> {

	private LinkedListNode<E> nextNode;
	private E data;

	public LinkedListNode(E data) {
		this.data = data;
	}

	public LinkedListNode<E> getNextNode() {
		return nextNode;
	}

	public void setNextNode(LinkedListNode<E> nextNode) {
		this.nextNode = nextNode;
	}

	public E getData() {
		return data;
	}
}

And here’s the List. The getElementCount method will prove useful in testing:

package coding.problems.linkedlists;

/**
 * A Linked-List representation.
 *
 * @author Carlos G. Gavidia (cgavidia@acm.org)
 *
 * @param <E>
 *            Data type of the node's data.
 */
public class LinkedList<E> {

	private LinkedListNode<E> head;

	public LinkedListNode<E> getHead() {
		return head;
	}

	public void setHead(LinkedListNode<E> head) {
		this.head = head;
	}

	/**
	 * Gets the number of nodes that have a particular data value.
	 *
	 * @param elementToCount
	 *            Data value to count.
	 * @return
	 */
	public int getElementCount(E elementToCount) {
		int elementCount = 0;
		LinkedListNode<E> currentNode = this.getHead();
		while (currentNode != null) {
			if (currentNode.getData().equals(elementToCount)) {
				elementCount++;
			}
			currentNode = currentNode.getNextNode();
		}
		return elementCount;
	}

Then there comes the fun part: How do I remove duplicates? My plan was quite simple:

  • Per every node in the list, I will traverse the rest of the list -excluding the current node- to search if there are other nodes that have the same value. To traverse the Linked List, I use a simple loop in the removeDuplicates method.
  • Then I need a method that will do the dirty work, that is, search and remove every node in the list that has the same data value as the current node and remove it from the list. All that is done in the removeFrom method.
  • In removeFrom everything is pretty straightforward: What I do is break the “next” references of the nodes I need to remove so they will be garbage-collected later. There’s a special edge case when I’m in the need to remove the last item of the list, so I will keep a reference of the previous reference of the current node that will allow me to remove the last node if I’m in the need of it.

The advantages of that approach is that I don’t require additional memory to perform the operation, and that the search and removal will require less computing power through iterations since the list will become shorter after the removals were made. Saying that, the solutions is 50 lines long, including comments:

package coding.problems.linkedlists;

/**
 * Class for remove duplicate elements from an unsorted linked list. Problem
 * taken from Cracking the Coding Interview from Gayle Laakmann.
 *
 * @author Carlos G. Gavidia (cgavidia@acm.org)
 *
 * @param <E>
 *            Data type of the Linked List Node's data.
 */
public class DuplicatesRemover<E> {

	public void removeDuplicates(LinkedList<E> linkedList) {
		LinkedListNode<E> currentNode = linkedList.getHead();
		while (currentNode != null) {
			removeFrom(currentNode.getData(), currentNode);
			currentNode = currentNode.getNextNode();
		}
	}

	/**
	 * Removes elements that have a particular data value, excluding the node
	 * that starts the traversal.
	 *
	 * @param elementToRemove
	 *            Element to remove.
	 * @param nodeToStart
	 *            Node that starts the list traversal.
	 */
	private void removeFrom(E elementToRemove, LinkedListNode<E> nodeToStart) {
		LinkedListNode<E> currentNode = nodeToStart;
		LinkedListNode<E> nextNode = null;
		LinkedListNode<E> previousNode = null;

		while (currentNode != null) {
			nextNode = currentNode.getNextNode();
			if (nextNode != null && elementToRemove.equals(nextNode.getData())) {
				currentNode.setNextNode(nextNode.getNextNode());
			} else if (nextNode == null
					&& elementToRemove.equals(currentNode.getData())) {
				previousNode.setNextNode(null);
			}
			previousNode = currentNode;
			currentNode = currentNode.getNextNode();
		}
	}
}

And here’s a Unit Test to verify that everything is working as expected.

package coding.problems.linkedlists;

import static org.junit.Assert.*;

import org.junit.Before;
import org.junit.Test;

public class DuplicatesRemoverTest {

	private LinkedList<Integer> listWithDuplicates;
	private DuplicatesRemover<Integer> duplicatesRemover;

	@Before
	public void setUp() throws Exception {
		listWithDuplicates = new LinkedList<Integer>();
		duplicatesRemover = new DuplicatesRemover<Integer>();

		LinkedListNode<Integer> firstNode = new LinkedListNode<Integer>(1);
		LinkedListNode<Integer> secondNode = new LinkedListNode<Integer>(2);
		LinkedListNode<Integer> thirdNode = new LinkedListNode<Integer>(3);
		LinkedListNode<Integer> fourthNode = new LinkedListNode<Integer>(2);
		LinkedListNode<Integer> fifthNode = new LinkedListNode<Integer>(4);
		LinkedListNode<Integer> sixthNode = new LinkedListNode<Integer>(4);
		LinkedListNode<Integer> seventhNode = new LinkedListNode<Integer>(4);

		firstNode.setNextNode(secondNode);
		secondNode.setNextNode(thirdNode);
		thirdNode.setNextNode(fourthNode);
		fourthNode.setNextNode(fifthNode);
		fifthNode.setNextNode(sixthNode);
		sixthNode.setNextNode(seventhNode);

		listWithDuplicates.setHead(firstNode);

	}

	@Test
	public void testRemoveDuplicates() {
		duplicatesRemover.removeDuplicates(listWithDuplicates);
		assertEquals(1, listWithDuplicates.getElementCount(1));
		assertEquals(1, listWithDuplicates.getElementCount(2));
		assertEquals(1, listWithDuplicates.getElementCount(3));
		assertEquals(1, listWithDuplicates.getElementCount(4));
	}
}

The books solution makes a smart use of a Hash Table to keep track of the existing items, with that the solution is way simpler (12 lines length) but additional memory is required. In the book there’s also a solution without a buffer that makes use of two “pointers” to detect duplicates while traversing the list. Quite smart also.

I will be keeping my solutions and the corresponding tests in a Github repository. All criticism is welcome.