Removing duplicates in an unsorted Linked List

I started reviewing Cracking the Coding Interview by Gayle Laakmann, since my last coding interview doesn’t come as good as I expected. The book is very well written and the problems contained there come from the very basic to some advanced stuff, and it represent a good choice if you want to refresh your rusty algorithms knowledge from college.

Another rusty skill I’m in urgent need to update is my blogging, so I thought reading and reviewing that book would be an excellent opportunity to also retake my long forgotten English blog. What I will  try to do is post here my very first attempts of  resolution for some problems contained in the book, share my code with the community and -hopefully- get some feedback from the readers. These are not the responses contained in the book (I believe in some cases they might coincide) but my personal (and maybe naive) approach for solutions in the Java Programming Language.

I also hope to don’t get in some copyright trouble posting the problems contained in the book. If that’s the case Gayle and you’re reading this, please let me know xD.

Well, saying that, what I’m trying to do in this post is to remove duplicates from an unsorted Linked List. The first step is to define the classes that will represent that data structure, so I used two simple classes to represent the Head of the List and another one for every Node in it, using generics to make them flexible. This might seem overkill -specially the Head class- but I’m more comfortable with that approach,  having a clear separation between the List and the elements contained within it.

So, here’s the Node:

package coding.problems.linkedlists;

/**
 * A node in a Linked List.
 *
 * @author Carlos G. Gavidia (cgavidia@acm.org)
 *
 * @param <E>
 *            Data type of the node's data.
 */
public class LinkedListNode<E> {

	private LinkedListNode<E> nextNode;
	private E data;

	public LinkedListNode(E data) {
		this.data = data;
	}

	public LinkedListNode<E> getNextNode() {
		return nextNode;
	}

	public void setNextNode(LinkedListNode<E> nextNode) {
		this.nextNode = nextNode;
	}

	public E getData() {
		return data;
	}
}

And here’s the List. The getElementCount method will prove useful in testing:

package coding.problems.linkedlists;

/**
 * A Linked-List representation.
 *
 * @author Carlos G. Gavidia (cgavidia@acm.org)
 *
 * @param <E>
 *            Data type of the node's data.
 */
public class LinkedList<E> {

	private LinkedListNode<E> head;

	public LinkedListNode<E> getHead() {
		return head;
	}

	public void setHead(LinkedListNode<E> head) {
		this.head = head;
	}

	/**
	 * Gets the number of nodes that have a particular data value.
	 *
	 * @param elementToCount
	 *            Data value to count.
	 * @return
	 */
	public int getElementCount(E elementToCount) {
		int elementCount = 0;
		LinkedListNode<E> currentNode = this.getHead();
		while (currentNode != null) {
			if (currentNode.getData().equals(elementToCount)) {
				elementCount++;
			}
			currentNode = currentNode.getNextNode();
		}
		return elementCount;
	}

Then there comes the fun part: How do I remove duplicates? My plan was quite simple:

  • Per every node in the list, I will traverse the rest of the list -excluding the current node- to search if there are other nodes that have the same value. To traverse the Linked List, I use a simple loop in the removeDuplicates method.
  • Then I need a method that will do the dirty work, that is, search and remove every node in the list that has the same data value as the current node and remove it from the list. All that is done in the removeFrom method.
  • In removeFrom everything is pretty straightforward: What I do is break the “next” references of the nodes I need to remove so they will be garbage-collected later. There’s a special edge case when I’m in the need to remove the last item of the list, so I will keep a reference of the previous reference of the current node that will allow me to remove the last node if I’m in the need of it.

The advantages of that approach is that I don’t require additional memory to perform the operation, and that the search and removal will require less computing power through iterations since the list will become shorter after the removals were made. Saying that, the solutions is 50 lines long, including comments:

package coding.problems.linkedlists;

/**
 * Class for remove duplicate elements from an unsorted linked list. Problem
 * taken from Cracking the Coding Interview from Gayle Laakmann.
 *
 * @author Carlos G. Gavidia (cgavidia@acm.org)
 *
 * @param <E>
 *            Data type of the Linked List Node's data.
 */
public class DuplicatesRemover<E> {

	public void removeDuplicates(LinkedList<E> linkedList) {
		LinkedListNode<E> currentNode = linkedList.getHead();
		while (currentNode != null) {
			removeFrom(currentNode.getData(), currentNode);
			currentNode = currentNode.getNextNode();
		}
	}

	/**
	 * Removes elements that have a particular data value, excluding the node
	 * that starts the traversal.
	 *
	 * @param elementToRemove
	 *            Element to remove.
	 * @param nodeToStart
	 *            Node that starts the list traversal.
	 */
	private void removeFrom(E elementToRemove, LinkedListNode<E> nodeToStart) {
		LinkedListNode<E> currentNode = nodeToStart;
		LinkedListNode<E> nextNode = null;
		LinkedListNode<E> previousNode = null;

		while (currentNode != null) {
			nextNode = currentNode.getNextNode();
			if (nextNode != null && elementToRemove.equals(nextNode.getData())) {
				currentNode.setNextNode(nextNode.getNextNode());
			} else if (nextNode == null
					&& elementToRemove.equals(currentNode.getData())) {
				previousNode.setNextNode(null);
			}
			previousNode = currentNode;
			currentNode = currentNode.getNextNode();
		}
	}
}

And here’s a Unit Test to verify that everything is working as expected.

package coding.problems.linkedlists;

import static org.junit.Assert.*;

import org.junit.Before;
import org.junit.Test;

public class DuplicatesRemoverTest {

	private LinkedList<Integer> listWithDuplicates;
	private DuplicatesRemover<Integer> duplicatesRemover;

	@Before
	public void setUp() throws Exception {
		listWithDuplicates = new LinkedList<Integer>();
		duplicatesRemover = new DuplicatesRemover<Integer>();

		LinkedListNode<Integer> firstNode = new LinkedListNode<Integer>(1);
		LinkedListNode<Integer> secondNode = new LinkedListNode<Integer>(2);
		LinkedListNode<Integer> thirdNode = new LinkedListNode<Integer>(3);
		LinkedListNode<Integer> fourthNode = new LinkedListNode<Integer>(2);
		LinkedListNode<Integer> fifthNode = new LinkedListNode<Integer>(4);
		LinkedListNode<Integer> sixthNode = new LinkedListNode<Integer>(4);
		LinkedListNode<Integer> seventhNode = new LinkedListNode<Integer>(4);

		firstNode.setNextNode(secondNode);
		secondNode.setNextNode(thirdNode);
		thirdNode.setNextNode(fourthNode);
		fourthNode.setNextNode(fifthNode);
		fifthNode.setNextNode(sixthNode);
		sixthNode.setNextNode(seventhNode);

		listWithDuplicates.setHead(firstNode);

	}

	@Test
	public void testRemoveDuplicates() {
		duplicatesRemover.removeDuplicates(listWithDuplicates);
		assertEquals(1, listWithDuplicates.getElementCount(1));
		assertEquals(1, listWithDuplicates.getElementCount(2));
		assertEquals(1, listWithDuplicates.getElementCount(3));
		assertEquals(1, listWithDuplicates.getElementCount(4));
	}
}

The books solution makes a smart use of a Hash Table to keep track of the existing items, with that the solution is way simpler (12 lines length) but additional memory is required. In the book there’s also a solution without a buffer that makes use of two “pointers” to detect duplicates while traversing the list. Quite smart also.

I will be keeping my solutions and the corresponding tests in a Github repository. All criticism is welcome.


On Programming

Programming, it turns out, is hard. The fundamental rules are typically simple and clear -but programs, while built on top of these basic rules, tend to become complex enough to introduce their own rules and complexity. Because of this, programming is rarely simple or predictable. As Donald Knuth, who is something of a founding father of the field, says, it is an art rather than a science.

Marijn Haverbeje, in Eloquent JavaScript.


A Recipe for building RESTful Services

For a small project in my Master program I was in charge of building some RestFUL services for a fictional –an minimal- College Library. The clients we’re supposed to be a Mobile .NET Application and a Web Application built in PHP, being the only requirement that the data payload should be XML documents. Building that small piece of software teach me some things about building RESTful Services , and I like to share some of them in this post.

You need to select an environment

So I needed a hosting that has Java Support (because I’m a Java guy) and that also has no cost (because I’m poor), so I selected Google App Engine because it fulfills all this requirements. Then, the natural IDE Selection was Eclipse with the Google Plugin because it included some features that ease the deployment and development of software for that environment.

You need to select a Persistence Framework

Using Google App Engine, a relational database wouldn’t be available for free. so you so with that you can discard every  popular ORM Framework in the Java World (like Hibernate and friends). Google App Engine works with a Non-SQL repository called App Engine DataStore, and the SDK offers you the possibility to access it through JPA or JDO. Personally, I found that accessing the DataStore through JPA added artificial complexity to the code, complexity proper of a RDBMS solution that’s not necessary in a Schemaless product like the DataStore. So my choose was using Objectify, a simple-to-use framework based on the Low-Level API to access the DataStore. With the DAO implementation suggested by David Chandler, my data access layer was composed by only this class:

package pe.edu.pucp.dao;

import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

import pe.edu.pucp.model.Book;
import pe.edu.pucp.model.BookReservation;
import pe.edu.pucp.model.Student;

import com.google.appengine.api.datastore.QueryResultIterable;
import com.googlecode.objectify.Key;
import com.googlecode.objectify.NotFoundException;
import com.googlecode.objectify.ObjectifyService;
import com.googlecode.objectify.Query;
import com.googlecode.objectify.util.DAOBase;

/**
 * @author cgavidia
 *
 */
public class LibraryServiceDAO extends DAOBase {

	public static final Logger LOG = Logger.getLogger(LibraryServiceDAO.class
			.getName());

	static {
		ObjectifyService.register(Book.class);
		ObjectifyService.register(Student.class);
		ObjectifyService.register(BookReservation.class);
	}

	protected Class clazz;

	public LibraryServiceDAO(Class clazz) {
		this.clazz = clazz;
	}

	public Key add(T book) {
		Key key = ofy().put(book);
		return key;
	}

	public void delete(Key key) {
		ofy().delete(key);
	}

	public T get(Long id) throws NotFoundException {
		return ofy().get(clazz, id);
	}

	public T get(Key key) {
		return ofy().get(key);
	}

	public List listByProperty(String propertyName, String propertyValue) {
		Query query = ofy().query(clazz);
		if (propertyName != null && propertyValue != null) {
			query.filter(propertyName, propertyValue);

		}
		return asList(query.fetch());
	}

	private List asList(QueryResultIterable fetch) {
		ArrayList list = new ArrayList();
		for (T t : fetch) {
			list.add(t);
		}
		return list;

	}
}

You need to select a RESTful framework

Handling HTTP requests using Servlets was never in my plans  because of deadlines and because I’m lazy. I have to make sure that the framework I choose has support Google App Engine because it is not an standards-compliant Servlet container and has several limitations and restrictions.  A Google search led me to  the Restlet Framework, and easy and simple REST framework fully compliant with the REST architectural style that also has support to Google App Engine. Using Restlet abstractions, all the resources of my application are exposed trough this class:

package pe.edu.pucp.library;

import org.restlet.Application;
import org.restlet.Restlet;
import org.restlet.routing.Router;

import pe.edu.pucp.resource.BookReservationResource;
import pe.edu.pucp.resource.BookReservationSerializer;
import pe.edu.pucp.resource.BookReservationsResource;
import pe.edu.pucp.resource.BookResource;
import pe.edu.pucp.resource.BookSerializer;
import pe.edu.pucp.resource.BooksResource;
import pe.edu.pucp.resource.StudentResource;
import pe.edu.pucp.resource.StudentSerializer;
import pe.edu.pucp.resource.StudentsResource;

/**
 * @author cgavidia
 *
 */
public class LibraryApplication extends Application {

	@Override
	public Restlet createInboundRoot() {
		Router router = new Router(getContext());
		router.attach("/books", BooksResource.class);
		router.attach("/books/{" + BookSerializer.CODE_ELEMENT + "}",
				BookResource.class);
		router.attach("/students", StudentsResource.class);
		router.attach("/students/{" + StudentSerializer.CODE_ELEMENT + "}",
				StudentResource.class);
		router.attach("/reservations", BookReservationsResource.class);
		router.attach("/reservations/{"
				+ BookReservationSerializer.CODE_ELEMENT + "}",
				BookReservationResource.class);
		return router;
	}

}

The RESTful Service code is now available in my Github account, so if you need some extra reference you can find it there.


The Java Sorting Algorithm Playground

Algorithms and I have a love-hate history. I met them in college in a course in fifth semester, and I was pretty excited about it because it was the first course of the Software Development track of my major. Systems Engineering (that’s what Computing Professionals study in Pery) in my university has a very peculiar curricula, and to start coding you have to first pass several hard and heavy math and science courses. So after two years of suffering with engineering fundamentals I finally get to what I really expect from College education: to learn how to code.

But bad luck is always after me, and I got the worst teacher from the entire faculty. It was and old school guy that teaches algorithms with pseudo-code and diagrams and never –I mean NEVER- showed us a running program or a software demonstration. And also the syllabus of the course was very light and superficial (I don’t remember even a mention of Binary Search Trees). The semester ended, I got a nice grade and I forgot about algorithms for a very long time.

A few years ago my interest for algorithms was renewed when I discovered that all Computer Science dream Jobs –like Google or Facebook- have an algorithm-based interview in their hiring process, so I decided to get ready when the moment comes and start to learn by myself all the topics that I was supposed to learn at college. I did a light forum-research for recommendations of good algorithm books and started my training with Skienna and his Algorithm Design Manual. It’s a very theoretical book and while reading I needed a lot of Wikipedia searching to get some notions. Reading was hard and slow –but very instructive- and the coding part was written all in C, a Programming Language that I’m not very familiar with (I know, shame on me!). So learning algorithm concepts was hard and getting C syntax made it harder, so I decided to get a Java-Based Algorithm Book and speed up my learning process.

The selected book was Data Structures and Algorithms in Java by Robert Lafore and was more beginner-friendly than the academically-oriented Skienna book. Reading was lots of fun, until I found code like this:

// insertSort.java
// demonstrates insertion sort
// to run this program: C>java InsertSortApp
//--------------------------------------------------------------
class ArrayIns
	{
	private long[] a; // ref to array a
	private int nElems; // number of data items
	//--------------------------------------------------------------
	public ArrayIns(int max) // constructor
		{
		a = new long[max]; // create the array
		nElems = 0; // no items yet
		}
	//--------------------------------------------------------------
	public void insert(long value) // put element into array
		{
		a[nElems] = value; // insert it
		nElems++; // increment size
		}
	//--------------------------------------------------------------
	public void display() // displays array contents
		{
		for(int j=0; j<nElems; j++) // for each element,
		System.out.print(a[j] + “ “); // display it
		System.out.println(“”);
		}
	//--------------------------------------------------------------
	public void insertionSort()
		{
		int in, out;
		for(out=1; out<nElems; out++) // out is dividing line
			{
			long temp = a[out]; // remove marked item
			in = out; // start shifts at out
			while(in>0 && a[in-1] >= temp) // until one is smaller,
				{
				a[in] = a[in-1]; // shift item to right
				--in; // go left one position
				}
			a[in] = temp; // insert marked item
			} // end for
		} // end insertionSort()
	//--------------------------------------------------------------
} // end class ArrayIns

That code violated almost all the coding conventions that Sun established and that almost every Java Programmer in the world know. Just to mention some of them:

  • The method-separator comment lines are a really weird convention .
  • There are several patterns for bracket usage (or not using them when they should) and indentation.
  • All the variables/attributes had non descriptive names like “a”, “temp”, “in” and “out”.

Maybe I’m just been too picky, but I think those issues -at least for me – really make the code less readable and more hard to understand. And it’s not a problem only in Lafore book, many other books that I consulted have the same problem: Coding style that doesn’t follow industry-accepted practices. So when I got the assignment in my Algorithm Course to write a Sorting Algorithm Demonstration program I decided the improve the way Java algorithms are presented and rewrite them in a more Convention-Based way, that’s also a more less intimidating way of presenting algorithms to the students (like myself). And that project is now finished and available to everyone interested in my Github account.

For example, let’s see now Insertion Sort:

package pe.edu.pucp.algorithms.sorting.algs.impl;

import pe.edu.pucp.algorithms.sorting.algs.BaseSorter;

/**
 * Implementation of Insertion Sort algorithm. Based on the implementation
 * described in Robert Sedgewick's Algorithm book.
 * 
 * @author Carlos Gavidia (cgavidia@acm.org)
 * 
 * @param <T>
 *            Type of the array to be sorted
 */
public class InsertionSorter<T extends Comparable<T>> extends BaseSorter<T> {

    public InsertionSorter(Class<T> clazz, T[] data) {
        super(clazz, data);
    }

    /*
     * (non-Javadoc)
     * 
     * @see pe.edu.pucp.algorithms.sorting.algs.BaseSorter#sortData()
     */
    @Override
    public void sortData() {
        for (int currentIndex = 0; currentIndex < getLength(); currentIndex++) {
            T currentItem = getDataAtIndex(currentIndex);
            int auxIndex = currentIndex;
            while (auxIndex > 0
                    && getDataAtIndex(auxIndex - 1).compareTo(currentItem) >= 0) {
                setDataAtIndex(auxIndex, getDataAtIndex(auxIndex - 1));
                --auxIndex;
            }
            setDataAtIndex(auxIndex, currentItem);
        }
    }

}

I got some ideas from the Sedgewick book to build an Algorithm Framework that allow the sorting of any Type –that implement Comparable- and to extract the common logic to a Base Class to get the Algorithm Class the cleaner possible, with the only responsibility of implementing the algorithm logic. Also, the access to the array to sort is encapsulated by some methods to makes easier the Observer Pattern implementation required to implement the Sorting process animation. And that animation looks like this (captions in Spanish):

Algorithm Demo

Top-Down merge while executing

As professor requests, I included in the program implementations of Bubble Sort, Merge Sort, Insertion Sort, Quick Sort and some bonus algorithms like Several Unique Sort and Odd-Even Transposition (as you can see in the SortingAlgorithm enumeration). Building the software required a lot of time but I entertained myself doing it, and I really hope that the code can help some other Algorithm Expert wannabe to learn sorting algorithms a little faster (and easier).


A Programmer? So you’re like a bricklayer

I’m not sure if my parents know exactly what I do for a living. For several years they thought I do something like technical support, so they call me every time that they need to replace the printer cartridge or get scared with some Windows blue-screen error message. I did the work gladly, but the idea that they don’t understand the skills that I get with the college education that they pay makes me uncomfortable.  So one day, I told them both:

I can fix your computer any time you want but that’s not what I do for a living. I’m a Web Programmer, so I use programming languages to build applications that work on the Internet.

My Dad -he’s a writer- wanted to know if he can write poems in that programming language that I know so well and my mom thought the Internet was some sort of gadget included in the PC. I tried to explain them each notion carefully but I get interrupted by my brother and his cool hospital histories. Doctors always get all the attention.

I’m pretty sure that my girlfriend doesn’t get my job either. She has an administration degree, and when I started talking about Java, IDE’s and Servers she stopped me right away and said:

In conclussion, you can build stuff like Facebook. Right?

This is a pretty close approximation of what a Web Programmer does, so I said “Yes” and she get that she wanted. At least, that’s what I thought.

It results that my girlfriend has a friend from high school that’s now in college trying to get a degree in Information Systems. When my girlfriend asks her what exactly a Web Programmer does, this friend elaborated this metaphor:

Imagine that Software is like a house, and you want to build a house.

First, you need an Architect that translates your needs into a design. In Software, the Architect is an Analyst. When I finished college, I want to be one of those.

With the Architect design, you need to hire a Civil Engineer that makes a Blueprint based on the Architect conception. In Software, that guy is a Software Architect.

Finally, there’s a need also of some Bricklayers that gets the building done. Those low-skill craftsmen in Software are called Programmers. That’s what your beloved Carlos does.

My girlfriend told me that and that day she offered to pay the dinner. And the movie tickets.

I don’t know why IT community looks downwards on programmers, like we were Second-Class citizens in the software world. Maybe it’s only a Peruvian thing, but almost every programmer that I know wants to quit programming in some years in advance. In fact, I know almost no programmer whose age is more than 35, because many of them have turned into Analysts or Project Managers.

Myself, I’m very proud of having the skill to turn ideas into working software. I enjoy writing elegant code and reading high-quality code from fellow programmers, and I feel I have a lot to learn to become really skillful in coding. I don’t want to become an Analyst (whose talent is to put requirements in a Word file) or a Project Manager (who is hired to play with MS Project). At least not yet.

Maybe I’m just being romantic. Or stupid.

PS: This is my first post in English, so any spell corrections will be welcome.


Follow

Get every new post delivered to your Inbox.

Join 182 other followers