Your smartphone will make you late

I have an appointment at 9:45. Upon entering the date into my phone’s calendar, it asks if I would like to take into account travel time, which it estimates to be 15 minutes. It then asks if I would like to receive a notification when it’s time to leave at 9:30.

My phone is not aware, however, that my appointment is in a large office building with a parking structure. I’m arriving during business hours. The structure will be full. It will take 5-10 minutes to find a parking spot, another five minutes to walk to the office, and I really ought to arrive at least five minutes early, anyway.  So if I leave home on time, I’ll be late.

I wish my smartphone was smart enough to know those things.

Advertisements

The Purpose of Classes

This was an essay I wrote to my students near the beginning of my CS2 course to help them get their heads around designing classes and writing methods.

This is the second semester of Java programming. The first semester was about introducing you to the “tools” of programming: Java keywords, variables, loops, arrays, methods, and so forth. The second semester advances your knowledge by taking the tools you learned about and applying them in new ways so that your programming skills can be further refined. It may come as a surprise to you, then, than what you’re encountering now in some fundamental ways from what you learned before.

In a typical first-semester programming course that uses Java, methods are introduced as a way to do a computation. For example, “convert Fahrenheit to Celsius” or “compute the interest on a loan.” You might have done all the input, computations, and output in a single method — most likely the main() one. In such a setup, the class exists simply as a wrapper for the method and is otherwise superfluous. It provides a home for the computation.

At some later point you were taught about creating other methods, but perhaps these were introduced as helpers for the main method. If the main method got too long, for example, you were encouraged to write these helpers to break down the computation into smaller pieces. Or you were told not to write a piece of code twice; instead, put it in a method and call it. That’s good advice, but it doesn’t paint the whole picture.

It’s time to turn some of these concepts inside-out and learn what classes are really for.

A Class Encapsulates Data

A class exists to encapsulate (hold) data. These data are held in the instance variables, or fields, of the class. The class defines what the data is through the names and types of the instance variables.

An object, as you may know, is an instantiation of the class. Where the class just defines what the variables are, the object holds the actual values. The values of these variables at any given moment in time represents the state of the object at that moment.

I just wrote a lot of deep stuff there. Go back and re-read it. Let in sink it. It may conflict with what you think classes and objects are for; that’s my goal.

Note that I haven’t yet talked about methods. They’re not important now. We’ll get to those in a moment. For now, we are only discussing data.

State Changes Over Time

The state of an object (that is, the data it encapsulates) changes over the life of the object. The object is created and moves from state to state by having the values of its fields mutated.

Consider a class that represent a bank account:

public class Account
{
    private int acctNum;
    private String owner;
    private double balance;
}

An object is instantiated with some initial values:

    Account a = new Account(100, “J Smith”, 50.00);

The state of the object at this moment is the value of the three fields:

    (100, “J Smith”, 50.00)

At some point in the future, the balance on the account changes:

    (100, “J Smith”, 70.00)

And then changes again:

    (100, “J Smith”, 40.00)

Perhaps the owner goes through a life-altering event and needs to change their name. This, too, is reflected as a new state:

    (100, “J Lee”, 40.00)

And then maybe that’s the end of it. The object is destroyed and the data erased. This object had four distinct states over its life. At any moment we could inspect the object and say “this is what the bank account looks like right now.”

When we define a class, we must first analyze what data we need to store. That is because classes exist to store data. We should choose only the data that is minimally necessary to represent the state that we need. For our purposes, it is sufficient to store three pieces of information about a bank account. In a different situation, we might need more, or less, or a different set of fields.

Before you write any methods, before you even think about computations, you must figure out what the data is supposed to be. That will drive the whole programming process.

Methods Mutate State

Now we’ll talk about methods.

Methods exist to manipulate data — to change it or to move it around. Most of the methods you write modify the state of an object; that is, they alter the values stored in the instance variables.

Methods should be defined in terms of behaviorswhat they do, not how they do it. Look at the state changes from the data analysis phase and figure out what methods are needed to make those state changes happen. It helps to use active verbs that are divorced from the actual underlying implementation. Remember, the user can’t see the fields because they are private. Ideally, the method names should make no reference to the field names, although this is not always possible in practice.

From the state changes shown above, three possible methods jump out immediately:

    deposit
withdraw
changeName

These are the three actions we can perform on the account to move it from state to state: we can deposit some money, we can withdraw some money, and we can change the owner’s name. At minimum, we need these three methods to be able to implement the state changes in our use case.

Once we’ve identified the methods we need, only then would we start writing code. It is a mistake, which many beginners make, to start writing code and then figure out what it’s supposed to do. You are no longer beginners. Figure out what it’s supposed to do, then write code.

Data-Driven Design

This process is called “data-driven design.” Start by figuring out what data you want to store. Then design the methods around them.

Draw pictures. You’ll see me draw lots of pictures in class. They are part of my mental design process. They should be part of yours, too. I see the data moving around. I see where it comes from and where it goes. I see it mutating. Then I write code to make it happen.

Write psuedo-code. Pseudo-code is half-English (or the language of your choice), half-code. It’s a quick jotting down of the steps that the code will need to perform in order to carry out the implementation. I usually write psuedo-code on paper or in comments. It’s my to-do list to make sure I didn’t forget any steps.

Thoughts on the BetterArray

(Note: the first assignment in this course was to implement our own version of Java’s ArrayList, which we called BetterArray.)

During lecture we went through the data-driven design process with the BetterArray. We started with the data (an array of strings and a counter) and then wrote down the methods we needed.

We drew pictures. We wrote pseudo-code. We wrote comments and to-do lists.

Only then did we write actual Java code. And we tested it thoroughly to make sure it conformed to our desired behavior.

This project is nothing more than array manipulation, for which you often need loops. Working with arrays usually goes hand-in-hand with loops. It will help tremendously if you draw “before” and “after” pictures of what the array should look like. Draw arrows showing how the strings move around. Develop a “movie” in your mind so you can see the array transforming itself from one state to the next.

You will probably need to work several examples for each method so you can get a clear grasp on the starting and ending points of each loop. For example, since you can insert at the beginning, middle, and end of the array, make sure your loop works for all three cases. Draw a picture of each of the three cases showing the before and after state.

I won’t claim that this stuff is easy. If programming was easy, everyone would be doing it. It’s hard, but it is possible. Learn to resist jumping in to start coding and take some time to analyze. You will be rewarded.

Another un-productive weekend

The Problem

Convert a 12-variable Boolean expression in disjunctive normal form (DNF) to conjunctive normal form (CNF) so that generating the truth table is NP-hard.  Resulting CNF expression will be used in a student assignment.

Convert DNF to CNF is itself an NP-hard problem. If it were easy, SAT would not be NP. But I need a way to computationally convert DNF to CNF.

Possible solution: Wolfram|Alpha

Wolfram|Alpha is certainly able to convert DNF to CNF. Initial testing shows it works quite well for many variables, but it has a 200-character input limit. DNF expression is much longer than that.

Paid for annual Wolfram|Alpha subscription, hoping that would raise the character limit. Unsuccessful.

Applied for and was granted Wolfram|Alpha API access to try to get around the 200-character input limit. Unsuccessful. API error message is non-existent; it just silently refuses to run the query.

Possible solution: Mathematica

Download trial version of Mathematica and install on Mac. Success! Mathematica is able to convert DNF to CNF quickly. Twelve variables isn’t very many, anyway.

But Mathematica costs $500 to buy — a lot of money for a one-off problem I need to solve a couple times a year.

Possible solution: Karnaugh Maps

Learned that DNF can be converted to DNF using Karnaugh Maps. Just minimize the “zero” cells instead of the usual “one” cells. Invert the inputs. The result is a DNF expression. Seems easy enough.

Research K-map solvers. Most free ones are educational tools designed to show students how K-maps are solved. Because of the limited screen real estate, most can use at most four or five variables — nowhere near the twelve I need.

Possible solution: Roll my own K-Map solver

Getting desperate here. Spend a few hours thinking about how to solve K-Maps using only a truth table, as it would be silly to implement the actual 2D grid in a computer program. Came up with Quine-McCluskey algorithm on my own, so there’s that I guess.

But the Quine-McCluskey algorithm is also NP-hard. The number of prime implicants for 12 variables is as high as 3^n/3, or 177,147, so I see this is going nowhere fast.

Possible solution: Find a Q-M solver

Googled around for a Quine-McCluskey solver, ideally in Python. Found one called QM, referenced from the Wikipedia article. Ran it on my DNF expression. Waited about an hour, but still no results. Gave up.

Also found an optimized version of QM, but I’m already onto the next Possible Solution…

Possible solution: Symbolic computation package

Google around for a symbolic computation package for Python. Found one called Sympy. Install. Run. Plug in my DNF expression. Wait about 30 minutes with the CPU running at 100%. Give up.

Possible solution: Logic Friday

Learned that although DNF-to-CNF is NP-hard, some fine folks at UC Berkeley have discovered a heuristic approach that gives non-optimal but good-enough-for-me solutions. Originally for developing PLA devices at IBM, Logic Friday is a Windows application that implements the Espresso Heuristic Logic Minimizer.

Fire up a Windows VM, download, install Logic Friday. Plug in my DNF and, huzzah!, out pops a CNF within moments!

But.. It’s Windows. And non-scriptable. I’d like to be able to generate the DNF and convert to CNF in an automated fashion on a Linux box.

Working solution: Espresso on Linux

The Espresso Wikipedia page references a working copy of Espresso updated for ANSI C. (Note: this link is to the tar.gz file itself.)

Downloaded, installed on Linux.

Spent about six hours combing through the docs to get it to convert DNF to CNF (what it calls Product of Sum form). But output is still in a DNF formula.

Hack at the source code to get it to print CNF when the -pos command line argument is specified.

Finally get something to work:

$ espresso -pos -oeqntott mydnf.esp
v13.0 = (d|!j|!l|!m) & (!a|!g|m) & (d|!k|l|m) & (c|!d|!e|!j|!l) & (c|d|i|k|m) & (b|!g|!j|m) & (!a|!i) & (!c|k|l|m) & (a|b|j|!m) & (!c|!e|!k|l|!m) & (!b|d|!e|!l|!m) & (a|c|!e|l) & (!b|j|!l|m) & (e|i|j|!k) & (!d|e|!k|l) & (!c|h) & (b|!d|i|k) & (!c|d|!i) & (!a|d|l) & (!b|!c|!l) & (!b|c|e|!j) & (c|e|k|!l) & (a|g) & (!c|e|!l|!m) & (!c|j|!k) & (d|h) & (e|j|l|!m) & (b|d|!g|!k) & (!a|!d|j) & (c|!d|!i) & (!a|!c) & (!b|c|!h|l) & (!d|g) & (!d|!e|k) & (d|!j|k) & (!i|!j) & (!e|g) & (h|!k) & (!f);

Total time spent on this problem: 12+ hours spread across several days. But in the end I got something that worked.

New Problem

Didn’t get any grading done.

Dabbling in Factor

Every once in a while I drag out some stack-based programming language, such as Forth. For some reason, there’s a soft spot in my heart for stack languages. The latest one I’ve been experimenting with is Factor.

My first words in Factor:

: square ( x -- x-squared ) dup * ;
: circle-area (r -- area ) square pi * ;
: hypotenuse ( a b -- c ) square swap square + sqrt ;
: distance ( x1 x2 y1 y2 -- dist ) rot - rot rot - hypotenuse ;
: gcd ( a b -- c ) dup 0 = [ drop ] [ dup rot swap mod gcd ] if ;

Error messages

My course load is the same as it usually is every spring:

  • Intermediate programming (CSCI 13)
  • Discrete structures for CS (CSCI 26)
  • System programming in C (CSCI 46)
  • Introduction to SQL (CSCI 52)

In my students I see the usual patterns emerging: some students pay zero attention to compiler error messages. They do notice the existence of the message, but don’t actually read it. Rather, they just go back to the source code and stare at it, trying to figure out where the error could be. If I happen to be looking over their shoulders when the message pops up, I read it to them, point out the line number, and then show how the message correlates to their code. But, alas, for some students it doesn’t appear to sink in.

This phenomenon only happens with the traditional programming languages: Java, C, and the like. It does not happen in my SQL class. There, the students without fail diligently read the error message and make corrections to their code.

Why is this?

It’s the interactive nature of the SQL user interface. Type in a query, press Enter, and the computer compiles and executes it right on the spot. If there is an error, the message appears right below the input and, often, there is an arrow pointing at the exact character where the compiler noticed the error.

db=> select name, balance from account were balance < 200;
ERROR:  syntax error at or near "balance"
LINE 1: select name, balance from account were balance < 200;
                                               ^

See the arrow pointing at “balance?”

Contrast this with a traditional compiler, whose error messages appear detached from the actual source code, and frequently in abundance. It’s not only overwhelming to the beginning student, it’s very difficult to connect the error message with the location in the code.

IDEs and text editors could certainly help out. How about graphical “speech balloons” pointing at the troublesome code?

Teachers make up 15% of Coursera users

Coursera Poll question
According to this Coursera poll question, approximately 15% of its users are teachers and about 9% teach at the college/university level.

I enroll in MOOCs primarily to see what the state of the art is and to pick up ideas I can use in my own teaching. We teachers borrow from each other all the time. Chances are I found that assignment you did was found in another textbook, at a conference, or on some other teacher’s website.

Linux on JavaScript: new horizons in interactive content

busybox in browser

Linux-in-Javascript emulators are very interesting to me. These are stripped-down Linux systems (usually running busybox) running atop a virtual CPU whose emulation engine is written in JavaScript.

It started with the groundbreaking JavaScript PC Emulator by Fabrice Bellard in 2011. His project emulates a 486 processor. Follow that link to see Linux boot right in your browser. It even comes with gcc to compile programs! The first time you boot up it’s pretty slow, but caching by your browser will result in better performance on subsequent boots.

Check out busybox’s own version of Bellard’s emulator with a fully tricked-out busybox executable.

Today I discovered the JOR1K Emulator which uses the OpenRISC 1000 emulator and includes network and graphics frame buffer support. All things considered, it’s very fast, even on an iPhone. Be sure to run the demo.

These emulators would be a great way to introduce novice students to the command line without needing to set up an account on a real Linux system in advance. They could be embedded directly into educational content–picture an online textbook where the student could try commands in the safety of an emulator. With a suitable suspend/resume feature, we could forego the whole “boot” process entirely and simply jump into a working shell.

As I bring more of my classroom content online (the “flipped classroom” model), it has to be more than just “read this chapter” or “watch these videos.” Interactive content is the way to keep students motivated and engaged. Projects like these really get me thinking about the possibilities.

WASD is the new HJKL

wasd_vs_arrows

I was showing some students the vi editor recently. One can, of course, use the arrow keys to move around the screen, but HJKL can be used without moving the hands from the home row. (For those who are unfamiliar with vi, H will move left, J down, K up, and L right.)H J K L keys on a keyboard

They commented that HJKL seemed like a strange choice. Why not the more intuitive WASD?

I’m not a gamer. WASD isn’t in my muscle memory. The adoption of WASD as movement keys is a much, much more recent development than vi. Once you see the ADM-3A terminal it was created on, it’s easy to understand why HJKL was used. Nevertheless, an era before first-person shooters was clearly the Dark Ages of computing.

bigpicture

The next time you’re using an application or a website that uses keyboard shortcuts, such as Boston Globe’s The Big Picture or Trello, take at look at what the navigation keys are. If they are H, J, K, and L, you can bet it was developed by an old-fogey vi user like me.

Are physical machines still the better choice?

Dell Optiplex 755This post is a bit of a ramble. My apologies in advance. Just looking for advice, I guess.

If I’m going to dive head-first into producing videos for my upcoming online Introduction to SQL course, plus supplementary “flip-classroom” videos for other courses, I need some horsepower. For screen recording and video editing, I’ve purchased an i7 iMac with Fusion drive and 8GB of RAM. That should be arriving next week and will replace the Mac mini I’ve been using the past year.

For a database, I’ll use the same PostgreSQL backend software that I use for my on-ground class. But since the videos will be publicly accessible on YouTube, I don’t want to use the same server machine. That is, I don’t want the public to see exactly what my students will see, since doing so may pose a bit of a security risk. Consequently, I’ll run the database on a separate machine.

A few options are available:

I have a small Dell Optiplex 620 “Slim Form Factor” machine sitting in a closet. It’s the machine that ran the database for my previous videos already on YouTube. It’s perfectly adequate for basic stuff and it has an Nvidia GT 610 in it for simple CUDA programming, but I’m afraid that if I start working with larger datasets, it may start to creak. The temptation to upgrade to something snappier is looming large.

I could throw together an Intel i3 machine out of parts for about $400. As a plus, I can toss into it a Nvidia GTX 680 I happen to have lying around for bigger CUDA projects.

I could purchase a used, but more current, Optiplex 755 or 780 from Craigslist for about $180, but I wouldn’t be able to put anything larger than the GT 610 into it. Still, it would run the database with ease. I like small machines that I can tuck into an out-of-the-way place.

I suppose I could run the Ubuntu server inside a VirtualBox on the soon-to-be-unused Mac mini. It would be difficult to run the machine headless, though.

But then I think: for $400, I could rent a Linode virtual machine for almost two years! And for the few times I need more oomph, I can temporarily fire up a more capable one for just a few dollars. When I want to do CUDA, just rent an Amazon EC2 GPU machine at 35 cents/hour.

A few years ago I promised myself I’d stop running server hardware and just go virtual. Why am I still thinking about buying new stuff? What is it about spending a few hundred dollars up front that seems better than paying a dozen dollars every month?

Having a local machine means zero latency and no setup necessary when I just want to record a video or two; I can shut it down when I’m done. A virtual machine means setting it up from scratch every time I need to start it up. (Some StackScripts could help with that.) A low-end local machine would satisfy 95% of my needs. A more capable machine would satisfy 100%, but at 2x the expense.

What would you do?

Where is the MOOC train headed? No one knows.

Keith Devlin, a MOOC veteran who has run his Mathematical Thinking course three times, recently attended a conference on MOOC education and research. Kudos for recognizing that online education hasn’t been the domain of tier-one universities.

If you want to see the future of MOOCs, you need to hang around with the instructors in the lesser known, small universities and community colleges who, for many years, have been experimenting with online learning. Most of the leaders of that loosely knit band could be found in a large, cavernous room in Arlington for two days last week – along with key figures from such new-MOOC, Ivy League players as Stanford, MIT, and Georgia Tech.

From The MOOC Express – Less Hype, More Hope.