
December 22, 2004

A Rant by Joel on Software

Joel Spolsky thinks that computer science courses have become too abstract, and that they need to get much closer to the semiconductors:

Joel on Software - Back to Basics: At this point a good programmer would say, well, let's parse the XML into a tree in memory so that we can operate on it reasonably quickly. The amount of work that has to be done here by the CPU to SELECT author FROM books will bore you absolutely to tears. As every compiler writer knows, lexing and parsing are the slowest part of compiling. Suffice it to say that it involves a lot of string stuff, which we discovered is slow, and a lot of memory allocation stuff, which we discovered is slow, as we lex, parse, and build an abstract syntax tree in memory. That assumes that you have enough memory to load the whole thing at once. With relational databases, the performance of moving from record to record is fixed and is, in fact, one CPU instruction. That's very much by design. And thanks to memory mapped files you only have to load the pages of disk that you are actually going to use. With XML, if you preparse, the performance of moving from record to record is fixed but there's a huge startup time, and if you don't preparse, the performance of moving from record to record varies based on the length of the record before it and is still hundreds of CPU instructions long.

What this means to me is that you can't use XML if you need performance and have lots of data. If you have a little bit of data, or if what you're doing doesn't have to be fast, XML is a fine format. And if you really want the best of both worlds, you have to come up with a way to store metadata next to your XML, something like Pascal strings' byte count, which give you hints about where things are in the file so that you don't have to parse and scan for them. But of course then you can't use text editors to edit the file because that messes up the metadata, so it's not really XML anymore.

For those three gracious members of my audience who are still with me at this point, I hope you've learned something or rethought something. I hope that thinking about boring first-year computer-science stuff like how strcat and malloc actually work has given you new tools to think about the latest, top level, strategic and architectural decisions that you make in dealing with technologies like XML. For homework, think about why Transmeta chips will always feel sluggish. Or why the original HTML spec for TABLES was so badly designed that large tables on web pages can't be shown quickly to people with modems. Or about why COM is so dang fast but not when you're crossing process boundaries. Or about why the NT guys put the display driver into kernelspace instead of userspace.

These are all things that require you to think about bytes, and they affect the big top-level decisions we make in all kinds of architecture and strategy. This is why my view of teaching is that first year CS students need to start at the basics, using C and building their way up from the CPU. I am actually physically disgusted that so many computer science programs think that Java is a good introductory language, because it's "easy" and you don't get confused with all that boring string/malloc stuff but you can learn cool OOP stuff which will make your big programs ever so modular. This is a pedagogical disaster waiting to happen. Generations of graduates are descending on us and creating Shlemiel The Painter algorithms right and left and they don't even realize it, since they fundamentally have no idea that strings are, at a very deep level, difficult, even if you can't quite see that in your perl script. If you want to teach somebody something well, you have to start at the very lowest level. It's like Karate Kid. Wax On, Wax Off. Wax On, Wax Off. Do that for three weeks. Then Knocking The Other Kid's Head off is easy.
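[A minimal C sketch of the "Shlemiel the Painter" strcat pattern Joel alludes to; the function and variable names here are illustrative, not from his article:]

#include <stdio.h>
#include <string.h>

/* Shlemiel: strcat rescans dest from the start on every call,
   so appending n pieces costs O(n^2) character reads. */
void build_shlemiel(char *dest, const char **pieces, int n) {
    dest[0] = '\0';
    for (int i = 0; i < n; i++)
        strcat(dest, pieces[i]);      /* walks all of dest again each time */
}

/* Remembering where the string ends makes the same job O(n). */
void build_linear(char *dest, const char **pieces, int n) {
    char *end = dest;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(pieces[i]);
        memcpy(end, pieces[i], len);  /* copy without rescanning dest */
        end += len;
    }
    *end = '\0';
}

int main(void) {
    const char *pieces[] = { "Wax on, ", "wax off." };
    char buf[64];
    build_shlemiel(buf, pieces, 2);
    build_linear(buf, pieces, 2);
    printf("%s\n", buf);              /* Wax on, wax off. */
    return 0;
}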

Posted by DeLong at December 22, 2004 11:29 AM

Comments

It is not as stark as this: every computer science student will have a course that uses assembly language, a course in digital logic, and a course in computer architecture. Is it really so important that this material be covered right at the start? Not really.

Posted by: Walker at December 22, 2004 11:52 AM


"With relational databases, the performance of moving from record to record is fixed and is, in fact, one CPU instruction."

This is flat false--the time is fixed only in certain special cases and is not, in any system I can think of, a single instruction. (Usually the time involves at least one spin of a disk spindle.) String processing performance seldom produces slowdowns on modern hardware. As both the older X-windows and the newer Mac OS X Quartz implementations show, graphics performance does not depend on having display drivers in the kernel, and MS's practice of doing so creates significant security and reliability problems. And so on...

Nearly 30 years ago, when I was first studying computer science, I learned to determine where the bottlenecks were before attempting to improve performance, and I learned to focus on algorithms rather than cycles and bits for the best performance improvements. These lessons, I fear, are still ill-taught.

Posted by: Randolph Fritz at December 22, 2004 11:54 AM


"With relational databases, the performance of moving from record to record is fixed and is, in fact, one CPU instruction."


I think he's just saying that with fixed sized records you do an add to move your pointer to the next record whereas with XML you have to parse for the next tag.
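[To make Andrew's point concrete, a hypothetical C sketch--fixed-size records advance by plain pointer arithmetic, while an XML-ish buffer must be scanned byte by byte for the next tag; the struct layout is invented for illustration:]

struct record { int id; char author[60]; };   /* fixed-size row */

/* Fixed-size records: "next" is a single pointer addition. */
const struct record *next_record(const struct record *r) {
    return r + 1;   /* the compiler adds sizeof(struct record) bytes */
}

/* Tagged text: "next" means scanning for the next '<', so the cost
   depends on how long the preceding record happens to be. */
const char *next_element(const char *p, const char *end) {
    while (p < end && *p != '<')
        p++;
    return p;
}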

Posted by: Andrew Kanaber at December 22, 2004 12:04 PM


Sure, there is some truth in his words, but the other side of the coin, in the words of The Knuth, is that premature optimization is the root of all evil.

I think the future job descriptions for geeks will sound like Vernor Vinge's Programmer Archeologist.

Posted by: idook at December 22, 2004 12:05 PM


Joel's right, and he's wrong.

He's right that there are too many programmers who don't consider what happens under the hood. But he's wrong that this should be in your intro course.

After you've learned a little bit about programming, your CS courses should branch: one side doing high-level work, and the other side starting at the CPU, teaching assembly language, and working up to C (and beyond).

meno

Posted by: meno at December 22, 2004 12:18 PM


I agree with Meno. I work with embedded systems. My standard interview question is (you can cheat now if you ever have an interview with me):
Write a function clear_bit that clears bit "bitNum" of a given integer, as in the following program snippet:

int x = 0xABCD;
clear_bit (&x, 3);
assert (x == 0xABC5);

You would be absolutely astonished at how many M.S.C.S. grads cannot do this. It's also astonishing that about 80% of those who can do it have turned out to be really good embedded programmers.

Lots of people get downright agitated, insulted even. "Nobody ever needs to know how to do this," I've heard. It's always amusing as an interviewer to have your applicants get hostile with you.
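[For reference, one straightforward answer in C--build a mask with only that bit cleared and AND it in; this matches the calling convention in the snippet above:]

#include <assert.h>

/* Clear bit "bitNum" (0 = least significant) of the integer at x. */
void clear_bit(int *x, int bitNum) {
    *x &= ~(1 << bitNum);   /* all ones except bitNum, ANDed in */
}

int main(void) {
    int x = 0xABCD;         /* low nibble is binary 1101 */
    clear_bit(&x, 3);       /* 1101 -> 0101 */
    assert(x == 0xABC5);
    return 0;
}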

Posted by: Stoffel at December 22, 2004 12:42 PM


Actually, one sad truth is that a lot of performance issues, and even a fair number of correctness issues, now involve things at a significantly lower level than C or even assembly language. If you analyze performance the way Knuth does (assigning a cost to each MIX instruction and then adding up the total cost), you will get an incomplete picture. Saying that an operation is just a single machine instruction doesn't say as much as you might hope; the number of cycles it takes to execute that one innocent-looking machine instruction may vary by orders of magnitude.

[see Ars Technica on the design of modern microprocessors, I presume...]

Posted by: Matt Austern at December 22, 2004 12:50 PM


You know, my first professor in CS had a motto: memory is cheap. As someone who grew up working with Commodore 64s and TRS-80s and never had much money to buy hardware with, I strongly disagreed with him.

As I've gotten older, however, and now that I work on large Java-based systems, my views have changed. With high-level enterprise-type stuff, complexity is what you are attempting to manage: not only the complexity of what the application has to do, but the environment it does it in and the anticipated requirements of the application in the future. If "performance" is an issue and the choice is between another server in the cluster and 20 hours of coder time, another server is cheaper.

Now, I know there are embedded people here who see the world much differently. When I first got out of school I did embedded stuff, and it was much more akin to working on those C=64s and Model 4s of my youth. I can understand the difference. I can also understand the frustration at the number of people coming out of CS programs who don't seem to know their Knuth. A lot of that knowledge should be there at an almost subconscious level while code is written.

However, in terms of the things people have to think about, ranked by how critical they are to success, I would much rather see good "Software Engineering" than good "Computer Programming".

Posted by: Robert kebernet Cooper at December 22, 2004 12:55 PM


No offence, but this article is rubbish. C v. Java is a false dichotomy (not to mention that the author does not seem to understand relational databases - what is this about one instruction?). What you need is for programmers to understand a) how algorithms work, b) how compilers interact with hardware, and c) the pluses and - substantial - minuses of relational databases.
Neither C nor Java helps with the first (any book with a title like 'Algorithms in C' (or Java, or Cobol, or Fortran, or Perl, or anything else except possibly MIX) is - by definition - an abomination - apologies to Robert Sedgewick). For the second, unfortunately, there is no good single undergraduate-level book on compiler optimisation techniques - I don't know why; the necessary stuff is not difficult. After that (while we are dreaming), they can read Hennessy and Patterson.
Before all that, of course, they have to be able to program. Whenever I interview a programmer, I ask them if they have read SICP (alas, to date, no-one has said 'yes' - SICP-literate programmers have better ways to earn their bread than commercial DP - but the day one does, he gets the job). In the absence of SICP-literate programmers, the next hope is that a programmer knows how a regular expression works. That's rare enough.

Posted by: Sean Matthews at December 22, 2004 01:19 PM


I have an absolute BURNING HATRED for people with a CS degree who feel that actual knowledge (as in, having done it) of programming/technologies is beneath them.

This is like an architect not knowing about structural integrity or the properties of building materials.

These people then go on and "design" large systems by layering opaque buzzword-blocks on top of each other without having a clue about cost or pesky interdependencies.

Kill them all.

Posted by: Felix Deutsch at December 22, 2004 01:52 PM


I suppose Brad added the comment about Ars Technica on the design of modern microprocessors?

Actually, it's not so much the design of modern microprocessors I was thinking of, but the design of modern memory hierarchies. An instruction that loads a word from memory into a register is fast, right? Well, maybe. It'll be fast if it's in L1 cache, slow if it's in main memory, and horrendously slow if it causes a page fault that requires the OS to bring in a new page from disk. NUMA machines cause whole new kinds of fun. And yes, if you really want to know how this works, I second the recommendation of Hennessy and Patterson's Computer Architecture: A Quantitative Approach.
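[A small, runnable illustration of Matt's point--the two loops below execute the same number of load instructions, but the strided pass defeats the cache and prefetcher; the array size and stride are arbitrary choices for the demo:]

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)                       /* 16M ints, much bigger than cache */

int main(void) {
    int *a = malloc(N * sizeof *a);
    long sum = 0;
    for (int i = 0; i < N; i++) a[i] = i;

    clock_t t0 = clock();
    for (int i = 0; i < N; i++)           /* sequential: prefetch-friendly */
        sum += a[i];
    clock_t t1 = clock();
    for (int j = 0; j < 16; j++)          /* strided: one cache line per load */
        for (int i = j; i < N; i += 16)
            sum += a[i];
    clock_t t2 = clock();

    printf("sequential %.2fs, strided %.2fs (sum=%ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(a);
    return 0;
}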

Posted by: Matt Austern at December 22, 2004 02:00 PM


Sean Matthews--

I know what SICP stands for and have read & enjoyed it, but I haven't got the faintest idea what you mean by "commercial DP" (though I hope it doesn't involve the adult industry).

Posted by: Felix Deutsch at December 22, 2004 02:01 PM


For today's world, add networking basics. People designing large network-dependent systems without knowing the basics of the underlying protocols are catastrophic in bigger projects.

If you want to build an AOL or Google system you have to know each bit on the wire and in the CPUs personally. If you don't, you will never be able to get the performance you need.

Posted by: b at December 22, 2004 02:04 PM


Felix: I had the same question about DP :P. I only know of Desktop Publishing or Data Processing that would seem contextually correct. I presume the latter.

b wrote:
>People designing large network-dependent systems without knowing the basics of the underlying protocols are catastrophic in bigger projects.


I think there is a world of difference, though, between your average enterprise coder understanding basic networking, requisite protocol stuff (HTTP, HTTP+SOAP, IIOP), and a basic understanding of what is "easy" and what is "hard" for databases to do, versus a low-level understanding of compiler issues, the big O of database operations, and ...

>For homework, think about why Transmeta chips will always feel sluggish. Or why the original HTML spec for TABLES was so badly designed that large tables on web pages can't be shown quickly to people with modems. Or about why COM is so dang fast but not when you're crossing process boundaries. Or about why the NT guys put the display driver into kernelspace instead of userspace.

Now, in real terms, understanding the "why" of most of these is secondary to just understanding the truth of them. People work in different areas, of course. Frankly, I have never had an opportunity to give a damn about the video driver in NT, and though I could pontificate on the design of the NT kernel, I certainly don't have hands-on experience with those issues.

Having familiarity with the layers of abstraction under where you work definitely is a plus, but saying that everyone needs to know everything is becoming less and less reasonable an expectation for a CS school.

Posted by: Robert kebernet Cooper at December 22, 2004 02:16 PM


The relational database folks have been pointing out for quite a while now that things like XML (when used to construct databases) are similar to previous, non-relational database efforts, which failed decades ago.

See "Database Debunkings" (http://www.dbdebunk.com/index.html) for rants...

Posted by: liberal at December 22, 2004 02:26 PM


Oooh, I can play this game too!

I am actually physically disgusted that so many computer science programs think that C is a good introductory language, because it's "easy" and you don't get confused with all that boring assembly/register stuff but you can learn cool for/while loops which will make your big programs ever so readable. This is a pedagogical disaster waiting to happen.

Posted by: digamma at December 22, 2004 02:30 PM


"The relational database folks have been pointing out for quite a while now that things like XML (when used to construct databases) are similar to previous, non-relational databases, efforts which failed decades ago."

And SOAP/WSDL is similar to a thousand marginal or failed efforts. Database-vs-filesystem storage arguments are back on the table after sleeping for 30 years.

Reality changes with scale and time in many of these issues.

Posted by: Robert kebernet Cooper at December 22, 2004 02:57 PM


Quote: This is why my view of teaching is that first year CS students need to start at the basics, using C and building their way up from the CPU.

Sigh. We tried it. The result was Microsoft.

Seriously. The first generation of microprocessor programmers started as asocial junior-high-school geeks who had an orgasm when they were first able to make a front-panel light blink on and off. By the time they started selling their code, they knew everything about saving bytes and machine cycles and nothing at all about computer science or software engineering concepts. In fact, they tended to dismiss any ideas of modularity or structure with total, sneering contempt. The result was several generations of buggy, unmaintainable programs. Supposedly, Windows XP is the first Microsoft general-use operating system that has all the original cruddy software boiled out of it.

As to using C as a first language, no. It's good for experts who need to wring the last bit of performance out of their code, but, if the programmer wants to shoot himself in the foot, C will load the gun and hand it to him. A beginner needs a language that enforces good habits.

Posted by: lightning at December 22, 2004 04:01 PM


> int x = 0xABCD;
> clear_bit (&x, 3);
> assert (x == 0xABC5);

Shouldn't you want the answer to be 0xABC9? Or are you just trying to make it harder on the candidate?

Posted by: jjohnston at December 22, 2004 05:39 PM


C is an appalling language for first-time programmers. I've been there, I've taught it. One thing I see is that it buries the student in confusing symbols, such as semicolons and curly braces. Then you have to teach them about buffer overflows and protection faults. Students end up overwriting that terminating zero, and it crashes their computer. All of this obscures what programming is about: using your computer to solve problems.
I'd say "Start them with Python" which forces indentation on them - a good skill for making understandable code. Then migrate them to C in second year, when they actually understand why they're doing this.

Posted by: Peter Murphy at December 22, 2004 06:04 PM


Writing good software (correct, maintainable, and efficient) is a craft. You don't learn it in computer science courses. You learn it by working on projects with good people, and by making your own mistakes slowly and painfully.

If you read any book on data structures and algorithms, you'll find 4 or 5 different ways to implement a dictionary as a balanced binary tree. Then in the real world you'll find a hash table is almost always better.

The algorithm books also give you careful analysis of how many "operations" the algorithm will take. And then if you measure a fast CPU you'll find it can do an ALU operation in 0.5nsec, but a cache miss takes 150nsec--so worrying about the memory access pattern may be 300x more important than counting the operations.

In college you might learn how to build a 5000-line program from scratch. You won't learn how to deal with 200Klines of code written by people who left the company 5 years ago.

I wish there were a quicker, more predictable, and less costly way to produce good programmers. But with the current state of the art it seems to be more like producing novelists than, say, civil engineers: talent, experience, and an ability to take criticism are crucial. I'm not at all sure that CS courses are, or even could be, very helpful.

Posted by: Richard Cownie at December 22, 2004 06:24 PM


>In college you might learn how to build a 5000-line program from scratch. You won't learn how to deal with 200Klines of code written by people who left the company 5 years ago.

You will at certain schools. I know team iterative development is part of the curriculum at Georgia Tech.

Posted by: Robert kebernet Cooper at December 22, 2004 06:36 PM


The Chef ate my lines!

> Shouldn't you want the answer to be 0xABC9?

No, it is an almost standard question. But it is customary to do this with a macro (and let those that inherit your code deal with the pitfalls of macros).

As the Swedish Chef says here:
http://www.km.fauskes.net/innhold/bitmanipulering/index.php?p=4

#define clear_bit(var, bit) var &= ~(1 << bit)

(I don't understand a word of Swedish, but Google doesn't seem to care.)
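[The macro pitfalls are real: the expansion is textual, so an argument containing a low-precedence operator changes the meaning. A defensively parenthesized spelling, for illustration (the upper-case name is mine, not from the linked page):]

/* Unparenthesized, clear_bit(x, flag ? 3 : 4) expands to
   x &= ~(1 << flag ? 3 : 4), which groups as ~((1 << flag) ? 3 : 4)
   because << binds tighter than ?: -- not what the caller meant. */
#define CLEAR_BIT(var, bit) ((var) &= ~(1 << (bit)))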


From the original rant:
> What this means to me is that you can't use XML if you need performance and have lots of data.

This is a done deal. XML is used just about everywhere. Either get on board and enjoy the bumpy ride, or wait for the next train. But it is too late to stop the XML train.


It is just like the Swedish Chef. Those bits and bytes are just some random and funny old skool stuff to Joe Average in the computer bizz. He has to use XML with lots of data, and make the best of the performance.

When an average computer does a million instructions in a fraction of a second, you don't complain about strcat, you complain about a slow XML parser. And improving that is an interesting project for someone still interested in performance.

Posted by: Luc at December 22, 2004 08:31 PM


Why not go all the way and learn assembler instead of C? It's not that hard (to learn--it IS hard to actually write a program of any complexity in ASM) and it gives you a much better understanding of what's really going on. I wouldn't want to really program in it, but I'd say the same for C.

Posted by: rps at December 22, 2004 08:53 PM


I concur with the article, in spirit.

I have recently been trying both .net and j2ee and find them to be horrifically bloated and awful. The entire world of these systems eats its own complexity and grows ever fatter, with its feces being picked up by legions of consultants on time and materials contracts.

I am less sure that XML is the root, or even a major part, of the problem. It is actually nice to have a fairly universal tagging system for data interchange.

RDBs have their own problems of inefficiency and bloat.

Since the complexity of these layered systems now has a life of its own, none of this will be solved until computers start knowing enough to reprogram themselves. When they do, they will probably use something like ColorForth, imo.

Andy

Posted by: Andrew Price at December 23, 2004 02:32 AM


P.S. And incidentally C is an awful language imo that hides as many memory allocation defects as it forces one to expose. It was designed by geniuses for other geniuses to write operating systems in. That's just about the only circumstance where it works well, and only then when there are many, many eyes to find all the bugs and buffer overflows.

Posted by: Andrew Price at December 23, 2004 02:36 AM


So what we have discovered is that XML is an interchange format, not a data storage format. Even I am not surprised by that, and I'm not in a database programming line of work.

For the bigger issue, the ruined expectations of what CS graduates are skilled at are as much about confusion over what CS means as about what the important parts of programming are. Since people believe that CS is about programming, they expect CS graduates to be programming graduates and are disappointed. Sure, an architect needs to be concerned with the strengths of materials, but architecture is not just about the strengths of materials. Materials science can be concerned with the strengths of materials, but a materials scientist is not interchangeable with an architect.

Or to bring the issue closer to home: it may be useful for an accountant to know something of economics and for an economist to know something of accounting. But if you were looking to hire a (general purpose) accountant, would you look only at people with economics degrees?

Posted by: Matthew Ernest at December 23, 2004 07:52 AM



All the above discussion, while quite interesting (I especially liked the reference to a site in Swedish, and the references to SICP affirmed my religion), passes over the fact that the original purpose of XML was to be a data transfer standard rather than a data storage format. Parsing XML is a treat compared to what usually needs to be done to translate the data format of one program into another's. Therefore, to be a good citizen a program should export in XML and be able to import the same way. It's all right as a data storage format for things that might have been in plain text before, such as a browser's bookmark list.

As for

That assumes that you have enough memory to load the whole thing at once. With relational databases, the performance of moving from record to record is fixed and is, in fact, one CPU instruction. That's very much by design. And thanks to memory mapped files you only have to load the pages of disk that you are actually going to use. 

You hope. The old adage in RDBMS programming was "Compared to disk access, anything done in memory is free." Saving CPU at the cost of disk access is the silliest tradeoff imaginable.

Posted by: Jonathan Goldberg at December 23, 2004 08:10 AM


At the rate Computer Science students are dropping out, there will be no first-year Computer Science students left to worry about.

:-)

For example, Computer Science enrollments are down over 50% at MIT in the last few years.

Posted by: bhaim at December 23, 2004 10:45 AM


>Or to bring the issue closer to home: it may be useful for an accountant to know something of economics and for an economist to know something of accounting. But if you were looking to hire a (general purpose) accountant, would you look only at people with economics degrees?

No. But the two directions are not interchangeable.

You DO want an economist you hire to have knowledge of, not necessarily accounting, but basic math.

You DO NOT want to hire some higher-up team leader in software design WITHOUT some rugged experience in programming (and I'm not talking Visual Basic here).

This also helps with communicating ideas to the grunts.

Posted by: Felix Deutsch at December 23, 2004 11:38 AM


It is always best to use the right tool for the job. Which tool is right is generally a question of economics: Which resources are scarce, and which abundant? If hardware resources are abundant and programming resources scarce (in relative terms), and slow code is fast enough, then Java and XML may be the best choice. If the opposite is true, C and minimalist data representations may be best. Where we have trouble is where people know only one and not the other. And that, too, is an economic issue -- people who know both are hard to find, and educating people more widely consumes more resources.

I find that, even though most of the software developers I work with originally got their degrees in electronics engineering and learned programming on the job, few have any real idea of how a modern computer works, or of the hardware performance implications of software design tradeoffs. Alas, most of the nominal speed increase we have seen in the last few decades has come from increasingly complex hardware mechanisms which can be rendered completely ineffective by ignorant coding or algorithm design.

Ironically, those hardware mechanisms make the relative performance penalties for code and data bloat now many times greater than they were a decade ago. Now, the more you can make code and data reside in the fastest of the multiple memory caches, the more you can structure it to exploit the CPU's pipelining, branch prediction, and parallel-execution features, and the more you can avoid page faults in your virtual memory paging, the faster your software will run. The speed differences will range from large to huge. Last year I was able in just a few months, by attention only to such details, to speed up by a factor of three a program which others had been working on for years from an algorithmic angle with little success. (But you should always start with the algorithms.)

BTW, knowledge of both accounting and economics is important to software developers. All too many developers, told their code is too slow, will respond reflexively that the right solution is more or faster hardware, oblivious to the cost implications (especially important in embedded systems, where hardware is a variable cost and development a fixed cost, but also important if slower code means you need three times more servers and disk space to meet response-time specs). And there are also developers who will expend thousands of dollars' worth of time optimizing code whose execution cost will never accumulate to even ten dollars.


Posted by: jm at December 23, 2004 01:29 PM


"I think he's just saying that with fixed sized records you do an add to move your pointer to the next record whereas with XML you have to parse for the next tag."

But he's still wrong, and the comment reveals either sloppiness or ignorance. There are many other errors in the article; I just chose the most egregious one. I do agree that the choice is important, and that the speed and space considerations involved in making it are important. But he's not providing the tools to make the choice; he's just waving his hands, and then slamming students for not having the kind of knowledge that comes only after several years of practice in software engineering. Other things he's simply got wrong.

Decades ago, about the time computing was a gleam in the eye of Norbert Wiener, the architect Mies van der Rohe said, "God is in the details." Now, "details", to architects, has the special meaning of the craft details of building construction (or the drawings of them). Mies, son of a stonemason, cared very much how buildings were put together. Were his designs, then, altogether successful and loved? No. Mies's solutions to other design issues were more questionable. His geometric formalism was, to some eyes, elegant, but many other people found it cold, and there were problems of comfort and usability. To some extent for Mies, and to a much larger extent for the generation of designers he influenced, formalism and attention to detail led to inhuman design.

And so in computing. The older hardware, with its intense constraints of interface, processor time, and memory, made the production of software the business of very intelligent, well-educated experts, and even the use of computers required special training. The constraints became ingrained in software design practice, and, even after more capable hardware was available, it took a generation to get to the point where most software and systems were designed for non-specialists. Too much attention to low-level technical details warps software into a specialist tool, and developers whose education starts with such details often never learn to work in any other way. I suspect that, as a practitioner, Spolsky is well aware of these issues. But his article does not show awareness of the educational issues.

In his eagerness to browbeat students about the details of software, he misses, I think, his own major point: most of our software is too complex, and that is the major source of the performance problems which Spolsky decries. It is not that we use interpreters; software of excellent performance has been written in interpreted languages for decades. We have built layers on layers on layers, many of them unnecessary, and each layer multiplies the capacity required to do the basic jobs of the system. It is this layering, and not the minutiae of string processing, that gobbles most of the processor time and memory space in modern systems. And it creates a different educational problem, one that is central to Spolsky's concern: the layers get us to forget the problems we are solving. The issue is not whether we can write the fastest string-processing code, the shortest Perl program, or even the most elegant UI; it's whether we can write software to do what we need it to do. And to the extent that software professionals focus on the ever-accumulating layers of APIs in our systems, to that extent we distance ourselves from the problem of developing systems for human purposes.

Posted by: Randolph Fritz at December 23, 2004 11:30 PM


I notice I pretty much agree with everybody on this. Joel can point out a real problem without knowing the solutions--he can explain his own limitations in a way that makes him look like part of the problem, and still he's pointing out the problem.

Software professionals tend to graduate with some understanding of the current challenges. One year it's graphics, another year it's distributed stuff, and the specifics they're focused on are likely not to be such a hot topic in 3 years or so. But if they didn't have that specific training, who would hire them? Would you want a generalist with no training when you could get a generalist with 5 years' experience? I'm not sure there's any possible CS solution to that; it's an economic problem.

I think it would be a very good thing for beginning CS students to spend a year with Forth. They'd have a simplified assembler for a simplified VM. A compiler and interpreter and programming editor, simple database and simple OS written in Forth, that they could modify. So that's early experience with low-level coding, and early experience maintaining somebody else's code. With essentially no modern restrictions on (aids to) coding practices they get to see how useful such things are. The interpreter gives them an easy start on test scripts. All at a simple level that doesn't take years to follow. And they can share code and be assigned projects documenting each other's code etc. The second quarter they could use simple networking protocols and see what it takes to do sandboxing etc. Study algorithms. Find bottlenecks to optimise, and learn how often the bottlenecks migrate. All of it with simple code, and the immediate feedback that when they try to make things too complicated the simple obvious approach suddenly feels like wading through glue.

It would mean they'd go a year without getting experience with a language that could make them money. They'd be a year behind learning the vagaries of the usual C libraries. But they'd learn a lot about compiler design (with a simplified system that bypasses many of the problems), and optimisation, and test suites/debugging, algorithms, dealing with complexity, lots of other useful things. It could be a very good platform to explore the big problems, in a system simple enough to understand completely in a few weeks.

But anyway, there's the central problem that CS majors want (or ought to want) to learn everything it takes to excel at their field, but a major part of the job market is looking for people who have adequate skills to get today's work done cheaply today -- and any bug report that comes in after you're laid off is not your concern.

And there's the central problem that the quickest cheapest way to get today's work done is to cobble together things that appear to work, and test your stitching just enough to get it to hold up until after you're laid off. Then if you don't get laid off that quick after all you can do maintenance. There are big problems with this approach but they're usually tomorrow's problems. The payoff for preventing tomorrow's problem today is likely to be small.

Posted by: J Thomas at December 24, 2004 06:20 PM