[Chapter 1] Getting Started with Java
When it was introduced in late 1995, Java took the Internet by storm.
Java 1.1, released in early 1997, nearly doubles the speed of the Java
interpreter and includes many important new features. With
the addition of APIs to support database access, remote
objects, an object component model, internationalization,
printing, encryption, digital signatures, and many other
technologies, Java is now poised to take the rest of the
programming world by storm.
Despite all the hype surrounding Java and the
new features of Java 1.1, it's important to remember that
at its core, Java is just a programming language, like many
others, and its APIs are just class libraries, like
those of other languages. What is interesting about Java,
and thus the source of much of the hype, is that it has a
number of important features that make it ideally suited for
programming in the heavily networked, heterogeneous world of
the late 1990s. The rest of this chapter describes those
interesting features of Java and demonstrates some simple
Java code. Chapter 4, What's New in Java 1.1, explores the new features
that have been added to version 1.1 of the Java API.
In one of their early papers about the language, Sun
described Java as follows:
Java: A simple, object-oriented, distributed, interpreted,
robust, secure, architecture neutral, portable,
high-performance, multithreaded, and dynamic language.
Sun acknowledges that this is quite a string of buzzwords,
but the fact is that, for the most part, they aptly describe
the language. In order to understand why Java is so
interesting, let's take a look at the language features
behind the buzzwords.
Java is an object-oriented programming language. For you as a
programmer, this means that you focus on the data in your
application and methods that manipulate that data, rather
than thinking strictly in terms of procedures. If you're
accustomed to procedure-based programming in C, you may find
that you need to change how you design your programs when
you use Java. Once you see how powerful this new paradigm
is, however, you'll quickly adjust to it.
In an object-oriented system, a class is a collection
of data and methods that operate on that data. Taken
together, the data and methods describe the state and
behavior of an object. Classes are arranged in a
hierarchy, so that a subclass can inherit behavior from its
superclass. A class hierarchy always has a root class; this
is a class with very general behavior.
Java comes with an extensive set of classes, arranged in
packages, that you can use in your programs. For
example, Java provides classes that create graphical user
interface components (the java.awt package), classes
that handle input and output (the java.io package),
and classes that support networking functionality (the
java.net package). The Object class (in the
java.lang package) serves as the root of the Java
class hierarchy.
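The ideas of class, subclass, and root class can be sketched
in a few lines. The Shape and Square classes below are
hypothetical names invented for this illustration; only
Object and the reflection call getSuperclass() come from the
Java API. The sketch is written against a current JDK rather
than Java 1.1.

```java
// Hypothetical classes for illustration; not part of the Java API.
class Shape {
    private final String name;            // data: the state of the object

    Shape(String name) { this.name = name; }

    // Methods: the behavior that operates on that data.
    String describe() { return name + " with area " + area(); }

    double area() { return 0.0; }         // very general default behavior
}

// A subclass inherits describe() and overrides area().
class Square extends Shape {
    private final double side;

    Square(double side) {
        super("square");
        this.side = side;
    }

    @Override
    double area() { return side * side; }
}

public class HierarchyDemo {
    public static void main(String[] args) {
        Shape s = new Square(2.0);
        System.out.println(s.describe());   // square with area 4.0

        // Shape declares no superclass, so it extends java.lang.Object.
        System.out.println(Square.class.getSuperclass().getSuperclass());
    }
}
```

Square never defines describe(); it inherits that behavior
from Shape, which in turn inherits from Object.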
Unlike C++, Java was designed to be object-oriented from the
ground up. Most things in Java are objects; the primitive
numeric, character, and boolean types are the only
exceptions. Strings are represented by objects in Java, as
are other important language constructs like threads. A
class is the basic unit of compilation and of execution in
Java; all Java programs are classes.
While Java is designed to look like C++, you'll find that
Java removes many of the complexities of that language. If
you are a C++ programmer, you'll want to study the
object-oriented constructs in Java carefully. Although the
syntax is often similar to C++, the behavior is not nearly
so analogous. For a complete description of the
object-oriented features of Java, see Chapter 3, Classes and Objects in Java.
Java is an interpreted language: the Java compiler
generates byte-codes for the Java Virtual Machine
(JVM), rather than native machine code. To actually run a
Java program, you use the Java interpreter to execute the
compiled byte-codes. Because Java byte-codes are
platform-independent, Java programs can run on any platform
that the JVM (the interpreter and run-time system) has been
ported to.
In an interpreted environment, the standard "link" phase of
program development pretty much vanishes. If Java has a
link phase at all, it is only the process of loading new
classes into the environment, which is an incremental,
lightweight process that occurs at run-time. This is in
contrast with the slower and more cumbersome
compile-link-run cycle of languages like C and C++.
Because Java programs are compiled to an architecture-neutral
byte-code format, a Java application can run on
any system, as long as that system implements the Java
Virtual Machine. This is particularly important for
applications distributed over the Internet or other
heterogeneous networks. But the architecture-neutral
approach is useful beyond the scope of network-based
applications. As an application developer in today's
software market, you probably want to develop versions of
your application that can run on PCs, Macs, and UNIX
workstations. With multiple flavors of UNIX, Windows 95, and
Windows NT on the PC, and the new PowerPC Macintosh, it is
becoming increasingly difficult to produce software for all
of the possible platforms. If you write your application in
Java, however, it can run on all of these platforms.
The fact that Java is interpreted and defines a standard,
architecture-neutral byte-code format is one big part of
being portable. But Java goes even further, by making
sure that there are no "implementation-dependent" aspects of
the language specification. For example, Java explicitly
specifies the size of each of the primitive data types, as
well as its arithmetic behavior. This differs from C, for
example, in which an int type can be 16, 32, or 64
bits long depending on the platform.
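As a small illustration of these fixed sizes, the sketch
below prints the width of several primitive types and shows
that int arithmetic wraps around in a defined way. It
assumes a current JDK: the SIZE constants on the wrapper
classes were added to the API after Java 1.1. The class and
method names are made up for this example.

```java
public class PrimitiveSizes {
    // Adding 1 to the largest int wraps to the smallest int.
    // The language defines this result; it is not left to the platform.
    static int onePastMax() {
        int max = Integer.MAX_VALUE;
        return max + 1;
    }

    public static void main(String[] args) {
        // These widths are the same on every platform that runs Java.
        System.out.println("byte:   " + Byte.SIZE + " bits");
        System.out.println("short:  " + Short.SIZE + " bits");
        System.out.println("char:   " + Character.SIZE + " bits");
        System.out.println("int:    " + Integer.SIZE + " bits");
        System.out.println("long:   " + Long.SIZE + " bits");
        System.out.println("float:  " + Float.SIZE + " bits");
        System.out.println("double: " + Double.SIZE + " bits");

        System.out.println(onePastMax() == Integer.MIN_VALUE);   // true
    }
}
```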
While it is technically possible to write non-portable
programs in Java, it is relatively easy to avoid the few
platform-dependencies that are exposed by the Java API and
write truly portable or "pure" Java programs. Sun's new
"100% Pure Java" program helps developers ensure (and
certify) that their code is portable. Programmers need only
make simple efforts to avoid non-portable pitfalls in
order to live up to Sun's trademarked motto "Write Once, Run
Anywhere."
Java is a dynamic language. Any Java class can be
loaded into a running Java interpreter at any time. These
dynamically loaded classes can then be dynamically instantiated.
Native code libraries can also be dynamically
loaded. Classes in Java are represented by the
Class class; you can dynamically obtain
information about a class at run-time.
This is especially true in Java 1.1, with the addition of the
Reflection API, which is
introduced in Chapter 12, Reflection.
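A minimal sketch of dynamic loading and run-time class
information follows. It loads a class by name, creates an
instance, and asks the resulting Class object about itself.
The helper method instantiate() and the choice of
java.util.ArrayList as the class to load are assumptions
made for this example, and the code targets a current JDK
rather than Java 1.1.

```java
import java.lang.reflect.Method;

public class DynamicLoadDemo {
    // Load the named class at run-time and create an instance of it
    // using its no-argument constructor.
    static Object instantiate(String className) {
        try {
            Class<?> c = Class.forName(className);
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name is just a string; it could come from a file
        // or from the network instead of being written in the source.
        Object o = instantiate("java.util.ArrayList");

        Class<?> c = o.getClass();
        System.out.println("class:      " + c.getName());
        System.out.println("superclass: " + c.getSuperclass().getName());

        // Reflection: list the public methods whose names start with "add".
        for (Method m : c.getMethods()) {
            if (m.getName().startsWith("add")) {
                System.out.println("method:     " + m);
            }
        }
    }
}
```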
Java is also called a distributed language. This
means, simply, that it provides a lot of high-level support
for networking. For example, the URL class and
related classes in the java.net package make it
almost as easy to read a remote file or resource as it is to
read a local file. Similarly, in Java 1.1, the Remote
Method Invocation (RMI) API allows a Java program to invoke
methods of remote Java objects, as if they were local
objects. (Java also provides traditional lower-level
networking support, including datagrams and stream-based
connections through sockets.)
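The sketch below suggests how little code the URL class
requires. So that it runs without a network connection, it
reads from a file: URL that points at a temporary file the
program creates itself; the same firstLine() method should
work unchanged with an http: URL. The method names and the
sample text are assumptions made for this example, and the
code uses features from a current JDK rather than Java 1.1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlReadDemo {
    // Read the first line of whatever resource the URL names.
    // Nothing here depends on whether the resource is local or remote.
    static String firstLine(URL url) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Create a small temporary file and return a file: URL for it.
    static URL sampleUrl() {
        try {
            Path file = Files.createTempFile("url-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, "hello from a URL\n".getBytes(StandardCharsets.UTF_8));
            return file.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine(sampleUrl()));
    }
}
```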
The distributed nature of Java really shines when combined
with its dynamic class loading capabilities. Together,
these features make it possible for
a Java interpreter to download and
run code from across the Internet. (As we'll see below,
Java implements strong security measures to be sure that
this can be done safely.) This is what happens when a Web
browser downloads and runs a Java applet, for example.
Scenarios can be more complicated than this, however.
Imagine a multi-media word processor written in Java. When
this program is asked to display some type of data that it
has never encountered before, it might dynamically download
a class from the network that can parse the data,
and then dynamically download another class (probably a Java
"bean") that can display the data within a compound
document. A program like this uses distributed resources on
the network to dynamically grow and adapt to the needs of
its user.
Java is a simple language. The Java designers were
trying to create a language that a programmer could learn quickly, so
the number of language constructs has been kept relatively
small. Another design goal was to make the language look
familiar to a majority of programmers, for ease of migration.
If you are a C or C++ programmer, you'll find that Java uses
many of the same language constructs as C and C++.
In order to keep the language both small and familiar, the
Java designers removed a number of features available in C
and C++. These features are mostly ones that led to poor
programming practices or were rarely used. For example,
Java does not support the goto statement; instead,
it provides labelled break and continue
statements and exception handling. Java does not use header
files and it eliminates the C preprocessor. Because Java is
object-oriented, C constructs like struct and
union have been removed. Java also eliminates the
operator overloading and multiple inheritance features of
C++.
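To show how a labelled break can stand in for one common
use of goto, here is a short sketch that leaves two nested
loops at once. The method name findFirstNegative and the
label name search are made up for this example.

```java
public class LabelledBreakDemo {
    // Return {row, column} of the first negative element,
    // or null if the grid contains none.
    static int[] findFirstNegative(int[][] grid) {
        int[] found = null;
        search:                                  // label on the outer loop
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] < 0) {
                    found = new int[] { r, c };
                    break search;                // exits both loops at once
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[][] grid = { { 1, 2 }, { 3, -4 }, { -5, 6 } };
        int[] pos = findFirstNegative(grid);
        System.out.println("row " + pos[0] + ", column " + pos[1]);
    }
}
```

A plain break would leave only the inner loop; the label is
what lets the search stop as soon as the first match is
found.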
Perhaps the most important simplification, however, is that
Java does not use pointers. Pointers are one of the most
bug-prone aspects of C and C++ programming. Since Java does
not have structures, and arrays and strings are objects,
there's no need for pointers. Java automatically handles
the referencing and dereferencing of objects for you. Java
also implements automatic garbage collection, so you don't
have to worry about memory management issues. All of this
frees you from having to worry about dangling pointers,
invalid pointer references, and memory leaks, so you can
spend your time developing the functionality of your
programs.
If it sounds like Java has gutted C and C++, leaving only a shell of a
programming language, hold off on that judgment for a bit. As we'll
see in Chapter 2, How Java Differs from C, Java is actually a full-featured
and very elegant language.
Java has been designed for writing highly reliable or
robust software. Java certainly doesn't eliminate the
need for software quality assurance; it's still quite
possible to write buggy software in Java. However, Java
does eliminate certain types of programming errors, which
makes it considerably easier to write reliable software.
Java is a strongly typed language, which allows for
extensive compile-time checking for potential type-mismatch
problems. Java is more strongly typed than C++, which
inherits a number of compile-time laxities from C,
especially in the area of function declarations. Java
requires explicit method declarations; it does not support
C-style implicit declarations. These stringent requirements
ensure that the compiler can catch method invocation errors,
which leads to more reliable programs.
One of the things that makes Java simple is its lack of
pointers and pointer arithmetic. This feature also increases the
robustness of Java programs by abolishing an entire class of
pointer-related bugs. Similarly, all accesses to arrays and
strings are checked at run-time to ensure that they are in
bounds, eliminating the possibility of overwriting memory
and corrupting data. Casts of objects from one type to
another are also checked at run-time to ensure that they are
legal. Finally, and very importantly, Java's automatic
garbage collection prevents memory leaks and other
pernicious bugs related to memory allocation and
deallocation.
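The run-time checks on array access and casts can be seen in
a short sketch. Both methods below deliberately do something
illegal and report the exception that the Java run-time
raises instead of letting memory be corrupted. The class and
method names are hypothetical.

```java
public class RuntimeChecksDemo {
    // Index 3 is one past the end of a three-element array.
    static String readPastEnd() {
        int[] data = new int[3];
        int index = 3;
        try {
            return "read " + data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return "out of bounds";
        }
    }

    // The compiler accepts this cast, but the object is really a String,
    // so the check made at run-time rejects it.
    static String badCast() {
        Object o = "a string";
        try {
            Integer i = (Integer) o;
            return "cast ok: " + i;
        } catch (ClassCastException e) {
            return "illegal cast";
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd());
        System.out.println(badCast());
    }
}
```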
Exception handling is another feature in Java that makes for
more robust programs. An exception is a signal that
some sort of exceptional condition, such as a "file not
found" error, has occurred. Using the
try/catch/finally statement, you can
group all of your error handling code in one place, which
greatly simplifies the task of error handling and recovery.
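A minimal sketch of the try/catch/finally statement, using
the "file not found" condition mentioned above, might look
like this. The class name, the method name, and the file
name in main() are assumptions made for this example.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class ExceptionDemo {
    static String firstByteOf(String filename) {
        InputStream in = null;
        try {
            // Normal processing goes here, uncluttered by error checks.
            in = new FileInputStream(filename);
            return "first byte: " + in.read();
        } catch (FileNotFoundException e) {
            // Handle the specific exceptional condition...
            return "file not found";
        } catch (IOException e) {
            // ...and any other I/O problem, in one place.
            return "read failed";
        } finally {
            // Clean-up runs whether or not an exception occurred.
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // Nothing useful can be done if close() fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(firstByteOf("no-such-directory/no-such-file.txt"));
    }
}
```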
One of the most highly touted aspects of Java is that it's a
secure language. This is especially important because
of the distributed nature of Java. Without an assurance of
security, you certainly wouldn't want to download code from
a random site on the Internet and let it run on your
computer. Yet this is exactly what people do with Java
applets every day. Java was designed with security in mind,
and provides several layers of security controls that
protect against malicious code, and allow users to
comfortably run untrusted programs such as applets.
At the lowest level, security goes hand-in-hand with
robustness. As we've already seen, Java programs cannot
forge pointers to memory, or overflow arrays, or read memory
outside of the bounds of an array or string. These features
are one of Java's main defenses against malicious code. By
totally disallowing any direct access to memory, an entire
huge, messy class of security attacks is ruled out.
The second line of defense against malicious code is the
byte-code verification process that the Java interpreter
performs on any untrusted code it loads. These verification
steps ensure that the code is well-formed--that it doesn't
overflow or underflow the stack or contain illegal
byte-codes, for example. If the byte-code verification step
were skipped, inadvertently corrupted or maliciously crafted
byte-codes might be able to take advantage of
implementation weaknesses in a Java interpreter.
Another layer of security protection is commonly referred to
as the "sandbox model": untrusted code is placed in a
"sandbox," where it can play safely, without doing any damage to
the "real world," or full Java environment. When an applet,
or other untrusted code, is running in the sandbox, there
are a number of restrictions on what it can do. The most
obvious of these restrictions is that it has no access
whatsoever to the local file system. There are a number of
other restrictions in the sandbox as well. These
restrictions are enforced by a SecurityManager
class. The model works because all of the core Java classes
that perform sensitive operations, such as filesystem
access, first ask permission of the currently installed
SecurityManager. If the call is being made,
directly or indirectly, by untrusted code, the security
manager throws an exception, and the operation is not
permitted. See Chapter 6, Applets, for a complete
list of the restrictions placed on applets running in the
sandbox.
Finally, in Java 1.1, there is another possible solution to
the problem of security. By attaching a digital signature
to Java code, the origin of that code can be established in
a cryptographically secure and unforgeable way. If you have
specified that you trust a person or organization, then code that
bears the digital signature of that trusted entity is trusted, even
when loaded over the network, and may be run without the restrictions
of the sandbox model.
Of course, security isn't a black-and-white thing. Just as
a program can never be guaranteed to be 100% bug-free, no
language or environment can be guaranteed 100% secure. With
that said, however, Java does seem to offer a practical
level of security for most applications. It anticipates and
defends against most of the techniques that have
historically been used to trick software into misbehaving,
and it has been intensely scrutinized by security experts
and hackers alike. Some security holes were found in early
versions of Java, but these flaws were fixed almost as soon
as they were found, and it seems reasonable to expect
that any future holes will be fixed just as quickly.
Java is an interpreted language, so it is never going to be
as fast as a compiled language like C. Java 1.0 was said to
be about 20 times slower than C. Java 1.1 is nearly twice
as fast as Java 1.0, however, so it might be reasonable to
say that compiled C code runs ten times as fast as
interpreted Java byte-codes. But before you throw up your
arms in disgust, be aware that this speed is more than
adequate to run interactive GUI and network-based
applications, where the application is often idle, waiting
for the user to do something, or waiting for data from the
network. Furthermore, the speed-critical sections of the
Java run-time environment, which do things like string
concatenation and comparison, are implemented with efficient
native code.
As a further performance boost, many Java interpreters now
include "just in time" compilers that can translate Java
byte-codes into machine code for a particular CPU at
run-time. The Java byte-code format was designed with these
"just in time" compilers in mind, so the process of
generating machine code is fairly efficient and it produces
reasonably good code. In fact, Sun claims that the
performance of byte-codes converted to machine code is
nearly as good as native C or C++. If you are willing to
sacrifice code portability to gain speed, you can also write
portions of your program in C or C++ and use Java native
methods to interface with this native code.
When you are considering performance, it's important to
remember where Java falls in the spectrum of available
programming languages. At one end of the spectrum, there
are high-level, fully-interpreted scripting languages such
as Tcl and the UNIX shells. These languages are great for
prototyping and they are highly portable, but they are also
very slow. At the other end of the spectrum, you have
low-level compiled languages like C and C++. These
languages offer high performance, but they suffer in terms
of reliability and portability. Java falls in the middle of
the spectrum. The performance of Java's interpreted
byte-codes is much better than that of the high-level scripting
languages (even Perl), but Java still offers the simplicity
and portability of those languages.
In a GUI-based network application such as a Web browser,
it's easy to imagine multiple things going on at the same
time. A user could be listening to an audio clip while she
is scrolling a page, and in the background the browser is
downloading an image. Java is a multithreaded
language; it provides support for multiple threads of
execution (sometimes called lightweight processes) that can
handle different tasks. An important benefit of
multithreading is that it improves the interactive
performance of graphical applications for the user.
If you have tried working with threads in C or C++,
you know that it can be quite difficult. Java makes
programming with threads much easier, by providing built-in
language support for threads. The java.lang package
provides a Thread class that supports methods to
start and stop threads and set thread priorities, among other
things.
The Java language syntax also supports threads directly with
the synchronized keyword. This keyword makes it
extremely easy to mark sections of code or entire methods
that should only be run by a single thread at a time.
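The Thread class and the synchronized keyword can be
sketched together. Several threads below increment a shared
counter; because increment() is synchronized, only one
thread at a time runs it and no updates are lost. The class
and method names are hypothetical, and the lambda syntax
assumes a current JDK rather than Java 1.1.

```java
public class SynchronizedCounter {
    private int count = 0;

    // Only one thread at a time may run a synchronized method
    // of a given object.
    synchronized void increment() { count++; }

    synchronized int get() { return count; }

    // Start the given number of threads, let each increment the
    // counter perThread times, wait for them, and return the total.
    static int runDemo(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }

        for (Thread t : workers) {
            try {
                t.join();                       // wait for the thread to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10000));  // 40000
    }
}
```

Without the synchronized keyword on increment(), two threads
could read the same old value of count and one of the
updates would be lost.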
While threads are "wizard-level" stuff in C and C++, their
use is commonplace in Java. Because Java makes threads so easy
to use, the Java class libraries require their use in a
number of places. For example, any applet that performs
animation does so with a thread. Similarly, Java does not
support asynchronous, non-blocking I/O with notification
through signals or interrupts--you must
instead create a thread that blocks on every I/O channel you are
interested in.
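The thread-per-channel pattern described here can be
sketched with a pipe standing in for a network connection.
A dedicated reader thread blocks in read() until data
arrives, while the main thread stays free to do other work.
The class and method names are made up for this example,
and the code uses syntax from a current JDK rather than
Java 1.1.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class BlockingReaderDemo {
    // Send a message through a pipe and return what a separate
    // reader thread received from the other end.
    static String readInBackground(String message) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);
            StringBuilder received = new StringBuilder();

            // This thread does nothing but block on its one I/O channel.
            Thread reader = new Thread(() -> {
                try {
                    int b;
                    while ((b = in.read()) != -1) {   // blocks until data or end
                        received.append((char) b);
                    }
                } catch (IOException e) {
                    // Keep whatever arrived before the failure.
                }
            });
            reader.start();

            // Meanwhile, the main thread is free to do other work.
            out.write(message.getBytes(StandardCharsets.US_ASCII));
            out.close();                              // signals end of stream

            reader.join();                            // wait for the reader
            return received.toString();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readInBackground("data from the channel"));
    }
}
```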