The following is copyright 1988 and 1998 Randolph J. Herber. This material was originally written as an outline for a presentation to Amdahl sales staff who were having difficulties selling Amdahl UTS, a UNIX-like operating system, to customers outside the telephony business. AT&T was Amdahl's largest customer for Amdahl UTS and used it for the development of the operating system, called a generic, for the AT&T #5ESS telephony (e)lectronic (s)witching (s)ystem. I have removed some of the more dated references to Amdahl and Amdahl UTS from this version of this document. See also: http://www.unix-systems.org/ A. Legalisms "Justice is the earnest and constant will to render to every man his due. The precepts of the law are these: to live honorably, injure no other man, to render to every man his due.", Justinian I, The Institutes of Justinian. 1. Trademarks a) Trademarks should always be used as adjectives and not as nouns. b) Trademarks may be registered with the government. c) Trademarks that are not defended by the owner and enter common usage lose their trademark status. Some examples are Aspirin and Cellophane. d) UNIX was a registered trademark of AT&T. 1) "UNIX" is used by AT&T to identify a family of operating systems they sell. 2) It is conceivable, and would be legal, for AT&T to decide to sell UNIX telephones or even UNIX shoes. 3) AT&T gave the UNIX trademark and rights to UNIX System Laboratories, which AT&T later sold to Novell. Novell in turn sold the UNIX rights to the Santa Cruz Operation and gave the UNIX trademark to X/Open, which is using it to `brand' its statements of what UNIX is today. e) UTS is a registered trademark of Amdahl Corp and used by Amdahl to identify a family of operating systems they sell. 2. Trade secret a) A means of protecting a process or system by not disclosing how it works and requiring that customers not disclose the same data. b) Trade secret status is lost when a non-licensee becomes aware of the protected data by non-criminal means. 
(Check with your lawyer for exact details.) c) Trade secrets are not registered with the government. d) The AT&T UNIX licenses required a licensee not to allow "unauthorized use or distribution of the code, methods and concepts contained in or derived from the UNIX product." e) The UNIX product was a trade secret of AT&T Corporation. f) Amdahl had a license from AT&T to issue AT&T licenses for binary versions of operating systems that Amdahl developed from the AT&T UNIX operating system. g) Amdahl can license the source code of operating systems Amdahl developed from the AT&T UNIX operating system only if the customer already has an appropriate source license for the version of the UNIX operating system that the Amdahl operating system was developed from. h) The changes that Amdahl made in developing the UTS operating system that do not derive from AT&T UNIX code, methods, and concepts may be trade secrets of Amdahl Corporation. You should treat the UTS operating system's code, methods, and concepts as if they were. 3. Copyrights a) A copyright allows an author to restrict the conditions under which a written work can be copied or distributed for a period of years. b) Copyrights may be registered with the government. c) The source code, object code, and machine code were unpublished copyrighted material of AT&T. You would need to contact SCO for the current status of UNIX source code. The Lions' book on the Release 6 UNIX operating system can now be bought at book stores; obtaining it used to require a source code license because it contains most of the Release 6 kernel's code. d) The Amdahl changes to the source code, object code, and machine code are unpublished copyrighted material of Amdahl Corporation. e) The UNIX operating system manuals are published copyrighted material of AT&T and follow-on corporations. f) The UTS operating system manuals, as derived works, are published copyrighted materials of both AT&T and follow-on corporations and of Amdahl Corporation. B. 
A short list of buzzwords with OS/MVS and VM counterparts when available "The beginning of wisdom is the definition of terms.", Sophocles 1. Kernel -- the operating system control program, e.g. OS/MVS - nucleus or supervisor; VM - CP (which stands for control program). 2. System call -- A system call is a request for services from the operating system control program. The instruction used for this purpose in the UTS operating system is the SVC (supervisor call) instruction. The MONITOR CALL or PROGRAM CALL instructions could have been used. There is a need to interrupt the processor so that the hardware's state and privileges can be changed from those allowed to users to those needed for the proper functioning of the system control program (in the UNIX(registered trademark of X/Open) or UTS operating systems, the kernel). The system calls for the UTS and UNIX operating systems are documented in section 2 of the programmer reference manual. 3. Boot -- the loading of the kernel, e.g. IPL (stands for initial program loading) 4. Process -- the unit of work, e.g. a combination of the OS/MVS task and address space, VM - a virtual machine 5. Fork -- a system call that makes two nearly identical copies of a process. The two copies share the same virtual address space, open files, and most other used system resources. One is called the parent and the other is called the child. The parent process receives the process identification (pid) of the child as a result of the fork system call. The child receives from its copy of the same system call a pid of 0. 6. Exec -- a family of system calls that replace one program executing in a process with another. The system calls differ in the format of the arguments and not in the purpose of the system call. 7. Panic -- a system crash. A panic dump is like an OS/MVS SYS1.DUMP data set, or a VM CP dump. 8. File -- e.g. OS/MVS data set. 9. Driver -- the device dependent code for a particular device class, e.g. 
OS/MVS the EXCP or JES I/O driver 10. File system a) the component of the kernel that manages DASD resources into files, e.g. in OS/MVS DASD space management, catalog management, and data management services. b) a DASD data structure used to manage a tree of files, e.g. in VM, minidisks 11. Slice -- a separately managed DASD area. A slice is also called a minidisk. The closest MVS equivalent is a VSAM dataspace. It is a close parallel to a VM minidisk. 12. Directory -- a system managed file containing the associations between path components and inodes. Each directory entry contains an inode number and a path component. 13. Path -- a file's name. An absolute path starts with a slash and is relative to a process's definition of the system's root directory. (The default and normal value is the root directory of the IPL slice. It can be changed by a privileged system call, chroot.) A relative path does not have an initial slash and is relative to a process's definition of its current working directory. (The current working directory can be changed by an unprivileged system call, chdir.) 14. Path component -- a field of a path delimited by the beginning, a slash, and the end of the path. 15. Inode -- the data area used to manage a file's data. The MVS equivalent is the DSCB (data set control block). Unlike the DSCB, the inode does not contain the file name. A file may have zero or more names; the typical number is one. All of a file's names must be contained in a single slice (frequently called a file system). 16. Daemons -- continuously running programs which monitor and manage system resources such as printers, working sets, etc. See "Spooler". E.g. cron is a daemon that starts programs at selected times, and the tape daemon manages the tape drive pool and works with the operator. 17. Sync -- synchronizing the DASD to the file system image of the DASD by writing buffers to the DASD. 18. 
Orange book -- named for the color of its cover - defines DOD/NCSC security standards 19. Filter -- a program that reads from standard input, does something, and writes to standard output. Designed to work in a pipeline. 20. Pipe -- a kernel facility for process to process communication via I/O system calls 21. Pipeline -- a group of programs connected by pipes 22. Spooler - a program or system of programs that accepts files to be delivered to a system resource that needs to be used serially, e.g., printers or communications equipment 23. Panic - the kernel has detected a fatal error and is terminating. 24. Getty - the terminal line monitoring program. (See system bringup) 25. Stack - stacks are a means of handling recursive functions. Recursive functions can call themselves directly or indirectly by calling some other functions which eventually call the function again. Each invocation of the function requires new storage to store the function's local variables. The behavior of the storage area used for this storage is like a LIFO (last-in-first-out) queue which is commonly called a stack, like a stack of plates in a kitchen cupboard. 26. A.out - binary executable file. The layout of the a.out file is documented in section 4 of the programmer reference manual. 27. asynchronous - no fixed timing relationship between the ends of the communications path. Each unit of communication, a character of 5 to 8 bits, has framing signals around it so that the receiver can recognize it. The communication path has an idling mode which indicates that no signal is being sent. Communications protocols using packets or frames may be built on top of the basic asynchronous communications. 28. synchronous - there is a fixed timing relationship between the ends of the communications path. The ends synchronize at the beginning of communication and may resynchronize on errors during communication. 
Each unit of communication, a packet or frame with headers and trailers, contains zero or more (up to some limit) characters or bits and has no framing signals; because of the synchronization, each end can recognize the beginning of a packet or frame. The communications path has an idling mode of an otherwise illegal sequence of characters or bits that indicates that no signal is being sent. 29. half-duplex - the communications path sends signals in one direction at a time. The path may be capable of communicating in one direction only. Otherwise, there is some agreed signal to allow the other end to become the sender. 30. full-duplex - the communications path is bidirectional and both ends may be sending at the same time. 31. Ethernet - a packetized asynchronous protocol using a coaxial or optical fiber cable with multiple senders and receivers. Each node listens for packets which are addressed to it. When a node wants to send, it waits for the idling signal and then sends. If two or more nodes attempt to send at the same time, they detect this condition by checking their own signals as they come back to the node; if the signals are damaged, the node waits for a short random amount of time and then waits for the idling signal again. 32. TCP/IP - a communications protocol that may be embedded within a physical communications protocol such as Ethernet and that supports methods of communication that are like sending letters, called datagrams, or like telephone calls, called virtual circuits. Datagrams may arrive out of order or be lost. Virtual circuits protect against packets arriving out of order but may lose packets. Higher order protocols may provide either ordering services or replacements for lost or damaged packets. C. What is the UNIX operating system environment? 1. A tool for getting useful work from a computer. "If the poor workman hates his tools, the good workman hates poor tools. 
The work of the workingman is, in a sense, defined (their italics) by his tools -- witness the way in which the tool is so often taken to symbolize the worker: the tri-square for the carpenter, the trowel for the mason, the transit for the surveyor, the camera for the photographer, the hammer for the laborer, and the sickle for the farmer. Working with defective or poorly designed tools, even the finest craftsman is reduced to producing inferior work, and thereby reduced to being an inferior craftsman. No craftsman, if he aspires to the highest work in his profession, will accept such tools; and no employer, if he appreciates the quality of work, will ask a craftsman to accept them." The Psychology of Computer Programming, Part 4: Programming Tools, page 203. 2. A philosophy of data processing "UNIX is a general-purpose, multi-user, interactive operating system for the larger Digital Equipment Corporation PDP-11 and the Interdata 8/32 computers. It offers a number of features seldom found even in larger operating systems, including (i) A hierarchical file system incorporating demountable volumes, (ii) Compatible file, device, and inter-process I/O, (iii) The ability to initiate asynchronous processes, (iv) System command language selectable on a per-user basis, (v) Over 100 subsystems including a dozen languages, (vi) High degree of portability." Ritchie, D.M. and Thompson, K., "The UNIX Time-Sharing System", Bell System Technical Journal, Vol. 57, No. 6, Part 2, July-August 1978. a) Minimalist 1) Minimize the number of functions in a program. 2) Minimize the number of places a function exists. 3) Minimize the dependence on machine characteristics. 4) Minimize the amount of information passed through each interface. 5) Minimize the size of the system by moving function out of the kernel. 6) Minimize the size of the system by using the text model for all practical system control files. b) Tool building. 
1) Write utilities and libraries to handle commonly occurring functions 2) Write utilities to handle the general case. Special cases are frequently not handled by the utilities. 3) Write programs assuming that their output may be used by other programs. 3. A balm a) A solution to portability issues 1) C programming language a> K&R b> ANSI 2) Standard libraries and library functions at the source level 3) Application Binary Interfaces at the binary level 4) Use of C, a higher level language (HLL), in coding the operating system supervisor (kernel) allows fast porting of the entire system. b) A solution to the need for standard computing environments. Since the entire operating system with its utilities can be implemented on a new machine with much less effort than writing a new operating system from scratch, it encourages the implementation of the same environment on many different machines. 4. A battleground a) System V vs. 4.xBSD (Berkeley Software Distribution) System V refers to the UNIX operating system as implemented by AT&T. 2.xBSD, 3.xBSD, and 4.xBSD refer to the UNIX operating system as modified by academia, with the University of California at Berkeley doing most of the modifications and gathering many of the modifications from other universities, colleges and government research facilities. AT&T has been more concerned with efficiency and security. Berkeley has been more concerned with richness of function. b) IEEE P1003 (or POSIX) vs. X/OPEN. American or European standards? As X/OPEN is changing those elements that differ from POSIX to make it into a superset of POSIX, this battle is winding down. 5. An operating system a) What is an operating system? 1) An operating system manages computing system resources. 2) An operating system controls access to computing and operating system resources. 3) An operating system provides enhanced and additional services beyond those of the bare hardware. b) The UNIX operating system is an operating system. 
1) The UNIX operating system manages the CPU(s), memory, and I/O of the computing system. 2) The UNIX operating system provides means to limit access to: a) the CPU(s) -- some instructions are restricted to kernel use only, e.g. the I/O and the memory management instructions, b) memory -- the number of virtual memory pages and physical page frames backing the virtual storage for each process is managed by the kernel, c) I/O -- all I/O is done through the kernel. 3) The UNIX operating system has about seventy system calls defined and implemented in the kernel for data, memory, device, program, interprocess communication and system management. 6. Security orientation. a) From near its beginning, UNIX has had access controls available. b) UNIX, without additional security facilities added, can be qualified for C1. IBM MVS, also without additional security facilities, has been rated D. With ACF2 or Top Secret, IBM MVS is rated C2. AT&T is working on a UNIX/MLS (for multi-level secure) operating system which AT&T expects to qualify at a B2 or B3 level. Amdahl agreements with AT&T will allow a corresponding UTS/MLS. c) The UNIX operating system, because of its simplicity, would be the easiest of the major operating systems to rewrite to qualify for the higher security ratings. 7. Humor filled. From the UNIX operating system's beginning with playing games, it has been humor filled. Jokes abound in the manuals, the sources and the executables of the system. D. Overview of the past and future of the UNIX operating system "The economic interpretation of history does not necessarily mean that all events are determined solely by economic forces. It simply means that economic facts are the recurring decisive forces, the chief points in the process of history.", Eduard Bernstein, Evolutionary Socialism 1. A brief history - General a) Started in 1969 at Bell Labs as a platform on which to play a video game, Space Travel. 
b) Continuation as a platform for text and word processing in Bell Labs. c) In 1973, recoded in the C language. d) Development into a platform for hardware and software development e) Escape into academia, non-commercial due to the 1956 consent decree. 1) cheap - for near duplication cost 2) easily understood a> 400 pages of documentation b> 10,000 lines of code in kernel c> 50,000 lines of code in applications 3) unsupported - you got a "tape of bits" and no warranties 4) easily ported basis - use whatever hardware you could get These porting efforts were usually done at universities and other customer sites. Major computer vendor support for the UNIX operating system started in the last 5 to 8 years. 5) good platform for computer science education 6) engineers use what the computer science department has available 7) Version 5 was the first version (or edition) released to universities. 8) Version 6 was the first licensed and generally available version 9) AT&T and IBM started, in 1979, a joint development project: UNIX/370, which was the UNIX operating system for IBM System 370 running under the TSS/370 Resident Supervisor. TSS/370 is a sibling of VM/370. A variant of TSS/370 called SSS was later used. f) Graduates, as workers, want a familiar environment in industry 1) Version 7 was the first commercially licensed version 2) Version 7 was the basis of the early microprocessor ports 3) System III and System V were further developments of Version 7 4) System V Release 2, announced in 1983, was the first version of the UNIX operating system supported by AT&T directly g) In 1984, AT&T is allowed to enter the data processing industry 1) AT&T's initial offering was System V Release 2 running on AT&T 3B's. 
2) In 1987, System V Release 3 was announced a> memory management enhancements b> improved networking c> Network File System (NFS(registered trademark of Sun)) d> Remote File Sharing (RFS(registered trademark of AT&T)) 3) In 1988, System V Release 4 was announced a> improved internal security b> improved networking security c> "details to follow" approach led to the OSF and UI organizations 2. Growth of UNIX popularity a) 1 machine with 2 users in 1969 b) 10 machines in 1972 c) 300 machines in 1978 d) 70,000 machines in 1983 e) 40,000 DEC VAX and 60,000 other machines in 1986 f) over a million in 1988 g) including Linux, the best estimate is about 100,000,000. 3. Uses of the UNIX operating system a) Office Automation b) Engineering c) Scientific d) Communication e) Embedded controllers f) Software development g) File server 1) to Supercomputers 2) to LAN-based minicomputers and microcomputers 4. Full spectrum of computer users The number of computer system architectures with available versions of the UNIX operating system is several hundred. a) embedded controllers b) palmtop computers c) laptop computers d) desktop computers, e.g. most PC-compatibles e) minicomputers, e.g. AT&T 3B family and DEC VAX's f) mainframes, e.g. Amdahl-compatibles and UNISYS g) supercomputers, e.g. Cray (now part of SGI) 5. Users of the UNIX development a) Communications industry -- that's where it started b) Education c) Government -- trying to save a dollar d) Engineering e) Scientific f) Graphics g) Entertainment 6. Applications found on UNIX operating systems. a) Communications b) Text processing and publishing c) Accounting and Finance (including spreadsheets) d) AI, CAD, CAE, CAM, CASE e) Databases and Database management systems f) Graphics and animation g) Electronic mail and messaging systems h) Modeling i) On-line transaction processing (OLTP) j) Over a thousand third-party software packages are available. 
Most, if not already ported, can be ported very easily to Amdahl's UTS operating system. Amdahl Marketing has a catalog listing several packages that have been ported already to the UTS operating system. 7. Future directions -- System V Release 4.0 a) Redefinition of directory structure Support for Berkeley style directories with 255 character file name components. This eases both portability and some office automation problems. This includes symbolic links -- files which act as aliases for other files. b) Device Driver Interface The interface of device drivers to the remainder of the kernel will be as well defined as the user interface via system calls. This allows improved portability of device drivers between versions of the UNIX operating system. This allows device manufacturers to write drivers for their devices that can be used in any System V Release 4. c) Virtual File System VFS supports an abstract model of what a file system is. If a file system can be mapped to this abstract model, file system drivers can be written to run under VFS to support that file system. This could make it possible for System V Release 4 to use MSDOS floppies and hard disks, IBM VM minidisks, IBM OS DASD, etc. directly. It will also allow system resources and managed entities to be treated as files or file systems. For example, the processes on the system could appear as files in a process directory. d) Graphic User Interface This defines both a standard interface for users and a standard interface for application programs to describe output. e) Sockets and Streams Both models of generalized I/O may be present in the system. f) Virtual Memory Manager changes. Supports a generalized model of virtual address space regions. This allows simpler and more consistent code for kernel services, such as program (executable image) management, interprocess communication, and kernel and process communications. h) Access Control Lists Allows a process to be a member of multiple groups at the same time. 
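The multiple-group membership described in item h) can be observed from inside a process. The following is a minimal sketch in Python, assuming a POSIX system where the getgid, getegid and getgroups calls are available; the function name group_membership is hypothetical and chosen only for illustration.

```python
import os


def group_membership() -> dict:
    """Collect the group identities of the calling process.

    Assumption: a POSIX system. Whether the effective gid also appears
    in the supplementary list is implementation-defined.
    """
    return {
        "real_gid": os.getgid(),            # group the process started with
        "effective_gid": os.getegid(),      # group used for access checks
        "supplementary_gids": sorted(os.getgroups()),  # the additional groups
    }


if __name__ == "__main__":
    for name, value in group_membership().items():
        print(f"{name}: {value}")
```

A file whose group matches any one of the listed groups is accessible under that file's group permission bits, which is what membership in multiple groups at the same time amounts to in practice.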
i) Application Binary Interfaces, per processor type Complete specification of the interface between application code and the operating system. This includes the structure of executables, libraries, system files, memory usage and availability, function calling protocols, etc. These interfaces allow an application to be distributed as object code for the given processor regardless of the specific implementation of the operating system. This would allow a "shrink wrap" software market to develop for given processor types. j) Reduction of portability problems in the kernel. 1) Eases the porting of the operating system to new machines 2) Reduces the differences between implementations k) Dynamic Linking and Loading Code from libraries is linked to as needed during execution. Pages of executables are brought into real memory during execution as they are used by the paging supervisor. (Shades of IBM TSS.) l) National Language Support Support for messages in various languages selectable by each user. Formatting of time, dates, money, numbers according to the custom of the country selected by the user. Handling of character set differences including collating sequences. m) Expanded Fundamental Types The field sizes of fundamental kernel data types are being enlarged. Some examples: 1) Device numbers from 16 to 32 bits. This allows more than 64 different drivers of each type, block and character special, and more than 1024 devices per driver. 2) File offsets and sizes from 32 to 64 bits. This increases the maximum file size from 2 GB to 8 EB (9.2E18). 3) Timestamps from 32 to 64 bits. This allows both extended range and finer resolution. 8. Future directions -- other a) Symmetric multiprocessing b) Massively symmetric multiprocessing c) Virtual DASD 1) allows disk striping 2) allows mixed device types in one file system 3) allows file systems larger than a single device d) Wide area distributed processing. E. UNIX architecture 1. 
I/O model a) Data structure - a stream of bytes, a byte array if indexable. b) File name space 1) Hierarchical, without loops 2) Case sensitive - Upper and lower case are different 3) File name components may have any 8-bit character in them but "/" and null (0x00). Note, some characters are hard to type and others have special meanings to shell programs and must therefore be escaped. It is suggested that only alphanumerics, dash, underscore and period be used in file name components. 4) A list of zero or more components separated by slashes 5) Optionally started with a slash a> With a slash, relative to the process's current root directory b> Without a slash, relative to the process's current working directory The terms process and directory are defined later. c) File system characteristics 1) All file types are treated similarly by the same system calls. 2) Control of access to data is built into the file system. 3) Space management is fine grained and as-needed. d) Types of files 1) Regular files - contain user or system data 2) Directories - system managed files, each containing a node of the tree-structured file catalog. A node is indexed by a component of a file name or path. 3) Block special device files -- frequently used to contain file systems 4) Character special device files -- raw I/O devices 5) Pipes 2. Work management model a) The unit of work in the system is called a process. b) New processes are started by process forking. Another copy of the process is made by the kernel. 1) Most characteristics are shared. Some examples: a> Virtual memory b> Open files c> File locks d> Signal handling e> User id (uid) and group id (gid) f> Environment - a table of named strings For example: 1> PATH - a list of directories to search for executables 2> HOME - what directory is this user's home directory 2) A few characteristics are not shared. 
Some examples: a> Process identification number (pid) b> Return value from the fork system call 1> Parent receives the pid of the child 2> Child (the new process) receives 0 3) Made more efficient by using shared virtual memory a> copy-on-touch - give a process its own copy of a page if the process reads or writes to a shared page. b> copy-on-write would be more efficient yet. Present 370 architecture processors, including Amdahl's, do not have a method of memory protection or management that would allow copy-on-write. A change to the architecture implementation that is backward compatible with the "Principles of Operation" could be made that would allow copy-on-write. c) Program loading 1) It is a separate process from process creation. Programs can chain by loading new programs on top of themselves. 2) Normally, children do this shortly after being created. 3) Program load is caused by the exec system call. 4) Most process characteristics are passed on to the new program. Some characteristics that are not passed are: a> The virtual memory - of course. b> Files marked for "close on exec" and their locks. c> Signal handling by functions 3. System components a) User level 1) System supplied tools and libraries 2) Application programs b) Kernel level 1) System call routines 2) File system management 3) Device drivers 4) I/O system supervisor 5) Interrupt level c) Device file system 1) Block 0 does not exist - value indicates no block assigned 2) Block 1 is reserved - historically used for system bootstrapping 3) Block 2 is the superblock - descriptor of file system The closest MVS equivalent is the Format 4 DSCB. a> Size of file system b> Size of ilist, the array of inodes c> Block number of first block of free list d> Number of free inodes e> Number of free blocks f> File system state - codes indicating mounted or not mounted 4) Ilist and inodes - descriptors of files within the file system, blocks 3 to n. The ilist is the array of inodes. 
The closest MVS equivalent is the format 1 DSCBs in the VTOC (volume table of contents). a> Inode 0 does not exist - value indicates no inode assigned b> Inode 1 is reserved - historically used for bad block table c> Inode 2 describes the file system root directory. This inode is its own parent. d> For each device, file, or directory: 1> User id number (uid) 2> Group id number (gid) 3> Size of file in bytes (offset of last byte plus 1) 4> Number of links (names) the file has 5> Time of last read to file data 6> Time of last write to file data 7> Time of last change to file data or inode 8> Pointers to data blocks or index blocks 5) The data area, which contains data blocks, index blocks, and the free list chain of available blocks. The closest MVS equivalent to the index blocks is the extent information in the format 1 DSCB -- the unit of allocation is the block and not the track. The closest MVS equivalent to the free list is the format 5 DSCBs. 6) File names are scattered among the directories and are not in the inode d) Memory maps. 1) The kernel's view of system memory The kernel's memory map, from low address to high address, is: the kernel's code, the kernel's statically allocated initialized work areas, the kernel's statically allocated uninitialized work areas, and then the remainder of the system's storage, which is dynamically allocated as needed to either the kernel's or various processes' use. The pages from this area are used to "back" the virtual memory of a process, to manage resources on behalf of a process, or for system-wide usage -- such as the system buffer cache used by file system management. The kernel stacks are allocated in the same page frames used to store the user areas which are used to manage system resources assigned to the associated process. The kernel stack is built in the page that holds the user area of the process that was interrupted to give the kernel control. 2) A process's view of its virtual storage. 
A process's memory map, from low address to high address, is: an optional segment (on non-XA systems, a 64K block on a 64K boundary and on XA systems, a 1M block on a 1M boundary) which is invalid and cannot be addressed without causing a program fault, zero or more segments for the process's code, one or more segments for the statically allocated initialized work areas, zero or more segments for the statically allocated uninitialized work areas, zero or more segments of virtual address space requested by system call (sbrk, the closest MVS equivalent is GETMAIN), zero or more discontinuous segments for interprocess communication (obtained and managed by shared memory system calls), and at the high end of virtual storage, the process's stack which is limited to 1M. The text (code), data (initialized work areas), and bss (uninitialized work areas) areas are specified by the a.out. The presence of an initial invalid segment, whether or not the text, data, and bss share segments, and whether or not the text segments can be shared with other processes are coded in the a.out header, which is not loaded into the process virtual address space. The a.out may also contain a symbol table for debugging purposes. 4. Historical assumptions a) Small computer memories. 1) Small programs with a few functions 2) Short or non-existent error messages b) Slow terminal I/O. 1) Short program names 2) Short or non-existent error messages c) Small Disks 1) Fine grained allocation of disk space. 2) As-needed allocation of disk space. d) Uniprocessor system F. Access to UNIX systems 1. UNIX operating systems do not make any distinction between batch and interactive work. All work is effectively interactive. The system process scheduler may permit different treatment of classes of processes to support such concepts as real-time, interactive, batch and allocated shares of a system. 2. By convention, users may be logged in to a system multiple times. 3. 
      The system views a login as a group of processes.  One process,
      usually a shell program, is the group leader and the ancestor of
      the surviving processes.
   4. Common methods users employ to communicate with a UNIX machine.
      a) The most common method of communicating with a UNIX machine is
         by using an asynchronous terminal or a device which emulates
         such a terminal, e.g. a DEC VT100.
      b) Over a packet network, using either asynchronous terminals or
         file transfers, e.g. Ethernet, TCP/IP, SDLC, DECNET, SNA, RJE,
         & NJE.
      c) A few UNIX systems support synchronous terminals, e.g. 3270s.
         These terminals are generally considered unfriendly by users.
         Most generally available UNIX applications assume an
         asynchronous terminal is being used.  The UNIX operating system
         has several general purpose libraries for device independence
         among asynchronous terminals.  Amdahl's UTS operating system is
         one of the few that support synchronous terminals.  It does so
         because of its initial development under VM and because some
         customers do not want to waste their investment in these
         terminals.
      d) RFS and NFS are methods, using communication services, to allow
         data sharing between cooperative computer systems.  To
         application programs, the appearance is created that the data
         is on the system that the program is running on.  The details
         of data sharing differ between the two methods.  In particular,
         they differ in the degree to which the fact that communication
         is going on is apparent.
      e) 3270 terminals use synchronous communications.  Both
         bisynchronous and SNA are examples of synchronous
         communications.  The use of a PA, PF, clear or enter key causes
         a 3270 terminal to send a packet to the host.  The host sends
         packets that describe the desired changes to the screen or
         terminal status.
      f) The typical UNIX or UTS terminal uses a full-duplex
         asynchronous communications method where each character may
         require a separate response from the operating system or
         application.
      g) 3270 terminals make more effective use of the computer's I/O
         capacity on a computer with an Amdahl-like I/O structure.  The
         penalty on memory-mapped I/O machines like DEC VAXes or DEC
         PDPs is much smaller.  Asynchronous terminals are also
         typically much cheaper than synchronous terminals.
G. Getting the UNIX operating system started and getting started with
   the UNIX operating system.
   1. System generation
      a) A system description is read and system tables are generated.
      b) The system tables are compiled.
      c) The system table object codes are linked with existing
         libraries.
      d) The a.out (executable) of the kernel is written to the root
         directory.
   2. System bring up
      a) Hardware IPL (in UNIX terminology - booting, short for
         bootstrapping)
      b) Software bootstrap loader finds and loads the kernel a.out
      c) Invocation of the kernel's main routine
      d) The "hand-crafted" process 0, which becomes the dispatcher
      e) The first child, process 1, init, becomes the parent of nations
         1) controlled by /etc/inittab
         2) runs /etc/rc to initialize the system
         3) respawns (forks to recreate) system processes when they die
            a] daemons - e.g. lpsched, the print spooler
            b] getty - terminal line monitoring program
               1] waits for line open
               2] requests login name
               3] loads login
         4) on system operating level changes, kills or creates
            processes as needed.
   3. Login
      a) processes login name
      b) requests passwords as needed
      c) sets up initial environment, uid, gid, process root directory.
      d) loads designated program, usually a shell
   4. Shell programs.
      a) Shell programs run as application programs.
      b) Shell programs do not require any special privileges.
      c) Generally, a user has a choice of several shell programs.
      d) A user can write and use one of his own.
      e) A shell program reads and processes command lines until
         end-of-file and then terminates.
   5.
      A selection of useful commands from Sections 1 and 1M
         cd - change the current working directory
         pwd - print the current working directory
         mkdir - make a new directory
         rmdir - remove an empty directory
         ls - list the characteristics of one or more files by path name
         cat - copy data to standard output
         echo - generate and send to stdout a text line specified by the
            command's arguments
         cp - copy a file or a list of files to a directory
         mv - move a file to a new name (rename); the file is copied if
            the new name is in a different file system
         ln - link another name to a file; all names must be in the same
            file system
         rm - delete (or erase) one or more files
         mkfs - make a file system structure in a minidisk
         fsck - validate and, if desired, attempt to repair a file
            system structure
         grep - search for a regular expression in one or more files
         find - search for files with specified characteristics and
            apply specified commands to them
         xargs - reads filenames from standard input and applies a
            command, optionally with other arguments, to groups of these
            files
         diff - compare two text files and report the differences
         cmp - compare two arbitrary files and report if different
         more - copy text data to standard output, pausing after each
            screenful
         ed - a line-by-line editor for any terminal, found on almost
            all UNIX operating systems.  If you can learn only one
            editor, learn this editor.
         vi - a full screen editor for asynchronous terminals
         emacs - another full screen editor for asynchronous terminals
         sed - a stream editor, makes line-by-line changes while reading
            stdin and writing stdout
         awk - a stream editor and report writing language
         tr - a character set translation filter
         cut - extracts a set of columns delimited by position or
            characters
         paste - concatenates a list of text files columnwise
         ned - a UTS editor for 3270 terminals
         cc - the compiler for the C language
         sh, ksh, csh - some common shell programs
         sort - sort text data
         uniq - compares consecutive lines and, e.g.,
            removes duplicates
         stty - set various terminal and communications characteristics
         mail, mailx - electronic mail
         uucp - the "standard" UNIX-to-UNIX copy program - file transfer
         stty - inform kernel of terminal characteristics
         passwd - changes your password
         mesg - controls whether other users (except super users) can
            write to your terminal using the write command
         write - a user-to-user communications tool, frequently disabled
            for security reasons.  If so, use mail, mailx, or a similar
            program.
         umask - controls which access permissions may not be enabled
            when a file is created.  The access permissions can be
            changed later using the chmod command.
         cpio - a file and directory archiving program
         tar - a file and directory archiving program
         nice - change the priority base of a process; ordinary users
            can only lower the priority, super users can raise or lower
            it.  The process is specified by the command line given as
            nice's arguments.
         ps - reports on the status of processes in the system.
         kill - send signals to processes.  May be used to terminate
            processes that are not waiting on critical system resources.
         sync - a program to issue the sync(2) system call to cause the
            disks to be sync'ed.  There is a custom to do this three
            times; the reason is to slow the operator down and allow the
            writes to complete.
   6. Logout
      a) Give an end-of-data signal to the shell
      b) Give an "exit" command
      c) Set the terminal data rate to zero, i.e. stty 0
   7. System shutdown
      As user root, issue the /etc/shutdown command.  This command stops
      the daemons, flushes the file system buffers to the DASD, and
      tells init (process 1) to go to state s (single user).  When the
      single user state is reached, resync the system (reflush the file
      buffers) and stop the system.
H. Piping and redirection
   1. Standard files
      a) stdin
         1) file descriptor (fd) 0
         2) normally attached to your terminal
         3) frequently attached to a file for input to a program.
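         To illustrate item 3) above, here is a minimal Python sketch
         (not part of the original outline) in which a parent process
         attaches a child's fd 0 to a file instead of a terminal.  The
         function names and the upper-casing child program are
         hypothetical and chosen only for illustration; the sketch
         assumes a UNIX-like system with a Python 3 interpreter.

```python
import os
import subprocess
import sys
import tempfile


def run_with_stdin_from_file(path):
    """Run a child process whose fd 0 (stdin) is attached to `path`."""
    # Hypothetical child program: copy stdin to stdout in upper case.
    child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    with open(path, "rb") as source:
        result = subprocess.run(
            [sys.executable, "-c", child_code],
            stdin=source,            # the child's fd 0 now refers to the file
            stdout=subprocess.PIPE,  # capture the child's fd 1 for inspection
            check=True,
        )
    return result.stdout.decode()


def demo_stdin_from_file():
    """Write a small file, feed it to a child as stdin, return the output."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("hello from a file\n")
        name = tmp.name
    try:
        return run_with_stdin_from_file(name)
    finally:
        os.unlink(name)
```

         The child never learns whether fd 0 is a terminal or a file; it
         simply reads until end-of-file, which is why the same program
         works in both situations.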
      b) stdout
         1) fd 1
         2) normally attached to your terminal
         3) frequently attached to a file for output from a program
      c) stderr
         1) fd 2
         2) normally attached to your terminal
         3) seldom attached to a file for error report output from a
            program
   2. I/O redirection
      a) Input, redefines fd 0 by default, other fd's may be specified
         1) File, signaled by <filename
      b) Output, redefines fd 1 by default, other fd's may be specified
         1) File, signaled by >filename
         2) Append to a file, signaled by >>filename
         3) To an existing fd, "2>&1" redirects stderr to stdout
      c) Pipe, "|", defines fd 1 of the left side and fd 0 of the right
         side
I. Shell language (briefly described here - subject for another class)
   1. Read the sh(1), ksh(1) and csh(1) man pages for more information.
   2. Characters with special meaning to the shell (not all explained
      here):
         ' " ` \ space tab ^ ; & ( ) { } [ ] | < > ? $
   2. Each command line is split into "words" at the separator
      characters.  The environment variable IFS specifies the separator
      characters.  The default separator characters are blank, tab, and
      new line.  Command lines are terminated by new lines.  Comments
      start with an unescaped number sign (#) and end at a new line.
   3. Each word is expanded into a list of words by processing the shell
      metacharacters and listing all filenames which match the word.
      a) Except for the metacharacters, each character represents
         itself.
      b) Preceding a character that has a special meaning with a reverse
         solidus (\) removes its special meaning and is called escaping.
      c) A question mark (?) matches any arbitrary character.
      d) An asterisk (*) matches zero or more arbitrary characters.
      e) A list of characters within square brackets matches any one
         character specified by the list of characters within the square
         brackets.  E.g. [A-Z] matches any upper case letter, [^cat]
         matches any character except "c", "a", or "t".
      f) A dollar sign ($) causes the value of the environment variable
         named by the following word to be substituted for the dollar
         sign and the word.
   4.
      The first word of the resulting command is considered the name of
      the program or command to be executed.  If it does not contain a
      slash (/), all directories named by the PATH environment variable
      are searched for the command; otherwise the word is used as the
      filename of the desired executable.
   5. The common shell languages have syntax or statements to:
      a) Group commands together to be run in a sub-shell.
      b) Loop over a command list depending on the list of positional
         parameters to the shell file, a list of values, or the success
         or failure of commands.
      c) Select a command list depending on string matching similar to
         file name matching.
      d) Provide an if statement with then and else clauses.
      e) Trap signals from the user or other processes.
J. Common shell environment variables.
   1. HOME - name of your home directory, set from the /etc/passwd entry
   2. PATH - list of directories to be searched for executables
   3. CDPATH - list of directories to be searched for targets of the
      change directory command, cd.
   4. PS1 - shell prompt string when waiting for a new command
   5. PS2 - shell prompt string when waiting for a command continuation
      line
   6. TERM - the type of terminal you are using.  Used as a key to the
      terminal description databases, terminfo and termcap, used by most
      full-screen programs for asynchronous terminals.
   7. DISPLAY - where X Window system client programs should direct
      their requests for display services.
K. UNIX file system security
   1. Categories of users in the system
      a) root, uid 0, the 800-pound gorilla
      b) system group privileged users, in or allowed to enter groups
         which allow access to system files
      c) regular - the unprivileged users
   2. Types of file access permissions
      a) for directories
         1) read - ability to read the directory as a regular file -
            needed to do ls
         2) write - ability to request system modifications to the
            directory, e.g. file creation or deletion within the
            directory
         3) access - ability to resolve file names using components
            within the directory.
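         The three directory permissions above can be inspected from a
         program.  The following Python sketch is an illustration only
         and is not part of the original outline; it assumes a UNIX-like
         system, and the function names are hypothetical.  It reads the
         owner's read, write, and access bits of a directory; for a
         directory, the bit that means "execute" on a regular file is
         the access (search) bit.

```python
import os
import stat
import tempfile


def directory_permissions(path):
    """Report the owner's read, write, and access bits for a directory."""
    mode = os.stat(path).st_mode
    return {
        "is_directory": stat.S_ISDIR(mode),
        "owner_read": bool(mode & stat.S_IRUSR),    # list names (needed by ls)
        "owner_write": bool(mode & stat.S_IWUSR),   # create or delete entries
        "owner_access": bool(mode & stat.S_IXUSR),  # resolve names through it
    }


def demo_directory_bits():
    """Create a directory with mode 0500 and report its owner bits."""
    with tempfile.TemporaryDirectory() as parent:
        target = os.path.join(parent, "example")
        os.mkdir(target)
        os.chmod(target, 0o500)  # owner: read and access, but no write
        report = directory_permissions(target)
        os.chmod(target, 0o700)  # restore write permission before cleanup
        return report
```

         With mode 0500 the owner can list the directory and resolve
         names through it, but cannot create or delete files in it.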
      b) for non-directories
         1) read - ability to read file data
         2) write - ability to modify file data
            Note: file creation and deletion are handled at the
            directory level
         3) execute - ability to execute the file as a program
            a> if text file - system invokes shell program with file as
               input
            b> if executable binary - system loads and executes binary
               image
         4) set-uid
            a> with execute permission - program runs with the file
               owner's uid as its effective uid
            b> without execute permission - reserved combination
         5) set-gid
            a> with execute permission - program runs with the file's
               group id (gid) as its effective gid
            b> without execute permission - mandatory file locking for
               this file
         6) sticky bit
            a> for directories - restricts deletion of files to the
               owner of the directory, the owner of the file, or
               super-users.
            b> for executables - if used, the operating system should
               attempt to retain the program in main memory even if
               there are no active users
            c> other file types - no special meaning, treat as reserved.
   3. Levels of permission
      a) Owner's permissions - read, write, and access or execute
      b) Group's permissions - read, write, and access or execute
      c) Other's permissions - read, write, and access or execute
L. UNIX system administration
   1. Administrative data
      a) Directories
         1) /bin - directory for frequently used or system critical
            executables
         2) /dev - directory for device special files
         3) /dump - directory where dumps are kept.  The file /dump/dump
            is where the kernel writes a panic dump.  This file must be
            preallocated to the same size as main memory.  Preallocation
            is required so that no allocation is needed during dumping.
            After the system panics, ensure that a matching copy of
            /unix is saved when you rename (mv) the /dump/dump file and
            create a new /dump/dump.
         4) /etc - directory for both system critical files and
            executables
         5) /tmp - directory for user work files
         6) /usr/adm - contains files and directories for logging and
            accounting data
         7) /usr/bin - secondary directory for executables
         8) /usr/include - files and directories describing system data
            areas and services in compilable form
         9) /usr/lib - files and subdirectories of system data to
            support functions such as compiling and linking
         10) /usr/mail - contains files of users' pending electronic
             mail
         11) /usr/man - contains files and directories of manual pages
         12) /usr/spool - contains directories used by various system
             spoolers, e.g. lp - printers, uucp - communications,
             news - Usenet distributed bulletin board
         13) /usr/src - contains system source files
         14) /usr/tmp - directory for system utilities' work files
         15) /lost+found - one per file system, in its root directory;
             the directory where fsck puts files which have lost their
             names, i.e. the link count in the inode is non-zero but no
             directory points to the inode.
      b) Files
         1) /dump/dump - see /dump directory
         2) /unix - executable of the kernel
         3) /dev/null - when read, returns end of file; when written,
            throws the data away and returns normal completion
         4) /etc/devicelist - contains a description of the I/O
            configuration, desired features, and sizes of system tables.
         5) /etc/gettydefs - contains a description of each method of
            setting up a terminal.  Each entry has a name and is a
            single line in the file.  The program getty refers to the
            entries in this file by name.
         6) /etc/group - names groups, supplies password, lists members
         7) /etc/inittab - control file for process 1 - init
         8) /etc/motd - broadcast message for users as they log on
         9) /etc/passwd - names users, supplies password, uid, gid, full
            name, home directory, desired shell program
         10) /etc/rc - shell file to start up system functions for
             running multi-user
   2.
      Checking and repairing file systems
      a) /etc/fsck - checks file system sanity and can make common
         repairs automatically
      b) /etc/fsdb - general purpose display and alter program for file
         systems
   3. Backing up and restoring file systems using the method supplied
      with the UTS operating system.
      a) It uses a version of the INGRES database that does not require
         a separately signed license.
      b) The major programs of the backup and restore facility.
         1) backup - does incremental and full file system backups and
            updates the database
         2) recover - accepts requests for later restoration (spooler)
         3) restore - restores requested files
         4) frest - used to restore an entire file system
   4. Mounting and dismounting file systems
      a) mount - adds a file system on a block special device to the
         file tree on top of an existing directory, which is usually
         empty.
      b) umount - removes a file system on a block special file and
         exposes the underlying directory.
   5. System bring up and shutdown
      Discussed in "Getting the UNIX operating system started and
      getting started with the UNIX operating system"
   6. System generation.
      a) config - a program which reads /etc/devicelist and generates
         the necessary sources and other files
      b) sysgen - a shell file which runs config and uses make to
         compile and link the necessary parts of the operating system,
         /unix, the new /dev directory, etc.
      c) newsys - a shell file which makes the final system changes that
         put a generated system into effect at the next system boot
         (reIPL).
   7. The systems administrator is responsible for managing the various
      system spoolers and monitoring their log files for evidence of
      problems.
   8. The systems administrator is responsible for monitoring system
      performance.  Depending on site policies, the systems
      administrator is also responsible for taking action to relieve
      performance problems.
M. Structure of documentation
   1.
      Sections of the manuals
      a) Section 1 - Application programs
      b) Section 1M - Administrative programs (sometimes Section 8)
      c) Section 2 - System calls
      d) Section 3 - Library calls
      e) Section 4 - System file structures
      f) Section 5 - Miscellaneous notes, e.g. ASCII code table
      g) Section 6 - Games
      h) Section 7 - System I/O device control data structures
   2. Sections of a manual page (which may go on for pages)
      a) NAME - the name of the service
      b) SYNOPSIS - the syntax of invocation
         1) Boldface indicates literals - code as is
         2) Italic indicates variables - substitute your values
         3) [ optional material ] - use as needed
         4) ... - etc.
         5) arguments starting with +, -, or = are frequently taken as
            command options and not file names; use ./file if necessary
      c) DESCRIPTION - the semantics concisely described
      d) FILES - system files and libraries used, omissions may occur
      e) DIAGNOSTICS - explanations of non-self-explanatory error
         messages
      f) BUGS - bugs or surprising limitations, work-arounds sometimes
         given
      g) CAVEATS - other limitations or warnings
      h) SEE ALSO - other manual pages of interest
   3. The source code
      a) Header or include files in /usr/include, suffixed with .h
         These contain conditional compilation macros and compilable
         descriptions of system data structures.
      b) User level programs
         1) Vendor specific in e.g. /usr/src/amdahl/cmd
            The "amdahl" may be replaced by other names.  Most
            administrative programs' source can be found here.
         2) UNIX utilities generally are in /usr/src/cmd.
      c) Kernel and file system level in e.g. /usr/src/uts/uts.
         (The second uts component is frequently the operating system
         name.)
         1) subdirectory io - I/O components of the kernel
         2) subdirectory os - system call and file system kernel
            components
         3) subdirectory ml - interrupt level and machine language
         4) subdirectory cf - system generation
         5) subdirectory sys - header and include files for the kernel
            Normally /usr/include/sys and this subdirectory have the
            same contents
         6) other subdirectories as needed - such as net for networking
   4. Find yourself a wizard or guru
      The wizard may be someone at your site, in your company, a
      computer vendor person, or someone who responds to your question
      in the comp.unix.questions or comp.unix.wizards newsgroups.
N. Bibliography (this list is dated, but still somewhat useful)
   The asterisked items are the should-get items.
   1. Aho, Alfred V., Kernighan, Brian W. & Weinberger, Peter J., The AWK
      Programming Language, Addison-Wesley, 1988.
      -- a generally useful data reformatting and reporting language
   2. AT&T Bell Laboratories, Technical Journal, October 1984, Vol. 63,
      No. 8, Part 2, entitled "The UNIX System".
      -- the "official" UNIX history, part 2
  *3. Bach, Maurice J., The Design of the UNIX Operating System,
      Prentice-Hall, 1986.
      -- a structures and flow course for System V
   4. Bell System, Technical Journal, July-August 1978, Vol. 57, No. 6,
      Part 2, entitled "UNIX Time-sharing system".
      -- the "official" UNIX history, part 1
   5. Bolsky, Morris I. & Korn, David G., The KornShell: Command and
      Programming Language, Prentice-Hall, 1989.
      -- a detailed description of the facilities of the Korn shell
   6. Bourne, S. R., The UNIX System, Addison-Wesley, 1983.
      -- a detailed description, somewhat dated
   7. Department of Defense, Department of Defense Trusted Computer
      System Evaluation Criteria, DOD, December 1985
      -- DOD 5200.28-STD, also known as the Orange Book
   8. Fiedler, David & Hunter, Bruce H., UNIX System Administration,
      Hayden, 1986.
      -- this book explains the whys of system administration that the
      -- UTS manuals do not effectively cover.
      -- The UTS manuals do an effective job otherwise.
   9. Goodheart, Berny & Cox, James, The Magic Garden Explained,
      Prentice-Hall, 1994, ISBN 0-13-098138-9
      -- A discussion of System V Release 4 internals.
   10. Johnson, Michael K. & Troan, Erik W., Linux Application
       Development, Addison-Wesley, 1998, ISBN 0-201-30821-5
   11. Kernighan, Brian W. & Pike, Rob, The UNIX Programming Environment,
       Prentice-Hall, 1984.
       -- how to write an effective program in the UNIX programming
       -- environment
  *12. Kernighan, Brian W. & Ritchie, Dennis M., The C Programming
       Language, Prentice-Hall, 1978.
       -- available in two editions, which are considered to be the
       -- language definition manuals for the pre-ANSI and ANSI versions
       -- of the language.  At present, the version for the UTS
       -- operating system is the first.
   13. Lapin, J. E., Portable C and UNIX System Programming,
       Prentice-Hall, 1987.
       -- describes how to write programs and system components to make
       -- them easier to port from one version of the UNIX operating
       -- system to another, including from one architecture to another
   14. Leffler, Samuel J., McKusick, Marshall Kirk, Karels, Michael J.,
       Quarterman, John S., The Design and Implementation of the 4.3BSD
       UNIX Operating System, Addison-Wesley, 1989.
       -- the counterpart, for the 4.3BSD version of the UNIX operating
       -- system, of the Maurice Bach book, The Design of the UNIX
       -- Operating System.
  *15. Libes, Don & Ressler, Sandy, Life with UNIX: A Guide for Everyone,
       Prentice-Hall, 1989.
       -- the folklore and myths, gentle explanations, and
       -- Old-Boy-Network of the UNIX operating system and its users and
       -- maintainers
   16. Lions, John, Lions' Commentary on UNIX 6th Edition with Source
       Code, Peer-to-Peer, 1977, 1996, ISBN 1-57398-013-7
       -- The `famous' Lions book.  One often might have to RTFS after
       -- RTFM.
   17. O'Reilly & Associates, Nutshell Handbook series, O'Reilly &
       Associates, various dates.
       -- Some members of this useful series
       --    Learning the UNIX operating system
       --    Learning the VI Editor
       --   *Reading and Writing Termcap Entries
       --    Programming with Curses
       --    Managing Projects with Make
       --   *Managing UUCP & Usenet
       --    Using UUCP & Usenet
  *18. Rochkind, Marc J., Advanced UNIX Programming, Prentice-Hall, 1985.
       -- a how-to-do-the-tricky-stuff book
   19. Weinberg, Gerald M., The Psychology of Computer Programming, Van
       Nostrand Reinhold, 1971
       -- As the UNIX operating system is an operating system written by
       -- programmers for programmers, it is useful to learn how
       -- programmers think and work
  *20. Wood, Patrick H. & Kochan, Stephen G., UNIX System Security,
       Hayden, 1985.
       -- how to keep the hackers, worms and viruses out of your system
       -- without locking out your users too.