Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)



Solaris Internals


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contains some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.

Old News ;-)

Adam Leventhal's Weblog

To give an example of using DTrace on Linux applications, I needed an application to examine. I wanted a well-known program that either didn't run on Solaris or operated sufficiently differently that examining the Linux version rather than the Solaris port made sense. I decided on /usr/bin/top partly because of the dramatic differences between how it operates on Linux vs. Solaris (due to the differences in /proc), but mostly because of what I've heard my colleague, Bryan, refer to as the "top problem": your system is slow, so you run top. What's the top process? Top!

Running top in the Linux branded zone, I opened a shell in the global (Solaris) zone to use DTrace. I started as I do on Solaris applications: I looked at system calls. I was interested to see which system calls were being executed most frequently, which is easily expressed in DTrace:

bash-3.00# dtrace -n lx-syscall:::entry'/execname == "top"/{ @[probefunc] = count(); }'
dtrace: description 'lx-syscall:::entry' matched 272 probes
^C
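The aggregation in the one-liner above, @[probefunc] = count(), keeps one counter per system call name for the events that pass the execname predicate. A minimal Python sketch of that bookkeeping follows. It assumes a hypothetical in-memory list of (execname, syscall) pairs instead of real probe firings, and the function name count_syscalls is made up for illustration.

```python
from collections import Counter


def count_syscalls(events, execname):
    """Count events per syscall name for one program, like @[probefunc] = count()."""
    counts = Counter()
    for name, syscall in events:
        if name == execname:      # plays the role of the /execname == "top"/ predicate
            counts[syscall] += 1  # one counter per key, as the aggregation does
    return counts


# Hypothetical events for illustration only; this is not real DTrace output.
sample_events = [
    ("top", "open"), ("top", "read"), ("top", "close"),
    ("bash", "read"), ("top", "read"),
]

for syscall, n in count_syscalls(sample_events, "top").most_common():
    print(f"{syscall:<10} {n}")
```

DTrace does this counting in the kernel and prints the table when the script ends (here, on ^C); the sketch only mirrors the grouping logic.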

 

[Jan 22, 2006]  Supporting Multiple Page Sizes in the Solaris Operating System

Beginning with the Solaris 9 OS, multiple page sizes can be supported on UltraSPARC processors so administrators can optimize performance by changing the page size on behalf of an application. Typical performance measurement tools do not provide sufficient detail for evaluating the impact of page size and do not provide the needed support to make optimal page size choices.
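One way to see why page size matters: the memory a TLB can map without a miss (its reach) is the number of TLB entries times the page size. The sketch below works through that arithmetic for the 8 KB, 64 KB, 512 KB, and 4 MB page sizes commonly listed for UltraSPARC processors of that era. The 64-entry TLB is an assumed figure for illustration, not a number taken from the article.

```python
KB = 1024
MB = 1024 * KB

# Page sizes commonly listed for UltraSPARC processors of that era.
PAGE_SIZES = {"8K": 8 * KB, "64K": 64 * KB, "512K": 512 * KB, "4M": 4 * MB}

ASSUMED_TLB_ENTRIES = 64  # hypothetical entry count, for illustration only


def tlb_reach(entries, page_size):
    """Bytes of address space a TLB with `entries` entries can map at once."""
    return entries * page_size


for label, size in PAGE_SIZES.items():
    reach_kb = tlb_reach(ASSUMED_TLB_ENTRIES, size) // KB
    print(f"{label:>4} pages: reach = {reach_kb} KB")
```

Under this assumption the reach grows from 512 KB with 8 KB pages to 256 MB with 4 MB pages, which is why an application with a large working set may spend less time on TLB misses after moving to a larger page size.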

This article explains how to use new tools to determine the potential performance gain. In addition, it explains how to configure larger page sizes using the multiple page size support (MPSS) feature of the Solaris 9 OS. The article addresses the following topics:

[Jan 16, 2006] Solaris 8 Memory Architecture

A common question pre-Solaris 8 users ask is "Where has all my memory gone?" The vmstat command, used to report virtual memory statistics, often reports that free memory (measured in Kbytes in the free column) is zero or close to zero on a pre-Solaris 8 system that has been up and running for a while.

Most likely, memory is being used to cache file system data, since the virtual memory system is shared by applications, data, the kernel, and file system data. By default, any free memory is used to cache data read from or written to the file system (including NFS). The size of the file system cache is dynamic -- it grows or shrinks depending on free memory.

The idea of this memory allocation scheme is to simultaneously enhance file system performance and optimize the use of an important system resource -- virtual memory. The two computing tasks of running applications and reading and writing data compete equally for system memory.

Generally, sharing a pool of memory is not an issue on small memory systems with low compute power, but with today's powerful desktop systems and servers, the file system cache can overwhelm the memory pool and make application performance suffer. Another drawback is that file system performance is tied to how quickly the virtual memory system can free memory.

Even worse, it is difficult to measure memory usage amongst the consumers of memory on the system. The vmstat command is often the first tool users run to examine virtual memory usage, but pre-Solaris 8 versions do a poor job of indicating why a system is paging (running an algorithm that moves data out from physical memory to disk, and back into physical memory from disk).

So, the question becomes: is it because the system is caching file system data, or is it because memory is a bottleneck and the system is struggling to keep up?

[Mar 25, 2005] http://www.sun.com/bigadmin/features/articles/selfheal.html

Traditionally, when a hardware or software fault occurred on a Solaris system, a message would usually be logged to the appropriate device specified in /etc/syslog.conf, and the rest of the diagnosis and repair was left to the administrator. Predictive Self-Healing technology is introduced in the Solaris 10 OS, which is available for preview through the Software Express for Solaris program.

Predictive Self-Healing is a newly designed cohesive architecture and methodology for automatically diagnosing, reporting, and handling software and hardware fault conditions.

This new technology lessens the time required to debug a hardware or software problem and provides the administrator and Sun Technical Support with detailed data about each fault. The architecture consists of an event management protocol, the fault manager, and the software fault-handling component, the Solaris Service Manager.

[Mar 17, 2005] An interesting option in telnetd for Solaris 10

It looks like it now provides a simple "not in DNS, no access" defense via option -U:

-U Refuses connections that cannot be mapped to a name through the getnameinfo(3SOCKET) function.
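The check described by -U can be approximated in a few lines. This is a sketch of the idea, not telnetd's code: the function name peer_name_or_none and the injectable resolver parameter are assumptions made so the example can be exercised without network access. It relies on the NI_NAMEREQD flag, which makes getnameinfo fail instead of falling back to a numeric address string.

```python
import socket


def peer_name_or_none(addr, port=0, resolver=socket.getnameinfo):
    """Return the peer's host name, or None if the address maps to no name."""
    flags = socket.NI_NAMEREQD | socket.NI_NUMERICSERV
    try:
        host, _service = resolver((addr, port), flags)
        return host
    except socket.gaierror:
        # No reverse mapping: a server following the -U policy would refuse here.
        return None


def should_accept(addr, resolver=socket.getnameinfo):
    """Apply the "not in DNS, no access" rule to one peer address."""
    return peer_name_or_none(addr, resolver=resolver) is not None
```

A server would call should_accept() with the peer address returned by accept() and close the connection when it returns False.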

Anatomy of a Read and Write Call - 21k By Pat Shuff Linux Journal 2002-09-20 23:00

We look at three different tactics for optimizing read and write performance under Linux.

A few years ago I was tasked with making the Spec96 benchmark suite produce the fastest numbers possible using the Solaris Intel operating system and Compaq Proliant servers. We were given all the resources that Sun Microsystems and Compaq Computer Corporation could muster to help take both companies to the next level in Unix computing on the Intel architecture. Sun had just announced its flagship operating system on the Intel platform and Compaq was in a heated race with Dell for the best departmental servers. Unixware and SCO were the primary challengers since Windows NT 3.5 was not very stable at the time and no one had ever heard of an upstart graduate student from overseas who thought that he could build a kernel that rivaled those of multi-billion dollar corporations.

Now, many years later, Linux has gained considerable market share and is the de facto Unix for all the major hardware manufacturers on the Intel architecture. In this article, I will attempt to take the lessons learned from this tuning exercise and show how they can be applied to the Linux operating system.

As it turned out, the gcc benchmark was the one that everyone seemed to be improving on the most. As we analyzed what the benchmark was doing, we found out that basically it opened a file, read its contents, created a new file, wrote new contents, then closed both files. It did this over and over and over. File operations proved to be the bottleneck in performance. We tried faster processors with insignificant improvement. We tried processors with huge (at the time) level 1 and level 2 cache and still found no significant improvement. We tried using a gigabyte of memory and found little or no improvement. By using the vmstat command, we found that the processor was relatively idle, little memory was being used, but we were getting a significant amount of reads and writes to the root disk. Using the same hardware and same test programs, Unixware was 25% faster than Solaris Intel. Initially, we decided that Solaris was just really slow. Unfortunately, I was working for Sun at the time and this was not the answer that we could take to my management. We had to figure out why it was slow and make recommendations on how to improve the performance. The target was 25% faster than Unixware, not slower.
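The access pattern described above (open a file, read it, create a new file, write new contents, close both, repeat) is easy to reproduce as a micro-workload. The sketch below is an assumed stand-in, not the benchmark itself: the file names, the 4 KB payload, and the byte-reversal "transformation" are all invented for illustration.

```python
import os
import tempfile
import time


def copy_transform_loop(workdir, iterations, payload=b"x" * 4096):
    """Repeat open/read/create/write/close and return the elapsed seconds."""
    src = os.path.join(workdir, "input.dat")
    with open(src, "wb") as f:
        f.write(payload)

    start = time.perf_counter()
    for i in range(iterations):
        dst = os.path.join(workdir, f"output_{i}.dat")
        # Both files are closed when the with-block exits.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            data = fin.read()        # read the whole input file
            fout.write(data[::-1])   # write "new contents" (here: reversed bytes)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as d:
    elapsed = copy_transform_loop(d, iterations=100)
    print(f"100 iterations took {elapsed:.4f} s")
```

Pointing workdir at a tmpfs mount versus an on-disk file system is one way to compare the two /tmp configurations the article goes on to discuss.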

The first thing that we did was to look at the configurations. It turns out that the two systems were identical hardware. We just booted a different disk to boot the other operating system. The Unixware system was configured with /tmp as a tmpfs whereas the Solaris system had /tmp on the root file system. We changed the Solaris configuration to use tmpfs but it did not significantly improve performance. Later, we found that this was due to a bug in the tmpfs implementation on Solaris Intel. By breaking down the file operation, we decided to focus on three areas: the libc interface, the node/dentry layer, and the device drivers managing the disk. In this article, we will look at the three different layers and talk about how to improve performance and how they specifically apply to Linux.

LISA 2001 Paper about RUF

This paper describes a utility named ruf that reads files from an unmounted file system. The files are accessed by reading disk structures directly so the program is peculiar to the specific file system employed. The current implementation supports the *BSD FFS, SunOS/Solaris UFS, HP-UX HFS, and Linux ext2fs file systems. All these file systems derive from the original FFS, but have peculiar differences in their specific implementations.

The utility can read files from a damaged file system. Since the utility attempts to read only those structures it requires, damaged areas of the disk can be avoided. Files can be accessed by their inode number alone, bypassing damage to structures above it in the directory hierarchy.
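To give a feel for what "reading disk structures directly" involves, the sketch below parses a few fields of an ext2 superblock from the raw bytes of a disk image. It is not ruf's code and does not use libruf; the function name is made up, and only the ext2 case is shown. The offsets (superblock at byte 1024, magic 0xEF53 at offset 56 within it) follow the standard ext2 on-disk layout.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the ext2 superblock starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53


def read_ext2_superblock(image):
    """Parse a few ext2 superblock fields from the raw bytes of a disk image."""
    sb = bytes(image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024])
    if len(sb) < 58:
        raise ValueError("image too short to hold a superblock")
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 40)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 file system (bad magic)")
    return {
        "inodes_count": inodes_count,
        "blocks_count": blocks_count,
        "block_size": 1024 << log_block_size,
        "inodes_per_group": inodes_per_group,
    }
```

Locating a file by inode number then means using inodes_per_group to pick the block group, reading that group's descriptor to find its inode table, and indexing into the table. A full reader has to repeat this kind of work for every structure it needs, which is why the program is specific to each file system.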

The functions of the utility are available in a library named libruf. The utility and library are available under the BSD license.

Introduction

There are many important reasons for being able to access unmounted file systems, the prime example being a damaged disk. This paper describes a utility that can be used to read a disk file without mounting the file system. The utility behaves similarly to the regular cat utility, and was originally named dog, but was renamed to ruf (for reading unmounted filesystems) to avoid a name conflict with an older utility.

In order to access an unmounted file system, the utility must read the disk structures directly and perform all the tasks normally performed by the operating system; this requires a detailed understanding of how the file system is implemented. Implementing this utility for a particular file system is an interesting academic exercise and a good way to learn about the file system. The original work on this utility was in fact done in Evi Nemeth's system administration class.

Getting to know the Solaris filesystem, Part 1 - SunWorld - May 1999

Richard starts this journey into the Solaris filesystem by looking at the fundamental reasons for needing a filesystem and at the functionality various filesystems provide. In this first part of the series, you'll examine the evolution of the Solaris filesystem framework, moving into a study of major filesystem features. You'll focus on filesystems that store data on physical storage devices -- commonly called regular or on-disk filesystems. In future articles, you'll begin to explore the performance characteristics of each filesystem, and how to configure filesystems to provide the required levels of functionality and performance. Richard will also delve into the interaction between Solaris filesystems and the Solaris virtual memory system, and how it all affects performance.

Getting to know the Solaris filesystem, Part 3 - SunWorld - July ...

One of the most important features of a filesystem is its ability to cache file data. Ironically, however, the filesystem cache isn't implemented in the filesystem. In Solaris, the filesystem cache is implemented in the virtual memory system. In Part 3 of this series on the Solaris filesystem, Richard explains how Solaris file caching works and explores the interactions between the filesystem cache and the virtual memory system.

www.solarisinternals.com/si/reading/Getting to know the Solaris filesystem, Part 1 - SunWorld - May 1999


[Mar 7, 2005] CacheKit is a collection of freeware perl and shell programs to report on cache activity on a Solaris 8 SPARC server. Tools for older Solaris and Solaris x86 are also included in the kit, as well as some SE Toolkit programs and extra Solaris 10 DTrace programs. The caches the kit reports on are: I$, D$, E$, DNLC, inode cache, ufs buffer cache, segmap cache and segvn cache. This kit assists performance tuning.

download version 0.91, 05-Sep-2004


These programs have been written for a Solaris 8 (or newer) SPARC server. Also included in the kit are programs for older Solaris, Solaris x86, and Solaris 10.

 

[Mar 7, 2005] docs.sun.com: Solaris Tunable Parameters Reference Manual

 

The Process Model of Linux Application Development

One of Unix's hallmarks is its process model. It is the key to understanding access rights, the relationships among open files, signals, job control, and most other low-level topics in this book. Linux adopted most of Unix's process model and added new ideas of its own to allow a truly lightweight threads implementation.

10.1 Defining a Process

What exactly is a process? In the original Unix implementations, a process was any executing program. For each program, the kernel kept track of

A process was also the basic scheduling unit for the operating system. Only processes were allowed to run on the CPU.

10.1.1 Complicating Things with Threads

Although the definition of a process may seem obvious, the concept of threads makes all of this less clear-cut. A thread allows a single program to run in multiple places at the same time. All the threads created (or spun off) by a single program share most of the characteristics that differentiate processes from each other. For example, multiple threads that originate from the same program share information on open files, credentials, current directory, and memory image. As soon as one of the threads modifies a global variable, all the threads see the new value rather than the old one.
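The shared memory image is easy to observe. In this small sketch (the names shared, writer, reader, and demo are illustrative), one thread changes module-level state and the threads started afterwards all read the new value:

```python
import threading

shared = {"value": 0}  # module-level state, visible to every thread in the program


def writer():
    shared["value"] = 42  # one thread modifies the shared data


def reader(results):
    results.append(shared["value"])  # other threads see the new value


def demo():
    results = []
    w = threading.Thread(target=writer)
    w.start()
    w.join()  # make sure the write has happened before the readers start

    readers = [threading.Thread(target=reader, args=(results,)) for _ in range(3)]
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    return results


print(demo())
```

No copying or message passing is involved: every thread reads the same object in the same address space.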

Many Unix implementations (including AT&T's canonical System V release) were redesigned to make threads the fundamental scheduling unit for the kernel, and a process became a collection of threads that shared resources. As so many resources were shared among threads, the kernel could switch between threads in the same process more quickly than it could perform a full context switch between processes. This resulted in most Unix kernels having a two-tiered process model that differentiates between threads and processes.

10.1.2 The Linux Approach

Linux took another route, however. Linux context switches had always been extremely fast (on the same order of magnitude as the new "thread switches" introduced in the two-tiered approach), suggesting to the kernel developers that rather than change the scheduling approach Linux uses, they should allow processes to share resources more liberally.

Under Linux, a process is defined solely as a scheduling entity and the only thing unique to a process is its current execution context. It does not imply anything about shared resources, because a process creating a new child process has full control over which resources the two processes share (see the clone() system call described on page 153 for details on this). This model allows the traditional Unix process management approach to be retained while allowing a traditional thread interface to be built outside the kernel.
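For contrast with the thread case, the sketch below shows the traditional default, where nothing in memory is shared: a child created with fork() gets a private copy of the parent's data, so its changes are invisible to the parent. Python's standard library has no clone() wrapper, so this uses os.fork() and only illustrates the no-sharing end of the spectrum. The function name is illustrative, and the example assumes a Unix-like system.

```python
import os


def fork_does_not_share_memory():
    """Return (value seen by parent, value reported by child) after a fork."""
    counter = {"value": 0}
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        counter["value"] = 99  # changes only the child's private copy
        os.write(write_fd, str(counter["value"]).encode())
        os.close(write_fd)
        os._exit(0)  # leave immediately without running parent cleanup code

    # parent process
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return counter["value"], child_value


print(fork_does_not_share_memory())
```

The pipe is needed precisely because the two processes no longer share memory. With clone(), the caller can instead ask for the address space, open files, and other resources to be shared, which is how a thread interface can be built on top of it.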

Luckily, the differences between the Linux process model and the two-tiered approach surface only rarely. In this book, we use the term process to refer to a set of (normally one) scheduling entities which share fundamental resources, and a thread is each of those individual scheduling entities. When a process consists of a single thread, we often use the terms interchangeably. To keep things simple, most of this chapter ignores threads completely. Toward the end, we discuss the clone() system call, which is used to create threads (and can also create normal processes).

Solaris Internals

This site provides information supporting the Solaris Internals book published by Jim Mauro and Richard McDougall. Our aim is to provide links to pertinent reference material and tools discussed in the book, plus any new and relevant information about the Solaris operating system since publication.

We hope you find this site useful - we have provided contact information for any questions you may have. We also welcome and encourage feedback.

10/21/2004 - mdb's ::memstat ported to Solaris 8!

 

Recommended Links


Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: November 08, 2008