Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIX fixes #700

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

AIX fixes #700

wants to merge 7 commits into from

Conversation

bhuntsman
Copy link
Contributor

These commits address several issues on AIX:

  • Fix the build on newer versions of AIX
  • Clean up the errors displayed by 9c/9l
  • Update INSTALL to provide an experience closer to what is offered on other platforms.
  • Fix the build of fontsrv on AIX
  • Prevent rio from being built on AIX as it is permanently incompatible
  • Fix an issue with the post9pservice call which causes some programs to crash on AIX 7.3

Unfortunately, this introduces a requirement to install gawk on AIX, however doing so is fairly straightforward.

When running the install script on AIX, the build pipes most of the
output through dist/isum.awk.  This AWK script does not display any
output on AIX using the default awk located in /usr/bin/awk.  On
the other platforms, this script displays the current item being
compiled on a single line and updates that line as the build
progresses.  Using GNU AWK, which is available from the IBM AIX
Toolbox for Open Source Software, allows the script to run
correctly.  This makes for a prettier build at the cost of
requiring gawk on AIX.  During the run of INSTALL,
/opt/freeware/bin must also be in the PATH environment variable.
The INSTALL script detects if the fontsrv prerequsites are installed
by compiling a test program.  However, this does not work on AIX
even with freetype installed from the IBM Toolbox for Open Source
Software.  This does not work for two reasons.  First, the IBM-
supplied package is installed into /opt/freeware, and therefore
we need to include the following directories:
/opt/freeware/include
/opt/freeware/include/freetype2

Secondly, the INSTALL script uses cc to compile the script, and
this does not set up enough PLAN9 environment for the compiler
to find the right headers on AIX.  Therefore, we use 9c which
handles all of the AIX compiler envionment setup, and should also
work for the other platforms.
The INSTALL script typically prints the version of the compiler in use
during a build, and does so by running "9c -v".  Ultimately the -v
argument gets passed to the actual compiler.  However, the AIX compiler
(XL C) does not use -v to display the compiler version, but instead
uses the argument -qversion.  Therefore, we add -qversion to the 9c
invocation and it does not matter if this throws an error in addition
to printing the compiler version, as the subsequent grep call will
exclude any errors.  This should allow the script to print the
compiler version on any platform.
The fonsrv program can build on AIX so long as the prerequsites are
installed.  These include freetype2 and fontconfig, both of which
are available from the IBM Toolbox for Open Source Software.
However, these get installed into /opt/freeware.  Update the
freetyperules.sh script to look for the headers and libraries in
/opt/freeware on AIX.
When attempting to build rio on AIX, the following compiler error occurs:

9c -DDEBUG -DSHAPE -DDEBUG_EV -DDEBUG event.c
event.c:9.10: 1506-296 (S) #include file <X11/extensions/shape.h> not found.
event.c:32.59: 1506-046 (S) Syntax error.
event.c:32.60: 1506-068 (S) Operation between types "void" and "union _XEvent" is not allowed.
event.c:473.25: 1506-277 (S) Syntax error: possible missing ')' or ','?
event.c:478.23: 1506-045 (S) Undeclared identifier e.
mk: 9c -DDEBUG -DSHAPE ...  : exit status=exit(1)
mk: for i in ...  : exit status=exit(1)

This is because IBM does not supply X11/extensions/shape.h in any LPP.
Unfortunately, this means that rio is permanently unusable on AIX,
without significant modification.  The easiest path forward is to add
rio to the BUGGERED list in src/cmd/mkfile.  However, rio works on most
of the other platforms.  Therefore, add an extra test and add rio to the
BUGGERED list only if we are performing the build on AIX.
Suppress additional compiler and linker warnings on AIX.
On newer versions of AIX (at least 7.3, possibly 7.2), when a program calls
post9pservice, the call fails due to a timing problem just after the
call to waitfor.  Adding this short delay allows the function to succeed
and the program works correctly on AIX afterwards.

Fixes 9fans#699
@@ -115,8 +118,8 @@ fi
if [ `uname` != Darwin ]; then
# Determine whether fontsrv X11 files are available.
rm -f a.out
cc -o a.out -c -Iinclude -I/usr/include -I/usr/local/include -I/usr/include/freetype2 -I/usr/local/include/freetype2 \
-I/usr/X11R7/include -I/usr/X11R7/include/freetype2 \
9c -o a.out -c -Iinclude -I/usr/include -I/usr/local/include -I/usr/include/freetype2 -I/usr/local/include/freetype2 \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I don't know about using 9c here; perhaps just keep this cc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to 9c because 9c sets up more plan9port environment, much of which we need on AIX. With this just using cc, this test fails with the following error messages:

 cc -o a.out -c -Iinclude -I/usr/include -I/usr/local/include -I/usr/include/freetype2 -I/usr/local/include/freetype2 -I/usr/X11R7/include -I/usr/X11R7/include/freetype2 -I/opt/freeware/include -I/o>
"include/u.h", line 1.1: 1506-046 (S) Syntax error.
"include/u.h", line 1.4: 1506-166 (S) Definition of function See requires parentheses.
"/usr/include/sys/types.h", line 345.3: 1506-273 (E) Missing type in declaration of sigset_t.
"/usr/include/sys/types.h", line 396.9: 1506-046 (S) Syntax error.
"/usr/include/sys/types.h", line 484.9: 1506-166 (S) Definition of function uint64_t requires parentheses.
"/usr/include/sys/types.h", line 484.18: 1506-276 (S) Syntax error: possible missing '{'?
"/usr/include/unistd.h", line 909.43: 1506-282 (S) The type of the parameters must be specified in a prototype.
"/usr/include/unistd.h", line 915.56: 1506-046 (S) Syntax error.
"/usr/include/unistd.h", line 916.59: 1506-275 (S) Unexpected text off_t encountered.
"/usr/include/unistd.h", line 920.64: 1506-275 (S) Unexpected text size_t encountered.
"/usr/include/unistd.h", line 922.16: 1506-166 (S) Definition of function pid_t requires parentheses.
"/usr/include/unistd.h", line 922.42: 1506-276 (S) Syntax error: possible missing '{'?
"/usr/include/stdlib.h", line 114.3: 1506-273 (E) Missing type in declaration of div_t.
"/usr/include/stdlib.h", line 126.8: 1506-166 (S) Definition of function size_t requires parentheses.
"/usr/include/stdlib.h", line 126.15: 1506-276 (S) Syntax error: possible missing '{'?
"/usr/include/sys/signal.h", line 275.9: 1506-046 (S) Syntax error.
"/usr/include/sys/signal.h", line 309.15: 1506-007 (S) "union sigval" is undefined.
"/usr/include/sys/signal.h", line 347.9: 1506-046 (S) Syntax error.
"/usr/include/sys/vm_types.h", line 34.9: 1506-166 (S) Definition of function ulong_t requires parentheses.
"/usr/include/sys/vm_types.h", line 34.25: 1506-276 (S) Syntax error: possible missing '{'?
"/usr/include/sys/vm_types.h", line 148.9: 1506-045 (S) Undeclared identifier vmhandle_t.
"/usr/include/sys/vm_types.h", line 149.9: 1506-045 (S) Undeclared identifier caddr_t.
"/usr/include/sys/vm_types.h", line 151.3: 1506-273 (E) Missing type in declaration of vmaddr_t.
"/usr/include/sys/vm_types.h", line 164.9: 1506-046 (S) Syntax error.
"/usr/include/sys/vm_types.h", line 258.9: 1506-166 (S) Definition of function vmhandle_t requires parentheses.
"/usr/include/sys/vm_types.h", line 258.25: 1506-276 (S) Syntax error: possible missing '{'?
"/usr/include/sys/m_types.h", line 51.3: 1506-273 (E) Missing type in declaration of label_t.
"/usr/include/sys/m_types.h", line 53.9: 1506-166 (S) Definition of function int32long64_t requires parentheses.
"/usr/include/sys/m_types.h", line 53.25: 1506-276 (S) Syntax error: possible missing '{'?
"/usr/include/sys/mstsave.h", line 96.26: 1506-045 (S) Undeclared identifier __kjmpbuf.
"/usr/include/sys/mstsave.h", line 101.25: 1506-275 (S) Unexpected text __curid encountered.
"/usr/include/sys/mstsave.h", line 101.9: 1506-045 (S) Undeclared identifier pid_t.
"include/libc.h", line 1.57: 1506-209 (S) Character constants must end before the end of a line.
"include/libc.h", line 1.34: 1506-076 (W) Character constant 's /sys/include/libc.h.' has more than 4 characters. No more than rightmost 4 characters are used.
"src/cmd/fontsrv/a.h", line 15.89: 1506-209 (S) Character constants must end before the end of a line.
"src/cmd/fontsrv/a.h", line 15.65: 1506-076 (W) Character constant 's lo rune / SubfontSize' has more than 4 characters. No more than rightmost 4 characters are used.

We'd need to add several additional changes here to keep it as cc. I can investigate that if that is the preference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh dear. Ok, I guess that's fine, then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could add a quick uname test and only use 9c on AIX and leave it as-is for the others...

@@ -5,7 +5,9 @@ TARG=`ls *.[cy] *.lx | egrep -v "\.tab\.c$|^x\." | sed 's/\.[cy]//; s/\.lx//'`
<$PLAN9/src/mkmany

BUGGERED='CVS|faces|factotum|lp|ip|mailfs|upas|vncv|mnihongo|mpm|index|u9fs|secstore|smugfs|snarfer'
DIRS=lex `ls -l |sed -n 's/^d.* //p' |egrep -v "^($BUGGERED)$"|egrep -v '^lex$'` $FONTSRV
UNAME=`uname`
AIXBUGGERED=`if [ "$UNAME" == "AIX" ]; then echo rio; fi`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not append to BUGGERED if on AIX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the effect here, but indirectly. I suppose I could add AIXBUGGERED to BUGGERED if you would prefer to see it done that way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think adding to BUGGERED would be better.

@@ -41,6 +41,7 @@ post9pservice(int fd, char *name, char *mtpt)
}
close(fd);
w = waitfor(pid);
sleep(10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that you know this, but this is clearly the wrong fix. I recognize that this probably masks the symptoms of the underlying bug enough that things work on AIX with this in place, but I'm very hesitant to commit a bandaid like this into the repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. This is one of those things that is going to be monstrously difficult to track down. Single-stepping through the code under the debugger adds enough of a delay that it works correctly while debugging it, so figuring out what is going wrong is pretty tough. Furthermore, if I compile the unpatched code on AIX 7.1, it works there but then when I move the binaries over to AIX 7.3, we also hit this issue. I'd welcome suggestions for how to proceed...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, maybe not.

Looking at what's going on, it appears that, without the sleep, waitfor is returning a bogus, nil, value. Drilling down into waitfor, we see that it delegates to _wait and the first thing that does is see if n is <= 0 and return nil if it is. The other cases where it returns nil are if it can't tokenize whatever is in buf, and if malloc fails. Let's assume malloc isn't failing; then if tokenize failed, we should see the "couldn't parse wait message" error somewhere and we don't, so it stands to reason that n <= 0 is probably true. n in this case is the return value from awaitfor.

awaitfor delegates to _await, which in turn calls wait3 or wait4, depending on whether pid4 is -1. There's a comment there that is curiously incomplete; "On Linux, pid==-1 means anyone; on SunOS it's pid==0." That's nice, but there's no special case for SunOS; perhaps the comment is simply a vestige of something that's changed over time. But it's probably irrelevant here; we know that the pid we get here is the PID of a child process created by fork() in post9pservice, and the error case there was already dealt with. Therefore, we know that _await is calling wait4 in this instance. The the conclusion here is that this code is failing because _await is returning 0 or -1.

How that can that happen? There are three returns from this function, in three separate cases. Two of those are strlen, which can't return -1, and can only return 0 if passed an empty string. But in this case, I don't see how that can happen: the string in question is the result of a copying a formatted string into a buffer, and we can see statically that that buffer isn't empty. This suggests that _await is returning -1 since due to the check against the return value of wait3 or wait4. Since we know that we're calling wait4 here, we can deduce that this function is returning -1 because wait4 is returning 0 or something negative; I'm not a betting man, but if I were, my would be on -1.

I'll bet the, "Restart the system call" message mentioned in #699 is from actually from wait4; some random Google searches suggest that that's the strerror string for "ERESTART" on AIX. I don't have access to AIX (and haven't for decades); does the man page for wait4 mention anything about ERESTART?

Something you may try, instead of single-stepping in the debugger, is seeing if you can put a breakpoint on the return -1 on line 99 of await.c and see if you hit it without the sleep(10) and without single stepping.

@@ -136,7 +139,7 @@ if [ -f LOCAL.config ]; then
fi

echo "* Compiler version:"
9c -v 2>&1 | grep -v 'Configured with:' | grep -i version | sed 's/^/ /'
9c -v -qversion 2>&1 | grep -v 'Configured with:' | grep -i version | sed 's/^/ /'
Copy link
Collaborator

@dancrossnyc dancrossnyc Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-qversion is presumably specific to xlc; with e.g. clang, the output here makes sense, but isn't what I'd normally expect to see:

* Compiler version:
        clang: error: unknown argument '-qversion'; did you mean '--version'?
        OpenBSD clang version 16.0.6

(Note the error message that sneaks in.)

Perhaps a way around this is add an extra variable in the large case OS statement above that adds this specifically for AIX. Something like:

CCVERSFLAG=''
case `uname` in
# ...
AIX)
    # ...
    CCVERSFLAG=-qversion
    ;;
esac

9c -v $CCVERSFLAG 2>&1 | grep ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants