core-extra/wiki/Troubleshooting.wiki
2009-11-19 19:17:08 +00:00

#summary Troubleshooting
== General ==
* *Debugging commands:*
* `/var/log/cored.log` - the CORE daemon log file; may contain error messages
* `/var/log/coreexecd.log` - the CORE execution daemon log file; may indicate failed jobs, i.e. commands that exited with a non-zero status
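To check both log files at once, a small helper like the one below may save some typing. It is a minimal sketch: it assumes only the two log paths named above and standard `tail`, and the function name `show_core_logs` is hypothetical.

{{{
#!/bin/sh
# Sketch: print the last 20 lines of each CORE log file named on this page.
# Pass other paths as arguments if your logs live elsewhere.
show_core_logs() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/cored.log /var/log/coreexecd.log
    fi
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log =="
            tail -n 20 "$log"
        else
            echo "== $log: missing or not readable =="
        fi
    done
}

show_core_logs "$@"
}}}

Run it with `sudo` if the log files are readable only by root.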
== Linux OpenVZ version ==
* *Debugging commands:*
* `vzlist` - lists all running containers
* `brctl show` - lists all bridge devices; use it to check that container veth devices have joined the correct bridges
* `ebtables -L` - lists the ebtables rules; useful for troubleshooting wireless connectivity, since there should be two entries per wireless link
* *Issue:* When I start CORE, place a host in the GUI, and press Start, I get the following error:
{{{
can not find channel named "-1"
can not find channel named "-1"
}}}
The console may also display this error:
{{{
Connecting to 127.0.0.1:4038...
Failed to open API channel to 127.0.0.1:4038: couldn't open socket: connection refused
}}}
* *Resolution:* This suggests that cored is not running or is hung. Try `sudo killall cored`; if no process was killed, also check that no stale `/var/run/cored.pid` file is left over. Then restart cored in verbose mode with `sudo /usr/local/sbin/cored -v`, and if that works, restart it in daemon mode with `sudo /usr/local/sbin/cored -d`. If all else fails, check the log file `/var/log/cored.log`.
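The stale-pid-file check and restart can be scripted roughly as follows. This is a sketch, not part of CORE: it assumes the paths used on this page, that it runs as root (so `kill -0` can probe any PID), and the function names and the script name `cored-check.sh` are hypothetical.

{{{
#!/bin/sh
# Sketch of the cored recovery steps above. Paths are the ones this page
# uses; adjust them if CORE was installed under a different prefix.
PIDFILE=/var/run/cored.pid
CORED=/usr/local/sbin/cored

# Succeeds if the given pid file exists but does not name a running process.
pidfile_is_stale() {
    [ -f "$1" ] || return 1
    _pid=$(cat "$1")
    [ -n "$_pid" ] || return 0          # an empty pid file also counts as stale
    ! kill -0 "$_pid" 2>/dev/null
}

restart_cored() {
    if ! killall cored 2>/dev/null; then
        echo "no cored process was killed"
    fi
    sleep 1                             # give a killed cored a moment to exit
    if pidfile_is_stale "$PIDFILE"; then
        echo "removing stale $PIDFILE"
        rm -f "$PIDFILE"
    fi
    "$CORED" -d                         # restart in daemon mode
}

# Only act when asked explicitly, e.g. "sudo sh cored-check.sh restart".
case "$1" in
    restart) restart_cored ;;
esac
}}}

If cored still does not come up, run `sudo /usr/local/sbin/cored -v` by hand to see its messages before falling back to daemon mode.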
* *Issue:* When starting multiple Quagga routers, only some reach an operational state.
* *Resolution:* This is an intermittent problem with the OpenVZ version of CORE. I have noticed that containers sometimes start without the loopback interface `lo` in the UP state, and that zebra sometimes fails to start. If you find a node in this state, try the following steps on that node:
{{{
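killall zebra ospfd ospf6d    # stop any partially started Quagga daemons
ifconfig lo up                # bring the loopback interface up
zebra -d                      # start zebra first, in daemon mode
ospfd -d                      # then the OSPFv2 daemon
ospf6d -d                     # and the OSPFv3 daemon
vtysh -b                      # apply the boot-time configuration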
}}}
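To find which nodes need these steps, the host can be scanned for the two symptoms described above. This is a hedged sketch: it assumes it runs as root on the OpenVZ host, that `vzlist -H -o ctid` prints the running container IDs, that `ifconfig` and `pidof` exist inside each container, and that `vzctl exec` returns the exit status of the command it runs. The function names and `check-nodes.sh` are hypothetical.

{{{
#!/bin/sh
# Sketch: report OpenVZ containers whose lo is down or whose zebra is missing.

# Succeeds if ifconfig output read on stdin contains the UP flag. Matches
# both "UP LOOPBACK RUNNING" and "flags=73<UP,LOOPBACK,RUNNING>" styles.
shows_up_flag() {
    grep -Eq '(^|[^A-Z_])UP([^A-Z_]|$)'
}

check_containers() {
    for ct in $(vzlist -H -o ctid); do
        if ! vzctl exec "$ct" ifconfig lo | shows_up_flag; then
            echo "container $ct: lo is not UP"
        fi
        if ! vzctl exec "$ct" pidof zebra >/dev/null 2>&1; then
            echo "container $ct: zebra is not running"
        fi
    done
}

# Only act when asked explicitly, e.g. "sudo sh check-nodes.sh check".
case "$1" in
    check) check_containers ;;
esac
}}}

For each container it reports, open a shell with `vzctl enter <ctid>` and run the recovery commands above.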
* *Tested OpenVZ kernel versions:*
* 2.6.18-128.1.1.el5.028stab062.3 - IPv6 addresses are sometimes not set properly; during shutdown, some containers produce these kernel messages: `unregister_netdevice: waiting for lo=e319a800 to become free. Usage count = 8 ve=1000`, `unregister_netdevice: device e319a800 marked to leak`, `free_netdev: device lo=e319a800 leaked`
* 2.6.18-128.2.1.el5.028stab064.4 - appears to be a bad kernel: placing 5 random wireless nodes and pressing Start causes the system to hang
* 2.6.18-128.2.1.el5.028stab064.7 - good kernel; fixes the issues above
== VMware version of OpenVZ CORE ==
CentOS has a known clock skew issue when running as a Linux guest on VMware Server: the guest clock can run erratically compared to real (wall-clock) time. To remedy this, append the following parameters to the OpenVZ kernel's command line at boot time (i.e., on its `kernel` line in /boot/grub/menu.lst):
`divider=10 clocksource=acpi_pm`
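As an illustration only, a GRUB legacy `menu.lst` entry with these parameters might look like the sketch below. The kernel version is the one listed as good above; the `root (hd0,0)` disk and the `root=` device are placeholders, so copy just the two parameters at the end of the `kernel` line into your own existing entry. Note that the Linux kernel parameter is spelled `clocksource`.

{{{
# Hypothetical entry; only "divider=10 clocksource=acpi_pm" is the actual fix.
title CentOS (2.6.18-128.2.1.el5.028stab064.7)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.2.1.el5.028stab064.7 ro root=/dev/VolGroup00/LogVol00 divider=10 clocksource=acpi_pm
        initrd /initrd-2.6.18-128.2.1.el5.028stab064.7.img
}}}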
== FreeBSD version ==
* *Debugging commands:*
* `vimage -l` - lists all running vimages
* `ngctl list` - lists all Netgraph nodes
* `ngctl show ...` - shows detailed information about a Netgraph node
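When you are not sure which node to inspect, it can help to dump every node. The sketch below assumes that each node line of `ngctl list` output contains a field `ID:` followed by the node's hex ID, and that a node can be addressed by ID as `[id]:`. The function names and `ng-dump.sh` are hypothetical.

{{{
#!/bin/sh
# Sketch: run "ngctl show" on every node reported by "ngctl list".

# Print the node IDs found in "ngctl list" output read on stdin.
node_ids() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") print $(i + 1) }'
}

show_all_nodes() {
    ngctl list | node_ids | while read -r id; do
        ngctl show "[$id]:"
    done
}

# Only act when asked explicitly, e.g. "sh ng-dump.sh show".
case "$1" in
    show) show_all_nodes ;;
esac
}}}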