From: Alan Stern on 27 Jun 2010 12:00 On Sat, 26 Jun 2010, Rafael J. Wysocki wrote: > +void pm_relax(void) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&events_lock, flags); > + if (events_in_progress) { > + event_count++; > + if (!--events_in_progress) > + wake_up_all(&events_wait_queue); > + } > + spin_unlock_irqrestore(&events_lock, flags); > +} > +bool pm_get_wakeup_count(unsigned long *count) > +{ > + bool ret; > + > + spin_lock_irq(&events_lock); > + if (capable(CAP_SYS_ADMIN)) > + events_check_enabled = false; > + > + if (events_in_progress) { > + DEFINE_WAIT(wait); > + > + do { > + prepare_to_wait(&events_wait_queue, &wait, > + TASK_INTERRUPTIBLE); > + if (!events_in_progress) > + break; > + spin_unlock_irq(&events_lock); > + > + schedule(); > + > + spin_lock_irq(&events_lock); > + } while (!signal_pending(current)); > + finish_wait(&events_wait_queue, &wait); > + } > + *count = event_count; > + ret = !events_in_progress; > + spin_unlock_irq(&events_lock); > + return ret; > +} Here's a thought. Presumably pm_relax() will end up getting called a lot more often than pm_get_wakeup_count(). Instead of using a wait queue, you could make pm_get_wakeup_count() poll at 100-ms intervals. The total overhead would be smaller. Here's another thought. If event_count and events_in_progress were atomic_t then the new spinlock wouldn't be needed at all. (But you would need an appropriate pair of memory barriers, to guarantee that when a writer decrements events_in_progress to 0 and increments event_count, a reader won't see events_in_progress == 0 without also seeing the incremented event_count.) Overall, this may not be a significant improvement. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: mark gross on 27 Jun 2010 18:30 Looks good to me! --mgross On Sat, Jun 26, 2010 at 03:14:13PM +0200, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rjw(a)sisk.pl> > > One of the arguments during the suspend blockers discussion was that > the mainline kernel didn't contain any mechanisms making it possible > to avoid losing wakeup events during system suspend. > > Generally, there are two problems in that area. First, if a wakeup > event occurs exactly when /sys/power/state is being written to, it > may be delivered to user space right before the freezer kicks in, so > the user space consumer of the event may not be able to process it > before the system is suspended. Second, if a wakeup event occurs > after user space has been frozen, it is not generally guaranteed that > the ongoing transition of the system into a sleep state will be > aborted. > > To address these issues introduce a new global sysfs attribute, > /sys/power/wakeup_count, associated with a running counter of wakeup > events and three helper functions, pm_stay_awake(), pm_relax(), and > pm_wakeup_event(), that may be used by kernel subsystems to control > the behavior of this attribute and to request the PM core to abort > system transitions into a sleep state already in progress. > > The /sys/power/wakeup_count file may be read from or written to by > user space. Reads will always succeed (unless interrupted by a > signal) and return the current value of the wakeup events counter. > Writes, however, will only succeed if the written number is equal to > the current value of the wakeup events counter. If a write is > successful, it will cause the kernel to save the current value of the > wakeup events counter and to abort the subsequent system transition > into a sleep state if any wakeup events are reported after the write > has returned. > > [The assumption is that before writing to /sys/power/state user space > will first read from /sys/power/wakeup_count. Next, user space > consumers of wakeup events will have a chance to acknowledge or > veto the upcoming system transition to a sleep state. Finally, if > the transition is allowed to proceed, /sys/power/wakeup_count will > be written to and if that succeeds, /sys/power/state will be written > to as well. Still, if any wakeup events are reported to the PM core > by kernel subsystems after that point, the transition will be > aborted.] > > Additionally, put a wakeup events counter into struct dev_pm_info and > make these per-device wakeup event counters available via sysfs, > so that it's possible to check the activity of various wakeup event > sources within the kernel. > > To illustrate how subsystems can use pm_wakeup_event(), make the > low-level PCI runtime PM wakeup-handling code use it. > > Signed-off-by: Rafael J. Wysocki <rjw(a)sisk.pl> > --- > Documentation/ABI/testing/sysfs-power | 14 + > drivers/base/power/Makefile | 2 > drivers/base/power/main.c | 1 > drivers/base/power/sysfs.c | 11 + > drivers/base/power/wakeup.c | 240 ++++++++++++++++++++++++++++++++++ > drivers/pci/pci-acpi.c | 1 > drivers/pci/pci.c | 20 ++ > drivers/pci/pci.h | 1 > drivers/pci/pcie/pme/pcie_pme.c | 5 > include/linux/pm.h | 10 + > include/linux/suspend.h | 7 > kernel/power/hibernate.c | 20 +- > kernel/power/main.c | 53 +++++++ > kernel/power/suspend.c | 4 > 14 files changed, 379 insertions(+), 10 deletions(-) > > Index: linux-2.6/kernel/power/main.c > =================================================================== > --- linux-2.6.orig/kernel/power/main.c > +++ linux-2.6/kernel/power/main.c > @@ -204,6 +204,58 @@ static ssize_t state_store(struct kobjec > > power_attr(state); > > +/* > + * The 'wakeup_count' attribute, along with the functions defined in > + * drivers/base/power/wakeup.c, provides a means by which wakeup events can be > + * handled in a non-racy way. > + * > + * If a wakeup event occurs when the system is in a sleep state, it simply is > + * woken up. In turn, if an event that would wake the system up from a sleep > + * state occurs when it is undergoing a transition to that sleep state, the > + * transition should be aborted. Moreover, if such an event occurs when the > + * system is in the working state, an attempt to start a transition to the > + * given sleep state should fail during certain period after the detection of > + * the event. Using the 'state' attribute alone is not sufficient to satisfy > + * these requirements, because a wakeup event may occur exactly when 'state' > + * is being written to and may be delivered to user space right before it is > + * frozen, so the event will remain only partially processed until the system is > + * woken up by another event. In particular, it won't cause the transition to > + * a sleep state to be aborted. > + * > + * This difficulty may be overcome if user space uses 'wakeup_count' before > + * writing to 'state'. It first should read from 'wakeup_count' and store > + * the read value. Then, after carrying out its own preparations for the system > + * transition to a sleep state, it should write the stored value to > + * 'wakeup_count'. If that fails, at least one wakeup event has occured since > + * 'wakeup_count' was read and 'state' should not be written to. Otherwise, it > + * is allowed to write to 'state', but the transition will be aborted if there > + * are any wakeup events detected after 'wakeup_count' was written to. > + */ > + > +static ssize_t wakeup_count_show(struct kobject *kobj, > + struct kobj_attribute *attr, > + char *buf) > +{ > + unsigned long val; > + > + return pm_get_wakeup_count(&val) ? sprintf(buf, "%lu\n", val) : -EINTR; > +} > + > +static ssize_t wakeup_count_store(struct kobject *kobj, > + struct kobj_attribute *attr, > + const char *buf, size_t n) > +{ > + unsigned long val; > + > + if (sscanf(buf, "%lu", &val) == 1) { > + if (pm_save_wakeup_count(val)) > + return n; > + } > + return -EINVAL; > +} > + > +power_attr(wakeup_count); > + > #ifdef CONFIG_PM_TRACE > int pm_trace_enabled; > > @@ -236,6 +288,7 @@ static struct attribute * g[] = { > #endif > #ifdef CONFIG_PM_SLEEP > &pm_async_attr.attr, > + &wakeup_count_attr.attr, > #ifdef CONFIG_PM_DEBUG > &pm_test_attr.attr, > #endif > Index: linux-2.6/drivers/base/power/wakeup.c > =================================================================== > --- /dev/null > +++ linux-2.6/drivers/base/power/wakeup.c > @@ -0,0 +1,240 @@ > +/* > + * drivers/base/power/wakeup.c - System wakeup events framework > + * > + * Copyright (c) 2010 Rafael J. Wysocki <rjw(a)sisk.pl>, Novell Inc. > + * > + * This file is released under the GPLv2. > + */ > + > +#include <linux/device.h> > +#include <linux/slab.h> > +#include <linux/sched.h> > +#include <linux/capability.h> > +#include <linux/suspend.h> > +#include <linux/pm.h> > + > +/* > + * If set, the suspend/hibernate code will abort transitions to a sleep state > + * if wakeup events are registered during or immediately before the transition. > + */ > +bool events_check_enabled; > + > +/* The counter of registered wakeup events. */ > +static unsigned long event_count; > +/* A preserved old value of event_count. */ > +static unsigned long saved_event_count; > +/* The counter of wakeup events being processed. */ > +static unsigned long events_in_progress; > + > +static DEFINE_SPINLOCK(events_lock); > +static DECLARE_WAIT_QUEUE_HEAD(events_wait_queue); > + > +/* > + * The functions below use the observation that each wakeup event starts a > + * period in which the system should not be suspended. The moment this period > + * will end depends on how the wakeup event is going to be processed after being > + * detected and all of the possible cases can be divided into two distinct > + * groups. > + * > + * First, a wakeup event may be detected by the same functional unit that will > + * carry out the entire processing of it and possibly will pass it to user space > + * for further processing. In that case the functional unit that has detected > + * the event may later "close" the "no suspend" period associated with it > + * directly as soon as it has been dealt with. The pair of pm_stay_awake() and > + * pm_relax(), balanced with each other, is supposed to be used in such > + * situations. > + * > + * Second, a wakeup event may be detected by one functional unit and processed > + * by another one. In that case the unit that has detected it cannot really > + * "close" the "no suspend" period associated with it, unless it knows in > + * advance what's going to happen to the event during processing. This > + * knowledge, however, may not be available to it, so it can simply specify time > + * to wait before the system can be suspended and pass it as the second > + * argument of pm_wakeup_event(). > + */ > + > +/** > + * pm_stay_awake - Notify the PM core that a wakeup event is being processed. > + * @dev: Device the wakeup event is related to. > + * > + * Notify the PM core of a wakeup event (signaled by @dev) by incrementing the > + * counter of wakeup events being processed. If @dev is not NULL, the counter > + * of wakeup events related to @dev is incremented too. > + * > + * Call this function after detecting of a wakeup event if pm_relax() is going > + * to be called directly after processing the event (and possibly passing it to > + * user space for further processing). > + * > + * It is safe to call this function from interrupt context. > + */ > +void pm_stay_awake(struct device *dev) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&events_lock, flags); > + if (dev) > + dev->power.wakeup_count++; > + > + events_in_progress++; > + spin_unlock_irqrestore(&events_lock, flags); > +} > + > +/** > + * pm_relax - Notify the PM core that processing of a wakeup event has ended. > + * > + * Notify the PM core that a wakeup event has been processed by decrementing > + * the counter of wakeup events being processed and incrementing the counter > + * of registered wakeup events. > + * > + * Call this function for wakeup events whose processing started with calling > + * pm_stay_awake(). > + * > + * It is safe to call it from interrupt context. > + */ > +void pm_relax(void) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&events_lock, flags); > + if (events_in_progress) { > + event_count++; > + if (!--events_in_progress) > + wake_up_all(&events_wait_queue); > + } > + spin_unlock_irqrestore(&events_lock, flags); > +} > + > +/** > + * pm_wakeup_work_fn - Deferred closing of a wakeup event. > + * > + * Execute pm_relax() for a wakeup event detected in the past and free the > + * work item object used for queuing up the work. > + */ > +static void pm_wakeup_work_fn(struct work_struct *work) > +{ > + struct delayed_work *dwork = to_delayed_work(work); > + > + pm_relax(); > + kfree(dwork); > +} > + > +/** > + * pm_wakeup_event - Notify the PM core of a wakeup event. > + * @dev: Device the wakeup event is related to. > + * @msec: Anticipated event processing time (in milliseconds). > + * > + * Notify the PM core of a wakeup event (signaled by @dev) that will take > + * approximately @msec milliseconds to be processed by the kernel. Increment > + * the counter of wakeup events being processed and queue up a work item > + * that will execute pm_relax() for the event after @msec milliseconds. If @dev > + * is not NULL, the counter of wakeup events related to @dev is incremented too. > + * > + * It is safe to call this function from interrupt context. > + */ > +void pm_wakeup_event(struct device *dev, unsigned int msec) > +{ > + unsigned long flags; > + struct delayed_work *dwork; > + > + dwork = msec ? kzalloc(sizeof(*dwork), GFP_ATOMIC) : NULL; > + > + spin_lock_irqsave(&events_lock, flags); > + if (dev) > + dev->power.wakeup_count++; > + > + if (dwork) { > + INIT_DELAYED_WORK(dwork, pm_wakeup_work_fn); > + schedule_delayed_work(dwork, msecs_to_jiffies(msec)); > + > + events_in_progress++; > + } else { > + event_count++; > + } > + spin_unlock_irqrestore(&events_lock, flags); > +} > + > +/** > + * pm_check_wakeup_events - Check for new wakeup events. > + * > + * Compare the current number of registered wakeup events with its preserved > + * value from the past to check if new wakeup events have been registered since > + * the old value was stored. Check if the current number of wakeup events being > + * processed is zero. > + */ > +bool pm_check_wakeup_events(void) > +{ > + unsigned long flags; > + bool ret = true; > + > + spin_lock_irqsave(&events_lock, flags); > + if (events_check_enabled) { > + ret = (event_count == saved_event_count) && !events_in_progress; > + events_check_enabled = ret; > + } > + spin_unlock_irqrestore(&events_lock, flags); > + return ret; > +} > + > +/** > + * pm_get_wakeup_count - Read the number of registered wakeup events. > + * @count: Address to store the value at. > + * > + * Store the number of registered wakeup events at the address in @count. Block > + * if the current number of wakeup events being processed is nonzero. > + * > + * Return false if the wait for the number of wakeup events being processed to > + * drop down to zero has been interrupted by a signal (and the current number > + * of wakeup events being processed is still nonzero). Otherwise return true. > + */ > +bool pm_get_wakeup_count(unsigned long *count) > +{ > + bool ret; > + > + spin_lock_irq(&events_lock); > + if (capable(CAP_SYS_ADMIN)) > + events_check_enabled = false; > + > + if (events_in_progress) { > + DEFINE_WAIT(wait); > + > + do { > + prepare_to_wait(&events_wait_queue, &wait, > + TASK_INTERRUPTIBLE); > + if (!events_in_progress) > + break; > + spin_unlock_irq(&events_lock); > + > + schedule(); > + > + spin_lock_irq(&events_lock); > + } while (!signal_pending(current)); > + finish_wait(&events_wait_queue, &wait); > + } > + *count = event_count; > + ret = !events_in_progress; > + spin_unlock_irq(&events_lock); > + return ret; > +} > + > +/** > + * pm_save_wakeup_count - Save the current number of registered wakeup events. > + * @count: Value to compare with the current number of registered wakeup events. > + * > + * If @count is equal to the current number of registered wakeup events and the > + * current number of wakeup events being processed is zero, store @count as the > + * old number of registered wakeup events to be used by pm_check_wakeup_events() > + * and return true. Otherwise return false. > + */ > +bool pm_save_wakeup_count(unsigned long count) > +{ > + bool ret = false; > + > + spin_lock_irq(&events_lock); > + if (count == event_count && !events_in_progress) { > + saved_event_count = count; > + events_check_enabled = true; > + ret = true; > + } > + spin_unlock_irq(&events_lock); > + return ret; > +} > Index: linux-2.6/include/linux/pm.h > =================================================================== > --- linux-2.6.orig/include/linux/pm.h > +++ linux-2.6/include/linux/pm.h > @@ -457,6 +457,7 @@ struct dev_pm_info { > #ifdef CONFIG_PM_SLEEP > struct list_head entry; > struct completion completion; > + unsigned long wakeup_count; > #endif > #ifdef CONFIG_PM_RUNTIME > struct timer_list suspend_timer; > @@ -552,6 +553,11 @@ extern void __suspend_report_result(cons > } while (0) > > extern void device_pm_wait_for_dev(struct device *sub, struct device *dev); > + > +/* drivers/base/power/wakeup.c */ > +extern void pm_wakeup_event(struct device *dev, unsigned int msec); > +extern void pm_stay_awake(struct device *dev); > +extern void pm_relax(void); > #else /* !CONFIG_PM_SLEEP */ > > #define device_pm_lock() do {} while (0) > @@ -565,6 +571,10 @@ static inline int dpm_suspend_start(pm_m > #define suspend_report_result(fn, ret) do {} while (0) > > static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {} > + > +static inline void pm_wakeup_event(struct device *dev, unsigned int msec) {} > +static inline void pm_stay_awake(struct device *dev) {} > +static inline void pm_relax(void) {} > #endif /* !CONFIG_PM_SLEEP */ > > /* How to reorder dpm_list after device_move() */ > Index: linux-2.6/drivers/base/power/Makefile > =================================================================== > --- linux-2.6.orig/drivers/base/power/Makefile > +++ linux-2.6/drivers/base/power/Makefile > @@ -1,5 +1,5 @@ > obj-$(CONFIG_PM) += sysfs.o > -obj-$(CONFIG_PM_SLEEP) += main.o > +obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o > obj-$(CONFIG_PM_RUNTIME) += runtime.o > obj-$(CONFIG_PM_OPS) += generic_ops.o > obj-$(CONFIG_PM_TRACE_RTC) += trace.o > Index: linux-2.6/drivers/base/power/main.c > =================================================================== > --- linux-2.6.orig/drivers/base/power/main.c > +++ linux-2.6/drivers/base/power/main.c > @@ -59,6 +59,7 @@ void device_pm_init(struct device *dev) > { > dev->power.status = DPM_ON; > init_completion(&dev->power.completion); > + dev->power.wakeup_count = 0; > pm_runtime_init(dev); > } > > Index: linux-2.6/kernel/power/suspend.c > =================================================================== > --- linux-2.6.orig/kernel/power/suspend.c > +++ linux-2.6/kernel/power/suspend.c > @@ -172,8 +172,10 @@ static int suspend_enter(suspend_state_t > > error = sysdev_suspend(PMSG_SUSPEND); > if (!error) { > - if (!suspend_test(TEST_CORE)) > + if (!suspend_test(TEST_CORE) && pm_check_wakeup_events()) { > error = suspend_ops->enter(state); > + events_check_enabled = false; > + } > sysdev_resume(); > } > > Index: linux-2.6/kernel/power/hibernate.c > =================================================================== > --- linux-2.6.orig/kernel/power/hibernate.c > +++ linux-2.6/kernel/power/hibernate.c > @@ -277,7 +277,7 @@ static int create_image(int platform_mod > goto Enable_irqs; > } > > - if (hibernation_test(TEST_CORE)) > + if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events()) > goto Power_up; > > in_suspend = 1; > @@ -288,8 +288,10 @@ static int create_image(int platform_mod > error); > /* Restore control flow magically appears here */ > restore_processor_state(); > - if (!in_suspend) > + if (!in_suspend) { > + events_check_enabled = false; > platform_leave(platform_mode); > + } > > Power_up: > sysdev_resume(); > @@ -511,14 +513,20 @@ int hibernation_platform_enter(void) > > local_irq_disable(); > sysdev_suspend(PMSG_HIBERNATE); > + if (!pm_check_wakeup_events()) { > + error = -EAGAIN; > + goto Power_up; > + } > + > hibernation_ops->enter(); > /* We should never get here */ > while (1); > > - /* > - * We don't need to reenable the nonboot CPUs or resume consoles, since > - * the system is going to be halted anyway. > - */ > + Power_up: > + sysdev_resume(); > + local_irq_enable(); > + enable_nonboot_cpus(); > + > Platform_finish: > hibernation_ops->finish(); > > Index: linux-2.6/drivers/pci/pci-acpi.c > =================================================================== > --- linux-2.6.orig/drivers/pci/pci-acpi.c > +++ linux-2.6/drivers/pci/pci-acpi.c > @@ -48,6 +48,7 @@ static void pci_acpi_wake_dev(acpi_handl > if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) { > pci_check_pme_status(pci_dev); > pm_runtime_resume(&pci_dev->dev); > + pci_wakeup_event(pci_dev); > if (pci_dev->subordinate) > pci_pme_wakeup_bus(pci_dev->subordinate); > } > Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c > =================================================================== > --- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c > +++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c > @@ -154,6 +154,7 @@ static bool pcie_pme_walk_bus(struct pci > /* Skip PCIe devices in case we started from a root port. */ > if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) { > pm_request_resume(&dev->dev); > + pci_wakeup_event(dev); > ret = true; > } > > @@ -254,8 +255,10 @@ static void pcie_pme_handle_request(stru > if (found) { > /* The device is there, but we have to check its PME status. */ > found = pci_check_pme_status(dev); > - if (found) > + if (found) { > pm_request_resume(&dev->dev); > + pci_wakeup_event(dev); > + } > pci_dev_put(dev); > } else if (devfn) { > /* > Index: linux-2.6/drivers/base/power/sysfs.c > =================================================================== > --- linux-2.6.orig/drivers/base/power/sysfs.c > +++ linux-2.6/drivers/base/power/sysfs.c > @@ -73,6 +73,8 @@ > * device are known to the PM core. However, for some devices this > * attribute is set to "enabled" by bus type code or device drivers and in > * that cases it should be safe to leave the default value. > + * > + * wakeup_count - Report the number of wakeup events related to the device > */ > > static const char enabled[] = "enabled"; > @@ -144,6 +146,14 @@ wake_store(struct device * dev, struct d > > static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store); > > +static ssize_t wakeup_count_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + return sprintf(buf, "%lu\n", dev->power.wakeup_count); > +} > + > +static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL); > + > #ifdef CONFIG_PM_ADVANCED_DEBUG > #ifdef CONFIG_PM_RUNTIME > > @@ -230,6 +240,7 @@ static struct attribute * power_attrs[] > &dev_attr_control.attr, > #endif > &dev_attr_wakeup.attr, > + &dev_attr_wakeup_count.attr, > #ifdef CONFIG_PM_ADVANCED_DEBUG > &dev_attr_async.attr, > #ifdef CONFIG_PM_RUNTIME > Index: linux-2.6/drivers/pci/pci.h > =================================================================== > --- linux-2.6.orig/drivers/pci/pci.h > +++ linux-2.6/drivers/pci/pci.h > @@ -56,6 +56,7 @@ extern void pci_update_current_state(str > extern void pci_disable_enabled_device(struct pci_dev *dev); > extern bool pci_check_pme_status(struct pci_dev *dev); > extern int pci_finish_runtime_suspend(struct pci_dev *dev); > +extern void pci_wakeup_event(struct pci_dev *dev); > extern int __pci_pme_wakeup(struct pci_dev *dev, void *ign); > extern void pci_pme_wakeup_bus(struct pci_bus *bus); > extern void pci_pm_init(struct pci_dev *dev); > Index: linux-2.6/drivers/pci/pci.c > =================================================================== > --- linux-2.6.orig/drivers/pci/pci.c > +++ linux-2.6/drivers/pci/pci.c > @@ -1275,6 +1275,22 @@ bool pci_check_pme_status(struct pci_dev > return ret; > } > > +/* > + * Time to wait before the system can be put into a sleep state after reporting > + * a wakeup event signaled by a PCI device. > + */ > +#define PCI_WAKEUP_COOLDOWN 100 > + > +/** > + * pci_wakeup_event - Report a wakeup event related to a given PCI device. > + * @dev: Device to report the wakeup event for. > + */ > +void pci_wakeup_event(struct pci_dev *dev) > +{ > + if (device_may_wakeup(&dev->dev)) > + pm_wakeup_event(&dev->dev, PCI_WAKEUP_COOLDOWN); > +} > + > /** > * pci_pme_wakeup - Wake up a PCI device if its PME Status bit is set. > * @dev: Device to handle. > @@ -1285,8 +1301,10 @@ bool pci_check_pme_status(struct pci_dev > */ > static int pci_pme_wakeup(struct pci_dev *dev, void *ign) > { > - if (pci_check_pme_status(dev)) > + if (pci_check_pme_status(dev)) { > pm_request_resume(&dev->dev); > + pci_wakeup_event(dev); > + } > return 0; > } > > Index: linux-2.6/include/linux/suspend.h > =================================================================== > --- linux-2.6.orig/include/linux/suspend.h > +++ linux-2.6/include/linux/suspend.h > @@ -295,6 +295,13 @@ extern int unregister_pm_notifier(struct > { .notifier_call = fn, .priority = pri }; \ > register_pm_notifier(&fn##_nb); \ > } > + > +/* drivers/base/power/wakeup.c */ > +extern bool events_check_enabled; > + > +extern bool pm_check_wakeup_events(void); > +extern bool pm_get_wakeup_count(unsigned long *count); > +extern bool pm_save_wakeup_count(unsigned long count); > #else /* !CONFIG_PM_SLEEP */ > > static inline int register_pm_notifier(struct notifier_block *nb) > Index: linux-2.6/Documentation/ABI/testing/sysfs-power > =================================================================== > --- linux-2.6.orig/Documentation/ABI/testing/sysfs-power > +++ linux-2.6/Documentation/ABI/testing/sysfs-power > @@ -114,3 +114,17 @@ Description: > if this file contains "1", which is the default. It may be > disabled by writing "0" to this file, in which case all devices > will be suspended and resumed synchronously. > + > +What: /sys/power/wakeup_count > +Date: July 2010 > +Contact: Rafael J. Wysocki <rjw(a)sisk.pl> > +Description: > + The /sys/power/wakeup_count file allows user space to avoid > + losing wakeup events when transitioning the system into a sleep > + state. Reading from it returns the current number of registered > + wakeup events and it blocks if some wakeup events are being > + processed at the time the file is read from. Writing to it > + will only succeed if the current number of wakeup events is > + equal to the written value and, if successful, will make the > + kernel abort a subsequent transition to a sleep state if any > + wakeup events are reported after the write has returned. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on 28 Jun 2010 10:20 On Mon, 28 Jun 2010, Rafael J. Wysocki wrote: > > > +bool pm_get_wakeup_count(unsigned long *count) > > > +{ > > > + bool ret; > > > + > > > + spin_lock_irq(&events_lock); > > > + if (capable(CAP_SYS_ADMIN)) > > > + events_check_enabled = false; > > > + > > > + if (events_in_progress) { > > > + DEFINE_WAIT(wait); > > > + > > > + do { > > > + prepare_to_wait(&events_wait_queue, &wait, > > > + TASK_INTERRUPTIBLE); > > > + if (!events_in_progress) > > > + break; > > > + spin_unlock_irq(&events_lock); > > > + > > > + schedule(); > > > + > > > + spin_lock_irq(&events_lock); > > > + } while (!signal_pending(current)); > > > + finish_wait(&events_wait_queue, &wait); > > > + } > > > + *count = event_count; > > > + ret = !events_in_progress; > > > + spin_unlock_irq(&events_lock); > > > + return ret; > > > +} > > > > Here's a thought. Presumably pm_relax() will end up getting called a > > lot more often than pm_get_wakeup_count(). Instead of using a wait > > queue, you could make pm_get_wakeup_count() poll at 100-ms intervals. > > The total overhead would be smaller. > > For that I'd need a separate kernel thread or a work item that would reschedule > itself periodically, because pm_get_wakeup_count() is only called via > /sys/power/wakeup_count. It would complicate things quite a bit which I'm not > sure is worth it at this point. What? All I'm saying is that the do-while loop above should be replaced by a loop that sleeps for 100 ms instead of waiting on a wait_queue. That is: while (events_in_progress) { if (signal_pending(current)) break; spin_unlock_irq(&events_lock); schedule_timeout_interruptible(msecs_to_jiffies(100)); spin_lock_irq(&events_lock); } And of course, get rid of events_wait_queue. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: mark gross on 29 Jun 2010 00:50 On Mon, Jun 28, 2010 at 02:50:10PM +0200, Rafael J. Wysocki wrote: > On Monday, June 28, 2010, mark gross wrote: > > Looks good to me! > > Great, thanks! May I add your "Acked-by" to the patch, then? yes. Acked-by: markgross <markgross(a)thegnar.org> --mgross > Rafael > > > > On Sat, Jun 26, 2010 at 03:14:13PM +0200, Rafael J. Wysocki wrote: > > > From: Rafael J. Wysocki <rjw(a)sisk.pl> > > > > > > One of the arguments during the suspend blockers discussion was that > > > the mainline kernel didn't contain any mechanisms making it possible > > > to avoid losing wakeup events during system suspend. > > > > > > Generally, there are two problems in that area. First, if a wakeup > > > event occurs exactly when /sys/power/state is being written to, it > > > may be delivered to user space right before the freezer kicks in, so > > > the user space consumer of the event may not be able to process it > > > before the system is suspended. Second, if a wakeup event occurs > > > after user space has been frozen, it is not generally guaranteed that > > > the ongoing transition of the system into a sleep state will be > > > aborted. > > > > > > To address these issues introduce a new global sysfs attribute, > > > /sys/power/wakeup_count, associated with a running counter of wakeup > > > events and three helper functions, pm_stay_awake(), pm_relax(), and > > > pm_wakeup_event(), that may be used by kernel subsystems to control > > > the behavior of this attribute and to request the PM core to abort > > > system transitions into a sleep state already in progress. > > > > > > The /sys/power/wakeup_count file may be read from or written to by > > > user space. Reads will always succeed (unless interrupted by a > > > signal) and return the current value of the wakeup events counter. > > > Writes, however, will only succeed if the written number is equal to > > > the current value of the wakeup events counter. If a write is > > > successful, it will cause the kernel to save the current value of the > > > wakeup events counter and to abort the subsequent system transition > > > into a sleep state if any wakeup events are reported after the write > > > has returned. > > > > > > [The assumption is that before writing to /sys/power/state user space > > > will first read from /sys/power/wakeup_count. Next, user space > > > consumers of wakeup events will have a chance to acknowledge or > > > veto the upcoming system transition to a sleep state. Finally, if > > > the transition is allowed to proceed, /sys/power/wakeup_count will > > > be written to and if that succeeds, /sys/power/state will be written > > > to as well. Still, if any wakeup events are reported to the PM core > > > by kernel subsystems after that point, the transition will be > > > aborted.] > > > > > > Additionally, put a wakeup events counter into struct dev_pm_info and > > > make these per-device wakeup event counters available via sysfs, > > > so that it's possible to check the activity of various wakeup event > > > sources within the kernel. > > > > > > To illustrate how subsystems can use pm_wakeup_event(), make the > > > low-level PCI runtime PM wakeup-handling code use it. > > > > > > Signed-off-by: Rafael J. Wysocki <rjw(a)sisk.pl> > > > --- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Pavel Machek on 1 Jul 2010 09:40 Hi! > @@ -114,3 +114,17 @@ Description: > if this file contains "1", which is the default. It may be > disabled by writing "0" to this file, in which case all devices > will be suspended and resumed synchronously. > + > +What: /sys/power/wakeup_count > +Date: July 2010 > +Contact: Rafael J. Wysocki <rjw(a)sisk.pl> > +Description: > + The /sys/power/wakeup_count file allows user space to avoid > + losing wakeup events when transitioning the system into a sleep > + state. Reading from it returns the current number of registered > + wakeup events and it blocks if some wakeup events are being > + processed at the time the file is read from. Writing to it > + will only succeed if the current number of wakeup events is > + equal to the written value and, if successful, will make the > + kernel abort a subsequent transition to a sleep state if any > + wakeup events are reported after the write has returned. I assume that second suspend always succeeds? I can't say I quite like the way two sysfs files interact with each other, but it is certainly better then wakelocks... Maybe we should create sys_suspend()? -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Next
|
Last
Pages: 1 2 Prev: [GIT] Please pull NFS client fixes Next: [GIT PULL] percpu for v2.6.35-rc3 |