summaryrefslogtreecommitdiff
path: root/include/net/devlink.h
diff options
context:
space:
mode:
authorDavid S. Miller <davem@davemloft.net>2019-02-07 10:34:39 -0800
committerDavid S. Miller <davem@davemloft.net>2019-02-07 10:34:39 -0800
commit0739d24d0c7b79a944b366dae4aaea7e7f452cd5 (patch)
tree89580a11f3e6d759f21f4abb661349e46c733fc1 /include/net/devlink.h
parent8f289805616e81f7c1690931aa8a586c76f4fa88 (diff)
parentdb2ab7a08f06aa4dd784b87bafeb3169ee7f0d95 (diff)
Merge branch 'devlink-health'
Eran Ben Elisha says: ==================== Devlink health reporting and recovery system The health mechanism is targeted for Real Time Alerting, in order to know when something bad had happened to a PCI device - Provide alert debug information - Self healing - If problem needs vendor support, provide a way to gather all needed debugging information. The main idea is to unify and centralize driver health reports in the generic devlink instance and allow the user to set different attributes of the health reporting and recovery procedures. The devlink health reporter: Device driver creates a "health reporter" per each error/health type. Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) or unknown (driver specific). For each registered health reporter a driver can issue error/health reports asynchronously. All health reports handling is done by devlink. Device driver can provide specific callbacks for each "health reporter", e.g. - Recovery procedures - Diagnostics and object dump procedures - OOB initial attributes Different parts of the driver can register different types of health reporters with different handlers. Once an error is reported, devlink health will do the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and saved at the reporter instance (as long as there is no other dump which is already stored) * Auto recovery attempt is being done. Depends on: - Auto-recovery configuration - Grace period vs. time passed since last recover The user interface: User can access/change each reporter attributes and driver specific callbacks via devlink, e.g per error type (per health reporter) - Configure reporter's generic attributes (like: Disable/enable auto recovery) - Invoke recovery procedure - Run diagnostics - Object dump The devlink health interface (via netlink): DEVLINK_CMD_HEALTH_REPORTER_GET Retrieves status and configuration info per DEV and reporter. DEVLINK_CMD_HEALTH_REPORTER_SET Allows reporter-related configuration setting. DEVLINK_CMD_HEALTH_REPORTER_RECOVER Triggers a reporter's recovery procedure. DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE Retrieves diagnostics data from a reporter on a device. DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET Retrieves the last stored dump. Devlink health saves a single dump. If an dump is not already stored by the devlink for this reporter, devlink generates a new dump. dump output is defined by the reporter. DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR Clears the last saved dump file for the specified reporter. netlink +--------------------------+ | | | + | | | | +--------------------------+ |request for ops |(diagnose, mlx5_core devlink |recover, |dump) +--------+ +--------------------------+ | | | reporter| | | | | +---------v----------+ | | | ops execution | | | | | <----------------------------------+ | | | | | | | | | | | + ^------------------+ | | | | | request for ops | | | | | (recover, dump) | | | | | | | | | +-+------------------+ | | | health report | | health handler | | | +-------------------------------> | | | | | +--------------------+ | | | health reporter create | | | +----------------------------> | +--------+ +--------------------------+ In this patchset, mlx5e TX reporter is implemented. Cmdline format: devlink health show [DEV reporter REPORTE_NAME] devlink health recover DEV reporter REPORTER_NAME devlink health diagnose DEV reporter REPORTER_NAME devlink health dump show DEV reporter REPORTER_NAME devlink health dump clear DEV reporter REPORTER_NAME devlink health set DEV reporter REPORTER_NAME NAME VALUE Cmdline examples: $devlink health show pci/0000:00:09.0: name tx state healthy #err 1 #recover 0 last_dump_ts N/A parameters: grace_period 500 auto_recover false $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": false },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: false sqn: 142 HW state: 1 stopped: false $devlink health recover pci/0000:00:09 reporter tx $devlink health set pci/0000:00:09.0 reporter tx grace_period 3500 $devlink health set pci/0000:00:09.0 reporter tx auto_recover false Changelog: v4: - Rebase on latest net-next - Remove trace_devlink_health signature exposure in case CONFIG_NET_DEVLINK is not defined as it shall only be used from devlink. v3: - Redesign of devlink <-> driver fmsg API - Various bug fixes v2: - Remove FW* reporters to decrease the amount of patches in the patchset ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include/net/devlink.h')
-rw-r--r--include/net/devlink.h211
1 files changed, 211 insertions, 0 deletions
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 74d992a68a06..c12ad6e9095d 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -30,6 +30,7 @@ struct devlink {
struct list_head param_list;
struct list_head region_list;
u32 snapshot_id;
+ struct list_head reporter_list;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -448,6 +449,29 @@ struct devlink_info_req;
typedef void devlink_snapshot_data_dest_t(const void *data);
+struct devlink_fmsg;
+struct devlink_health_reporter;
+
+/**
+ * struct devlink_health_reporter_ops - Reporter operations
+ * @name: reporter name
+ * @recover: callback to recover from reported error
+ * if priv_ctx is NULL, run a full recover
+ * @dump: callback to dump an object
+ * if priv_ctx is NULL, run a full dump
+ * @diagnose: callback to diagnose the current status
+ */
+
+struct devlink_health_reporter_ops {
+ char *name;
+ int (*recover)(struct devlink_health_reporter *reporter,
+ void *priv_ctx);
+ int (*dump)(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg, void *priv_ctx);
+ int (*diagnose)(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg);
+};
+
struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -639,6 +663,50 @@ int devlink_info_version_running_put(struct devlink_info_req *req,
const char *version_name,
const char *version_value);
+int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg);
+int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name);
+int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+ const char *name);
+int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value);
+int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value);
+int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value);
+int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value);
+int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value);
+int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+ u16 value_len);
+
+int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ bool value);
+int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u8 value);
+int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u32 value);
+int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u64 value);
+int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const char *value);
+int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const void *value, u16 value_len);
+
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+ const struct devlink_health_reporter_ops *ops,
+ u64 graceful_period, bool auto_recover,
+ void *priv);
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
+int devlink_health_report(struct devlink_health_reporter *reporter,
+ const char *msg, void *priv_ctx);
+
#else
static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -971,6 +1039,149 @@ devlink_info_version_running_put(struct devlink_info_req *req,
{
return 0;
}
+
+static inline int
+devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+ const char *name)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+ u16 value_len)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ bool value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u8 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u32 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u64 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const char *value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const void *value, u16 value_len)
+{
+ return 0;
+}
+
+static inline struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+ const struct devlink_health_reporter_ops *ops,
+ u64 graceful_period, bool auto_recover,
+ void *priv)
+{
+ return NULL;
+}
+
+static inline void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+}
+
+static inline void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+ return NULL;
+}
+
+static inline int
+devlink_health_report(struct devlink_health_reporter *reporter,
+ const char *msg, void *priv_ctx)
+{
+ return 0;
+}
#endif
#if IS_REACHABLE(CONFIG_NET_DEVLINK)