Kubeflow on OpenShift Training

I. Training Objectives

This training is built on the core features of the current mainstream versions of Kubeflow and OpenShift, leaves out outdated tools and practices, and focuses on end-to-end, hands-on skills for Kubeflow on OpenShift. Students learn to deploy Kubeflow together with OpenShift, develop machine learning models, build pipelines, and deploy and operate models. By the end of the course they should be able to build containerized ML workflows, train models and deploy them for inference, and independently set up a Kubeflow environment on OpenShift, deliver ML projects, and manage the cluster. The course targets the enterprise-level containerized ML deployment and operations skills expected of ML engineers, DevOps engineers, and cloud architects.

II. Course Introduction

This course focuses on the mainstream technologies of Kubeflow on OpenShift and on real enterprise needs. In line with current trends in containerized and cloud-native machine learning, it systematically covers deploying, configuring, using, and operating Kubeflow on the OpenShift platform. The teaching combines technical principles with hands-on exercises: it starts with environment setup and moves step by step through model development, pipeline construction, model deployment, and cluster management, with enterprise-level practical cases throughout. The aim is for students to pick up the core skills of Kubeflow on OpenShift quickly and to deliver cloud-native machine learning projects independently.

III. Target Audience

Machine learning (ML) engineers, DevOps engineers, cloud architects, data scientists, technical staff who want to learn cloud-native machine learning, and technical staff working on containerization and ML.

IV. Training Content

Topic 1: Introduction & Comparison (Foundations)

Core Objective: Build a basic understanding of Kubeflow and OpenShift, identify the main advantages and use cases of Kubeflow on OpenShift, and understand how it differs from public cloud managed services.

• 1.1 Introduction to Kubeflow and OpenShift: Covers the core functions of the mainstream Kubeflow release and the main features of the OpenShift platform, and explains what the combination offers and how well it fits cloud-native ML scenarios.

• 1.2 Kubeflow on OpenShift vs public cloud managed services: Compares Kubeflow on OpenShift with mainstream public cloud managed ML services (AWS SageMaker, GCP AI Platform (now Vertex AI), etc.), covering differences, advantages, and selection criteria for enterprise deployments.

• 1.3 Overview of Kubeflow on OpenShift: Covers the architecture of Kubeflow on OpenShift and how its core components work together, and outlines how it is applied in enterprise ML projects.

Topic 2: Environment Setup Overview (Environment Basics)

Core Objective: Learn the prerequisites for setting up Kubeflow on OpenShift and become familiar with CodeReady Containers and storage selection, as groundwork for the deployment that follows.

• 2.1 CodeReady Containers: Covers how to use the mainstream release of OpenShift CodeReady Containers (CRC, now Red Hat OpenShift Local), with a hands-on exercise in quickly setting up a local development environment suited to desktop-scale practice.

• 2.2 Storage options: Covers the mainstream storage options for Kubeflow on OpenShift (OpenShift Container Storage, PVC/PV, object storage, etc.) and the use cases and key configuration points of each.

• 2.3 Overview of Environment Setup: Walks through the full setup process for Kubeflow on OpenShift, including prerequisites, deployment steps, and ways to avoid common problems.
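As an illustration of the PVC option in 2.2, the sketch below builds a PersistentVolumeClaim manifest as a plain Python dict and prints it as JSON. The claim name `training-data` and the 10Gi size are hypothetical values chosen for the example, not settings prescribed by the course.

```python
import json

def make_pvc(name, size, access_mode="ReadWriteOnce", storage_class=None):
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    spec = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": size}},
    }
    # Omitting storageClassName lets the cluster's default StorageClass apply.
    if storage_class is not None:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }

# Hypothetical claim for a shared training dataset.
pvc = make_pvc("training-data", "10Gi")
pvc_json = json.dumps(pvc, indent=2)
print(pvc_json)
```

Because `oc apply -f` accepts JSON as well as YAML, the printed manifest could be saved to a file and applied to a project as-is.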

Topic 3: Environment Deployment (Core Deployment)

Core Objective: Become proficient in deploying a Kubernetes cluster and Kubeflow on OpenShift, and be able to complete and verify the whole deployment independently so that the environment is usable.

• 3.1 Setting up a Kubernetes cluster: Hands-on setup of an OpenShift-based Kubernetes cluster (mainstream release), configuring core settings such as cluster networking and storage, and running cluster health checks.

• 3.2 Setting up Kubeflow on OpenShift: Covers the deployment process for the mainstream Kubeflow release on OpenShift, including compatibility with the latest OpenShift release.

• 3.3 Installing Kubeflow: Hands-on installation, configuration, and startup of Kubeflow components, fixing common deployment errors (permissions, networking, dependencies, etc.), and verifying the environment.
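The cluster health check mentioned in 3.1 can be partly scripted. The sketch below is one possible approach, assuming the input is the JSON printed by `oc get nodes -o json`; the node names in the sample are made up for the example.

```python
import json

def not_ready_nodes(nodes_json):
    """Return the names of nodes whose Ready condition is not "True"."""
    bad = []
    for node in json.loads(nodes_json)["items"]:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            bad.append(node["metadata"]["name"])
    return bad

# Hypothetical, heavily trimmed output of `oc get nodes -o json`.
sample = json.dumps({"items": [
    {"metadata": {"name": "master-0"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "worker-1"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]})
print(not_ready_nodes(sample))
```

An empty result means every node reports Ready; it does not by itself prove that the Kubeflow components are healthy.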

Topic 4: Model Coding (Model Development)

Core Objective: Learn how to develop ML models on Kubeflow on OpenShift, and be able to choose an algorithm, code a model, and read data independently.

• 4.1 Choosing an ML algorithm: Covers the use cases of mainstream ML/DL algorithms (classification, regression, CNN, RNN, etc.) and how to choose one based on project requirements and on how workloads run in Kubeflow.

• 4.2 Implementing a TensorFlow CNN model: Hands-on coding and debugging of a CNN model in TensorFlow (mainstream release), adapting it to Kubeflow's containerized runtime and improving training efficiency.

• 4.3 Reading the Data: Covers how to read data in a Kubeflow environment and handle different dataset formats so that training has the data it needs.
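When coding a CNN as in 4.2, it helps to work out layer output sizes before writing any TensorFlow code. The sketch below applies the standard formula floor((n + 2p - k) / s) + 1 to a hypothetical stack of two 3x3 convolutions, each followed by 2x2 max-pooling, on a 28x28 input; the layer stack and the 64 output channels are assumptions made for the example.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output size along one axis for a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28                                           # 28x28 input image
size = conv_output_size(size, kernel=3)             # 3x3 conv, no padding
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max-pool
size = conv_output_size(size, kernel=3)             # 3x3 conv, no padding
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max-pool
flattened = size * size * 64                        # 64 channels in the last conv

print(size, flattened)
```

The flattened length is the input width of the first dense layer, so a mismatch here is a common cause of shape errors during debugging.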

Topic 5: Dataset Access (Data Processing)

Core Objective: Learn how to access and manage datasets in a Kubeflow on OpenShift environment, and be able to obtain and process training data efficiently in support of model development.

• 5.1 Accessing a dataset: Hands-on access to local and remote datasets (object storage, databases, etc.) from a Kubeflow environment, including configuring data access permissions.

• 5.2 Dataset Preprocessing: Covers preprocessing operations such as cleaning, transformation, and normalization in Kubeflow, with a hands-on preprocessing workflow to improve training data quality (supplementary content).
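The cleaning and normalization steps in 5.2 can be illustrated without any ML library. The sketch below drops missing values and then applies min-max scaling; treating `None` as the missing-value marker is an assumption made for the example.

```python
def clean(values):
    """Drop missing entries (represented here as None)."""
    return [v for v in values if v is not None]

def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [10, None, 20, 30]
normalized = min_max_normalize(clean(raw))
print(normalized)
```

In a pipeline, the minimum and maximum should be computed on the training split only and reused for validation and inference data.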

Topic 6: Kubeflow Pipelines on OpenShift (Core Pipelines)

Core Objective: Become proficient in the mainstream usage of Kubeflow Pipelines, and be able to build and customize end-to-end ML pipelines independently so that the whole model development process is automated.

• 6.1 Setting up an end-to-end Kubeflow pipeline: Hands-on deployment and configuration of Kubeflow Pipelines (mainstream release), and building an end-to-end pipeline that covers data preprocessing, model training, and model evaluation.

• 6.2 Customizing Kubeflow Pipelines: Covers pipeline component development and workflow orchestration, with hands-on custom configuration to fit different ML projects and improve pipeline run times.
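A pipeline like the one in 6.1 is a directed acyclic graph of steps, each consuming the outputs of the steps before it. The sketch below shows that idea with the standard library only (`graphlib`, Python 3.9+); it does not use the Kubeflow Pipelines SDK, and the step bodies are toy stand-ins.

```python
from graphlib import TopologicalSorter

def run_pipeline(dependencies, steps):
    """Run each step after its dependencies; return (order, outputs)."""
    order = list(TopologicalSorter(dependencies).static_order())
    outputs = {}
    for name in order:
        upstream = {dep: outputs[dep] for dep in dependencies[name]}
        outputs[name] = steps[name](upstream)
    return order, outputs

# Each step maps to the set of steps it depends on.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
}
steps = {
    "preprocess": lambda up: [x / 10 for x in (1, 2, 3)],
    # Toy "model": the mean of the preprocessed data.
    "train": lambda up: sum(up["preprocess"]) / len(up["preprocess"]),
    "evaluate": lambda up: abs(up["train"] - 0.2) < 1e-9,
}

order, outputs = run_pipeline(dependencies, steps)
print(order, outputs["evaluate"])
```

In Kubeflow Pipelines each step would instead run in its own container, with the orchestrator resolving the same kind of dependency graph.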

Topic 7: Running an ML Training Job (Model Training)

Core Objective: Learn how to configure, run, and monitor ML training jobs in a Kubeflow on OpenShift environment, and be able to train and tune models independently.

• 7.1 Training a model: Configure training job parameters (resource allocation, number of iterations, etc.), submit and run training jobs, monitor training, and resolve common problems (insufficient resources, overfitting, etc.).

• 7.2 Training Job Optimization: Covers optimization techniques for Kubeflow training jobs (distributed training, dynamic resource adjustment, etc.), with hands-on configuration to improve training efficiency (supplementary content).
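The resource allocation and distributed training settings in 7.1 and 7.2 usually end up in a job manifest. The sketch below builds one for a `TFJob`, assuming the Kubeflow Training Operator and its `kubeflow.org/v1` TFJob resource are installed; the field layout reflects that schema as commonly documented and should be checked against the installed version. The job name and image are hypothetical.

```python
import json

def make_tfjob(name, image, workers=2, cpu="2", memory="4Gi"):
    """Build a TFJob manifest with a configurable number of workers."""
    container = {
        "name": "tensorflow",  # container name the TFJob controller looks for
        "image": image,
        "resources": {"limits": {"cpu": cpu, "memory": memory}},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }

# Hypothetical image name; replace with a real training image.
job = make_tfjob("mnist-train", "registry.example.com/mnist-train:latest", workers=3)
print(json.dumps(job, indent=2))
```

Raising `workers` is only useful if the training script itself uses a distributed strategy; otherwise each replica repeats the same work.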

Topic 8: Deploying the Model (Deployment in Practice)

Core Objective: Learn how to deploy trained models in a Kubeflow on OpenShift environment, and be able to deploy a trained model to OpenShift to serve inference.

• 8.1 Running a trained model on OpenShift: Hands-on packaging and deployment of a trained model, configuring the inference service, and verifying that the deployed model serves predictions correctly.

• 8.2 Model Deployment Optimization: Covers deployment optimization techniques (container optimization, inference acceleration, load balancing, etc.) for high-concurrency inference (supplementary content).
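One way to serve a trained TensorFlow model, shown here only as a sketch, is a plain Kubernetes Deployment running the `tensorflow/serving` image, with several replicas for load spreading. The model name is hypothetical, and how the model files reach the container (a volume mount or a custom image) is deliberately left out.

```python
import json

def make_serving_deployment(name, model_name, replicas=2):
    """Build a Deployment manifest for a TensorFlow Serving container."""
    labels = {"app": name}
    container = {
        "name": "serving",
        "image": "tensorflow/serving",
        "env": [{"name": "MODEL_NAME", "value": model_name}],
        "ports": [{"containerPort": 8501}],  # TensorFlow Serving REST port
        # Route traffic to a pod only once the model status endpoint answers.
        "readinessProbe": {
            "httpGet": {"path": f"/v1/models/{model_name}", "port": 8501}
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }

deployment = make_serving_deployment("mnist-serving", "mnist-cnn", replicas=3)
print(json.dumps(deployment, indent=2))
```

A Service and an OpenShift Route in front of this Deployment would be needed to expose it; those are omitted to keep the example short.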

Topic 9: Integrating the Model into a Web Application (Integration in Practice)

Core Objective: Learn how to integrate a trained model with a web application, and be able to create a sample application that sends prediction requests and handles the responses, completing the ML project.

• 9.1 Creating a sample application: Hands-on creation of a simple web application (in a mainstream programming language) and configuring its connection to the model inference service so that the application can call the model.

• 9.2 Sending prediction requests: Hands-on sending of prediction requests from the web application, receiving and displaying inference results, and debugging common integration problems.
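The prediction request in 9.2 can be sketched with the standard library, assuming the model is served through TensorFlow Serving's REST API, whose predict endpoint takes a JSON body with an `instances` list. The host name and model name are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

def build_predict_request(base_url, model_name, instances):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical service address and model name.
req = build_predict_request(
    "http://model.example.com:8501/", "mnist-cnn", [[0.0, 0.5, 1.0]]
)
print(req.get_method(), req.full_url)
```

Sending it is a call to `urllib.request.urlopen(req)`; on success the JSON response carries the results under a `predictions` key.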

Topic 10: Administering Kubeflow (Core Operations)

Core Objective: Learn the day-to-day management and monitoring of Kubeflow on OpenShift, including training monitoring and log management, to keep the cluster running reliably.

• 10.1 Monitoring with TensorBoard: Hands-on integration of TensorBoard (mainstream release) with Kubeflow to monitor model training (loss, accuracy, etc.) and guide model tuning.

• 10.2 Managing logs: Covers log collection and analysis for Kubeflow and OpenShift, with hands-on log querying, filtering, and fault localization in support of troubleshooting.
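The log filtering in 10.2 can be prototyped on exported log text. The sketch below assumes a simple "timestamp LEVEL message" line format, which is a made-up convention for the example; real pod logs vary by component.

```python
import re

# Assumed line format: "<timestamp> <LEVEL> <message>".
LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")

def filter_logs(lines, level="ERROR", contains=None):
    """Return (timestamp, message) pairs for matching log lines."""
    hits = []
    for line in lines:
        m = LINE.match(line)
        if not m or m["level"] != level:
            continue  # skip unstructured lines and other levels
        if contains and contains not in m["msg"]:
            continue
        hits.append((m["ts"], m["msg"]))
    return hits

# Hypothetical log lines from a training pod.
sample_logs = [
    "2024-01-01T10:00:00Z INFO step=1 loss=0.93",
    "2024-01-01T10:00:05Z ERROR OOMKilled: container exceeded memory limit",
    "2024-01-01T10:00:09Z ERROR ImagePullBackOff: cannot pull image",
    "not a structured line",
]
print(filter_logs(sample_logs, contains="OOM"))
```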

Topic 11: Securing a Kubeflow Cluster (Core Security)

Core Objective: Learn how to configure security for a Kubeflow on OpenShift cluster, implementing authentication and access control to protect the cluster and its data.

• 11.1 Setting up authentication and authorization: Hands-on configuration of Kubeflow cluster authentication (integrated with OpenShift authentication) and fine-grained permissions to control user and component access.

• 11.2 Cluster Security Hardening: Covers hardening techniques for Kubeflow on OpenShift clusters (container security, network policies, encryption of sensitive data, etc.) (supplementary content).
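Fine-grained permissions as in 11.1 are typically expressed with Kubernetes RBAC objects. The sketch below builds a read-only Role and a matching RoleBinding as dicts; the namespace, user name, and the choice of the `tfjobs` resource are hypothetical examples.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"

def make_read_only_role(name, namespace, api_group, resources):
    """Role granting get/list/watch on the given resources."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [api_group],
            "resources": list(resources),
            "verbs": ["get", "list", "watch"],
        }],
    }

def make_role_binding(name, namespace, role_name, user):
    """Bind a Role to a single user within one namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }

# Hypothetical namespace and user.
role = make_read_only_role("tfjob-viewer", "ml-team", "kubeflow.org", ["tfjobs"])
binding = make_role_binding("tfjob-viewer-alice", "ml-team", "tfjob-viewer", "alice")
print(json.dumps([role, binding], indent=2))
```

Keeping the Role namespaced rather than using a ClusterRole limits the grant to one team's project.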

Topic 12: Troubleshooting and Summary




