The original notes (translated from Chinese) open with PyTorch basics — what PyTorch is, NumPy interop, Variable, and Torch tensors — before moving on to training details. The most important of those details: model.train() and model.eval() switch a model between training and evaluation mode, and the switch matters because Batch Normalization and Dropout behave differently in the two modes. Call model.eval() before validation or inference so that BN uses its running statistics and Dropout is disabled, and call model.train() again before resuming training. Learning-rate schedules are handled separately through torch.optim.lr_scheduler, and the Autograd mechanics notes in the PyTorch documentation describe how gradients are recorded and computed. The tensor basics covered include in-place vs. out-of-place operations, zero-based indexing, no camel casing, and the NumPy bridge.

On the quantization side of the docs: additional data types and quantization schemes can be implemented through the custom operator mechanism; hardswish() has a quantized version and relu() supports quantized inputs; a quantize stub module behaves like an observer before calibration and is swapped for nnq.Quantize during convert; a quantization-aware-training linear layer is a Linear module with FakeQuantize modules attached to its weight; and fusion patterns such as torch.nn.Conv2d followed by torch.nn.ReLU are supported.

The ColossalAI bug report attaches a build log that starts with a warning and then the ninja/nvcc compile steps, for example:

/workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
[5/7] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_optim -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -isystem /workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/include -isystem /workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/include/TH -isystem /workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /workspace/nas-data/miniconda3/envs/gpt/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -std=c++14 -c /workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu -o multi_tensor_lamb.cuda.o

and ends in a Python traceback that passes through importlib:

File "/workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/importlib/__init__.py", line 126, in import_module
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
rank : 0 (local_rank: 0)

The reporter asks how to solve this problem.

A separate import failure discussed here has a different cause: the failing path in that traceback is /code/pytorch/torch/__init__.py, which means the torch folder in the current working directory (a PyTorch source checkout) is imported instead of the torch package installed in the system site-packages, and the source folder does not contain the compiled extension. One commenter adds: "I find my pip package doesn't have this line."
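To confirm which torch the interpreter will actually pick up, a quick check along these lines helps (a sketch, not from the original thread; the paths in the comments are the ones quoted in the report above):

import importlib.util
import os

# find_spec locates the module without executing it, so this works even when "import torch" crashes.
spec = importlib.util.find_spec("torch")
# If the origin points into the current working directory (e.g. /code/pytorch/torch/__init__.py)
# rather than .../site-packages/torch/__init__.py, a local source tree is shadowing the install.
print("torch would be loaded from:", spec.origin if spec else "not found")
print("current working directory:", os.getcwd())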
PyTorch is not a drop-in replacement for NumPy, but it covers much of NumPy's functionality, and autograd adds automatic differentiation on top of tensor operations.

More quantization doc fragments from the same page: a sequential container calls the Conv1d and ReLU modules in turn; in the dynamic case the input data is quantized on the fly during inference; an enum represents the different ways an operator or operator pattern can be observed; a module of CustomConfig classes is used in both eager mode and FX graph mode quantization; another module implements the fused-operation variants needed for quantization-aware training; one helper propagates qconfig through the module hierarchy and assigns a qconfig attribute to each leaf module; the default evaluation function takes a torch.utils.data.Dataset or a list of input Tensors and runs the model on it; recurrent cell variants such as LSTMCell and GRUCell are covered as well. (This package is in the process of being deprecated.)

Back on the ColossalAI side, the reporter notes "I have not installed the CUDA toolkit", and the failing build surfaces through File "/workspace/nas-data/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build.

A Windows user hits a related-looking but distinct error under PyCharm: the traceback runs through File "C:\Users\Michael\PycharmProjects\Pytorch_2\venv\lib\site-packages\torch\__init__.py" and PyCharm's import hook (module = self._system_import(name, *args, **kwargs)), and ends with ModuleNotFoundError: No module named 'torch._C'.

And on the optimizer question: "I checked my pytorch 1.1.0, it doesn't have AdamW." Trying the newer optimizer does not help either: "nadam = torch.optim.NAdam(model.parameters()) — this gives the same error. Is this a problem with the virtual environment?" The asker has double-checked the conda environment ("currently the latest version is 0.12, which you use") and is advised, if that is not the problem, to execute the program from both Jupyter and the command line.
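The direct cause is simply the PyTorch version: AdamW was only added to torch.optim around PyTorch 1.2 and NAdam around 1.10, so neither attribute exists on a 1.1.0 install. A defensive sketch (not from the original thread) that makes this obvious:

import torch
import torch.optim as optim

print(torch.__version__)  # 1.1.0 in the report above

params = [torch.nn.Parameter(torch.zeros(3))]
# Fall back to plain Adam on old PyTorch builds where AdamW/NAdam do not exist yet.
if hasattr(optim, "AdamW"):
    opt = optim.AdamW(params, lr=1e-3, weight_decay=0.01)
else:
    opt = optim.Adam(params, lr=1e-3)
print(type(opt).__name__)

Upgrading PyTorch (for example, pip install --upgrade torch) is the real fix; the fallback above only documents the failure mode.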
One answer to the import problems: make sure NumPy and SciPy are installed before installing the torch library — that worked for at least one user on Windows — so install NumPy first. The tutorial material referenced here also covers converting a torch Tensor to a NumPy array and back, CUDA tensors, and autograd. Usually, if torch or tensorflow has been installed successfully but still cannot be imported, the reason is the Python environment actually being used: when the import torch command is executed, the torch folder is searched for in the current directory by default. One asker mentions "I have installed PyCharm"; another ran into a Windows 10 / Anaconda installation failure instead, with conda reporting CondaHTTPError: HTTP 404 NOT FOUND for the package URL, after which import torch as t fails as well. "Thank you in advance," one of the questions ends.

The original Chinese notes also touch on Caffe-style layers, the forward/backward passes and the computational graph, and compare TensorFlow with PyTorch; they sketch a small network that imports torch, torch.nn and torch.nn.functional, builds an Adam optimizer with torch.optim.Adam(net.parameters(), lr=0.0008, ...), and zeroes the gradients before the next step, with further reading at https://zhuanlan.zhihu.com/p/67415439 and https://www.jianshu.com/p/812fce7de08d.

In the ColossalAI build log, the compile step aborts with nvcc fatal : Unsupported gpu architecture 'compute_86', and the traceback contains the frames return importlib.import_module(self.prebuilt_import_path) and raise CalledProcessError(retcode, process.args, ...). The log also notes that ninja is allowed to set a default number of workers, overridable by setting the environment variable MAX_JOBS=N.

The quantization reference fragments scattered through the page describe the fused modules — a BNReLU2d module fuses BatchNorm2d and ReLU, BNReLU3d fuses BatchNorm3d and ReLU, ConvReLU1d/2d/3d fuse the corresponding Conv and ReLU, and LinearReLU fuses Linear and ReLU — along with a dynamic quantized linear module that takes and returns floating-point tensors, the quantized equivalent of Sigmoid, a function returning the default QConfigMapping for quantization-aware training, and an op that upsamples the input to either a given size or a given scale_factor. The observers include a default observer for a floating-point zero-point, an observer that computes quantization parameters from a moving average of the min and max values, one that uses the running min and max values, and a default observer for static quantization that is usually used for debugging. Some of this is currently only used by FX Graph Mode Quantization, though it may be extended to eager mode; this part of the docs describes the quantization-related functions of the torch namespace, lists the supported types, notes that the package is in the process of being deprecated, and asks contributors adding a new entry or functionality to put it in the appropriate files under torch/ao/quantization/fx/ and add an import statement there.
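As a concrete illustration of the fused modules listed above, here is a minimal sketch using torch.ao.quantization.fuse_modules (the model and attribute names are invented for the example; on older releases the same function lives under torch.quantization):

import torch
from torch import nn
from torch.ao.quantization import fuse_modules

class Small(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = Small().eval()                         # eval-mode fusion folds BN into the conv
fused = fuse_modules(m, [["conv", "bn", "relu"]])
print(fused)                               # conv becomes a fused ConvReLU2d, bn/relu become Identity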
The first of the seven compile steps, [1/7], runs the same nvcc command against multi_tensor_sgd_kernel.cu to produce multi_tensor_sgd_kernel.cuda.o.

More doc fragments: fake quantization can be enabled for a module if applicable, and observation can be enabled or disabled per module; a ConvBn3d module is fused from Conv3d and BatchNorm3d with FakeQuantize modules attached to the weight, for quantization-aware training; there is a default qconfig that quantizes activations only, and a fused, faster version of default_per_channel_weight_fake_quant; a prepare step makes a copy of the model for quantization calibration or quantization-aware training, and convert turns it into the quantized version; a LinearReLU module fused from Linear and ReLU can be used for dynamic quantization; one op upsamples the input using nearest-neighbour pixel values; fused patterns like conv + relu are what these modules cover. The moving-average observer works as described in MinMaxObserver, where [x_min, x_max] denotes the range of the observed input data, and torch.no_grad() — as used, for example, when running HuggingFace Transformers models for inference — keeps those passes out of autograd.

The actual failure follows: the extension build is launched through subprocess.run(...), nvcc aborts with nvcc fatal : Unsupported gpu architecture 'compute_86', and the import that depends on the compiled extension then fails with ModuleNotFoundError: No module named 'colossalai._C.fused_optim'. compute_86 / sm_86 is the Ampere target (RTX 30xx-class GPUs), and the nvcc being invoked here, /usr/local/cuda/bin/nvcc, only knows that target if the locally installed CUDA toolkit is version 11.1 or newer; an older toolkit under /usr/local/cuda (or a missing one, as the reporter suspects) produces exactly this error.
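To see which CUDA versions the two sides of the build are using, a small check like the following helps (a sketch; it inspects the PyTorch side, while nvcc --version on the command line shows the toolkit that ninja is invoking):

import torch

# CUDA version PyTorch itself was built against, and the architectures its kernels target.
print("torch:", torch.__version__, "| built with CUDA:", torch.version.cuda)
print("arch list:", torch.cuda.get_arch_list())        # should include 'sm_86' for RTX 30xx GPUs
if torch.cuda.is_available():
    print("device capability:", torch.cuda.get_device_capability(0))  # (8, 6) means sm_86

If the local nvcc reports a version older than 11.1, upgrading the CUDA toolkit (or pointing the build at a newer one) is what makes the compute_86 target available.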
Further down the same log the build fails outright — FAILED: multi_tensor_l2norm_kernel.cuda.o — presumably for the same unsupported-architecture reason; the kernel-override warning quoted earlier names dispatch key: Meta, and the launcher's error summary repeats the rank 0 / error_file header.

On the installation thread, one answer walks through a clean install: the install command will pull in both torch and torchvision; afterwards, go to the Python shell and import torch to verify. Another user recounts: "Indeed, I too downloaded Python 3.6 after some awkward mess-ups; in retrospect, what could have happened is that I downloaded PyTorch on an old version of Python and then reinstalled a newer version." Broken installs like that show up as one red line during the pip installation and then the no-module-found error in the interactive Python prompt; restarting the console and re-entering the commands was also suggested. "I'll have to attempt this when I get home :)", one reply ends.

Still more quantization reference material: note that operator implementations currently only support per-channel quantization for the weights of the conv and linear operators; there is a quantized Conv1d that applies a 1D convolution over a quantized input signal composed of several quantized input planes, a quantized EmbeddingBag with quantized packed weights as inputs, a quantized version of InstanceNorm3d, and a quantized 3D transposed convolution operator over an input image composed of several input planes; sequential containers call Conv2d and BatchNorm2d, or BatchNorm3d and ReLU, in turn; a ConvBnReLU1d module fuses Conv1d, BatchNorm1d and ReLU with FakeQuantize modules attached to the weight for quantization-aware training; a fused module observes the input tensor (computing min/max), computes scale/zero_point and fake-quantizes the tensor; expand returns a new view of the tensor with singleton dimensions expanded to a larger size, and interpolation ops down- or up-sample the input to either a given size or a given scale_factor. The torch.nn.quantized namespace is in the process of being deprecated — the file is migrating to torch/ao/quantization — and the FX graph mode quantization APIs are still a prototype.

Finally, the AdamW confusion also shows up on the HuggingFace side, in the context of fine-tuning BERT with the Trainer: the Trainer's built-in AdamW implementation is deprecated ("Implementation of AdamW is deprecated and will be removed in a future version"), so instead of the old default optim="adamw_hf" you pass optim="adamw_torch" in TrainingArguments to use torch.optim.AdamW; see https://stackoverflow.com/questions/75535679/implementation-of-adamw-is-deprecated-and-will-be-removed-in-a-future-version-u for details.
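A minimal sketch of that switch (output_dir is a placeholder, and it assumes a transformers version recent enough to accept the optim argument):

from transformers import TrainingArguments

# Use torch.optim.AdamW inside the Trainer instead of the deprecated built-in "adamw_hf".
args = TrainingArguments(output_dir="out", optim="adamw_torch")
print(args.optim)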
One user writes: "I followed the instructions on downloading and setting up TensorFlow on Windows, but when I follow the official verification I get an error." Activating the environment first and pinning the dependency ("we will specify this in the requirements") were both mentioned. The Windows/PyCharm report continues: "However, when I do that and then run import torch I received the following error:" with the traceback passing through File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 19, in do_import. Trying the same import in the Python console proved unfruitful — always giving the same error.

One answer to the AdamW question is blunt: "You are using a very old PyTorch version." On the build side, step [4/7] compiles multi_tensor_adam.cu into multi_tensor_adam.cuda.o with the same nvcc command; nvcc fatal : Unsupported gpu architecture 'compute_86' appears again, and that reporter notes "I have installed Microsoft Visual Studio."

The remaining reference fragments: the input and output tensors are usually not named, so you need to provide names explicitly; a function, given a Tensor quantized by linear (affine) per-channel quantization, returns a Tensor of the scales of the underlying quantizer; there is an Elman RNN cell with tanh or ReLU non-linearity, and a function that returns an fp32 Tensor by dequantizing a quantized Tensor; a config specifies additional constraints for a given dtype — quantization value ranges, scale value ranges, and fixed quantization params — to be used in DTypeConfig; these modules can be used in conjunction with the custom module mechanism; no BatchNorm variants are listed, as BatchNorm is usually folded into the convolution; a sequential container calls the Linear and ReLU modules in turn; there is a default qconfig that quantizes weights only, a dynamic qconfig with both activations and weights quantized to torch.float16, and a module implementing the quantized dynamic versions of fused operations; finally, the convert step swaps a module for its quantized counterpart when it has one and an observer is attached.
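Since several of these fragments describe dynamic quantization, here is a minimal sketch of what it looks like in practice (model shape and names invented for the example; on older releases the entry point is torch.quantization.quantize_dynamic):

import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4)).eval()

# Weights are converted to int8 ahead of time; activations are quantized on the fly at runtime.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel)

Dynamic quantization is the usual first step for CPU inference on Linear- and LSTM-heavy models, since it needs no calibration data.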
The original AdamW question, for the record: "I get the following error saying that torch doesn't have the AdamW optimizer. In Anaconda, I used the commands mentioned on pytorch.org (06/05/18). Can I just add this line to my __init__.py?" — the cleaner fix is upgrading PyTorch rather than patching the installed package. For the import-shadowing problem described earlier, the solution is simply to switch to another directory and run the script from there, so that the installed torch package is found instead of the local source folder. The kernel-override warning quoted at the top continues with: new kernel: registered at /dev/null:241 (Triggered internally at ../aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.). One last doc fragment describes a linear module with FakeQuantize modules attached to its weight, used for dynamic quantization-aware training, and the reference closes with an example-usage section along the lines sketched below.
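The following sketch assembles that usage pattern — the stub/observer/prepare/convert flow described in the fragments above — for illustration rather than as a copy of the docs' own example (module names and sizes are invented; on older releases the same symbols live under torch.quantization):

import torch
from torch import nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()     # acts as an observer before calibration, becomes nnq.Quantize after convert
        self.fc = nn.Linear(8, 4)
        self.dequant = DeQuantStub()
    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = get_default_qconfig("fbgemm")   # x86 backend; "qnnpack" on ARM
prepared = prepare(m)                       # inserts observers
prepared(torch.randn(2, 8))                 # calibration pass over representative data
quantized = convert(prepared)               # swaps modules for their quantized counterparts
print(quantized)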