Browsing by Author "Shirazi, Hossein, author"

Now showing 1 - 5 of 5

Open Access
Claim extraction and dynamic stance detection in COVID-19 tweets
(Colorado State University. Libraries, 2023-04-30) Faramarzi, Noushin Salek, author; Chaleshtori, Fateme Hashemi, author; Shirazi, Hossein, author; Ray, Indrakshi, author; Banerjee, Ritwik, author; ACM, publisher
The information ecosystem today is noisy, and rife with messages that contain a mix of objective claims and subjective remarks or reactions. Any automated system that intends to capture the social, cultural, or political zeitgeist, must be able to analyze the claims as well as the remarks. Due to the deluge of such messages on social media, and their tremendous power to shape our perceptions, there has never been a greater need to automate these analyses, which play a pivotal role in fact-checking, opinion mining, understanding opinion trends, and other such downstream tasks of social consequence. In this noisy ecosystem, not all claims are worth checking for veracity. Such a check-worthy claim, moreover, must be accurately distilled from subjective remarks surrounding it. Finally, and especially for understanding opinion trends, it is important to understand the stance of the remarks or reactions towards that specific claim. To this end, we introduce a COVID-19 Twitter dataset, and present a three-stage process to (i) determine whether a given Tweet is indeed check-worthy, and if so, (ii) which portion of the Tweet ought to be checked for veracity, and finally, (iii) determine the author's stance towards the claim in that Tweet, thus introducing the novel task of topic-agnostic stance detection.
Open Access
Phishing detection using machine learning
(Colorado State University. Libraries, 2021) Shirazi, Hossein, author; Ray, Indrakshi, advisor; Anderson, Chuck, advisor; Malaiya, Yashwant K., committee member; Wang, Haonan, committee member
Our society, economy, education, critical infrastructure, and other aspects of our life have become largely dependent on cyber technology. Thus, cyber threats now endanger various aspects of our daily life. Phishing attacks, even with sophisticated detection algorithms, are still the top Internet crime by victim count in 2020. Adversaries learn from their previous attempts to (i) improve attacks and lure more victims and (ii) bypass existing detection algorithms to steal user's identities and sensitive information to increase their financial gain. Machine learning appears to be a promising approach for phishing detection and, classification algorithms distinguish between legitimate and phishing websites. While machine learning algorithms have shown promising results, we observe multiple limitations in existing algorithms. Current algorithms do not preserve the privacy of end-users due to inquiring third-party services. There is a lack of enough phishing samples for training machine learning algorithms and, over-represented targets have a bias in existing datasets. Finally, adversarial sampling attacks degrade the performance of detection models. We propose four sets of solutions to address the aforementioned challenges. We first propose a domain-name-based phishing detection solution that focuses solely on the domain name of websites to distinguish phishing websites from legitimate ones. This approach does not use any third-party services and preserves the privacy of end-users. We then propose a fingerprinting algorithm that consists of finding similarities (using both visual and textual characteristics) between a legitimate targeted website and a given suspicious website. This approach addresses the issue of bias towards over-represented samples in the datasets. Finally, we explore the effect of adversarial sampling attacks on phishing detection algorithms in-depth, starting with feature manipulation strategies. Results degrade the performance of the classification algorithm significantly. In the next step, we focus on two goals of improving the performance of classification algorithms by increasing the size of used datasets and making the detection algorithm robust against adversarial sampling attacks using an adversarial autoencoder.
Open Access
Sparse binary transformers for multivariate time series modeling
(Colorado State University. Libraries, 2023-08-04) Gorbett, Matt, author; Shirazi, Hossein, author; Ray, Indrakshi, author; ACM, publisher
Compressed Neural Networks have the potential to enable deep learning across new applications and smaller computational environments. However, understanding the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications, which show little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and anomaly detection, which rely on predicting outputs at a single point in time, we propose an attention mask to allow computation only at the current time step. Together, each compression technique and attention modification substantially reduces the number of non-zero operations necessary in the Transformer. We measure the computational savings of our approach over a range of metrics including parameter count, bit size, and floating point operation (FLOPs) count, showing up to a 53x reduction in storage size and up to 10.5x reduction in FLOPs.
Open Access
Tiled bit networks: sub-bit neural network compression through reuse of learnable binary vectors
(Colorado State University. Libraries, 2024-10-21) Gorbett, Matt, author; Shirazi, Hossein, author; Ray, Indrakshi, author; ACM, publisher
Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e. tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations for Tiled Bit Networks: 1) we deploy the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.
Open Access
Unbiased phishing detection using domain name based features
(Colorado State University. Libraries, 2018) Shirazi, Hossein, author; Ray, Indrakshi, advisor; Malaiya, Yashwant K., committee member; Vijayasarathy, Leo R., committee member
Internet users are coming under a barrage of phishing attacks of increasing frequency and sophistication. While these attacks have been remarkably resilient against the vast range of defenses proposed by academia, industry, and research organizations, machine learning approaches appear to be a promising one in distinguishing between phishing and legitimate websites. There are three main concerns with existing machine learning approaches for phishing detection. The first concern is there is neither a framework, preferably open-source, for extracting feature and keeping the dataset updated nor an updated dataset of phishing and legitimate website. The second concern is the large number of features used and the lack of validating arguments for the choice of the features selected to train the machine learning classifier. The last concern relates to the type of datasets used in the literature that seems to be inadvertently biased with respect to the features based on URL or content. In this thesis, we describe the implementation of our open-source and extensible framework to extract features and create up-to-date phishing dataset. With having this framework, named Fresh-Phish, we implemented 29 different features that we used to detect whether a given website is legitimate or phishing. We used 26 features that were reported in related work and added 3 new features and created a dataset of 6,000 websites with these features of which 3,000 were malicious and 3,000 were genuine and tested our approach. Using 6 different classifiers we achieved the accuracy of 93% which is a reasonable high in this field. To address the second and third concerns, we put forward the intuition that the domain name of phishing websites is the tell-tale sign of phishing and holds the key to successful phishing detection. We focus on this aspect of phishing websites and design features that explore the relationship of the domain name to the key elements of the website. Our work differs from existing state-of-the-art as our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97%, on sample dataset. Compared to the state-of-the-art work, our per data instance processing and classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs as they are likely to be biased towards dataset collection and usage. We show the robustness of our learning algorithm by testing our classifiers on unknown live phishing URLs and achieve a higher detection accuracy of 99.7% compared to the earlier known best result of 95% detection rate.