Implement more thorough check for .tsv file #73

Open
JohannesWiesner opened this issue Oct 18, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@JohannesWiesner
Owner

How do I use Python to check if a file is a .tsv file? I don't think it's enough to simply check for the .tsv extension, because you could, for example, also save a comma-separated file with a .tsv extension.

def is_tsv_file(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            for line in file:
                # Check if the line contains tabs (tab-separated values)
                if '\t' in line:
                    return True
            return False
    except FileNotFoundError:
        print(f"File '{file_path}' not found.")
        return False
    except Exception as e:
        print(f"An error occurred: {e}")
        return False

# Example usage:
file_path = 'example.tsv'
if is_tsv_file(file_path):
    print(f"{file_path} is a TSV file.")
else:
    print(f"{file_path} is not a TSV file.")

The is_tsv_file function takes a file_path as input and attempts to open the file using the UTF-8 encoding.
It then iterates through the file line by line, checking whether each line contains a tab character ('\t').
As soon as it finds one line with a tab, it treats the file as a TSV file and returns True. If no line contains a tab, it returns False.
If the file doesn't exist or any other error occurs while reading it, the function prints an error message and returns False.
Examining the content for tab characters is more reliable than relying on the file extension alone, but it is only a weak heuristic: a single tab anywhere in the file is enough to return True, so a comma-separated file with one tab inside a field would also pass.
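
Since a single tab anywhere in the file is enough for is_tsv_file to return True, a stricter check could also require that every line splits into the same number of tab-separated fields. Below is a minimal sketch of that idea, not a definitive implementation. The name has_consistent_tab_columns and the max_lines parameter are hypothetical and not part of this project, and the sketch assumes UTF-8 files with at least two columns and no blank lines between records.

```python
def has_consistent_tab_columns(file_path, max_lines=100):
    """Return True if the first `max_lines` lines all split into the same
    number (at least 2) of tab-separated fields.

    Hypothetical helper for illustration; assumes UTF-8 encoding.
    """
    expected = None
    # newline='' keeps the original line endings so they can be stripped explicitly
    with open(file_path, 'r', encoding='utf-8', newline='') as file:
        for i, line in enumerate(file):
            if i >= max_lines:
                break
            fields = line.rstrip('\r\n').split('\t')
            if expected is None:
                # The first line defines the expected number of columns
                expected = len(fields)
            elif len(fields) != expected:
                # A blank or ragged line counts as a mismatch
                return False
    # An empty file or a single-column file is not treated as TSV here
    return expected is not None and expected >= 2
```

Unlike is_tsv_file, this lets I/O and decoding errors propagate, so the caller can decide how to report them. A comma-separated file saved with a .tsv extension yields one field per line and is therefore rejected.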

@JohannesWiesner
Owner Author

Each line in a .tsv file ends with a newline character, and the values within a record are separated by tab characters.

https://en.wikipedia.org/wiki/Tab-separated_values
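
Given that definition, one way to catch a comma-separated file saved as .tsv might be to let Python's standard csv.Sniffer guess the delimiter from a sample of the file and compare the result with '\t'. This is only a sketch: the function name guess_delimiter, the candidate delimiters, and the sample size are assumptions made for illustration, and the sniffer is itself a heuristic that can guess wrong or fail on unusual files.

```python
import csv

def guess_delimiter(file_path, candidates='\t,;', sample_size=4096):
    """Guess the field delimiter from the start of the file.

    Hypothetical helper for illustration; returns None if csv.Sniffer
    cannot determine a delimiter among `candidates`.
    """
    with open(file_path, 'r', encoding='utf-8', newline='') as file:
        sample = file.read(sample_size)
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # Raised when no candidate delimiter fits the sample
        return None
    return dialect.delimiter
```

A file would then count as tab-separated only if guess_delimiter(file_path) == '\t'.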

JohannesWiesner added the enhancement label on Feb 6, 2024